1
Component Overview
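Apache Dolphin Scheduler is a distributed, easily extensible, visual DAG workflow task scheduling system. It aims to untangle the complex dependencies in data processing pipelines so that scheduling works out of the box.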
Official website
https://dolphinscheduler.apache.org/en-us/
GitHub
https://github.com/apache/incubator-dolphinscheduler
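Dolphin Scheduler 1.2.0 is the first Apache release of DS and, at the time of writing, the version the community recommends. It introduces features such as cross-project dependencies and Flink and HTTP task types. For the full Release Notes, see: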
https://github.com/apache/incubator-dolphinscheduler/releases
2
Preparing the Installation Package
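Clone the Dolphin Scheduler code from GitHub, switch to the 1.2.0-release branch locally, and build the backend package (dolphinscheduler-1.2.0-backend-bin.tar.gz) used below.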
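As a minimal sketch of this step, the functions below clone the repository listed above, check out the 1.2.0-release branch, and run a Maven build. The Maven command line (`-Prelease -Dmaven.test.skip=true`) and the helper names `run` and `build_release` are assumptions for illustration; check the branch's own build instructions before relying on them. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch the 1.2.0-release branch and build the package.
# The Maven profile/flags are an assumption; verify them against the branch's docs.

REPO_URL="https://github.com/apache/incubator-dolphinscheduler.git"
BRANCH="1.2.0-release"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Clone into the given directory (default: incubator-dolphinscheduler),
# switch to the release branch, then build; stop at the first failure.
build_release() {
  local workdir="${1:-incubator-dolphinscheduler}"
  run git clone "$REPO_URL" "$workdir" || return 1
  run git -C "$workdir" checkout "$BRANCH" || return 1
  run mvn -f "$workdir/pom.xml" -U clean package -Prelease -Dmaven.test.skip=true || return 1
}
```

Usage: `DRY_RUN=1 build_release ds-src` to review the commands, then `build_release ds-src` to run them.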
3
Modifying the Configuration
# Create the deployment directory
mkdir -p /opt/dolphinscheduler
# Extract the tarball
tar -zxvf dolphinscheduler-1.2.0-backend-bin.tar.gz -C /opt/dolphinscheduler/
# Change the package's permissions and owner; the deployment user is still escheduler, as in 1.1.0
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop
export SPARK_HOME1=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_HOME2=/opt/cloudera/parcels/SPARK2/lib/spark2
export PYTHON_HOME=/usr/local/anaconda3/bin/python
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export FLINK_HOME=/opt/soft/flink
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$PATH
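A wrong path in these exports usually only surfaces when a task fails on a worker. The sketch below is a hypothetical pre-check (the function name `check_env_paths` is illustrative, not part of Dolphin Scheduler): it walks the same variable names as the exports above and reports any that are unset or point to nothing. It uses `-e` rather than `-d` because `PYTHON_HOME` here points to the python binary, not a directory.

```shell
#!/usr/bin/env bash
# Hypothetical check: report task-runtime variables that are unset or
# whose path does not exist on this host. Returns the number of problems.
check_env_paths() {
  local missing=0 name value
  for name in HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME1 SPARK_HOME2 \
              PYTHON_HOME JAVA_HOME HIVE_HOME FLINK_HOME; do
    value="${!name:-}"              # indirect lookup of the variable named $name
    if [ -z "$value" ]; then
      echo "UNSET   $name"
      missing=$((missing + 1))
    elif [ ! -e "$value" ]; then    # -e: PYTHON_HOME is a file, the rest are dirs
      echo "MISSING $name=$value"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Run it on each worker after sourcing the environment file you just edited; an exit status of 0 means every path was found.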
# install.sh parameters that need special attention
# for example postgresql or mysql ...
dbtype="mysql"
# db config
# db address and port
dbhost="192.168.xx.xx:3306"
# db name
dbname="escheduler"
# db username
username="escheduler"
# db password
# Note: if the password contains special characters, escape them with \
passowrd="escheduler"
# conf/config/install_config.conf config
# Note: the installation path must not be the same as the current path (pwd)
installPath="/opt/ds_120"
# deployment user
# Note: the deployment user needs sudo privileges and permission to operate on HDFS. If HDFS is enabled, you must create the root directory yourself
deployUser="escheduler"
# hdfs root path, the owner of the root path must be the deployment user.
# versions prior to 1.1.0 do not automatically create the hdfs root directory, you need to create it yourself.
hdfsPath="/escheduler"
# common config
# Program root path
programPath="/tmp/escheduler"
# download path
downloadPath="/tmp/escheduler/download"
# task execute path
execPath="/tmp/escheduler/exec"
# api config
# api server port
apiServerPort="12345"
# api session timeout, in seconds
apiServerSessionTimeout="7200"
# api server context path
apiServerContextPath="/dolphinscheduler/"
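Several of the notes above are easy to get wrong when copying values over from a 1.1.0 setup. The following is a hypothetical pre-flight check, not part of install.sh: it reads the same global variable names and flags an unsupported `dbtype`, a `dbhost` that is not `host:port`, an `installPath` equal to the current directory, and a relative `hdfsPath`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the install.sh variables shown above.
# Uses install.sh's variable names; it is not part of install.sh itself.
check_install_params() {
  local errors=0

  case "$dbtype" in
    mysql|postgresql) ;;
    *) echo "dbtype should be mysql or postgresql, got '$dbtype'"; errors=$((errors + 1)) ;;
  esac

  # dbhost is written as host:port, e.g. 192.168.xx.xx:3306
  if [[ ! "$dbhost" =~ ^[^:]+:[0-9]+$ ]]; then
    echo "dbhost should look like host:port, got '$dbhost'"
    errors=$((errors + 1))
  fi

  # installPath must differ from the directory install.sh is run from
  # (compared after resolving symlinks; a path that does not exist yet passes)
  if [ "$(cd "$installPath" 2>/dev/null && pwd -P)" = "$(pwd -P)" ]; then
    echo "installPath must not be the current directory: '$installPath'"
    errors=$((errors + 1))
  fi

  # hdfsPath should be an absolute HDFS path
  case "$hdfsPath" in
    /*) ;;
    *) echo "hdfsPath should be absolute, got '$hdfsPath'"; errors=$((errors + 1)) ;;
  esac

  return "$errors"
}
```

Source or paste the parameter block, then call `check_install_params` before running install.sh; each problem is printed on its own line.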
4
Database Upgrade & Component Upgrade
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://xxxx:3306/dolphinscheduler?characterEncoding=UTF-8
spring.datasource.username=xxxxx
spring.datasource.password=xxxxx
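The datasource must point at the existing 1.1.0 metadata database so the upgrade can migrate it in place. As a sketch, the hypothetical function below derives these four properties from install.sh-style values; the MySQL driver and URL follow the lines above, while the PostgreSQL driver class `org.postgresql.Driver` and the `jdbc:postgresql://host:port/db` URL form are the standard PostgreSQL JDBC conventions, assumed here rather than taken from this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print spring.datasource.* lines for the metadata DB.
# Args: dbtype dbhost(host:port) dbname username password
datasource_props() {
  local dbtype="$1" dbhost="$2" dbname="$3" user="$4" pass="$5"
  case "$dbtype" in
    mysql)
      echo "spring.datasource.driver-class-name=com.mysql.jdbc.Driver"
      echo "spring.datasource.url=jdbc:mysql://${dbhost}/${dbname}?characterEncoding=UTF-8"
      ;;
    postgresql)
      echo "spring.datasource.driver-class-name=org.postgresql.Driver"
      echo "spring.datasource.url=jdbc:postgresql://${dbhost}/${dbname}"
      ;;
    *)
      echo "unsupported dbtype: $dbtype" >&2
      return 1
      ;;
  esac
  echo "spring.datasource.username=${user}"
  echo "spring.datasource.password=${pass}"
}
```

For example, `datasource_props mysql 192.168.xx.xx:3306 escheduler escheduler escheduler` prints the block to paste into the properties file. The schema migration itself is then run with the upgrade script shipped in the package (assumed to be `script/upgrade-dolphinscheduler.sh` in 1.2.0; confirm the name in your tarball).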
Important
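After the upgrade is complete, you must also run one DDL statement against the DS metadata database to lengthen the app_link column of the task instance table. Otherwise, multi-stage Hive QL jobs leave their tasks in an incorrect state, with the error: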
data too long for field 'app_link'
DDL statement to execute:
MySQL:
alter table t_ds_task_instance modify column app_link varchar(5999);
PostgreSQL:
alter table t_ds_task_instance alter column app_link type character varying(5999);
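To avoid pasting the wrong dialect, the statements above can be wrapped in a small helper that picks the DDL by `dbtype`, plus a standard `information_schema` query to confirm the new length afterwards. The function names are hypothetical; note that PostgreSQL requires the `type` keyword in `alter column`, which the MySQL `modify column` form does not use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the app_link DDL for the given dbtype.
app_link_ddl() {
  case "$1" in
    mysql)      echo "alter table t_ds_task_instance modify column app_link varchar(5999);" ;;
    postgresql) echo "alter table t_ds_task_instance alter column app_link type character varying(5999);" ;;
    *)          echo "unsupported dbtype: $1" >&2; return 1 ;;
  esac
}

# Verification query; information_schema exists in both MySQL and PostgreSQL.
# It returns one row per schema that contains t_ds_task_instance; expect 5999.
app_link_check_sql() {
  echo "select character_maximum_length from information_schema.columns where table_name = 't_ds_task_instance' and column_name = 'app_link';"
}
```

Usage: `app_link_ddl mysql | mysql -h 192.168.xx.xx -P 3306 -u escheduler -p escheduler` for MySQL, or `app_link_ddl postgresql | psql -h <host> -U <user> -d <db>` for PostgreSQL; pipe `app_link_check_sql` the same way to check the result.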
Verifying Key Data
# Update the nginx front-end configuration
vi /etc/nginx/conf.d/escheduler.conf
# Restart nginx
systemctl restart nginx
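Because `apiServerContextPath` is now `/dolphinscheduler/`, an nginx config still proxying only the old 1.1.0 path will leave the front end unable to reach the API. The sketch below, with hypothetical function names, assumes the front end is served through a `location /dolphinscheduler` block; it refuses to restart unless that block is present and `nginx -t` accepts the configuration.

```shell
#!/usr/bin/env bash
# Hypothetical check: does the nginx config contain a /dolphinscheduler location?
# Assumes the 1.2.0 front end proxies the API under that path.
nginx_has_ds_location() {
  local conf="$1"
  grep -Eq 'location[[:space:]]+/dolphinscheduler' "$conf"
}

# Restart nginx only if the location exists and the config passes `nginx -t`.
restart_nginx_if_ok() {
  local conf="${1:-/etc/nginx/conf.d/escheduler.conf}"
  if ! nginx_has_ds_location "$conf"; then
    echo "no /dolphinscheduler location in $conf" >&2
    return 1
  fi
  nginx -t && systemctl restart nginx
}
```

Run `restart_nginx_if_ok` as root in place of the bare restart; a failed syntax check leaves the running nginx untouched.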
Workflow Testing
The upgrade succeeded!
Welcome to try Dolphin Scheduler!