1
组件介绍
Apache Dolphin Scheduler是一个分布式易扩展的可视化DAG工作流任务调度系统。致力于解决数据处理流程中错综复杂的依赖关系,使调度系统在数据处理流程中开箱即用。
官网
https://dolphinscheduler.apache.org/en-us/
github
https://github.com/apache/incubator-dolphinscheduler
2
ds-1.2.0 目录解读
(讲解配置文件的作用,具体配置在install.sh部署文件中完成)

bin目录下比较重要的是dolphinscheduler-daemon文件,之前版本中极容易出现的找不到jdk问题来源,当前版本的jdk已经export了本机的$JAVA_HOME,再也不用担心找不到jdk了。

非常重要的配置文件目录!!!
非常重要的配置文件目录!!!
非常重要的配置文件目录!!!

(特别注意这个隐藏文件,需要ls -al)
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop
#可以注释掉,也可以配置为SPARK_HOME2
#export SPARK_HOME1=/opt/cloudera/parcels/SPARK2/lib/spark2
export SPARK_HOME2=/opt/cloudera/parcels/SPARK2/lib/spark2
export PYTHON_HOME=/usr/local/anaconda3/bin/python
export JAVA_HOME=/usr/java/jdk1.8.0_131
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export FLINK_HOME=/opt/soft/flink
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$PATH敲黑板,重点!!!ds的元数据库配置,在ds-1.2.0中默认的数据库是pg,如果要使用mysql,需要将mysql的jdbc包放到lib目录下。
ds的定时由quartz框架完成,特别注意里边有quartz的数据库配置!!!
3
install.sh详解
install.sh部署脚本是ds部署中的重头戏,下面将参数分组进行分析。
# for example postgresql or mysql ...
dbtype="postgresql"
# db config
# db address and port
dbhost="192.168.xx.xx:5432"
# db name
dbname="dolphinscheduler"
# db username
username="xx"
# db passwprd
# Note: if there are special characters, please use the \ transfer character to transfer
passowrd="xx"# conf/config/install_config.conf config
# Note: the installation path is not the same as the current path (pwd)
installPath="/data1_1T/dolphinscheduler"
# deployment user
# Note: the deployment user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled, the root directory needs to be created by itself
deployUser="dolphinscheduler"# zk cluster
zkQuorum="192.168.xx.xx:2181,192.168.xx.xx:2181,192.168.xx.xx:2181"
# install hosts
# Note: install the scheduled hostname list. If it is pseudo-distributed, just write a pseudo-distributed hostname
ips="ark0,ark1,ark2,ark3"
# conf/config/run_config.conf config
# run master machine
# Note: list of hosts hostname for deploying master
masters="ark0,ark1"
# run worker machine
# note: list of machine hostnames for deploying workers
workers="ark2,ark3"
# run alert machine
# note: list of machine hostnames for deploying alert server
alertServer="ark3"
# run api machine
# note: list of machine hostnames for deploying api server
apiServers="ark1"
# zk config
# zk root directory
zkRoot="/dolphinscheduler"
# used to record the zk directory of the hanging machine
zkDeadServers="$zkRoot/dead-servers"
# masters directory
zkMasters="$zkRoot/masters"
# workers directory
zkWorkers="$zkRoot/workers"
# zk master distributed lock
mastersLock="$zkRoot/lock/masters"
# zk worker distributed lock
workersLock="$zkRoot/lock/workers"
# zk master fault-tolerant distributed lock
mastersFailover="$zkRoot/lock/failover/masters"
# zk worker fault-tolerant distributed lock
workersFailover="$zkRoot/lock/failover/workers"
# zk master start fault tolerant distributed lock
mastersStartupFailover="$zkRoot/lock/failover/startup-masters"
# zk session timeout
zkSessionTimeout="300"
# zk connection timeout
zkConnectionTimeout="300"
# zk retry interval
zkRetrySleep="100"
# zk retry maximum number of times
zkRetryMaxtime="5"#QQ邮箱配置
# alert config
# mail protocol
mailProtocol="SMTP"
# mail server host
mailServerHost="smtp.qq.com"
# mail server port
mailServerPort="465"
# sender
mailSender="783xx8369@qq.com"
# user
mailUser="783xx8369@qq.com"
# sender password
mailPassword="邮箱授权码"
# TLS mail protocol support
starttlsEnable="false"
sslTrust="smtp.qq.com"
# SSL mail protocol support
# note: The SSL protocol is enabled by default.
# only one of TLS and SSL can be in the true state.
sslEnable="true"
# download excel path
xlsFilePath="/tmp/xls"
# alert port
alertPort=7789# api config
# api server port
apiServerPort="12345"
# api session timeout
apiServerSessionTimeout="7200"
# api server context path
apiServerContextPath="/dolphinscheduler/"
# spring max file size
springMaxFileSize="1024MB"
# spring max request size
springMaxRequestSize="1024MB"
# api max http post size
apiMaxHttpPostSize="5000000"# resource Center upload and select storage method:HDFS,S3,NONE
resUploadStartupType="NONE"
# if resUploadStartupType is HDFS,defaultFS write namenode address,HA you need to put core-site.xml and hdfs-site.xml in the conf directory.
# if S3,write S3 address,HA,for example :s3a://dolphinscheduler,
# Note,s3 be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://mycluster:8020"
# if S3 is configured, the following configuration is required.
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"
# resourcemanager HA configuration, if it is a single resourcemanager, here is yarnHaIps=""
yarnHaIps="192.168.xx.xx,192.168.xx.xx"
# if it is a single resourcemanager, you only need to configure one host name. If it is resourcemanager HA, the default configuration is fine.
singleYarnIp="ark1"
# hdfs root path, the owner of the root path must be the deployment user.
# versions prior to 1.1.0 do not automatically create the hdfs root directory, you need to create it yourself.
hdfsPath="/dolphinscheduler"
# have users who create directory permissions under hdfs root path /
# Note: if kerberos is enabled, hdfsRootUser="" can be used directly.
hdfsRootUser="hdfs"# development status, if true, for the SHELL script, you can view the encapsulated SHELL script in the execPath directory.
# If it is false, execute the direct delete
devState="true"# master config
# master execution thread maximum number, maximum parallelism of process instance
masterExecThreads="100"
# the maximum number of master task execution threads, the maximum degree of parallelism for each process instance
masterExecTaskNum="20"
# master heartbeat interval
masterHeartbeatInterval="10"
# master task submission retries
masterTaskCommitRetryTimes="5"
# master task submission retry interval
masterTaskCommitInterval="100"
# master maximum cpu average load, used to determine whether the master has execution capability
#masterMaxCpuLoadAvg="10"
# master reserve memory to determine if the master has execution capability
masterReservedMemory="1"
# master port
masterPort=5566
# worker config
# worker execution thread
workerExecThreads="100"
# worker heartbeat interval
workerHeartbeatInterval="10"
# worker number of fetch tasks
workerFetchTaskNum="3"
# worker reserve memory to determine if the master has execution capability
workerReservedMemory="1"
# master port
workerPort=7788