Introduction: This article is a hands-on practice sharing from community user 刘思林 — using Dinky to synchronize an entire MySQL database to StarRocks.
GitHub repositories:
https://github.com/DataLinkDC/dinky
https://gitee.com/DataLinkDC/Dinky
Feel free to give Dinky a star~
1. Overview
| Component | Version |
| --- | --- |
| Dinky | 0.7.3 |
| Apache Flink | 1.14.6 |
| StarRocks | 2.3.16 |
2. Environment Deployment
StarRocks: refer to https://docs.starrocks.io/zh-cn/2.3/quick_start/Deploy; following the steps one by one gets a working deployment.
Dinky: refer to the official documentation: http://www.dlink.top/docs/next/deploy_guide/build
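As an optional sanity check on the StarRocks deployment (a minimal sketch, assuming the FE query port is 9030 as used in the sink configuration later), the cluster can be queried from any MySQL-compatible client:
-- FE and BE nodes should all report Alive: true
SHOW FRONTENDS;
SHOW BACKENDS;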
Dinky dependency jars in the plugins directory:
[root@DESKTOP-UPCE76A plugins]# pwd
/luis/dlink-release-0.7.3/plugins
[root@DESKTOP-UPCE76A plugins]# ll
total 80668
drwxr-xr-x 3 root root 4096 Jul 25 21:17 flink1.11
drwxr-xr-x 3 root root 4096 Jul 25 21:17 flink1.12
drwxr-xr-x 3 root root 4096 Jul 25 21:17 flink1.13
drwxr-xr-x 3 root root 4096 Sep 7 00:09 flink1.14
drwxr-xr-x 3 root root 4096 Jul 25 21:17 flink1.15
drwxr-xr-x 3 root root 4096 Jul 25 21:17 flink1.16
drwxr-xr-x 3 root root 4096 Jul 25 21:26 flink1.17
-rwxr-xr-x 1 root root 59604787 Jul 25 21:29 flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar
-rwxr-xr-x 1 root root 22968127 Jul 27 22:42 flink-sql-connector-mysql-cdc-2.3.0.jar
Dinky dependency jars in the plugins/flink1.14 directory:
[root@DESKTOP-UPCE76A flink1.14]# pwd
/luis/dlink-release-0.7.3/plugins/flink1.14
[root@DESKTOP-UPCE76A flink1.14]# ll
total 196216
drwxr-xr-x 2 root root 4096 Jul 25 21:17 dinky
-rwxr-xr-x 1 root root 14858919 Sep 7 00:09 flink-connector-starrocks-1.2.7_flink-1.14_2.12.jar
-rw-r--r-- 1 root root 85586 Sep 7 00:07 flink-csv-1.14.6.jar
-rw-r--r-- 1 root root 136097427 Sep 7 00:07 flink-dist_2.12-1.14.6.jar
-rw-r--r-- 1 root root 153148 Sep 7 00:07 flink-json-1.14.6.jar
-rw-r--r-- 1 root root 7709731 Sep 7 00:07 flink-shaded-zookeeper-3.4.14.jar
-rw-r--r-- 1 root root 39669327 Sep 7 00:07 flink-table_2.12-1.14.6.jar
-rw-r--r-- 1 root root 208006 Sep 7 00:07 log4j-1.2-api-2.17.1.jar
-rw-r--r-- 1 root root 301872 Sep 7 00:07 log4j-api-2.17.1.jar
-rw-r--r-- 1 root root 1790452 Sep 7 00:07 log4j-core-2.17.1.jar
-rw-r--r-- 1 root root 24279 Sep 7 00:07 log4j-slf4j-impl-2.17.1.jar
Flink dependency jars in the lib directory:
[root@DESKTOP-UPCE76A lib]# ll
total 218900
-rwxr-xr-x 1 root root 164780 Sep 7 00:25 dlink-client-1.14-0.7.3.jar
-rwxr-xr-x 1 root root 16857 Sep 7 00:23 dlink-client-base-0.7.3.jar
-rwxr-xr-x 1 root root 70544 Sep 7 00:22 dlink-common-0.7.3.jar
-rwxr-xr-x 1 root root 14858919 Sep 7 00:58 flink-connector-starrocks-1.2.7_flink-1.14_2.12.jar
-rw-r--r-- 1 502 games 85586 Sep 10 2022 flink-csv-1.14.6.jar
-rw-r--r-- 1 502 games 136097427 Sep 10 2022 flink-dist_2.12-1.14.6.jar
-rw-r--r-- 1 502 games 153148 Sep 10 2022 flink-json-1.14.6.jar
-rw-r--r-- 1 502 games 7709731 Jun 9 2022 flink-shaded-zookeeper-3.4.14.jar
-rwxr-xr-x 1 root root 22968127 Sep 7 00:22 flink-sql-connector-mysql-cdc-2.3.0.jar
-rw-r--r-- 1 502 games 39669327 Sep 10 2022 flink-table_2.12-1.14.6.jar
-rw-r--r-- 1 502 games 208006 Jun 9 2022 log4j-1.2-api-2.17.1.jar
-rw-r--r-- 1 502 games 301872 Jun 9 2022 log4j-api-2.17.1.jar
-rw-r--r-- 1 502 games 1790452 Jun 9 2022 log4j-core-2.17.1.jar
-rw-r--r-- 1 502 games 24279 Jun 9 2022 log4j-slf4j-impl-2.17.1.jar
Note: the MySQL CDC connector (flink-sql-connector-mysql-cdc-2.3.0.jar) and the StarRocks connector (flink-connector-starrocks-1.2.7_flink-1.14_2.12.jar) have to be present both in the Dinky plugins directories and in the Flink lib directory, as the listings above show.
3. Data Preparation
MySQL data preparation
-- Create the database
CREATE DATABASE `flinkcdc` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
-- Switch to the database
USE `flinkcdc`;
-- Create the table
CREATE TABLE `t_user` (
`id` bigint NOT NULL AUTO_INCREMENT,
`user_name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL,
`age` int DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1;
-- Insert sample data
INSERT INTO `t_user` (`id`, `user_name`, `age`) VALUES (1, 'hello3', 12);
INSERT INTO `t_user` (`id`, `user_name`, `age`) VALUES (2, 'abc', 1);
INSERT INTO `t_user` (`id`, `user_name`, `age`) VALUES (3, 'dsd', 23);
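The mysql-cdc connector reads the MySQL binlog, so the source instance must have the binlog enabled with the ROW format. A quick check (run with a sufficiently privileged account):
-- Expect log_bin = ON and binlog_format = ROW
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';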
StarRocks data preparation
-- Create the database
CREATE DATABASE `flinkcdc`;
-- Create the table
CREATE TABLE `t_user` (
`id` bigint NOT NULL,
`user_name` varchar(255) DEFAULT NULL,
`age` int DEFAULT NULL
)
PRIMARY KEY(`id`)
DISTRIBUTED BY HASH(id) BUCKETS 3
PROPERTIES
(
"replication_num" = "1"
);
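The PRIMARY KEY table model is what allows the StarRocks connector to apply CDC insert, update, and delete events in place. As a quick verification of the table just created:
-- Inspect the StarRocks table definition
SHOW CREATE TABLE `flinkcdc`.`t_user`;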
4. Hands-on Demonstration
The mysql2starrocks data development script was first written against the official documentation (http://www.dlink.top/docs/next/data_integration_guide/cdcsource_statements#%E6%95%B4%E5%BA%93%E5%90%8C%E6%AD%A5%E5%88%B0-starrocks) but kept failing; the script that worked after adjustment is:
EXECUTE CDCSOURCE jobname WITH (
'connector' = 'mysql-cdc',
'hostname' = '192.168.96.1',
'port' = '3306',
'username' = 'root',
'password' = '123456',
'checkpoint' = '3000',
'scan.startup.mode' = 'initial',
'parallelism' = '1',
'database-name' = 'flinkcdc',
'table-name' = 'flinkcdc\.t_user',
'sink.connector' = 'starrocks',
'sink.jdbc-url' = 'jdbc:mysql://192.168.103.111:9030',
'sink.load-url' = '192.168.103.111:8030',
'sink.username' = 'root',
'sink.password' = '',
'sink.database-name' = '${schemaName}',
'sink.table-name' = '${tableName}',
'sink.sink.properties.format' = 'json',
'sink.sink.properties.strip_outer_array' = 'true',
'sink.sink.max-retries' = '10',
'sink.sink.buffer-flush.interval-ms' = '15000',
'sink.sink.parallelism' = '1'
)
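In a CDCSOURCE statement, options prefixed with 'sink.' are passed through to the sink connector with the prefix stripped (which is why the StarRocks options appear as 'sink.sink.properties.format' and so on), and '${schemaName}' / '${tableName}' are placeholders Dinky fills in for each synchronized table. Once the job is running, the initial full load can be verified on the StarRocks side, for example:
-- Run against StarRocks; expect the three rows seeded earlier
SELECT * FROM `flinkcdc`.`t_user` ORDER BY `id`;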
Configuration Center -> System Info -> logs:
TaskManager logs:
Initial full synchronization:
Inserting a record:
Deleting a record:
Updating a record:
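To reproduce the insert, delete, and update scenarios above, a minimal round of DML can be replayed on the MySQL side (the values here are illustrative); each change should show up in StarRocks within the configured 15 s flush interval:
-- Run against the MySQL flinkcdc database
INSERT INTO `t_user` (`id`, `user_name`, `age`) VALUES (4, 'new_user', 30);
UPDATE `t_user` SET `age` = 31 WHERE `id` = 4;
DELETE FROM `t_user` WHERE `id` = 4;
-- Then check the result in StarRocks
SELECT * FROM `flinkcdc`.`t_user` ORDER BY `id`;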
Impressions
Deployment process: the official documentation is very clear; following it from start to finish, everything ran through end to end.
Troubleshooting: the community group is very active; as long as a question is described clearly (component versions, environment, error symptoms, failure logs), experienced members will help answer it.
New features: under intensive development... looking forward to them.