大家好,我是 JiekeXu,很高兴又和大家见面了,今天和大家一起来看看 OGG 19c Bug 35446910 报错 Mapping error 错误解决办法,欢迎点击上方蓝字“JiekeXu DBA之路”关注我的公众号,标星或置顶,更多干货第一时间到达!
这个月我们同事对原来生产环境的 19c RAC 进行 schema 级别的拆分,将其从 RHEL7.6 RU 19.15 的两个 schema 拆分迁移到 RHEL7.9 RU 19.21 的 RAC 上,迁移前后表空间和 schema 名不变,唯一改变的则是 RU 补丁版本和数据库名及 IP 地址,同时也配置了 Oracle 到 Kafka 的数据同步,所以需要重新部署 OGG 来同步新迁移的数据,Oracle 到 Kafka 的数据同步可以参考我这篇使用 OGG12.3 同步 Oracle 部分表到 Kafka。
新环境源端 OGG 版本:Linux, x64, 64bit (optimized) Version 19.1.0.0.4 OGGCORE_19.1.0.0.0_PLATFORMS_191017.1054_FBO 目标端 OGG for Kafka 版本:Linux, x64, 64bit (optimized) Oracle GoldenGate for Big Data Version 12.3.2.1.1 (Build 005),可见,OGG 19.1 没有打过任何补丁。
--源端
$ ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 19.1.0.0.4 OGGCORE_19.1.0.0.0_PLATFORMS_191017.1054_FBO
Linux, x64, 64bit (optimized), Oracle 19c on Oct 17 2019 21:16:29
Operating system character set identified as US-ASCII.
Copyright (C) 1995, 2019, Oracle and/or its affiliates. All rights reserved.
GGSCI 1> info all
-- for kafka 目标端
$ ggsci
Oracle GoldenGate for Big Data
Version 12.3.2.1.1 (Build 005)
Oracle GoldenGate Command Interpreter
Version 12.3.0.1.2 OGGCORE_OGGADP.12.3.0.1.2_PLATFORMS_180712.2305
Linux, x64, 64bit (optimized), Generic on Jul 13 2018 00:46:09
Operating system character set identified as UTF-8.
Copyright (C) 1995, 2018, Oracle and/or its affiliates. All rights reserved.
问题现象
当目标端 OGG 运行一段时候后,rep 复制进程中的几个表就会报错 Mapping error,剩余其他几个表是正常同步的。view report rep1_yy 具体报错如下:
2023-12-19 14:50:01 WARNING OGG-01431 Aborted grouped transaction on CC_YY.T_CI_UKEY_APPLY, Mapping error.
2023-12-19 14:50:01 WARNING OGG-01003 Repositioning to rba 5159 in seqno 0.
2023-12-19 14:50:01 WARNING OGG-01151 Error mapping from CC_YY.T_CI_UKEY_APPLY to CC_YY.T_CI_UKEY_APPLY.
Source Context :
SourceModule : [er.replicat.errors]
SourceID : [/scratch/aime/adestore/views/aime_adc4150560/oggcore/OpenSys/src/app/er/replicat/reperrors.cpp]
SourceMethod : [repError]
SourceLine : [1673]
ThreadBacktrace : [15] elements
: [/goldengate/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7f6893d290ae]]
: [/goldengate/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x6ac) [0x7f6893d199bc]]
: [/goldengate/libgglog.so(_MSG_QualTableName_QualTableName(CSourceContext*, int, ggs::gglib::ggapp::CQualDBObjName<(DBObjType)1> const&, ggs::ggl
ib::ggapp::CQualDBObjName<(DBObjType)1> const&, CMessageFactory::MessageDisposition)+0x74) [0x7f6893d10e84]]
: [/goldengate/replicat(ggs::er::ReplicatContext::repError(short, int, char const*, extr_ptr_def*, ggs::gglib::gglcr::CommonLCR const*, std_rec_hd
r_def*, char*, ObjectMetadata*, bool)+0x2821) [0x63a461]]
: [/goldengate/replicat(ggs::er::ReplicatContext::replicate_io(ggs::gglib::gglcr::CommonLCR const*, ggs::gglib::gglcr::CommonLCR*, extr_ptr_def*,
bool, int&, RepCsn&, bool*)+0x29b9) [0x6eb929]]
: [/goldengate/replicat(ggs::er::ReplicatContext::processRecord(ggs::gglib::gglcr::CommonLCR const*, ggs::gglib::gglcr::CommonLCR*, extr_ptr_def*&
, extr_ptr_def*&, bool&, int&, bool, RepCsn&)+0x318) [0x6f98d8]]
: [/goldengate/replicat(ggs::er::ReplicatContext::processReplicatLoop()+0x4614) [0x631494]]
: [/goldengate/replicat(ggs::er::ReplicatContext::run()+0x45) [0x610545]]
: [/goldengate/replicat() [0x6e506f]]
: [/goldengate/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain()+0x5e) [0x7e2d8e]]
: [/goldengate/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*)+0x173) [0x7e7153]]
: [/goldengate/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**)+0x140) [0x7e79c0]]
: [/goldengate/replicat(main+0x3b) [0x6e7e0b]]
: [/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f688c3933d5]]
: [/goldengate/replicat() [0x550831]]
2023-12-19 14:50:01 ERROR OGG-01296 Error mapping from CC_YY.T_CI_UKEY_APPLY to CC_YY.T_CI_UKEY_APPLY.
通过 stats rep1_yy 查看统计信息,查看复制进程处理的记录数。通过报告看到 Mapping 错误的表有 insert 操作,但是没有 update 操作,正常复制的表 也是只有 insert 操作,没有 update 操作。
GGSCI 3> stats rep1_yy
Sending STATS request to REPLICAT rep1_yy ...
Start of Statistics at 2023-12-21 14:02:05.
Replicating from CC_YY.T_AUTH_CA_AUTH_TEMP to CC_YY.T_AUTH_CA_AUTH_TEMP:
*** Total statistics since 2023-12-19 16:50:35 ***
Total inserts 2.00
Total updates 0.00
Total befores 0.00
Total deletes 0.00
Total discards 0.00
Total operations 2.00
*** Daily statistics since 2023-12-21 00:00:00 ***
Total inserts 2.00
Total updates 0.00
Total befores 0.00
Total deletes 0.00
Total discards 0.00
Total operations 0.00
*** Hourly statistics since 2023-12-21 10:00:00 ***
No database operations have been performed.
*** Latest statistics since 2023-12-19 16:50:35 ***
Total inserts 2.00
Total updates 0.00
Total befores 0.00
Total deletes 0.00
Total discards 0.00
Total operations 2.00
然后去 dirrpt 目录下看日志 rep1_yy.dsc 发现了错误,当表做 update 更新的时候 SEQUENCE_NO 主键丢失,所以就 ABEND 了,也说明了前面的 insert 可以成功,当遇到 update 就 ABEND 了,则可以确定就是两边的表结构不一致导致 ABEND。那么到底是什么原因导致的呢?接着往下看!
more /goldengate/dirrpt/rep1_yy.dsc
Oracle GoldenGate Delivery process started, group rep1_yy discard file opened: 2023-12-19 14:48:52.006938
Key column SEQUENCE_NO (0) is missing from update on table CC_YY.T_CI_UKEY_APPLY
Missing 1 key columns in update for table CC_YY.T_CI_UKEY_APPLY.
Current time: 2023-12-19 14:48:52
Discarded record from action ABEND on error 0
Aborting transaction on /goldengate/dirdat/yy beginning at seqno 0 rba 5159
error at seqno 0 rba 5159
Problem replicating CC_YY.T_CI_UKEY_APPLY to CC_YY.T_CI_UKEY_APPLY
Mapping problem with compressed update record (target format)...
*
PK_APPLY = f3fba1ad-8e9f-41d4-af8f-fb32b5074eba
000000: 66 33 66 62 61 31 61 64 2d 38 65 39 66 2d 34 31 |f3fba1ad-8e9f-41|
000010: 64 34 2d 61 66 38 66 2d 66 62 33 32 62 35 30 37 |d4-af8f-fb32b507|
000020: 34 65 62 61 |4eba |
UPDATE_TIME = 2023-12-19 11:21:39
000000: 32 30 32 33 2d 31 32 2d 31 39 20 31 31 3a 32 31 |2023-12-19 11:21|
000010: 3a 33 39 |:39 |
FK_USER_UPDATE = 0397fc36-0074-4fb3-992b-9bd60a48595e
000000: 30 33 39 37 66 63 33 36 2d 30 30 37 34 2d 34 66 |0397fc36-0074-4f|
000010: 62 33 2d 39 39 32 62 2d 39 62 64 36 30 61 34 38 |b3-992b-9bd60a48|
000020: 35 39 35 65 |595e |
USER_NAME_UPDATE = 张三
000000: e7 94 98 e9 9c 96 |...... |
APPROVE_STATE = APPROVEING_1ST
000000: 41 50 50 52 4f 56 45 49 4e 47 5f 31 53 54 |APPROVEING_1ST |
*
Process Abending : 2023-12-19 14:48:52
我们去排查目标端 param 参数是否出现中文字符或者中文引号,缺少分号等情况,但是检查下来都没有问题,如下显示都是正常配置,接下来我们去检查 trandata 信息吧。
GGSCI 52> view params rep1_yy
REPLICAT rep1_yy
sourcedefs /goldengate/dirdef/goldengate.def
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka_prod.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
GETUPDATEBEFORES
HANDLECOLLISIONS
REPLACEBADCHAR SUBSTITUTE ? FORCECHECK
MAP CC_YY.T_AUTH_CA_AUTH_TEMP,target CC_YY.T_AUTH_CA_AUTH_TEMP;
map CC_YY.T_CI_OPERATE_FLOW_HIS,target CC_YY.T_CI_OPERATE_FLOW_HIS;
map CC_YY.T_CI_RN_PROGRESS,target CC_YY.T_CI_RN_PROGRESS;
map CC_YY.T_CI_UKEY_APPLY,target CC_YY.T_CI_UKEY_APPLY;
map CC_YY.T_CI_CORP,target CC_YY.T_CI_CORP;
map CC_YY.T_CI_CORP_TEMP,target CC_YY.T_CI_CORP_TEMP;
--mapexclude CC_YY.T_CI_CORP_TEMP;
我们回到源库,去检查报错表 T_CI_UKEY_APPLY 的 trandata 信息,也没有发现异常,尝试重新添加 trandata 信息。甚至直接添加整个 schema 的 trandata 信息也是无果。然后在 SQL 级别和 OGG 级别尝试开启所有列补充日志也是没有效果。
--查看,删除,重新添加 trandata
GGSCI > info trandata CC_YY.T_CI_UKEY_APPLY
GGSCI > delete trandata CC_YY.T_CI_UKEY_APPLY
GGSCI > add trandata CC_YY.T_CI_UKEY_APPLY
GGSCI > add schematrandata <your_schema_name>
--- from source SQL
alter table CC_YY.T_CI_UKEY_APPLY drop supplemental log data (all) columns;
alter table CC_YY.T_CI_UKEY_APPLY add supplemental log data (all) columns;
--- from source GGSCI
GGSCI > dblogin USERIDALIAS alias_ogg
GGSCI > Add Trandata CC_YY.T_CI_UKEY_APPLY allcols
2023-12-19 13:56:53 INFO OGG-15131 Logging of supplemental redo log data is already enabled for table CC_YY.T_CI_UKEY_APPLY.
2023-12-19 13:56:53 WARNING OGG-00706 Failed to add supplemental log group on table CC_YY.T_CI_UKEY_APPLY due to ORA-32588: supplemental logging attribute all column exists
SQL ALTER TABLE "CC_YY"."T_CI_UKEY_APPLY" ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS /* GOLDENGATE_DDL_REPLICATION */.
2023-12-19 13:56:53 INFO OGG-10471 ***** Oracle Goldengate support information on table CC_YY.T_CI_UKEY_APPLY *****
Oracle Goldengate support native capture on table CC_YY.T_CI_UKEY_APPLY.
Oracle Goldengate marked following column as key columns on table CC_YY.T_CI_UKEY_APPLY: SEQUENCE_NO.
GGSCI 7> info trandata CC_YY.T_CI_UKEY_APPLY
最终在 MOS 上找到了一篇 (Doc ID 2963276.1) Replicat Abends With Key Column <COL_NAME> (0) Is Missing From Update On Table <SCHEMA.TABLENAME> 一样报错的文章,文中描述此问题属于 Bug 35446910,刚好命中此版本 V19.1, Oracle GoldenGate - Version 19.1.0.0.2 and later。
描述中说当我们删除 trandata 后,必须添加 trandata 才能启用 LOGICAL_REPLICATION(PS:实践中 add trandata 也不行),add schemarandata 无法启用 LOGICAL_REPLICATION,因此出现了此问题。删除 trandata 后,它将 LOGICAL_REPLICATION 设置为 DISABLED,添加 schemarandata 之后,由于这没有使关键 cols 进入踪迹,则一直保持为 DISABLED,这是预期的行为。
与 MOS 文章不同的是说 add trandata–>delete trandata–>add schematrandata 这样无法将 LOGICAL_REPLICATION 设置为 ENABLED,确实这样无法变为 ENABLED,那么我后面继续使用 add trandata 也没有将其变为 ENABLED,所以就出现了上面遇到 update 就 ABENDED 的问题。这里顺便说下 LOGICAL_REPLICATION,视图 ALL_ALL_TABLES/DBA_TABLES.LOGICAL_REPLICATION 是 Oracle 19c 新增的一个 VARCHAR2(8) 类型字段,指示是否为表启用逻辑复制 (ENABLED)或(DISABLED),如果启用了数据库范围的列数据补充日志记录,则忽略此设置。
SQL> select LOGICAL_REPLICATION from dba_tables where TABLE_NAME='<TABLENAME>';
LOGICAL_
--------
DISABLED
删除 transdata 可能会禁用逻辑复制,如果表的逻辑复制被禁用,即使为模式添加了 schemarandata,也不会为表记录任何补充日志数据,实践表明在 19.21 中,删除 trandata 后 add trandata 也不会启用表的逻辑复制(恰巧上周迁移时新建 OGG 环境后刚好执行了 delete trandata 命令),那么我们则可以按照 MOS 文章中的临时解决办法通过 SQL 语句为表启用 LOGICAL REPLICATION。
SELECT LOGICAL_REPLICATION FROM DBA_TABLES
WHERE TABLE_NAME='T_CI_UKEY_APPLY' AND OWNER='CC_YY';
ALTER TABLE CC_YY.T_CI_UKEY_APPLY ENABLE LOGICAL REPLICATION;
SELECT LOGICAL_REPLICATION FROM DBA_TABLES WHERE TABLE_NAME='T_CI_UKEY_APPLY' AND OWNER='CC_YY';
当对这几个表启用逻辑复制后,重新启动 rep 复制进程正常,使用 stats 查看统计信息相关表的 update 已经有数据进入了,观察了两天没有再次 ABENDED 说明问题得到了解决。
Doc ID 2963276.1 Replicat Abends With Key Column <COL_NAME> (0) Is Missing From Update On Table <SCHEMA.TABLENAME> (Doc ID 2963276.1) Bug 35446910 : Extract Does Not Captured Key Cols For An Update After Re-Adding The Schema Trandata。 Bug 35289372 : Extract Does Not Captured Key Cols For An Update After Re-Adding The Trandata https://community.oracle.com/mosc/discussion/3954985/replicat-abend-missing-1-key-columns-in-update-for-table 19c 官方文档:ALL_ALL_TABLES 列信息描述 https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/ALL_ALL_TABLES.html#GUID-B8CF1D2A-9AA0-4C94-BBBA-4672C7CF735F