这是一个老师被学生吊打的案例!
遇到的现象:
#/usr/local/mysql/bin/mysql -h127.0.0.1 -P9004 -udefaultce Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 17 Server version: 21.6.5.37-ClickHouse Copyright (c) 2000, 2021, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respectiveowners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. Segmentation fault (core dumped)
这是一个非常特别的问题,很多时间感觉就是拼人品,Mybe别人的是:
mysql --version mysql Ver 8.0.25-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu)) mysql --host 127.0.0.1 --user default --port 9004 --comments Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 21.7.1.7113-ClickHouse Copyright (c) 2000, 2021, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>
我也换个MySQL 8.0.25试一下,一样的碰到:Segmentation fault (core dumped)
如果把这个问题又定义成RP问题,感觉有点亏啊,因为我遇到的问题实在太多了。那就来分析一下吧!来自我谭浩强《C语言程序设计》考了100分的自信,搞定他。 最后经历了一个上午的奋斗,终于:
[root@gz191zst 14:55:43 /data/mysql-8.0.20/build] #./bin/mysql -h 127.0.0.1 -P9004 -udefault Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 43 Server version: 21.6.5.37-ClickHouse Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respectiveowners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. DatafuseLabs|知数堂 20210624 default@127.0.0.1 [(none)]>show databases;
name |
---|
default |
system |
wubx |
zst |
4 rows in set (0.00 sec) Read 4 rows, 603.00 B in 0.000592399 sec., 6752 rows/sec., 994.04 KiB/sec.
最后我把分析步骤整理一下,给大家参考一下,也算出一个遇到段错误的分析教程。
#strace /usr/local/mysql/bin/mysql -h127.0.0.1 -P9004 -udefault
munmap(0x7f5dbb4f5000, 4096) = 0 sendto(3, "\16\0\0\0\3select USER()", 18, 0, NULL, 0) = 18 recvfrom(3, "\1\0\0\1\1#\0\0\2\3def\0\0\0\rcurrentUser()\0\f"..., 16384, 0, NULL, NULL) = 131 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} --- +++ killed by SIGSEGV +++ Segmentation fault
最后出错的位置大概在执行select user()后报错,初步定位是mysql client在clickhouse执行select user()后报错。
调整系统core输出限制,先执行命令:
ulimit -c unlimited
再次运行 #/usr/local/mysql/bin/mysql -h127.0.0.1 -P9004 -udefault 形成Core文件,通过gdb阅读core输出
gdb /usr/local/mysql/bin/mysql ./core.8326 (gdb) bt #0 0x00007ff84abb5721 in __strlen_sse2_pminub () from /lib64/libc.so.6 #1 0x0000000000440bcc in my_strdup(unsigned int, char const*, int) () #2 0x000000000040bc34 in init_username() () #3 0x0000000000413ce0 in read_and_execute(bool) () #4 0x0000000000409fb1 in main ()
看到出错位置:my_strdup, init_username , 如果你的二进制strip过,只能gg吧。现在使用mysql客户端源码,主函数在mysql.cc中从源码中找到 client/mysql.cc 中相应的函数:
static void init_username() {
my_free(full_username);
my_free(part_username);
MYSQL_RES *result = nullptr;
if (!mysql_query(&mysql, "select USER()") &&
(result = mysql_use_result(&mysql))) {
MYSQL_ROW cur = mysql_fetch_row(result);
full_username = my_strdup(PSI_NOT_INSTRUMENTED, cur[0], MYF(MY_WME));
part_username =
my_strdup(PSI_NOT_INSTRUMENTED, strtok(cur[0], "@"), MYF(MY_WME));
(void)mysql_fetch_row(result); // Read eof
}
}
my_strdup在my_sys下一个工具函数
char *my_strdup(PSI_memory_key key, const char *from, myf my_flags) {
char *ptr;
size_t length = strlen(from) + 1;
if ((ptr = (char *)my_malloc(key, length, my_flags)))
memcpy(ptr, from, length);
return ptr;
}
最终可以看到出错的位置是:my_strdup(PSI_NOT_INSTRUMENTED, strtok(cur[0], "@"), MYF(MY_WME)); 按理论上讲:
PSI_NOT_INSTRUMENTED = 0
MY_WME=16
cur[0]=default; strtok(cur[0], "@") = default
感觉属于一个非常简单的问题,但实际只看代码也看不出来啥,需要在运行中看看debug出来的信息。
源码编译mysql略(比较花时间的过程)
./bin/mysql -h127.0.0.1 -P9004 -uwubx -pdefault
一样报错,这样的core文件读取就丰富多了
gdb ./bin/mysql ./core.18564 (gdb) bt #0 0x00007ffff620e721 in __strlen_sse2_pminub () from /lib64/libc.so.6 #1 0x000000000043ebcc in my_strdup (key=key@entry=0, from=0x0, my_flags=my_flags@entry=16) at /data/mysql-8.0.20/mysys/my_malloc.cc:295 #2 0x000000000040c234 in init_username () at /data/mysql-8.0.20/client/mysql.cc:5228 #3 init_username () at /data/mysql-8.0.20/client/mysql.cc:5219 #4 0x0000000000414450 in construct_prompt () at /data/mysql-8.0.20/client/mysql.cc:5131 #5 read_and_execute(bool) () at /data/mysql-8.0.20/client/mysql.cc:2216 #6 0x000000000040a357 in main () at /data/mysql-8.0.20/client/mysql.cc:1432 #7 0x00007ffff60c1555 in __libc_start_main () from /lib64/libc.so.6 #8 0x000000000040add0 in _start () at /opt/rh/devtoolset-8/root/usr/include/c++/8/new:169
通过stack信息中#1 0x000000000043ebcc in my_strdup (key=key@entry=0, from=0x0, my_flags=my_flags@entry=16) 可以看出来:
PSI_NOT_INSTRUMENTED = 0,
MY_WME=16
这两个值是对的。但cur[0]居然是空值 ~~~~
搞个程序测试一下:
#cat db.c
#include
#include
#include
struct connection_details
{
char *server;
char *user;
char *password;
int port;
char *database;
};
MYSQL *mysql_connection_setup(struct connection_details mysql_details)
{
variables within
MYSQL *connection = mysql_init(NULL);
if (!mysql_real_connect(connection, mysql_details.server, mysql_details.user, mysql_details.password, mysql_details.database, mysql_details.port, NULL, 0))
{
printf("Conection error : %s\n", mysql_error(connection));
exit(1);
}
return connection;
}
MYSQL_RES *mysql_perform_query(MYSQL *connection, char *sql_query)
{
if (mysql_query(connection, sql_query))
{
printf("MySQL query error : %s\n", mysql_error(connection));
exit(1);
}
return mysql_use_result(connection);
}
int main()
{
MYSQL *conn;
MYSQL_RES *res;
MYSQL_ROW row;
struct connection_details mysqlD;
mysqlD.server = "127.0.0.1";
mysqlD.user = "default";
mysqlD.password = "";
mysqlD.database = "";
mysqlD.port = 9004;
conn = mysql_connection_setup(mysqlD);
res = mysql_perform_query(conn, "select user()");
printf("current users:\n");
while ((row = mysql_fetch_row(res)) != NULL)
printf("%s\n", row[0]);
printf("end\n");
mysql_free_result(res);
mysql_close(conn);
return 0;
}
编译:
gcc -I /usr/local/mysql/include/ -L /usr/local/mysql/lib -lmysqlclient -lcrypto -lssl -g -o debug db.c
./debug
current users:
end
果真没有数值,那这里究竟是Clickhouse是返回了没处理,还是说clickhouse就没返回呢?来吧,抓包分析
这里可以使用到wirhshark, tcpdump找熟悉的上吧。阅读工具使用wirshark,我这里在本机测试,直接wirhshark了, 通过抓包分析,看到确实没有正确的值返回
这块可以用tcp follow 或是直接看每个packet的data,和我认识的扫地僧王斌大佬确认,这个可能是clickhouse的问题,没有正确的回包。
再骚扰一下Clickhouse的Maintainer 也是我在Datafuse Labs的同事Sundyli,上面那个能登录的提示就是他的,看样子他人品比较好。经过Sundyli的排查发现Clickhouse的mysql协议中 context->getClientInfo是没有初始化。到此为止发现能登录不上去是Bug(接锅,不是RP问题)。目前我们的项目Datafuse也在紧张开发中,不好意思太占用大佬的时间,另外也觉得这个问题比较简单,周末找时间给Clickhouse提个patch。
接下来知道问题了就自已来吧。
到这里明确下来问题是clickhouse对MySQL的客户端的select user()没处理,造成不能工作,这里比较直接采用中国式教育小孩的办法,遇事先收拾自家孩子。把init_username中的返回值hardcode一下:
static void init_username() {
my_free(full_username);
my_free(part_username);
MYSQL_RES *result = nullptr;
if (!mysql_query(&mysql, "select USER()") &&
(result = mysql_use_result(&mysql))) {
MYSQL_ROW cur = mysql_fetch_row(result);
put_info(cur[0], INFO_INFO);
put_info("DatafuseLabs|知数堂 20210624", INFO_INFO);
char user[16]="default";
full_username = my_strdup(PSI_NOT_INSTRUMENTED, user, MYF(MY_WME));
part_username =
my_strdup(PSI_NOT_INSTRUMENTED, strtok(user, "@"), MYF(MY_WME));
(void)mysql_fetch_row(result); // Read eof
}
}
然后在build,发现可以工作了。Oh yeah~~~, 先让学生用着吧。 这个修复也有一个不完美的地方:
用户名不会变了,虽然用的wubx这个帐号登录的clickhouse, 但一样不会变。先这样吧,总比不能登录,用不起来强吧。如果不想编译mysql client可以下载:https://github.com/3306pai/3306Pai-presentations/tree/main/misc
到些为止这个问题不是很完美的到一段落,先实现能用了。祝各位好运,如果你也是Clickhouse的重度用户,以后多交流。
最新更新20210625:https://github.com/ClickHouse/ClickHouse/pull/25697/files Sundyli对clickhouse侧的修复。
本文分享自 MySQLBeginner 微信公众号,前往查看
如有侵权,请联系 cloudcommunity@tencent.com 删除。
本文参与 腾讯云自媒体同步曝光计划 ,欢迎热爱写作的你一起参与!