I have a MySQL table that contains Chinese column names. After upgrading Spark from 3.0.3 to 3.3.1, filtering on a Chinese column causes a Spark SQL parse exception, while the same code works fine on 3.0.3. The Chinese column name is wrapped in backticks in the filter expression.
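For context, this is roughly how the Dataset is built; the JDBC URL, credentials, and the extra col1 column are placeholders rather than the real job code (a minimal sketch in Scala):

import org.apache.spark.sql.functions.lit

// Plain JDBC (V1) read from MySQL, casting both columns to string and
// adding a constant column, to mirror the plans shown below.
val dataset = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")   // placeholder
  .option("dbtable", "test1111")
  .option("user", "...")                               // placeholder
  .option("password", "...")                           // placeholder
  .load()
  .selectExpr("cast(`人员` as string) as `人员`", "cast(name as string) as name")
  .withColumn("col1", lit(1))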
Filtering on the ASCII column works fine on 3.3.1:
spark:3.3.1
dataset.filter(" ( (name = 'name1') ) ")
== Parsed Logical Plan ==
'Filter ('name = name1)
+- Project [人员#541, name#542, 1 AS col1#547]
+- Project [人员#541, name#542]
+- Project [cast(人员#537 as string) AS 人员#541, cast(name#538 as string) AS name#542]
+- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]
== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter (name#542 = name1)
+- Project [人员#541, name#542, 1 AS col1#547]
+- Project [人员#541, name#542]
+- Project [cast(人员#537 as string) AS 人员#541, cast(name#538 as string) AS name#542]
+- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]
== Optimized Logical Plan ==
Project [人员#537, name#538, 1 AS col1#547]
+- Filter (isnotnull(name#538) AND (name#538 = name1))
+- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]
== Physical Plan ==
*(1) Project [人员#537, name#538, 1 AS col1#547]
+- *(1) Scan JDBCRelation(`test1111`) [numPartitions=1] [人员#537,name#538] PushedFilters: [*IsNotNull(name), *EqualTo(name,name1)], ReadSchema: struct<人员:string,name:string>
The same job on 3.3.1, this time filtering on the Chinese column, fails during physical planning:
spark:3.3.1
dataset.filter(" ( (`人员` = '111') ) ")
== Parsed Logical Plan ==
'Filter ('人员 = 111)
+- Project [人员#576, name#577, 1 AS col1#582]
+- Project [人员#576, name#577]
+- Project [cast(人员#572 as string) AS 人员#576, cast(name#573 as string) AS name#577]
+- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]
== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter (人员#576 = 111)
+- Project [人员#576, name#577, 1 AS col1#582]
+- Project [人员#576, name#577]
+- Project [cast(人员#572 as string) AS 人员#576, cast(name#573 as string) AS name#577]
+- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]
== Optimized Logical Plan ==
Project [人员#572, name#573, 1 AS col1#582]
+- Filter (isnotnull(人员#572) AND (人员#572 = 111))
+- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]
== Physical Plan ==
org.apache.spark.sql.catalyst.parser.ParseException:
Syntax error at or near '人'(line 1, pos 0)
== SQL ==
人员
^^^
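If it helps narrow things down, here is the same condition expressed through the Column API rather than a SQL expression string; I have not confirmed whether it goes through the same parser (minimal sketch):

import org.apache.spark.sql.functions.col

// Same predicate as the string form above, built with the Column API;
// whether this avoids the parser that throws the exception is unverified.
dataset.filter(col("人员") === "111")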
On 3.0.3 the same filter runs fine; it only broke after the upgrade to 3.3.1:
spark:3.0.3
== Parsed Logical Plan ==
'Filter (('name = name1) AND ('人员 = 111))
+- Project [人员#74, name#75, 1 AS col1#80]
+- Project [人员#74, name#75]
+- Project [cast(人员#70 as string) AS 人员#74, cast(name#71 as string) AS name#75]
+- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]
== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter ((name#75 = name1) AND (人员#74 = 111))
+- Project [人员#74, name#75, 1 AS col1#80]
+- Project [人员#74, name#75]
+- Project [cast(人员#70 as string) AS 人员#74, cast(name#71 as string) AS name#75]
+- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]
== Optimized Logical Plan ==
Project [人员#70, name#71, 1 AS col1#80]
+- Filter (((isnotnull(name#71) AND isnotnull(人员#70)) AND (name#71 = name1)) AND (人员#70 = 111))
+- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]
== Physical Plan ==
*(1) Project [人员#70, name#71, 1 AS col1#80]
+- *(1) Scan JDBCRelation(`test1111`) [numPartitions=1] [人员#70,name#71] PushedFilters: [*IsNotNull(name), *IsNotNull(人员), *EqualTo(name,name1), *EqualTo(人员,111)], ReadSchema: struct<人员:string,name:string>
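One idea I plan to try, assuming the exception is thrown while the pushed-down filter is translated for the JDBC source: disable predicate pushdown for this table and let Spark apply the filter itself (untested sketch):

// Untested: the documented JDBC option pushDownPredicate (default true)
// turns off filter pushdown, so the problematic translation of `人员`
// might never run; the filter is then evaluated on the Spark side.
val noPushdown = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")   // placeholder
  .option("dbtable", "test1111")
  .option("pushDownPredicate", "false")
  .load()
  .filter(" ( (`人员` = '111') ) ")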
Does anyone know the root cause, or have a better suggestion for a fix?