I have a MySQL table that contains Chinese column names. After upgrading Spark from 3.0.3 to 3.3.1, filtering on a Chinese column causes a Spark SQL parse exception, while the same code works fine on 3.0.3. The Chinese column name is wrapped in backticks in the filter expression.
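For context, this is roughly how the Dataset is built; the JDBC URL, credentials, and the extra col1 column are placeholders rather than the real job code (a minimal sketch in Scala):

import org.apache.spark.sql.functions.lit

// Plain JDBC (V1) read from MySQL, casting both columns to string and
// adding a constant column, to mirror the plans shown below.
val dataset = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")   // placeholder
  .option("dbtable", "test1111")
  .option("user", "...")                               // placeholder
  .option("password", "...")                           // placeholder
  .load()
  .selectExpr("cast(`人员` as string) as `人员`", "cast(name as string) as name")
  .withColumn("col1", lit(1))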
Filtering on the ASCII column works fine on 3.3.1:
spark:3.3.1
dataset.filter(" ( (name = 'name1') ) ")
== Parsed Logical Plan ==
'Filter ('name = name1)
+- Project [人员#541, name#542, 1 AS col1#547]
+- Project [人员#541, name#542]
+- Project [cast(人员#537 as string) AS 人员#541, cast(name#538 as string) AS name#542]
+- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]
== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter (name#542 = name1)
+- Project [人员#541, name#542, 1 AS col1#547]
+- Project [人员#541, name#542]
+- Project [cast(人员#537 as string) AS 人员#541, cast(name#538 as string) AS name#542]
+- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]
== Optimized Logical Plan ==
Project [人员#537, name#538, 1 AS col1#547]
+- Filter (isnotnull(name#538) AND (name#538 = name1))
+- Relation [人员#537,name#538] JDBCRelation(`test1111`) [numPartitions=1]
== Physical Plan ==
*(1) Project [人员#537, name#538, 1 AS col1#547]
+- *(1) Scan JDBCRelation(`test1111`) [numPartitions=1] [人员#537,name#538] PushedFilters: [*IsNotNull(name), *EqualTo(name,name1)], ReadSchema: struct<人员:string,name:string>
The same job on 3.3.1, this time filtering on the Chinese column, fails during physical planning:
spark:3.3.1
dataset.filter(" ( (`人员` = '111') ) ")
== Parsed Logical Plan ==
'Filter ('人员 = 111)
+- Project [人员#576, name#577, 1 AS col1#582]
+- Project [人员#576, name#577]
+- Project [cast(人员#572 as string) AS 人员#576, cast(name#573 as string) AS name#577]
+- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]
== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter (人员#576 = 111)
+- Project [人员#576, name#577, 1 AS col1#582]
+- Project [人员#576, name#577]
+- Project [cast(人员#572 as string) AS 人员#576, cast(name#573 as string) AS name#577]
+- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]
== Optimized Logical Plan ==
Project [人员#572, name#573, 1 AS col1#582]
+- Filter (isnotnull(人员#572) AND (人员#572 = 111))
+- Relation [人员#572,name#573] JDBCRelation(`test1111`) [numPartitions=1]
== Physical Plan ==
org.apache.spark.sql.catalyst.parser.ParseException:
Syntax error at or near '人'(line 1, pos 0)
== SQL ==
人员
^^^
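If it helps narrow things down, here is the same condition expressed through the Column API rather than a SQL expression string; I have not confirmed whether it goes through the same parser (minimal sketch):

import org.apache.spark.sql.functions.col

// Same predicate as the string form above, built with the Column API;
// whether this avoids the parser that throws the exception is unverified.
dataset.filter(col("人员") === "111")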
On 3.0.3 the same filter runs fine; it only broke after the upgrade to 3.3.1:
spark:3.0.3
== Parsed Logical Plan ==
'Filter (('name = name1) AND ('人员 = 111))
+- Project [人员#74, name#75, 1 AS col1#80]
+- Project [人员#74, name#75]
+- Project [cast(人员#70 as string) AS 人员#74, cast(name#71 as string) AS name#75]
+- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]
== Analyzed Logical Plan ==
人员: string, name: string, col1: int
Filter ((name#75 = name1) AND (人员#74 = 111))
+- Project [人员#74, name#75, 1 AS col1#80]
+- Project [人员#74, name#75]
+- Project [cast(人员#70 as string) AS 人员#74, cast(name#71 as string) AS name#75]
+- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]
== Optimized Logical Plan ==
Project [人员#70, name#71, 1 AS col1#80]
+- Filter (((isnotnull(name#71) AND isnotnull(人员#70)) AND (name#71 = name1)) AND (人员#70 = 111))
+- Relation[人员#70,name#71] JDBCRelation(`test1111`) [numPartitions=1]
== Physical Plan ==
*(1) Project [人员#70, name#71, 1 AS col1#80]
+- *(1) Scan JDBCRelation(`test1111`) [numPartitions=1] [人员#70,name#71] PushedFilters: [*IsNotNull(name), *IsNotNull(人员), *EqualTo(name,name1), *EqualTo(人员,111)], ReadSchema: struct<人员:string,name:string>
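One idea I plan to try, assuming the exception is thrown while the pushed-down filter is translated for the JDBC source: disable predicate pushdown for this table and let Spark apply the filter itself (untested sketch):

// Untested: the documented JDBC option pushDownPredicate (default true)
// turns off filter pushdown, so the problematic translation of `人员`
// might never run; the filter is then evaluated on the Spark side.
val noPushdown = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")   // placeholder
  .option("dbtable", "test1111")
  .option("pushDownPredicate", "false")
  .load()
  .filter(" ( (`人员` = '111') ) ")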
Does anyone know the root cause, or have a better suggestion for a fix?