| 0| 0.9183605955607251|
+---+-------------------+
df.withColumn('v', plus_one(df.v)).agg(count(col('v'))).show()
In short, the UDF query (plain Python) takes ~16 seconds. For comparison:
df.selectExpr("id", "v+1 as v
So, with my Python script, I can easily replace a single character:
>>> print("I want to replace \u00c6 with AE and \u00d5 with O".replace(unicode_table[key])
print(result)
Output:
[puppet@damageinc python]$ python replace_unicode.py "\u00
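The table-driven replacement described above can be sketched roughly as follows. This is a minimal illustration, not the original script: the contents of `unicode_table` and the helper name `replace_unicode` are assumptions, since the source shows only the lookup `unicode_table[key]` and the script name.

```python
# Hypothetical mapping: the source's actual unicode_table is not shown,
# so these two entries are taken from the example sentence only.
unicode_table = {
    "\u00c6": "AE",  # LATIN CAPITAL LETTER AE
    "\u00d5": "O",   # LATIN CAPITAL LETTER O WITH TILDE
}


def replace_unicode(text, table):
    """Return text with every key of table replaced by its mapped value."""
    for key in table:
        # str.replace needs both the character to find and its replacement.
        text = text.replace(key, table[key])
    return text


result = replace_unicode(
    "I want to replace \u00c6 with AE and \u00d5 with O", unicode_table
)
print(result)
```

When every key is a single character, as assumed here, `text.translate(str.maketrans(table))` should give the same result in one pass.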