In PySpark, you can turn a comma-separated string into a Python list with the string's built-in split method and pass that list to isin() as a filter condition. (Note that this uses Python's str.split, not pyspark.sql.functions.split, which operates on DataFrame columns.)
Here are the concrete steps:
# Create a Spark session and a sample DataFrame
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
data = [("Alice", 25), ("Bob", 30), ("Charlie", 35)]
df = spark.createDataFrame(data, ["Name", "Age"])

# Split the comma-separated string into a Python list
input_string = "Alice,Charlie"
input_list = input_string.split(",")

# Use the list with isin() as the filter condition
df_filtered = df.filter(df.Name.isin(input_list))
df_filtered.show()
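In practice, user-supplied strings often contain stray whitespace or trailing commas (for example "Alice, Charlie,"), which would make the isin() match fail silently. A minimal sketch of a normalization step, using a hypothetical helper name:

```python
# Hypothetical helper: normalize a comma-separated filter string
# before passing it to isin().
def parse_csv_filter(input_string):
    # Strip surrounding whitespace from each entry and drop empty
    # entries so that "Alice, Charlie," yields clean values.
    return [s.strip() for s in input_string.split(",") if s.strip()]

names = parse_csv_filter("Alice, Charlie,")
# names == ["Alice", "Charlie"]
```

Feeding the cleaned list into df.filter(df.Name.isin(names)) then behaves the same as the example above.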
With this, you can use a comma-separated string as a query condition when retrieving data in PySpark.