You can create a PySpark DataFrame in a loop with the following steps:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Start (or reuse) a SparkSession
spark = SparkSession.builder.appName("DataFrameCreation").getOrCreate()

# Define an explicit schema for the DataFrame
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Start from an empty DataFrame with that schema
df = spark.createDataFrame([], schema)

# Append one row per iteration by unioning a single-row DataFrame
for i in range(5):
    name = "Person " + str(i)
    age = i * 10
    row = (name, age)
    df = df.union(spark.createDataFrame([row], schema))
In the code above, the loop runs five times; each iteration builds a (name, age) tuple for one person, wraps it in a single-row DataFrame, and appends it to df with union.
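Note that calling union inside a loop grows the query plan by one step per iteration, which can become slow when many rows are added. A common alternative is to collect plain Python tuples in a list and call createDataFrame once at the end. The sketch below is a minimal example of that approach, using the same session and schema as above:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("DataFrameCreation").getOrCreate()

schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

# Accumulate rows as plain Python tuples instead of unioning DataFrames
rows = []
for i in range(5):
    rows.append(("Person " + str(i), i * 10))

# Build the DataFrame in a single call from the collected rows
df = spark.createDataFrame(rows, schema)
df.show()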
Finally, display the result:

df.show()
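With the rows generated above, the output should look like this:

+--------+---+
|    name|age|
+--------+---+
|Person 0|  0|
|Person 1| 10|
|Person 2| 20|
|Person 3| 30|
|Person 4| 40|
+--------+---+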
With that, we have successfully created a PySpark DataFrame in a loop.