I'm having trouble converting the list below into a PySpark DataFrame.
lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']
Desired output:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1    | A    | aa   |
+------+------+------+
| 2    | B    | bb   |
+------+------+------+
| 3    | C    | cc   |
+------+------+------+
Essentially, I'm looking for the equivalent of:
df = pd.DataFrame(data=lst, columns=cols)
Posted on 2022-05-24 22:51:14
If you have the pandas package installed, you can build a pandas DataFrame first and pass it to spark.createDataFrame to bring the data into PySpark.
import pandas as pd
from pyspark.sql import SparkSession

lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']
df = pd.DataFrame(data=lst, columns=cols)

# Create a PySpark SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("spark") \
    .getOrCreate()

# Create a PySpark DataFrame from the pandas DataFrame
sparkDF = spark.createDataFrame(df)
sparkDF.printSchema()
sparkDF.show()
Alternatively, you can do it without pandas at all:
from pyspark.sql import SparkSession

lst = [[1, 'A', 'aa'], [2, 'B', 'bb'], [3, 'C', 'cc']]
cols = ['col1', 'col2', 'col3']

# Create a PySpark SparkSession
spark = SparkSession.builder \
    .master("local[1]") \
    .appName("spark") \
    .getOrCreate()

# Create the DataFrame directly from the list, then rename the columns
df = spark.createDataFrame(lst).toDF(*cols)
df.printSchema()
df.show()
https://stackoverflow.com/questions/72370147