I have a column named "minutes". I want to convert it to hh:mm:ss format in PySpark.
Input:
minutes(string type)
10
20
70
90
Expected output:
minutes(string type) min_change
10 00:10:00
20 00:20:00
70 01:10:00
90 01:30:00
Posted on 2020-10-15 09:54:07
Add a column containing lit("00:00:00") and cast it to timestamp. Convert minutes to seconds and add them to that timestamp column. Finally, use date_format() to get the desired format:
from pyspark.sql import functions as F

# Cast the string column to int, start from a midnight timestamp,
# add the minutes as seconds, then keep only the time-of-day part.
df.withColumn("minutes", F.col("minutes").cast("int"))\
  .withColumn("min_change", F.lit("00:00:00").cast("timestamp"))\
  .withColumn("min_change", (F.unix_timestamp("min_change") + F.col("minutes")*60).cast("timestamp"))\
  .withColumn("min_change", F.date_format("min_change", "HH:mm:ss")).show()
+-------+----------+
|minutes|min_change|
+-------+----------+
|     10|  00:10:00|
|     20|  00:20:00|
|     70|  01:10:00|
|     90|  01:30:00|
+-------+----------+
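To sanity-check the expected values without a Spark session, the same conversion can be sketched in plain Python. This is only an illustration of the arithmetic, not the method used in the answer above: the helper name `minutes_to_hhmmss` is hypothetical, and it assumes the input is a non-negative whole number of minutes, given as a string or an int.

```python
def minutes_to_hhmmss(minutes):
    """Format a non-negative whole number of minutes as an HH:mm:ss string."""
    total = int(minutes)              # accepts "70" as well as 70
    hours, mins = divmod(total, 60)   # 70 -> (1, 10)
    return f"{hours:02d}:{mins:02d}:00"


# Same sample values as in the question.
for value in ["10", "20", "70", "90"]:
    print(value, minutes_to_hhmmss(value))
```

One difference worth keeping in mind: the timestamp-based approach formats a time of day, so values of 1440 minutes or more should wrap around past midnight, whereas this sketch keeps counting hours (for example, 1500 gives 25:00:00).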
https://stackoverflow.com/questions/64374123