Q: insertInto in overwrite mode appends instead of overwriting partitions

Stack Overflow user

Asked on 2022-03-01 06:18:39

1 answer, 789 views, 0 followers, 0 votes

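I am a data engineer working with Spark 2.3, and I have run into a problem:

A call to insertInto like the one below does not overwrite on insert. It appends instead, even after I set spark.sql.sources.partitionOverwriteMode to 'dynamic' in spark.conf.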

spark = spark_utils.getSparkInstance()
spark.conf.set('spark.sql.sources.partitionOverwriteMode', 'dynamic')

df\
.write\
.mode('overwrite')\
.format('orc')\
.option("compression","snappy")\
.insertInto("{0}.{1}".format(hive_database , src_table ))

Every time I run the job, rows are appended to the partition instead of being overwritten. Has anyone run into this problem? Thanks.

apache-spark

pyspark

apache-spark-sql

spark-streaming

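1 Answer

Stack Overflow user

Accepted answer

Answered on 2022-03-01 08:00:38

I tried to reproduce the error. According to the documentation, you have to set overwrite to True in insertInto itself: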

    def insertInto(self, tableName, overwrite=False):
        """Inserts the content of the :class:`DataFrame` to the specified table.

        It requires that the schema of the class:`DataFrame` is the same as the
        schema of the table.

        Optionally overwriting any existing data.
        """
        self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)
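
To make the effect of that last line concrete, here is a minimal plain-Python sketch. It does not use Spark at all: `FakeJavaWriter`, `FakeWriter`, and `effective_mode` are hypothetical stand-ins written only for illustration, and `FakeWriter.insertInto` copies the line from the source quoted above. It suggests why an earlier `.mode('overwrite')` call is replaced by "append" unless `overwrite=True` is passed. This reflects the source as quoted; other Spark releases may behave differently.

```python
class FakeJavaWriter:
    """Hypothetical stand-in for the JVM-side writer; it only records state."""

    def __init__(self):
        self.save_mode = None
        self.inserted = None

    def mode(self, save_mode):
        # Each call replaces the previously stored save mode.
        self.save_mode = save_mode
        return self

    def insertInto(self, table_name):
        # Record which table was written and with which save mode.
        self.inserted = (table_name, self.save_mode)


class FakeWriter:
    """Hypothetical stand-in for the Python DataFrameWriter quoted above."""

    def __init__(self):
        self._jwrite = FakeJavaWriter()

    def mode(self, save_mode):
        self._jwrite = self._jwrite.mode(save_mode)
        return self

    def insertInto(self, tableName, overwrite=False):
        # Same statement as in the quoted source: the mode is always reset here.
        self._jwrite.mode("overwrite" if overwrite else "append").insertInto(tableName)


def effective_mode(overwrite_flag):
    """Return the save mode that is actually in effect when the insert happens."""
    writer = FakeWriter()
    writer.mode("overwrite").insertInto("db.tbl", overwrite=overwrite_flag)
    return writer._jwrite.inserted[1]


print(effective_mode(False))  # append
print(effective_mode(True))   # overwrite
```

In this sketch the mode chosen inside `insertInto` always wins, so the `overwrite` argument is the only thing that decides between appending and overwriting.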

So, applied to your code, it would be:

df\
.write\
.mode('overwrite')\
.format('orc')\
.option("compression","snappy")\
.insertInto("{0}.{1}".format(hive_database, src_table), overwrite=True)

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/71309899
