I have a pandas DataFrame and I want to create a BigQuery table from it. I know there are many posts asking this question, but all the answers I can find so far require explicitly specifying the schema of every column. For example:
from google.cloud import bigquery as bq
client = bq.Client()
dataset_ref = client.dataset('my_dataset', project = 'my_project')
table_ref = dataset_ref.table('my_table')
job_config = bq.LoadJobConfig(
    schema=[
        bq.SchemaField("a", bq.enums.SqlTypeNames.STRING),
        bq.SchemaField("b", bq.enums.SqlTypeNames.INT64),
        bq.SchemaField("c", bq.enums.SqlTypeNames.FLOAT64),
    ]
)
client.load_table_from_dataframe(my_df, table_ref, job_config=job_config).result()

However, sometimes my data has many columns (e.g., 100), and specifying all of them is quite impractical. Is there an efficient way to do this?
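The per-column boilerplate above can in principle be generated from the DataFrame's own dtypes. A minimal sketch of that idea — the mapping dict and helper name here are illustrative assumptions, not part of the google-cloud-bigquery API:

```python
import pandas as pd

# Illustrative dtype-to-BigQuery-type mapping (an assumption; extend as needed).
DTYPE_TO_BQ = {
    "object": "STRING",
    "int64": "INT64",
    "float64": "FLOAT64",
    "bool": "BOOL",
    "datetime64[ns]": "TIMESTAMP",
}

def infer_bq_schema(df):
    """Return a {column: BigQuery type name} dict inferred from the dtypes."""
    return {col: DTYPE_TO_BQ.get(str(dtype), "STRING") for col, dtype in df.dtypes.items()}

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2], "c": [0.5, 1.5]})
print(infer_bq_schema(df))  # {'a': 'STRING', 'b': 'INT64', 'c': 'FLOAT64'}
```

Each resulting pair could then be turned into a `bq.SchemaField(col, type_name)`; in practice, as the answers below show, `load_table_from_dataframe` can also infer the schema itself when no `job_config` is supplied.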
By the way, I found a similar question in this post: Efficiently write a Pandas dataframe to Google BigQuery, but bq.Schema.from_dataframe does not seem to exist:
AttributeError: module 'google.cloud.bigquery' has no attribute 'Schema'

Posted on 2020-08-02 12:13:57
Below is a code snippet for loading a DataFrame into BQ:
import pandas as pd
from google.cloud import bigquery
# Example data
df = pd.DataFrame({'a': [1,2,4], 'b': ['123', '456', '000']})
# Load client
client = bigquery.Client(project='your-project-id')
# Define table name, in format dataset.table_name
table = 'your-dataset.your-table'
# Load data to BQ
job = client.load_table_from_dataframe(df, table)

If you want to specify only a subset of the schema and still import all columns, you can use:
# Define a job config object, with a subset of the schema
job_config = bigquery.LoadJobConfig(schema=[bigquery.SchemaField('b', 'STRING')])
# Load data to BQ
job = client.load_table_from_dataframe(df, table, job_config=job_config)

Posted on 2020-08-04 16:20:31
Here is working code:
from google.cloud import bigquery
import pandas as pd
bigqueryClient = bigquery.Client()
tableRef = bigqueryClient.dataset("dataset-name").table("table-name")
dataFrame = pd.read_csv("file-name")
bigqueryJob = bigqueryClient.load_table_from_dataframe(dataFrame, tableRef)
bigqueryJob.result()

Source: https://stackoverflow.com/questions/63200201
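Both answers work because, when no schema (or only a partial one) is supplied, the client derives column types from the DataFrame's dtypes. Combining that inference with the subset-override idea from the first answer can be sketched as plain dict logic — the helper and default mapping below are hypothetical illustrations, not the library's internals:

```python
import pandas as pd

# Assumed default dtype mapping, for illustration only.
DEFAULTS = {"object": "STRING", "int64": "INT64", "float64": "FLOAT64"}

def schema_with_overrides(df, overrides):
    """Infer a type per column from dtypes, then let explicit overrides win."""
    schema = {col: DEFAULTS.get(str(dtype), "STRING") for col, dtype in df.dtypes.items()}
    schema.update(overrides)
    return schema

df = pd.DataFrame({"a": [1, 2, 4], "b": ["123", "456", "000"]})
print(schema_with_overrides(df, {"a": "FLOAT64"}))  # {'a': 'FLOAT64', 'b': 'STRING'}
```

Columns listed in the partial job config keep their declared type; everything else falls back to what the dtypes imply.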