I have a table with id and name columns:
id | name
----------
1 | apple
1 | banana
2 | carrot
3 | lemon
3 | orange
In legacy SQL, I can write the following query:
SELECT
id,
LAST(name) AS last_record_of_name,
FROM
[project:table]
GROUP BY 1
The result is:
id | last_record_of_name
----------
1 | banana
2 | carrot
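3 | orange
This uses the LAST function: https://cloud.google.com/bigquery/docs/reference/legacy-sql#last
Is there a similar function when using standard SQL in BigQuery?
Posted on 2017-10-20 06:53:23
The problem with LAST from legacy SQL is that the output is undefined for the example you gave. It does not specify how "last" is determined here, because the input table scan is not guaranteed to follow any particular order. As written, you can express the same query using ANY_VALUE with standard SQL in BigQuery, e.g.: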
WITH SampleInput AS (
SELECT 1 AS id, 'apple' AS name UNION ALL
SELECT 1, 'banana' UNION ALL
SELECT 2, 'carrot' UNION ALL
SELECT 3, 'lemon' UNION ALL
SELECT 3, 'orange'
)
SELECT
id,
ANY_VALUE(name) AS last_record_of_name
FROM SampleInput
GROUP BY id;
+----+---------------------+
| id | last_record_of_name |
+----+---------------------+
| 1 | apple |
| 2 | carrot |
| 3 | lemon |
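+----+---------------------+
I don't think that is what you want, though. If the goal is to get the "last" value according to some criterion, such as the sort order of the name values, you can use ARRAY_AGG with ORDER BY and LIMIT 1, e.g.: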
WITH SampleInput AS (
SELECT 1 AS id, 'apple' AS name UNION ALL
SELECT 1, 'banana' UNION ALL
SELECT 2, 'carrot' UNION ALL
SELECT 3, 'lemon' UNION ALL
SELECT 3, 'orange'
)
SELECT
id,
ARRAY_AGG(name ORDER BY name DESC LIMIT 1)[OFFSET(0)] AS last_record_of_name
FROM SampleInput
GROUP BY id;
+----+---------------------+
| id | last_record_of_name |
+----+---------------------+
| 1 | banana |
| 2 | carrot |
| 3 | orange |
+----+---------------------+
The behavior of this query is well-defined, and it gives the desired results for your sample input and output.
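To illustrate the difference, here is a small plain-Python sketch (not BigQuery code). The function names `last_seen` and `last_by_name` are made up for illustration, and it assumes simple code-point string comparison. It mimics a scan-order "last" and the ordered aggregate on the sample rows:

```python
# Sample rows from the question: (id, name)
rows = [(1, "apple"), (1, "banana"), (2, "carrot"), (3, "lemon"), (3, "orange")]


def last_seen(rows):
    """Mimic a scan-order LAST: keep whichever name arrives last for each id."""
    result = {}
    for row_id, name in rows:
        result[row_id] = name  # later rows overwrite earlier ones
    return result


def last_by_name(rows):
    """Mimic ARRAY_AGG(name ORDER BY name DESC LIMIT 1)[OFFSET(0)] per id."""
    result = {}
    for row_id, name in rows:
        if row_id not in result or name > result[row_id]:
            result[row_id] = name  # keep the greatest name seen so far
    return result


if __name__ == "__main__":
    reversed_rows = list(reversed(rows))
    # The scan-order result changes when the same rows arrive in another order.
    print(last_seen(rows), last_seen(reversed_rows))
    # The ordered aggregate gives the same answer either way.
    print(last_by_name(rows), last_by_name(reversed_rows))
```

The scan-order version depends entirely on the order in which rows happen to arrive, which is the property BigQuery does not guarantee, while the ordered version depends only on the values themselves.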
Posted on 2018-10-12 04:12:41
A working solution using ARRAY_AGG() with the Python BigQuery client:
!pip install -U google-cloud-bigquery
import pandas as pd
from google.cloud import bigquery
# Standard SQL query; <datasetname>, <tableName>, and <condition> are placeholders to fill in.
str_value = """SELECT users, ARRAY_AGG(STRUCT(session, page)) AS hasComp FROM <datasetname>.<tableName> WHERE <condition> GROUP BY users ORDER BY users LIMIT 100"""
bigquery_client = bigquery.Client(project="")
dataset = bigquery_client.dataset("")
table = dataset.table('')
table.view_query_legacy_sql = False
# client.query() uses standard SQL by default.
query_job = bigquery_client.query(str_value)
df = query_job.to_dataframe()
print(df)
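As a rough sketch of how the ARRAY_AGG query from the first answer could be run the same way from Python: the helper names `build_last_value_query` and `run_query`, and the table name used in the usage note, are hypothetical. `run_query` assumes the google-cloud-bigquery and pandas packages are installed and that credentials are configured.

```python
def build_last_value_query(table: str, key: str = "id", value: str = "name") -> str:
    """Return a standard SQL query that picks the greatest `value` per `key`.

    Identifiers are interpolated directly into the SQL text, so pass only
    trusted table and column names.
    """
    return (
        f"SELECT {key}, "
        f"ARRAY_AGG({value} ORDER BY {value} DESC LIMIT 1)[OFFSET(0)] "
        f"AS last_record_of_{value} "
        f"FROM `{table}` "
        f"GROUP BY {key}"
    )


def run_query(sql: str, project: str):
    """Run `sql` in BigQuery and return the result as a pandas DataFrame."""
    # Imported lazily so the query builder can be used without the library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return client.query(sql).to_dataframe()
```

For example, `run_query(build_last_value_query("my-project.my_dataset.my_table"), project="my-project")` would return one row per id, assuming such a table exists.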
https://stackoverflow.com/questions/46839882