Bigquery中有任何方法可以根据值范围而不是绝对值来分组吗?
我有一个查询,在一个4种不同数字组的产品表中,我要寻找的是一种有效的分组方式,例如:按"A±1000“等进行分组或按"A±10%ofA”分组。
提前谢谢,
发布于 2019-10-21 09:28:12
您可以生成列作为“命名范围”,然后按列分组。作为A+-1000
案例的一个例子:
with data as (
select 100 as v union all
select 200 union all
select 2000 union all
select 2100 union all
select 2200 union all
select 4100 union all
select 8000 union all
select 8000
)
select count(v), ARRAY_AGG(v), ranges
FROM data, unnest([0, 2000, 4000, 6000, 8000]) ranges
WHERE data.v >= ranges - 1000 AND data.v < ranges + 1000
GROUP BY ranges
输出:
+-----+------------------------+--------+
| f0_ | f1_ | ranges |
+-----+------------------------+--------+
| 2 | ["100","200"] | 0 |
| 3 | ["2000","2100","2200"] | 2000 |
| 1 | ["4100"] | 4000 |
| 2 | ["8000","8000"] | 8000 |
+-----+------------------------+--------+
发布于 2019-10-21 09:58:00
下面的示例用于BigQuery标准SQL
#standardSQL
WITH `project.dataset.example` AS (
SELECT * FROM
UNNEST([STRUCT<id INT64, price FLOAT64>
(1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
])
)
SELECT
CASE
WHEN price > 0 AND price <= 100 THEN ' 0 - 100'
WHEN price > 100 AND price <= 200 THEN '100 - 200'
ELSE '200+'
END AS range_group,
COUNT(1) AS cnt
FROM `project.dataset.example`
GROUP BY range_group
-- ORDER BY range_group
有结果
Row range_group cnt
1 0 - 100 2
2 100 - 200 3
3 200+ 1
如您所见,在上面的解决方案中,您需要构造CASE语句来反映您的范围--如果您有多个--这可能很无聊--所以下面是更通用(但更详细)的解决方案--它使用了最近引入的桶,桶函数。
#standardSQL
WITH `project.dataset.example` AS (
SELECT * FROM
UNNEST([STRUCT<id INT64, price FLOAT64>
(1, 15), (2, 50), (3, 125), (4, 150), (5, 175), (6, 250)
])
), ranges AS (
SELECT [100.0, 200.0] ranges_array
), temp AS (
SELECT OFFSET, IF(prev_val = val, CONCAT(prev_val, ' - '), CONCAT(prev_val, ' - ', val)) rng FROM (
SELECT OFFSET, IFNULL(CAST(LAG(val) OVER(ORDER BY OFFSET) AS STRING), '') prev_val, CAST(val AS STRING) AS val
FROM ranges, UNNEST(ARRAY_CONCAT(ranges_array, [ARRAY_REVERSE(ranges_array)[OFFSET(0)]])) val WITH OFFSET
)
)
SELECT
RANGE_BUCKET(price, ranges_array) range_group,
rng,
COUNT(1) AS cnt
FROM `project.dataset.example`, ranges
JOIN temp ON RANGE_BUCKET(price, ranges_array) = OFFSET
GROUP BY range_group, rng
-- ORDER BY range_group
有结果
Row range_group rng cnt
1 0 - 100 2
2 1 100 - 200 3
3 2 200 - 1
如您所见,在第二个解决方案中,您需要在ranges
中将您的范围定义为简单数组,将您的边界定义为SELECT [100.0, 200.0] ranges_array
。
那么temp
做所有需要的计算
发布于 2019-10-21 10:14:31
您可以在GROUP BY
上执行数学操作,根据任意条件创建组。
例如:
WITH data AS (
SELECT repo.name, COUNT(*) price
FROM `githubarchive.month.201909`
GROUP BY 1
HAVING price>100
)
SELECT FORMAT('range %i-%i', MIN(price), MAX(price)) price_range, COUNT(*) c
FROM data
GROUP BY CAST(LOG(price) AS INT64)
ORDER BY MIN(price)
https://stackoverflow.com/questions/58487874
复制相似问题