现有用户订单表,请计算用户完成订单数的众数
+----------+-----------+----------------------+
| user_id | order_id | order_time |
+----------+-----------+----------------------+
| 1 | 1001 | 2024-02-25 08:00:00 |
| 1 | 1002 | 2024-02-25 09:30:00 |
| 2 | 1003 | 2024-02-25 10:45:00 |
| 2 | 1004 | 2024-02-25 12:15:00 |
| 2 | 1005 | 2024-02-25 13:45:00 |
| 3 | 1006 | 2024-02-25 15:00:00 |
| 4 | 1007 | 2024-02-25 16:30:00 |
| 4 | 1008 | 2024-02-25 18:00:00 |
| 1 | 1009 | 2024-02-26 07:45:00 |
| 1 | 1010 | 2024-02-26 09:15:00 |
| 1 | 1011 | 2024-02-26 10:30:00 |
| 2 | 1012 | 2024-02-26 12:00:00 |
| 3 | 1013 | 2024-02-26 14:00:00 |
| 3 | 1014 | 2024-02-26 15:30:00 |
| 4 | 1015 | 2024-02-26 17:00:00 |
| 4 | 1016 | 2024-02-26 18:30:00 |
| 1 | 1017 | 2024-02-27 08:15:00 |
| 1 | 1018 | 2024-02-27 09:45:00 |
| 2 | 1019 | 2024-02-27 11:00:00 |
| 2 | 1020 | 2024-02-27 12:45:00 |
+----------+-----------+----------------------+
题目要求找出用户完成订单数的众数
1)众数定义
众数(Mode)是统计学中的一个概念,它指的是在一组数据中出现次数最多的数值。在不同的数据集中,众数可能有一个或多个,甚至可能没有众数。以下是众数的一些特点:
众数是描述数据集中趋势的一种方式,它特别适用于分类数据和顺序数据。在实际应用中,众数可以帮助我们了解数据的集中趋势,尤其是在数据分布不均匀时。
2)原始题目中给出了订单明细,我们先统计出每个用户的订单数;
3)然后根据订单数据统计出每个订单数对应的人数;
4)对于单峰众数、多峰众数,我们使用rank函数进行排序,序号为1即可;
5)对于无众数情况,需要单独处理,统计rank排序序号最大是否为1,如果最大序号为1说明所有值的个数都一样,则无众数,否则有众数。
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️ |
1)先统计出每个用户完成订单数,然后根据订单数据统计“完成该单数”的用户数;
select
order_cnt,count(user_id) as user_cnt
from
(select user_id,count(order_id) as order_cnt
from t_order_043
group by user_id
) t
group by order_cnt
查询结果
2)根据上面结果,我们发现这个题目中属于是无众数的情况,先试用max()函数进行统计最大的rank_id 如果所有的id都为1这代表所有数据都一致,说明无众数。然后把这个标记放到每一行中,方便后续处理。
with tmp as (select
order_cnt,count(user_id) as user_cnt
from
(select user_id,count(order_id) as order_cnt
from t_order_043
group by user_id
) t
group by order_cnt)
select
order_cnt,
user_cnt,
rank_id,
if(max(rank_id)over()=1,0,1) as has_mode
from(
select
order_cnt,
user_cnt,
rank()over(order by user_cnt desc) as rank_id
from tmp
) t1
查询结果
3)查询出最后结果,即限定rank_id = 1,has_mode=1的数据。去除order_cnt即可
with tmp as (select
order_cnt,count(user_id) as user_cnt
from
(select user_id,count(order_id) as order_cnt
from t_order_043
group by user_id
) t
group by order_cnt)
select
order_cnt
from
(select
order_cnt,
user_cnt,
rank_id,
if(max(rank_id)over()=1,0,1) as has_mode
from(
select
order_cnt,
user_cnt,
rank()over(order by user_cnt desc) as rank_id
from tmp
) t1
)tt1
where tt1.rank_id =1
and tt1.has_mode =1
group by order_cnt
查询结果
1.给用户2增加一条记录,如此,则1,2均有7条订单,众数为7;
INSERT INTO t_order_043 VALUES
(2, 1021, '2024-02-27 12:47:00');
查询语句
with tmp as (select
order_cnt,count(user_id) as user_cnt
from
(select user_id,count(order_id) as order_cnt
from t_order_043
group by user_id
) t
group by order_cnt)
select
order_cnt
from
(select
order_cnt,
user_cnt,
rank_id,
if(max(rank_id)over()=1,0,1) as has_mode
from(
select
order_cnt,
user_cnt,
rank()over(order by user_cnt desc) as rank_id
from tmp
) t1
)tt1
where tt1.rank_id =1
and tt1.has_mode =1
group by order_cnt
查询结果
2.在给3号用户增加一条记录,这样3,4用户均有4笔订单,订单数据为4笔和7笔的分别有2个用户,无众数。
INSERT INTO t_order_043 VALUES
(3, 1022, '2024-02-27 12:48:00');
执行上面语句结果
--建表语句
CREATE TABLE IF NOT EXISTS t_order_043 (
user_id INT,
order_id INT,
order_time STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS orc;
--插入数据
INSERT INTO t_order_043 VALUES
(1, 1001, '2024-02-25 08:00:00'),
(1, 1002, '2024-02-25 09:30:00'),
(2, 1003, '2024-02-25 10:45:00'),
(2, 1004, '2024-02-25 12:15:00'),
(2, 1005, '2024-02-25 13:45:00'),
(3, 1006, '2024-02-25 15:00:00'),
(4, 1007, '2024-02-25 16:30:00'),
(4, 1008, '2024-02-25 18:00:00'),
(1, 1009, '2024-02-26 07:45:00'),
(1, 1010, '2024-02-26 09:15:00'),
(1, 1011, '2024-02-26 10:30:00'),
(2, 1012, '2024-02-26 12:00:00'),
(3, 1013, '2024-02-26 14:00:00'),
(3, 1014, '2024-02-26 15:30:00'),
(4, 1015, '2024-02-26 17:00:00'),
(4, 1016, '2024-02-26 18:30:00'),
(1, 1017, '2024-02-27 08:15:00'),
(1, 1018, '2024-02-27 09:45:00'),
(2, 1019, '2024-02-27 11:00:00'),
(2, 1020, '2024-02-27 12:45:00');