数据图
一.使用SQL脚本处理
1.1 sql group_concat函数对字段进行转置
日常工作中我们常见的这种关系型的数据如上所示,计算累加和,平均数可以通过mysql的自连接完成,同时使用行转列函数group_concat(),需要设置一下group_concat_max_len的长度 ,执行完可以看到数据已经score字段根据前面几个group by的字段聚合后,一天内多条记录的转变为一条记录
SET group_concat_max_len = 20000;
select day_id,model_id,model_name,param_type_id
,param_type
,group_concat(score*1) as scores
,sysdate() as create_time
from dbetl.etl0_score_reports_score
group by day_id,model_id,param_type_id
order by day_id,model_id,param_type_id
1.2 自连接对数值进行叠加计算
select a.*,SUM(b.scores) AS sum_scores
from (
select day_id,model_id,model_name,param_type_id
,param_type
,sum(score*1) as scores
,sysdate() as create_time
from dbetl.etl0_score_reports_score
group by day_id,model_id,param_type_id
order by day_id,model_id,param_type_id
) as a
JOIN (
select day_id,model_id,model_name,param_type_id
,param_type
,sum(score*1) as scores
,sysdate() as create_time
from dbetl.etl0_score_reports_score
group by day_id,model_id,param_type_id
order by day_id,model_id,param_type_id
) as b
ON a.day_id >= b.day_id and a.model_id = b.model_id
and a.param_type_id = b.param_type_id
GROUP BY a.day_id,a.model_id,a.param_type_id
order by a.day_id,a.model_id,a.param_type_id
上面做法从上图可以看出最后一个字段的值 代表每天每个模型每个风控模型的值累加和
二.python脚本处理
三.代码和测试数据下载
代码:https://github.com/chengcxy/Data_ETL/blob/master/sub_scores.py
据:https://github.com/chengcxy/Data_ETL/blob/master/py%E8%84%9A%E6%9C%ACsub_scores%E6%95%B0%E6%8D%AE.sql
领取专属 10元无门槛券
私享最新 技术干货