前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >这些SQL优化技巧,你知道吗?(2)

这些SQL优化技巧,你知道吗?(2)

作者头像
万能数据的小草
发布2024-07-23 15:24:51
840
发布2024-07-23 15:24:51
举报
文章被收录于专栏:万能的小草
  • 优化点:

  1. 对于数据量比较大的表,如行为埋点日志、观看日志明细表,尽量避免多次扫描相同分区数据。
  2. in子查询会出现一些奇怪的异常,不要使用in子查询,可以使用JOIN或者LEFT OUTER JOIN改写。
  • 案例讲解

优化前,成本500+多

代码语言:javascript
复制
--存在的问题:多次扫描大表不同的区间,存在子查询。
SELECT
    count(a.did) as new
    ,'one' as `stage`
from
    (
        SELECT
            did
        from
            bili_dwd.dwd_demo_log_dot_yyyymmdd   --埋点日志表,数据量大
        where
            dt >= '20230819'
            and
            dt <= '20230825'
            and
            dot_code in ('click_gift_a', 'click_gift_b', 'click_gift_c')
            and
            did not in (
                SELECT
                    did
                from
                    bili_dwd.dwd_demo_log_dot_yyyymmdd   --埋点日志表
                where
                    dt >= '20230718'
                    and
                    dt < '20230818'
                    and
                    dot_code in ('click_gift_a', 'click_gift_b', 'click_gift_c')
            )
            and
            event_id = 1
        GROUP by
            did
    ) a
union all
SELECT
    count(a.did) as new
    ,'two' as `stage`
from
    (
        SELECT
            did
        from
            bili_dwd.dwd_demo_log_dot_yyyymmdd  
        where
            dt >= '20230826'
            and
            dt <= '20230901'
            and
             dot_code in ('click_gift_a', 'click_gift_b', 'click_gift_c')
            and
            did not in (
                SELECT
                    did
                from
                   bili_dwd.dwd_demo_log_dot_yyyymmdd 
                where
                    dt >= '20230725'
                    and
                    dt < '20230825'
                    and
                   dot_code in ('click_gift_a', 'click_gift_b', 'click_gift_c')
            )
            and
            event_id = 1
        GROUP by
            did
    ) a;

优化后,成本100+左右

代码语言:javascript
复制
--第一步创建临时表扫描相关分区数据
CREATE TABLE IF NOT EXISTS  bili_tmp.tmp_demo_20230909 AS
SELECT
    did
    ,event_id
    ,dt
FROM
    bili_dwd.dwd_demo_log_dot_yyyymmdd   -- 埋点日志表
WHERE
    dt >= '20190718'
    AND dt <= '20190901'
     AND dot_code in ('click_gift_a', 'click_gift_b', 'click_gift_c')
GROUP BY
    did
    ,event_id
    ,dt;
-- 第二步:针对数据进行相关逻辑处理,避免子查询,避免多次扫描大表
SELECT
    count(a.did) as new
    ,'one' as `stage`
from
    (
        SELECT
            did
        from
            bili_tmp.tmp_demo_20230909
        where
            dt >= '20230718'
            and
            dt <= '20230825'
        GROUP by
            did
        HAVING
            max(if(dt >= '20190819' AND dt <= '20190825' AND event_id = 1, 'Y', 'N')) = 'Y'
            AND max(if(dt >= '20190718' AND dt <= '20190818', 'Y', 'N')) = 'N'
    ) a
union all
SELECT
    count(a.did) as new
    ,'two' as `stage`
from
    (
        SELECT
            did
        from
            bili_tmp.tmp_demo_20230909
        where
            dt >= '20190725'
            and
            dt <= '20190901'
        GROUP by
            did
        HAVING
            max(if(dt >= '20190826' AND dt <= '20190901' AND event_id = 1, 'Y', 'N')) = 'Y'
            AND max(if(dt >= '20190725' AND dt <= '20190825', 'Y', 'N')) = 'N'
    ) a;

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2023-10-10,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 万能数据的小草 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档