
分析师于日常数据分析及图表制作工作当中,时常会遇到一些离群值和异常值,其数值显著偏离所属样本的其余观测值,可能不具备参考价值,同时可能致使整体数据呈现偏高或偏低之状,难以精确剖析出普遍规律。

因而,在数据分析过程之中,经常会面临一类问题,即剔除极端值。因此在 SQL 面试里,“剔除最大最小值取平均”等各类指标统计,同样是一类常见的题目。
举例:存在一部门薪资表,期望剔除该部门的最高工资与最低工资,以求得平均工资。
临时表生成的 SQL 语句如下:
-- --------- start 生成临时测试表,如已有可用表忽略此步骤--------
WITH table_employee_salary AS (
select 802264760 AS employee_id,5000 AS salary_amount
union all
select 802264761 AS employee_id,7000 AS salary_amount
union all
select 802264763 AS employee_id,10000 AS salary_amount
union all
select 802264764 AS employee_id,7000 AS salary_amount
union all
select 802264765 AS employee_id,3000 AS salary_amount
union all
select 802264767 AS employee_id,2500 AS salary_amount
union all
select 802264768 AS employee_id,1000 AS salary_amount
union all
select 970014504 AS employee_id,5000 AS salary_amount
union all
select 970014503 AS employee_id,6000 AS salary_amount
union all
select 96620202 AS employee_id,3000 AS salary_amount
)
-- -------------------- 测试表生成 end ------------------对于此类问题,最为常见的一种处理方式是进行正序、倒序操作之后,剔除正数第一和倒数第一的行,而后求其均值,SQL如下:
select avg(salary_amount)
from (select employee_id,
salary_amount,
row_number() OVER (order by salary_amount asc) up_rank,
row_number() OVER (order by salary_amount desc) desc_rank
from table_employee_salary
) a
where up_rank>1 and desc_rank>1 -- 剔除正数第一和倒数第一的行数据执行展示结果如下:

row_number():无重复排名(相同排名的按序排名) dense_rank():排序相同时会重复,但不会跳过,占用的排名,总数变少。 rank(): 有相同排名会重复,但会跳过占用的排名,总数不变
解法参考如下:
select avg(salary_amount)from (select employee_id, salary_amount, rank() OVER (order by salary_amount asc) up_rank, rank() OVER (order by salary_amount desc) desc_rank from table_employee_salary ) a where up_rank>1 and desc_rank>1
欢迎探讨学习,其他类似衍生题目补充中...
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。