
HiveSQL: The Consecutive Growth Problem

Note: reference article:

SQL连续增长问题--HQL面试题35 (CSDN blog): https://blog.csdn.net/godlovedaniel/article/details/119080882

0 Requirements analysis

  Given an order table shop_order with the fields shop_id, order_id, order_time, and order_amt, find the shops (shop_id) whose sales amount grew on at least 3 consecutive days.

1 Data preparation

create table shop_order (
    shop_id    int,
    order_amt  int,
    order_time string
)
row format delimited fields terminated by '\t';

load data local inpath "/opt/module/hive_data/shop_order.txt" into table shop_order;

2 Data analysis

   The complete code is as follows:

with tmp as (
    -- daily sales total per shop
    select shop_id,
           to_date(order_time) as dt,
           sum(order_amt)      as amt
    from shop_order
    group by shop_id, to_date(order_time)
)
select shop_id
from (
    select *,
           -- check whether the dates are consecutive
           date_sub(dt, row_number() over (partition by shop_id order by dt)) as order_date_diff
    from (
        select shop_id,
               dt,
               amt,
               -- check whether sales grew:
               -- order_amt_diff = current row's amount minus the previous row's amount
               amt - lag(amt, 1, 0) over (partition by shop_id order by dt) as order_amt_diff
        from tmp
    ) t1
    -- a difference greater than 0 means sales grew
    where order_amt_diff > 0
) t2
group by shop_id, order_date_diff
having count(1) >= 3;

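The output is shop_id 2.

Analysis of the code above:

 step1: for each shop, find the daily records whose sales amount grew compared with the previous record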

with tmp as (
    select shop_id,
           to_date(order_time) as dt,
           sum(order_amt)      as amt
    from shop_order
    group by shop_id, to_date(order_time)
)
select *
from (
    select shop_id,
           dt,
           amt,
           -- check whether sales grew:
           -- order_amt_diff = current row's amount minus the previous row's amount
           amt - lag(amt, 1, 0) over (partition by shop_id order by dt) as order_amt_diff
    from tmp
) t1
-- a difference greater than 0 means sales grew
where order_amt_diff > 0;
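To trace step1 by hand, the following is a minimal plain-Python sketch that emulates amt - lag(amt, 1, 0) per shop and then keeps only the rows with a positive difference. It is not Hive code; the daily rows and the helper name growth_rows are hypothetical and only illustrate the logic. One thing the sketch makes visible: because the lag default is 0, the first day of every shop passes the filter whenever its total is positive.

```python
from collections import defaultdict

# Hypothetical daily totals (shop_id, dt, amt), standing in for the rows of tmp.
daily = [
    (1, "2021-05-10", 100),
    (1, "2021-05-11", 90),
    (1, "2021-05-12", 120),
    (2, "2021-05-10", 50),
    (2, "2021-05-11", 60),
]


def growth_rows(rows, default=0):
    """Emulate amt - lag(amt, 1, default) per shop, then keep rows with diff > 0."""
    by_shop = defaultdict(list)
    for shop_id, dt, amt in rows:
        by_shop[shop_id].append((dt, amt))

    result = []
    for shop_id in sorted(by_shop):
        prev = default  # lag() falls back to the default on a shop's first row
        for dt, amt in sorted(by_shop[shop_id]):  # order by dt
            diff = amt - prev
            if diff > 0:  # where order_amt_diff > 0
                result.append((shop_id, dt, amt, diff))
            prev = amt
    return result


step1 = growth_rows(daily)
for row in step1:
    print(row)
```

In this sample, shop 1 loses its 2021-05-11 row because the total dropped from 100 to 90, while both of its other days are kept.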

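 step2: building on step1, dt must also be consecutive. Subtract each row's row number from dt to get order_date_diff; rows that belong to one unbroken run of dates share the same order_date_diff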

with tmp as (
    select shop_id,
           to_date(order_time) as dt,
           sum(order_amt)      as amt
    from shop_order
    group by shop_id, to_date(order_time)
)
select *,
       -- check whether the dates are consecutive
       date_sub(dt, row_number() over (partition by shop_id order by dt)) as order_date_diff
from (
    select shop_id,
           dt,
           amt,
           -- check whether sales grew:
           -- order_amt_diff = current row's amount minus the previous row's amount
           amt - lag(amt, 1, 0) over (partition by shop_id order by dt) as order_amt_diff
    from tmp
) t1
-- a difference greater than 0 means sales grew
where order_amt_diff > 0;
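The date-minus-row-number trick in step2 can be checked with a small plain-Python sketch. It assumes the dates for one shop are distinct (which the group by in tmp guarantees); the helper name date_islands and the sample dates are hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby


def date_islands(dates):
    """Group distinct dates into runs of consecutive days.

    Each date is keyed by (date - row_number days); dates in one unbroken
    run share the same key, mirroring date_sub(dt, row_number() over (...)).
    """
    ordered = sorted(dates)
    keyed = [(d - timedelta(days=i), d) for i, d in enumerate(ordered, start=1)]
    return [[d for _, d in grp] for _, grp in groupby(keyed, key=lambda kd: kd[0])]


days = [date(2021, 5, 10), date(2021, 5, 11), date(2021, 5, 12), date(2021, 5, 14)]
islands = date_islands(days)
for island in islands:
    print([d.isoformat() for d in island])
```

The first three dates all map to the key 2021-05-09 and form one run; the gap before 2021-05-14 shifts its key to 2021-05-10, so it starts a new run.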

step3: group by shop_id and the date difference order_date_diff, and keep the groups with at least 3 rows to get the final result

with tmp as (
    select shop_id,
           to_date(order_time) as dt,
           sum(order_amt)      as amt
    from shop_order
    group by shop_id, to_date(order_time)
)
select shop_id
from (
    select *,
           -- check whether the dates are consecutive
           date_sub(dt, row_number() over (partition by shop_id order by dt)) as order_date_diff
    from (
        select shop_id,
               dt,
               amt,
               -- check whether sales grew:
               -- order_amt_diff = current row's amount minus the previous row's amount
               amt - lag(amt, 1, 0) over (partition by shop_id order by dt) as order_amt_diff
        from tmp
    ) t1
    -- a difference greater than 0 means sales grew
    where order_amt_diff > 0
) t2
group by shop_id, order_date_diff
having count(1) >= 3;
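As a cross-check of the three steps together, here is a plain-Python sketch that follows the same order of operations as the query: daily totals, lag difference, filter, date minus row number, then group and count. The order rows are hypothetical sample data (not the contents of shop_order.txt), and the function name is an assumption made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical orders: (shop_id, order_amt, order_time).
orders = [
    (1, 100, "2021-05-10 10:03:54"),
    (1, 101, "2021-05-10 10:04:54"),
    (1, 150, "2021-05-11 09:00:00"),
    (1, 300, "2021-05-12 09:00:00"),
    (2, 100, "2021-05-10 11:00:00"),
    (2, 200, "2021-05-11 11:00:00"),
    (2, 150, "2021-05-12 11:00:00"),
    (2, 100, "2021-05-12 12:00:00"),
]


def shops_with_growth_streak(rows, min_days=3):
    # tmp: to_date(order_time), sum(order_amt), group by shop_id and day
    daily = defaultdict(int)
    for shop_id, amt, order_time in rows:
        daily[(shop_id, date.fromisoformat(order_time[:10]))] += amt

    groups = Counter()          # (shop_id, order_date_diff) -> count(1)
    prev_amt = {}               # previous day's total per shop, for lag
    row_num = defaultdict(int)  # row_number over the rows kept by the filter
    for (shop_id, dt), amt in sorted(daily.items()):
        diff = amt - prev_amt.get(shop_id, 0)  # lag(amt, 1, 0)
        prev_amt[shop_id] = amt
        if diff > 0:                           # where order_amt_diff > 0
            row_num[shop_id] += 1
            key = dt - timedelta(days=row_num[shop_id])  # date_sub(dt, row_number)
            groups[(shop_id, key)] += 1

    # having count(1) >= 3
    return sorted({shop for (shop, _), n in groups.items() if n >= min_days})


print(shops_with_growth_streak(orders))  # [2]
```

With this sample, shop 1 drops from 201 to 150 on the second day, so its kept rows fall into two groups of one row each, while shop 2 grows 100, 200, 250 and forms one group of three rows.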

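3 Summary

   date_sub (date subtraction function)

  • Syntax: date_sub(string startdate, int days)
  • Return type: string (date from Hive 2.1.0 onward)
  • Description: returns the date that is days days before startdate
  • Example: select date_sub('2024-02-01', 3) ---> 2024-01-29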
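The date_sub example above can be verified outside Hive with a short Python sketch. The function below is a hypothetical stand-in that only mimics the string-in, string-out behavior described in the list; it is not the Hive implementation.

```python
from datetime import date, timedelta


def date_sub(start_date: str, days: int) -> str:
    """Return the ISO date that is `days` days before `start_date`."""
    return (date.fromisoformat(start_date) - timedelta(days=days)).isoformat()


print(date_sub("2024-02-01", 3))  # 2024-01-29
```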

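lag

  • Syntax: lag(column, n, default) over (partition by ... order by ...)
  • Description: returns the value of column from the row n rows before the current row within the window; if that row does not exist, returns default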

     For "consecutive dates" problems, the usual approach is: first compute date_sub(dt, row_number() over (partition by shop_id order by dt)) as dt_diff, then group by dt_diff and take count().

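    For "consecutive growth of xx" problems, the usual approach is: use the offset functions lag or lead to fetch the previous or next row, compute the difference diff, label each row with a conditional such as if(diff > 0, 1, 0) as flag, and then do the subsequent grouping and aggregation on that label.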
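The labeling approach can be sketched in plain Python as follows. This is a variant for illustration, not the query from section 2: it sets flag to 1 only when the previous row is the previous calendar day and the amount is higher, then measures the longest run of days. The helper name longest_growth_run and the sample data are hypothetical.

```python
from datetime import date, timedelta


def longest_growth_run(daily):
    """daily: list of (date, amt) for one shop, one row per day.

    Returns the length in days of the longest run of consecutive calendar
    days in which every day's amt exceeds the previous day's amt.
    """
    best = run = 0
    prev_dt = prev_amt = None
    for dt, amt in sorted(daily):
        grew = (
            prev_dt is not None
            and dt - prev_dt == timedelta(days=1)
            and amt > prev_amt
        )
        flag = 1 if grew else 0       # if(diff > 0, 1, 0) as flag
        run = run + 1 if flag else 1  # a run restarts at the current day
        best = max(best, run)
        prev_dt, prev_amt = dt, amt
    return best


sample = [
    (date(2021, 5, 10), 50),
    (date(2021, 5, 11), 10),
    (date(2021, 5, 12), 20),
    (date(2021, 5, 13), 30),
]
print(longest_growth_run(sample))  # 3
```

The design choice here is that the day a run starts on counts toward the run even though its own flag is 0. For this sample the days 05-11 to 05-13 (10, 20, 30) count as a 3-day run, whereas filtering on diff > 0 before numbering the rows would discard 05-11 (it dropped from 50) and leave a group of only 2 rows. Which behavior is wanted depends on how "3 consecutive days of growth" is defined.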
