当前位置: 首页 > news >正文

HIVE实战处理(二十四)留存用户数

留存概念:
次X日活跃留存,次X日新增留存,也就是看今天的新增或活跃用户在后续几天的留存情况

一、留存表的生成逻辑

因为用户活跃日期和留存的日期无法对齐所以搞了2级分区(dt,static_day)

1)首先获得计算日D、根据要出的次X日留存,推算出前面的DT ,整体从活跃表里根据这些日期生成临时活跃表tmp1
2)分别把计算DT和前X日的DT进行匹配,按相差的天数进行匹配,如果匹配一直分别得到对应的次X日留存标识。
3)需要使用1个新的字段存储留存指标的的日期,比如20250701号的留存keep1_num只能等20250702号过完才能计算,那对应也是7.1号算留存日期,是指在DT=20250702的留存时间。

所以根据dt往前推算的日期都是留存日期,不能写到dt这个字段里,因为除了留存指标外还要计算统计日的指标。
如果留存日期=统计日期的,出的当日活跃。留存日期< 统计日期的话,出的是次X日留存指标。


--活跃临时表
create table tmp1 as 
select ,t1.uuid,t1.dt                                  as statis_day,case when t1.dt='${DT}' then 'Y' else 'N' end                      as keep_0d_active_flag,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-1), '-', '') then 'Y'else 'N' end                      as keep_1d_active_flag,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-2), '-', '') then 'Y'else 'N' end                      as keep_2d_active_flag     ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-3), '-', '') then 'Y'else 'N' end                      as keep_3d_active_flag      ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-4), '-', '') then 'Y'else 'N' end                      as keep_4d_active_flag      ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-5), '-', '') then 'Y'else 'N' end                      as keep_5d_active_flag      ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-6), '-', '') then 'Y'else 'N' end                      as keep_6d_active_flag      ,case when t1.dt=regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-7), '-', '') then 'Y'else 'N' end                      as keep_7d_active_flag        from 活跃表 t1
where t1.dt in ( 
${DT}
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-1), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-2), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-3), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-4), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-5), '-', '')
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-6), '-', '')    
,regexp_replace(date_add(from_unixtime(to_unix_timestamp('${DT}', 'yyyyMMdd')),-7), '-', '')
);--当日活跃以及留存指标
insert overwrite table 留存表 partition(dt='${DT}')
select   group_id,statis_day,channel,version,sum(case when keep_0d_active_flag='Y' then 1 else 0 end)  as av,sum(case when keep_1d_active_flag='Y' then 1 else 0 end)  as keep_1d_av,sum(case when keep_2d_active_flag='Y' then 1 else 0 end)  as keep_2d_av,sum(case when keep_3d_active_flag='Y' then 1 else 0 end)  as keep_3d_av,sum(case when keep_4d_active_flag='Y' then 1 else 0 end)  as keep_4d_av,sum(case when keep_5d_active_flag='Y' then 1 else 0 end)  as keep_5d_av,sum(case when keep_6d_active_flag='Y' then 1 else 0 end)  as keep_6d_av,sum(case when keep_7d_active_flag='Y' then 1 else 0 end)  as keep_7d_av
from(select cast(grouping__id as bigint)& 7 & 3  as group_id,channel,uuid,statis_day,max(keep_1d_active_flag)      as keep_1d_active_flag,max(keep_2d_active_flag)      as keep_2d_active_flag,max(keep_3d_active_flag)      as keep_3d_active_flag,max(keep_4d_active_flag)      as keep_4d_active_flag,max(keep_5d_active_flag)      as keep_5d_active_flag,max(keep_6d_active_flag)      as keep_6d_active_flag,max(keep_7d_active_flag)      as keep_7d_active_flag from tmp1group by ,channel  --1,version	--2		,uuid       -- 4,statis_day --8				grouping sets(          (channel,uuid,statis_day)    ,(version,uuid,statis_day),(uuid,statis_day)			)
) ta
group by  group_id,statis_day,channel,version

二、对于留存的表的查询处理

1)非留存指标的话,直接使用where dt between ‘20250701’ and ‘20250707’
2)对于留存指标要取static_day,这个static_day是代表留存日期在dt的不同留存指标。

select
dt
,sum(active_num)
,sum(keep1_num)
,sum(keep2_num)
,sum(keep3_num)
,sum(keep4_num)
from
(select
dt,
,active_num
,0 as keep1_num
,0 as keep2_num
,0 as keep3_num
,0 as keep4_num
from 留存表 where dt between ‘20250701’ and ‘20250704’
union all

select
static_day dt,
,0 as active_num
,keep1_num
,keep2_num
,keep3_num
,keep4_num
from 留存表 where static_day between ‘20250701’ and ‘20250704’
) t group by dt

http://www.lryc.cn/news/592789.html

相关文章:

  • 专题:2025智能体研究报告|附70份报告PDF、原数据表汇总下载
  • 线程控制:互斥与同步
  • math.h函数
  • 深度学习零基础入门(3)-图像与神经网络
  • 需求变更频繁?构建动态估算机制四大要点
  • 短视频矩阵系统:选择与开发的全面指南
  • nastools继任者?极空间部署影视自动化订阅系统『MediaMaster』
  • 代理模式及优化
  • 解锁时序数据库选型密码,为何国产开源时序数据库IoTDB脱颖而出?
  • 脉冲神经网络(Spiking Neural Network, SNN)与知识蒸馏(Knowledge Distillation, KD)
  • Vue3 Anime.js超级炫酷的网页动画库详解
  • Kubernetes (k8s)、Rancher 和 Podman 的异同点分析
  • Jmeter系列(6)-测试计划
  • 网关-微服务网关实现
  • Postman/Apipost中使用Post URL编码发送含换行符参数的问题分析
  • vue2 面试题及详细答案150道(101 - 120)
  • 智慧后厨检测算法构建智能厨房防护网
  • Redis学习其三(订阅发布,主从复制,哨兵模式)
  • 【大模型:知识图谱】--6.Neo4j DeskTop安装+使用
  • RS485转PROFIBUS DP网关写入命令让JRT激光测距传感器开启慢速模式连续测量
  • CCF编程能力等级认证GESP—C++1级—20250628
  • FLTK UI窗口关闭时延时卡顿问题全流程分析与优化实战
  • C++算法竞赛篇:DevC++ 如何进行debug调试
  • 记录DataGrip 2025.1.3破解失败后,无法重启问题修复
  • 第16章 基于AB实验的增长实践——验证想法:AB实验实践
  • pom.xml文件中的${}变量从哪里传值
  • UniApp TabBar 用户头像方案:绕过原生限制的实践
  • React + Mermaid 图表渲染消失问题剖析及 4 种代码级修复方案
  • Java异步日志系统性能优化实践指南:基于Log4j2异步Appender与Disruptor
  • Camera相机人脸识别系列专题分析之十七:人脸特征检测FFD算法之libhci_face_camera_api.so 296点位人脸识别检测流程详解