当前位置: 首页 > news >正文

hive统计页面停留时间

1、背景:通过业务埋点数据,统计用户在页面的停留时间

样例数据,样例数据存入表tmp,

有如下字段用户uid、动作时间戳time、页面名称pn、动作名称action

SELECT '12345' AS uid, '1695613731020' AS time, '搜索' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613732021' AS time, '搜索' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613734024' AS time, '搜索' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613737036' AS time, '列表' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613738037' AS time, '列表' AS pn, 'click' AS action
UNION ALL
SELECT '12345' AS uid, '1695613740040' AS time, '列表' AS pn, 'click' AS action
uidtimepnaction
123451695613731020搜索click
123451695613732021搜索click
123451695613734024搜索click
123451695613737036列表click
123451695613738037列表click
123451695613740040列表click

思路:以用户维度按时间进行升序排列,通过lag函数找到上一个时间动作last_pn

SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn, lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pn
FROM tmp

rn排序的作用是找到最后一个动作

uidtimepnrnlast_pn
123451695613731020搜索6
123451695613732021搜索5搜索
123451695613734024搜索4搜索
123451695613737036列表3搜索
123451695613738037列表2列表
123451695613740040列表1列表

然后将发生页面变化的节点进行标记,

SELECT *, if(pn <> nvl(last_pn, '空') OR rn = 1, 1, 0) AS label
FROM (
SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn
, lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pn
FROM tmp
) t
uidtimepnrnlast_pnlabel
123451695613731020搜索61
123451695613732021搜索5搜索0
123451695613734024搜索4搜索0
123451695613737036列表3搜索1
123451695613738037列表2列表0
123451695613740040列表1列表1

之后统计停留时间就可以只看label =1的日志之间的时间差即可,全部代码如下,

WITH tmp AS (SELECT '12345' AS uid, '1695613731020' AS time, '搜索' AS pn, 'click' AS actionUNION ALLSELECT '12345' AS uid, '1695613732021' AS time, '搜索' AS pn, 'click' AS actionUNION ALLSELECT '12345' AS uid, '1695613734024' AS time, '搜索' AS pn, 'click' AS actionUNION ALLSELECT '12345' AS uid, '1695613737036' AS time, '列表' AS pn, 'click' AS actionUNION ALLSELECT '12345' AS uid, '1695613738037' AS time, '列表' AS pn, 'click' AS actionUNION ALLSELECT '12345' AS uid, '1695613740040' AS time, '列表' AS pn, 'click' AS action)
SELECT *
FROM (SELECT uid, pn, time, CAST((lead(time, 1) OVER (PARTITION BY uid ORDER BY time ASC) - time) / 1000 AS BIGINT) AS stay_timeFROM (SELECT *, if(pn <> nvl(last_pn, '空')OR rn = 1, 1, 0) AS labelFROM (SELECT uid, time, pn, row_number() OVER (PARTITION BY uid ORDER BY time DESC) AS rn, lag(pn, 1) OVER (PARTITION BY uid ORDER BY time ASC) AS last_pnFROM tmp) t) ttWHERE label = 1
) ttt
WHERE stay_time IS NOT NULL

最终统计结果如下

uidpntimestay_time
12345搜索16956137310206
12345列表16956137370363
http://www.lryc.cn/news/188041.html

相关文章:

  • LeetCode 24.两两交换链表中的结点
  • 【每日一记】OSPF区域划分详讲、划分区域的优点好处
  • 复旦管院启动科创战略,培养科技研发人才,引领未来发展!
  • Infinity同步
  • C语言:转义字符
  • 为什么 0.1 + 0.1 !== 0.2
  • 超详细!主流大语言模型的技术原理细节汇总!
  • 本人4年测试经验,211 本科计算机专业,由于互联网裁员,然后谈谈我最近测试面试的总结
  • Android中级——Activity数据恢复过程
  • 国内就能使用的chatgpt网页版,包含AIGC应用工具
  • Fast DDS之RTPS
  • 【算法|动态规划No.16】leetcode931. 下降路径最小和
  • Jenkins 构建时动态获取参数
  • android app开机自启动
  • XSS CSRF
  • 新加坡星银行项目组笔试题面试题
  • 基于SpringBoot的智能物流管理系统
  • 【开源电商网站】(2),使用docker-compose和dockerfile进行配置,设置自定义的镜像,安装插件,增加汉化包,支持中文界面汉化。
  • HTML5开发实例-3D全景(ThreeJs全景Demo) 详解(图)
  • springboot项目静态资源映射
  • 【Linux初阶】多线程1 | 页表的索引作用,线程基础(优缺点、异常、用途),线程VS进程,线程控制,C++多线程引入
  • Flink--9、双流联结(窗口联结、间隔联结)
  • 家政服务行业做开发微信小程序可以实现什么功能
  • 20哈希表-三数之和
  • JVM 运行时数据区和垃圾收集算法
  • Java基于SpringBoot的高校招生系统
  • 6. Python使用Asyncio开发TCP服务器简单案例
  • 景联文科技:AI大模型强势赋能,助力自动驾驶迭代升级
  • 多周期CPU设计
  • Go 复合类型之字典类型介绍