
Advanced HiveSQL Techniques

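Contents

  • 1. Delete
  • 2. Update
  • 3. Unpivot (one row into multiple rows)
  • 4. Pivot (multiple rows into one row)
  • 5. Analytic (window) functions
  • 6. Multidimensional analysis
  • 7. Data skew
    • GROUP BY skew
    • JOIN skew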

Master the techniques below and your SQL skills will take a real step up!

1. Delete

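Ordinary (non-transactional) Hive tables have no row-level DELETE. To delete rows, you usually overwrite the table with every row except the ones you want to remove: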

insert overwrite table tmp
select * from tmp where id != '666';
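The overwrite keeps exactly the rows for which the WHERE predicate is TRUE. The minimal Python model below uses a list of dicts for the table and None for NULL. The helper names are hypothetical, and the code only illustrates the semantics; it is not Hive code. It shows a side effect of SQL's three-valued logic: `NULL != '666'` evaluates to NULL, not TRUE, so rows with a NULL id are deleted as well.

```python
from typing import Dict, List, Optional

# Hypothetical row representation: column name -> value, None stands for SQL NULL.
Row = Dict[str, Optional[str]]


def sql_not_equal(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """SQL three-valued '!=': a NULL operand makes the result NULL (None)."""
    if a is None or b is None:
        return None
    return a != b


def delete_by_overwrite(table: List[Row], doomed_id: str) -> List[Row]:
    """Model of INSERT OVERWRITE ... SELECT * ... WHERE id != doomed_id.

    WHERE keeps a row only when the predicate is TRUE, so NULL ids are dropped too.
    """
    return [row for row in table if sql_not_equal(row["id"], doomed_id) is True]


tmp = [{"id": "1"}, {"id": "666"}, {"id": None}]
print(delete_by_overwrite(tmp, "666"))  # [{'id': '1'}]
```

If such rows must survive, the filter has to say so explicitly, e.g. `where id != '666' or id is null`.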

2. Update

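An update is also an overwrite. Select the rows back into the table and use if() to replace the value only where the condition holds: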

insert overwrite table tmp
select id, label, if(id = '1' and label = 'grade', '25', value) as value
from tmp;
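Here is a sketch of what the IF(...) projection does to each row, modeled in Python under the same assumptions (hypothetical function name, dicts as rows). Nothing is updated in place: every selected row is written out again, and only the rows matching the condition carry the new value. A NULL id makes `id = '1'` NULL, so IF takes its else branch; Python's `None == "1"` being False gives the same outcome here.

```python
from typing import Dict, List

Row = Dict[str, str]  # hypothetical row representation: column name -> value


def update_by_overwrite(table: List[Row]) -> List[Row]:
    """Model of the projection if(id = '1' and label = 'grade', '25', value).

    Builds new rows instead of mutating the input, like writing a fresh copy of the table.
    """
    return [
        {
            "id": row["id"],
            "label": row["label"],
            "value": "25" if row["id"] == "1" and row["label"] == "grade" else row["value"],
        }
        for row in table
    ]
```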

3. Unpivot (one row into multiple rows)

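Approach 1: this takes three steps.
1. Use concat to join the columns into one long string of label:value pairs separated by commas.
2. Split that string on commas and explode the resulting array into multiple rows.
3. Split each row on the colon to separate label and value.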

-- Step 3: finally, split each info string into label and value
select id, split(info, ':')[0] as label, split(info, ':')[1] as value
from
(
  -- Step 1: concatenate the columns into a string such as "heit:180,weit:60,age:26"
  select id, concat('heit', ':', height, ',', 'weit', ':', weight, ',', 'age', ':', age) as value
  from tmp
) as tmp
-- Step 2: split on commas and explode the array into one row per element
lateral view explode(split(value, ',')) mytable as info;
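The three steps can be modeled in Python to see the data at each stage. This is a hypothetical function over a list of dicts, not Hive code. The model also makes one limitation visible. The values travel through a string, so a value that itself contains `,` or `:` would be split in the wrong place, and every value comes out as a string.

```python
from typing import Dict, List, Tuple

# Hypothetical wide table: one dict per id with height, weight and age columns.
WideRow = Dict[str, object]


def unpivot_via_string(table: List[WideRow]) -> List[Tuple[str, str, str]]:
    """Model of approach 1: concat -> split(',') + explode -> split(':')."""
    result = []
    for row in table:
        # Step 1: concat the columns into e.g. "heit:180,weit:60,age:26"
        packed = f"heit:{row['height']},weit:{row['weight']},age:{row['age']}"
        # Step 2: split on ',' and explode, giving one output row per element
        for info in packed.split(","):
            # Step 3: split each element on ':' into label and value
            parts = info.split(":")
            result.append((str(row["id"]), parts[0], parts[1]))
    return result


print(unpivot_via_string([{"id": "1", "height": 180, "weight": 60, "age": 26}]))
# [('1', 'heit', '180'), ('1', 'weit', '60'), ('1', 'age', '26')]
```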

Approach 2: use UNION ALL, with one SELECT per column:

select id, 'heit' as label, height as value from tmp
union all
select id, 'weit' as label, weight as value from tmp
union all
select id, 'age' as label, age as value from tmp;
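Here is a Python model of the UNION ALL approach under the same assumptions (hypothetical names). Each branch is a separate pass over the table that emits one (id, label, value) row per input row, and the branches are simply concatenated. Values keep their original type and no delimiter is involved. The cost is typically one scan of the table per branch. UNION ALL makes no promise about row order, so the ordering here is only an artifact of the model.

```python
from typing import Dict, List, Tuple

WideRow = Dict[str, object]  # hypothetical wide row: id, height, weight, age

# (label written to the output, source column) pairs: one per UNION ALL branch
BRANCHES = [("heit", "height"), ("weit", "weight"), ("age", "age")]


def unpivot_via_union_all(table: List[WideRow]) -> List[Tuple[object, str, object]]:
    """Model of approach 2: each SELECT branch scans the table; UNION ALL concatenates."""
    result = []
    for label, column in BRANCHES:
        # One branch: SELECT id, '<label>' AS label, <column> AS value FROM tmp
        result.extend((row["id"], label, row[column]) for row in table)
    return result
```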

4. Pivot (multiple rows into one row)

Approach 1: build one subquery per label and join them on id:

select
  tmp1.id as id, tmp1.value as height, tmp2.value as weight, tmp3.value as age
from
  (select id, label, value from tmp2 where label = 'heit') as tmp1
join
  (select id, label, value from tmp2 where label = 'weit') as tmp2
on tmp1.id = tmp2.id
join
  (select id, label, value from tmp2 where label = 'age') as tmp3
on tmp1.id = tmp3.id;
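With inner joins, an id survives only if it has a row for every label. The small Python model below uses hypothetical helper names and assumes at most one row per id and label. It shows an id with a missing weit row disappearing from the result. Keeping such ids would need outer joins or the conditional-aggregation approach.

```python
from typing import Dict, List, Tuple

LongRow = Tuple[str, str, str]  # hypothetical long row: (id, label, value)


def pick(table: List[LongRow], label: str) -> Dict[str, str]:
    """One subquery: SELECT id, value FROM tmp2 WHERE label = <label>, keyed by id.

    Assumes at most one row per (id, label); a real join would multiply duplicates.
    """
    return {id_: value for id_, lab, value in table if lab == label}


def pivot_via_inner_join(table: List[LongRow]) -> List[Tuple[str, str, str, str]]:
    """Model of approach 1: tmp1 JOIN tmp2 ON id JOIN tmp3 ON id (inner joins)."""
    heit, weit, age = pick(table, "heit"), pick(table, "weit"), pick(table, "age")
    # An inner join keeps an id only if it appears on every side.
    return [
        (id_, heit[id_], weit[id_], age[id_])
        for id_ in sorted(heit)
        if id_ in weit and id_ in age
    ]
```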

Approach 2: conditional aggregation with max(if(...)) or max(case when ...). Depending on the data, sum can be used instead of max:

select 
id,
max(case when label = 'heit' then value  end) as height,
max(case when label = 'weit' then value  end) as weight,
max(case when label = 'age' then value  end) as age 
from tmp2 
group by
id;
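Conditional aggregation needs only one scan and one GROUP BY. In the Python model below (hypothetical names), `case when` without `else` yields NULL for rows with any other label. MAX skips NULLs, so each column picks up its one matching value. An id with a missing label still appears, with NULL in that column.

```python
from typing import Dict, List, Optional, Tuple

LongRow = Tuple[str, str, str]  # hypothetical long row: (id, label, value)


def sql_max(values: List[Optional[str]]) -> Optional[str]:
    """SQL MAX ignores NULLs and returns NULL when every input is NULL."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def pivot_via_max_case(table: List[LongRow]) -> List[Tuple[str, Optional[str], Optional[str], Optional[str]]]:
    """Model of: SELECT id, max(case when label = 'heit' then value end) AS height, ... GROUP BY id."""
    groups: Dict[str, List[LongRow]] = {}
    for row in table:
        groups.setdefault(row[0], []).append(row)  # GROUP BY id

    def column(rows: List[LongRow], wanted: str) -> Optional[str]:
        # case when label = wanted then value end: NULL (None) for every other label
        return sql_max([value if label == wanted else None for _, label, value in rows])

    return [
        (id_, column(rows, "heit"), column(rows, "weit"), column(rows, "age"))
        for id_, rows in sorted(groups.items())
    ]
```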

Approach 3: build a map per id, then look up each key:

select
id, tmpmap['heit'] as height, tmpmap['weit'] as weight, tmpmap['age'] as age
from
(select id, str_to_map(concat_ws(',', collect_set(concat(label, ':', value))), ',', ':') as tmpmap from tmp2 group by id
) as tmp1;
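Below is a simplified Python model of the map approach (hypothetical names). Its `str_to_map` only mimics the Hive function for simple inputs. The model highlights that the lookup keys must be the label strings actually stored in the data, here 'heit', 'weit' and 'age'. Looking up a key that never occurs, such as 'height', returns NULL. collect_set gives no ordering guarantee, but the map lookup makes the order irrelevant.

```python
from typing import Dict, List, Optional, Tuple

LongRow = Tuple[str, str, str]  # hypothetical long row: (id, label, value)


def str_to_map(text: str, pair_delim: str = ",", kv_delim: str = ":") -> Dict[str, str]:
    """Simplified stand-in for Hive's str_to_map(text, delimiter1, delimiter2)."""
    result = {}
    for pair in text.split(pair_delim):
        key, _, value = pair.partition(kv_delim)
        result[key] = value
    return result


def pivot_via_map(table: List[LongRow]) -> Dict[str, Dict[str, Optional[str]]]:
    """Model of collect_set(concat(label, ':', value)) -> concat_ws(',') -> str_to_map -> lookups."""
    collected: Dict[str, List[str]] = {}
    for id_, label, value in table:
        items = collected.setdefault(id_, [])  # GROUP BY id
        pair = f"{label}:{value}"
        if pair not in items:  # collect_set drops duplicates
            items.append(pair)
    result = {}
    for id_, items in collected.items():
        tmpmap = str_to_map(",".join(items))  # concat_ws(',', ...) then str_to_map
        # A missing key yields NULL in Hive; dict.get returns None here.
        result[id_] = {"height": tmpmap.get("heit"), "weight": tmpmap.get("weit"), "age": tmpmap.get("age")}
    return result
```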

5. Analytic (window) functions

select id, label, value,
  lead(value, 1, 0) over (partition by id order by label) as lead,
  lag(value, 1, 999) over (partition by id order by label) as lag,
  first_value(value) over (partition by id order by label) as first_value,
  -- with order by, the default frame ends at the current row; widen it to get the partition's last value
  last_value(value) over (partition by id order by label rows between unbounded preceding and unbounded following) as last_value
from tmp;
select id, label, value,
  row_number() over (partition by id order by value) as row_number,
  rank() over (partition by id order by value) as rank,
  dense_rank() over (partition by id order by value) as dense_rank
from tmp;
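The Python models below cover a single ordered partition (hypothetical function names, not Hive code) and make the differences concrete:
- row_number() numbers rows consecutively.
- rank() gives ties the same number and then skips ahead.
- dense_rank() gives ties the same number without leaving gaps.
- lead and lag fall back to their default value at the edges of the partition.
- For last_value, with ORDER BY and no explicit frame, the window ends at the current row. It therefore returns the current row's value rather than the partition's last value. The model assumes no ties in the ordering column.

```python
from typing import List, Tuple


def rank_functions(values: List[int]) -> List[Tuple[int, int, int, int]]:
    """row_number(), rank() and dense_rank() over one partition ORDER BY value.

    Returns (value, row_number, rank, dense_rank) in sorted order.
    """
    ordered = sorted(values)
    result = []
    rank = dense = 0
    for i, v in enumerate(ordered):
        row_number = i + 1
        if i == 0 or v != ordered[i - 1]:
            rank = row_number  # a new value: rank jumps past any ties before it
            dense += 1         # dense_rank just moves to the next number
        result.append((v, row_number, rank, dense))
    return result


def lead_lag(values: List[str], lead_default: str, lag_default: str) -> List[Tuple[str, str, str]]:
    """lead(value, 1, lead_default) and lag(value, 1, lag_default) over one ordered partition."""
    n = len(values)
    return [
        (v, values[i + 1] if i + 1 < n else lead_default, values[i - 1] if i > 0 else lag_default)
        for i, v in enumerate(values)
    ]


def last_value(values: List[str], full_frame: bool) -> List[str]:
    """last_value(value) over an ordered partition without ties.

    full_frame=False: default frame (unbounded preceding to current row).
    full_frame=True: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
    """
    return [values[-1] if full_frame else v for v in values]


print(rank_functions([30, 10, 20, 20]))
# [(10, 1, 1, 1), (20, 2, 2, 2), (20, 3, 2, 2), (30, 4, 4, 3)]
```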

6. Multidimensional analysis

select col1, col2, col3, count(1), Grouping__ID
from tmp
group by col1, col2, col3
grouping sets (col1, col2, col3, (col1, col2), (col1, col3), (col2, col3), ());
-- with cube covers all 8 combinations, including (col1, col2, col3)
select col1, col2, col3, count(1), Grouping__ID
from tmp
group by col1, col2, col3
with cube;
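Here is a Python model of GROUPING SETS (hypothetical names). Each grouping set is an ordinary GROUP BY whose missing columns are shown as NULL, and the results are unioned. WITH CUBE is the special case of all 2^3 = 8 subsets. Grouping__ID tells rolled-up NULLs apart from NULLs in the data. The model assumes the SQL-standard encoding: one bit per column, col1 most significant, with the bit set when the column is rolled up. Older Hive releases computed Grouping__ID differently, so check the values your version produces.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

COLUMNS = ("col1", "col2", "col3")

# WITH CUBE = every subset of the GROUP BY columns: 2^3 = 8 grouping sets.
CUBE = [s for k in range(len(COLUMNS) + 1) for s in combinations(COLUMNS, k)]


def grouping_id(grouping_set: Sequence[str]) -> int:
    """One bit per column, col1 most significant; bit = 1 when the column is rolled up.

    Assumption: SQL-standard encoding; older Hive releases used a different one.
    """
    gid = 0
    for col in COLUMNS:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


def grouping_sets(
    rows: List[Dict[str, str]], sets: List[Tuple[str, ...]]
) -> List[Tuple[Optional[str], Optional[str], Optional[str], int, int]]:
    """Model of GROUP BY col1, col2, col3 GROUPING SETS (...) with count(1) and Grouping__ID."""
    out = []
    for gs in sets:
        # Columns outside the current grouping set are replaced by NULL (None).
        counts = Counter(tuple(r[c] if c in gs else None for c in COLUMNS) for r in rows)
        for key, cnt in counts.items():
            out.append((key[0], key[1], key[2], cnt, grouping_id(gs)))
    return out
```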

7. Data skew

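GROUP BY skew:

When a few keys account for most of the rows, the reducers handling those keys become the bottleneck. Add a random salt rd to every row and aggregate by (rd, label) first, so a hot label is split across many groups. Then aggregate the partial results by label alone: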

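select label, sum(cnt) as total from
(
  -- Stage 1: partial counts per (random salt, label)
  select rd, label, sum(1) as cnt
  from (select id, label, round(rand(), 2) as rd, value from tmp1) as tmp
  group by rd, label
) as tmp
group by label;  -- Stage 2: merge the partial counts per label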
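The two-stage idea can be modeled in Python (hypothetical function name; a fixed seed stands in for rand() so the run is reproducible). Stage 1 splits the hot label across up to `buckets` groups, so no single group has to count all of its rows. Stage 2 merges at most `buckets` partial counts per label. With 101 buckets, it roughly mirrors `round(rand(), 2)`, which yields values from 0.00 to 1.00. The trick works for aggregates whose partial results can be merged, such as count, sum and max. It does not work directly for count(distinct ...).

```python
import random
from collections import Counter
from typing import List, Tuple


def salted_count(labels: List[str], buckets: int, seed: int = 0) -> Tuple[Counter, int]:
    """Two-stage count of rows per label with a random salt rd in [0, buckets).

    Returns (final counts per label, size of the largest stage-1 group).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    # Stage 1: GROUP BY rd, label; a hot label is split over up to `buckets` groups.
    partial = Counter((rng.randrange(buckets), label) for label in labels)
    # Stage 2: GROUP BY label; merge the partial counts back together.
    total: Counter = Counter()
    for (_, label), cnt in partial.items():
        total[label] += cnt
    return total, max(partial.values(), default=0)


total, largest = salted_count(["hot"] * 10_000 + ["cold"] * 10, buckets=101)
print(total)  # Counter({'hot': 10000, 'cold': 10})
```

The second return value, the size of the largest stage-1 group, ends up far below 10,000 here, because the hot label's rows are spread over many groups.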

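JOIN skew:

Suppose one label dominates the join key of the large table tmp1. Salt tmp1 with a random rd, replicate each row of tmp2 once for every possible rd value, and join on (rd, label). Each tmp1 row still meets exactly one copy of every matching tmp2 row, but a hot label is now spread over several join keys. The result is then aggregated by (rd, label) and finally by label: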

select label, sum(cnt) as total from
(
  select rd, label, sum(value) as cnt from
  (
    select tmp1.rd as rd, tmp1.label as label, tmp1.value * tmp2.value as value
    -- skewed side: random salt 0..9 (round(rand(), 1) could return 1.0, which has no copy below)
    from (select id, cast(floor(rand() * 10) as int) as rd, label, value from tmp1) as tmp1
    join
    -- other side: one copy of every row per salt value 0..9
    (select id, cast(salt as int) as rd, label, value
     from tmp2 lateral view explode(split('0,1,2,3,4,5,6,7,8,9', ',')) mytable as salt) as tmp2
    on tmp1.rd = tmp2.rd and tmp1.label = tmp2.label
  ) as tmp1
  group by rd, label
) as tmp1
group by label;
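A Python model (hypothetical names, fixed seed) checks the key property of the salted join. Every row of the smaller side is copied once per salt value, so each row of the skewed side meets exactly one copy of each matching row, and the result equals that of a plain join. The model also shows why the salt ranges must match exactly. A skewed-side row whose salt has no replicated copy silently drops out of the inner join. That is what happens to `round(rand(), 1)` values of 1.0 against the list 0.0 to 0.9.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # hypothetical (label, value) row


def plain_join_sum(big: List[Row], small: List[Row]) -> Dict[str, float]:
    """Reference: sum(big.value * small.value) per label over an inner join on label."""
    small_by_label: Dict[str, List[float]] = defaultdict(list)
    for label, w in small:
        small_by_label[label].append(w)
    out: Dict[str, float] = defaultdict(float)
    for label, v in big:
        for w in small_by_label[label]:
            out[label] += v * w
    return dict(out)


def salted_join_sum(big: List[Row], small: List[Row], salts: int = 10, seed: int = 0) -> Dict[str, float]:
    """Same result computed the salted way: join on (rd, label), then aggregate twice."""
    rng = random.Random(seed)
    # Skewed side: attach one random salt rd in [0, salts) to every row.
    salted_big = [(rng.randrange(salts), label, v) for label, v in big]
    # Other side: replicate every row once per possible salt value.
    replicated: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for label, w in small:
        for rd in range(salts):
            replicated[(rd, label)].append(w)
    # Join on (rd, label) and GROUP BY rd, label.
    partial: Dict[Tuple[int, str], float] = defaultdict(float)
    for rd, label, v in salted_big:
        for w in replicated[(rd, label)]:
            partial[(rd, label)] += v * w
    # GROUP BY label: merge the per-salt partial sums.
    out: Dict[str, float] = defaultdict(float)
    for (_, label), x in partial.items():
        out[label] += x
    return dict(out)
```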
