
Spark SQL: given a date, compute its week of the year when January 1 always starts week 1

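1. Problem

January 1 of each year always belongs to the first calendar week of that year.
(Year boundaries are ignored. With Monday as the first day of the week, if January 1 is a Wednesday, then January 1 through January 5 (Sunday) is the first calendar week of the year.)
Given a date, how do we compute its week of the year in Spark SQL?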

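2. Analysis

Difficulties:

  1. Spark SQL's DAYOFWEEK function treats Sunday as the first day of the week (Sunday = 1, ..., Saturday = 7).
  2. Handling the boundaries: deciding which days belong to week 1 and on which day week 2 starts.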

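The corresponding pseudocode:

// Remap Spark's DAYOFWEEK (Sunday = 1 ... Saturday = 7) to Monday = 1 ... Sunday = 7
int day_of_week(int day) {
    if (day == 1) {
        return 7;
    } else {
        return day - 1;
    }
}

dayofyear = DAYOFYEAR(your_date_column)
// first_day_of_year_week_number = day_of_week(DAYOFWEEK(January 1 of the same year))
if (dayofyear <= 8 - first_day_of_year_week_number) {
    return 1;
} else {
    return ceil((dayofyear - day_of_week(DAYOFWEEK(your_date_column))) / 7.0) + 1;
}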
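The rule described in this section (week 1 runs from January 1 to the first Sunday, and every later week starts on a Monday) can be sketched in plain Python to make the arithmetic easier to follow. This is only an illustration and not part of the original article: the helper names `spark_dayofweek`, `monday_based` and `week_number` are hypothetical, and `spark_dayofweek` merely imitates the Sunday = 1 numbering of Spark's DAYOFWEEK using the standard library.

```python
import math
from datetime import date


def spark_dayofweek(d: date) -> int:
    """Imitate Spark SQL DAYOFWEEK: Sunday = 1, Monday = 2, ..., Saturday = 7."""
    return d.isoweekday() % 7 + 1


def monday_based(dow: int) -> int:
    """Remap a Sunday-first weekday (1..7) to Monday = 1, ..., Sunday = 7."""
    return 7 if dow == 1 else dow - 1


def week_number(d: date) -> int:
    """Week of the year where January 1 always falls in week 1."""
    day_of_year = d.timetuple().tm_yday
    day_of_week = monday_based(spark_dayofweek(d))
    first_dow = monday_based(spark_dayofweek(date(d.year, 1, 1)))
    # Week 1 ends on the first Sunday, which is day (8 - first_dow) of the year.
    if day_of_year <= 8 - first_dow:
        return 1
    # (day_of_year - day_of_week) is the day-of-year of the previous Sunday.
    return math.ceil((day_of_year - day_of_week) / 7) + 1


if __name__ == "__main__":
    for d in (date(2023, 1, 1), date(2023, 1, 2), date(2023, 12, 31), date(2024, 1, 8)):
        print(d, week_number(d))
```

For the four sample dates the function returns 1, 2, 53 and 2, which agrees with the Spark SQL output listed in the verification section.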

First, the key SQL logic:

-- Inner query: Monday-based weekday of the current date (Monday = 1 ... Sunday = 7)
CASE WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
     ELSE DAYOFWEEK(your_date_column) - 1
END AS day_of_week,
-- Inner query: Monday-based weekday of January 1 of the same year
CASE WHEN DAYOFWEEK(to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
     ELSE DAYOFWEEK(to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd')) - 1
END AS first_day_of_year_week_number,
to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd') AS first_day_of_year,

-- Outer query (the expressions above belong to the inner query)
CASE WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
     ELSE CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0) + 1
END AS week_number,

Test it against plenty of boundary values.
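One way to cover many boundary values at once is a brute-force comparison outside Spark. The Python sketch below is an assumption-laden illustration rather than the author's test: it re-implements the CASE/CEIL formula with the standard library (`week_by_formula`, a hypothetical name) and compares it, day by day, with a direct count that starts at week 1 on January 1 and increments on every later Monday.

```python
import math
from datetime import date, timedelta


def week_by_formula(d: date) -> int:
    """The article's CASE/CEIL formula, with Monday = 1 ... Sunday = 7."""
    day_of_year = d.timetuple().tm_yday
    day_of_week = d.isoweekday()
    first_dow = date(d.year, 1, 1).isoweekday()
    if day_of_year <= 8 - first_dow:
        return 1
    return math.ceil((day_of_year - day_of_week) / 7) + 1


def mismatches(start_year: int, end_year: int) -> list:
    """Return every date whose formula result differs from a direct count."""
    bad = []
    for year in range(start_year, end_year + 1):
        d, week = date(year, 1, 1), 1
        while d.year == year:
            # A new week starts on each Monday, except when January 1 is itself a Monday.
            if d.isoweekday() == 1 and d != date(year, 1, 1):
                week += 1
            if week_by_formula(d) != week:
                bad.append(d)
            d += timedelta(days=1)
    return bad


if __name__ == "__main__":
    # 28 consecutive years cover every weekday/leap-year combination for January 1.
    print(mismatches(2001, 2028))
```

An empty list means the formula and the direct count agree on every day in the range. The check only validates the arithmetic; it does not exercise Spark's own DAYOFWEEK or DAYOFYEAR implementations.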

DAYOFWEEK(your_date_column) returns the following values:

Sun		Mon		Tue		Wed		Thu		Fri		Sat
1		2		3		4		5		6		7

To make Monday the first day of the week, the values have to be shifted:

int day_of_week(int day) {
    if (day == 1) {
        return 7;
    } else {
        return day - 1;
    }
}

Mapping after the adjustment:

Mon		Tue		Wed		Thu		Fri		Sat		Sun
1		2		3		4		5		6		7

SQL logic:

CASE WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
     ELSE DAYOFWEEK(your_date_column) - 1
END AS day_of_week,
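Since the remapping is just a rotation of the values 1..7 by one position, it can also be expressed with modular arithmetic. The short Python sketch below (hypothetical function names, not from the original article) checks that the CASE-style branch and the expression `(dow + 5) % 7 + 1` agree for every possible DAYOFWEEK value. Whether the modular form reads better than the CASE expression is a matter of taste.

```python
def remap_case(dow: int) -> int:
    """CASE-style remap: Sunday-first 1..7 -> Monday = 1 ... Sunday = 7."""
    return 7 if dow == 1 else dow - 1


def remap_mod(dow: int) -> int:
    """Same remap written as a rotation with modular arithmetic."""
    return (dow + 5) % 7 + 1


if __name__ == "__main__":
    # Input 1..7 is Sunday..Saturday in DAYOFWEEK numbering.
    for dow in range(1, 8):
        print(dow, remap_case(dow), remap_mod(dow))
```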

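2023-01-01 is a Sunday,
so DAYOFWEEK(your_date_column) returns 1, the first day of a Sunday-first week.
WEEKOFYEAR(your_date_column) returns 52, that is, the last week of 2022.
The result we actually want is week 1 of 2023.

2023-01-02 is a Monday,
so DAYOFWEEK(your_date_column) returns 2, the second day of a Sunday-first week.
WEEKOFYEAR(your_date_column) returns 1, that is, the first week of 2023.
The result we actually want is week 2 of 2023.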
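The gap between the two numbering schemes can be reproduced without Spark. The Python sketch below assumes that the 52 and 1 quoted above are ISO 8601 week numbers, which is what `date.isocalendar()` returns. For the custom rule it uses a closed form, `(day_of_year + first_dow - 2) // 7 + 1`, which is my own shorthand rather than the article's expression. `iso_week` and `jan1_week` are hypothetical names.

```python
from datetime import date


def iso_week(d: date) -> int:
    """ISO 8601 week number (weeks start on Monday; week 1 contains the first Thursday)."""
    return d.isocalendar()[1]


def jan1_week(d: date) -> int:
    """Week number where January 1 always starts week 1 and later weeks start on Monday."""
    day_of_year = d.timetuple().tm_yday
    first_dow = date(d.year, 1, 1).isoweekday()  # Monday = 1 ... Sunday = 7
    # Shift day_of_year so that the Monday on or before January 1 counts as offset 0.
    return (day_of_year + first_dow - 2) // 7 + 1


if __name__ == "__main__":
    for d in (date(2023, 1, 1), date(2023, 1, 2)):
        print(d, "ISO week:", iso_week(d), "Jan-1-based week:", jan1_week(d))
```

For 2023-01-01 this gives ISO week 52 against week 1 under the custom rule, and for 2023-01-02 it gives ISO week 1 against week 2, which is exactly the discrepancy described above.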


3. Verification


DROP TABLE IF EXISTS your_table;

CREATE TABLE your_table (
  id INT,
  your_date_column DATE
);

CREATE OR REPLACE TEMPORARY VIEW temp_view AS
SELECT 1 as id, to_date('2023-01-01', 'yyyy-MM-dd') as your_date_column
UNION ALL SELECT 2, to_date('2023-01-02', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-03', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-04', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-05', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-06', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-07', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-08', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-09', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-10', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-11', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-12', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-13', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-14', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-15', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-16', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-17', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-18', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-19', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-20', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-21', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-22', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-23', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-24', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-25', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-26', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-27', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-28', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-29', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-30', 'yyyy-MM-dd')
UNION ALL SELECT 2, to_date('2023-01-31', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-01', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-02', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-03', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-04', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-05', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-06', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-07', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-08', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-09', 'yyyy-MM-dd')
UNION ALL SELECT 3, to_date('2023-02-15', 'yyyy-MM-dd')
UNION ALL SELECT 4, to_date('2023-12-31', 'yyyy-MM-dd')
UNION ALL SELECT 5, to_date('2024-01-01', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-02', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-03', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-04', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-05', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-06', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-07', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-08', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-09', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-10', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-11', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-12', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-13', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-14', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-15', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-16', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-17', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-18', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-19', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-20', 'yyyy-MM-dd')
UNION ALL SELECT 6, to_date('2024-01-21', 'yyyy-MM-dd')
;

INSERT INTO your_table
SELECT * FROM temp_view;

SELECT
  your_date_column,
  DAYOFYEAR(your_date_column),
  8 - first_day_of_year_week_number,
  (DAYOFYEAR(your_date_column) - day_of_week),
  (DAYOFYEAR(your_date_column) - day_of_week) / 7.0,
  CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0),
  CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0) + 1,
  CASE WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
       ELSE CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0) + 1
  END AS week_number,  -- the result we want
  *
FROM (
  SELECT
    '|',
    your_date_column,
    DAYOFWEEK(your_date_column),
    DAYOFYEAR(your_date_column),
    CASE WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
         ELSE DAYOFWEEK(your_date_column) - 1
    END AS day_of_week,
    CASE WHEN DAYOFWEEK(to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
         ELSE DAYOFWEEK(to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd')) - 1
    END AS first_day_of_year_week_number,  -- weekday of January 1: Monday = 1, Sunday = 7
    to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd') AS first_day_of_year,  -- date of January 1
    date_format(your_date_column, 'EEEE') AS WEEK
  FROM your_table
);

Output (columns follow the SELECT order; the '|' column separates the outer expressions from the inner query's columns):
2023-01-01	1	1	-6	-0.857143	0	1	1	|	2023-01-01	1	1	7	7	2023-01-01	Sunday
2023-01-02	2	1	1	0.142857	1	2	2	|	2023-01-02	2	2	1	7	2023-01-01	Monday
2023-01-03	3	1	1	0.142857	1	2	2	|	2023-01-03	3	3	2	7	2023-01-01	Tuesday
2023-01-04	4	1	1	0.142857	1	2	2	|	2023-01-04	4	4	3	7	2023-01-01	Wednesday
2023-01-05	5	1	1	0.142857	1	2	2	|	2023-01-05	5	5	4	7	2023-01-01	Thursday
2023-01-06	6	1	1	0.142857	1	2	2	|	2023-01-06	6	6	5	7	2023-01-01	Friday
2023-01-07	7	1	1	0.142857	1	2	2	|	2023-01-07	7	7	6	7	2023-01-01	Saturday
2023-01-08	8	1	1	0.142857	1	2	2	|	2023-01-08	1	8	7	7	2023-01-01	Sunday
2023-01-09	9	1	8	1.142857	2	3	3	|	2023-01-09	2	9	1	7	2023-01-01	Monday
2023-01-10	10	1	8	1.142857	2	3	3	|	2023-01-10	3	10	2	7	2023-01-01	Tuesday
2023-01-11	11	1	8	1.142857	2	3	3	|	2023-01-11	4	11	3	7	2023-01-01	Wednesday
2023-01-12	12	1	8	1.142857	2	3	3	|	2023-01-12	5	12	4	7	2023-01-01	Thursday
2023-01-13	13	1	8	1.142857	2	3	3	|	2023-01-13	6	13	5	7	2023-01-01	Friday
2023-01-14	14	1	8	1.142857	2	3	3	|	2023-01-14	7	14	6	7	2023-01-01	Saturday
2023-01-15	15	1	8	1.142857	2	3	3	|	2023-01-15	1	15	7	7	2023-01-01	Sunday
2023-01-16	16	1	15	2.142857	3	4	4	|	2023-01-16	2	16	1	7	2023-01-01	Monday
2023-01-17	17	1	15	2.142857	3	4	4	|	2023-01-17	3	17	2	7	2023-01-01	Tuesday
2023-01-18	18	1	15	2.142857	3	4	4	|	2023-01-18	4	18	3	7	2023-01-01	Wednesday
2023-01-19	19	1	15	2.142857	3	4	4	|	2023-01-19	5	19	4	7	2023-01-01	Thursday
2023-01-20	20	1	15	2.142857	3	4	4	|	2023-01-20	6	20	5	7	2023-01-01	Friday
2023-01-21	21	1	15	2.142857	3	4	4	|	2023-01-21	7	21	6	7	2023-01-01	Saturday
2023-01-22	22	1	15	2.142857	3	4	4	|	2023-01-22	1	22	7	7	2023-01-01	Sunday
2023-01-23	23	1	22	3.142857	4	5	5	|	2023-01-23	2	23	1	7	2023-01-01	Monday
2023-01-24	24	1	22	3.142857	4	5	5	|	2023-01-24	3	24	2	7	2023-01-01	Tuesday
2023-01-25	25	1	22	3.142857	4	5	5	|	2023-01-25	4	25	3	7	2023-01-01	Wednesday
2023-01-26	26	1	22	3.142857	4	5	5	|	2023-01-26	5	26	4	7	2023-01-01	Thursday
2023-01-27	27	1	22	3.142857	4	5	5	|	2023-01-27	6	27	5	7	2023-01-01	Friday
2023-01-28	28	1	22	3.142857	4	5	5	|	2023-01-28	7	28	6	7	2023-01-01	Saturday
2023-01-29	29	1	22	3.142857	4	5	5	|	2023-01-29	1	29	7	7	2023-01-01	Sunday
2023-01-30	30	1	29	4.142857	5	6	6	|	2023-01-30	2	30	1	7	2023-01-01	Monday
2023-01-31	31	1	29	4.142857	5	6	6	|	2023-01-31	3	31	2	7	2023-01-01	Tuesday
2023-02-01	32	1	29	4.142857	5	6	6	|	2023-02-01	4	32	3	7	2023-01-01	Wednesday
2023-02-02	33	1	29	4.142857	5	6	6	|	2023-02-02	5	33	4	7	2023-01-01	Thursday
2023-02-03	34	1	29	4.142857	5	6	6	|	2023-02-03	6	34	5	7	2023-01-01	Friday
2023-02-04	35	1	29	4.142857	5	6	6	|	2023-02-04	7	35	6	7	2023-01-01	Saturday
2023-02-05	36	1	29	4.142857	5	6	6	|	2023-02-05	1	36	7	7	2023-01-01	Sunday
2023-02-06	37	1	36	5.142857	6	7	7	|	2023-02-06	2	37	1	7	2023-01-01	Monday
2023-02-07	38	1	36	5.142857	6	7	7	|	2023-02-07	3	38	2	7	2023-01-01	Tuesday
2023-02-08	39	1	36	5.142857	6	7	7	|	2023-02-08	4	39	3	7	2023-01-01	Wednesday
2023-02-09	40	1	36	5.142857	6	7	7	|	2023-02-09	5	40	4	7	2023-01-01	Thursday
2023-02-15	46	1	43	6.142857	7	8	8	|	2023-02-15	4	46	3	7	2023-01-01	Wednesday
2023-12-31	365	1	358	51.142857	52	53	53	|	2023-12-31	1	365	7	7	2023-01-01	Sunday
2024-01-01	1	7	0	0.000000	0	1	1	|	2024-01-01	2	1	1	1	2024-01-01	Monday
2024-01-02	2	7	0	0.000000	0	1	1	|	2024-01-02	3	2	2	1	2024-01-01	Tuesday
2024-01-03	3	7	0	0.000000	0	1	1	|	2024-01-03	4	3	3	1	2024-01-01	Wednesday
2024-01-04	4	7	0	0.000000	0	1	1	|	2024-01-04	5	4	4	1	2024-01-01	Thursday
2024-01-05	5	7	0	0.000000	0	1	1	|	2024-01-05	6	5	5	1	2024-01-01	Friday
2024-01-06	6	7	0	0.000000	0	1	1	|	2024-01-06	7	6	6	1	2024-01-01	Saturday
2024-01-07	7	7	0	0.000000	0	1	1	|	2024-01-07	1	7	7	1	2024-01-01	Sunday
2024-01-08	8	7	7	1.000000	1	2	2	|	2024-01-08	2	8	1	1	2024-01-01	Monday
2024-01-09	9	7	7	1.000000	1	2	2	|	2024-01-09	3	9	2	1	2024-01-01	Tuesday
2024-01-10	10	7	7	1.000000	1	2	2	|	2024-01-10	4	10	3	1	2024-01-01	Wednesday
2024-01-11	11	7	7	1.000000	1	2	2	|	2024-01-11	5	11	4	1	2024-01-01	Thursday
2024-01-12	12	7	7	1.000000	1	2	2	|	2024-01-12	6	12	5	1	2024-01-01	Friday
2024-01-13	13	7	7	1.000000	1	2	2	|	2024-01-13	7	13	6	1	2024-01-01	Saturday
2024-01-14	14	7	7	1.000000	1	2	2	|	2024-01-14	1	14	7	1	2024-01-01	Sunday
2024-01-15	15	7	14	2.000000	2	3	3	|	2024-01-15	2	15	1	1	2024-01-01	Monday
2024-01-16	16	7	14	2.000000	2	3	3	|	2024-01-16	3	16	2	1	2024-01-01	Tuesday
2024-01-17	17	7	14	2.000000	2	3	3	|	2024-01-17	4	17	3	1	2024-01-01	Wednesday
2024-01-18	18	7	14	2.000000	2	3	3	|	2024-01-18	5	18	4	1	2024-01-01	Thursday
2024-01-19	19	7	14	2.000000	2	3	3	|	2024-01-19	6	19	5	1	2024-01-01	Friday
2024-01-20	20	7	14	2.000000	2	3	3	|	2024-01-20	7	20	6	1	2024-01-01	Saturday
2024-01-21	21	7	14	2.000000	2	3	3	|	2024-01-21	1	21	7	1	2024-01-01	Sunday
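Time taken: 8.512 seconds, Fetched 63 row(s)

In this query:
The second argument 'EEEE' of date_format requests the full weekday name (Monday, Tuesday, and so on).
DAYOFYEAR(your_date_column) returns the day of the year.
DAYOFWEEK(your_date_column) returns the day of the week, with Sunday as the first day of the week.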

-- Compute the result directly: the cleaned-up SQL
SELECT
  your_date_column,
  CASE WHEN DAYOFYEAR(your_date_column) <= 8 - first_day_of_year_week_number THEN 1
       ELSE CEIL((DAYOFYEAR(your_date_column) - day_of_week) / 7.0) + 1
  END AS week_number
FROM (
  SELECT
    your_date_column,
    CASE WHEN DAYOFWEEK(your_date_column) = 1 THEN 7
         ELSE DAYOFWEEK(your_date_column) - 1
    END AS day_of_week,
    CASE WHEN DAYOFWEEK(to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd')) = 1 THEN 7
         ELSE DAYOFWEEK(to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd')) - 1
    END AS first_day_of_year_week_number,
    to_date(CONCAT(CAST(YEAR(your_date_column) AS string), '-01-01'), 'yyyy-MM-dd') AS first_day_of_year,
    date_format(your_date_column, 'EEEE') AS WEEK
  FROM your_table
);

2023-01-01	1
2023-01-02	2
2023-01-03	2
2023-01-04	2
2023-01-05	2
2023-01-06	2
2023-01-07	2
2023-01-08	2
2023-01-09	3
2023-01-10	3
2023-01-11	3
2023-01-12	3
2023-01-13	3
2023-01-14	3
2023-01-15	3
2023-01-16	4
2023-01-17	4
2023-01-18	4
2023-01-19	4
2023-01-20	4
2023-01-21	4
2023-01-22	4
2023-01-23	5
2023-01-24	5
2023-01-25	5
2023-01-26	5
2023-01-27	5
2023-01-28	5
2023-01-29	5
2023-01-30	6
2023-01-31	6
2023-02-01	6
2023-02-02	6
2023-02-03	6
2023-02-04	6
2023-02-05	6
2023-02-06	7
2023-02-07	7
2023-02-08	7
2023-02-09	7
2023-02-15	8
2023-12-31	53
2024-01-01	1
2024-01-02	1
2024-01-03	1
2024-01-04	1
2024-01-05	1
2024-01-06	1
2024-01-07	1
2024-01-08	2
2024-01-09	2
2024-01-10	2
2024-01-11	2
2024-01-12	2
2024-01-13	2
2024-01-14	2
2024-01-15	3
2024-01-16	3
2024-01-17	3
2024-01-18	3
2024-01-19	3
2024-01-20	3
2024-01-21	3
Time taken: 0.493 seconds, Fetched 63 row(s)