spark日期时间处理
spark中常用的日期时间函数格式转换
一、相关参考
官网参考链接:Spark SQL, Built-in Functions
时间日期格式符号: Datetime patterns - Spark 3.3.1 Documentation
标准日期格式符号: yyyy-MM-dd HH:mm:ss、yyyy-M-ddTHH:mm:ss
二、时区影响
- 时间相关的函数,大多会受时区的影响;
- 不特殊指定时区,会默认使用服务器的时区;
- spark中处理时间时,最好指定session级时区,并且最好指定为0时区,便于进行时区偏移;
set spark.sql.session.timeZone=GMT+8;
set spark.sql.session.timeZone=UTC;
三、时间函数
1. 受时区影响
1. current_date获取当前日期
select current_date() as cur_date;
2018-04-09
2. current_timestamp/now()获取当前时间
select current_timestamp() as cur_time;
2018-04-09 15:20:49.247
3. unix_timestamp/to_unix_timestamp从当前时区转换到unix时间
这两个函数从官方文档看是相同的;
从指定格式的时间转换到Unix时间
unix_timestamp([timeExp[, fmt]])
Returns the UNIX timestamp of current or specified time.
Arguments:
- timeExp - A date/timestamp or string. If not provided, this defaults to current time.
- fmt - Date/time format pattern to follow. Ignored if
timeExp
is not a string. Default value is "yyyy-MM-dd HH:mm:ss". See Datetime Patterns for valid date and time format patterns.Examples:
> SELECT unix_timestamp(); 1476884637 > SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd'); 1460041200
to_unix_timestamp(timeExp[, fmt])
Returns the UNIX timestamp of the given time.
Arguments:
- timeExp - A date/timestamp or string which is returned as a UNIX timestamp.
- fmt - Date/time format pattern to follow. Ignored if
timeExp
is not a string. Default value is "yyyy-MM-dd HH:mm:ss". See Datetime Patterns for valid date and time format patterns.Examples:
> SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd'); 1460098800
4. from_unixtime从unix时间转换到当前时区的时间戳
注意: 第一个参数是10位的时间戳
from_unixtime(unix_time[, fmt])
Returns
unix_time
in the specifiedfmt
.Arguments:
- unix_time - UNIX Timestamp to be converted to the provided format.
- fmt - Date/time format pattern to follow. See Datetime Patterns for valid date and time format patterns. The 'yyyy-MM-dd HH:mm:ss' pattern is used if omitted.
Examples:
> SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss'); 1970-01-01 00:00:00 > SELECT from_unixtime(0); 1970-01-01 00:00:00
5. cast从unix时间强转到时间戳(转到设定的时区)
select cast(1614815429 as timestamp);
2021-03-04 07:50:29
6. date_format日期格式转换
第一个参数必须是标准格式的时间
date_format(timestamp, fmt)
Converts
timestamp
to a value of string in the format specified by the date formatfmt
.Arguments:
- timestamp - A date/timestamp or string to be converted to the given format.
- fmt - Date/time format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
> SELECT date_format('2016-04-08', 'y'); 2016
1) 不受时区影响
SELECT date_format('2016-04-08 18', 'yyyy-MM-dd');
2016-04-09
2) 这种特定时间戳格式受时区影响
select date_format('2021-06-24T22:04:50+00:00', 'yyyy-MM-dd');
2021-06-25
2. 不受时区影响
1. 返回当前时间的unix时间戳
SELECT unix_timestamp();
1614237507
2. to_utc_timestamp时间戳从指定的时区转换到0时区
to_utc_timestamp(timestamp, timezone)
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
Examples:
> SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul'); 2016-08-30 15:00:00
3. from_utc_timestamp从0时区转换到指定时区
from_utc_timestamp(timestamp, timezone)
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1' would yield '2017-07-14 03:40:00.0'.
Examples:
> SELECT from_utc_timestamp('2017-07-14 02:40:00.0', 'GMT+1'); 2017-07-14 03:40:00
4. to_date时间从指定格式转换到标准格式日期yyyy-MM-dd
to_date(date_str[, fmt])
Parses the
date_str
expression with thefmt
expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if thefmt
is omitted.Arguments:
- date_str - A string to be parsed to date.
- fmt - Date format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
> SELECT to_date('2009-07-30 04:17:52'); 2009-07-30 > SELECT to_date('2016-12-31', 'yyyy-MM-dd'); 2016-12-31 > select to_date('20161231', 'yyyyMMdd'); 2016-12-31
5. 日期的加减(最终转成日期格式)
SELECT date_add('2016-07-30 22:12:30', 1);
2016-07-31
6. to_timestamp从指定时间格式转换到标准格式
to_timestamp(timestamp_str[, fmt])
Parses the
timestamp_str
expression with thefmt
expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if thefmt
is omitted. The result data type is consistent with the value of configurationspark.sql.timestampType
.Arguments:
- timestamp_str - A string to be parsed to timestamp.
- fmt - Timestamp format pattern to follow. See Datetime Patterns for valid date and time format patterns.
Examples:
> select to_timestamp('2023/5/14 2:11:00', 'yyyy/M/d h:mm:ss'); 2023-05-14 02:11:00 > select to_timestamp('20200726', 'yyyyMMdd'); 2020-07-26 00:00:00
3. 客户端显示影响
前端显示date、time类型的时候,有可能收时区的影响,为了避免不必要的麻烦可存成字符串;
select
typeof( from_utc_timestamp(cast(cast(1697992306453 as bigint)/1000 as timestamp), 'GMT+00:00') ) as time_ori -- timestamp类型, 客户端展示时可能转到本地时区
,typeof( date_format(from_utc_timestamp(cast(cast(1697992306453 as bigint)/1000 as timestamp),'GMT+00:00'), 'yyyy-mm-dd hh:mm:ss') ) as time_utc -- 字符串类型, 不受时区影响
;
time_ori time_utc
timestamp string
四、获取时间差
1. 获取某个时段数据
限定时间最好要闭合, 否则回溯历史数据会出问题.
-- 1. sql 函数推算(日期必须是标准格式:yyyy-MM-dd)
select
*
from
tbl_name
where
dt >= date_format(date_sub('2020-07-26', 2),'yyyyMMdd')
and dt <= date_format('2020-07-26','yyyyMMdd')
;
-- 2. 调度直接传入
select
*
from
tbl_name
where
dt >= '20200726'
and dt <= '20200727'
;
2. 获取时间差
-- 指定时间格式,获取两个时间相差的秒数
select unix_timestamp('2023/5/14 9:09:05', 'yyyy/M/d h:mm:ss') - unix_timestamp('2023/5/14 9:09:00', 'yyyy/M/d h:mm:ss') as diff_seconds;
五、格式转换
指定时区:set spark.sql.session.timeZone=UTC;
不受时区影响函数:to_timestamp、
时区影响函数:from_unixtime、from_utc_timestamp
-- 1 从指定时间格式转换到标准格式
select to_timestamp('2023/5/14 2:11:00', 'yyyy/M/d h:mm:ss');
2023-05-14 02:11:00
select to_timestamp('20200726', 'yyyyMMdd');
2020-07-26 00:00:00
-- 2 从标准格式时间(yyyy-MM-dd 和 yyyy-MM-dd HH:mm:ss.SSSS)转到指定格式时间
select date_format('2016-04-08', 'yyyyMMdd');
20160408
-- 3 yyyyMMdd--->yyyy-MM-dd 时间格式转换
select to_date('20161231', 'yyyyMMdd');
2016-12-31
select concat_ws('-',substr('20200726',1,4),substr('20200726',5,2),substr('20200726',7,2));
2020-07-26
select date_format(to_timestamp('20200726', 'yyyyMMdd'), 'yyyy-MM-dd');
2020-07-26
-- 4 从unix时间戳(10位)转换到当前时区时间戳,从当前时区转换到指定时区
-- session时区是0时区,转换到东八区
> set spark.sql.session.timeZone=UTC;
> select from_utc_timestamp(from_unixtime(0),'GMT+8') as dt;
1970-01-01 08:00:00
日期格式符 - 格式符的数量影响显示格式,具体要什么样式可以实际测试;
Datetime patterns - Spark 3.5.1 Documentation
-- 季度的格式符号
> select date_format(date '2024-05-01', "q") as time;
2
> select date_format(date '2024-05-01', "qq") as time;
02
> select date_format(date '2024-05-01', "Q") as time;
2
> select date_format(date '2024-05-01', "QQ") as time;
02
> select date_format(date '2024-05-01', "QQQ") as time;
Q2
> select date_format(date '2024-05-01', "QQQQ") as time;
2nd quarter
-- 月份的格式符号
> select date_format(date '1970-01-01', "M");
1
> select date_format(date '1970-09-01', "MM");
09
> select date_format(date '1970-01-01', "MMM");
Jan
> select date_format(date '1970-01-01', "MMMM");
January
六、获取特定日期
-- 1 获取周一日期
SELECT date_format(date_trunc('WEEK', '2020-09-15'),'yyyy-MM-dd');
2020-09-14
-- 2 获取季初日期
SELECT date_format(date_trunc('QUARTER', '2020-09-15'),'yyyy-MM-dd');
2020-07-01
-- 3 获取年初日期
SELECT trunc('2020-09-15', 'YEAR');
2020-01-01
SELECT date_format(date_trunc('YEAR', '2020-09-15'),'yyyy-MM-dd');
2020-01-01
-- 4 获取月初日期
SELECT trunc('2020-09-15', 'MM');
2020-09-01
SELECT date_format(date_trunc('MM', '2020-09-15'),'yyyy-MM-dd');
2020-09-01
SELECT concat(substring('2020-09-15',1,8),'01');
2020-09-01
-- 5 获取月末日期
select last_day('2009-01-12');
2009-01-31
-- 6 获取昨天的日期
select date_format(date_sub(current_date, 1),'yyyMMdd');
-- 7 yyyyMMdd格式日期增加1天
select date_format( date_add( to_timestamp('20240410', 'yyyyMMdd'),1),'yyyyMMdd') as st;
开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!
更多推荐
所有评论(0)