Hive--开窗函数--窗口位置函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE
窗口函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE总结FIRST_VALUE功能:取每个分区内某列的第一个值语法:FIRST_VALUE(col) over (partition by col1 order by col2)LAST_VALUE功能:取每个分区内某列的最后一个值语法:LAST_VALUE(col) over (partition by col1 order b
·
窗口函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE
总结
-
不指定窗口时,不排序默认第一行到最后一行,排序默认第一行到当前行
-
指定窗口时 --rows between 起始位置 and 结束位置
- N preceding:往前多少行
- N following:往后多少行
- current row:当前行
- unbounded:起点或者终点,没有边界
- unbounded preceding 表示从前面的起点
- unbounded following:表示到后面的终点
-
FIRST_VALUE
- 功能:取每个分区内某列的第一个值
- 语法:FIRST_VALUE(col,true/false) over (partition by col1 order by col2)
- 第二个参数为true,跳过空值(默认为false)
-
LAST_VALUE
- 功能:取每个分区内某列的最后一个值
- 语法:LAST_VALUE(col,true/false) over (partition by col1 order by col2)
- 第二个参数为true,跳过空值(默认为false)
-
LAG
- 功能:取每个分区内某列的
前面
的第N个值
- 语法:LAG(col,N,defaultValue) over (partition by col1 order by col2)
- 功能:取每个分区内某列的
-
LEAD
- 功能:取每个分区内某列的
后面
的第N个值
- 语法:LEAD(col,N,defaultValue) over (partition by col1 order by col2)
- 功能:取每个分区内某列的
FIRST_VALUE
- 功能:取每个分区内某列的第一个值
- 语法:FIRST_VALUE(col) over (partition by col1 order by col2)
- 示例:取每个部门薪资最高的员工姓名
select
empno,
ename,
salary,
deptno,
FIRST_VALUE(ename) over (partition by deptno order by salary desc ) as first
from
db_emp.tb_emp;
+--------+---------+---------+---------+---------+--+
| empno | ename | salary | deptno | maxsal |
+--------+---------+---------+---------+---------+--+
| 7839 | KING | 5000.0 | 10 | 5000.0 |
| 7782 | CLARK | 2450.0 | 10 | 5000.0 |
| 7934 | MILLER | 1300.0 | 10 | 5000.0 |
| 7788 | SCOTT | 3000.0 | 20 | 3000.0 |
| 7902 | FORD | 3000.0 | 20 | 3000.0 |
| 7566 | JONES | 2975.0 | 20 | 3000.0 |
| 7876 | ADAMS | 1100.0 | 20 | 3000.0 |
| 7369 | SMITH | 800.0 | 20 | 3000.0 |
| 7698 | BLAKE | 2850.0 | 30 | 2850.0 |
| 7499 | ALLEN | 1600.0 | 30 | 2850.0 |
| 7844 | TURNER | 1500.0 | 30 | 2850.0 |
| 7654 | MARTIN | 1250.0 | 30 | 2850.0 |
| 7521 | WARD | 1250.0 | 30 | 2850.0 |
| 7900 | JAMES | 950.0 | 30 | 2850.0 |
+--------+---------+---------+---------+---------+--+
LAST_VALUE
- 功能:取每个分区内某列的最后一个值
- 语法:LAST_VALUE() over (partition by col1 order by col2)
- 注意:默认窗口是从第一条到当前条
- 示例:取每个部门薪资最低的员工编号
select
empno,
ename,
salary,
deptno,
LAST_VALUE(empno) over (partition by deptno order by salary desc) as last
from
db_emp.tb_emp;
--上面的语句实现不了,默认窗口是从每个分区的第一行到当前行
select
empno,
ename,
salary,
deptno,
LAST_VALUE(empno) over (partition by deptno order by salary desc rows between unbounded preceding and unbounded following) as last
from
db_emp.tb_emp;
+--------+---------+---------+---------+-------+--+
| empno | ename | salary | deptno | last |
+--------+---------+---------+---------+-------+--+
| 7839 | KING | 5000.0 | 10 | 7934 |
| 7782 | CLARK | 2450.0 | 10 | 7934 |
| 7934 | MILLER | 1300.0 | 10 | 7934 |
| 7788 | SCOTT | 3000.0 | 20 | 7369 |
| 7902 | FORD | 3000.0 | 20 | 7369 |
| 7566 | JONES | 2975.0 | 20 | 7369 |
| 7876 | ADAMS | 1100.0 | 20 | 7369 |
| 7369 | SMITH | 800.0 | 20 | 7369 |
| 7698 | BLAKE | 2850.0 | 30 | 7900 |
| 7499 | ALLEN | 1600.0 | 30 | 7900 |
| 7844 | TURNER | 1500.0 | 30 | 7900 |
| 7654 | MARTIN | 1250.0 | 30 | 7900 |
| 7521 | WARD | 1250.0 | 30 | 7900 |
| 7900 | JAMES | 950.0 | 30 | 7900 |
+--------+---------+---------+---------+-------+--+
LAG
- 功能:取每个分区内某列的前N个值
- 语法:LAG(col,N,defaultValue) over (partition by col1 order by col2)
- 示例:
select
empno,
ename,
salary,
deptno,
LAG(salary,1,0) over (partition by deptno order by salary) as lagvalue
from
db_emp.tb_emp;
+--------+---------+---------+---------+-----------+--+
| empno | ename | salary | deptno | lagvalue |
+--------+---------+---------+---------+-----------+--+
| 7934 | MILLER | 1300.0 | 10 | 0.0 |
| 7782 | CLARK | 2450.0 | 10 | 1300.0 |
| 7839 | KING | 5000.0 | 10 | 2450.0 |
| 7369 | SMITH | 800.0 | 20 | 0.0 |
| 7876 | ADAMS | 1100.0 | 20 | 800.0 |
| 7566 | JONES | 2975.0 | 20 | 1100.0 |
| 7788 | SCOTT | 3000.0 | 20 | 2975.0 |
| 7902 | FORD | 3000.0 | 20 | 3000.0 |
| 7900 | JAMES | 950.0 | 30 | 0.0 |
| 7654 | MARTIN | 1250.0 | 30 | 950.0 |
| 7521 | WARD | 1250.0 | 30 | 1250.0 |
| 7844 | TURNER | 1500.0 | 30 | 1250.0 |
| 7499 | ALLEN | 1600.0 | 30 | 1500.0 |
| 7698 | BLAKE | 2850.0 | 30 | 1600.0 |
+--------+---------+---------+---------+-----------+--+
LEAD
- 功能:取每个分区内某列的后N个值
- 语法:LEAD(col,N,defaultValue) over (partition by col1 order by col2)
- 示例:
select
empno,
ename,
salary,
deptno,
LEAD(salary,1,0) over (partition by deptno order by salary) as leadvalue
from
db_emp.tb_emp;
+--------+---------+---------+---------+------------+--+
| empno | ename | salary | deptno | leadvalue |
+--------+---------+---------+---------+------------+--+
| 7934 | MILLER | 1300.0 | 10 | 2450.0 |
| 7782 | CLARK | 2450.0 | 10 | 5000.0 |
| 7839 | KING | 5000.0 | 10 | 0.0 |
| 7369 | SMITH | 800.0 | 20 | 1100.0 |
| 7876 | ADAMS | 1100.0 | 20 | 2975.0 |
| 7566 | JONES | 2975.0 | 20 | 3000.0 |
| 7788 | SCOTT | 3000.0 | 20 | 3000.0 |
| 7902 | FORD | 3000.0 | 20 | 0.0 |
| 7900 | JAMES | 950.0 | 30 | 1250.0 |
| 7654 | MARTIN | 1250.0 | 30 | 1250.0 |
| 7521 | WARD | 1250.0 | 30 | 1500.0 |
| 7844 | TURNER | 1500.0 | 30 | 1600.0 |
| 7499 | ALLEN | 1600.0 | 30 | 2850.0 |
| 7698 | BLAKE | 2850.0 | 30 | 0.0 |
+--------+---------+---------+---------+------------+--+
开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!
更多推荐
已为社区贡献2条内容
所有评论(0)