13. Mysql 使用WITH进行复杂和递归查询

介绍Mysql中with的常见用法：复杂查询简单化，递归查询清晰化。

回忆あ节拍

3724人浏览 · 2023-12-13 08:30:00

回忆あ节拍 · 2023-12-13 08:30:00 发布

概述

WITH 语句，允许我们使用常用表达式（Common Table Expressions，CTE），CTE是一个临时命名的结果集，它可以在一个查询中被引用多次。使用WITH关键字，我们可以将一个复杂的查询分解成更小的、易于理解的部分，并将这些部分组合在一起以创建最终的查询结果。

使用场景

复杂查询：WITH语句可以用于构建复杂的查询逻辑，将多个子查询组合在一起，提高查询的可读性和维护性。
数据转换：通过WITH语句，可以在查询中创建临时表达式，并对其进行数据转换、筛选、聚合等操作，以满足特定的查询需求。
递归查询：WITH RECURSIVE语法可以用于执行递归查询，即在查询结果中引用自身，常用于处理树状结构或层级关系的数据。

WITH语句的注意事项

WITH语句定义的临时表达式只在当前查询中有效，不能在其他查询中引用。
WITH语句中的子查询可以引用之前定义的临时表达式，允许多个临时表达式之间的相互引用。
WITH语句中的临时表达式可以在后续查询中像普通表一样使用，可以进行联接、过滤、排序等操作。
WITH语句中的列名可以省略，此时将使用子查询的列名作为默认列名。
WITH语句在MySQL 8.0版本及以上才被支持，旧版本的MySQL不支持此语法。

基本语法

with_clause:
    with [recursive]
        cte_name [(col_name [, col_name] ...)] as (subquery)
        [, cte_name [(col_name [, col_name] ...)] as (subquery)] ...

非递归改进的派生表

# 使用派生表子查询
select max(txt) max_txt, min(txt) min_txt
from (select concat(cte2.txt, cte3.txt) as txt
      from (select concat(cte1.txt, 'is a ') as txt
            from (select 'this ' as txt) as cte1) as cte2,
           (select 'nice query' as txt
            union
            select 'query that rocks'
            union
            select 'query') as cte3) as cte4;
# 使用with非递归查询
with cte1 as (select 'this ' as txt),
     cte2 as (select concat(cte1.txt, 'is a') as txt from cte1),
     cte3 as (select 'nice query' as txt
              union
              select 'query that rocks' as txt
              union
              select 'query' as txt),
     cte4 as (select concat(cte2.txt, cte3.txt) as txt
              from cte2,
                   cte3)
select max(txt) max_txt, min(txt) min_txt
from cte4;
+---------------------------+---------------------+
| max_txt                   | min_txt             |
+---------------------------+---------------------+
| this is aquery that rocks | this is anice query |
+---------------------------+---------------------+

通过以上示例，可以发现使用with语法大大提高查询的可读性，更加简洁。

递归生成序列

递归生成序列的步骤：

定义根节点，初始值。
所谓递归迭代，是指每一次递归都要调用上一次查询的结果集，UNION ALL是指每次都把结果集并在一起。
定义递归终止条件。

# 生成1-6的序列
with recursive my_cte(n) as
                   (select 1  -- 初始值
                    union all
                    select 1 + n
                    from my_cte
                    where n < 6) -- 递归终止条件
select n
from my_cte;
+------+
| n    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
+------+
# 生成连续日期
with recursive date_list(calendar_date) as (select '2023-12-01' calendar_date
                                            union all
                                            select date_add(calendar_date, interval 1 day) calendar_date
                                            from date_list
                                            where date_add(calendar_date, interval 1 day) <= '2023-12-10')
select calendar_date
from date_list;
+---------------+
| calendar_date |
+---------------+
| 2023-12-01    |
| 2023-12-02    |
| 2023-12-03    |
| 2023-12-04    |
| 2023-12-05    |
| 2023-12-06    |
| 2023-12-07    |
| 2023-12-08    |
| 2023-12-09    |
| 2023-12-10    |
+---------------+
# 斐波那契序列
with recursive my_cte as
                   (select 1 as f, 1 as next_f
                    union all
                    select next_f, f + next_f
                    from my_cte
                    where f < 500)
select f,next_f
from my_cte;
+------+--------+
| f    | next_f |
+------+--------+
|    1 |      1 |
|    1 |      2 |
|    2 |      3 |
|    3 |      5 |
|    5 |      8 |
|    8 |     13 |
|   13 |     21 |
|   21 |     34 |
|   34 |     55 |
|   55 |     89 |
|   89 |    144 |
|  144 |    233 |
|  233 |    377 |
|  377 |    610 |
|  610 |    987 |
+------+--------+

递归生成序列常常应用于生成完整的序列，在实际应用中，实际数据常常伴随着缺失，而使用序列可以更好地验证数据是否缺失情况。

建表、插入、更新和删除中应用

# 建表
create table numbers
with recursive my_cte(n) as
                   (select 1
                    union all
                    select 1 + n
                    from my_cte
                    where n < 6)
select n
from my_cte;
# 插入数据
insert into numbers
with recursive my_cte(n) as
                   (select 1
                    union all
                    select 1 + n
                    from my_cte
                    where n < 6)
select n
from my_cte;
# 更新数据
with recursive my_cte(n) as
                   (select 1
                    union all
                    select 1 + n
                    from my_cte
                    where n < 6)
update numbers, my_cte
set numbers.n=0
where numbers.n = my_cte.n * my_cte.n;
# 删除数据
with recursive my_cte(n) as
                   (select 1
                    union all
                    select 1 + n
                    from my_cte
                    where n < 6)
delete
from numbers
-- delete the numbers greater than the average of 1,...,6 (=3.5)
where numbers.n > (select avg(n) from my_cte);

delete
from numbers
where numbers.n >
      (with recursive my_cte(n) as
                          (select 1
                           union all
                           select 1 + n
                           from my_cte
                           where n < 6)
                      # half the average is 3.5/2=1.75
       select avg(n) / 2
       from my_cte);

with 关键字在许多场景下都可以用，可见非常灵活。

层次结构

常见的层次结构有：

首席执行官->副总裁->经理->员工
项目->子项目->子项目
父母->儿子->孙子
电子邮件或论坛线程（问题->回复->回复回复）
城镇->地区->州

层次结构递归步骤：

定义根节点，初始值，如parent is null 作为递归查询的起点。
所谓递归迭代，是指每一次递归都要调用上一次查询的结果集，UNION ALL是指每次都把结果集并在一起。
迭代公式利用上一次查询返回的结果集执行特定的查询，直到CTE返回NULL或达到最大的迭代次数。

数据准备：准备相关电子产品类别层级数据如下。

create table category
(
    category_id   int,
    category_name varchar(255),
    parent        varchar(255)
);
insert into category
select 1 as category_id, 'ELECTRONICS' as name, null as parent
union all
select 2 as category_id, 'TELEVISIONS' as name, 1 as parent
union all
select 3 as category_id, 'TUBE' as name, 2 as parent
union all
select 4 as category_id, 'LCD' as name, 2 as parent
union all
select 5 as category_id, 'PLASMA' as name, 2 as parent
union all
select 6 as category_id, 'PORTABLE ELECTRONICS' as name, 1 as parent
union all
select 7 as category_id, 'MP3 PLAYERS' as name, 6 as parent
union all
select 8 as category_id, 'FLASH' as name, 7 as parent
union all
select 9 as category_id, 'CD PLAYERS' as name, 6 as parent
union all
select 10 as category_id, '2 WAY RADIOS' as name, 6 as parent;

想要查询每个类别对应的父类、类别层级深度（总共有几层）、类别层级路径。

with recursive cte as (select category_id
                            , category_name
                            , parent
                            , category_name                  as parent_name     -- 查询每个类别的父类
                            , 0                              as depth           -- 查询类别层级深度
                            , cast(category_id as char(200)) as path            -- 查询类别层级路径
                       from category
                       where parent is null
                       union all
                       select c.category_id
                            , c.category_name
                            , c.parent
                            , cte.category_name                     as parent_name -- 查询每个类别的父类
                            , cte.depth + 1                         as depth   -- 查询类别层级深度
                            , concat(cte.path, '->', c.category_id) as path    -- 查询类别层级路径
                       from category as c
                                inner join cte
                                           on c.parent = cte.category_id)
select * from cte;
+-------------+----------------------+--------+----------------------+-------+------------+
| category_id | category_name        | parent | parent_name          | depth | path       |
+-------------+----------------------+--------+----------------------+-------+------------+
|           1 | ELECTRONICS          | NULL   | ELECTRONICS          |     0 | 1          |
|           2 | TELEVISIONS          | 1      | ELECTRONICS          |     1 | 1->2       |
|           6 | PORTABLE ELECTRONICS | 1      | ELECTRONICS          |     1 | 1->6       |
|           3 | TUBE                 | 2      | TELEVISIONS          |     2 | 1->2->3    |
|           4 | LCD                  | 2      | TELEVISIONS          |     2 | 1->2->4    |
|           5 | PLASMA               | 2      | TELEVISIONS          |     2 | 1->2->5    |
|           7 | MP3 PLAYERS          | 6      | PORTABLE ELECTRONICS |     2 | 1->6->7    |
|           9 | CD PLAYERS           | 6      | PORTABLE ELECTRONICS |     2 | 1->6->9    |
|          10 | 2 WAY RADIOS         | 6      | PORTABLE ELECTRONICS |     2 | 1->6->10   |
|           8 | FLASH                | 7      | MP3 PLAYERS          |     3 | 1->6->7->8 |
+-------------+----------------------+--------+----------------------+-------+------------+

循环避免

数据准备：

create table rockets
(origin char(20), destination char(20), trip_time int);
insert into rockets values
('earth', 'mars', 2),
('mars', 'jupiter', 3),
('jupiter', 'saturn', 4),
('saturn', 'earth', 9);

数据来看，从地球开始，我们添加火星，然后是木星，然后是土星，然后是地球（因为新的火箭），所以我们又回到了起点—>地球。然后我们永远添加火星…等，进入了循环，执行会报以下错。

[HY000][3636] Recursive query aborted after 1001 iterations. Try increasing @@cte_max_recursion_depth to a larger value.

我们可以通过一下方法来避免循环：

如果行星已经存在，就不要为结果添加行星。这种重复的消除是通过使用UNION DISTINCT而不是UNION ALL完成的。

with recursive all_destinations as
                   (select destination as planet
                    from rockets
                    where origin = 'earth'
                    union distinct
                    select r.destination
                    from rockets as r,
                         all_destinations as d
                    where r.origin = d.planet)
select *
from all_destinations;
+---------+
| planet  |
+---------+
| mars    |
| jupiter |
| saturn  |
| earth   |
+---------+

构建一个“路径”列（如深度/宽度），使用find_in_set(r.destination, d.path) = 0进行中断。我们不需要再使用DISTINCT，所以我们使用union all以避免（无用的）重复消除的开销。

with recursive all_destinations as
                   (select destination                    as planet,
                           trip_time                      as total_time,
                           cast(destination as char(500)) as path
                    from rockets
                    where origin = 'earth'
                    union all
                    select r.destination,
                           d.total_time + r.trip_time,
                           concat(d.path, ',', r.destination)
                    from rockets r,
                         all_destinations d
                    where r.origin = d.planet
                      and find_in_set(r.destination, d.path) = 0)
select * from all_destinations;
+---------+------------+---------------------------+
| planet  | total_time | path                      |
+---------+------------+---------------------------+
| mars    |          2 | mars                      |
| jupiter |          5 | mars,jupiter              |
| saturn  |          9 | mars,jupiter,saturn       |
| earth   |         18 | mars,jupiter,saturn,earth |
+---------+------------+---------------------------+
# 或者也可以通过以下方法过滤掉
with recursive all_destinations as
                   (select destination                    as planet,
                           trip_time                      as total_time,
                           cast(destination as char(500)) as path,
                           0                              as is_cycle
                    from rockets
                    where origin = 'earth'
                    union all
                    select r.destination,
                           d.total_time + r.trip_time,
                           concat(d.path, ',', r.destination),
                           find_in_set(r.destination, d.path) != 0
                    from rockets r,
                         all_destinations d
                    where r.origin = d.planet
                      and is_cycle = 0)
select * from all_destinations where is_cycle = 0;
+---------+------------+---------------------------+----------+
| planet  | total_time | path                      | is_cycle |
+---------+------------+---------------------------+----------+
| mars    |          2 | mars                      |        0 |
| jupiter |          5 | mars,jupiter              |        0 |
| saturn  |          9 | mars,jupiter,saturn       |        0 |
| earth   |         18 | mars,jupiter,saturn,earth |        0 |
+---------+------------+---------------------------+----------+

总结

WITH语句是MySQL中一种强大的查询语法，可以创建临时表达式并在后续查询中引用。它广泛应用于递归查询、复杂查询和数据转换等场景，提高了查询的灵活性和可读性。使用WITH语句时，需要注意其语法规则和限制，以确保正确使用和理解其功能。关于递归查询也经常在面试中考察，可以多动手实验一下，并深度理解它。