当前位置: 首页 > news >正文

【PostgreSQL案例】我要查的表没有在执行计划中

问题:查的表没有在执行计划中

sql:

SELECT*
FROM(SELECTA.column1 as "column1",--中间省略很多A字段A.column99 as "column99"fromtable_a Aleft join (SELECTlzl_idfromtable_a AAinner join table_b BB ON AA.lzl_key = BB.lzl_idwhereAA.column_code = '1'GROUP BYlzl_id) B ON B.lzl_id = A.lzl_keywhereA.flagflagflag = '1'AND A.typetypetype = '2') TEMP
limit100
offset1000

执行计划:

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------Limit  (cost=2.84..5.68 rows=1 width=1105) (actual time=0.038..0.039 rows=0 loops=1)Buffers: shared hit=2->  Seq Scan on table_a a  (cost=0.00..2.84 rows=1 width=1105) (actual time=0.036..0.037 rows=0 loops=1)Filter: (((flagflagflag)::text = '1'::text) AND ((typetypetype)::text = '2'::text))Rows Removed by Filter: 38Buffers: shared hit=2Planning Time: 0.184 msExecution Time: 0.066 ms

可以看到,sql本身是比较复杂的,SQL的逻辑查了3次表,总共查了2张表。table_a 在执行计划中我可以理解,但是需要查的table_b根本没在执行计划里面!这个执行计划只不过是简单的全表扫描了table_a。

分析的心路历程

中间其实想过很多可能,不过最有可能的是逻辑优化了,也就是说pg优化器认为table_b不需要查。
观察sql发现sql最终只查询了table_a的字段,没有查table_b。此时任意增加一个中间表B的字段,sql执行计划看上去就“正常”了,访问了table_b

explain SELECT*
FROM(SELECTA.column1 as "column1",--中间省略很多A字段A.column99 as "column99",B.lzl_id --新增一个B中间表的字段fromtable_a Aleft join (SELECTlzl_idfromtable_a AAinner join table_b BB ON AA.lzl_key = BB.lzl_idwhereAA.column_code = '1'GROUP BYlzl_id) B ON B.lzl_id = A.lzl_keywhereA.flagflagflag = '1'AND A.typetypetype = '2') TEMP
limit100
offset1000
---------------------------------------------------------------------------------------------------------------------------------------------------------------Limit  (cost=14.69..17.67 rows=1 width=1113)->  Nested Loop Left Join  (cost=11.72..14.69 rows=1 width=1113)Join Filter: (bb.lzl_id = a.lzl_key)->  Seq Scan on table_a a  (cost=0.00..2.84 rows=1 width=1113)Filter: (((flagflagflag)::text = '1'::text) AND ((typetypetype)::text = '2'::text))->  Group  (cost=11.72..11.74 rows=5 width=8)Group Key: bb.lzl_id->  Sort  (cost=11.72..11.73 rows=5 width=8)Sort Key: bb.lzl_id->  Nested Loop  (cost=0.15..11.66 rows=5 width=8)->  Seq Scan on table_a aa  (cost=0.00..2.70 rows=1 width=8)Filter: ((company_code)::text = '1'::text)->  Index Only Scan using idx_table_b_lzl_id on table_b bb  (cost=0.15..8.83 rows=13 width=8)Index Cond: (lzl_id = aa.lzl_key)

这看上去跟left join有关系,但是简单想想又不对,因为右表的结果是会影响查询的最终结果的,不应该不去查右表。随便来个简单的left join,右表会被扫描

explain select lzlleft.a from lzlleft left join lzlright on lzlleft.a=lzlright.a;QUERY PLAN                             
--------------------------------------------------------------------Hash Left Join  (cost=1.04..15.47 rows=320 width=4)Hash Cond: (lzlleft.a = lzlright.a)->  Seq Scan on lzlleft  (cost=0.00..13.20 rows=320 width=4)->  Hash  (cost=1.02..1.02 rows=2 width=4)->  Seq Scan on lzlright  (cost=0.00..1.02 rows=2 width=4)

但是,在中间表B中,有个关键字GROUP BY。如果把GROUP BY去掉,那么无论有没有查询B的字段,都会访问table_b。
我们再在测试表中加个group by看看结果

> select * from lzlleft;a |  b  
---+-----1 | zzz
(1 row)Time: 0.259 ms
> select * from lzlright;a |   b   
---+-------1 | qwer1 | poiuy > select lzlright.b from lzlleft full join  lzlright on lzlleft.b=lzlright.b group by lzlright.b;b    
--------[null]poiuyqwer
(3 rows)

这里就意识到了group by出来的结果集一定有一个特性——唯一性
我们再在测表里加group by

explain select lzlleft.a from lzlleft left join  (select  a from  lzlright group by a) c on lzlleft.a=c.a;QUERY PLAN                        
----------------------------------------------------------
Seq Scan on lzlleft  (cost=0.00..13.20 rows=320 width=4)

右表不查了!
根据右表唯一性的原则,下面还可以有一些骚操作:

--distinct确保右表唯一
> explain select lzlleft.a from lzlleft left join  (select distinct a from  lzlright) c on lzlleft.a=c.a;QUERY PLAN                        
----------------------------------------------------------
Seq Scan on lzlleft  (cost=0.00..13.20 rows=320 width=4) 
--唯一索引确保右表唯一,哪怕是select  a from  lzlright
>  explain select lzlleft.a from lzlleft left join  (select  a from  lzlright) c on lzlleft.a=c.a;QUERY PLAN                               
-----------------------------------------------------------------------
Hash Left Join  (cost=17.20..49.12 rows=512 width=4)Hash Cond: (lzlleft.a = lzlright.a)->  Seq Scan on lzlleft  (cost=0.00..13.20 rows=320 width=4)->  Hash  (cost=13.20..13.20 rows=320 width=4)->  Seq Scan on lzlright  (cost=0.00..13.20 rows=320 width=4)
(5 rows)Time: 0.510 ms
> create unique index idx_right on lzlright(a);
CREATE INDEX
Time: 3.576 ms
> explain select lzlleft.a from lzlleft left join  (select  a from  lzlright) c on lzlleft.a=c.a;QUERY PLAN                        
----------------------------------------------------------
Seq Scan on lzlleft  (cost=0.00..13.20 rows=320 width=4)
(1 row)

到这里来个分析小结:只要右表的数据是唯一的且只查询左表数据时,不需要真的去访问右表 。所以这不是一个bug,而是PG优化器的特性,是符合逻辑的。

源码分析

本期没有源码分析~
优化器源码实在太难了,这里就找了下优化器源码的注释看了下。可以搜索关键字unique-ify,有这么一句话:

 * Also, this routine and others in this module accept the special JoinTypes* JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should* unique-ify the outer or inner relation and then apply a regular inner* join.  These values are not allowed to propagate outside this module,* however.  Path cost estimation code may need to recognize that it's* dealing with such a case --- the combination of nominal jointype INNER* with sjinfo->jointype == JOIN_SEMI indicates that. 

特殊的JoinTypes:JOIN_UNIQUE_INNER 和JOIN_UNIQUE_OUTER ,尝试把外表和内表连接唯一化后,成为inner join。Path代价估算需要考虑这种场景。

与oracle、mysql优化器的对比

对比看下oracle、mysql优化器有没有类似的逻辑优化提升

--oracle
create table lzlleft(a number);
create table lzlright(a number);select lzlleft.a from lzlleft left join  (select  distinct a from  lzlright) c on lzlleft.a=c.a;
--group by 唯一性
SQL>  select lzlleft.a from lzlleft left join  (select  a from  lzlright group by a) c on lzlleft.a=c.a; no rows selectedExecution Plan
----------------------------------------------------------
Plan hash value: 3533354041---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     1 |    26 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN OUTER     |          |     1 |    26 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | LZLLEFT  |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |   VIEW               |          |     1 |    13 |     3  (34)| 00:00:01 |
|   4 |    HASH GROUP BY     |          |     1 |    13 |     3  (34)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| LZLRIGHT |     1 |    13 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------1 - access("LZLLEFT"."A"="C"."A"(+))
--ditinct 唯一 
SQL>  select lzlleft.a from lzlleft left join  (select  distinct a from  lzlright) c on lzlleft.a=c.a;no rows selectedExecution Plan
----------------------------------------------------------
Plan hash value: 3859658234---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     1 |    26 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN OUTER     |          |     1 |    26 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | LZLLEFT  |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |   VIEW               |          |     1 |    13 |     3  (34)| 00:00:01 |
|   4 |    HASH UNIQUE       |          |     1 |    13 |     3  (34)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| LZLRIGHT |     1 |    13 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------1 - access("LZLLEFT"."A"="C"."A"(+))
--mysql 
create table lzlleft(a int primary key);
create table lzlright(a int primary key);
--group by唯一
explain select lzlleft.a from lzlleft left join  (select  a from  lzlright group by a) c on lzlleft.a=c.a; 
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref             | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
|  1 | PRIMARY     | lzlleft    | NULL       | index | NULL          | PRIMARY     | 4       | NULL            |    1 |   100.00 | Using index |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>   | <auto_key0> | 4       | lzldb.lzlleft.a |    2 |   100.00 | Using index |
|  2 | DERIVED     | lzlright   | NULL       | index | PRIMARY       | PRIMARY     | 4       | NULL            |    1 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
--distinct唯一
explain select lzlleft.a from lzlleft left join  (select  distinct a from  lzlright) c on lzlleft.a=c.a;
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key         | key_len | ref             | rows | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
|  1 | PRIMARY     | lzlleft    | NULL       | index | NULL          | PRIMARY     | 4       | NULL            |    1 |   100.00 | Using index |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>   | <auto_key0> | 4       | lzldb.lzlleft.a |    2 |   100.00 | Using index |
|  2 | DERIVED     | lzlright   | NULL       | index | PRIMARY       | PRIMARY     | 4       | NULL            |    1 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+

综上,oracle、mysql均不会对left join只查左表且右表唯一做优化,他们会访问右表。
pg优化器确实是有些东西的。

http://www.lryc.cn/news/408901.html

相关文章:

  • 《程序猿入职必会(5) · CURD 页面细节规范 》
  • 操作系统面试知识点总结5
  • BigInteger和BigDecimal类
  • 2024最新Uniapp的H5网页版添加谷歌授权验证
  • 学习java第一百四十四天
  • Meta 发布 Llama3.1,一站教你如何推理、微调、部署大模型
  • XSSFWorkbook 和 SXSSFWorkbook 的区别
  • 会议主题:NICE Seminar|神经组合优化方法的大规模泛化研究(南方科技大学王振坤副研究员)
  • 昇思25天学习打卡营第22天|CycleGAN图像风格迁移互换
  • 《Java初阶数据结构》----6.<优先级队列之PriorityQueue底层:堆>
  • Matrix Equation(高斯线性异或消元+bitset优化)
  • 【一图学技术】2.API测试9种方法图解
  • 力扣刷题----42. 接雨水
  • 【论文精读】 | 基于图表示的视频抑郁症识别的两阶段时间建模框架
  • 采集PCM,将base64片段转换为wav音频文件
  • eclipse ui bug
  • 前端获取blob文件格式的两种格式
  • 向日葵RCE复现(CNVD-2022-10270/CNVD-2022-03672)
  • Postman中的负载均衡测试:确保API的高可用性
  • anaconda+tensorflow+keras+jupyter notebook搭建过程(CPU版)
  • LitCTF2024赛后web复现
  • Elasticsearch:跨集群使用 ES|QL
  • 学习笔记4:docker和k8s选择简述
  • 关于锁策略
  • 昇思25天学习打卡营第3天|基础知识-数据集Dataset
  • C++11新特性——智能指针——参考bibi《 原子之音》的视频以及ChatGpt
  • “微软蓝屏”全球宕机,敲响基础软件自主可控警钟
  • 【Linux C | 网络编程】进程间传递文件描述符socketpair、sendmsg、recvmsg详解
  • 高并发内存池(六)Page Cache回收功能的实现
  • 浅析JWT原理及牛客出现过的相关面试题