【SQL优化案例】统计信息缺失
在 Oracle 数据库优化过程中,统计信息的准确性直接影响优化器的执行计划选择。一旦统计信息缺失或过时,可能导致优化器执行计划不准确,选错索引甚至走全表扫描,从而造成 SQL 性能严重劣化。本文结合两个实际生产案例,分享优化经验。
案例一:执行计划错误,导致全表扫描
问题 SQL
该SQL执行时间接近 470ms,逻辑读48,593块次
SELECT A.DATA_TYPEFROM (SELECT T.DATA_TYPEFROM APP_SERVLEVEL_BASE TWHERE (T.SERV_NO = :1 OR :2 IS NULL)AND (T.SUBSID = :3 OR :4 IS NULL)AND SYSDATE >= T.EFFECTIVE_TIMEAND SYSDATE <= T.FAILURETIMEORDER BY T.CREATEDATE DESC) AWHERE ROWNUM = 1;
执行计划显示走了 TABLE ACCESS FULL:
Plan hash value: 2522863837-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 10884 (100)| | | |
|* 1 | COUNT STOPKEY | | | | | | | |
| 2 | PARTITION RANGE ALL | | 14734 | 44202 | 10884 (1)| 00:00:01 | 8 | 1 |
| 3 | VIEW | | 14734 | 44202 | 10884 (1)| 00:00:01 | | |
|* 4 | SORT ORDER BY STOPKEY| | 14734 | 618K| 10884 (1)| 00:00:01 | | |
|* 5 | TABLE ACCESS FULL | APP_SERVLEVEL_BASE | 14734 | 618K| 10883 (1)| 00:00:01 | 8 | 1 |
-----------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------1 - filter(ROWNUM=1)4 - filter(ROWNUM=1)5 - filter(((:2 IS NULL OR "T"."SERV_NO"=TO_NUMBER(:1)) AND (:4 IS NULL OR <<<< 生产绑定执行计划未生效,存在隐式转化或者使用/*+ USE_CONCAT */ 强制优化器使用 UNION ALL 运算符将 OR 条件转换为复合查询。"T"."SUBSID"=TO_NUMBER(:3)) AND "T"."FAILURETIME">=SYSDATE@! AND "T"."EFFECTIVE_TIME"<=SYSDATE@!))29 rows selected.
全表扫描优化空间很大,先尝试找合适的索引,观察where
条件,T.SERV_NO = :1
,T.SUBSID = :3
,T.EFFECTIVE_TIME
,T.FAILURETIME
可以选择。
表观察发现表数据量约 610 万行,SERV_NO
和 SUBSID
的选择性都很好。
select count(*) from PANDA.APP_SERVLEVEL_BASE;COUNT(*)
----------6106131select * from (select SERV_NO,count(*) from PANDA.APP_SERVLEVEL_BASE group by SERV_NO order by count(*) desc) where rownum<=10;
select * from (select SUBSID,count(*) from PANDA.APP_SERVLEVEL_BASE group by SUBSID order by count(*) desc) where rownum<=10;
select * from (select FAILURETIME,count(*) from PANDA.APP_SERVLEVEL_BASE group by FAILURETIME order by count(*) desc) where rownum<=10;
select * from (select EFFECTIVE_TIME,count(*) from PANDA.APP_SERVLEVEL_BASE group by EFFECTIVE_TIME order by count(*) desc) where rownum<=10;select * from (select SERV_NO,SUBSID,count(*) from PANDA.APP_SERVLEVEL_BASE group by SERV_NO,SUBSID order by count(*) desc) where rownum<=10;SERV_NO COUNT(*)
---------- ----------
1.3484E+10 60
1.3633E+10 15
1.7833E+10 15
1.3703E+10 14
1.9833E+10 14
1.5134E+10 14
1.5133E+10 14
1.3784E+10 14
1.3603E+10 13
1.3663E+10 1310 rows selected.SQL> SUBSID COUNT(*)
---------- ----------
3.1621E+12 60
3.1408E+12 15
3.1421E+12 15
3.1408E+12 14
3.1620E+12 14
3.1620E+12 14
3.1608E+12 14
3.1421E+12 14
3.1408E+12 13
3.1408E+12 1310 rows selected.SQL>
FAILURETIME COUNT(*)
------------------- ----------
2025-02-01 00:00:00 1160002
2023-01-01 00:00:00 858503
2022-01-01 00:00:00 840766
2023-02-01 00:00:00 834958
2024-02-01 00:00:00 748715
2026-02-01 00:00:00 742798
2021-01-01 00:00:00 684236
2099-01-01 00:00:00 141319
2024-02-07 00:00:00 23261
2024-11-02 00:00:00 888410 rows selected.SQL>
EFFECTIVE_TIME COUNT(*)
------------------- ----------
2022-01-01 00:00:00 1693464
2021-01-01 00:00:00 840766
2023-02-01 00:00:00 699945
2020-01-01 00:00:00 588544
2024-01-29 00:00:00 389265
2025-01-26 00:00:00 364200
2024-02-01 00:00:00 349610
2025-01-24 00:00:00 283198
2020-01-15 00:00:00 94434
2024-05-24 00:00:00 8710010 rows selected.
查看表上索引信息,IDX_APP_SERVLEVEL_BASE
索引的前导列是SUBSID
,按理说执行计划应该选择这个才对,检查发现该表缺失统计信息,优化器无法正确估算谓词选择性,导致走了全表扫。
TABLE TABLE Index COLUMN Col
OWNER NAME Name UCPTDVS NAME Pos DESC
--------------- -------------------- ------------------------- ------- ---------------- ---- ----
PANDA APP_SERVLEVEL_BASE IDX_APP_SERVLEVEL_BASE NNYNNVO SUBSID 1 ASC <<<<<<<选择性不错NNYNNVO CITY 2 ASCNNYNNVO STATUS 3 ASCNNYNNVO EFFECTIVE_TIME 4 ASCNNYNNVO FAILURETIME 5 ASCIDX_APP_SERVLEVEL_BASE_REG NNYNNVO CITY 1 ASCNNYNNVO STATUS 2 ASC7 rows selected.
优化措施
- 收集统计信息
SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => '&TOWN' , tabname => '&TNAME' , cascade => true, estimate_percent => dbms_stats.auto_sample_size,method_opt=>'FOR TABLE FOR ALL COLUMNS SIZE REPEAT', degree => 8,no_invalidate=>false);
Enter value for town: PANDA
Enter value for tname: APP_SERVLEVEL_BASE
- 避免隐式转换
确保绑定变量和字段类型一致(例如 SERV_NO、SUBSID 均为 VARCHAR2)。
整个or
都在filter,
- OB的经验
在维护OB时,经常遇到带有子查询或者or的SQL执行计划不是很好,or可以尝试绑定hint /*+ USE_CONCAT */
,强制优化器使用 UNION ALL 运算符将 OR 条件转换为复合查询来优化,Oracle这里收集完统计信息恢复正常没做尝试。
收集统计信息后测试:执行计划走 INDEX RANGE SCAN + TABLE ACCESS BY INDEX ROWID
SQL> SELECT A.DATA_TYPE2 FROM (SELECT T.DATA_TYPE DATA_TYPE3 FROM APP_SERVLEVEL_BASE T4 WHERE (T.SERV_NO = '13785698097' OR '13785698097' IS NULL)5 AND (T.SUBSID = '3160806007929' OR '3160806007929' IS NULL)6 AND SYSDATE >= T.EFFECTIVE_TIME7 AND SYSDATE <=8 T.FAILURETIME9 ORDER BY T.CREATEDATE DESC) A10 WHERE ROWNUM = 1;Elapsed: 00:00:00.00Execution Plan
----------------------------------------------------------
Plan hash value: 1876015739------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 9 (12)| 00:00:01 | | |
|* 1 | COUNT STOPKEY | | | | | | | |
| 2 | PARTITION RANGE ALL | | 1 | 3 | 9 (12)| 00:00:01 | 7 | 1 |
| 3 | VIEW | | 1 | 3 | 9 (12)| 00:00:01 | | |
|* 4 | SORT ORDER BY STOPKEY | | 1 | 43 | 9 (12)| 00:00:01 | | |
|* 5 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| APP_SERVLEVEL_BASE | 1 | 43 | 8 (0)| 00:00:01 | 7 | 1 |
|* 6 | INDEX RANGE SCAN | IDX_APP_SERVLEVEL_BASE | 5 | | 6 (0)| 00:00:01 | 7 | 1 |
------------------------------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------1 - filter(ROWNUM=1)4 - filter(ROWNUM=1)5 - filter("T"."SERV_NO"=13785698097)6 - access("T"."SUBSID"=3160806007929 AND "T"."FAILURETIME">=SYSDATE@! AND "T"."EFFECTIVE_TIME"<=SYSDATE@!)filter("T"."FAILURETIME">=SYSDATE@! AND "T"."EFFECTIVE_TIME"<=SYSDATE@!)Statistics
----------------------------------------------------------0 recursive calls0 db block gets11 consistent gets0 physical reads0 redo size549 bytes sent via SQL*Net to client741 bytes received via SQL*Net from client2 SQL*Net roundtrips to/from client7 sorts (memory)0 sorts (disk)1 rows processed
优化后效果
执行时间从:470ms → 2.48ms,逻辑读:48,593 → 11,执行计划走 INDEX RANGE SCAN + TABLE ACCESS BY INDEX ROWID
生产执行时间(ms) 执行次数/0.5h 优化前逻辑读
优化前 470.00 391 48,593.21
优化后 2.48 11
举一反三
该表统计信息缺失是事实,优化前把全库可能涉及到执行计划错误的SQL全部捞出来彰显工作量,比较好资源的就是TABLE ACCESS FULL
,可以将该对象全表扫描的SQL捞出来
with sql_stat as(select sql_id,parsing_schema_name,module,sum(executions_delta) exec_total,sum(elapsed_time_delta) ela_totalfrom dba_hist_sqlstatwhere module NOT in ('PL/SQL Developer','plsqldev.exe')and parsing_schema_name not in ('SYSTEM', 'SYS', 'DBMT')AND exists (select snap_idfrom dba_hist_snapshotwhere END_INTERVAL_TIME >= sysdate - 14and INSTANCE_NUMBER = userenv('INSTANCE') )and INSTANCE_NUMBER = userenv('INSTANCE')group by sql_id, parsing_schema_name, moduleorder by module)
select module,ss.parsing_schema_name,ss.sql_id,sql_text,exec_total "执行次数",sp.object_owner,sp.object_name,round((ela_total / 1000000) /(decode(nvl(exec_total, 0), 0, 1, exec_total))) as "平均耗费时间"from sql_stat ss, dba_hist_sqltext sh, dba_hist_sql_plan spwhere ss.sql_id = sh.sql_idand sh.sql_id = sp.sql_id(+)and sp.operation || ' ' || sp.options = 'TABLE ACCESS FULL'and sp.object_name='APP_SERVLEVEL_BASE'--and round((ela_total / 1000000) / (decode(nvl(exec_total, 0), 0, 1, exec_total))) > 600order by 8 desc;-- PANDA1 : bjny9xmh9db49 cmu40rkkg859aMODULE PARSING_SCHEMA_NAME SQL_ID SQL_TEXT 执行次数 OBJECT_OWNER OBJECT_NAME 平均耗费时间
1 JDBC Thin Client PANDA cmu40rkkg859a <CLOB> 919 PANDA APP_SERVLEVEL_BASE 1
2 JDBC Thin Client PANDA bjny9xmh9db49 <CLOB> 433760 PANDA APP_SERVLEVEL_BASE 1-- PANDA2 : bjny9xmh9db49、cmu40rkkg859a、942z0ct51j104、bm77buvu3hjmf、7xscda3yvccykMODULE PARSING_SCHEMA_NAME SQL_ID SQL_TEXT 执行次数 OBJECT_OWNER OBJECT_NAME 平均耗费时间
1 JDBC Thin Client PANDA 942z0ct51j104 <CLOB> 814041 PANDA APP_SERVLEVEL_BASE 0
2 JDBC Thin Client PANDA bm77buvu3hjmf <CLOB> 4345 PANDA APP_SERVLEVEL_BASE 0
3 JDBC Thin Client PANDA cmu40rkkg859a <CLOB> 62104 PANDA APP_SERVLEVEL_BASE 0
4 JDBC Thin Client PANDA 7xscda3yvccyk <CLOB> 601580 PANDA APP_SERVLEVEL_BASE 0
5 JDBC Thin Client PANDA bjny9xmh9db49 <CLOB> 142323 PANDA APP_SERVLEVEL_BASE 0
最后找出该表类似sql走全表扫的有很多,其中 bjny9xmh9db49、cmu40rkkg859a、942z0ct51j104、bm77buvu3hjmf 四个问题一样,优化一个bjny9xmh9db49
,其他三个也跟着优化了,工作量也上来了。
5 - filter(((:2 IS NULL OR "T"."SERV_NO"=TO_NUMBER(:1)) AND (:4 IS NULL OR"T"."SUBSID"=TO_NUMBER(:3)) AND "T"."FAILURETIME">=SYSDATE@! AND "T"."EFFECTIVE_TIME"<=SYSDATE@!))
案例二:走错索引导致 CPU 占用过高
问题SQL
业务反馈 CPU 消耗过高的SQL,该SQL执行时间接近 220ms,逻辑读10,823块次,SQL 是一个 UPDATE:
UPDATE CUST_ENTITY_ATTRIBUTESET ATTR_VALUE = :1WHERE ENTITY_ID = :2AND ATTR_ID = :3;
这个比较简单,执行计划错误地选择了 INX_CUST_ENTITY_ATTR_ATTR_ID 索引(低选择性),导致
-- 优化前的执行计划
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gm2h56j2u0hs2, child number 0
-------------------------------------
UPDATE CUST_ENTITY_ATTRIBUTE SET ATTR_VALUE = :1 WHERE ENTITY_ID = :2
AND ATTR_ID = :3Plan hash value: 4187902683--------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | | 1 (100)| |
| 1 | UPDATE | CUST_ENTITY_ATTRIBUTE | | | | |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| CUST_ENTITY_ATTRIBUTE | 91 | 28392 | 1 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | INX_CUST_ENTITY_ATTR_ATTR_ID | 3636 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------2 - filter("ENTITY_ID"=TO_NUMBER(:2))3 - access("ATTR_ID"=:3)22 rows selected.
而最佳的访问路径应是 INX_CUST_ENTITY_ATTR_ENTITY_ID(ENTITY_ID 的选择性更高):
select * from (select ENTITY_ID,count(*) from PANDA.CUST_ENTITY_ATTRIBUTE group by ENTITY_ID order by count(*) desc) where rownum<=10;
select * from (select ATTR_ID,count(*) from PANDA.CUST_ENTITY_ATTRIBUTE group by ATTR_ID order by count(*) desc) where rownum<=10;ENTITY_ID COUNT(*)
---------- ----------
8.9182E+13 12
8.9166E+13 10
8.9193E+13 10
8.9193E+13 10
8.9192E+13 10
8.9192E+13 10
8.9192E+13 10
8.9192E+13 10
8.9187E+13 10
8.9187E+13 1010 rows selected.SQL>
ATTR_ID COUNT(*)
-------------------------------- ----------
isValuableCust 391535
customerTreePCode 317687
parentNodePCode 317687
industryCustType 267461
InSyncNumber 101377
strategicCustTypeFL 71911
strategicCustTypeSL 70831
custGroupFeature 63472
custTreeHierarchy 48042
InFileName 385510 rows selected.select count(*) from PANDA.CUST_ENTITY_ATTRIBUTE;COUNT(*)
----------16538601 row selected.
优化措施
- 收集统计信息:
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'PANDA',tabname => 'CUST_ENTITY_ATTRIBUTE',method_opt => 'FOR ALL COLUMNS SIZE 1',estimate_percent => 20,degree => 15,cascade => TRUE,no_invalidate => FALSE
);
--------- 0808生产核实
-- 收集统计信息后走对了索引
col owner format a10
col table_name format a20
col partition_name format a20
col stale_stats format a5
col last_analyzed format a16SELECT owner,table_name,partition_name,stale_stats,last_analyzed
FROM dba_tab_statistics
WHERE table_name = UPPER('CUST_ENTITY_ATTRIBUTE')AND owner = 'PANDA';
OWNER TABLE_NAME PARTITION_NAME STALE LAST_ANALYZED
---------- -------------------- -------------------- ----- ----------------
PANDA CUST_ENTITY_ATTRIBUTE NO 2025-08-07 23:57:30
执行计划也走了对的索引
SQL> @xi gm2h56j2u0hs2 %
eXplain the execution plan for sqlid gm2h56j2u0hs2 child %...PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID gm2h56j2u0hs2, child number 0
-------------------------------------
UPDATE CUST_ENTITY_ATTRIBUTE SET ATTR_VALUE = :1 WHERE ENTITY_ID = :2
AND ATTR_ID = :3Plan hash value: 4187902683--------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)|
--------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | 1 (100)|
| 1 | UPDATE | CUST_ENTITY_ATTRIBUTE | | |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| CUST_ENTITY_ATTRIBUTE | 91 | 1 (0)|
|* 3 | INDEX RANGE SCAN | INX_CUST_ENTITY_ATTR_ATTR_ID | 3636 | 1 (0)|
--------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------2 - filter("ENTITY_ID"=TO_NUMBER(:2))3 - access("ATTR_ID"=:3)Note
------ Warning: basic plan statistics not available. These are only collected when:* hint 'gather_plan_statistics' is used for the statement or* parameter 'statistics_level' is set to 'ALL', at session or system levelSQL_ID gm2h56j2u0hs2, child number 1
-------------------------------------
UPDATE CUST_ENTITY_ATTRIBUTE SET ATTR_VALUE = :1 WHERE ENTITY_ID = :2
AND ATTR_ID = :3Plan hash value: 3441046480----------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | Cost (%CPU)|PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | 2 (100)|
| 1 | UPDATE | CUST_ENTITY_ATTRIBUTE | | |
|* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| CUST_ENTITY_ATTRIBUTE | 1 | 2 (0)|
|* 3 | INDEX RANGE SCAN | INX_CUST_ENTITY_ATTR_ENTITY_ID | 4 | 1 (0)|
----------------------------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------2 - filter("ATTR_ID"=:3)3 - access("ENTITY_ID"=TO_NUMBER(:2))Note
------ Warning: basic plan statistics not available. These are only collected when:* hint 'gather_plan_statistics' is used for the statement or* parameter 'statistics_level' is set to 'ALL', at session or system level56 rows selected.
优化后效果
- 执行时间:220ms → 0.26ms
- 逻辑读:10,823 → 6
- CPU 消耗降低 99.9%
Please enter the number of days before now for the BEGIN time
You can also enter n/24 for n hours before. Leave blank for 10 day before.
2FLAG SNAP_ID INST_ID PHV EXECS READS READS_PER GETS GETS_PER ROWS_PROCESSED ROWS_PER ELAP_MS ELAP_PER_MS
--------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------- ---------- ---------- -----------
GV$SQL 0 1 3441046480 1780 110 .061797753 12050 6.76966292 1773 .996067416 460.841 .258899438
GV$SQL 0 1 4187902683 7724 127763 16.5410409 83643264 10829.0088 7660 .991714138 796951.424 103.178589
附录:如何快速定位统计信息缺失问题(查询脚本)
select distinct (d.segment_name),d.owner,sum(d.bytes / 1024 / 1024 ) as MB,sum(d.blocks * 8192 / 1024 / 1024) as block,max(d.last_analyzed) as last_analyzed,d.stale_stats as "统计信息是否过期",'exec dbms_stats.gather_table_stats(''' || d.owner || ''', ''' || d.segment_name || ''', cascade => true, estimate_percent => dbms_stats.auto_sample_size,method_opt=>''FOR TABLE FOR ALL COLUMNS SIZE REPEAT'', degree => 8,no_invalidate=>false); ' as "收集脚本"from (select b.owner,b.segment_name,b.segment_type,b.bytes,c.BLOCKS,c.LAST_ANALYZED,c.STALE_STATSfrom (select a.owner,a.segment_name,a.partition_name,a.segment_type,a.bytesFROM dba_segments aWHERE a.segment_type IN ('TABLE','TABLE PARTITION','TABLE SUBPARTITION')AND a.owner NOT IN ('SYS','SYSTEM','SYSMAN','SCOTT','HR','PANDA')AND a.owner NOT LIKE '%HW%'AND a.segment_name NOT LIKE 'BIN%') b,dba_tab_statistics cwhere b.owner = c.ownerand b.segment_name = c.TABLE_NAMEand b.partition_name = c.PARTITION_NAMEand (c.LAST_ANALYZED IS NULL OR c.STALE_STATS = 'YES')) dgroup by d.owner, d.segment_name, d.stale_stats;