OceanBase 4.3.5 解析:DDL性能诊断
背景
DDL操作通常耗时较长,特别是涉及补数据流程的DDL语句。在执行过程中,用户面临两个主要痛点:一是无法实时获取DDL执行进度,难以区分长时间运行是正常现象还是由内部异常导致的停滞;二是执行效率经常低于预期,而传统的问题诊断需要跨越多张系统表和视图,操作流程复杂。为此,OceanBase 的新版本通过提供可查询的系统视图来简化DDL性能问题排查。
使用说明
SESSION_LONGOPS视图输出行为变更
变更了 SESSION_LONGOPS视图中 DDL 处于REDEFINITION阶段时 message列的输出逻辑 。其他保持和原有的一致。
DDL 处于REDEFINITION阶段即通过发生innsql完成补数据流程图。通过sql的算子实现。整个过程主要分为 scan,sort, insert三个阶段。在以前版本的行为中,无论具体执行到了哪一个阶段,都会返回这三个阶段处理的行数。如下所示
// 不使用column group
+----------------------------------------------------------------------------------------------------------------------------+
| MESSAGE |
+----------------------------------------------------------------------------------------------------------------------------+
| TENANT_ID: %d, TASK_ID: %d, STATUS: REPLICA BUILD, PARALLELISM: %ld, ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld |
+----------------------------------------------------------------------------------------------------------------------------+// 使用 column group
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| MESSAGE |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TENANT_ID: %d, TASK_ID: %d, STATUS: REPLICA BUILD, PARALLELISM: %ld, ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED_INTO_TMP_FILE: %ld, ROW_INSERTED: %ld out of %ld column group rows|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
状态定义
在新的方案中,我们统计了更加详细的信息,但是一次展示过于复杂。因此分阶段展示信息。考虑到多线程场景,不同线程执行速度可能不一致,因此考虑展示最慢线程所处阶段的信息。阶段定义如下。同时,无论处于哪个阶段,还是会输出总扫描行数,总排序行数,总插入行数这些整体信息。
enum class RedefinitionState
{BEFORESCAN = 0,SCAN,SORT-PHASE1,SORT-PHASE2,INSERT
};
排序阶段分为 内存排序和外部排序,两个需要分别展示,但是为了方便用户理解,不暴露实现细节,使用 SORT-PHASE1 和 SORT-PHASE2 分别代表两个阶段。
BEFORESCAN
扫描开始前的阶段,可能位于采样等环节。
"TENANT_ID: %d, TASK_ID:%d STATUS: REPLICA BUILD, BEFORE-SCAN"
SCAN
扫描阶段。因为还无法获取总行数,所以扫描进度和预计用时无法计算,只能输出已经扫描的总行数。 同时还会输出扫描的并行度和 扫描已经花费的时间,另外还会输出所有线程中扫描最多和最少的行数
TENANT_ID: %d, TASK_ID:%d STATUS: REPLICA BUILD, SCANNING, PARALLELISM: %ld,
ROW_ROUNT_INFO:{ ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
SCAN_INFO:{ SCAN_TIME_ELAPSED: %.3fs, MAX_THREAD_ROW_SCANNED: %ld, MIN_THREAD_ROW_SCANNED: %ld }
示例
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, SCANNING, PARALLELISM: 2,
ROW_COUNT_INFO:{ ROW_SCANNED: 30798887, ROW_SORTED: 0, ROW_INSERTED: 0 },
SCAN_INFO:{ SCAN_TIME_ELAPSED: 86.529s, MAX_THREAD_ROW_SCANNED: 15428902, MIN_THREAD_ROW_SCANNED: 15369985 }第一行是整个ddl_task的整体信息,执行的租户id为1004,task_id为 1000817,处于扫描阶段,扫描并行度为2第二行是扫描行数,总排序行数,总插入行数信息。第三行是为扫描阶段具体信息,当前扫描已花费时间为 86.529s ,最多的线程扫了 15428902 行, 最少的线程扫了 15369985 行
SORT-PHASE1
内存排序阶段。从此阶段开始,不同线程之间进度独立。输出的阶段和进度按照最慢统计。 为了不产生误导PARALLELISM仍然展示所有的工作线程数(inmem_sort+merge_sort+insert),另外单独输出位于该阶段的线程数。
输出格式为
TENANT_ID: 1, TASK_ID: 3051020, STATUS: REPLICA BUILD, SORT_PHASE1, PARALLELISM: %ld, SORT_PHASE1_THREAD_NUM: %ld,
ROW_COUNT_INFO:{ ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
SORT_PHASE1_PROGRESS_INFO:{ SORT_PHASE1_TIME_ELAPSED: %.3fs, SORT_PHASE1_PROGRESS: %.2f%%, SORT_PHASE1_TIME_REMAINING: %.3fs },
SLOWEST_THREAD_INFO:{ THREAD_ID: %ld, SORTED_ROW_COUNT: %ld }
示例
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, SORT_PHASE1, PARALLELISM: 2, SORT_PHASE1_THREAD_NUM: 2,
ROW_COUNT_INFO:{ ROW_SCANNED: 32000000, ROW_SORTED: 30917952, ROW_INSERTED: 0 },
SORT_PHASE1_PROGRESS_INFO:{ SORT_PHASE1_TIME_ELAPSED: 137.501s, SORT_PHASE1_PROGRESS: 96.56%, SORT_PHASE1_TIME_REMAINING: 4.892s },
SLOWEST_THREAD_INFO:{ THREAD_ID: 2807314, SORTED_ROW_COUNT: 15458976 }第一行是整个ddl_task的整体信息,执行的租户id为1004,task_id为 1000817,
当前阶段为REPLICA BUILD的INMEM_SORT阶段。当前并行度为2第二行总扫描行数,总排序行数,总插入行数信息的整体信息,第三行为阶段的时间和进度信息,当前阶段已进行137.501s, 当前阶段进度为96.56%,预计还需要4.892s完成。第四行是最慢线程信息, 最慢线程的线程id为 2807314, 最慢线程处理了 15458976 行,
SORT-PHASE2
外部排序阶段,前面和内存排序一样, 但是因为涉及到落盘操作,会额外输出 落盘io量和临时文件压缩类型等信息
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, SORT_PHASE2, PARALLELISM: %ld, SORT_PHASE2_THREAD_NUM: %ld,
ROW_COUNT_INFO:{ ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
SORT_PHASE2_PROGRESS_INFO:{ SORT_PHASE2_TIME_ELAPSED: %.3fs, SORT_PHASE2_PROGRESS: %.2f%%, SORT_PHASE2_TIME_REMAINING: %.3fs },
SLOWEST_THREAD_INFO:{ THREAD_ID: %ld, SORTRD_ROW_COUNT: %ld },
TEMP_FILE_INFO:{ DUMP_SIZE: %ld, COMPRESS_TYPE: %s }
示例
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, SORT_PHASE2, PARALLELISM: 2, SORT_PHASE2_THREAD_NUM: 1,
ROW_COUNT_INFO:{ ROW_SCANNED: 32000000, ROW_SORTED: 121040340, ROW_INSERTED: 44300 },
SORT_PHASE2_PROGRESS_INFO:{ SORT_PHASE2_TIME_ELAPSED: 35.277s, SORT_PHASE2_PROGRESS: 85.49%, SORT_PHASE2_TIME_REMAINING: 5.986s },
SLOWEST_THREAD_INFO:{ THREAD_ID: 2807315, SORTRD_ROW_COUNT: 41013268 },
TEMP_FILE_INFO:{ DUMP_SIZE: 366975583, COMPRESS_TYPE: zstd_1.3.8 }第一行是整个ddl_task的整体信息,执行的租户id为1004 ,task_id为 1000817,
当前阶段为REPLICA BUILD的SORT-PHASE2阶段。当前并行度为2,但是只有一个线程处于排序第二阶段,说明其完成排序进入插入阶段。第二行是总扫描行数,总排序行数,总插入行数信息。第三行是当前阶段时间和进度信息。当前阶段已进行35.277s,当前阶段进度为85.49%,预计剩余5.986s完成。第四行是最慢线程信息,最慢线程的线程id为 2807315, 处理了 41013268行第五行是临时文件存储有关信息,当前总落盘量为 366975583, 临时文件采用的压缩算法为 zstd_1.3.8
insert
插入阶段,对于column group > 1的表,与原版本保持一致。对于column group为1的表,新行为和sort相似
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, INSERT, PARALLELISM: %ld, INSERT_THREAD: %ld,
ROW_COUNT_INFO:{ ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
INSERT_PROGRESS_INFO:{ INSERT_TIME_ELAPSED: %.3fs, INSERT_PROGRESS: %.2f%%, INSERT_TIME_REMAINING: %.3fs },
SLOWEST_THREAD_INFO:{ THREAD_ID: %ld, INSERTED_ROW_COUNT: %ld }
示例
TENANT_ID: 1004, TASK_ID: 1000817, STATUS: REPLICA BUILD, INSERT, PARALLELISM: 2, INSERT_THREAD: 2,
ROW_COUNT_INFO:{ ROW_SCANNED: 32000000, ROW_SORTED: 128000000, ROW_INSERTED: 28806480 },
INSERT_PROGRESS_INFO:{ INSERT_TIME_ELAPSED: 121.941s, INSERT_PROGRESS: 86.61%, INSERT_TIME_REMAINING: 18.845s },
SLOWEST_THREAD_INFO:{ THREAD_ID: 2807315, INSERTED_ROW_COUNT: 13850544 }第一行是整体信息, 租户ID 1004, task_id 1000817,当前阶段为REPLICA BUILD的insert阶段。当前并行度为2第二行是总扫描行数,总排序行数,总插入行数信息。第三行是当前阶段时间和进度信息。当前阶段已进行121.941s,插入进度为57.87%,预计 还有16.725s能完成插入阶段。第四行是最慢线程信息。最慢线程的线程id为2236847, 处理了3188860行
局部索引的特殊处理
局部索引是存在优化,可能会串行分批发送inner_sql。在这种情况下,为了不产生混淆。针对局部索引。会在输出tenant_id后task_id后添加build local index batch num: %ld的信息。
//示例
TENANT_ID: 1, TASK_ID: 3051020, build local index batch num: 2, STATUS: REPLICA BUILD, SORT_PHASE2, PARALLELISM: %ld, SORT_PHASE2_THREAD_NUM: %ld,
ROW_COUNT_INFO:{ROW_SCANNED: %ld, ROW_SORTED: %ld, ROW_INSERTED: %ld },
SORT_PHASE2_PROGRESS_INFO:{ SORT_PHASE2_TIME_ELAPSED: %.3fs, SORT_PHASE2_PROGRESS: %.2f%%, SORT_PHASE2_TIME_REMAINING: %.3fs },
SLOWEST_THREAD_INFO:{ THREAD_ID: %ld, SORTRD_ROW_COUNT: %ld },
TEMP_FILE_INFO:{ DUMP_SIZE: %ld, COMPRESS_TYPE: %s }build local index batch num: 2 表示目前是局部索引的第二批执行。其他语义不变。
新增 __all_virtual_ddl_diagnose_info 虚拟表
各字段说明
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
tenant_id | bigint(20) | NO | PK | 租户ID | |
ddl_task_id | bigint(20) | NO | PK | 任务ID | |
object_table_id | bigint(20) | NO | PK | 目标表ID | |
op_type | varchar(64) | NO | PK | DDL操作类型 | |
create_time | timestamp | NO | PK | DDL开始时间 | |
finish_time | timestamp | NULL | DDL结束时间 | ||
diagnose_info | varchar(1024) | NULL | DDL执行诊断信息 |
目前虚拟表会输出正在进行的DDL和最近完成的100条DDL的信息。因为只能对走sql 路径的DDL进行诊断。类型如下
DDL_CREATE_INDEX
DDL_CREATE_PARTITIONED_LOCAL_INDEX
DDL_MODIFY_COLUMN
DDL_ADD_PRIMARY_KEY
DDL_ALTER_PRIMARY_KEY
DDL_ALTER_PARTITION_BY
DDL_TABLE_REDEFINITION
诊断信息输出
还未开始扫描的DDL
对于正在进行的DDL, 在扫描开始前无统计信息,此时message为空。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
| 1004 | 2079489 | 500005 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 19:21:13.619570 | 2024-12-30 19:21:14.297205 | |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
还未开始排序的DDL
在扫描阶段无法感知异常信息。在这个阶段因此仅输出 “Scanning” 信息。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
| 1004 | 2079489 | 500005 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 19:21:13.619570 | 2024-12-30 19:21:16.706037 | Scanning |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------+
空算子DDL
没有找到对应的sql_monitor_node或者算子总行数为0的情况。此时不进行诊断。不会出现在虚拟表中。
开始排序的DDL
THREAD_INFO: { parallel_num : %ld, row_max: %ld, row_max_thread_id: %ld, row_min: %ld, row_min_thread_id: %ld }
展示并行度,还有行数最多和最少的线程信息。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 2079489 | 500005 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 15:35:37.995584 | 2024-12-30 19:21:16.706037 | THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3025919, row_min: 6243328, row_min_thread_id: 3025940 } |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------+
执行完成的DDL
对于已经执行成功的DDL, 会输出以下信息
THREAD_INFO: { parallel_num : %ld, row_max: %ld, row_max_thread_id: %ld, row_min: %ld, row_min_thread_id: %ld slowest_thread_id: %ld },
TIME_INFO: { scan_time: %.3fs, sort_time: %.3fs, insert_time: %.3fs }
除了 并行度, 还有行数最多和最少的线程信息,另外还有执行最慢的线程id。
另外,还统计了各个阶段花费的时间信息。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 2069349 | 500907 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 19:16:28.035185 | 2024-12-30 19:19:39.831287 | THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3025937, row_min: 6243328, row_min_thread_id: 3025940 slowest_thread_id: 3025936 }, TIME_INFO: { scan_time: 50.350s, sort_time: 70.715s, insert_time: 61.291s } |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
发生过重试的DDL
发生重试的DDL ,会在最前面输出尝试次数信息
正在进行的DDL输出 当前次数,已经完成的DDL输出最高次数
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 2069349 | 500907 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 19:16:28.035185 | 2024-12-30 19:19:39.831287 | try count: 2, THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3025937, row_min: 6243328, row_min_thread_id: 3025940 slowest_thread_id: 3025936 }, TIME_INFO: { scan_time: 50.350s, sort_time: 70.715s, insert_time: 61.291s } |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
异常诊断信息输出
分析诊断结果,如果存在异常场景,会输出诊断提醒。
数据倾斜:
当行数最多的线程和最低的线程相差大于 100%时,会认为存在数据倾斜,会在上面诊断信息的基础上添加以下信息。
The data skew is significant, with a low sampling rate or uneven sampling.
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 4000296 | 501821 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-31 09:47:39.521994 | 2024-12-31 09:50:25.560664 | THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3913304, row_min: 6243328, row_min_thread_id: 3913307 slowest_thread_id: 3913304 }, TIME_INFO: { scan_time: 26.608s, sort_time: 68.728s, insert_time: 60.114s }, DIAGNOSE_CASE: { The data skew is significant, with a low sampling rate or uneven sampling. }|
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
并行度低于预期
当实际处理数据的的算子数低于并行度时,会输出以下信息
The number of threads with data is less than the dop. real_parallelism: %ld
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 3046793 | 501563 | DDL_CREATE_INDEX | 2024-12-30 20:59:49.831452 | 2024-12-30 20:59:50.755917 | THREAD_INFO: { parallel_num : 5, row_max: 1, row_max_thread_id: 3150398, row_min: 1, row_min_thread_id: 3150398 slowest_thread_id: 3150399 }, TIME_INFO: { scan_time: 0.030s, sort_time: 0.002s, insert_time: 0.051s }, DIAGNOSE_CASE: { The number of threads with data is less than the dop. real_parallelism: 4. }|
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
局部索引的特殊处理
对于局部索引分批执行的场景。不输出各阶段的执行时间,同时也不进行数据倾斜度的计算。只诊断并行度低于预期的场景。同时会输出批次信息。正在进行的DDL输出当前处理的批次, 完成的DDL输出总批次。
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tenant_id | ddl_task_id | object_table_id | opname | create_time | finish_time | diagnose_info |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1004 | 2079489 | 500005 | DDL_CREATE_PARTITIONED_LOCAL_INDEX | 2024-12-30 15:35:37.995584 | 2024-12-30 19:21:16.706037 | build local index batch num: 3, THREAD_INFO: { parallel_num : 5, row_max: 6512288, row_max_thread_id: 3025919, row_min: 6243328, row_min_thread_id: 3025940 } |
+-----------+-------------+-----------------+------------------------------------+----------------------------+----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+