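[Linux] Analyzing a vmcore generated by a hung_task panic
Introduction
1. The problem encountered:
The first part of the log is an oom-kill; the second part is a hung_task panic.
2. The two parts are explained separately below. The full log is: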
[75834.243209] kodo invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), order=0, oom_score_adj=968
[75834.245657] CPU: 0 PID: 23476 Comm: kodo Kdump: loaded Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[75834.248210] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.11.0-2.el7 04/01/2014
[75834.250623] Call Trace:
[75834.252090] dump_stack+0x66/0x8b
[75834.253680] dump_header+0x4a/0x1ec
[75834.255234] oom_kill_process+0x24f/0x270
[75834.257018] out_of_memory+0x141/0x570
[75834.259117] mem_cgroup_out_of_memory+0xb5/0xd0
[75834.260763] try_charge+0x723/0x770
[75834.262496] ? mem_cgroup_commit_charge+0x7f/0x4e0
[75834.264713] mem_cgroup_try_charge+0x86/0x180
[75834.266306] __add_to_page_cache_locked+0x60/0x290
[75834.268318] add_to_page_cache_lru+0x4a/0xf0
[75834.270041] iomap_readpages_actor+0x129/0x2a0
[75834.271760] ? iomap_dio_bio_end_io+0x190/0x190
[75834.273816] iomap_apply+0xba/0x160
[75834.275765] ? iomap_dio_bio_end_io+0x190/0x190
[75834.277348] iomap_readpages+0xaa/0x1e0
[75834.279000] ? iomap_dio_bio_end_io+0x190/0x190
[75834.280679] read_pages+0x6d/0x1d0
[75834.282123] ? __do_page_cache_readahead+0x16c/0x1d0
[75834.283745] __do_page_cache_readahead+0x16c/0x1d0
[75834.285347] filemap_fault+0x298/0x8a0
[75834.286755] ? kmem_cache_free+0x180/0x1b0
[75834.288988] __xfs_filemap_fault+0x72/0x200 [xfs]
[75834.290618] __do_fault+0x33/0x110
[75834.291988] do_fault+0x12e/0x490
[75834.293451] __handle_mm_fault+0x613/0x690
[75834.295491] handle_mm_fault+0xc4/0x200
[75834.296884] __do_page_fault+0x240/0x4c0
[75834.298539] do_page_fault+0x31/0x130
[75834.300068] ? async_page_fault+0x8/0x30
[75834.301720] async_page_fault+0x1e/0x30
[75834.303468] memory: usage 12582792kB, limit 12582912kB, failcnt 317157
[75834.305486] memory+swap: usage 12582792kB, limit 9007199254740988kB, failcnt 0
[75834.308073] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[75834.310515] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.317024] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/5feef662206c588f4751444e30c4257c1dfe6f62bec8d5c20bec457186b70fe7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.324632] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734: cache:17524KB rss:12562956KB rss_huge:6912000KB shmem:0KB mapped_file:1188KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:12562956KB inactive_file:16140KB active_file:12KB unevictable:0KB
[75834.333179] Tasks state (memory values in pages):
[75834.335680] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[75834.338171] [ 22697] 0 22697 256 1 32768 0 -998 pause
[75834.340836] [ 23362] 0 23362 3470438 3140655 25550848 0 968 kodo
[75834.343473] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734,mems_allowed=0,oom_memcg=/kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a,task_memcg=/kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734,task=kodo,pid=23362,uid=0
[75834.354192] Memory cgroup out of memory: Kill process 23362 (kodo) score 1968 or sacrifice child
[75834.357745] Killed process 23362 (kodo) total-vm:13881752kB, anon-rss:12562620kB, file-rss:0kB, shmem-rss:0kB
[75834.736239] oom_reaper: reaped process 23362 (kodo), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[76349.203912] TCP: request_sock_TCP: Possible SYN flooding on port 9527. Sending cookies. Check SNMP counters.
[85988.503793] INFO: task kodo:2939685 blocked for more than 1200 seconds.
[85988.506238] Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.508710] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85988.512771] kodo D 0 2939685 2939616 0x00000080
[85988.515238] Call Trace:
[85988.517192] ? __schedule+0x286/0x740
[85988.517199] schedule+0x29/0xc0
[85988.521494] schedule_preempt_disabled+0xa/0x10
[85988.523722] __mutex_lock.isra.7+0x20b/0x470
[85988.525780] ? fuse_lock_inode+0x27/0x30 [fuse]
[85988.527911] fuse_lock_inode+0x27/0x30 [fuse]
[85988.529928] fuse_lookup+0x46/0x140 [fuse]
[85988.531907] ? d_alloc_parallel+0x95/0x4d0
[85988.533942] __lookup_slow+0x97/0x150
[85988.536004] lookup_slow+0x35/0x50
[85988.537910] walk_component+0x1c4/0x340
[85988.539882] ? fuse_permission+0x30/0x150 [fuse]
[85988.541908] link_path_walk.part.33+0x2a6/0x510
[85988.544042] ? path_init+0x192/0x320
[85988.545916] path_lookupat+0x95/0x210
[85988.547837] filename_lookup+0xb6/0x190
[85988.549753] ? audit_alloc_name+0x7e/0xd0
[85988.551710] ? path_get+0x11/0x30
[85988.553669] ? __audit_getname+0x9f/0xb0
[85988.555655] ? getname_flags+0xb9/0x1e0
[85988.557672] ? vfs_statx+0x73/0xe0
[85988.559591] vfs_statx+0x73/0xe0
[85988.561361] __do_sys_newfstatat+0x31/0x70
[85988.563200] ? syscall_trace_enter+0x1df/0x2e0
[85988.565182] ? __audit_syscall_exit+0x238/0x2c0
[85988.567047] do_syscall_64+0x5f/0x240
[85988.568865] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[85988.571261] INFO: task kodo:2939695 blocked for more than 1200 seconds.
[85988.573951] Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.576253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85988.578441] kodo D 0 2939695 2939616 0x00000080
[85988.580330] Call Trace:
[85988.581734] ? __schedule+0x286/0x740
[85988.583394] schedule+0x29/0xc0
[85988.584843] schedule_preempt_disabled+0xa/0x10
[85988.586632] __mutex_lock.isra.7+0x20b/0x470
[85988.588191] ? fuse_lock_inode+0x27/0x30 [fuse]
[85988.589818] fuse_lock_inode+0x27/0x30 [fuse]
[85988.591278] fuse_lookup+0x46/0x140 [fuse]
[85988.592731] ? d_alloc_parallel+0x95/0x4d0
[85988.594174] __lookup_slow+0x97/0x150
[85988.595469] lookup_slow+0x35/0x50
[85988.596873] walk_component+0x1c4/0x340
[85988.598236] ? fuse_permission+0x30/0x150 [fuse]
[85988.599717] link_path_walk.part.33+0x2a6/0x510
[85988.601101] ? path_init+0x192/0x320
[85988.602401] path_lookupat+0x95/0x210
[85988.603898] filename_lookup+0xb6/0x190
[85988.605247] ? audit_alloc_name+0x7e/0xd0
[85988.606482] ? path_get+0x11/0x30
[85988.607660] ? __audit_getname+0x9f/0xb0
[85988.609270] ? getname_flags+0xb9/0x1e0
[85988.610547] ? vfs_statx+0x73/0xe0
[85988.611757] vfs_statx+0x73/0xe0
[85988.612875] __do_sys_newfstatat+0x31/0x70
[85988.615046] ? syscall_trace_enter+0x1df/0x2e0
[85988.616437] ? __audit_syscall_exit+0x238/0x2c0
[85988.617825] do_syscall_64+0x5f/0x240
[85988.619091] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[85988.620778] Kernel panic - not syncing: hung_task: blocked tasks
[85988.622425] CPU: 15 PID: 175 Comm: khungtaskd Kdump: loaded Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.625743] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.11.0-2.el7 04/01/2014
[85988.627659] Call Trace:
[85988.628806] dump_stack+0x66/0x8b
[85988.630119] panic+0x106/0x2b6
[85988.631539] watchdog+0x270/0x400
[85988.632777] ? hungtask_pm_notify+0x40/0x40
[85988.634134] kthread+0x113/0x130
[85988.635459] ? kthread_create_worker_on_cpu+0x70/0x70
[85988.636981] ret_from_fork+0x35/0x40
Analysis of the oom-kill log
The relevant log excerpt:
[75834.243209] kodo invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), order=0, oom_score_adj=968
[75834.245657] CPU: 0 PID: 23476 Comm: kodo Kdump: loaded Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[75834.248210] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.11.0-2.el7 04/01/2014
[75834.250623] Call Trace:
[75834.252090] dump_stack+0x66/0x8b
[75834.253680] dump_header+0x4a/0x1ec
[75834.255234] oom_kill_process+0x24f/0x270
[75834.257018] out_of_memory+0x141/0x570
[75834.259117] mem_cgroup_out_of_memory+0xb5/0xd0
[75834.260763] try_charge+0x723/0x770
[75834.262496] ? mem_cgroup_commit_charge+0x7f/0x4e0
[75834.264713] mem_cgroup_try_charge+0x86/0x180
[75834.266306] __add_to_page_cache_locked+0x60/0x290
[75834.268318] add_to_page_cache_lru+0x4a/0xf0
[75834.270041] iomap_readpages_actor+0x129/0x2a0
[75834.271760] ? iomap_dio_bio_end_io+0x190/0x190
[75834.273816] iomap_apply+0xba/0x160
[75834.275765] ? iomap_dio_bio_end_io+0x190/0x190
[75834.277348] iomap_readpages+0xaa/0x1e0
[75834.279000] ? iomap_dio_bio_end_io+0x190/0x190
[75834.280679] read_pages+0x6d/0x1d0
[75834.282123] ? __do_page_cache_readahead+0x16c/0x1d0
[75834.283745] __do_page_cache_readahead+0x16c/0x1d0
[75834.285347] filemap_fault+0x298/0x8a0
[75834.286755] ? kmem_cache_free+0x180/0x1b0
[75834.288988] __xfs_filemap_fault+0x72/0x200 [xfs]
[75834.290618] __do_fault+0x33/0x110
[75834.291988] do_fault+0x12e/0x490
[75834.293451] __handle_mm_fault+0x613/0x690
[75834.295491] handle_mm_fault+0xc4/0x200
[75834.296884] __do_page_fault+0x240/0x4c0
[75834.298539] do_page_fault+0x31/0x130
[75834.300068] ? async_page_fault+0x8/0x30
[75834.301720] async_page_fault+0x1e/0x30
[75834.303468] memory: usage 12582792kB, limit 12582912kB, failcnt 317157
[75834.305486] memory+swap: usage 12582792kB, limit 9007199254740988kB, failcnt 0
[75834.308073] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[75834.310515] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.317024] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/5feef662206c588f4751444e30c4257c1dfe6f62bec8d5c20bec457186b70fe7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.324632] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734: cache:17524KB rss:12562956KB rss_huge:6912000KB shmem:0KB mapped_file:1188KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:12562956KB inactive_file:16140KB active_file:12KB unevictable:0KB
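Part 1: a kodo thread (PID 23476) invoked the oom-killer while allocating a single page (order=0, GFP_NOFS). The later oom-kill line reports constraint=CONSTRAINT_MEMCG, so the shortage is in the container's memory cgroup rather than in the system as a whole.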
[75834.243209] kodo invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), order=0, oom_score_adj=968
[75834.245657] CPU: 0 PID: 23476 Comm: kodo Kdump: loaded Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
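Part 2: the call stack shows in detail how the kernel reached the OOM killer. Read from the bottom up: a page fault on a file mapping backed by XFS started readahead, adding the new page to the page cache required charging it to the memory cgroup (mem_cgroup_try_charge, try_charge), the charge could not be satisfied, and mem_cgroup_out_of_memory called out_of_memory and oom_kill_process to reclaim memory by killing a task.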
[75834.250623] Call Trace:
[75834.252090] dump_stack+0x66/0x8b
[75834.253680] dump_header+0x4a/0x1ec
[75834.255234] oom_kill_process+0x24f/0x270
[75834.257018] out_of_memory+0x141/0x570
[75834.259117] mem_cgroup_out_of_memory+0xb5/0xd0
[75834.260763] try_charge+0x723/0x770
[75834.262496] ? mem_cgroup_commit_charge+0x7f/0x4e0
[75834.264713] mem_cgroup_try_charge+0x86/0x180
[75834.266306] __add_to_page_cache_locked+0x60/0x290
[75834.268318] add_to_page_cache_lru+0x4a/0xf0
[75834.270041] iomap_readpages_actor+0x129/0x2a0
[75834.271760] ? iomap_dio_bio_end_io+0x190/0x190
[75834.273816] iomap_apply+0xba/0x160
[75834.275765] ? iomap_dio_bio_end_io+0x190/0x190
[75834.277348] iomap_readpages+0xaa/0x1e0
[75834.279000] ? iomap_dio_bio_end_io+0x190/0x190
[75834.280679] read_pages+0x6d/0x1d0
[75834.282123] ? __do_page_cache_readahead+0x16c/0x1d0
[75834.283745] __do_page_cache_readahead+0x16c/0x1d0
[75834.285347] filemap_fault+0x298/0x8a0
[75834.286755] ? kmem_cache_free+0x180/0x1b0
[75834.288988] __xfs_filemap_fault+0x72/0x200 [xfs]
[75834.290618] __do_fault+0x33/0x110
[75834.291988] do_fault+0x12e/0x490
[75834.293451] __handle_mm_fault+0x613/0x690
[75834.295491] handle_mm_fault+0xc4/0x200
[75834.296884] __do_page_fault+0x240/0x4c0
[75834.298539] do_page_fault+0x31/0x130
[75834.300068] ? async_page_fault+0x8/0x30
[75834.301720] async_page_fault+0x1e/0x30
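Frames prefixed with "?" in a call trace are addresses the kernel found on the stack but could not confirm as part of the actual call chain, so they are usually skipped when reading a trace. The following is a minimal Python sketch, not part of the original analysis, that keeps only the confirmed frames; the function name reliable_frames and the regular expression are illustrative assumptions.

```python
import re

# One frame of a dmesg call trace, for example:
#   [75834.288988] __xfs_filemap_fault+0x72/0x200 [xfs]
# A leading "?" marks a frame the unwinder could not confirm.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(lines):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match and not match.group("guess"):
            frames.append((match.group("func"), match.group("module")))
    return frames


sample = [
    "[75834.250623] Call Trace:",
    "[75834.260763] try_charge+0x723/0x770",
    "[75834.262496] ? mem_cgroup_commit_charge+0x7f/0x4e0",
    "[75834.264713] mem_cgroup_try_charge+0x86/0x180",
    "[75834.288988] __xfs_filemap_fault+0x72/0x200 [xfs]",
]

for func, module in reliable_frames(sample):
    # module is None for functions built into the core kernel
    print(func, module or "(core kernel)")
```

The module name in square brackets is kept because it shows which subsystem a frame belongs to (xfs here, fuse in the hung-task traces).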
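Part 3: the cgroup memory limit is 12 GiB, current usage has reached that limit, and the limit has been hit about 317,000 times.
memory:
usage 12582792kB: current memory usage is 12,582,792 kB.
limit 12582912kB: the memory limit is 12,582,912 kB (exactly 12 GiB).
failcnt 317157: usage hit the limit 317,157 times, and each hit is a failed charge attempt.
memory+swap:
usage 12582792kB: current usage of memory plus swap.
limit 9007199254740988kB: the memory+swap limit is so large that it is effectively unlimited.
failcnt 0: the memory+swap limit has never been hit.
kmem (kernel memory):
usage 0kB: kernel memory usage accounted to this cgroup is 0 kB.
limit 9007199254740988kB: the kernel memory limit is also effectively unlimited.
failcnt 0: no kernel memory charge has failed.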
[75834.303468] memory: usage 12582792kB, limit 12582912kB, failcnt 317157
[75834.305486] memory+swap: usage 12582792kB, limit 9007199254740988kB, failcnt 0
[75834.308073] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[75834.310515] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.317024] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/5feef662206c588f4751444e30c4257c1dfe6f62bec8d5c20bec457186b70fe7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[75834.324632] Memory cgroup stats for /kubepods/burstable/podd5e7b3e0-de6a-4965-91c6-7cd399c77b7a/4e74f074587671f5e770d3f8071c630a70ede73ee423d59a6dd49149c3a6c734: cache:17524KB rss:12562956KB rss_huge:6912000KB shmem:0KB mapped_file:1188KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:12562956KB inactive_file:16140KB active_file:12KB unevictable:0KB
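The three counter lines can be checked mechanically. The sketch below is an illustration only (parse_counters and COUNTER_RE are names made up here): it parses the usage/limit/failcnt lines and converts the memory limit to GiB to confirm that the pod was running right at its limit.

```python
import re

# Matches "memory", "memory+swap" and "kmem" counter lines from the OOM report.
COUNTER_RE = re.compile(
    r"(?P<name>memory\+swap|memory|kmem): usage (?P<usage>\d+)kB, "
    r"limit (?P<limit>\d+)kB, failcnt (?P<failcnt>\d+)"
)


def parse_counters(log_text):
    """Return {counter_name: {"usage": kB, "limit": kB, "failcnt": n}}."""
    counters = {}
    for match in COUNTER_RE.finditer(log_text):
        counters[match.group("name")] = {
            key: int(match.group(key)) for key in ("usage", "limit", "failcnt")
        }
    return counters


log_text = """\
[75834.303468] memory: usage 12582792kB, limit 12582912kB, failcnt 317157
[75834.305486] memory+swap: usage 12582792kB, limit 9007199254740988kB, failcnt 0
[75834.308073] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
"""

counters = parse_counters(log_text)
mem = counters["memory"]
limit_gib = mem["limit"] / (1024 * 1024)      # kB -> GiB
headroom_kb = mem["limit"] - mem["usage"]     # free space left under the limit

print(f"limit = {limit_gib} GiB, headroom = {headroom_kb} kB, failcnt = {mem['failcnt']}")
```

With only 120 kB of headroom left under a 12 GiB limit, even a single page-cache charge can fail, which is consistent with the order=0 allocation in the first line of the report.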
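Summary
1. The Kubernetes pod has a memory limit of 12 GiB, and memory usage inside the pod had reached that limit.
2. The log shows that the pod's memory limit caused a memory charge to fail and triggered the kernel's cgroup OOM killer. kodo (pid 23362), which held almost all of the pod's memory, was chosen as the best victim and killed so that the rest of the workload could keep running.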
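The numbers in the "Tasks state" table and the "Kill process" line can be cross-checked with a little arithmetic. The sketch below assumes 4 KiB pages and assumes this vendor kernel scores tasks the way upstream 4.19 does (rss + swap entries + page-table pages, plus oom_score_adj scaled by the cgroup limit, then normalized to a 0..2000 range); neither assumption is confirmed by the original log, and oom_score is a name made up for illustration.

```python
PAGE_KB = 4  # assumption: 4 KiB pages, the x86_64 default

# Values copied from the "Tasks state" line for pid 23362 (kodo)
rss_pages = 3140655
pgtables_bytes = 25550848
swapents = 0
oom_score_adj = 968

# memory limit of the cgroup that hit OOM, from the "memory:" counter line
limit_kb = 12582912

# The table reports memory in pages; convert to kB to compare with "anon-rss".
rss_kb = rss_pages * PAGE_KB
total_pages = limit_kb // PAGE_KB


def oom_score(rss, swap, pgtable_bytes, adj, totalpages, page_bytes=4096):
    """Approximate the printed OOM score (assumed upstream 4.19 behaviour)."""
    points = rss + swap + pgtable_bytes // page_bytes
    points += adj * (totalpages // 1000)   # integer scaling, as in the kernel
    return points * 1000 // totalpages


score = oom_score(rss_pages, swapents, pgtables_bytes, oom_score_adj, total_pages)
print(f"rss = {rss_kb} kB, score = {score}")
```

Under these assumptions the result agrees with the log: 12562620 kB matches "anon-rss:12562620kB" in the "Killed process" line, and the score matches "score 1968". Roughly half of the score comes from the memory actually used and half from oom_score_adj=968, which is why kodo outranks the pause process (oom_score_adj=-998).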
Analysis of the hung_task panic log
The relevant log excerpt:
[85988.571261] INFO: task kodo:2939695 blocked for more than 1200 seconds.
[85988.573951] Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.576253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85988.578441] kodo D 0 2939695 2939616 0x00000080
[85988.580330] Call Trace:
[85988.581734] ? __schedule+0x286/0x740
[85988.583394] schedule+0x29/0xc0
[85988.584843] schedule_preempt_disabled+0xa/0x10
[85988.586632] __mutex_lock.isra.7+0x20b/0x470
[85988.588191] ? fuse_lock_inode+0x27/0x30 [fuse]
[85988.589818] fuse_lock_inode+0x27/0x30 [fuse]
[85988.591278] fuse_lookup+0x46/0x140 [fuse]
[85988.592731] ? d_alloc_parallel+0x95/0x4d0
[85988.594174] __lookup_slow+0x97/0x150
[85988.595469] lookup_slow+0x35/0x50
[85988.596873] walk_component+0x1c4/0x340
[85988.598236] ? fuse_permission+0x30/0x150 [fuse]
[85988.599717] link_path_walk.part.33+0x2a6/0x510
[85988.601101] ? path_init+0x192/0x320
[85988.602401] path_lookupat+0x95/0x210
[85988.603898] filename_lookup+0xb6/0x190
[85988.605247] ? audit_alloc_name+0x7e/0xd0
[85988.606482] ? path_get+0x11/0x30
[85988.607660] ? __audit_getname+0x9f/0xb0
[85988.609270] ? getname_flags+0xb9/0x1e0
[85988.610547] ? vfs_statx+0x73/0xe0
[85988.611757] vfs_statx+0x73/0xe0
[85988.612875] __do_sys_newfstatat+0x31/0x70
[85988.615046] ? syscall_trace_enter+0x1df/0x2e0
[85988.616437] ? __audit_syscall_exit+0x238/0x2c0
[85988.617825] do_syscall_64+0x5f/0x240
[85988.619091] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[85988.620778] Kernel panic - not syncing: hung_task: blocked tasks
[85988.622425] CPU: 15 PID: 175 Comm: khungtaskd Kdump: loaded Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.625743] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 1.11.0-2.el7 04/01/2014
[85988.627659] Call Trace:
[85988.628806] dump_stack+0x66/0x8b
[85988.630119] panic+0x106/0x2b6
[85988.631539] watchdog+0x270/0x400
[85988.632777] ? hungtask_pm_notify+0x40/0x40
[85988.634134] kthread+0x113/0x130
[85988.635459] ? kthread_create_worker_on_cpu+0x70/0x70
[85988.636981] ret_from_fork+0x35/0x40
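Part 1:
The task kodo:2939695 was flagged as a hung task because it had been blocked (state D, uninterruptible sleep) for too long. The log also notes that running "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this check. The timeout on this system is 1200 seconds; the upstream kernel default is 120 seconds, so the value here appears to have been raised.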
[85988.571261] INFO: task kodo:2939695 blocked for more than 1200 seconds.
[85988.573951] Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
[85988.576253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[85988.578441] kodo D 0 2939695 2939616 0x00000080
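Two sysctls decide what happens here: hung_task_timeout_secs sets how long a task may stay blocked before it is reported, and hung_task_panic decides whether the kernel only warns or also panics. The following minimal sketch reads both on a live Linux system; the helper names and the base parameter are illustrative assumptions, and missing files (for example on a kernel built without hung-task detection) are reported as None.

```python
from pathlib import Path


def read_hung_task_settings(base="/proc/sys/kernel"):
    """Read the hung-task sysctls; a missing or unreadable file maps to None."""
    settings = {}
    for name in ("hung_task_timeout_secs", "hung_task_panic"):
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def describe(settings):
    """Turn the raw sysctl values into a one-line description."""
    timeout = settings.get("hung_task_timeout_secs")
    panic = settings.get("hung_task_panic")
    if timeout is None:
        return "hung task detection is not available"
    if timeout == 0:
        return "hung task detection is disabled"
    action = "panic" if panic else "log a warning"
    return f"tasks blocked for more than {timeout}s make the kernel {action}"


print(describe(read_hung_task_settings()))
```

On the machine in this article the expected combination is a timeout of 1200 with hung_task_panic enabled, since the log shows both the 1200-second warning and the subsequent panic.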
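Part 2:
A kernel panic was triggered. The message "hung_task: blocked tasks" is what khungtaskd prints when it finds hung tasks while the hung_task_panic sysctl is enabled.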
[85988.620778] Kernel panic - not syncing: hung_task: blocked tasks
[85988.622425] CPU: 15 PID: 175 Comm: khungtaskd Kdump: loaded Tainted: G OE 4.19.90-2305.1.0.0199.78.uel20.x86_64 #1
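Part 3:
The call stack of the panic itself. watchdog is the main loop of the khungtaskd kernel thread, and it called panic directly.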
[85988.627659] Call Trace:
[85988.628806] dump_stack+0x66/0x8b
[85988.630119] panic+0x106/0x2b6
[85988.631539] watchdog+0x270/0x400
[85988.632777] ? hungtask_pm_notify+0x40/0x40
[85988.634134] kthread+0x113/0x130
[85988.635459] ? kthread_create_worker_on_cpu+0x70/0x70
[85988.636981] ret_from_fork+0x35/0x40
Analyzing the vmcore generated by the panic
Summary information shown when the vmcore is opened:
KERNEL: vmlinux [TAINTED]
DUMPFILE: /root/vmcore [PARTIAL DUMP]
CPUS: 32
DATE: Sat Aug 10 02:05:30 CST 2024
UPTIME: 23:53:08
LOAD AVERAGE: 36.80, 28.43, 21.99
TASKS: 2151
NODENAME: tcs-30-34-22-251
RELEASE: 4.19.90-2305.1.0.0199.78.uel20.x86_64
VERSION: #1 SMP Wed Feb 28 12:31:25 CST 2024
MACHINE: x86_64 (2699 Mhz)
MEMORY: 64 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
PID: 175
COMMAND: "khungtaskd"
TASK: ffff9a2c46e2b000 [THREAD_INFO: ffff9a2c46e2b000]
CPU: 15
STATE: TASK_RUNNING (PANIC)
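Explanation:
KERNEL: the kernel image with debug symbols used to read the dump. [TAINTED] means the kernel was tainted, for example by out-of-tree or unsigned modules (the "OE" flags in the log).
DUMPFILE: location of the crash dump file. [PARTIAL DUMP] means some pages were filtered out when the dump was saved, so not all memory is present.
CPUS: the system has 32 CPUs.
DATE: the time of the crash.
UPTIME: the system had been up for 23 hours 53 minutes.
LOAD AVERAGE: system load averaged over the last 1, 5, and 15 minutes; the values are fairly high.
TASKS: the total number of tasks at the time of the crash, 2151.
NODENAME: the hostname.
RELEASE: the kernel release.
VERSION: kernel build number and build time.
MACHINE: machine architecture and CPU frequency.
MEMORY: the system has 64 GB of memory.
PANIC: the panic message; the panic was raised because of hung (blocked) tasks. "not syncing" means the kernel did not flush disk buffers before stopping.
PID: the PID of the task that panicked, 175.
COMMAND: the command that was running at the time of the crash, khungtaskd, the kernel thread that detects hung tasks.
TASK: the address of the panicking task's task structure.
CPU: the crash happened on CPU 15.
STATE: the task state is TASK_RUNNING, and it is the task that panicked.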
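Two of these header fields are easier to interpret after a small calculation: UPTIME converted to seconds can be compared with the dmesg timestamps, and LOAD AVERAGE only means something relative to the CPU count. The sketch below is illustrative (uptime_seconds is a made-up helper) and assumes the plain HH:MM:SS form of UPTIME shown in this dump.

```python
def uptime_seconds(text):
    """Convert an 'HH:MM:SS' uptime string to seconds (no 'N days' prefix)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the vmcore header above
uptime = uptime_seconds("23:53:08")
cpus = 32
load_1min, load_5min, load_15min = 36.80, 28.43, 21.99

# Load per CPU: values above 1.0 mean more runnable or uninterruptible
# tasks than CPUs on average.
per_cpu = [load / cpus for load in (load_1min, load_5min, load_15min)]

print(f"uptime = {uptime} s")
print("load per CPU:", ", ".join(f"{value:.2f}" for value in per_cpu))
```

The uptime works out to 85988 seconds, the same second as the "[85988.620778] Kernel panic" timestamp, which confirms that the vmcore belongs to this panic. On Linux, tasks in uninterruptible sleep (state D) also count toward the load average, so blocked kodo threads raise the load figure even though they use no CPU.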
Kernel stack of the panicking task:
PID: 175 TASK: ffff9a2c46e2b000 CPU: 15 COMMAND: "khungtaskd"
 #0 [ffff9a303c0b7d18] machine_kexec at ffffffffb6857b0f
 #1 [ffff9a303c0b7d70] __crash_kexec at ffffffffb695b981
 #2 [ffff9a303c0b7e30] panic at ffffffffb68b0c70
 #3 [ffff9a303c0b7eb8] watchdog at ffffffffb698f5e0
 #4 [ffff9a303c0b7f10] kthread at ffffffffb68d54e3
 #5 [ffff9a303c0b7f50] ret_from_fork at ffffffffb7400245
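Explanation:
The panic was triggered by PID 175 (command khungtaskd) running on CPU 15.
Summary
1. The kodo tasks stayed blocked for more than 1200 seconds, and the kernel logged a hung-task warning for each of them. Their call traces show them waiting for a mutex in fuse_lock_inode during a FUSE lookup.
2. With the load high and kodo blocked for so long, the khungtaskd kernel thread detected the situation and triggered the panic.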
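When a log contains many hung-task reports, it helps to extract them first and see which tasks were blocked and since when. The sketch below is an illustration with made-up names (HUNG_RE, hung_tasks): it pulls the task name, PID, and timeout out of each "INFO: task ... blocked" line and estimates the latest moment at which the task can have become blocked.

```python
import re

# Matches lines such as:
#   [85988.503793] INFO: task kodo:2939685 blocked for more than 1200 seconds.
HUNG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def hung_tasks(lines):
    """Return one dict per hung-task report found in the given log lines."""
    reports = []
    for line in lines:
        match = HUNG_RE.match(line.strip())
        if match:
            ts = float(match.group("ts"))
            secs = int(match.group("secs"))
            reports.append({
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
                "timeout": secs,
                # the task was already blocked at this point in time
                "blocked_since_at_latest": ts - secs,
            })
    return reports


log_lines = [
    "[85988.503793] INFO: task kodo:2939685 blocked for more than 1200 seconds.",
    "[85988.571261] INFO: task kodo:2939695 blocked for more than 1200 seconds.",
    "[85988.620778] Kernel panic - not syncing: hung_task: blocked tasks",
]

for report in hung_tasks(log_lines):
    print(report["comm"], report["pid"], round(report["blocked_since_at_latest"], 1))
```

Both reported tasks are kodo threads with the same parent (2939616 in the task state lines), and both had been blocked since about second 84788 of uptime at the latest.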
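Detailed answer
Both the OOM log and the hung-task log point to the kodo process, so it is reasonable to conclude that this process drove the system load up and ultimately led to the panic.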