
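Reflections on a .NET Program Crash Caused by an Abnormal Thread Exit

I: Background

1. The story

The day before yesterday I received a crash dump from a .NET program. After some analysis, the root cause turned out to be a .NET managed thread (DBG=XXXX) that had exited abnormally, as shown below: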


0:011> !t
ThreadCount:      17
UnstartedThread:  0
BackgroundThread: 16
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                            Lock
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1     84d8 000001C0801EAC20    26020 Preemptive  0000000000000000:0000000000000000 000001c080266300 -00001 STA
   3    2     9d78 000001C0801F8210    2b220 Preemptive  0000000000000000:0000000000000000 000001c080266300 -00001 MTA (Finalizer)
   4    4     8760 000001C08466C800  102b220 Preemptive  0000000000000000:0000000000000000 000001c080266300 -00001 MTA (Threadpool Worker)
...
  44   16     b2fc 000001C08F949450  102b220 Preemptive  0000000000000000:0000000000000000 000001c080266300 -00001 MTA (GC) (Threadpool Worker)
  46   15     9904 000001C08F9487B0  102b220 Preemptive  0000000000000000:0000000000000000 000001c080266300 -00001 MTA (Threadpool Worker)
XXXX    3     a23c 000001C08F948E00  102b220 Preemptive  0000000000000000:0000000000000000 000001c080266300 -00001 Ukn (Threadpool Worker)

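Because the thread exited abnormally, the CLR knows nothing about it. When a GC is triggered, it still looks for reference roots on this XXXX thread. The thread no longer exists, so touching its stack memory is an access violation. The call stack through ScanStackRoots shows this clearly: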


0:011> .ecxr
rax=00007ffdbefcc8a0 rbx=000000a42007f5f0 rcx=000000a42187f688
rdx=0000000000000000 rsi=000000a42007ee60 rdi=000000a42007f100
rip=00007ffdbec36cbb rsp=000000a42007f828 rbp=000001c08f948e00
 r8=000000a42007f910  r9=000001c08f948e00 r10=00000fffb7da5860
r11=0555501544555545 r12=ffffffffffffffff r13=0000000000000000
r14=0000000000000000 r15=00007ffdbec14fb0
iopl=0         nv up ei pl nz ac pe cy
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010211
coreclr!InlinedCallFrame::FrameHasActiveCall+0x13:
00007ffd`bec36cbb 483b01          cmp     rax,qword ptr [rcx] ds:000000a4`2187f688=????????????????
0:011> k
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr               Call Site
00 000000a4`2007f828 00007ffd`bec36c2e     coreclr!InlinedCallFrame::FrameHasActiveCall+0x13 [D:\a\_work\1\s\src\coreclr\vm\frames.h @ 2927] 
01 000000a4`2007f830 00007ffd`bec36aef     coreclr!ScanStackRoots+0x3a [D:\a\_work\1\s\src\coreclr\vm\gcenv.ee.cpp @ 121] 
02 000000a4`2007f8a0 00007ffd`bec29627     coreclr!GCToEEInterface::GcScanRoots+0x8f [D:\a\_work\1\s\src\coreclr\vm\gcenv.ee.cpp @ 282] 
03 (Inline Function) --------`--------     coreclr!GCScan::GcScanRoots+0x73 [D:\a\_work\1\s\src\coreclr\gc\gcscan.cpp @ 152] 
04 000000a4`2007f8e0 00007ffd`bec14865     coreclr!WKS::gc_heap::background_mark_phase+0xdf [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 37866] 
05 000000a4`2007f990 00007ffd`bed286a0     coreclr!WKS::gc_heap::gc1+0x511 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 22315] 
06 000000a4`2007f9f0 00007ffd`bed391c1     coreclr!WKS::gc_heap::bgc_thread_function+0x68 [D:\a\_work\1\s\src\coreclr\gc\gc.cpp @ 39244] 
07 000000a4`2007fa20 00007ffe`3533e8d7     coreclr!<lambda_7303b2ca2c5f80d5f81ddddfcd2de660>::operator()+0xa1 [D:\a\_work\1\s\src\coreclr\vm\gcenv.ee.cpp @ 1441] 
08 000000a4`2007fa50 00007ffe`363f14fc     kernel32!BaseThreadInitThunk+0x17
09 000000a4`2007fa80 00000000`00000000     ntdll!RtlUserThreadStart+0x2c

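To be honest, I have seen this kind of crash many times, but most of them involved threads created with new Thread, so intercepting Thread.StartCore with harmony was enough to find the culprit. This crash is different: the thread did not come from new Thread but from the thread pool (ThreadPool), which makes the analysis considerably harder. Since this is a retrospective, let's summarize how to approach this class of problem.

II: Reproducing the fault

1. The problem code

To keep the demo simple, C# calls into native C++ code, and the native code makes the calling thread exit abnormally via TerminateThread. First, the C++ code: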


#include <Windows.h>
#include <cstdio>

extern "C"
{
    __declspec(dllexport) void dowork();
}

void dowork()
{
    DWORD threadId = GetCurrentThreadId();
    printf("C++: current thread ID (decimal): %lu, hex: 0x%lX\n", threadId, threadId);
    printf("C++: about to exit...\n");

    // Kills the calling thread on the spot; the CLR gets no chance to clean up.
    TerminateThread(GetCurrentThread(), 1);
}

Next, call the exported dowork method from C#:


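using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

namespace Example_1_1
{
    internal class Program
    {
        static void Main(string[] args)
        {
            DoRequest();
            Console.ReadLine();
        }

        static void DoRequest()
        {
            // Task.Run schedules the lambda on a ThreadPool worker,
            // so the worker is the thread that dowork terminates.
            Task.Run(() =>
            {
                Console.WriteLine("1. Calling C++ code...");
                try
                {
                    dowork();
                    Console.WriteLine("2. C++ code finished...");
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"2. C++ code threw an exception: {ex.Message}");
                }
            });
        }

        [DllImport("Example_1_2", CallingConvention = CallingConvention.Cdecl)]
        public extern static void dowork();
    }
}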

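Finally, run the program and attach WinDbg. Sure enough, an XXXX thread shows up:

[Screenshot: WinDbg thread list containing the XXXX thread]

The fault is reproduced. The next step is to find out who made the ThreadPool thread exit abnormally.

III: Finding the first scene

1. Process Monitor

To get to the root of the problem, we need the call stack of whoever called TerminateThread. A simple, brute-force way is Process Monitor: Windows raises an event when a thread exits, and Process Monitor can capture that event together with its call stack. With that idea in mind, configure it as follows:

[Screenshot: Process Monitor configuration]

Next, run the program, attach WinDbg to the process, and look up the ID of the problem thread: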


0:005> !t
ThreadCount:      5
UnstartedThread:  0
BackgroundThread: 3
PendingThread:    0
DeadThread:       1
Hosted Runtime:   no
                                                                                                            Lock
 DBG   ID     OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
   0    1     153c 00000202C603C240    2a020 Preemptive  00000202CA819060:00000202CA81B020 00000202c6088980 -00001 MTA
   3    2      afc 00000202C60F0DB0    2b220 Preemptive  0000000000000000:0000000000000000 00000202c6088980 -00001 MTA (Finalizer)
XXXX    4     4718 00000202C6057D10  102b220 Preemptive  00000202CA80CF70:00000202CA80E740 00000202c6088980 -00001 Ukn (Threadpool Worker)
   4    5     4420 00000202C605D510  302b220 Preemptive  00000202CA80EB40:00000202CA810760 00000202c6088980 -00001 MTA (Threadpool Worker)
0:005> ? 4718
Evaluate expression: 18200 = 00000000`00004718

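The output shows that the thread that exited abnormally has OSID 0x4718, which is 18200 in decimal. Process Monitor does indeed list a Thread Exit event with Thread ID 18200:

[Screenshot: Process Monitor Thread Exit event for Thread ID 18200]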
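The `? 4718` step is just a hexadecimal-to-decimal conversion: SOS prints the OSID in hex, while Process Monitor lists thread IDs in decimal. When several dumps or logs have to be matched, the same conversion can be scripted. The following is a minimal C++ sketch; `osid_to_thread_id` is a hypothetical helper name chosen for illustration and is not part of WinDbg, SOS, or Process Monitor.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: converts an OSID as printed by SOS `!t`
// (hexadecimal, no 0x prefix) into the decimal thread ID that
// Process Monitor displays. Returns false on malformed input.
bool osid_to_thread_id(const std::string& hex, std::uint32_t& thread_id)
{
    // A Windows thread ID is a 32-bit value, so at most 8 hex digits.
    if (hex.empty() || hex.size() > 8) {
        return false;
    }

    std::uint32_t value = 0;
    for (char c : hex) {
        std::uint32_t digit = 0;
        if (c >= '0' && c <= '9') {
            digit = static_cast<std::uint32_t>(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            digit = static_cast<std::uint32_t>(c - 'a' + 10);
        } else if (c >= 'A' && c <= 'F') {
            digit = static_cast<std::uint32_t>(c - 'A' + 10);
        } else {
            return false;  // not a hex digit
        }
        value = (value << 4) | digit;
    }

    thread_id = value;
    return true;
}
```

With the OSID values from the two thread lists in this post, `4718` converts to 18200 and `a23c` converts to 41532.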

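Double-click the event and open the Stack tab. It clearly shows that a call into Example_1_2!dowork caused the exit:

[Screenshot: Stack tab of the Thread Exit event showing Example_1_2!dowork]

In a real project, once you see the dowork function you will probably know what happened, and the search scope shrinks dramatically. From there the problem should be easy to fix.

2. MinHook injection

Process Monitor is good, but it has one shortcoming: it cannot show managed stack frames. Is there a way to see the managed stack as well? There is, and it is simple: hook kernel32!TerminateThread, and whenever someone calls it, log the ID of the thread being terminated along with the call stack. The full code: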


using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

namespace Example_1_1
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Install the hook before any TerminateThread calls can occur
            TerminateThreadHook.InstallHook();
            Console.WriteLine("Hook installed. Starting test...");

            DoRequest();
            Console.ReadLine();

            // Uninstall the hook only when done. DoRequest returns before the
            // task has run, so uninstalling right after it could miss the call.
            TerminateThreadHook.UninstallHook();
        }

        static void DoRequest()
        {
            Task.Run(() =>
            {
                Console.WriteLine("1. Calling C++ code...");
                try
                {
                    dowork();
                    Console.WriteLine("2. C++ code finished...");
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"2. C++ code threw an exception: {ex.Message}");
                }
            });
        }

        [DllImport("Example_1_2", CallingConvention = CallingConvention.Cdecl)]
        public extern static void dowork();
    }

    public static class TerminateThreadHook
    {
        // TerminateThread function signature
        [UnmanagedFunctionPointer(CallingConvention.StdCall)]
        private delegate bool TerminateThreadDelegate(IntPtr hThread, uint dwExitCode);

        private static TerminateThreadDelegate _originalTerminateThread;

        // Kept in a static field so the GC cannot collect the delegate
        // while native code still holds its function pointer.
        private static TerminateThreadDelegate _detour;

        private static IntPtr _terminateThreadPtr = IntPtr.Zero;

        public static void InstallHook()
        {
            // 1. Get TerminateThread address from kernel32.dll
            _terminateThreadPtr = MinHook.GetProcAddress(MinHook.GetModuleHandle("kernel32.dll"), "TerminateThread");
            if (_terminateThreadPtr == IntPtr.Zero)
            {
                Console.WriteLine("Failed to find TerminateThread address.");
                return;
            }

            // 2. Initialize MinHook
            var status = MinHook.MH_Initialize();
            if (status != MinHook.MH_STATUS.MH_OK)
            {
                Console.WriteLine($"MH_Initialize failed: {status}");
                return;
            }

            // 3. Create Hook
            _detour = HookedTerminateThread;
            var detourPtr = Marshal.GetFunctionPointerForDelegate(_detour);
            status = MinHook.MH_CreateHook(_terminateThreadPtr, detourPtr, out var originalPtr);
            if (status != MinHook.MH_STATUS.MH_OK)
            {
                Console.WriteLine($"MH_CreateHook failed: {status}");
                return;
            }
            _originalTerminateThread = Marshal.GetDelegateForFunctionPointer<TerminateThreadDelegate>(originalPtr);

            // 4. Enable Hook
            status = MinHook.MH_EnableHook(_terminateThreadPtr);
            if (status != MinHook.MH_STATUS.MH_OK)
            {
                Console.WriteLine($"MH_EnableHook failed: {status}");
                return;
            }

            Console.WriteLine("TerminateThread hook installed successfully!");
        }

        public static void UninstallHook()
        {
            if (_terminateThreadPtr == IntPtr.Zero)
                return;

            // 1. Disable Hook
            var status = MinHook.MH_DisableHook(_terminateThreadPtr);
            if (status != MinHook.MH_STATUS.MH_OK)
                Console.WriteLine($"MH_DisableHook failed: {status}");

            // 2. Uninitialize MinHook
            status = MinHook.MH_Uninitialize();
            if (status != MinHook.MH_STATUS.MH_OK)
                Console.WriteLine($"MH_Uninitialize failed: {status}");

            _terminateThreadPtr = IntPtr.Zero;
            Console.WriteLine("Hook uninstalled.");
        }

        private static bool HookedTerminateThread(IntPtr hThread, uint dwExitCode)
        {
            // Get the calling thread ID and the ID of the thread being terminated
            uint currentThreadId = GetCurrentThreadId();
            uint targetThreadId = GetThreadId(hThread);

            Console.WriteLine($"[HOOK] TerminateThread intercepted!");
            Console.WriteLine($"  Attempting to terminate thread: 0x{targetThreadId.ToString("X")} (ID: {targetThreadId})");
            Console.WriteLine($"  Called from thread ID: {currentThreadId}");

            // Print managed call stack
            Console.WriteLine("\n  [Managed Call Stack]:");
            Console.WriteLine(Environment.StackTrace);

            return _originalTerminateThread(hThread, dwExitCode);
        }

        [DllImport("kernel32.dll")]
        private static extern uint GetCurrentThreadId();

        [DllImport("kernel32.dll")]
        private static extern uint GetThreadId(IntPtr hThread);
    }

    public static class MinHook
    {
        public enum MH_STATUS
        {
            MH_OK = 0,
            MH_ERROR_ALREADY_INITIALIZED,
            MH_ERROR_NOT_INITIALIZED,
            // ... other status codes
        }

        [DllImport("MinHook.x64.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern MH_STATUS MH_Initialize();

        [DllImport("MinHook.x64.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern MH_STATUS MH_Uninitialize();

        [DllImport("MinHook.x64.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern MH_STATUS MH_CreateHook(IntPtr pTarget, IntPtr pDetour, out IntPtr ppOriginal);

        [DllImport("MinHook.x64.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern MH_STATUS MH_EnableHook(IntPtr pTarget);

        [DllImport("MinHook.x64.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern MH_STATUS MH_DisableHook(IntPtr pTarget);

        [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
        public static extern IntPtr GetModuleHandle(string lpModuleName);

        [DllImport("kernel32.dll", CharSet = CharSet.Ansi)]
        public static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);
    }
}

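The output shows that the call was indeed intercepted, and the Environment.StackTrace property displays the managed stack nicely. One small regret is that the unmanaged frames are missing. If you really need them, dbghelp.dll can help, but I won't go into that here. In short, compare these call-stack logs with the abnormally exited thread in the dump and the truth comes out.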
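For the unmanaged half of the stack, one possible approach is to capture return addresses with CaptureStackBackTrace and resolve them with dbghelp's SymFromAddr. The sketch below is an assumption about how that could look, not the author's implementation: `capture_native_stack` is a hypothetical helper name, and to call it from the C# hook it would still have to be exported from a native DLL and invoked through P/Invoke. The Windows-specific part is guarded so the file also compiles elsewhere, where it simply returns no frames.

```cpp
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")  // MSVC-specific way to link dbghelp
#endif

// Hypothetical helper (not from the article): returns one text line per
// native frame of the calling thread, skipping `skip` frames above it.
// On non-Windows builds it returns an empty list.
std::vector<std::string> capture_native_stack(unsigned skip, unsigned max_frames)
{
    std::vector<std::string> lines;
#ifdef _WIN32
    // dbghelp functions are single-threaded, so serialize every call.
    static std::mutex dbghelp_lock;
    std::lock_guard<std::mutex> guard(dbghelp_lock);

    HANDLE process = GetCurrentProcess();
    // Initialize the symbol handler once per process.
    static const BOOL symbols_ready = SymInitialize(process, nullptr, TRUE);

    if (max_frames == 0) {
        return lines;
    }

    std::vector<void*> frames(max_frames);
    // skip + 1 also hides this helper's own frame.
    const USHORT count =
        CaptureStackBackTrace(skip + 1, max_frames, frames.data(), nullptr);

    // SYMBOL_INFO is followed in memory by the buffer that receives the name.
    alignas(SYMBOL_INFO) char storage[sizeof(SYMBOL_INFO) + MAX_SYM_NAME] = {};
    SYMBOL_INFO* symbol = reinterpret_cast<SYMBOL_INFO*>(storage);
    symbol->SizeOfStruct = sizeof(SYMBOL_INFO);
    symbol->MaxNameLen = MAX_SYM_NAME;

    for (USHORT i = 0; i < count; ++i) {
        const DWORD64 address = reinterpret_cast<DWORD64>(frames[i]);
        DWORD64 displacement = 0;
        char line[MAX_SYM_NAME + 64];
        if (symbols_ready && SymFromAddr(process, address, &displacement, symbol)) {
            std::snprintf(line, sizeof(line), "%s+0x%llx", symbol->Name,
                          static_cast<unsigned long long>(displacement));
        } else {
            // No symbol found: fall back to the raw return address.
            std::snprintf(line, sizeof(line), "0x%llx",
                          static_cast<unsigned long long>(address));
        }
        lines.push_back(line);
    }
#else
    (void)skip;
    (void)max_frames;
#endif
    return lines;
}
```

JIT-compiled managed frames have no dbghelp symbols, so they would appear here only as raw addresses. That is why a log of this kind complements Environment.StackTrace rather than replacing it.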

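IV: Summary

As I see it, the main battlefield for .NET today is industrial control, where C# and C++ interop is everywhere, and careless handling on the C++ side can have catastrophic consequences on the C# side. I hope the experience shared in this post helps those who come later avoid the same pit.

Reposted from: 一线码农

Original link: 对 .NET线程 异常退出引发程序崩溃的反思 - 一线码农 - 博客园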

