Fixing the EXCEPTION_ACCESS_VIOLATION generated while debugging the suspended processes on Windows
Today we'll discuss one rather mysterious problem that I've encountered while writing my own native debugger for Windows.
The task I was solving was in attaching a debugger written using the Windows
Debugger API to the process that has just been started with a CREATE_SUSPENDED
flag.
The standard (and basically the only) way to attach a native debugger is a
DebugActiveProcess
call. According to the
documentation, this function will suspend all the threads in the target
process, and then generate some initial debug events (that will be received by
the following WaitForDebugEvent
call), and then
will resume the threads. It is also noted that this function should generate an
EXCEPTION_DEBUG_EVENT
.
Unfortunately, this isn't true if the target process is suspended at the moment of the attach.
- The threads that were suspended at the moment of the attach won't be
automatically resumed after the
DebugActiveProcess
call. Yes, this is probably good and expected, but I have a feeling that the documentation could be more explicit on that. - An
EXCEPTION_DEBUG_EVENT
won't be generated on attaching to a process that was just started with theCREATE_SUSPENDED
flag. This surprises most of the people who had the native debugging experience, but believe me: that's true. This event won't be generated.
The documentation isn't only incomplete in a sense that it doesn't cover the
CREATE_SUSPENDED
flag; it also doesn't mention one very important thing: the
DebugActiveProcess
call will create a new thread in the target process. This
thread is necessary for some technical things in a debugger, and it will execute
some code.
This doesn't sound like a problem, right?
Hell it does.
There's a chance (with probability dependent on the target process architecture and OS version) that the debuggee process will fail immediately after you attach to it. I've been experimenting on a 64-bit Windows 10 version (build 1903) and have checked both 32-bit and 64-bit processes. 64-bit processes seems safe, but 32-bit ones will break in about 1.5% of all the cases. Looks like any process could be prone to this issue: both native and .NET processes, GUI and console ones were failing in my tests. In an answer to my Stack Overflow question (which we'll discuss below), a user named @RbMm claims that the processes would always fail on Windows XP.
The "clean" (non-erroneous) debugging session goes like that:
CREATE_PROCESS_DEBUG_EVENT
LOAD_DLL_DEBUG_EVENT
(which supposedly reports aboutntdll.dll
loading, but I never got the name via the debugging API, which is documented and thus fine)CREATE_THREAD_DEBUG_EVENT
LOAD_DLL_DEBUG_EVENT
[…] — after this, many DLLs get loaded into the target process and everything looks okay, the process works as intended
If you're unlucky, then your debugging session will hold the following events:
CREATE_PROCESS_DEBUG_EVENT
LOAD_DLL_DEBUG_EVENT
CREATE_THREAD_DEBUG_EVENT
EXCEPTION_DEBUG_EVENT
:EXCEPTION_ACCESS_VIOLATION
(which I never was able to gather details for: it reports a DEP violation by passing8
in theEXCEPTION_RECORD::ExceptionInformation
array, and the address is empty)
After that, the debuggee is in an exception state, and the debug process cannot be proceeded normally. Your debuggee is doomed and will terminate soon.
I've been debugging my debuggers for quite a long time, and finally decided to ask a question on the Stack Overflow site. Surprisingly enough, my question soon started to receive comments and then even an answer by @RbMm (who single-handedly seems to deal with all the debugger API-related questions on the whole Stack Overflow, much gratitude for that).
So, why is this happening? From @RbMm description and some other reads over
the Internet, I think we can conclude the following. The system runtime (CRT
maybe?) requires the debuggee to perform some initialization. This usually
occurs on a first call to runtime facilities in the process' lifetime, and it is
performed by the first thread which calls the runtime facilities. Usually, when
the debugger is attached to some live process, this initialization is already
performed, so no problem there. But if the debugger is attached to a process
that is in suspended state due to being started with the CREATE_SUSPENDED
flag, then we're in trouble: the new thread created by the DebugActiveProcess
call may perform this initialization. And initializing the system runtime
(whatever it is) in any thread other than main leads to the issue at hand.
Alright, what could we do now? The only documented way of attaching the debugger is error-prone. Well, let's attach the debugger in an undocumented way then! Seriously: there's no documented way to overcome the issue, so we have to be clever.
The system ntdll
library has some helpful functions undocumented officially
(but helpfully discovered by researchers and documented online). We're
particularly interested in NtDebugActiveProcess
and NtWaitForDebugEvent
,
which doesn't differ too much from the official DebugActiveProcess
and
WaitForDebugEvent
: they require the user to explicitly manage the debugger
context via the DebugObjectHandle
parameter (instead of relying on implicit
thread local context), and NtDebugActiveProcess
doesn't create any additional
threads in a target process. Okay, we could use that! Don't forget to link with
ntdll.lib
: it provides these helpful functions to your application.
Building a working prototype out of undocumented function, structure and enum definitions scattered across the Internet isn't very easy task: I had to literally tie different signatures with a bit of tape wire to make it work. But it worked out, and the resulting code is now free of threading issues. At least I think so.
Here's a link to a GitHub project that demonstrates a problem in the
simplified
branch, and shows an
NtDebugActiveProcess
-based solution in the
ntdebugactiveprocess
branch (sorry if the code
is rather dirty, it was quickly written as a proof-of-concept).
To run the project and perform the tests you'll need to compile it and then execute the resulting binary while piping the output to a file:
$ Debug\NetRuntimeWaiter.exe > log.txt
It is important to redirect output to the log file and not show it in the terminal: without that, timings for the log writer get changed, and the issue won't reproduce (due to a possible race condition maybe).