Fixing the EXCEPTION_ACCESS_VIOLATION generated while debugging the suspended processes on Windows
Today we'll discuss one rather mysterious problem that I've encountered while writing my own native debugger for Windows.
The task I was solving was to attach a debugger written using the Windows
Debugger API to a process that has just been started with the
CREATE_SUSPENDED flag.
The standard (and basically the only) way to attach a native debugger is a
DebugActiveProcess call. According to the
documentation, this function will suspend all the threads in the target
process, then generate some initial debug events (which will be received by a
WaitForDebugEvent call), and then
resume the threads. It is also noted that this function should generate an
EXCEPTION_DEBUG_EVENT.
Unfortunately, this isn't true if the target process is suspended at the moment of the attach.
- The threads that were suspended at the moment of the attach won't be
  automatically resumed after the DebugActiveProcess call. Yes, this is probably good and expected, but I have a feeling that the documentation could be more explicit on that.
- EXCEPTION_DEBUG_EVENT won't be generated on attaching to a process that was just started with the
  CREATE_SUSPENDED flag. This surprises most people who have native debugging experience, but believe me: it's true. This event won't be generated.
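To make the scenario concrete, here is a minimal sketch of it, not the code from my project: start a process with CREATE_SUSPENDED, attach with DebugActiveProcess, resume the main thread by hand, and log every event that arrives. The names DebugEventName and LaunchSuspendedAndAttach are hypothetical helpers invented for this illustration, and the continue-status policy is an assumption. The Windows-only part is guarded by _WIN32 so that the naming helper also compiles elsewhere.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Maps DEBUG_EVENT::dwDebugEventCode values (1..9 in the Windows SDK) to names for logging.
const char* DebugEventName(unsigned long code) {
    switch (code) {
        case 1: return "EXCEPTION_DEBUG_EVENT";
        case 2: return "CREATE_THREAD_DEBUG_EVENT";
        case 3: return "CREATE_PROCESS_DEBUG_EVENT";
        case 4: return "EXIT_THREAD_DEBUG_EVENT";
        case 5: return "EXIT_PROCESS_DEBUG_EVENT";
        case 6: return "LOAD_DLL_DEBUG_EVENT";
        case 7: return "UNLOAD_DLL_DEBUG_EVENT";
        case 8: return "OUTPUT_DEBUG_STRING_EVENT";
        case 9: return "RIP_EVENT";
        default: return "UNKNOWN_DEBUG_EVENT";
    }
}

#ifdef _WIN32
// Hypothetical helper: launches commandLine suspended, attaches, and logs events until exit.
bool LaunchSuspendedAndAttach(wchar_t* commandLine) {
    STARTUPINFOW startup{};
    startup.cb = sizeof startup;
    PROCESS_INFORMATION process{};
    if (!CreateProcessW(nullptr, commandLine, nullptr, nullptr, FALSE,
                        CREATE_SUSPENDED, nullptr, nullptr, &startup, &process)) {
        return false;
    }
    if (!DebugActiveProcess(process.dwProcessId)) {
        TerminateProcess(process.hProcess, 1);
        CloseHandle(process.hThread);
        CloseHandle(process.hProcess);
        return false;
    }
    // The attach does not resume a thread that was already suspended, so do it here.
    ResumeThread(process.hThread);

    DEBUG_EVENT event{};
    bool running = true;
    while (running && WaitForDebugEvent(&event, INFINITE)) {
        std::printf("%s\n", DebugEventName(event.dwDebugEventCode));
        DWORD status = DBG_CONTINUE;
        switch (event.dwDebugEventCode) {
            case CREATE_PROCESS_DEBUG_EVENT:
                if (event.u.CreateProcessInfo.hFile) CloseHandle(event.u.CreateProcessInfo.hFile);
                break;
            case LOAD_DLL_DEBUG_EVENT:
                if (event.u.LoadDll.hFile) CloseHandle(event.u.LoadDll.hFile);
                break;
            case EXCEPTION_DEBUG_EVENT:
                // Assumption: swallow breakpoints, pass everything else back to the debuggee.
                if (event.u.Exception.ExceptionRecord.ExceptionCode != EXCEPTION_BREAKPOINT)
                    status = DBG_EXCEPTION_NOT_HANDLED;
                break;
            case EXIT_PROCESS_DEBUG_EVENT:
                running = false;
                break;
        }
        ContinueDebugEvent(event.dwProcessId, event.dwThreadId, status);
    }
    CloseHandle(process.hThread);
    CloseHandle(process.hProcess);
    return true;
}
#endif
```

Running something like this against a 32-bit target many times, with the output redirected to a file, is the kind of experiment the rest of this post is about.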
The documentation isn't only incomplete in the sense that it doesn't cover the
CREATE_SUSPENDED flag; it also doesn't mention one very important thing: the
DebugActiveProcess call will create a new thread in the target process. This
thread is necessary for some technical things in a debugger, and it will execute
code inside the debuggee.
This doesn't sound like a problem, right?
Hell it does.
There's a chance (with a probability that depends on the target process architecture and OS version) that the debuggee process will fail immediately after you attach to it. I've been experimenting on 64-bit Windows 10 (version 1903) and have checked both 32-bit and 64-bit processes. 64-bit processes seem safe, but 32-bit ones break in about 1.5% of all cases. It looks like any process could be prone to this issue: native and .NET processes, GUI and console ones were all failing in my tests. In an answer to my Stack Overflow question (which we'll discuss below), a user named @RbMm claims that the processes would always fail on Windows XP.
The "clean" (non-erroneous) debugging session goes like that:
- LOAD_DLL_DEBUG_EVENT (which supposedly reports ntdll.dll loading, but I never got the name via the debugging API, which is documented and thus fine)
- LOAD_DLL_DEBUG_EVENT […]: after this, many DLLs get loaded into the target process and everything looks okay, the process works as intended
If you're unlucky, then your debugging session will contain the following events:
- EXCEPTION_ACCESS_VIOLATION (which I was never able to gather details for: it reports a DEP violation through the EXCEPTION_RECORD::ExceptionInformation array, and the address is empty)
After that, the debuggee is in an exception state, and the debugging session cannot proceed normally. Your debuggee is doomed and will terminate soon.
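For reference, this is how the details of an access violation can be read. For EXCEPTION_ACCESS_VIOLATION, the first element of ExceptionInformation holds the access type (0 for a read, 1 for a write, 8 for a DEP violation) and the second holds the faulting address. The sketch below only formats those two values; DescribeAccessViolation is a hypothetical helper name, not something from the Windows API or from my project.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats the two ExceptionInformation values of an EXCEPTION_ACCESS_VIOLATION:
// accessType is element 0 (0 = read, 1 = write, 8 = DEP), address is element 1.
std::string DescribeAccessViolation(std::uintptr_t accessType, std::uintptr_t address) {
    const char* kind;
    switch (accessType) {
        case 0: kind = "read"; break;
        case 1: kind = "write"; break;
        case 8: kind = "DEP (execute)"; break;
        default: kind = "unknown access"; break;
    }
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%s at 0x%llx", kind,
                  static_cast<unsigned long long>(address));
    return buffer;
}
```

In a debug loop you would pass it ExceptionRecord.ExceptionInformation[0] and [1] from the EXCEPTION_DEBUG_EVENT, after checking that NumberParameters is at least 2. For the failure described here, that means a DEP violation with an address of zero, which doesn't say much about where things went wrong.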
I've been debugging my debuggers for quite a long time, and finally decided to ask a question on the Stack Overflow site. Surprisingly enough, my question soon started to receive comments and then even an answer by @RbMm (who single-handedly seems to deal with all the debugger API-related questions on the whole Stack Overflow, much gratitude for that).
So, why is this happening? From @RbMm's description and some other reading around
the Internet, I think we can conclude the following. The system runtime (CRT
maybe?) requires the debuggee to perform some initialization. This usually
occurs on the first call to runtime facilities in the process's lifetime, and it is
performed by the first thread that calls the runtime facilities. Usually, when
the debugger is attached to some live process, this initialization has already
been performed, so no problem there. But if the debugger is attached to a process
that is in a suspended state due to being started with the CREATE_SUSPENDED
flag, then we're in trouble: the new thread created by the DebugActiveProcess
call may perform this initialization. And initializing the system runtime
(whatever it is) in any thread other than the main one leads to the issue at hand.
Alright, what could we do now? The only documented way of attaching the debugger is error-prone. Well, let's attach the debugger in an undocumented way then! Seriously: there's no documented way to overcome the issue, so we have to be clever.
The ntdll library has some helpful functions that are officially undocumented
(but helpfully discovered by researchers and documented online). We're
particularly interested in NtDebugActiveProcess and NtWaitForDebugEvent,
which don't differ too much from the official DebugActiveProcess and
WaitForDebugEvent: they require the user to explicitly manage the debugger
context via the
DebugObjectHandle parameter (instead of relying on implicit
thread-local context), and
NtDebugActiveProcess doesn't create any additional
threads in the target process. Okay, we could use that! Don't forget to link with
ntdll.lib: it provides these helpful functions to your application.
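A rough sketch of the attach step might look like this. Treat every declaration in it as an assumption: the prototypes are the ones commonly published online for these undocumented functions, the access mask value is taken from the same unofficial sources, and IsNtSuccess and AttachWithoutRemoteThread are hypothetical helper names. It is not the code from my project.

```cpp
#include <cstdint>

// NTSTATUS is a signed 32-bit value; non-negative means success or informational.
inline bool IsNtSuccess(std::int32_t status) { return status >= 0; }

#ifdef _WIN32
#include <windows.h>
#include <winternl.h>  // NTSTATUS, OBJECT_ATTRIBUTES

// Undocumented ntdll exports; prototypes as commonly published (assumption).
// Link with ntdll.lib to resolve them.
extern "C" NTSTATUS NTAPI NtCreateDebugObject(PHANDLE DebugObjectHandle,
                                              ACCESS_MASK DesiredAccess,
                                              POBJECT_ATTRIBUTES ObjectAttributes,
                                              ULONG Flags);
extern "C" NTSTATUS NTAPI NtDebugActiveProcess(HANDLE ProcessHandle,
                                               HANDLE DebugObjectHandle);

// Hypothetical helper: attaches to `process` and returns the debug object handle,
// or nullptr on failure. The handle is what NtWaitForDebugEvent expects later.
HANDLE AttachWithoutRemoteThread(HANDLE process) {
    OBJECT_ATTRIBUTES attributes{};
    attributes.Length = sizeof attributes;

    // Assumed value of the "all access" mask for debug objects.
    const ACCESS_MASK debugAllAccess = STANDARD_RIGHTS_REQUIRED | SYNCHRONIZE | 0xF;

    HANDLE debugObject = nullptr;
    NTSTATUS status = NtCreateDebugObject(&debugObject, debugAllAccess, &attributes, 0);
    if (!IsNtSuccess(status)) {
        return nullptr;
    }
    status = NtDebugActiveProcess(process, debugObject);
    if (!IsNtSuccess(status)) {
        CloseHandle(debugObject);
        return nullptr;
    }
    return debugObject;
}
#endif
```

The process handle returned by CreateProcess works as the argument here. Receiving events through NtWaitForDebugEvent additionally needs the undocumented state-change structure, which is too long to reproduce in a sketch.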
Building a working prototype out of undocumented function, structure and enum definitions scattered across the Internet isn't a very easy task: I had to literally tie different signatures together with a bit of tape and wire to make it work. But it worked out, and the resulting code is now free of threading issues. At least I think so.
Here's a link to a GitHub project that demonstrates the problem in the
simplified branch, and shows an
NtDebugActiveProcess-based solution in the
ntdebugactiveprocess branch (sorry if the code
is rather dirty, it was quickly written as a proof-of-concept).
To run the project and perform the tests you'll need to compile it and then execute the resulting binary while piping the output to a file:
$ Debug\NetRuntimeWaiter.exe > log.txt
It is important to redirect output to the log file and not show it in the terminal: without that, timings for the log writer get changed, and the issue won't reproduce (due to a possible race condition maybe).