Fixing the EXCEPTION_ACCESS_VIOLATION raised when debugging suspended processes on Windows

Publication date: 2019-10-06

Today we'll discuss one rather mysterious problem that I've encountered while writing my own native debugger for Windows.

The task I was solving was attaching a debugger written with the Windows debugging API to a process that has just been started with the CREATE_SUSPENDED flag.

The standard (and basically the only) way to attach a native debugger is a DebugActiveProcess call. According to the documentation, this function suspends all the threads in the target process, generates some initial debug events (which will be received by the following WaitForDebugEvent calls), and then resumes the threads. It is also noted that this function should generate an EXCEPTION_DEBUG_EVENT.

Unfortunately, this isn't true if the target process is suspended at the moment of the attach.

  1. The threads that were suspended at the moment of the attach won't be automatically resumed after the DebugActiveProcess call. Yes, this is probably good and expected, but I have a feeling that the documentation could be more explicit about that.
  2. An EXCEPTION_DEBUG_EVENT won't be generated on attaching to a process that was just started with the CREATE_SUSPENDED flag. This surprises most people with native debugging experience, but believe me: it's true. This event won't be generated.
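
To make the attach sequence discussed above concrete, here is a minimal sketch in C. It is not the code from my project: the helper names debug_event_name and log_attach_events are made up for illustration, the point at which the main thread gets resumed is an assumption, and the exception handling is deliberately simplified. The Windows-specific part is guarded by _WIN32, so the name table compiles anywhere.

```c
#include <stdio.h>

/* Name for a DEBUG_EVENT.dwDebugEventCode value (numeric values as defined
   in the Windows SDK headers). */
static const char *debug_event_name(unsigned long code)
{
    switch (code) {
    case 1: return "EXCEPTION_DEBUG_EVENT";
    case 2: return "CREATE_THREAD_DEBUG_EVENT";
    case 3: return "CREATE_PROCESS_DEBUG_EVENT";
    case 4: return "EXIT_THREAD_DEBUG_EVENT";
    case 5: return "EXIT_PROCESS_DEBUG_EVENT";
    case 6: return "LOAD_DLL_DEBUG_EVENT";
    case 7: return "UNLOAD_DLL_DEBUG_EVENT";
    case 8: return "OUTPUT_DEBUG_STRING_EVENT";
    case 9: return "RIP_EVENT";
    default: return "UNKNOWN_DEBUG_EVENT";
    }
}

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: starts command_line suspended, attaches with
   DebugActiveProcess, and logs debug events until the process exits or no
   event arrives within timeout_ms. Returns the number of events, -1 on error. */
static int log_attach_events(char *command_line, DWORD timeout_ms)
{
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    DEBUG_EVENT ev;
    int seen = 0, exited = 0;

    ZeroMemory(&si, sizeof(si));
    si.cb = sizeof(si);
    if (!CreateProcessA(NULL, command_line, NULL, NULL, FALSE,
                        CREATE_SUSPENDED, NULL, NULL, &si, &pi))
        return -1;

    if (!DebugActiveProcess(pi.dwProcessId)) {
        TerminateProcess(pi.hProcess, 1);
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return -1;
    }

    /* The main thread is not resumed by the attach, so do it explicitly.
       Resuming right after the attach is an assumption of this sketch. */
    ResumeThread(pi.hThread);

    while (!exited && WaitForDebugEvent(&ev, timeout_ms)) {
        DWORD status = DBG_CONTINUE;
        printf("%s (pid %lu, tid %lu)\n", debug_event_name(ev.dwDebugEventCode),
               ev.dwProcessId, ev.dwThreadId);
        ++seen;

        if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT) {
            DWORD code = ev.u.Exception.ExceptionRecord.ExceptionCode;
            printf("  exception code 0x%08lX\n", code);
            /* Simplification: swallow breakpoints (0x4000001F is the WOW64
               breakpoint status), pass everything else to the debuggee. */
            if (code != EXCEPTION_BREAKPOINT && code != 0x4000001FUL)
                status = DBG_EXCEPTION_NOT_HANDLED;
        } else if (ev.dwDebugEventCode == CREATE_PROCESS_DEBUG_EVENT) {
            if (ev.u.CreateProcessInfo.hFile)
                CloseHandle(ev.u.CreateProcessInfo.hFile);
        } else if (ev.dwDebugEventCode == LOAD_DLL_DEBUG_EVENT) {
            if (ev.u.LoadDll.hFile)
                CloseHandle(ev.u.LoadDll.hFile);
        } else if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT) {
            exited = 1;
        }
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, status);
    }

    if (!exited)
        DebugActiveProcessStop(pi.dwProcessId); /* detach after a timeout */
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return seen;
}
#endif
```

On Windows, a call like log_attach_events(cmd, 5000) with a writable command-line buffer prints one line per debug event, which is enough to see which events do and don't arrive for a suspended target.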

The documentation isn't only incomplete in the sense that it doesn't cover the CREATE_SUSPENDED flag; it also doesn't mention one very important thing: the DebugActiveProcess call will create a new thread in the target process. This thread is necessary for some technical things in a debugger, and it will execute some code.

This doesn't sound like a problem, right?

Hell it does.

There's a chance (with a probability that depends on the target process architecture and OS version) that the debuggee will fail immediately after you attach to it. I've been experimenting on 64-bit Windows 10 (version 1903) and have checked both 32-bit and 64-bit processes. 64-bit processes seem safe, but 32-bit ones break in about 1.5% of all cases. It looks like any process is prone to this issue: native and .NET processes, GUI and console ones all failed in my tests. In an answer to my Stack Overflow question (which we'll discuss below), a user named @RbMm claims that the processes would always fail on Windows XP.

The "clean" (non-erroneous) debugging session goes like this:

If you're unlucky, then your debugging session will contain the following events:

After that, the debuggee is in an exception state, and the debugging session cannot proceed normally. Your debuggee is doomed and will terminate soon.

I'd been debugging my debugger for quite a long time, and finally decided to ask a question on Stack Overflow. Surprisingly enough, my question soon started to receive comments and then even an answer by @RbMm (who seems to single-handedly deal with all the debugger API-related questions on the whole of Stack Overflow, much gratitude for that).

So, why is this happening? From @RbMm's description and some other reading around the Internet, I think we can conclude the following. The system runtime (CRT maybe?) requires the debuggee to perform some initialization. This usually occurs on the first call to the runtime facilities in the process's lifetime, and it is performed by whichever thread makes that first call. Usually, when the debugger is attached to some live process, this initialization has already been performed, so no problem there. But if the debugger is attached to a process that is suspended because it was started with the CREATE_SUSPENDED flag, then we're in trouble: the new thread created by the DebugActiveProcess call may perform this initialization. And initializing the system runtime (whatever it is) in any thread other than the main one leads to the issue at hand.

Alright, what could we do now? The only documented way of attaching the debugger is error-prone. Well, let's attach the debugger in an undocumented way then! Seriously: there's no documented way to overcome the issue, so we have to be clever.

The system ntdll library has some helpful functions that aren't officially documented (but have been helpfully discovered by researchers and documented online). We're particularly interested in NtDebugActiveProcess and NtWaitForDebugEvent, which don't differ too much from the official DebugActiveProcess and WaitForDebugEvent: they require the user to explicitly manage the debugger context via the DebugObjectHandle parameter (instead of relying on an implicit thread-local context), and NtDebugActiveProcess doesn't create any additional threads in the target process. Okay, we could use that! Don't forget to link with ntdll.lib: it provides these helpful functions to your application.
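
As a rough illustration of the explicit debug object handling, here is a sketch in C. Everything in it is an assumption pieced together from community-maintained headers rather than official documentation: the function pointer signatures, the use of NtCreateDebugObject to obtain the DebugObjectHandle, and the 0x1F000F access mask. The helper names nt_success and nt_attach are hypothetical. Unlike the ntdll.lib approach mentioned above, this sketch resolves the functions at run time with GetProcAddress, and it leaves out NtWaitForDebugEvent because that call needs a sizable undocumented structure.

```c
#include <stdint.h>

/* NTSTATUS is a signed 32-bit value; non-negative values mean success or
   informational status (the same convention as the NT_SUCCESS macro). */
static int nt_success(int32_t status)
{
    return status >= 0;
}

#ifdef _WIN32
#include <windows.h>

/* Signatures as commonly published for these undocumented functions;
   treat them as assumptions, not as an official contract. */
typedef LONG (NTAPI *NtCreateDebugObject_t)(PHANDLE DebugObjectHandle,
                                            ACCESS_MASK DesiredAccess,
                                            PVOID ObjectAttributes,
                                            ULONG Flags);
typedef LONG (NTAPI *NtDebugActiveProcess_t)(HANDLE ProcessHandle,
                                             HANDLE DebugObjectHandle);

/* Hypothetical helper: attaches to an already opened process through a
   freshly created debug object. Returns the debug object handle (to be
   passed to NtWaitForDebugEvent later), or NULL on failure. */
static HANDLE nt_attach(HANDLE process)
{
    HMODULE ntdll = GetModuleHandleA("ntdll.dll");
    NtCreateDebugObject_t create_debug_object;
    NtDebugActiveProcess_t debug_active_process;
    HANDLE debug_object = NULL;

    if (!ntdll)
        return NULL;
    create_debug_object =
        (NtCreateDebugObject_t)GetProcAddress(ntdll, "NtCreateDebugObject");
    debug_active_process =
        (NtDebugActiveProcess_t)GetProcAddress(ntdll, "NtDebugActiveProcess");
    if (!create_debug_object || !debug_active_process)
        return NULL;

    /* 0x1F000F is the value usually given for DEBUG_ALL_ACCESS (assumption). */
    if (!nt_success(create_debug_object(&debug_object, 0x1F000F, NULL, 0)))
        return NULL;

    if (!nt_success(debug_active_process(process, debug_object))) {
        CloseHandle(debug_object);
        return NULL;
    }
    return debug_object;
}
#endif
```

The process handle returned by CreateProcess in PROCESS_INFORMATION.hProcess should carry enough access rights for this; a handle obtained in another way may need additional rights.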

Building a working prototype out of undocumented function, structure and enum definitions scattered across the Internet isn't a very easy task: I literally had to tie different signatures together with a bit of tape and wire to make it work. But it worked out, and the resulting code is now free of threading issues. At least I think so.

Here's a link to a GitHub project that demonstrates the problem in the simplified branch and shows an NtDebugActiveProcess-based solution in the ntdebugactiveprocess branch (sorry if the code is rather dirty, it was quickly written as a proof of concept).

To run the project and perform the tests, you'll need to compile it and then execute the resulting binary while redirecting the output to a file:

$ Debug\NetRuntimeWaiter.exe > log.txt

It is important to redirect the output to a log file instead of showing it in the terminal: otherwise the timings of the log writer change, and the issue won't reproduce (possibly because a race condition is involved).