Takeaways
It is substantially harder to protect a hypervisor from tampering by a guest when it allows the guest to access hardware capabilities on the opt-out basis. Some processor features may be abused to corrupt hypervisor memory regions from the guest even if they are protected through Second Level Address Translation (SLAT) and IOMMU. The Intel Processor Trace feature is one such example.
If the hypervisor needs to be protected from tampering by the guest, hardware access from the guest should be blocked by default and allowed only on the opt-in basis instead.
Para pass-through hypervisor design and implementation
Hypervisors may be designed in a way that the guest remains capable of accessing the most of system resources. This is done when the hypervisor functions as part of the current system stack (ie, hardware, firmware, operating system, and applications) to offer additional features, instead of to run multiple instances of virtualized system stacks. For example, Blue Pill was a hypervisor adding a rootkit capability to the currently running system, and Avast, Kaspersky and other vendors developed hypervisors for the same security rather entertainingly.
Unlike the full-fledged hypervisors like Xen, KVM/QEMU, VMware, and Hyper-V, which can create full virtualized system stacks (eg, virtual processor, chipset, peripheral devices, and firmware), those hypervisors are add-ons to the current system, and thus, there is no need to create a new fully virtualized environment. This leads to an interesting hypervisor design sometimes referred to as a para pass-through hypervisor, where a hypervisor hyper-jacks the current system and intercepts only minimal activities while passing through most of them to hardware.
Illustration of hypervisor designs |
Implementation of such a hypervisor might look like this:
When a hypervisor can be configured not to intercept system activities that are unnecessary, it does so to reduce overhead. For example, on the Intel processors, access to the Model Specific Registers (MSRs) from the guest can be configured not to cause interception (VM-exit). By doing so, the guest is free to read from and write to MSRs and use associated hardware features, which minimalizes the chances of compatibility and performance issues.
In other cases where interception cannot be disabled, the hypervisor intercepts the guest activity but repeats the same operation on behalf of the guest. The CPUID instruction, for example, always causes VM-exit on the Intel processor. However, the hypervisor executes the CPUID instruction with the same parameters as the guest attempted and returns the results to the guest. This effectively lets the guest execute and get the results from the processor as if there were no hypervisor.
This is a fairly common design and implementation for research hypervisors, as it reduces a significant amount of code that needs to be written while avoiding compatibility issues. The hypervisor cares only what it cares about. It is all fun and game.
Until one hopes to protect the hypervisor from tampering by the guest.
Hypervisor protection from the guest
It is natural to think of protecting the hypervisor from the guest given that the virtualization technology allows a piece of software (ie, the hypervisor) to run on a higher privilege than the traditional kernel mode. A developer of para pass-through hypervisor-based security software will not want the malicious kernel-mode code to be able to disable, bypass, or modify the hypervisor.
Such protection is implemented by leavening SLAT and IOMMU, such as Extended Page Tables (EPT) and DMA remapping on the Intel processors. For example, DMA attempting to access physical memory where hypervisor resides or uses may be blocked by removing access permission in the IOMMU translation tables. The same goes for standard memory access by processors using SLAT.
SLAT and IOMMU bypass with a processor feature
Interestingly, however, multiple processor features completely bypass translations by SLAT and IOMMU. If those features are exposed to the guest, the guest would be able to corrupt contents of the physical memory regions that were supposed to be inaccessible.
Intel Processor Trace is such an example. Intel Processor Trace is a feature to trace code execution by the processor itself and let software analyze trace logs later. The logs are stored into the physical address specified by the IA32_RTIT_OUTPUT_BASE MSR, and tracing is started by setting the bit 0 of IA32_RTIT_CTL MSR. Thus, if the guest were able to write to those two MSRs, it would be possible to corrupt hypervisor memory by setting such a physical address to the IA32_RTIT_OUTPUT_BASE MSR and staring tracing.
One might think the simplest fix would be opt-outing this feature by intercepting write to the MSRs and injecting #GP(0). That is not enough, as those two MSRs are part of the state component, which may be set with the XRSTORS instruction as well. This must be taken care of too.
Alternatively, one can set the "Intel PT uses guest physical addresses" VM-execution control to have the processor treat the address specified via the MSR as a guest physical address and go through translation with the EPT.
This is what I found in some research hypervisors like BitVisor.
The design problem
Given the previously discussed fix, now, the hypervisor is protected from tampering again. What about the hardware feedback interface that is newly added to the 12th gen Intel processor? This is a feature to let the processor give "feedback" to software by writing performance information onto the physical memory address specified by the IA32_HW_FEEDBACK_PTR MSR. Again, if the guest were able to write to the MSR, it would be possible to corrupt hypervisor memory by writing a hypervisor address to the MSR. A fix would simply be opting-out the feature by intercepting write to the MSR and injecting #GP(0).
Though at this point, one can expect that the same issue may resurface in the future as the processor features expand. Additionally, how can we be sure that there are no other hardware features that allow similar corruption? The chipset, for example, has hundreds (if not thousands) of registers that offer little-known capabilities and evolves every generation.
It is very challenging to keep the hypervisor protected from the guest when the hypervisor limits the hardware access from the guest only on the opt-out basis. Instead, all major hypervisors expose hardware features on the opt-in basis and emulate the rest. For example, all MSR accesses are intercepted except those that are deemed to be safe.Holiday happy cat |