
To virtualize a CPU and share the resources of the processor, the hypervisor needs to intercept and handle the execution of special instructions of guest OSes. The instructions of a hardware architecture are divided into innocuous instructions and sensitive instructions [26]. Sensitive instructions are those that should be intercepted and handled by the hypervisor when they are executed by a guest OS, while innocuous instructions are all the others. Sensitive instructions can be further divided into control-sensitive and behavior-sensitive instructions. Control-sensitive instructions are those that change the configuration of resources, while behavior-sensitive instructions are those whose behavior or results depend on the configuration of resources. Control should be transferred to the hypervisor whenever a guest system executes a sensitive instruction, so that a guest cannot directly access the resources or change the system configuration of other guests. Such a mechanism is called trap-and-emulate.

Popek and Goldberg defined a set of conditions sufficient for a computer architecture to support system virtualization in 1974 [26]. They introduced privileged instructions, defined as those that trap if the machine is in user mode and do not trap if the machine is in privileged mode. In Popek and Goldberg's theorems, an effective system VM can be constructed if the set of sensitive instructions is a subset of the set of privileged instructions of a specific hardware architecture. If a hardware architecture meets this condition, the architecture can be fully virtualized. The relationship between privileged and sensitive instructions is illustrated in Figure 2.5. In the x86 architecture, there is a set of instructions, called critical instructions, which are sensitive but do not belong to the privileged instructions. For example, the POPF instruction pops the flags register from the stack in memory, but when it is executed in user mode the interrupt-enable flag is not affected, since that flag can only be modified in privileged mode. Such instructions prevent the x86 architecture from being virtualized efficiently by trap-and-emulate alone.
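To make the POPF problem concrete, the following user-mode C sketch (not from the thesis; it assumes an x86-64 Linux machine and GCC-style inline assembly) tries to clear the interrupt-enable flag (IF, bit 9 of RFLAGS) with POPF and then reads the flags back. No trap is raised and IF stays set, so a hypervisor relying purely on trap-and-emulate would never notice this sensitive operation:

#include <stdint.h>
#include <stdio.h>

/* Read and write RFLAGS from user mode. */
static inline uint64_t read_rflags(void)
{
    uint64_t flags;
    __asm__ volatile("pushfq; popq %0" : "=r"(flags));
    return flags;
}

static inline void write_rflags(uint64_t flags)
{
    __asm__ volatile("pushq %0; popfq" : : "r"(flags) : "cc");
}

int main(void)
{
    uint64_t before = read_rflags();
    write_rflags(before & ~(1ULL << 9));   /* attempt to disable interrupts */
    uint64_t after = read_rflags();

    /* Both lines print 1: POPF neither trapped nor changed IF at CPL 3. */
    printf("IF before: %llu\n", (unsigned long long)((before >> 9) & 1));
    printf("IF after:  %llu\n", (unsigned long long)((after  >> 9) & 1));
    return 0;
}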

There are three methods to handle critical instructions in the x86 architecture: software emulation, para-virtualization, and hardware-assisted virtualization. The three methods are discussed below.

Figure 2.5: Types of instructions and their relationship with respect to CPU virtualization.

Software Emulation

With software emulation, the hypervisor emulates the execution of all instructions, so it can handle the execution of critical instructions. The guest VM can run an unmodified OS, but this mechanism suffers significant performance degradation because the emulation process has a large overhead. To reduce the performance impact, dynamic binary translation (DBT) is introduced to speed up hot paths and decrease the cost of emulation. QEMU [11] is an example of CPU emulation.
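The core of such an emulator is a fetch-decode-dispatch loop in which every guest instruction, critical or not, passes through software. The following C sketch uses an invented two-register, three-opcode ISA purely for illustration; it is not QEMU's implementation, which interprets and dynamically translates real x86 code:

#include <stdint.h>

enum { OP_ADD = 0x01, OP_POPF = 0x9D, OP_HALT = 0xFF };

struct vcpu {
    uint64_t pc;
    uint64_t regs[16];
    uint64_t rflags;        /* the guest's virtual flags */
    uint8_t  *mem;          /* the guest's memory image */
};

static void emulate(struct vcpu *cpu)
{
    for (;;) {
        uint8_t op = cpu->mem[cpu->pc++];     /* fetch */
        switch (op) {                         /* decode and dispatch */
        case OP_ADD:                          /* innocuous instruction */
            cpu->regs[0] += cpu->regs[1];
            break;
        case OP_POPF:                         /* critical on real hardware, but
                                                 trivial here: the emulator simply
                                                 updates the guest's virtual flags */
            cpu->rflags = cpu->regs[2];
            break;
        case OP_HALT:
            return;
        }
    }
}

DBT improves on this structure by translating hot blocks of guest code into host code that runs directly, avoiding the per-instruction dispatch.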

Para-virtualization

For efficiency, para-virtualization requires the critical instructions in the guest OSes to be substituted by hypercalls, which generate a trap so that the hypervisor can receive the notification and perform suitable actions, while innocuous instructions execute as in the native environment. Para-virtualization can achieve a significant performance improvement, but the requirement of modifying guest OSes is its major disadvantage.

Xen [10] is an example which uses para-virtualization.
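As a sketch of the idea, the modified guest kernel below replaces a critical operation (disabling interrupts) with an explicit hypercall. The hypercall number, register convention, and trap vector are invented for illustration and do not match Xen's real hypercall ABI:

#include <stdint.h>

#define HC_SET_INTERRUPT_FLAG 7          /* hypothetical hypercall number */

static inline long hypercall1(long nr, long arg0)
{
    long ret;
    /* A software interrupt (the vector is chosen here only for illustration)
     * forces a trap into the hypervisor, which reads the hypercall number
     * and argument from registers. */
    __asm__ volatile("int $0x82"
                     : "=a"(ret)
                     : "a"(nr), "D"(arg0)
                     : "memory");
    return ret;
}

/* In the para-virtualized guest kernel, "disable interrupts" becomes a
 * hypercall instead of the critical CLI/POPF instruction: */
static void guest_disable_interrupts(void)
{
    hypercall1(HC_SET_INTERRUPT_FLAG, 0);   /* hypervisor clears the vCPU's
                                               virtual interrupt flag */
}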

Hardware-assisted virtualization

Hardware-assisted virtualization is a hardware extension that enables efficient full virtualization with help from hardware capabilities and allows the hypervisor to execute unmodified OSes in complete isolation. Intel and AMD proposed their x86 hardware-assisted virtualization implementations (Intel VT-x and AMD-V) in 2006. Multiple system VMs, such as KVM and the Xen hardware virtual machine (Xen HVM), have added hardware-assisted support for better performance.

Figure 2.6: The relationship among guest OSes, the hypervisor, and hardware-assisted virtualization, using Intel VT-x as an example (adapted from [3]).

The virtualization support provided by Intel and the support provided by AMD are conceptually similar.

Intel VT-x introduced two new execution modes, VMX root mode and VMX non-root mode, which are orthogonal to the existing x86 privilege levels; they are also known as root mode and guest mode, respectively. The hypervisor runs in root mode while guest OSes execute in guest mode, and thus guest OSes do not need to be modified. When a CPU executing in guest mode encounters a critical instruction, the CPU switches to root mode, which is called a VM exit, and passes execution to a pre-registered routine of the hypervisor, i.e., trap-and-emulate implemented by the hardware extension.

After the hypervisor finishes the emulation, control is switched back to the specific guest OS, which is called a VM entry. A new data structure called the virtual machine control structure (VMCS), maintained by the hypervisor, records the system configuration of a specific guest OS. The relationship among guest OSes, the hypervisor, and the hardware-assisted virtualization support is shown in Figure 2.6.
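From the hypervisor side, VM entries and VM exits form a simple control loop. The condensed C sketch below drives this loop through the Linux KVM interface (which builds on VT-x/AMD-V); guest memory setup, register initialization, and error handling are omitted, so it is an outline rather than a working VMM:

#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    int kvm  = open("/dev/kvm", O_RDWR);
    int vm   = ioctl(kvm, KVM_CREATE_VM, 0);
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

    /* The shared kvm_run area reports why the last VM exit happened. */
    long size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run *run = mmap(NULL, size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu, 0);

    for (;;) {
        ioctl(vcpu, KVM_RUN, 0);          /* VM entry: CPU switches to guest mode */

        switch (run->exit_reason) {       /* VM exit: back in root mode */
        case KVM_EXIT_IO:                 /* guest executed a PMIO instruction */
            /* emulate the port access, then loop to re-enter the guest */
            break;
        case KVM_EXIT_MMIO:               /* guest touched an emulated MMIO region */
            break;
        case KVM_EXIT_HLT:                /* guest halted */
            return 0;
        }
    }
}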

Intel and AMD also proposed virtualization support for the memory management unit (MMU) to accelerate the address translation from guest virtual addresses to host physical addresses with much less overhead than maintaining a shadow page table in the hypervisor. Intel names its MMU virtualization technique the extended page table (Intel EPT), and AMD names its counterpart the nested page table (AMD NPT). While a guest OS maintains its own page table through the x86 CR3 register, the hypervisor maintains the extended/nested page table that maps guest physical addresses to host physical addresses, and the hardware combines the two during translation. The Intel EPT translation scheme is shown in Figure 2.7.

Figure 2.7: Intel EPT translation details (adapted from [3]).
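The following C sketch illustrates the two-dimensional translation that EPT/NPT implement in hardware. Real x86-64 paging walks four (or five) levels in each dimension, and every access to the guest's page-table pages during the walk must itself be translated through the EPT; here each dimension is flattened to a single invented lookup table so that only the structure is visible:

#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1ULL << PAGE_SHIFT) - 1)

/* Illustrative arrays, not real hardware page-table formats:
 * guest_page_table maps guest-virtual pages to guest-physical pages and is
 * managed by the guest OS; ept maps guest-physical pages to host-physical
 * pages and is managed by the hypervisor. */
uint64_t guest_page_table[1 << 20];
uint64_t ept[1 << 20];

uint64_t translate(uint64_t gva)
{
    uint64_t gpa = (guest_page_table[gva >> PAGE_SHIFT] << PAGE_SHIFT)
                   | (gva & PAGE_MASK);          /* first dimension: guest PT */
    uint64_t hpa = (ept[gpa >> PAGE_SHIFT] << PAGE_SHIFT)
                   | (gpa & PAGE_MASK);          /* second dimension: EPT/NPT */
    return hpa;
}

Because the guest's own page-table accesses also go through the second dimension, a full hardware walk touches more memory than native paging, so EPT/NPT rely heavily on TLB caching to hide this cost.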

2.4 I/O Virtualization

Virtualization of I/O devices is more difficult than virtualization of processors or memory subsystems in a system VM, because there are many kinds of I/O devices and their characteristics differ greatly. There are two key parts of virtualizing an I/O device: building the virtual version of the device and virtualizing the I/O activities of the device. We briefly describe these two issues below.


Figure 2.8: Major interfaces in performing an I/O action (adapted from [23]).

2.4.1 Virtualizing Devices

There are different virtualization strategies for different kinds of I/O devices. Some I/O devices, such as keyboards, mice, and speakers, must be dedicated to a specific guest VM or be switched between guest VMs only over long periods; such devices are called dedicated devices. For devices such as disks, it is suitable to partition the resources among multiple guest VMs; these are called partitioned devices. Some devices, such as a network interface card (NIC), can be shared among guest VMs; such devices are called shared devices.

For these different types of devices, the hypervisor not only has to maintain the virtual state of each virtual device but also has to intercept the interactions between physical and virtual devices. Requests from different guest VMs should be dispatched by the hypervisor in a fair manner, results from I/O devices should be routed by the hypervisor to the requesting VM, and interrupts from physical devices should first be handled by the hypervisor and then routed to the destination guest VM.

2.4.2 Virtualizing I/O Activities

The actions of I/O processes are divided into three levels: I/O operation level, device driver level, and system call level, which are illustrated in Figure 2.8.

Virtualizing at the I/O Operation Level

The x86 architecture provides both memory-mapped I/O (MMIO) and port-mapped I/O (PMIO) for signaling the device controller and transferring data. The hypervisor can intercept such I/O operations thanks to the x86 privilege levels: when a guest VM executes a PMIO instruction or accesses an MMIO region, the operation traps into the hypervisor, which then performs the corresponding emulation. Since a high-level I/O action may take several I/O operations, it is extremely difficult for the hypervisor to “reverse engineer” the individual I/O operations to infer the complete I/O action. In addition, too much trap-and-emulation for I/O actions causes dramatic performance degradation.
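The sketch below shows what a per-operation PMIO handler in a KVM-based user-space VMM sees when a guest prints one character through an emulated 16550 serial port at 0x3F8 (COM1). The port number and register offsets are the standard PC values, while the surrounding VMM loop is assumed. Each status poll and each data write arrives as a separate VM exit, which illustrates both why reconstructing the high-level action is hard and why frequent traps are expensive:

#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>

#define COM1_BASE 0x3F8

static void handle_pio_exit(struct kvm_run *run)
{
    uint8_t *data = (uint8_t *)run + run->io.data_offset;

    if (run->io.port == COM1_BASE + 5 && run->io.direction == KVM_EXIT_IO_IN) {
        data[0] = 0x20;              /* LSR read: report "transmitter empty" */
    } else if (run->io.port == COM1_BASE && run->io.direction == KVM_EXIT_IO_OUT) {
        putchar(data[0]);            /* THR write: the guest's output byte */
    }
    /* Each of these operations costs a full VM exit/entry round trip, and the
     * handler sees them in isolation, without the guest's overall intent. */
}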

Virtualizing at the Device Driver Level

System calls such as read() or write() are converted by the OS into corresponding device driver calls. If the hypervisor can intercept the invocation of these driver calls, it can directly obtain the information about the high-level I/O action on a virtual device and redirect the calls to the corresponding physical device. This scheme requires the guest VMs to run a modified version of a device driver designed for a specific hypervisor and OS, and the virtual device driver actively delivers the I/O actions to the hypervisor.

Although the modification of the device driver makes the guest OS aware that it is running in a virtualized environment, it greatly reduces the overhead of virtualizing I/O actions. This approach can be regarded as an I/O para-virtualization scheme at the device driver level.
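The following C sketch conveys the idea of such a para-virtualized driver: the guest places whole, high-level requests into a ring shared with the hypervisor and notifies it once per request instead of issuing many trapped register accesses. The structures and the notification routine are invented for illustration; production interfaces such as virtio define their own layouts:

#include <stdint.h>

struct pv_request {
    uint64_t sector;        /* a whole block-read request described at once */
    uint64_t guest_buf;     /* guest-physical address of the data buffer */
    uint32_t num_sectors;
};

struct pv_ring {
    struct pv_request req[256];
    volatile uint32_t head;     /* advanced by the guest driver */
    volatile uint32_t tail;     /* advanced by the hypervisor backend */
};

static void hypervisor_kick(struct pv_ring *ring)
{
    /* In a real guest this would be a single trapping MMIO/PMIO write or a
     * hypercall that tells the hypervisor new requests are pending; here it
     * is only a placeholder. */
    (void)ring;
}

static void pv_read_sectors(struct pv_ring *ring, uint64_t sector,
                            uint64_t guest_buf, uint32_t n)
{
    uint32_t slot = ring->head % 256;
    ring->req[slot] = (struct pv_request){ sector, guest_buf, n };
    ring->head++;
    hypervisor_kick(ring);      /* one notification for the whole request */
}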

Virtualizing at the System Call Level

Virtualizing at the system call level means the hypervisor handles the entire system call requests of guest VMs. To accomplish this, however, the guest OSes must be modified to add a mechanism that forwards the requests of guest VMs and returns the emulation results produced by the hypervisor, typically by adding new routines at the application binary interface (ABI) level. Compared with virtualizing at the device driver level, this scheme requires much more knowledge about the internals of different guest OS kernels and is much more difficult to implement.
