Introduction - A Virtual Machine Migration Mechanism That Considers Inter-VM Traffic

1.1 Motivation

Using resources efficiently and providing reliable services are both critical issues for server clusters. However, excessive utilization of resources such as CPU, memory, network bandwidth, and disk bandwidth leads to system overload, which degrades service reliability and performance. A sudden load surge can cause a significant deterioration of service quality and may even lead to denial of service.

A number of load management mechanisms have been proposed for balancing the loads of the servers in a network service. In those mechanisms, the load controller dispatches the requests to the servers according to the server load. If an overload occurs, the controller can dispatch requests to the other servers or move some requests from the overloaded server (i.e., the hotspot) to the other servers.

Another way to solve the problem of system overload is process migration, which can eliminate overload effectively by moving processes from the overloaded machine to another machine with a lighter load. Process migration is not limited to load management within a network service. However, it poses several challenges, such as full transparency, dependencies on other processes, fast transfer of process state, and the choice of a proper migration algorithm [1][5][9][10][11][12]. The implementation cost is also high if the operating systems themselves do not support process migration.

In virtualization environments, which have rapidly gained popularity in recent years, system overload can easily be resolved by virtual machine (VM) migration [VM migration, sandpiper], which is user-transparent and hence avoids the above implementation costs. However, existing migration policies [22][23][24][25], which determine the VM to be migrated and the destination host, focus mainly on load balancing and hotspot elimination, and ignore the fact that different VMs in the same physical machine may communicate with each other.

In this thesis, we assume that communication performance within a physical machine is superior to that between physical machines. This assumption holds for low-cost clusters, which do not use specialized high-speed links (such as Myrinet) for intra-cluster communication. Moreover, several techniques such as XenLoop and XenSocket have been proposed to improve communication performance within a physical machine. As a result, optimized shared-memory-based communication commonly outperforms NIC-based communication.

We refer to the communication between different guest VMs in the same physical machine as Inter-Domain Communication (IDC). Instead of passing through the network interface card (NIC), IDC is usually implemented with shared memory in modern virtualization environments, so its bandwidth is not limited by the NIC but by the speed of the CPU and memory. Moreover, traffic sent via IDC can achieve better performance than traffic sent over the physical network, because the CPU is often faster than the NIC. The benefit is most visible in workloads with heavy network I/O, such as FTP, data centers, and web clusters.

According to our experimental results, placing virtual machines that communicate with one another on the same physical machine helps to achieve better performance. We refer to such a set of VMs as a logical group in this thesis. However, existing migration policies focus mainly on eliminating hotspots and may scatter the members (i.e., the VMs) of a logical group across different physical machines, leading to performance degradation.

Table 1.1: Performance Results of Netperf via NIC and IDC

        Throughput (Mb/sec)    CPU Utilization (%)
NIC     283.17                 61.708725
IDC     1113.755               82.61254

As an example, Table 1.1 compares the performance results of netperf [30], a network benchmark, via NIC and IDC in a four-domain environment. Two netperf instances run in the environment, each consisting of a netperf client program and a netperf server program, and each domain executes one of these programs. The IDC row shows the results when all the netperf clients communicate with their servers via IDC, while the NIC row shows the results when they communicate via NIC. As shown in the table, the throughput of IDC is about four times that of NIC (1113.755 versus 283.17 Mb/sec), at the cost of about 21 percentage points of additional CPU utilization. As another example, we run the Support test of SPECweb2005 [31] with 90 sessions in the same four-domain environment. As shown in Tables 1.2 and 1.3, communication via IDC achieves a shorter response time without consuming extra CPU resources. Specifically, the average response time is reduced by 24.6%.

Table 1.2: Response Time of Each Request Type in SPECweb via NIC and IDC

                    Request Types
                    home    search   catalog   product   fileCatalog   file     download   Average
NIC results (ms)    223     229.5    227       413.75    331.5         330.75   1757.75    415.25
IDC results (ms)    156     158.5    156       296.5     230.75        233      1547.75    313

Table 1.3: CPU Utilizations of SPECweb via NIC and IDC

        CPU Utilization (%)
NIC     70.2125
IDC     68.9375
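The summary figures quoted for these measurements can be checked with a short calculation. The following Python sketch only restates the values from Tables 1.1 to 1.3; the variable and function names are illustrative and are not taken from the thesis.

```python
# Values copied from Tables 1.1, 1.2 and 1.3 (names are illustrative).
netperf = {                      # (throughput in Mb/sec, CPU utilization in %)
    "NIC": (283.17, 61.708725),
    "IDC": (1113.755, 82.61254),
}
specweb_avg_ms = {"NIC": 415.25, "IDC": 313.0}   # "Average" column of Table 1.2
specweb_cpu = {"NIC": 70.2125, "IDC": 68.9375}   # Table 1.3


def summarize():
    """Derive the comparisons quoted in the text from the table values."""
    # How many times higher the IDC throughput is than the NIC throughput.
    throughput_ratio = netperf["IDC"][0] / netperf["NIC"][0]
    # Extra CPU utilization of IDC under netperf, in percentage points.
    cpu_cost_points = netperf["IDC"][1] - netperf["NIC"][1]
    # Relative reduction of the average SPECweb response time.
    response_reduction = 1 - specweb_avg_ms["IDC"] / specweb_avg_ms["NIC"]
    # CPU utilization difference under SPECweb (negative means IDC uses less).
    specweb_cpu_delta = specweb_cpu["IDC"] - specweb_cpu["NIC"]
    return throughput_ratio, cpu_cost_points, response_reduction, specweb_cpu_delta


ratio, cpu_points, reduction, cpu_delta = summarize()
print(f"throughput ratio: {ratio:.2f}x")
print(f"netperf CPU cost: {cpu_points:.1f} points")
print(f"response time reduction: {reduction:.1%}")
print(f"SPECweb CPU delta: {cpu_delta:.3f} points")
```

The throughput ratio comes out slightly below four, the netperf CPU cost is close to 21 percentage points, and the reduction in average response time is 24.6%, consistent with the figures stated in the text.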

To take advantage of the superior performance of IDC, we propose a new VM migration policy that considers IDC when making migration decisions. The policy resolves system overload while trying to keep each logical group in the same physical machine (i.e., group union). Moreover, we design and implement an automatic load management system and integrate the proposed policy into it.
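To make the idea of group union concrete, the following Python sketch shows one way a migration decision could take inter-VM traffic into account. It is not the policy developed in this thesis, which is presented in Section 3. It is a minimal illustration under assumed simplifications: a single normalized load value per VM, a fixed load threshold, and a pairwise traffic table. The names Host, colocated_traffic, and pick_migration are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float                            # assumed single normalized resource
    vms: dict = field(default_factory=dict)    # VM name -> load of that VM

    def load(self):
        return sum(self.vms.values())


def colocated_traffic(vm, host, traffic):
    """Traffic between `vm` and the other VMs currently placed on `host`."""
    return sum(traffic.get(frozenset((vm, other)), 0.0)
               for other in host.vms if other != vm)


def pick_migration(hotspot, hosts, traffic, threshold=0.8):
    """Pick one (vm, destination) move off `hotspot`.

    Among the moves that keep the destination under the load threshold,
    prefer the one that turns the most traffic into IDC (gained) while
    breaking the least existing IDC traffic (lost). Returns None if no
    destination has enough room.
    """
    best = None
    for vm, vm_load in hotspot.vms.items():
        lost = colocated_traffic(vm, hotspot, traffic)
        for dest in hosts:
            if dest is hotspot:
                continue
            if dest.load() + vm_load > threshold * dest.capacity:
                continue                       # destination would become overloaded
            gained = colocated_traffic(vm, dest, traffic)
            score = gained - lost
            if best is None or score > best[0]:
                best = (score, vm, dest)
    return None if best is None else (best[1], best[2])
```

For example, suppose the hotspot hosts VMs a, b, and c, where a and b exchange heavy traffic, and c communicates with a VM d on another host that has spare capacity. The sketch then selects c and moves it to the host of d, which keeps the a-b group together and reunites c with d. The sketch performs a single migration step and does not check whether that step fully relieves the hotspot.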

The performance results demonstrate that the system is capable of resolving multiple system overloads, and that the IDC-aware migration policy achieves superior performance through group union when compared with the migration policy of Sandpiper [24], an existing VM migration system. Specifically, the proposed policy reduces the response time by up to 24% under the Support test of the SPECweb2005 benchmark and improves network communication performance by up to 102% under the netperf benchmark.

1.2 Structure of the Thesis

The rest of this thesis is structured as follows. Section 2 presents the related work. Section 3 presents the design and implementation of the load management system and the IDC-aware migration policy. The performance evaluation is presented in Section 4. Finally, Section 5 gives the conclusions and future work.
