
CHAPTER 4 DESIGN AND IMPLEMENTATION

4.2 Fault Recovery

Briefly speaking, we use four techniques to achieve fault recovery. First, we develop a protocol to create, suspend, and resume the backup server. During normal operation, the backup server is suspended so that it does not contend with the primary server for CPU resources. Once the primary fails, the backup server is resumed to take over the primary's job. Second, we provide a log buffer in the VMM, which allows us to store the connection state without communicating with the backup server. Third, we provide a fault detection mechanism, which detects application and operating system faults and then triggers the recovery process. Finally, we provide a recovery mechanism to restore the service state.

Figure 6. An Overview of Fault Recovery

Before describing our fault recovery techniques in detail, we give a brief overview of the recovery flow, shown in Figure 6. Before starting an Internet service, the administrator starts a backup server, including the backup OS and the service application. Then, so that the primary server can use all of the system resources, the backup server releases the resources it holds, such as CPU time. The primary server then performs its normal operations and logs connection information in the Virtual Machine Monitor (VMM). When a fault is detected, the VMM wakes up the backup server and recovers the service state so that the system can continue to provide the service.

The rest of this section describes these techniques in detail. Section 4.2.1 describes how to boot a second OS instance and release the system resources it uses. Section 4.2.2 presents the flow of logging the connection state. Section 4.2.3 describes the fault detection mechanism, and Section 4.2.4 presents the recovery flow.

4.2.1 Backup Server Boot-up

Figure 7 The Flow of Booting a Backup Server

We define a protocol to manage the boot-up of the backup server. The protocol involves the VMM and three domains (control, primary, and backup), which implement the protocol using the API provided by the framework, listed in Table 1.

Table 1 System Calls Provided by OZS

Figure 7 shows the flow of booting up a backup server. Originally, Xen only allows the control domain to boot up other domains. To enable an authorized primary server to boot up its backup, we allow the administrator to register the primary servers that have the right to boot up their backups. Specifically, the administrator registers, in advance, an entry for each such primary server in the backup-grant table. The table is stored in the VMM and managed by the protocol manager, and registration is done by calling the sys_ins_auth() system call in the control domain. When a primary server boots a backup server, the protocol manager checks whether the primary server holds the grant.

The primary server calls the sys_boot_backup_server() system call to ask Xen to create the backup. As mentioned above, the protocol manager first checks whether the primary server has been granted the right to boot its backup. If it has, the protocol manager asks Xen to create the backup domain.
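The grant check can be pictured as a small lookup table in the VMM. The following is a minimal sketch under stated assumptions: the table layout, its capacity, and the helper names `grant_insert()`, `grant_check()`, and `boot_backup_server()` are hypothetical and only model the behavior of sys_ins_auth() and the check inside sys_boot_backup_server() described above; they are not the framework's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GRANTS 16          /* hypothetical capacity of the table */

typedef uint16_t domid_t;      /* 16-bit domain ID, as in Xen */

/* Hypothetical backup-grant table kept in the VMM by the protocol manager. */
struct grant_table {
    domid_t entries[MAX_GRANTS];
    size_t count;
};

/* Models sys_ins_auth(): the control domain registers a primary server.
 * Returns 0 on success (re-registering is a no-op), -ENOSPC if full. */
int grant_insert(struct grant_table *t, domid_t primary)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i] == primary)
            return 0;
    if (t->count == MAX_GRANTS)
        return -ENOSPC;
    t->entries[t->count++] = primary;
    return 0;
}

/* The check the protocol manager performs on every boot request. */
bool grant_check(const struct grant_table *t, domid_t primary)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i] == primary)
            return true;
    return false;
}

/* Models the decision inside sys_boot_backup_server(): 0 means the request
 * may be forwarded to Xen to create the backup domain, -EPERM means denied. */
int boot_backup_server(const struct grant_table *t, domid_t caller)
{
    assert(t != NULL);
    return grant_check(t, caller) ? 0 : -EPERM;
}
```

Because registration happens in the control domain ahead of time, the check on the boot path is a read-only lookup and does not need to involve the administrator again.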

Originally, Xen gives a unique IP address to each guest OS so that each domain can communicate with external machines. This lengthens recovery, because the backup server has to take over the IP address of the primary server when the latter crashes. We therefore provide a sys_change_backup_ip() system call that allows the primary and backup servers to share a single IP address. When the primary server invokes this system call, a signal is sent to the backup server through the VZS; the backup server then obtains the primary's IP address from the VZS and changes its own IP address accordingly. The IP address change is performed by a user-level task that invokes the shell command ipconfig.
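The hand-off of the shared IP address can be modeled as a one-slot mailbox in the VZS plus a pending-event flag. This is only an illustrative sketch: `struct vzs_ip_channel`, `struct backup_netcfg`, `change_backup_ip()`, and `backup_handle_ip_event()` are hypothetical names, the boolean flag stands in for the signal/event, and the step that actually reconfigures the interface (done by a user-level task in the described system) is reduced to copying the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define IP_LEN 16  /* dotted-quad IPv4 string plus terminating NUL */

/* Hypothetical shared state kept by the VZS. */
struct vzs_ip_channel {
    char primary_ip[IP_LEN];
    bool event_pending;   /* stands in for the signal/event to the backup */
};

/* Hypothetical backup-side network configuration. */
struct backup_netcfg {
    char ip[IP_LEN];
};

/* Primary side, modeling sys_change_backup_ip(): publish the primary's IP
 * in the VZS and raise an event. Returns -1 if the string is too long. */
int change_backup_ip(struct vzs_ip_channel *ch, const char *primary_ip)
{
    assert(ch != NULL && primary_ip != NULL);
    if (strlen(primary_ip) >= IP_LEN)
        return -1;
    strcpy(ch->primary_ip, primary_ip);
    ch->event_pending = true;
    return 0;
}

/* Backup side: on the event, fetch the IP from the VZS and adopt it.
 * Returns false if no event was pending. */
bool backup_handle_ip_event(struct vzs_ip_channel *ch, struct backup_netcfg *cfg)
{
    if (!ch->event_pending)
        return false;
    memcpy(cfg->ip, ch->primary_ip, IP_LEN);
    ch->event_pending = false;   /* event consumed */
    return true;
}
```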

After the IP address is changed, the backup server should release its CPU time so that it does not affect the performance of the primary server. To do so, the primary server calls the sys_suspend_backup() system call. When the system call is invoked, Xen removes the backup server's task from its run queue.
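Suspending by dequeuing can be sketched with a simplified run queue: an entry that is not on the queue is never picked by the scheduler and therefore consumes no CPU time, and resuming simply re-inserts it. The sketch below assumes a plain singly linked list; `struct rq_entry`, `runq_insert()`, and `runq_remove()` are hypothetical, and Xen's actual scheduler data structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical schedulable entry for one domain; not Xen's real structure. */
struct rq_entry {
    int domid;
    bool on_runqueue;
    struct rq_entry *next;
};

/* Simplified singly linked run queue: only entries on it receive CPU time. */
struct runqueue {
    struct rq_entry *head;
};

/* Resume: put the entry back on the run queue (no-op if already queued). */
void runq_insert(struct runqueue *rq, struct rq_entry *e)
{
    assert(rq != NULL && e != NULL);
    if (e->on_runqueue)
        return;
    e->next = rq->head;
    rq->head = e;
    e->on_runqueue = true;
}

/* Suspend, as in sys_suspend_backup(): unlink the entry so the scheduler
 * never picks it. Returns false if it was not queued. */
bool runq_remove(struct runqueue *rq, struct rq_entry *e)
{
    for (struct rq_entry **pp = &rq->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = NULL;
            e->on_runqueue = false;
            return true;
        }
    }
    return false;
}
```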

As the description above shows, although the system calls are implemented in the OZS, most of them require cooperation from the VZS. The OZS and the VZS communicate through hypercalls and events.

4.2.2 Connection State Logging

FT-TCP provides a log buffer to record the connection state of the primary server; when the primary server crashes, the backup server uses the data in the log buffer to recover the system. Our design also provides a log buffer, one that does not lose data even when the primary server crashes. We use a memory area of the primary server as the log buffer.

During recovery, the backup server remaps the log buffer into its virtual address space and recovers the service state accordingly. The following describes how we implement the log buffer in our framework.
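Because the backup server reads the buffer as raw remapped memory, it helps if the buffer is self-describing. The following is a minimal sketch, assuming a fixed-capacity array of records with a header in the same memory area; the record fields, `LOG_MAGIC`, `log_append()`, and `log_find_latest()` are hypothetical and do not reproduce FT-TCP's actual log format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LOG_CAPACITY 64            /* hypothetical number of records */
#define LOG_MAGIC    0x4C4F4742u   /* hypothetical "LOGB" tag */

/* Hypothetical per-connection record. */
struct conn_record {
    uint32_t conn_id;   /* identifies the TCP connection */
    uint32_t seq;       /* e.g., last logged sequence number */
    uint32_t len;       /* length of the logged data */
};

/* Header and records share one memory area, so a domain that remaps the
 * raw pages can validate and parse them without any other metadata. */
struct log_buffer {
    uint32_t magic;
    uint32_t count;
    struct conn_record rec[LOG_CAPACITY];
};

void log_init(struct log_buffer *lb)
{
    memset(lb, 0, sizeof *lb);
    lb->magic = LOG_MAGIC;
}

/* Primary side: append a record; returns -1 when the buffer is full. */
int log_append(struct log_buffer *lb, struct conn_record r)
{
    assert(lb != NULL);
    if (lb->count == LOG_CAPACITY)
        return -1;
    lb->rec[lb->count] = r;   /* write the record first ... */
    /* ... then publish it. A real cross-domain implementation needs a write
     * barrier here so a crash never exposes a half-written record. */
    lb->count++;
    return 0;
}

/* Backup side: validate the remapped buffer and return the most recent
 * record for a connection, or NULL if none exists. */
const struct conn_record *log_find_latest(const struct log_buffer *lb,
                                          uint32_t conn_id)
{
    if (lb->magic != LOG_MAGIC)
        return NULL;
    for (uint32_t i = lb->count; i > 0; i--)
        if (lb->rec[i - 1].conn_id == conn_id)
            return &lb->rec[i - 1];
    return NULL;
}
```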

To let guest operating systems manage memory conveniently, Xen provides each guest OS with an illusory memory area, a contiguous range of physical addresses (pseudo-physical addresses in Xen terminology). However, these physical addresses are not real machine addresses, which raises two problems. First, as mentioned above, the backup server has to map the log buffer into its virtual address space, and this mapping requires the starting machine address of the log buffer.

However, a guest OS does not manage machine addresses directly. We therefore look up the guest OS's page table, which is updated by Xen, to obtain the machine address of the log buffer.

Once the address is obtained, the OZS issues a hypercall to register it with Xen. As a result, the backup server can obtain the machine address of the log buffer during recovery.
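In a paravirtualized Xen guest, page-table entries hold machine frame numbers, which is why a page-table lookup yields the machine address. The lookup can be sketched as follows, assuming a toy single-level table with x86-style entries (present flag in bit 0, frame address in bits 12 to 51); `struct toy_pagetable` and `va_to_maddr()` are hypothetical, and a real x86 walk has several levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PAGE_SIZE     ((uint64_t)1 << PAGE_SHIFT)
#define PTE_PRESENT   0x1ULL                  /* bit 0, as on x86 */
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ULL   /* frame bits 12..51, as on x86-64 */

/* Toy single-level page table: pte[i] maps virtual page i. */
struct toy_pagetable {
    const uint64_t *pte;
    size_t nr_entries;
};

/* Translate a guest virtual address into a machine address.
 * Returns 0 on success, -1 if the page is not mapped. */
int va_to_maddr(const struct toy_pagetable *pt, uint64_t va, uint64_t *maddr)
{
    assert(pt != NULL && maddr != NULL);
    uint64_t vpn = va >> PAGE_SHIFT;
    if (vpn >= pt->nr_entries || !(pt->pte[vpn] & PTE_PRESENT))
        return -1;
    /* Frame address from the entry, byte offset from the virtual address. */
    *maddr = (pt->pte[vpn] & PTE_ADDR_MASK) | (va & (PAGE_SIZE - 1));
    return 0;
}
```

Registering only the starting address implicitly assumes the buffer is contiguous in machine memory; if it spanned several pages that Xen had placed in non-adjacent machine frames, each page would need its own translation.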

Second, if a primary server crashes, Xen releases its memory area, including the log buffer. To keep this memory from being released before the service is recovered, we increment the reference count of the primary server by one after booting the primary.

After the service is recovered, the reference count is decremented by one, and the resources held by the primary server can then be released.
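The effect of the extra reference can be shown with a minimal counter, similar in spirit to Xen's own per-domain reference counting. The names `struct domain_ref`, `dom_hold()`, and `dom_release()` are hypothetical, and the counter is non-atomic for brevity, whereas a VMM would use atomic operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-domain bookkeeping inside the VMM. */
struct domain_ref {
    int refcnt;
    bool memory_freed;   /* true once the domain's pages are returned */
};

/* Take an extra reference, e.g., right after the primary boots. */
void dom_hold(struct domain_ref *d)
{
    d->refcnt++;
}

/* Drop a reference; memory (including the log buffer) is freed only
 * when the last reference goes away. */
void dom_release(struct domain_ref *d)
{
    assert(d->refcnt > 0);
    if (--d->refcnt == 0)
        d->memory_freed = true;   /* stands in for returning pages to Xen */
}
```

In this model, the crash drops one reference but leaves the count at one, so the log buffer stays mapped until recovery finishes and releases the extra reference.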

4.2.3 Fault Detection

Software faults, which cause the system to become unavailable, can occur in service applications and in the operating system. The following describes how we detect them.

Figure 8 Detecting Application Faults

When a fault occurs in an application, the kernel usually invokes the do_exit() function to kill the application process. As shown in Figure 8(a), two paths lead to the invocation of do_exit(). In the first, the application detects the fault itself and calls the sys_exit() system call, which in turn calls do_exit(). In the second, the kernel detects the application fault and sends a signal to kill the application process; in this case, the kernel calls do_exit() through sig_exit().

In principle, we could detect these faults by intercepting do_exit() through kernel binary instrumentation. However, such callee-based instrumentation requires more effort, so we use a caller-based approach instead. As shown in Figure 8(b), the health monitor intercepts the sys_exit() system call and the sig_exit() function, which only requires modifying the destination addresses of two jump instructions.
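Caller-based interception can be modeled in portable C by letting each of the two call sites jump through a patchable target, so that redirecting the two targets plays the role of rewriting the two jump destinations while do_exit() itself stays untouched. This is only an analogy under stated assumptions: the real mechanism patches kernel binary code, all names here (`do_exit_model()`, `monitor_hook()`, `install_health_monitor()`, `path_sys_exit()`, `path_sig_exit()`) are hypothetical, and treating any exit of the watched service process as a fault is a simplifying assumption.

```c
#include <assert.h>

typedef void (*exit_fn)(int pid, int code);

static int exited_pid = -1;       /* last pid passed to the modeled do_exit() */
static int monitored_pid = -1;    /* service process watched by the monitor */
static int faults_reported = 0;
static int last_fault_pid = -1;

/* Stand-in for the kernel's do_exit(). */
static void do_exit_model(int pid, int code)
{
    (void)code;
    exited_pid = pid;
}

/* Targets used by the two call sites; patching these pointers plays the
 * role of rewriting the destination addresses of the two jumps. */
static exit_fn sys_exit_target = do_exit_model;
static exit_fn sig_exit_target = do_exit_model;

/* Health-monitor hook: report a fault for the watched process, then fall
 * through to the original do_exit(). */
static void monitor_hook(int pid, int code)
{
    if (pid == monitored_pid) {
        faults_reported++;
        last_fault_pid = pid;
    }
    do_exit_model(pid, code);
}

/* Caller-based instrumentation: only the two call sites are redirected. */
void install_health_monitor(int pid)
{
    assert(pid >= 0);
    monitored_pid = pid;
    sys_exit_target = monitor_hook;
    sig_exit_target = monitor_hook;
}

/* The two paths of Figure 8(a). */
void path_sys_exit(int pid, int code) { sys_exit_target(pid, code); }
void path_sig_exit(int pid, int sig)  { sig_exit_target(pid, sig); }
```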

Figure 9 Detecting Kernel Faults

In addition to application faults, operating system faults may also occur. To detect them, we insert a heartbeat generator in the primary server domain and a heartbeat checker in Xen. At each timer interrupt, the generator sends a heartbeat to Xen by incrementing a heartbeat counter shared by the primary server domain and Xen. The checker examines the counter at each timer interrupt to detect operating system faults. If the value remains unchanged for two timer interrupt periods, the operating system is regarded as failed, and the checker notifies the recovery manager to recover the system. Note that the heartbeat mechanism uses shared memory rather than hypercalls, which eliminates the overhead of frequent privilege-mode crossings.
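The generator and checker can be sketched as follows, assuming one reading of "remains unchanged for two timer interrupt periods": the checker declares failure after two consecutive checks that see no progress. `struct heartbeat_page`, `struct heartbeat_checker`, `heartbeat_tick()`, `heartbeat_check()`, and `HB_STALE_LIMIT` are hypothetical names. A 32-bit counter is used so each read is a single word; wraparound is harmless because only equality is tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HB_STALE_LIMIT 2   /* stale checks before declaring failure */

/* Counter in a page shared by the primary domain and Xen; volatile because
 * one side writes it and the other reads it. */
struct heartbeat_page {
    volatile uint32_t counter;
};

/* Generator: runs in the primary domain at each timer interrupt. */
void heartbeat_tick(struct heartbeat_page *hb)
{
    hb->counter++;
}

/* Checker state kept in Xen. */
struct heartbeat_checker {
    uint32_t last_seen;
    unsigned int stale_periods;   /* consecutive checks with no progress */
};

/* Runs at each of Xen's timer interrupts. Returns true when the guest OS
 * should be considered failed and the recovery manager notified. */
bool heartbeat_check(struct heartbeat_checker *c, const struct heartbeat_page *hb)
{
    uint32_t now = hb->counter;
    if (now != c->last_seen) {
        c->last_seen = now;
        c->stale_periods = 0;
        return false;
    }
    return ++c->stale_periods >= HB_STALE_LIMIT;
}
```

Requiring two stale periods rather than one tolerates a single check that happens to run before the generator's next tick.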

4.2.4 Recovery Flow

Figure 10 Recovery Protocol

When a fault is detected, the recovery manager follows the recovery protocol, illustrated in Figure 10, to recover the system. The protocol consists of three steps. First, the recovery manager changes the network path so that incoming packets originally delivered to the primary server are now delivered to the backup server. Xen stores the IP-to-domain mapping of each domain (in the net_schedule_list list) in order to deliver packets, so the network path can be changed simply by updating the mapping that corresponds to the IP address of the backup server. Second, the recovery manager wakes up the backup server so that it can take over the job of the primary server. Third, the recovery manager sends a signal to notify the backup server to recover the system. On receiving the signal, the kernel subsystem in the backup server obtains the machine address of the log buffer through a hypercall, remaps the log buffer, and then executes the FT-TCP recovery flow.
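The three steps can be sketched as one function over a small routing table. The sketch is a simplified model: `struct ip_route`, `struct recovery_ctx`, `reroute()`, and `recover()` are hypothetical, the array merely stands in for the IP-to-domain mapping that the text locates in net_schedule_list, and steps two and three are reduced to flags where the real system would re-queue the backup domain and deliver a signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ROUTES 8

/* Hypothetical IP-to-domain entry used for packet delivery. */
struct ip_route {
    uint32_t ip;      /* IPv4 address in host byte order */
    int domid;
};

struct recovery_ctx {
    struct ip_route routes[MAX_ROUTES];
    size_t nr_routes;
    bool backup_runnable;       /* step 2: backup woken up */
    bool recover_signal_sent;   /* step 3: backup told to recover */
};

/* Step 1: deliver packets for `ip` to `new_domid`. Returns -1 if no entry. */
static int reroute(struct recovery_ctx *ctx, uint32_t ip, int new_domid)
{
    for (size_t i = 0; i < ctx->nr_routes; i++) {
        if (ctx->routes[i].ip == ip) {
            ctx->routes[i].domid = new_domid;
            return 0;
        }
    }
    return -1;
}

/* The three steps of the recovery protocol, in order. */
int recover(struct recovery_ctx *ctx, uint32_t shared_ip, int backup_domid)
{
    assert(ctx != NULL);
    if (reroute(ctx, shared_ip, backup_domid) != 0)
        return -1;                    /* nothing to redirect: abort */
    ctx->backup_runnable = true;      /* e.g., put it back on the run queue */
    ctx->recover_signal_sent = true;  /* backup then remaps the log buffer */
    return 0;
}
```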

It is worth mentioning that, if the fault does not crash the kernel of the primary domain, we can change the IP address and the packet delivery path (in Xen) so that a system administrator can connect to the faulty server to diagnose the cause of the fault.

In the document "Virtual-Machine-Supported Internet Service System without Data Loss" (pp. 23-30)