Tunnel Creation and Release - Low-Overhead Inter-VM Communication

3.4 Implementation

3.4.4 Tunnel Creation and Release

Figure 7 shows the flow of a successful tunnel creation. First, when the IVC is identified as local, the tunnel manager of the sender’s guest OS creates a tunnel. Each tunnel consists of a number of memory pages and is initially mapped into the virtual address space of the sender’s guest OS. Then, the protocol monitor asks the socket event manager to send an EVENT_CREATE_TUNNEL event to the receiver(s). When a receiver gets the event, it obtains the tunnel information from the VMM and tries to map the tunnel into its own address space. The mapping may fail, as shown in Figure 8; in that case, the receiver sends an EVENT_REJECT_TUNNEL event back to the sender.

Figure 7 Message Sequence Chart for a Successful Tunnel Creation

Figure 8 Message Sequence Chart for a Failed Tunnel Creation
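The creation handshake in Figures 7 and 8 can be modeled in a few lines. The following is a user-space Python sketch, not the kernel implementation. Events become method calls, the VMM's tunnel record is a plain dictionary (the hypothetical vmm_tunnels), and a mapping failure is simulated with a hypothetical free_pages budget in each receiver. Only the event names EVENT_CREATE_TUNNEL and EVENT_REJECT_TUNNEL come from the text.

```python
from enum import Enum, auto

class Event(Enum):
    CREATE_TUNNEL = auto()   # EVENT_CREATE_TUNNEL
    REJECT_TUNNEL = auto()   # EVENT_REJECT_TUNNEL

# Hypothetical stand-in for the VMM's tunnel records: tunnel id -> number of pages.
vmm_tunnels = {}

class Receiver:
    def __init__(self, name, free_pages):
        self.name = name
        self.free_pages = free_pages  # hypothetical budget, used only to simulate failure
        self.mapped = set()

    def on_event(self, event, tunnel_id):
        """Handle EVENT_CREATE_TUNNEL; return EVENT_REJECT_TUNNEL if mapping fails."""
        if event is not Event.CREATE_TUNNEL:
            return None
        pages = vmm_tunnels[tunnel_id]      # obtain the tunnel information from the VMM
        if pages > self.free_pages:         # mapping fails (Figure 8)
            return Event.REJECT_TUNNEL
        self.free_pages -= pages            # mapping succeeds (Figure 7)
        self.mapped.add(tunnel_id)
        return None

class Sender:
    def __init__(self):
        self.next_id = 0

    def create_tunnel(self, receivers, pages=4):
        """Create a tunnel (four one-page channels), announce it, collect rejections."""
        tunnel_id = self.next_id
        self.next_id += 1
        vmm_tunnels[tunnel_id] = pages
        replies = [(r.name, r.on_event(Event.CREATE_TUNNEL, tunnel_id)) for r in receivers]
        return tunnel_id, [name for name, reply in replies if reply is Event.REJECT_TUNNEL]

# Usage: the first receiver can map the tunnel, the second cannot.
sender = Sender()
r1 = Receiver("domU-1", free_pages=16)
r2 = Receiver("domU-2", free_pages=2)
tid, rejected = sender.create_tunnel([r1, r2])
```

The sender learns which receivers rejected the tunnel from the returned list. In the real system, this information arrives asynchronously as EVENT_REJECT_TUNNEL events.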

The tunnel manager releases a tunnel when the sender or all the receivers close the tunnel.

When the sender closes a tunnel, its tunnel manager sends an EVENT_SEND_CLOSE_TUNNEL event to all the receivers. Once notified, each receiver should read the remaining data from the tunnel and then unmap it. The tunnel manager of the last receiver to unmap the tunnel is responsible for sending an EVENT_ALL_RECV_GET_DATA event back to the sender’s tunnel manager. On receiving this event, the sender’s tunnel manager actually releases the tunnel.

When a receiver actively closes a tunnel, its tunnel manager unmaps the tunnel and sends an EVENT_RECV_CLOSE_TUNNEL event to the sender, whose tunnel manager then decrements the tunnel’s reference count. When the reference count reaches zero, the sender’s tunnel manager releases the tunnel.
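The two release paths can be sketched as follows. This sketch assumes that a single Python object stands for both ends of the tunnel and that events are delivered as direct method calls. The reference count, the shared set of receivers used to detect the last receiver, and all method names are hypothetical. The event names are those used above.

```python
class TunnelState:
    """Both ends of one tunnel collapsed into one object; events become method calls."""

    def __init__(self, receivers):
        self.receivers = set(receivers)   # receivers that still map the tunnel
        self.ref_count = len(receivers)   # sender-side reference count
        self.released = False

    # ----- sender side -----
    def release(self):
        self.released = True

    def sender_on_event(self, event):
        if event == "EVENT_RECV_CLOSE_TUNNEL":
            self.ref_count -= 1           # one receiver closed actively
            if self.ref_count == 0:
                self.release()
        elif event == "EVENT_ALL_RECV_GET_DATA":
            self.release()                # every receiver drained and unmapped

    def sender_close(self):
        # EVENT_SEND_CLOSE_TUNNEL is delivered to every receiver.
        for r in sorted(self.receivers):
            self.receiver_on_send_close(r)

    # ----- receiver side -----
    def receiver_on_send_close(self, r):
        # Draining the remaining data is omitted here; the receiver then unmaps.
        self.receivers.discard(r)
        if not self.receivers:            # this was the last receiver to unmap
            self.sender_on_event("EVENT_ALL_RECV_GET_DATA")

    def receiver_close(self, r):
        self.receivers.discard(r)         # unmap locally, then tell the sender
        self.sender_on_event("EVENT_RECV_CLOSE_TUNNEL")

# Usage
a = TunnelState(["dom1", "dom2"])
a.receiver_close("dom1")   # ref_count 2 -> 1, tunnel kept
a.receiver_close("dom2")   # ref_count 1 -> 0, tunnel released
b = TunnelState(["dom1", "dom2"])
b.sender_close()           # last receiver reports EVENT_ALL_RECV_GET_DATA
```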

3.4.5 Tunnel Protocol

After both the sender and the receivers have mapped a tunnel into their address spaces, they follow the tunnel protocol to exchange data. Before describing the protocol, we first describe the structure of a tunnel, which is shown in Figure 9. To reduce the synchronization time between the sender and the receivers, a tunnel contains multiple channels. As a result, the sender can write data to one channel while the receivers read data from another. Currently, a tunnel has four channels, each of which is a memory page.

Figure 9 Tunnel Structure

Each channel has a header that keeps information about the data in the channel. The header includes three fields: gen_num, size and pend_read. Similar to a TCP sequence number, the sender maintains a sequence number that it increments by 1 whenever it begins to put data into a channel. After incrementing the sequence number, the sender sets the channel’s gen_num field to this value to indicate the order of the data. Similar to the ACK number in TCP, each receiver maintains the next generation number that it expects to see. Before reading data from a channel, the receiver checks whether this number equals the channel’s gen_num field. Only if it does can the receiver read data from the channel. The size field indicates the amount of data in the channel. The pend_read field counts the pending readers of the channel, and the sender must not reuse the channel until this field becomes zero. The sender sets the field to the number of receivers when the channel is full or when it has no more data to put into the channel. When a receiver has read all the data in the channel, it decrements the field by 1. After all the receivers are done, the field becomes zero and the sender can use the channel again.
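The layout described above can be written down with ctypes to check that each channel fits in one page. The 4 KiB page size, the 32-bit field widths and the derivation of MAX_CHANNEL_SIZE from them are assumptions for illustration. The text only fixes the number of channels, the one-page-per-channel rule and the three header fields.

```python
import ctypes

PAGE_SIZE = 4096      # assumption: 4 KiB pages
NUM_CHANNELS = 4      # "a tunnel has four channels, each of which is a memory page"

class ChannelHeader(ctypes.Structure):
    # Field widths are an assumption; the text only names the fields.
    _fields_ = [("gen_num", ctypes.c_uint32),    # generation of the data in the channel
                ("size", ctypes.c_uint32),       # bytes of valid data
                ("pend_read", ctypes.c_uint32)]  # receivers that have not finished reading

# Hypothetical: payload capacity is whatever is left of the page after the header.
MAX_CHANNEL_SIZE = PAGE_SIZE - ctypes.sizeof(ChannelHeader)

class Channel(ctypes.Structure):
    _fields_ = [("hdr", ChannelHeader),
                ("data", ctypes.c_char * MAX_CHANNEL_SIZE)]

class Tunnel(ctypes.Structure):
    _fields_ = [("channels", Channel * NUM_CHANNELS)]

def receiver_may_read(ch, wait_gen):
    """A receiver reads only the generation it expects, and only non-empty data."""
    return ch.hdr.gen_num == wait_gen and ch.hdr.size > 0

def sender_may_reuse(ch):
    """The sender may overwrite a channel only after every receiver has read it."""
    return ch.hdr.pend_read == 0
```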

Before writing any data to a channel, the sender increments its sequence number and stores the new value in the channel’s gen_num field. The sender then copies data into the channel and sets the size field. After the copy completes, if the channel is full or no more data needs to be sent, the sender increases the pend_read field by the number of receivers and immediately notifies the receivers with an event. If the channel is full and more data remains to be sent, the sender then switches to the next channel and repeats the procedure. It stops when the next channel is still locked, that is, when the channel's pend_read field has not yet returned to zero.
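The sender's side of the protocol can be sketched as the loop below. This is a single-threaded Python model. MAX_CHANNEL_SIZE is shrunk to a hypothetical 8 bytes, the event to the receivers is recorded in a list, and the class and method names are invented for the example. A real shared-memory implementation would typically also need a memory barrier so that the data and size are visible to receivers before pend_read is published.

```python
from dataclasses import dataclass

MAX_CHANNEL_SIZE = 8   # hypothetical tiny capacity so the example is easy to follow
NUM_CHANNELS = 4

@dataclass
class Channel:
    gen_num: int = 0
    size: int = 0
    pend_read: int = 0
    data: bytes = b""

class TunnelSender:
    def __init__(self, num_receivers):
        self.channels = [Channel() for _ in range(NUM_CHANNELS)]
        self.num_receivers = num_receivers
        self.seq = 0        # sender's sequence number
        self.cur = 0        # index of the channel to fill next
        self.notified = []  # gen_nums announced to receivers (stand-in for events)

    def send(self, payload: bytes) -> int:
        """Copy as much of payload as free channels allow; return bytes accepted."""
        sent = 0
        while sent < len(payload):
            ch = self.channels[self.cur]
            if ch.pend_read != 0:          # still locked: receivers have not drained it
                break
            self.seq += 1                  # new generation before writing any data
            ch.gen_num = self.seq
            chunk = payload[sent:sent + MAX_CHANNEL_SIZE]
            ch.data, ch.size = chunk, len(chunk)
            sent += len(chunk)
            # The channel is now full or holds the last piece: publish it.
            ch.pend_read += self.num_receivers
            self.notified.append(ch.gen_num)
            self.cur = (self.cur + 1) % NUM_CHANNELS
        return sent
```

With two receivers, sending 20 bytes fills three channels (8, 8 and 4 bytes). A second 20-byte send fills only the fourth channel and then stops, because the first channel is still locked.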

Each receiver maintains wait_gen, the next generation it waits for. When wait_gen equals the channel’s gen_num and the channel is not empty, the receiver reads data from the channel. When the receiver has read all of the channel’s data, it checks pend_read. If the field is not zero, the receiver decrements it by one. The receiver that brings pend_read to zero sends an EVENT_RECV_GET_DATA event to the sender. After decrementing pend_read, the receiver checks the size field. The sender publishes a channel (sets a nonzero pend_read) whose size is less than MAX_CHANNEL_SIZE only when it has no more data to send. Therefore, a receiver that sees such a channel knows that the data has been sent completely.
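The receiver's side can be modeled in the same way. In addition to the small hypothetical MAX_CHANNEL_SIZE, this sketch assumes that generation g lives in channel (g - 1) mod 4. In other words, it assumes that the sender fills the channels round-robin starting from generation 1. The text does not state this mapping explicitly.

```python
from dataclasses import dataclass

MAX_CHANNEL_SIZE = 8   # hypothetical tiny capacity for the example
NUM_CHANNELS = 4

@dataclass
class Channel:
    gen_num: int = 0
    size: int = 0
    pend_read: int = 0
    data: bytes = b""

class TunnelReceiver:
    def __init__(self, channels):
        self.channels = channels  # shared with the sender and the other receivers
        self.wait_gen = 1         # next generation this receiver waits for
        self.sent_events = []     # events that would go to the sender

    def receive(self):
        """Read every ready channel in generation order; return (data, complete)."""
        out = bytearray()
        complete = False
        while not complete:
            # Assumption: generation g is stored in channel (g - 1) mod NUM_CHANNELS.
            ch = self.channels[(self.wait_gen - 1) % NUM_CHANNELS]
            if ch.gen_num != self.wait_gen or ch.size == 0 or ch.pend_read == 0:
                break                              # not (yet) published for us
            out += ch.data[:ch.size]
            ch.pend_read -= 1
            if ch.pend_read == 0:                  # last reader frees the channel
                self.sent_events.append("EVENT_RECV_GET_DATA")
            complete = ch.size < MAX_CHANNEL_SIZE  # short channel: sender had no more data
            self.wait_gen += 1
        return bytes(out), complete

# Usage: two receivers share the same two published channels.
chans = [Channel() for _ in range(NUM_CHANNELS)]
chans[0] = Channel(gen_num=1, size=8, pend_read=2, data=b"abcdefgh")
chans[1] = Channel(gen_num=2, size=3, pend_read=2, data=b"ijk")
first, second = TunnelReceiver(chans), TunnelReceiver(chans)
data, done = first.receive()
```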

3.4.6 Event Manager

The event manager has a two-level architecture. When a component wants to send an event to another domain, it uses the event interface provided by the socket event manager (SEM). Each event is implemented as a hypercall³ that passes the event to the domain event manager (DEM), which is implemented in Xen. The DEM then signals the destination domain by emulating an interrupt, a facility provided by Xen. Because the interrupt cannot carry any information, the SEM, which is implemented in the guest OS, retrieves the event information through the get_event_information() hypercall.

3.4.6.1 Domain Event Manager

The DEM is implemented in Xen and is responsible for dispatching events to the corresponding domains. Event dispatching is based on IP-to-domain mappings maintained in Xen. Specifically, when a domain sends an event to another domain, the destination IP address is passed to Xen to look up the destination domain. The event is then inserted into an event list of the destination domain, where it waits for the SEM to retrieve it.
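A minimal model of the DEM's dispatch step follows. It looks up the destination domain by IP address, appends the event to that domain's list and marks an emulated interrupt as pending. The class, its fields and send_event() are hypothetical stand-ins for code inside Xen. Only get_event_information() is named in the text, and here it returns one event per call, without the batching optimization described next.

```python
import ipaddress
from collections import defaultdict, deque

class DomainEventManager:
    """User-space model of the DEM; every name except get_event_information is hypothetical."""

    def __init__(self):
        self.ip_to_domain = {}                 # IP-to-domain mapping kept in Xen
        self.event_lists = defaultdict(deque)  # one event list per destination domain
        self.irq_pending = set()               # domains with an emulated interrupt raised

    def register_domain(self, ip, domid):
        self.ip_to_domain[ipaddress.ip_address(ip)] = domid

    def send_event(self, dst_ip, event):
        """What the send-event hypercall does: look up, enqueue, signal."""
        domid = self.ip_to_domain[ipaddress.ip_address(dst_ip)]
        self.event_lists[domid].append(event)
        self.irq_pending.add(domid)   # the interrupt itself carries no payload
        return domid

    def get_event_information(self, domid):
        """Called by the SEM after the interrupt; returns the oldest event or None."""
        queue = self.event_lists[domid]
        event = queue.popleft() if queue else None
        if not queue:
            self.irq_pending.discard(domid)
        return event
```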

Since events can happen frequently, the Xen-domain mode switches caused by the events will lead to a large overhead. We utilize three mechanisms to reduce the overhead.

First, since these events are not ordered, the DEM can store all the events for a socket in a per-socket event_info structure, which is allocated in Xen when a tunnel is created. When an event is inserted into the structure, the structure’s event_flag field is set to indicate that the structure holds pending events. When the SEM invokes the get_event_information() hypercall, all the pending events are returned to the SEM at once. This eliminates a number of mode switches.

3 Similar to the system call interface provided by an OS, a hypercall is an interface provided by Xen that allows domains to request Xen to perform privileged operations.
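The batching idea can be illustrated by counting hypercalls. This sketch models a single destination domain. Each per-socket event_info keeps an unordered list of pending events plus the event_flag described above, and one get_event_information() call drains every flagged structure. The socket keys and the hypercalls counter are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class EventInfo:
    sock: str                                    # hypothetical socket key
    events: list = field(default_factory=list)   # unordered pending events
    event_flag: bool = False                     # set when events is non-empty

class BatchingDEM:
    def __init__(self):
        self.infos = {}      # per-socket event_info, allocated when a tunnel is created
        self.hypercalls = 0  # number of get_event_information() calls made by the SEM

    def create_tunnel(self, sock):
        self.infos[sock] = EventInfo(sock)

    def insert_event(self, sock, event):
        info = self.infos[sock]
        info.events.append(event)
        info.event_flag = True

    def get_event_information(self):
        """One hypercall returns every pending event of every socket."""
        self.hypercalls += 1
        batch = []
        for info in self.infos.values():
            if info.event_flag:
                batch.extend((info.sock, e) for e in info.events)
                info.events.clear()
                info.event_flag = False
        return batch
```

Three events on two sockets are collected by one call instead of three, which is the saving in mode switches the text describes.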

Second, all event_info structures are linked in the event_list, so Xen must search the whole list to find a particular event_info. To reduce the search overhead, we implement an event_hint in the DEM.

For each domain, the DEM stores a pointer to the first event_info in the event_list that belongs to the domain, together with the total number of the domain’s event_info structures in the list. When an SEM calls get_event_information(), the DEM takes the event directly from the event_hint and searches the event_list for the next event only if events remain. Even when it has to search the event_list, it starts from the event_info recorded in the hint rather than from the head of the list.

3.4.6.2 Socket Event Manager

The SEM is implemented in the guest OS and is responsible for dispatching events to the corresponding sockets. It provides an interface through which other components can send and receive events, and each event corresponds to a hypercall. When an event-related interrupt is triggered, the guest OS invokes the get_event_information() hypercall to obtain the event_info, which contains the sockets’ IP addresses, port numbers and other necessary information.

After the SEM obtains the event_info through get_event_information(), it maps the IP addresses and port numbers to the actual memory address of the corresponding socket. This is done with the tcp_v4_lookup() function provided by Xenolinux. The SEM then performs the corresponding operations provided by other components.
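The SEM's dispatch loop resolves each event's addresses to a socket and hands the socket to the handler registered by another component. In this Python sketch, a dictionary keyed by local and remote (IP, port) pairs plays the role that tcp_v4_lookup() plays in the guest kernel. The sketch does not reproduce that function's signature, and the handler registry is a hypothetical design.

```python
from dataclasses import dataclass, field

@dataclass
class Socket:
    local: tuple                              # (ip, port)
    remote: tuple                             # (ip, port)
    log: list = field(default_factory=list)

class SocketEventManager:
    def __init__(self):
        self.sockets = {}    # stand-in for the kernel's TCP socket lookup tables
        self.handlers = {}   # event name -> callback supplied by another component

    def add_socket(self, sock):
        self.sockets[(sock.local, sock.remote)] = sock

    def register(self, event, handler):
        self.handlers[event] = handler

    def lookup(self, local, remote):
        """Plays the role of tcp_v4_lookup(): addresses and ports -> socket."""
        return self.sockets.get((local, remote))

    def dispatch(self, event_infos):
        """event_infos: (event, local, remote) tuples from get_event_information()."""
        delivered = 0
        for event, local, remote in event_infos:
            sock = self.lookup(local, remote)
            handler = self.handlers.get(event)
            if sock is not None and handler is not None:
                handler(sock)
                delivered += 1
        return delivered
```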

CHAPTER 4 PERFORMANCE EVALUATION
