Introduction - Integrating and Accelerating a Content-Based Classification and Management System on a Peer-to-Peer Gateway

Over the last few years, peer-to-peer (P2P) file sharing has grown rapidly and now dominates Internet traffic [1]. Managing P2P traffic efficiently and effectively has thus become an important issue. System administrators used to manage Internet traffic by classifying it according to fixed, well-known port numbers. Management includes blocking the traffic of specific applications or redirecting connections to a proxy that performs various kinds of content filtering, such as virus scanning.

Nonetheless, classifying P2P traffic is non-trivial because most P2P applications may use dynamic ports, i.e. dynamically selected ports rather than fixed well-known ports. Therefore, P2P applications should be classified according to the signatures in their application-layer messages [2]. The classification is traditionally executed in the kernel space because it is simple signature matching on the first few bytes of the content. However, management tasks such as filtering transferred files and scanning P2P shared files for viruses involve complex content processing of the data assembled from the packets, so it seems natural to execute them in the user space.
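The difference between port-based and signature-based classification can be illustrated with a minimal Python sketch. The port table, the signature table and the function names are assumptions made for illustration; the two prefixes are the well-known handshake openings of BitTorrent and Gnutella, whereas a real classifier such as the L7-filter uses richer patterns.

```python
# Illustrative tables only (assumptions); real classifiers use richer patterns.
WELL_KNOWN_PORTS = {21: "ftp", 25: "smtp", 80: "http"}

SIGNATURES = {
    "bittorrent": b"\x13BitTorrent protocol",  # BitTorrent handshake prefix
    "gnutella": b"GNUTELLA CONNECT/",          # Gnutella handshake prefix
}


def classify_by_port(dst_port):
    """Port-based classification: fails for dynamically selected ports."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


def classify_by_signature(payload):
    """Match the first bytes of the application-layer payload."""
    for app, prefix in SIGNATURES.items():
        if payload.startswith(prefix):
            return app
    return "unknown"


# A BitTorrent handshake sent to an arbitrary (dynamic) port.
handshake = b"\x13BitTorrent protocol" + bytes(8) + b"H" * 20 + b"P" * 20
port_result = classify_by_port(51413)
signature_result = classify_by_signature(handshake)
```

In this sketch the port lookup cannot name the application, while the signature lookup identifies it from the first bytes of the payload.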

Although executed in the user space, P2P management tools such as InstantScan and P2PADM [3] need to exchange data between the kernel space and the user space. This data exchange, however, is a costly overhead because it involves memory copies between the kernel space and the user space. In fact, the same overhead exists in Web server packages, e.g. HTTPd. To reduce the overhead, the in-kernel package kHTTPd (http://www.fenrus.demon.nl/) moves HTTPd into the kernel space and handles requests directly in the kernel. This approach avoids the data exchange and indeed provides higher performance than a user-space HTTP daemon.

This work attempts to avoid the data exchange in P2PADM by moving the P2PADM package from the user space to the kernel space. The implementation is based on P2PADM because its source code is available. We also address two weaknesses of P2PADM: the reconnection issue and the non-deterministic delay due to out-of-order packets. The reconnection issue occurs because some P2P applications, e.g. eDonkey, or their users persistently try to reconnect to peers while P2PADM blocks connection establishment. Because P2PADM always blocks them, the same requests are resent repeatedly within a short period, and handling these useless requests reduces performance. The non-deterministic delay occurs because P2PADM must queue out-of-order packets in order to process them, so the time at which a packet is delivered to its peer is non-deterministic.

For the reconnection issue, this work designs a connection cache to handle the packets for reconnection. For non-deterministic delay, the proposed architecture passes the out-of-order packets immediately.

The rest of this work is organized as follows. Chapter 2 introduces P2PADM and indicates its problems. Chapter 3 presents the proposed solutions and architecture. Chapter 4 discusses the performance of the proposed system. Chapter 5 concludes the study.

Chapter 2 Related Works

2.1 Introduction to P2PADM

P2PADM is a novel gateway architecture to manage P2P traffic. The management objectives in the architecture cover (1) connection classification of P2P applications, (2) filtering undesirable P2P applications, (3) virus scanning for P2P shared files, (4) filtering and auditing of chatting messages and transferred files, and (5) bandwidth control of P2P traffic.

Fig. 1 illustrates the architecture of P2PADM. The kernel queues the packets of the classified connections identified by the L7-filter. A main thread in the proxy gets packets from the queue in the kernel by invoking the libipq library (http://www.cs.princeton.edu/~nakao/libipq.htm) and performs the pre-processing tasks, such as checksum examination, packet classification and TCP sequence handling. The main thread then calls a specific application thread to handle the tasks related to the application protocol. Each application thread is responsible for a specific connection and decides to pass or drop the packets in the connection.
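Of the pre-processing tasks listed above, checksum examination is the easiest to show in isolation. The following Python sketch computes the standard Internet checksum (the one's-complement sum of 16-bit words described in RFC 1071); it is an illustration rather than P2PADM's code, and the function names are hypothetical.

```python
def internet_checksum(data):
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                  # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def checksum_ok(data_with_checksum):
    """Data that already contains its checksum field must sum to zero."""
    return internet_checksum(data_with_checksum) == 0


sample = bytes.fromhex("0001f203f4f5f6f7")   # example words from RFC 1071
checksum = internet_checksum(sample)
```

A packet whose stored checksum does not verify would be rejected at this stage, before any application thread is involved.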

Figure 1. Implementation of P2PADM system architecture

2.2 Problems of P2PADM

According to the description in Section 2.1, P2PADM gets packets from the kernel queue by libipq. Libipq is a development library for iptables (http://www.netfilter.org/projects/iptables/index.html) and it provides an API to communicate with the ip_queue kernel module that registers with Netfilter (http://www.netfilter.org/) to pass packets between the kernel space and the user space.

Therefore, P2PADM must switch context between kernel mode and user mode and copy data from the kernel space to the user space in order to manage P2P traffic. This copying can reduce the performance of P2PADM.

According to the benchmark results in [3], the heavy use of libipq in P2PADM reduces the throughput by about 120 Mbps. Because libipq is responsible for copying data from the kernel space to the user space, reducing the use of libipq can increase the throughput of P2PADM. We also observe additional SYN packets sent for reconnection by P2P applications, e.g. eDonkey, or by users at the peer while P2PADM blocks the establishment of a connection. These packets are sent several times because the peer cannot establish the connection. P2PADM must handle and drop every one of these useless SYN packets, which reduces its performance.

Furthermore, out-of-order packets introduce non-deterministic delays. No out-of-order packet can pass P2PADM immediately, because P2PADM holds such packets until the data can be completely reassembled and managed effectively. Because P2PADM is a transparent proxy to the peers, this non-deterministic delay, which comes not from the network transport but from the blocking in P2PADM, should be avoided.

2.3 Related in-kernel Solutions

kHTTPd is an HTTP daemon for Linux that differs from other HTTP daemons in that it runs within the Linux kernel as a module. kHTTPd handles only static Web pages, and passes all requests for non-static information to a user-space Web server such as Apache. Since virtually all images are static and a large portion of HTML pages are also static, the improvement is significant. Static Web pages are not difficult to serve because the delivery of static objects from a Web server is simply a “copying file to network” operation. The Linux kernel is very good at this, so kHTTPd, as an in-kernel daemon, can perform five times better than user-space HTTP daemons (http://www.fenrus.demon.nl/performance.html). Whether the complex management tasks of P2PADM can be performed entirely in the kernel in a similar way is an interesting question.

Chapter 3 System Architecture Design

3.1 Solutions and the Proposed Architecture

On P2PADM, the connections of P2P applications have been classified in the kernel space by L7-filter and managed in the user space. All the packets of each connection must pass through the user space, so the processing in the user space may become a bottleneck. Like kHTTPd that moves the code from a user-space daemon to a kernel-space module, this work also moves the code of P2PADM from the user space to the kernel space and then evaluates the improvement in performance.

Moreover, we design a connection cache to solve the reconnection issue. All packets of a reconnection have the same (1) source IP address, (2) destination IP address, (3) destination port number, (4) protocol id, and perhaps (5) source port number. The connection cache can therefore easily identify a reconnection by keeping the 5-tuple of each blocked connection, and block it before it reaches P2PADM.

Furthermore, the non-deterministic delays from out-of-order packets can be avoided by duplicating the packets once in the gateway and passing the out-of-order packets to the destination immediately instead of queuing them in P2PADM. The receiver then receives the out-of-order packets and sends three duplicate ACKs to the sender, which triggers retransmission. Because the retransmission is triggered by duplicate ACKs rather than by a TCP timeout, the non-deterministic delay is shortened when packets are lost.

Fig. 2 illustrates the operation of the proposed architecture, which is called kP2PADM. The prefix ‘k’ distinguishes it from the original P2PADM, because most functional modules of the proposed architecture reside in the kernel space.

Figure 2. The proposed architecture of kP2PADM

In the beginning, all packets pass through the in-kernel connection cache because the connection cache is empty. The L7-filter then performs connection classification in the kernel: it collects at most the first eight packets of a connection to reassemble an application message and performs signature matching. If the L7-filter identifies the connection, the connection is marked with a predefined application identifier. The kernel can filter undesirable applications and perform bandwidth control according to this identifier. The packets are then handed over for pre-processing, i.e. checksum check, connection identification, and TCP sequence handling. kP2PADM must occasionally call the schedule function to surrender the CPU to other processes to avoid starving them. The schedule function is the Linux kernel function for process scheduling, defined in sched.c. Control of the CPU returns to kP2PADM if no other process demands the CPU. After kP2PADM finishes packet pre-processing, it calls a specific AP module to handle the related packets. Each AP module is a kernel module responsible for setting the verdict on these packets. All handling in kP2PADM takes place in the kernel space except virus scanning. kP2PADM calls the call_usermodehelper function to invoke virus scanning in the user space, and blocks the Linux kernel until virus scanning is finished. To prevent long blocking, the file is scanned piece by piece. After scanning a piece of data, kP2PADM calls the schedule function and may surrender the CPU to the kernel or to other processes. Virus scanning in kP2PADM has not been implemented yet and is left as future work.
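The classification step described above (buffer at most the first eight packets, match signatures, then mark the connection) can be sketched in Python as follows. This is a user-space illustration under stated assumptions, not the L7-filter's implementation: the pattern table, the application identifiers and the class and function names are all hypothetical.

```python
import re

MAX_PACKETS = 8  # the text states that at most the first eight packets are collected

# Hypothetical table: application identifier -> pattern on the reassembled data.
PATTERNS = {
    1: re.compile(rb"^\x13BitTorrent protocol"),
    2: re.compile(rb"^GNUTELLA CONNECT/"),
}


class ConnectionState:
    """Per-connection classification state (assumed structure)."""

    def __init__(self):
        self.buffer = b""
        self.packets_seen = 0
        self.app_id = None     # set once a pattern matches (the "mark")
        self.gave_up = False   # set once MAX_PACKETS were seen without a match


def classify_packet(state, payload):
    """Return the connection's application identifier, or None if unknown."""
    if state.app_id is not None or state.gave_up:
        return state.app_id    # already decided: no further matching
    state.buffer += payload
    state.packets_seen += 1
    for app_id, pattern in PATTERNS.items():
        if pattern.search(state.buffer):
            state.app_id = app_id
            return app_id
    if state.packets_seen >= MAX_PACKETS:
        state.gave_up = True
    return None
```

Once a connection carries an identifier, later packets skip matching altogether, which is why filtering and bandwidth control can key on the mark alone.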

The connection cache is filled with a new 5-tuple value of a denied connection by kP2PADM. A reconnection of a denied connection will then be blocked by the connection cache rather than kP2PADM because the connection cache has recorded the denied connection.

3.2 In-kernel Management Architecture

3.2.1 Possible Approaches to Move User Modules to Kernel Space

There are two ways to move the code of P2PADM from the user space to the kernel space: one is coding functions in the iptables extended match module (http://www.netfilter.org/documentation/HOWTO/netfilter-hacking-HOWTO.html) and the other is modifying the code of P2PADM to be a new kernel thread. The former codes the classification and management functions, such as connection classification, checksum check, TCP sequence handling, content filter, message log and the proposed connection cache in the iptables extended match module. A new filter rule including all the handling of kP2PADM is registered to the hook of Netfilter framework and all packets traverse through the filter rule. The latter creates a new kernel thread to run kP2PADM. The new kernel thread acquires packets from ip_queue that queues the packets identified by L7-filter and handles the operation of kP2PADM by itself. Therefore, the classification functions are coded in iptables extended match modules and the management functions are coded in the new kernel thread. We choose the latter in this work because the second way makes each function module simple and clear.

Once we choose to run kP2PADM in a new kernel thread, some implementation issues arise, such as how to replace the system calls used in the user-space code. For example, the read system call is provided by the kernel for acquiring data from an I/O device, but it cannot be called in kP2PADM because kP2PADM is a kernel process. A kernel process can call almost no system calls, because almost all system calls are designed to be called by user programs, and some of their handling, e.g. copying data between the user space and the kernel space, is unnecessary for a kernel process. Therefore, the use of the read system call must be changed. Fortunately, the implementation of the read system call calls vfs_read to acquire data from I/O devices, and vfs_read is a kernel function exported with EXPORT_SYMBOL [4]. A kernel function exported with EXPORT_SYMBOL can be called by any kernel process. Therefore, kP2PADM can call vfs_read instead of the read system call.

By replacing user-space functions with kernel-space functions in this way, we can make P2PADM run in the kernel space.

3.2.2 Functional Modules in kP2PADM

The proposed architecture differs from P2PADM in how management is organized. The packet pre-processing functions, such as checksum check, connection identification and TCP sequence handling, are moved to the kernel space, application protocol processing is moved to a kernel module, and virus scanning is left in the user space. The processing of application protocols is placed in a kernel module rather than in the kernel itself because it may need to be updated from time to time, and modifying a kernel module is faster than modifying the kernel. Leaving virus scanning in the user space is better than moving it to the kernel space because virus scanning takes a long time and may block the kernel.

3.2.3 Packet Flow in kP2PADM

Fig. 3 illustrates the packet flow in kP2PADM. First, we create a new kernel thread before the kernel invokes the init process. The kernel thread runs kP2PADM and is terminated when Linux is shut down. The in-kernel management architecture waits for new connections and calls the schedule function to surrender the CPU to other processes to avoid starving them. After accepting a new connection, kP2PADM maintains the data structure of the connection socket. kP2PADM can perform I/O operations directly on this data structure rather than rely on functions at the higher layers. After performing the pre-processing tasks, i.e. packet classification and TCP sequence handling, kP2PADM signals the specific application thread to handle the packet. The application thread then sets the verdict on the packet. The entire flow of an application thread is the same as that in P2PADM. Virus scanning in kP2PADM has not been implemented yet and is left as future work.

Figure 3: Packet flow in the new architecture

3.3 Connection Cache

To design the connection cache, we must consider what information should be kept. In the beginning, the packets of all connections pass through the connection cache and are processed by kP2PADM because no connection has been marked as denied. If kP2PADM decides to block a connection, its source IP address, source port number, destination IP address, destination port number, and protocol id are stored in the connection cache. Packets having the same source IP address, destination IP address, destination port number and protocol id are treated as belonging to a reconnection, even though their source port numbers may differ. For example, BitTorrent (http://www.bittorrent.com/) switches to a different source port number if a connection is blocked. We believe that kP2PADM can identify reconnections through these tuples. A reconnection of a denied connection can then be dropped quickly by the connection cache without being processed by kP2PADM, and the performance of P2PADM can be improved.
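The lookup rule described above can be sketched as a small Python class. It is a user-space illustration, not the in-kernel data structure: the class and method names are hypothetical, and the sketch assumes a plain dictionary keyed on the four fields that stay constant across reconnections, with the full 5-tuple kept as the stored value.

```python
class ConnectionCache:
    """Remembers denied connections and recognizes their reconnections."""

    def __init__(self):
        self._denied = {}  # lookup key -> full 5-tuple of the denied connection

    @staticmethod
    def _key(src_ip, dst_ip, dst_port, proto):
        # The source port is deliberately left out of the key, because a
        # reconnection may use a different source port.
        return (src_ip, dst_ip, dst_port, proto)

    def deny(self, src_ip, src_port, dst_ip, dst_port, proto):
        """Record the 5-tuple of a connection that was just blocked."""
        key = self._key(src_ip, dst_ip, dst_port, proto)
        self._denied[key] = (src_ip, src_port, dst_ip, dst_port, proto)

    def is_reconnection(self, src_ip, src_port, dst_ip, dst_port, proto):
        """True if the packet belongs to a reconnection of a denied connection."""
        return self._key(src_ip, dst_ip, dst_port, proto) in self._denied


cache = ConnectionCache()
cache.deny("10.0.0.5", 40001, "192.0.2.7", 6881, "tcp")
# Same peer and service, but a new source port: still a reconnection.
retry = cache.is_reconnection("10.0.0.5", 40002, "192.0.2.7", 6881, "tcp")
```

The sketch never evicts entries; the text does not say whether or how cached entries expire, so any expiry policy would be a further assumption.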

3.4 Fast Pass

A more efficient way to handle out-of-order packets is to duplicate them once in the gateway and pass them on immediately, so that the peer can receive the complete file earlier. If a packet is lost, the receiver receives the subsequent out-of-order packets and sends three duplicate ACKs to the sender to trigger retransmission, instead of the out-of-order packets being queued in the gateway and the retransmission being triggered by a TCP timeout. Because the retransmission is triggered by duplicate ACKs rather than by a TCP timeout, the non-deterministic delay is shortened when packets are lost.
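The effect of fast pass on loss recovery can be illustrated with a small Python simulation. It is a deliberately simplified model and every name in it is hypothetical: segments are numbered 1, 2, 3, ... instead of carrying byte sequence numbers, the receiver sends one cumulative ACK per arriving segment, and the copy that the gateway keeps of each packet is not modelled.

```python
def gateway_forward(arrivals, fast_pass):
    """Segments forwarded by the gateway, in forwarding order."""
    if fast_pass:
        return list(arrivals)            # pass everything immediately
    forwarded, held, expected = [], set(), 1
    for seg in arrivals:                 # queueing gateway: release in order only
        held.add(seg)
        while expected in held:
            held.remove(expected)
            forwarded.append(expected)
            expected += 1
    return forwarded


def receiver_acks(arrivals):
    """Cumulative ACK (next expected segment) sent for each arriving segment."""
    received, expected, acks = set(), 1, []
    for seg in arrivals:
        received.add(seg)
        while expected in received:
            expected += 1
        acks.append(expected)
    return acks


def fast_retransmit_triggered(acks, threshold=3):
    """True once `threshold` duplicate ACKs arrive in a row."""
    dups = 0
    for prev, cur in zip(acks, acks[1:]):
        dups = dups + 1 if cur == prev else 0
        if dups >= threshold:
            return True
    return False


at_gateway = [1, 3, 4, 5]  # segment 2 was lost before reaching the gateway
with_fast_pass = receiver_acks(gateway_forward(at_gateway, fast_pass=True))
with_queueing = receiver_acks(gateway_forward(at_gateway, fast_pass=False))
```

With fast pass the receiver sees segments 3, 4 and 5 and repeats its ACK for segment 2 three times, which is enough to trigger fast retransmit. With queueing only segment 1 gets through, so the sender learns of the loss only when its retransmission timer expires.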

However, the retransmission may be redundant if the out-of-order packet was not caused by packet loss. Redundant retransmissions decrease the throughput of kP2PADM. Besides, out-of-order packets that pass without content filtering may escape rule examination and result in false negatives. The probability of false negatives is very low in practice because kP2PADM still scans each out-of-order packet; it does not, however, scan content that spans packets. Fortunately, signatures that span two packets are infrequent. Transfer time and false negatives are thus a trade-off in the design, and we will evaluate whether fast pass is a good design in this work.
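The false-negative case mentioned above, a signature that straddles two packets, can be reproduced with a few lines of Python. The signature string and the function names are made up for illustration; the only point is the difference between scanning each packet on its own and scanning the reassembled stream.

```python
SIGNATURE = b"FORBIDDEN-FILE"  # hypothetical signature, for illustration only


def scan_per_packet(packets):
    """Scan each packet independently, as when packets are fast-passed."""
    return any(SIGNATURE in p for p in packets)


def scan_reassembled(packets):
    """Scan the in-order byte stream, as when packets are queued and assembled."""
    return SIGNATURE in b"".join(packets)


split_across = [b"header FORBID", b"DEN-FILE trailer"]    # signature spans packets
inside_one = [b"header FORBIDDEN-FILE", b" trailer"]      # signature in one packet
```

Per-packet scanning misses the first case and catches the second, while reassembled scanning catches both, which is the trade-off against transfer time described in the text.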

Chapter 4 Performance Evaluation

4.1 Benchmarking Environment

In this chapter, we perform various benchmarks on the kP2PADM system.

kP2PADM is installed on a PC with a 1 GHz Pentium III CPU, 512 MB of SDRAM and a 20 GB hard disk. Fig. 4 illustrates the benchmark environment. In this environment, there are two HTTP clients and three Web servers. Each client creates one hundred threads, and each thread downloads 2 MB files from these three Web servers. In total, the two clients download 1 GB of data from the Web servers through kP2PADM.

Figure 4: Benchmark environment of kP2PADM

There are two reasons why we use HTTP traffic instead of real P2P traffic to benchmark kP2PADM. The first is that no benchmark tools are available that can generate P2P traffic. The second is that many P2P applications, such as FastTrack and Gnutella, use the HTTP protocol to transfer files. Therefore, using HTTP traffic to simulate P2P traffic is acceptable.

4.2 Comparison with Original Proxy Architecture

4.2.1 Throughput and CPU Utilization of kP2PADM

Throughput and CPU utilization are two important performance measures of a gateway system. The following configurations are compared to understand the performance impact of each component.

(1) NAT: the pure NAT function.

(2) NAT + packet queue: Besides NAT, every packet is queued in the kernel. kP2PADM just tells the kernel to pass the packets without any further processing.

(3) NAT + packet queue + L7: Besides NAT + packet queue, the L7-filter is enabled with 20 rules. The entire process is similar to NAT + packet queue; the difference is that only HTTP is processed. This configuration is used to assess the performance impact of the L7-filter.

(4) P2P proxy + Filter: P2P proxy integrates NAT + packet queue + L7 and all pre-processing of P2P management. This configuration enables filtering transferred files according to the file name.

(5) P2P proxy + Log file: P2P proxy with the auditing function on transferred files. It records the transferred files into the file system.

(6) P2P proxy + Virus scan: P2P proxy with the virus scanning function on transferred files.

(7) P2P proxy + Filter + Log file + Virus scan: P2P proxy with all the above functions enabled.

Fig. 5 and Fig. 6 show the throughput and CPU utilization of P2PADM and kP2PADM under every configuration. Fig. 6 plots not only the total CPU utilization but also the CPU utilization of the kernel. In a gigabit network environment, pure NAT reaches a throughput of about 266.13 Mbps on both P2PADM and kP2PADM. On P2PADM, NAT + packet queue reduces the throughput to 155.24 Mbps and the CPU is fully used; on kP2PADM, however, NAT + packet queue only reduces the throughput
