Introduction - A Novel Gateway Architecture to Manage Peer-to-Peer Connections Using Dynamic Ports

Over the last few years, peer-to-peer (P2P) file sharing has grown rapidly on the Internet. System administrators used to manage Internet traffic by identifying it according to fixed well-known port numbers. Such management includes blocking the traffic of specific applications or redirecting connections to a proxy that performs content filtering such as virus scanning. However, identifying P2P traffic is non-trivial because most P2P applications can use dynamic ports, i.e. dynamically selected ports rather than fixed well-known ports.

P2P traffic can be identified either by examining packet payloads [1] or by analyzing connection patterns at the transport layer [2]. Both approaches require a connection to be established between two peers before it can be identified. To the best of our knowledge, however, how to redirect a connection from the kernel to an application proxy for content filtering after the connection has been established is rarely addressed in either research or industry.

This work designs a novel gateway architecture to manage P2P traffic. The management objectives in the architecture cover (1) connection classification, or identification of P2P applications, (2) filtering undesirable P2P applications, (3) virus scanning for P2P shared files, (4) filtering and auditing of chatting messages and transferred files and (5) bandwidth control of the P2P traffic.

The L7-filter [3] serves as a connection classifier that identifies P2P applications according to the signatures in the application-layer messages. The identification is executed in the kernel space because it is a simple signature match on the first few bytes. Objectives (2) and (5) follow immediately from the identification results. However, Objectives (3) and (4) typically involve more complex content processing and filtering of data assembled from packets; thus they are better executed in the user space. Objectives (3) and (4) therefore require connection redirection from the kernel to the user space. This work designs a new mechanism to address this problem in the software architecture.

A connection is marked after being classified. Only the packets of marked connections are queued in the kernel. The queued packets are then duplicated to the user space, where the proxy program performs the necessary content filtering to decide whether to pass or drop the packets in the kernel queue. Because the proxy receives raw packets from the kernel, packets may arrive out of order, so the proxy must perform TCP reassembly. Since the proxy handles queued packets sequentially, time-consuming content filtering may cause head-of-line blocking in the kernel queue, where the packets of other connections wait behind the packet being examined by the proxy. This work thus proposes a mechanism that uses two packet queues to handle these situations. The entire mechanism of this architecture is called P2P Proxy Mechanism, and the proxy program is called P2P Proxy.
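The TCP reassembly step mentioned above can be illustrated with a minimal sketch. It is not the P2P Proxy implementation: the class name `TcpReassembler` and its interface are hypothetical, it handles one direction of one connection, and it ignores 32-bit sequence-number wraparound and retransmissions that use different segment boundaries.

```python
class TcpReassembler:
    """Reorders raw TCP segments of one direction of one connection.

    Hypothetical helper for illustration; sequence-number wraparound and
    retransmissions with different segment boundaries are not handled.
    """

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # sequence number of the next expected byte
        self.pending = {}             # seq -> payload for segments that arrived early

    def feed(self, seq, payload):
        """Accept one segment and return the bytes that are now in order."""
        if not payload or seq + len(payload) <= self.next_seq:
            return b""                # empty or fully duplicated segment
        if seq < self.next_seq:       # partial overlap: drop the part already delivered
            payload = payload[self.next_seq - seq:]
            seq = self.next_seq
        self.pending.setdefault(seq, payload)

        out = bytearray()
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)
```

A content filter would consume the bytes returned by `feed`, so it sees the stream in order even when the kernel hands over segments out of order.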

P2P Proxy Mechanism is implemented in Linux kernel 2.6.8. A modified queue handler, ip_queue, queues the packets in two packet queues, and the library libipq [4] is modified to let the proxy manipulate packets in these two kernel queues. The proxy is multi-threaded. The main thread handles packet arrivals, and the others handle specific application protocols and perform content filtering.
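The thread layout described above can be sketched as follows. This is only an illustration under assumptions: the text does not say how the main thread hands packets to the protocol threads, so the per-protocol `queue.Queue` inboxes, the function name `run_proxy`, and the handler callables are all hypothetical, and packets are represented as already-classified `(protocol, payload)` pairs rather than read through libipq.

```python
import queue
import threading


def run_proxy(packets, handlers):
    """Dispatch (protocol, payload) pairs to one worker thread per protocol.

    `handlers` maps a protocol name to a callable returning a verdict.
    Returns a dict mapping each protocol to the verdicts its worker produced.
    """
    inboxes = {proto: queue.Queue() for proto in handlers}
    verdicts = {proto: [] for proto in handlers}

    def worker(proto):
        while True:
            payload = inboxes[proto].get()
            if payload is None:       # sentinel: no more packets
                break
            verdicts[proto].append(handlers[proto](payload))

    threads = [threading.Thread(target=worker, args=(p,)) for p in handlers]
    for t in threads:
        t.start()

    # Main thread: only hands arrivals to the matching worker, so a slow
    # filter for one protocol does not delay packets of the others.
    for proto, payload in packets:
        if proto in inboxes:
            inboxes[proto].put(payload)

    for inbox in inboxes.values():
        inbox.put(None)
    for t in threads:
        t.join()
    return verdicts
```

For example, a handler for MSN could return "drop" for payloads that contain a blocked keyword and "pass" otherwise, while the Gnutella handler runs a different filter in its own thread.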

In this work, we want to answer the following questions: (1) What is the overhead of P2P Proxy Mechanism compared with that of the simple port-redirect architecture? (2) What is the main bottleneck of this system? (3) How does the performance of P2P Proxy Mechanism compare with that of virus scanning?

The rest of this work is organized as follows. Chapter 2 surveys present P2P applications and lists our management objectives. Chapter 3 presents our ideas and system architecture. The implementation details, including the selected packages and the thread implementation, are described in Chapter 4. Chapter 5 discusses the performance of our system. We conclude the study in Chapter 6.

Chapter 2 Survey and Problem Statement

2.1 Related Works

Research on P2P traffic has so far mostly focused on connection classification. Much of it considers only traffic on fixed ports [5] [6] [7]. Recent works try to identify P2P traffic that uses dynamic ports. The two major approaches are examining the bit strings in the packet payloads [1] and analyzing P2P flows at the transport layer according to the connection patterns of P2P networks [2]. Both require a connection to be established between two peers before it can be identified.

The former can identify a P2P protocol by matching its signatures, but only if the signatures are known. The latter can identify unknown P2P traffic, but it cannot decide immediately whether a connection is P2P because it needs to collect flow statistics for a while. Nor can it determine exactly which application the connection belongs to.

Some open source packages, such as L7-filter [3] and IPP2P [8], have also been developed to identify P2P traffic. Both are classifiers that inspect packet payloads in the Linux Netfilter [9] subsystem. The L7-filter uses Netfilter’s connection-tracking module and checks only the first eight packets for application data after a connection is established. If the application data matches a signature, the connection-tracking module marks the entire connection as identified. IPP2P, in contrast, checks every packet because it does not use the connection-tracking module. Another difference is that the signatures of IPP2P are hard-coded, whereas L7-filter can load signatures from files. Inspecting fewer packets and loading signatures dynamically therefore give L7-filter higher performance and better scalability than IPP2P. This work presents a complete architecture integrated with the L7-filter for various kinds of P2P management objectives.
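The classification behavior described for L7-filter (inspect only the first eight packets, then mark the whole connection) can be sketched in user-space Python. This is a simplified illustration, not L7-filter code: the class `ConnectionClassifier` is hypothetical, the two patterns are illustrative examples based on the well-known opening bytes of the BitTorrent and Gnutella handshakes rather than the patterns shipped with L7-filter, and accumulating payload across packets is an assumption of this sketch.

```python
import re

# Illustrative signatures only; NOT the pattern files distributed with L7-filter.
SIGNATURES = {
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
    "gnutella": re.compile(rb"^GNUTELLA CONNECT/"),
}
MAX_PACKETS = 8  # per the text, only the first eight packets are checked


class ConnectionClassifier:
    """Hypothetical per-connection classifier with a packet budget."""

    def __init__(self):
        # conn_id -> {"seen": packets inspected, "data": payload so far, "mark": name or None}
        self.state = {}

    def inspect(self, conn_id, payload):
        """Return the connection's mark (protocol name), or None if unclassified."""
        st = self.state.setdefault(conn_id, {"seen": 0, "data": b"", "mark": None})
        if st["mark"] is not None or st["seen"] >= MAX_PACKETS:
            return st["mark"]         # already decided: no further matching
        st["seen"] += 1
        st["data"] += payload         # assumption: match against accumulated payload
        for name, pattern in SIGNATURES.items():
            if pattern.search(st["data"]):
                st["mark"] = name     # mark the entire connection
                break
        return st["mark"]
```

Once a connection is marked, later packets cost only a dictionary lookup, which is the reason the text gives for the performance advantage over checking every packet.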

In proxy architecture research, some studies aim to improve proxy performance, but little has been done beyond the development of TCP splicing, which improves application proxy performance with kernel support [10] [11]. There is also a study on reducing overheads to minimize system costs [12].

2.2 Problem Statement

After P2P connection classification, blocking undesirable applications and controlling the bandwidth of specific applications can be enforced, all within the kernel. However, there is still no content management for the P2P traffic. The difficulty is how to redirect a connection from the kernel to an application proxy for content filtering after the connection has been established and classified in the kernel. This work describes how to solve this problem and which management objectives can be achieved.

2.3 P2P and IM Applications Overview

Popular P2P applications include eDonkey [13], BitTorrent [14], FastTrack [15], Gnutella [16], etc. In addition, file transfer in Instant Messenger (IM) applications such as MSN [17] also works in the P2P mode. Most P2P applications use dynamic ports to circumvent filtering firewalls. Table 1 summarizes the characteristics of these applications.

These P2P applications have two modes when transferring files. One is sequential transfer, in which a peer receives a file sequentially from another peer. The other is segmented transfer, in which the segments of a file can be received out of order. System administrators may want to scan transferred files for viruses and record which files are transferred. Virus scanning and recording are possible only if the data is neither segmented out of order nor encrypted. According to Table 1, these two actions can only be done for FastTrack, MSNFTP and Gnutella.
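The rule stated above (scanning and recording need sequential, unencrypted data) can be written as a small check. The values are copied from Table 1; the dictionary and function names are hypothetical and chosen only for this illustration.

```python
# Values copied from Table 1; None stands for "N/A".
SEQUENTIAL = {
    "FastTrack": True,
    "eDonkey": False,
    "BitTorrent": False,
    "Gnutella": True,
    "MSN": None,
    "MSNFTP": True,
}
# Table 1 lists "No" for data transfer encryption in every column.
DATA_ENCRYPTED = {name: False for name in SEQUENTIAL}


def can_scan(protocol):
    """Return True/False for scan feasibility, or None if not available."""
    sequential = SEQUENTIAL[protocol]
    if sequential is None:
        return None
    return sequential and not DATA_ENCRYPTED[protocol]


scannable = [p for p in SEQUENTIAL if can_scan(p)]
```

With the Table 1 values, `scannable` contains FastTrack, Gnutella and MSNFTP, matching the protocols named in the text.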

If the file name is visible, filtering file names that contain specific keywords is possible. Enterprises may not want employees to leak confidential information through a chat system such as IM. Therefore, filtering sensitive keywords or recording the messages is needed. Table 2 lists the possible management objectives for each application protocol. The proposed architecture intends to implement these management objectives.

Table 1: The characteristics of P2P and IM applications.

| Application Protocol | FastTrack | eDonkey | BitTorrent | Gnutella | MSN | MSNFTP* |
|---|---|---|---|---|---|---|
| Is file transfer sequential? | Yes | No | No | Yes | N/A | Yes |
| Protocol message encryption | Yes | No | No | No | No | No |
| Data transfer encryption | No | No | No | No | No | No |
| Can use dynamic port? | Yes | Yes | Yes | No | No | Yes |
| File name visibility? | Maybe | No | Yes | Yes | Yes | No |
| Default ports | 1214 | 4661-4665 | 6881-6889 | 6346-6347 | 1863 | No default |

*MSNFTP is a file transfer protocol of MSN. N/A = not available
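The traditional port-based identification discussed in the introduction can be sketched with the default ports from Table 1. The names `DEFAULT_PORTS` and `classify_by_port` are hypothetical; the sketch only illustrates why this approach misses connections on dynamic ports.

```python
# Default port ranges copied from Table 1 (MSNFTP has no default port).
DEFAULT_PORTS = {
    "FastTrack": range(1214, 1215),
    "eDonkey": range(4661, 4666),
    "BitTorrent": range(6881, 6890),
    "Gnutella": range(6346, 6348),
    "MSN": range(1863, 1864),
}


def classify_by_port(port):
    """Return the protocol whose default port range contains `port`, else None."""
    for name, ports in DEFAULT_PORTS.items():
        if port in ports:
            return name
    return None
```

A BitTorrent peer listening on a dynamically chosen port such as 51413 is returned as `None`, which is why the architecture relies on payload signatures instead.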

Table 2: Management objectives for each application protocol

| Application Protocol | FastTrack | eDonkey | BitTorrent | Gnutella | MSN | MSNFTP |
|---|---|---|---|---|---|---|
| Connection classification | O | O | O | O | O | O |
| Filtering undesirable applications | O | O | O | O | O | O |
| Virus scanning | O | X | X | O | N/A | O |
| Filtering and auditing of chatting messages and transferred files | O | X | X | O | O | O |
| Bandwidth control | O | O | O | O | O | O |
