
1.1 Overview

This thesis studies the performance of content search in unstructured peer-to-peer (P2P) systems. Such systems have been used for a variety of applications, including content distribution, file sharing, and video streaming. These applications have been very popular: most of today's Internet traffic is contributed by these services, whose users typically number in the millions.

A P2P network is a logical overlay network built on top of a physical network. Each peer corresponds to a node in the P2P network and resides on a node in the physical network. All peers have equal roles. Figure 1.1 gives an overview of the P2P architecture. Each logical link maps to a physical path, which is determined by a routing protocol and consists of one or more physical links. Logical links can be added to the P2P network arbitrarily as long as a corresponding physical path can be found, that is, as long as the physical network is connected.

Figure 1.1 A P2P architecture overview.


1.2 Motivation and Purpose

Structured P2P overlay networks can provide efficient and accurate query service, but they require considerable effort to maintain the distributed hash table (DHT) when peers join and leave frequently, a phenomenon known as churn. Churn is common in P2P overlay networks: measurement studies of deployed P2P overlay networks show a high rate of churn [21], [22].

Unstructured P2P overlay networks organize peers into an arbitrary network topology and use flooding or random walks to look up data items. Each peer that receives a flooding or random-walk packet checks its own database for the queried data item. This approach imposes no constraint on the network topology, can perform complex data lookups, and supports peer heterogeneity. Unstructured P2P networks are also resilient to churn. However, queries in unstructured systems can generate a heavy traffic load, which limits the scalability of such systems.

Although random walks certainly reduce the amount of traffic incurred by queries for files that are available in the network, i.e., files that are stored by at least one peer [6], random walk search protocols still generate a large traffic load when the requested resource does not exist in the overlay. Unfortunately, searches for files that are not in the system are very common in practice [7].

W. Acosta and S. Chandra observed that roughly half of the queries (between 44% and 55.6%) cannot be matched to any file in the system. One solution used in practice to reduce the traffic generated by queries for unavailable files is to set the time-to-live (TTL) of query packets to a small value.

However, a search with a small TTL value reaches only a small fraction of the peers in the system, so queries are likely to fail even if the requested file is actually available in the network. Measurement studies of deployed unstructured P2P networks observed that the query success rate is very low, typically close to 10% [24].
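The TTL trade-off described above can be illustrated with a minimal sketch. It is not the protocol evaluated in this thesis: the ring-shaped overlay, the function names, and the parameter values are assumptions chosen only to make the behavior easy to follow. The walker visits one neighbor per hop and stops when it finds a holder of the file or when the TTL is exhausted.

```python
import random

def random_walk_search(neighbors, holders, start, ttl, rng):
    """Walk at most `ttl` hops from `start`; return (found, hops_used).

    neighbors: dict mapping each peer to a list of its overlay neighbors
    holders:   set of peers that store the requested file (may be empty)
    """
    current = start
    hops = 0
    while True:
        if current in holders:      # the visited peer checks its own database
            return True, hops
        if hops == ttl:             # TTL exhausted, the query is dropped
            return False, hops
        current = rng.choice(neighbors[current])
        hops += 1

def ring(n):
    """Hypothetical overlay used for illustration: a ring of n peers."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

if __name__ == "__main__":
    rng = random.Random(0)
    overlay = ring(100)
    # Unavailable file: the walk always consumes its full TTL.
    print(random_walk_search(overlay, set(), 0, 20, rng))
    # Available file held 50 hops away: a TTL of 20 can never reach it.
    print(random_walk_search(overlay, {50}, 0, 20, rng))
```

In this sketch a query for an unavailable file always costs exactly `ttl` messages, while a small `ttl` makes a query for an available but distant file fail, which is the tension the two preceding paragraphs describe.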


Therefore, neither structured nor unstructured P2P overlay networks can provide efficient, flexible, and robust service at the same time. The motivation of this thesis is to combine the two types of P2P networks in a hybrid approach that supports scalability and reliability at the same time. To achieve this goal, the approach should inherit the advantages of both types while minimizing their disadvantages.

In this thesis, we propose Multi-hop Index Replication, which improves search quality for rare objects while minimizing the overhead incurred by participating peers. Index replication not only incurs much lower overhead than data replication; previous work has also shown it to be effective at improving the scalability of unstructured networks [6], [12], [22].

Our Multi-hop Index Replication with the PDG forwarding algorithm eliminates redundant query flooding messages and reduces the network traffic generated by searches for unavailable files between the super-peer layer and the ordinary-peer layer of the P2P system.

In the proposed hybrid P2P system, the bootstrap peer (BSP) maintains a super-peer table. Any peer that joins the P2P network and wishes to become a super-peer must first issue a request to the BSP. After examining the requesting peer's bandwidth conditions, the BSP takes one of three actions: it selects the peer as a super-peer and sends it the corresponding connection information; it registers the peer as a redundant super-peer and provides it with a list of super-peers that can be used to connect to the system; or it only provides a list of super-peers.

The joining procedure for the overlay network is shown in Figure 1.2.


Figure 1.2 Bootstrapping a new peer.
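The three outcomes of a join request can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the BSP decides between the outcomes, so the bandwidth threshold, the cap on active super-peers, and all names in the code are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical values; the thesis text does not specify them.
MIN_SUPER_PEER_KBPS = 1024     # assumed minimum bandwidth for a super-peer
MAX_ACTIVE_SUPER_PEERS = 7     # assumed capacity of the super-peer overlay

@dataclass
class BootstrapPeer:
    active: list = field(default_factory=list)     # super-peer table
    redundant: list = field(default_factory=list)  # registered stand-by super-peers

    def handle_request(self, peer_id, bandwidth_kbps):
        """Return the BSP's reply to a peer asking to become a super-peer."""
        known = list(self.active)
        if bandwidth_kbps < MIN_SUPER_PEER_KBPS:
            # Outcome 3: the peer only receives a list of super-peers.
            return {"role": "ordinary", "super_peers": known}
        if len(self.active) < MAX_ACTIVE_SUPER_PEERS:
            # Outcome 1: selected as super-peer, gets connection information.
            self.active.append(peer_id)
            return {"role": "super-peer", "super_peers": known}
        # Outcome 2: registered as a redundant super-peer.
        self.redundant.append(peer_id)
        return {"role": "redundant super-peer", "super_peers": known}

if __name__ == "__main__":
    bsp = BootstrapPeer()
    print(bsp.handle_request("peer-a", 4096))
    print(bsp.handle_request("peer-b", 256))
```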

Once the overlay topology has been established, a pure PDG [8-9] forwarding algorithm transports query messages from the originating super-peer to the other super-peers in the overlay in such a way that each super-peer receives exactly one message. In addition, each super-peer maintains an AVL tree-based index. The index is keyed by a random-looking value generated by applying a SHA-1-like hash algorithm to the resource names published by the ordinary peers. Search, insert, and delete operations on the index take O(log n) time on average, where n is the number of shared files in the overlay network.
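A minimal sketch of such an index is given below, assuming plain SHA-1 from Python's standard `hashlib` as a stand-in for the SHA-1-like algorithm and a textbook AVL insertion. The names `resource_key`, `insert`, and `search`, and the choice to store a set of peer identifiers per key, are illustrative assumptions rather than the thesis implementation. Deletion is omitted for brevity.

```python
import hashlib

def resource_key(name):
    """Hash a published resource name to an integer key (plain SHA-1 assumed)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

class Node:
    def __init__(self, key, peer):
        self.key, self.peers = key, {peer}   # peers that publish this resource
        self.left = self.right = None
        self.height = 1

def height(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y

def insert(node, key, peer):
    """Insert (key, peer) and return the new subtree root, rebalancing on the way up."""
    if node is None:
        return Node(key, peer)
    if key == node.key:
        node.peers.add(peer)
        return node
    if key < node.key:
        node.left = insert(node.left, key, peer)
    else:
        node.right = insert(node.right, key, peer)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:                                   # left-heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)        # left-right case
        return rotate_right(node)
    if balance < -1:                                  # right-heavy
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)     # right-left case
        return rotate_left(node)
    return node

def search(node, key):
    """Return the set of peers publishing `key`, or an empty set."""
    while node:
        if key == node.key:
            return node.peers
        node = node.left if key < node.key else node.right
    return set()

if __name__ == "__main__":
    root = None
    for i in range(200):
        root = insert(root, resource_key("file-%d.mp3" % i), "peer-%d" % (i % 5))
    print(search(root, resource_key("file-42.mp3")), root.height)
```

Because the tree is rebalanced after every insertion, its height stays logarithmic in the number of indexed resources, which is what gives lookups their O(log n) cost.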

In addition, our system can also serve as a storage system. With the multi-layer, multi-hop architecture, we can assign a more powerful super-peer (MSP), with large storage space, high computational capability, and higher bandwidth, to manage a whole PDN cluster; the MSPs in turn form a Perfect Difference Network (PDN) to further enhance reliability and scalability.
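To make the PDN structure and the one-message-per-super-peer property concrete, the sketch below builds on the standard definition of a perfect difference set: a set of δ+1 residues modulo n = δ² + δ + 1 such that every nonzero residue is the difference of exactly one ordered pair of elements. The two-hop forwarding rule shown here follows directly from that uniqueness property. It is an illustrative assumption and not necessarily the exact algorithm of [8-9]; the function names are hypothetical.

```python
from collections import Counter

def is_perfect_difference_set(pds, n):
    """Check that every nonzero residue mod n is a difference of exactly one ordered pair."""
    diffs = Counter((a - b) % n for a in pds for b in pds if a != b)
    return all(diffs[d] == 1 for d in range(1, n))

def pdg_flood(source, pds, n):
    """Count the query copies each super-peer receives in a two-hop PDG flood.

    Hop 1: the source sends to its forward neighbors (source + s) and
           backward neighbors (source - s) for every nonzero s in pds.
    Hop 2: each forward neighbor (source + a) relays over its backward
           links (- b) for every nonzero b != a.
    """
    offsets = [s for s in pds if s != 0]
    received = Counter()
    for a in offsets:
        forward = (source + a) % n
        received[forward] += 1
        for b in offsets:
            if b != a:
                received[(forward - b) % n] += 1
    for b in offsets:
        received[(source - b) % n] += 1
    return received

if __name__ == "__main__":
    pds, n = [0, 1, 3], 7            # delta = 2, n = 2*2 + 2 + 1 = 7
    print(is_perfect_difference_set(pds, n))
    print(sorted(pdg_flood(0, pds, n).items()))
```

With δ = 2 the source sends 4 messages in the first hop and its forward neighbors relay 2 more, for n - 1 = 6 messages in total, one per remaining super-peer.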


1.3 Thesis Organization

The main contributions of this thesis can be summarized as follows:

• We propose a novel and efficient search mechanism for multi-layer unstructured P2P systems and show that it is both reliable, i.e., if the requested content is in the system, the search locates it with reasonable guarantees, and scalable, i.e., the network traffic generated by queries is limited by the super-peers. In addition, more powerful super-peers can be used to connect to other Perfect Difference Networks (PDNs).

• Our Multi-hop Index Replication with the PDG flooding algorithm eliminates redundant query flooding messages and reduces the network traffic generated by searches for unavailable files between the super-peer layer and the ordinary-peer layer of the P2P system.

• We evaluate the proposed scheme through extensive simulations. The performance evaluation demonstrates that the proposed flooding algorithm outperforms existing unstructured P2P overlay networks, with a higher query success ratio, fewer lookup query flooding messages, and a lower average delay.

The remainder of this thesis is organized as follows. Chapter 2 presents background literature on structured and unstructured P2P systems. Chapter 3 discusses the proposed hybrid P2P system and describes our methods for representing a numeric range, together with the analytic models. Chapter 4 evaluates the effectiveness of our methods and discusses the performance of the system. Finally, Chapter 5 gives concluding remarks and future work.

