Chapter 2 Background
2.1 Peer-to-Peer Network
2.1.1 Introduction
In the traditional network architecture, a service is provided by a dedicated machine that is connected to the network and has sufficient resources. This machine, called a “server”, provides content such as text, pictures, or multimedia streams for other machines, called “clients”, to retrieve. The model in which a client accesses data provided by a server is called the “Client-Server” architecture.
In the Client-Server communication model, the server can become a traffic bottleneck. A single server cannot serve a very large number of clients because its computing power, network bandwidth, and storage are limited. Worse, if the server crashes, service to all clients is interrupted. Server clusters, also called server farms, are used to avoid this single point of failure. However, for popular web services such as Google and Yahoo, millions of clients may access the service at the same time, so thousands of servers are required. The cost of servers in such a system can be prohibitively high. To reduce the server cost while still serving a large number of clients, the Peer-to-Peer (P2P) model was developed in the late 1990s. In a P2P network, the role of the server is taken over by the clients themselves.
More precisely, each member of a P2P network acts as both a client and a server at the same time: it provides content to other peers and obtains content from them.
2.1.2 P2P for file sharing
P2P has many applications, one of which is file sharing.
Every peer can act as a server and provide content without the assistance of a centralized server. There is no restriction on what content a peer can publish, or on which peer a given piece of content can be obtained from. As a result, P2P has become a highly popular platform for file sharing, partly because of the degree of anonymity it offers.
Although P2P systems have become a major platform for obtaining content, they still need a mechanism for locating the desired files.
Figure 2.1 The centralized P2P architecture.
The simplest approach is to deploy a centralized server that answers queries about file locations. As Figure 2.1 shows, peers A and B inform the centralized server that they have “File 1” and “File 2”, respectively. Peer C then asks the server where “File 1” is. The server replies that peer A has it, and peer C downloads “File 1” from peer A. This type of P2P system is called the centralized architecture.
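The register-and-lookup exchange described above can be sketched in a few lines of Python. This is only an illustrative model, not the protocol of any particular system: the class name `CentralIndex` and its methods `register` and `lookup` are hypothetical, and the peers are represented by plain strings.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central server: maps a file name to the peers holding it."""

    def __init__(self):
        self._locations = defaultdict(set)

    def register(self, peer, filename):
        # A peer announces that it can serve `filename`.
        self._locations[filename].add(peer)

    def lookup(self, filename):
        # Return the peers that announced `filename` (empty list if none did).
        return sorted(self._locations.get(filename, set()))


index = CentralIndex()
index.register("A", "File 1")   # peer A announces File 1
index.register("B", "File 2")   # peer B announces File 2

holders = index.lookup("File 1")  # peer C's query
print(holders)
```

The index only stores locations. The file itself is still transferred directly between peers, here from peer A to peer C.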
In the centralized P2P architecture, the server remains a bottleneck. If the central server fails, the whole P2P system goes out of service. Researchers therefore proposed a “purely decentralized” architecture. As Figure 2.2 shows, when peer C wants to get “File 1”, it broadcasts a query message to all of its directly connected peers: A, B, D, and E. None of them has “File 1”, so each forwards the query to its own directly connected peers, and so on recursively. Eventually the query reaches peer G, which has “File 1” and sends it to peer C.
Figure 2.2 Purely decentralized P2P architecture.
A purely decentralized system can flood the network with unnecessary query packets. Work has been done on hybrid schemes that combine the centralized and decentralized approaches to take advantage of both.
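The flooding search described above, and the message overhead it causes, can be illustrated with a minimal sketch. The function `flood_query`, the hop limit `ttl`, and the links beyond peer C's four direct neighbours are assumptions made for this example, since the text only states that C is connected to A, B, D, and E and that G holds the file.

```python
from collections import deque


def flood_query(neighbors, files, source, filename, ttl=4):
    """Breadth-first flood. Returns (holder or None, query messages sent)."""
    seen = {source}
    frontier = deque([(source, ttl)])
    messages = 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if filename in files.get(peer, set()):
            return peer, messages
        if hops_left == 0:
            continue  # hop limit reached, do not forward further
        for nxt in neighbors[peer]:
            messages += 1  # every forwarded copy costs a message, even duplicates
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops_left - 1))
    return None, messages


# Assumed topology: only C's links to A, B, D, E come from the text.
neighbors = {
    "C": ["A", "B", "D", "E"],
    "A": ["C", "F"],
    "B": ["C", "F"],
    "D": ["C", "G"],
    "E": ["C", "G"],
    "F": ["A", "B"],
    "G": ["D", "E"],
}
files = {"G": {"File 1"}}

holder, messages = flood_query(neighbors, files, "C", "File 1")
print(holder, messages)
```

Even in this small assumed topology, the query is forwarded many more times than the single lookup needed in the centralized design, because peers receive duplicate copies of the same query.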
2.1.3 BT revolution
BitTorrent (BT) [1] is a technology for distributing files quickly and efficiently. Since BT was developed in 2001, it has become increasingly popular for file sharing and has attracted a large number of Internet users.
The most significant contribution of BT is that it breaks a file into many smaller fixed-size pieces, called chunks, for sharing. A user can download different chunks of a file simultaneously from the different peers that own them. This idea greatly improves download efficiency and reduces download time.
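As a rough illustration of the chunking idea, the sketch below splits a byte string into fixed-size chunks and reassembles it. The function name `split_into_chunks` and the 4-byte chunk size are chosen only for this example; real systems use much larger chunks.

```python
def split_into_chunks(data, chunk_size):
    """Split `data` into chunks of `chunk_size` bytes (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


original = b"abcdefghij"
chunks = split_into_chunks(original, 4)

# Chunks may arrive in any order; indexing them lets the receiver reassemble.
reassembled = b"".join(chunks)
print(len(chunks), reassembled == original)
```

Because each chunk is identified by its index, the chunks can be fetched from different peers in any order and joined afterwards.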
The “Tracker Server” in a BT system records which peers are downloading a file and helps peers find one another. To download a file, a peer needs a “torrent” file, which contains the file name, the file size, hashing information, and the URL of the tracker server. The peer uses the URL in the torrent file to connect to the tracker server.
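The role of the hashing information can be sketched as follows. This is a simplified model under stated assumptions: the metadata is held in a plain Python dictionary rather than in the actual torrent file encoding, the field names and the helper functions `make_metadata` and `verify_chunk` are hypothetical, and the tracker URL is a placeholder. SHA-1 is used here for illustration.

```python
import hashlib


def make_metadata(name, data, chunk_size, tracker_url):
    """Build a simplified, torrent-like description of a file."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {
        "name": name,
        "size": len(data),
        "chunk_size": chunk_size,
        "chunk_hashes": [hashlib.sha1(c).hexdigest() for c in chunks],
        "tracker_url": tracker_url,  # placeholder URL, not a real tracker
    }


def verify_chunk(metadata, index, chunk):
    """Check a received chunk against the hash recorded in the metadata."""
    return hashlib.sha1(chunk).hexdigest() == metadata["chunk_hashes"][index]


meta = make_metadata("demo.bin", b"abcdefghij", 4,
                     "http://tracker.example/announce")
print(verify_chunk(meta, 2, b"ij"))    # intact chunk
print(verify_chunk(meta, 0, b"xxxx"))  # corrupted chunk
```

Storing one hash per chunk lets a peer check each chunk as soon as it arrives, so a corrupted chunk from one peer can be discarded and requested again from another.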
Figure 2.3 BT architecture.
As Figure 2.3 shows, every peer in BT, such as peers A, B, C, and R, has to communicate with the tracker server. Suppose that peer A is the “seed”, that is, the source of the file. When peer R wants to download the file, it asks the tracker server and gets a response indicating that peer A has all chunks, peer B has chunks 2 and 6, and peer C has chunks 3 and 4. Peer R can then receive chunks 1 and 5 from A, chunks 2 and 6 from B, and chunks 3 and 4 from C simultaneously. With the aid of the tracker server, a peer can download different chunks of a file from different peers, which greatly improves download efficiency.
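One way peer R could arrive at the assignment in this example is sketched below, taking the chunk availability from the text as given and assuming the file has six chunks. The function `plan_downloads` and its selection rule (prefer the holder with the fewest chunks already assigned, then the holder that owns the fewest chunks) are a simple heuristic assumed for illustration, not BT's actual chunk selection policy.

```python
def plan_downloads(availability, wanted):
    """Assign each wanted chunk to one peer that holds it."""
    plan = {peer: [] for peer in availability}
    for chunk in sorted(wanted):
        holders = [p for p, owned in availability.items() if chunk in owned]
        if not holders:
            raise LookupError(f"no peer holds chunk {chunk}")
        # Prefer lightly loaded peers, then peers owning few chunks, so the
        # seed is kept for the chunks that only it can supply.
        best = min(holders,
                   key=lambda p: (len(plan[p]), len(availability[p]), p))
        plan[best].append(chunk)
    return plan


# Availability as described in the text (six chunks assumed).
availability = {
    "A": {1, 2, 3, 4, 5, 6},  # the seed
    "B": {2, 6},
    "C": {3, 4},
}
plan = plan_downloads(availability, wanted={1, 2, 3, 4, 5, 6})
print(plan)
```

With this rule, the chunks are spread over three peers, so the three transfers can proceed in parallel instead of all six chunks being fetched from the seed.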
BT has had a large impact on file-sharing applications and on other areas. Its concept of dividing a file into smaller chunks has been widely adopted. For example, some P2P IPTV systems borrow this concept and divide a video stream into fixed-size chunks for sharing among peers. We discuss P2P IPTV systems in more detail in the following sections.