1.1. Preface

In the age of knowledge explosion, the amount of information on the Internet is growing rapidly.

To keep up with this growth, computers need more processing power and larger storage capacity. Powerful workstations and servers have limited storage scalability and limited computing power for serving requests, so they cannot satisfy this requirement, whereas a Peer-to-Peer storage system, which offers high scalability, good performance and fault tolerance, is a more practical solution. However, designing a Peer-to-Peer storage system is not trivial. The authors of [32] identify several techniques involved in designing one, namely data redundancy, data placement, failure detection and data maintenance. In this thesis, we focus on data redundancy, that is, how to reach the target data availability in a Peer-to-Peer storage system at the lowest cost in storage space and bandwidth.

1.2. Motivation

Although many Peer-to-Peer (P2P) storage systems have been designed, such as OceanStore [28], CFS [19], PAST [31] and Total Recall [23], we are interested in how a P2P storage system can be applied. The Multimedia Content Discovery and Delivery (mCDN) architecture [33] is an interesting application in which a P2P storage system plays an important role. mCDN is a new architecture for content delivery networks (CDNs) that can combine many CDN services to support a variety of business models. It defines three layers (shown in Figure 1-1). First, the Content Service Layer contains all entities that provide functionality to the end users of mCDN and content services. Second, the Content Distribution Layer is responsible for distributing and maintaining content and metadata. Third, the Network Infrastructure Layer contains all components that observe the network state and trigger mCDN-related work events.

Figure 1-1 Layered mCDN architecture. From [33]

A P2P storage system can be used in the Content Distribution Layer to provide content service. It can place content optimally across peers and let users access requested content quickly. In a P2P storage system, data availability directly affects service quality, so it is a central concern in system design. For web applications, end users expect data availability of at least three nines (99.9%) [34]. Achieving such high availability inevitably requires more storage space.
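To illustrate why high availability costs storage, the following minimal sketch counts how many full replicas are needed to reach a target availability. It assumes, hypothetically, that peers are online independently of one another and with the same probability. The function name `replicas_needed` and the sample peer availability of 0.5 are illustrative and do not come from [34] or from the experiments in this thesis.

```python
def replicas_needed(peer_availability: float, target: float) -> int:
    """Smallest number of full replicas whose combined availability meets target.

    Assumption (not from the thesis): peers are online independently, each with
    probability peer_availability, so the data is unavailable only when every
    replica holder is offline at the same time.
    """
    if not 0.0 < peer_availability < 1.0:
        raise ValueError("peer_availability must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    replicas = 1
    # Availability with r replicas is 1 - (1 - p)^r; add replicas until it suffices.
    while 1.0 - (1.0 - peer_availability) ** replicas < target:
        replicas += 1
    return replicas


if __name__ == "__main__":
    # Hypothetical peers that are online half of the time, three-nines target.
    print(replicas_needed(0.5, 0.999))
```

Under these assumptions, peers that are online half of the time need ten full replicas to reach three nines, since nine replicas give only about 99.8%. This is the kind of storage cost that coding-based redundancy tries to reduce.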

Replication and erasure coding have been the common methods for creating redundant data, but Network Coding has recently gained popularity and is now also applied to generate redundant data. Regenerating Code [8] uses the properties of Network Coding to solve some problems that arise in erasure coding, but the scheme is not efficient at serving requests. Fast response to users is critical for web applications, so our work aims to overcome this drawback and make the scheme more practical for web applications.

1.3. Problem Description

In data redundancy research, previous comparisons concentrated on replication and erasure coding, and two targets, low storage cost and low bandwidth cost, are the key points of analysis in the design of a P2P storage system.

Unlike the replication scheme, which simply distributes full replicas to other peers, the erasure coding scheme requires an encoding process before coding blocks are sent. The replication scheme is simpler, but the erasure coding scheme offers better storage efficiency and is the winner in those comparisons. Nevertheless, the erasure coding scheme requires a full copy of the file before the system can generate new coding blocks. This limitation of the redundancy mechanism forces the system to keep one additional full replica alongside the coding blocks, an arrangement called the hybrid scheme. The hybrid scheme combines the advantages of both schemes, reduces the cost of bandwidth and storage space, and was the best option in later analyses.

Regenerating Code removes this limitation by using the linear relationship of Network Coding: a new coding block can be generated by collecting enough existing coding blocks. Somewhat surprisingly, its authors find that the hybrid scheme complicates redundancy management and has a disk I/O bottleneck because it stores the full replica, so they advocate that a coding scheme should manage only one type of redundancy. In their analysis, they conclude that their scheme has lower storage and bandwidth costs than the hybrid scheme but is suitable mainly for backup applications because of its lower access performance. In addition, we observe that their generation method has a precondition: more peers must be alive in the P2P storage system than erasure coding with the same settings would require. In a dynamic environment, peers join and leave at any time, so the probability that their method succeeds is lower. For example, suppose 14 coding blocks are allocated to 14 different peers in a P2P storage system; the probability that 7 of the 14 peers are online at the same time is higher than the probability that 13 of the 14 are.
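The 14-peer example can be checked numerically with a short sketch. It reads "7 of the 14" as "at least 7 of the 14" and assumes, for illustration only, that each peer is online independently with the same probability `p`. The function name `prob_at_least` and the sample values of `p` are hypothetical and are not taken from the experiments in this thesis.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n peers are online.

    Assumption (not from the thesis): each peer is online independently
    with the same probability p, so the count of online peers is binomial.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n = 14
    for p in (0.5, 0.7, 0.9):  # hypothetical peer availabilities
        need_7 = prob_at_least(7, n, p)
        need_13 = prob_at_least(13, n, p)
        print(f"p={p}: P(>=7 of {n})={need_7:.4f}, P(>=13 of {n})={need_13:.4f}")
```

Because every outcome with at least 13 peers online also has at least 7 online, the first probability can never be smaller than the second. The gap between the two widens as peer availability falls.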

In short, our work extends their scheme to support efficient data service in web applications.

1.4. Research Objective

Based on the observations above, the problem is how to extend their scheme to serve requests efficiently while following its central idea of storing only one type of redundancy.

We start from this point and add one further idea. The work in [27] takes user download behavior into account; as a result, the system pays only a small overhead to improve the hybrid scheme. Similarly, we observe that even if the system stores only coding blocks, a whole file is reconstructed at runtime whenever a peer wants to read it. In a real system, each peer usually allocates some storage space for an LRU cache that holds recently used data. The full copy resides there, so our research objective is to exploit it before it is evicted from the cache or the peer leaves the P2P system.

1.5. Research Contribution

We implement our scheme, which makes a Regenerating Code system more useful for content service, in p2psim [21], a discrete-event packet-level simulator. Our scheme basically follows the Regenerating Code scheme and gives each peer in the P2P environment an LRU cache of a fixed size. At each access, the system records which files each peer accessed most recently, and those peers become the first choice when the system needs to generate a new coding block or a peer wants to access a file. We run experiments with two parameters, cache size and peer availability. The results show that, across different peer availabilities, an LRU cache of 64 blocks accelerates most accesses in a Regenerating Code system. In addition, we find that in a P2P environment the benefit of a very large LRU cache is limited, because each peer may leave and take its cached data with it.

Finally, even if the file blocks in the cache are not enough to reconstruct the original file, the peer can still reduce the number of connections it makes. The bandwidth cost then drops and the probability of successful encoding rises, since the peer needs to connect to fewer peers.
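The per-peer cache and its effect on the number of connections can be sketched as follows. This is an illustrative sketch, not the p2psim implementation: the names `BlockCache` and `remote_blocks_needed` are hypothetical, the sketch assumes that any `k` distinct blocks of a file are enough to reconstruct it, and it leaves out the bookkeeping of which peers last accessed a file.

```python
from collections import OrderedDict
from typing import Optional


class BlockCache:
    """Fixed-size LRU cache of coding blocks held by one peer (illustrative)."""

    def __init__(self, capacity_blocks: int) -> None:
        if capacity_blocks <= 0:
            raise ValueError("capacity_blocks must be positive")
        self.capacity = capacity_blocks
        # Maps (file_id, block_index) -> block data, oldest entry first.
        self._blocks = OrderedDict()

    def put(self, file_id: str, index: int, data: bytes) -> None:
        key = (file_id, index)
        self._blocks[key] = data
        self._blocks.move_to_end(key)         # mark as most recently used
        while len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block

    def get(self, file_id: str, index: int) -> Optional[bytes]:
        key = (file_id, index)
        if key not in self._blocks:
            return None
        self._blocks.move_to_end(key)         # a hit refreshes the block
        return self._blocks[key]

    def cached_count(self, file_id: str) -> int:
        return sum(1 for fid, _ in self._blocks if fid == file_id)


def remote_blocks_needed(cache: BlockCache, file_id: str, k: int) -> int:
    """Blocks still to fetch from other peers.

    Assumption: any k distinct blocks of the file suffice to reconstruct it.
    """
    return max(0, k - cache.cached_count(file_id))


if __name__ == "__main__":
    cache = BlockCache(capacity_blocks=64)  # the 64-block size discussed above
    for i in range(3):
        cache.put("movie", i, b"block-%d" % i)
    # With a hypothetical k = 7, three cached blocks leave four to fetch remotely.
    print(remote_blocks_needed(cache, "movie", k=7))
```

In this sketch every cached block removes one remote fetch, which is why a partially filled cache still lowers the bandwidth cost even when it cannot serve the whole file.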

Furthermore, our research suggests that P2P storage systems may not need to maintain an additional full copy of each file. Although the authors of Regenerating Code intended their scheme to achieve this goal, its access performance is not acceptable for content service. Our work overcomes this drawback and thereby makes the goal attainable.

1.6. Thesis Outline

The remainder of this thesis is organized as follows. Chapter 2 introduces background knowledge and details each redundancy scheme, which helps in understanding and analyzing the redundancy policy. Chapter 3 describes how we use the LRU cache to decrease the cost of access and maintenance. Chapter 4 explains the design of the experiments and analyzes the improvement. Chapter 5 discusses how to use a P2P storage system in the mCDN architecture. Finally, Chapter 6 concludes our design and analysis.
