3. Regenerating Code with Cache
3.3. System analysis
Our analysis has two parts: one concerns requesting the file, and the other concerns generating a coding block. This section presents figures that analyze the hybrid scheme (Figure 3-5), the Regenerating Code scheme (Figure 3-6), and the Regenerating Code scheme with LRU cache (Figure 3-7 for Case 1 and Figure 3-8 for Case 2). The (n, k) setting is (4, 2) throughout. We compare these schemes to explain why we designed our scheme as we did. In the scenario, four peers, Peer 1 to Peer 4, each store one coding block of the file. When Peer 4 crashes, Peer 5 is notified to generate a coding block. In addition, Peer 6 shows how the file is requested in each scheme.
Figure 3-5 Analysis of the hybrid scheme
Figure 3-5 shows the analysis of the hybrid scheme. Peer 7 is the key peer in the hybrid scheme because it stores a replica. When a coding block is generated, Peer 7 can forward a coding block directly to Peer 5, and when the file is requested, Peer 7 can forward the whole file directly to Peer 6. If Peer 7 crashes, Peer 6 can still access the file by communicating with two of the first three peers, but Peer 5 cannot get a coding block until some peer has reconstructed the file. Usually the system can wait and later index a peer that has accessed the file, such as Peer 6; however, if the wait is so long that some of the first three peers crash in the meantime, the redundancy left in the system may not be enough to create a coding block.
Therefore, the system sometimes has to maintain an additional whole replica. This extra overhead is the drawback of the erasure coding scheme. The advantages of the scheme are that access requires connecting to only one peer and that, while the replica is present, the bandwidth cost of generating is one coding block, that is, half of the file.
Figure 3-6 Analysis of the Regenerating Code
Figure 3-6 shows the analysis of the Regenerating Code. No replica is maintained in the system; all requesting and generating is done by collecting enough coding blocks. Requesting needs two coding blocks, but generating needs a different number. Peer 5 gathers three smaller pieces, coded packets that are each a quarter of the file in size. Consequently, the bandwidth cost is three quarters of the file, which is higher than that of the hybrid scheme in the ideal case. In this example, the number d in the Regenerating Code is 3. If d is larger, for example d = 13 with (n, k) = (14, 7) as in our experiment, the bandwidth cost comes closer to that of the hybrid scheme: 13/49 of the file in our setting versus 1/7 (= 7/49) of the file in the hybrid scheme. The cost gradually approaches the size of a coding block as d grows, but it remains larger than that of the hybrid scheme.
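The costs quoted above (3/4 of the file for (4, 2) with d = 3, and 13/49 for (14, 7) with d = 13) are consistent with a repair in which each of the d helper peers sends a coded packet of size 1/(k(d - k + 1)) of the file. The sketch below assumes that formula, which this section does not state explicitly, and uses the hypothetical helper names `regenerating_repair_cost` and `hybrid_repair_cost` to redo the arithmetic with exact fractions.

```python
from fractions import Fraction


def regenerating_repair_cost(k, d):
    """Total repair traffic as a fraction of the file size.

    Assumption (not stated in the text): each of the d helper peers sends
    a coded packet of size 1 / (k * (d - k + 1)) of the file.
    """
    if d < k:
        raise ValueError("d must be at least k")
    return Fraction(d, k * (d - k + 1))


def hybrid_repair_cost(k):
    """Repair traffic of the hybrid scheme while the replica is present:
    one coding block, i.e. 1/k of the file."""
    return Fraction(1, k)


for n, k, d in [(4, 2, 3), (14, 7, 13)]:
    print((n, k), "d =", d,
          "regenerating:", regenerating_repair_cost(k, d),
          "hybrid:", hybrid_repair_cost(k))
```

Under this assumption the ratio between the two costs is d/(d - k + 1), which stays above 1 for any k > 1, matching the observation that the Regenerating Code cost approaches but does not reach the hybrid cost.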
Nevertheless, a value of d close to n implies that many peers must be alive in the system simultaneously, which is a stringent condition in a dynamic environment. Table 3-1 shows the probability that at least d of the n peers are alive concurrently in the P2P environment for different peer availabilities. Here peer availability is assumed to be independent and identically distributed. The last two rows are the settings in our experiment.
Table 3-1 The successful probabilities of the encoding of the Regenerating Codes
Peer availability    Value of n    Value of d    Probability
0.9                  14            13            0.5846
0.65                 21            20            0.0003
0.4                  42            41            close to 0
0.65                 21            13            0.5237
0.4                  42            13            0.7589
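Under the independent and identically distributed availability model, the probability that at least d of n peers are alive is a binomial tail sum. The following is a minimal sketch of that calculation; the function name `prob_at_least` is hypothetical. The text does not spell out the exact event counted for every row of Table 3-1, so this illustrates the model rather than reproducing the whole table.

```python
from math import comb


def prob_at_least(n, d, p):
    """P(at least d of n peers are alive), assuming each peer is alive
    independently with probability p (binomial tail sum)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(d, n + 1))


# First row of Table 3-1: availability 0.9, n = 14, d = 13.
print(round(prob_at_least(14, 13, 0.9), 4))  # 0.5846
```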
We consider that an LRU cache can reduce bandwidth in the system without keeping the entire file, so our scheme additionally indexes two peers for each file, the peer that last accessed the file and the peer that last accessed the file without being among the registered members, in order to exploit the cache more fully. Figure 3-7 and Figure 3-8 show the two cases in which the LRU cache decreases the cost: in one, the indexed peers hold only some of the blocks; in the other, they hold enough blocks. Another reason the cached data is usable is that there is a linear relationship between the coding blocks and the raw blocks. Requested coding blocks in the cache are decoded for access and are thereby transformed into raw blocks. If the raw blocks and the coding blocks are linearly independent, the two types of blocks together can also reconstruct the file. Furthermore, if the LRU cache of the peer generating a coding block already holds some coding blocks, the peer can use them to create coded packets by itself; therefore, the peer does not need to collect d coded packets.
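To illustrate why raw blocks and coding blocks can be mixed, a cached raw block can be treated as a coding block whose coefficient vector is a unit vector, and the file is then recovered by solving one linear system. The sketch below makes several assumptions that are not taken from the text: arithmetic is done in the prime field GF(257) for simplicity, and the names `encode` and `solve_blocks` are hypothetical.

```python
P = 257  # assumed prime field size, chosen only to keep the arithmetic simple


def encode(coeffs, raw):
    """Build one coding block as a linear combination of the raw blocks."""
    return [sum(c * blk[s] for c, blk in zip(coeffs, raw)) % P
            for s in range(len(raw[0]))]


def solve_blocks(blocks, k):
    """Recover the k raw blocks from (coefficients, payload) pairs.

    A cached raw block i is simply the pair (unit vector e_i, raw[i]).
    Returns None if the collected blocks do not span all k raw blocks.
    """
    rows = [list(c) + list(d) for c, d in blocks]  # augmented matrix [C | D]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col] % P), None)
        if pivot is None:
            return None  # dependent set: this column cannot be resolved
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)  # modular inverse (Python 3.8+)
        rows[col] = [(x * inv) % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col] % P:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [rows[i][k:] for i in range(k)]


raw = [[10, 20, 30], [40, 50, 60]]     # k = 2 raw blocks of three symbols
cached_raw = ([1, 0], raw[0])          # a raw block found in the cache
coded = ([3, 5], encode([3, 5], raw))  # a coding block from another peer
print(solve_blocks([cached_raw, coded], k=2) == raw)  # True
```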
Figure 3-7 Analysis of the Regenerating Code with LRU cache (Case 1)
In Figure 3-7, the indexed peers do not have enough blocks (one block in this case); even so, they need to communicate with fewer peers to finish their tasks.
There is another similar case, in which the peer that last accessed the file holds some of the coding blocks but Peer 6 holds none. In our (14, 7) setting, if that peer holds 4 coding blocks, Peer 6 lacks only 3 coding blocks after just one connection. As for cost, fewer connections also mean less bandwidth. Although the LRU cache brings this benefit, a problem arises when peers use the coding blocks in the LRU cache: a peer may collect duplicate or linearly dependent blocks, which do not help decoding, so not all blocks in the cache are always useful. The requesting peer therefore has to check the coefficients of these blocks before retrieving them.
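One way to picture the coefficient check is as a rank test: an offered block is worth retrieving only if its coefficient vector raises the rank of the vectors already held. The following is a sketch under assumptions not taken from the text: coefficients live in the prime field GF(257), and the names `rank_mod_p` and `useful_blocks` are hypothetical.

```python
P = 257  # assumed prime field size; illustrative only


def rank_mod_p(vectors):
    """Rank of a list of coefficient vectors over GF(P), by Gaussian elimination."""
    rows = [[x % P for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue  # no new pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, P)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % P for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def useful_blocks(held, offered):
    """Indices of offered coefficient vectors that add new information."""
    chosen, basis = [], list(held)
    for i, vec in enumerate(offered):
        if rank_mod_p(basis + [vec]) > rank_mod_p(basis):
            basis.append(vec)
            chosen.append(i)
    return chosen


held = [[1, 0, 0]]
offered = [[2, 0, 0], [1, 1, 0], [1, 1, 0], [0, 0, 4]]
print(useful_blocks(held, offered))  # [1, 3]
```

In this example the first offered vector is a multiple of the held one and the third is a duplicate, so only the second and fourth would be retrieved.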
Figure 3-8 Analysis of the Regenerating Code with LRU cache (Case 2)
In Figure 3-8, the indexed peers hold the entire file. In this case, Peer 5 can generate the coding block by itself, and Peer 6, as in the hybrid scheme, connects to just one peer to read the file.
In short, if the indexed peers hold the whole file, the benefit is equivalent to that of a replica kept in the system. Even if they hold only part of the file, the connection cost still goes down.