RAID-5 uses an erasure code to survive one drive failure, and RAID-6 survives the failure of two drives. Figure 2.2 and Figure 2.3 show how data are stored in RAID-5 and RAID-6, respectively. A RAID-5 system computes a parity over the stored data and stores the parity result as one additional symbol, which lets it tolerate one erasure error. A RAID-6 system performs both a parity computation and a Reed-Solomon encoding over the stored data. As a result, a RAID-6 system can tolerate up to two erasure errors.
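The RAID-5 parity idea can be illustrated with a minimal Python sketch. It assumes a stripe of equal-length byte blocks and a single parity block formed by bytewise XOR; the function names and the sample stripe are hypothetical and do not describe any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def raid5_parity(data_blocks):
    """Parity block of one stripe: the XOR of all data blocks."""
    return xor_blocks(data_blocks)

def raid5_recover(surviving_blocks):
    """Rebuild the single missing block (data or parity) from all survivors."""
    return xor_blocks(surviving_blocks)

# Hypothetical stripe spread over three data drives plus one parity drive.
stripe = [b"ABCD", b"EFGH", b"IJKL"]
parity = raid5_parity(stripe)

# The drive holding stripe[1] crashes; XOR of the remaining blocks restores it.
rebuilt = raid5_recover([stripe[0], stripe[2], parity])
```

Because XOR is its own inverse, the same routine serves for both encoding and recovery, which is why a single lost block is always recoverable while two lost blocks are not.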
An erasure code encodes an input of k symbols as a codeword of n symbols such that the original k symbols can be decoded as long as any k of the n codeword symbols are available. Such a code tolerates up to n − k erasure errors.
Erasure codes are applied in many networked storage systems to provide data robustness with lower storage overhead. These systems mainly follow the framework below, in which a machine failure is modeled as an erasure error. Assume that there are n storage servers in a networked storage system. A user takes a file as the input of the encoding algorithm and encodes it as a codeword. Each storage server then stores one symbol of the codeword. As a result, the user can decode and retrieve the file as long as k out of n storage servers are available. The scalable distributed
Figure 2.2: RAID-5.
The left figure is RAID-5 with 3 drives and the right figure is RAID-5 with 4 drives.
The parity function is a bitwise XOR over the input. In a RAID-5 system, data are available as long as at most one drive crashes.
Figure 2.3: RAID-6.
The figure shows a RAID-6 with 4 drives. In a RAID-6 system, data are available as long as at most two drives crash.
storage [15] uses the Lincoln Erasure Codes, a class of erasure codes, to provide data availability.
The central authority of a networked storage system can choose a particular coding method to obtain better error tolerance or better coding performance. We categorize the storage systems that use erasure codes by the operations that the codes employ.
Algebraic Operation based Erasure Codes
Reed-Solomon codes are among the best-known erasure codes, and the storage system in [16] uses them to tolerate both erasure and faulty errors. A simple Reed-Solomon code is described as follows and illustrated in Figure 2.4. A message m is represented as k elements of a finite field, i.e., $m = (m_1, m_2, \ldots, m_k)$. Consider a polynomial function f with
Figure 2.4: An encoding example of a Reed-Solomon code.
The message defines a polynomial function and a codeword is defined by the values of the polynomial on n points.
Figure 2.5: An example of a storage system that uses a linear code.
The central authority encodes the messages into a codeword (C1, C2, C3) and sends a distinct symbol to each storage server for storage.
degree $k - 1$, where $f(x) = m_1 + m_2 x + m_3 x^2 + \cdots + m_k x^{k-1}$. A codeword c of length n is $(f(1), f(2), \ldots, f(n))$, where the polynomial is evaluated over a finite field. As long as any k elements of the codeword are available, the polynomial f can be recovered by interpolation, and with it the message m.
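The following Python sketch works through the simple Reed-Solomon construction described above. It assumes the prime field GF(257) and decodes by Lagrange interpolation; the field size, the function names, and the sample message are illustrative choices and are not taken from [16].

```python
P = 257  # prime modulus; working in GF(257) is an assumption of this sketch

def rs_encode(message, n):
    """Evaluate f(x) = m1 + m2*x + ... + mk*x^(k-1) at x = 1..n over GF(P)."""
    def f(x):
        acc = 0
        for coeff in reversed(message):  # Horner's rule, highest degree first
            acc = (acc * x + coeff) % P
        return acc
    return [f(x) for x in range(1, n + 1)]

def poly_mul_linear(poly, root):
    """Multiply poly (coefficients, low order first) by (x - root) over GF(P)."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out

def rs_decode(points, k):
    """Recover (m1, ..., mk) from any k pairs (x, f(x)) by Lagrange interpolation."""
    points = points[:k]
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul_linear(basis, xj)
                denom = denom * (xi - xj) % P
        scale = yi * pow(denom, P - 2, P) % P  # divide by denom via Fermat inverse
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs

# Hypothetical message with k = 3 symbols, encoded into n = 6 codeword symbols.
message = [72, 105, 33]
codeword = rs_encode(message, 6)

# Only the symbols at x = 2, 5, 6 survive; any 3 of the 6 would do.
survivors = [(2, codeword[1]), (5, codeword[4]), (6, codeword[5])]
recovered = rs_decode(survivors, 3)
```

The sketch requires n < P so that the evaluation points 1, ..., n are distinct field elements.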
In a storage system that uses a linear code to provide data robustness, the central authority encodes the messages and later recovers them. Figure 2.5 shows an example with 2 messages and 3 storage servers, where the coding is performed over a finite field. After the central authority receives the two messages, it generates a generator matrix and encodes the messages via the
Figure 2.6: The EVENODD encoding.
The data are represented as a (p − 1) × p table. An entry is a bit of the data. After the encoding, the codeword is represented as a (p − 1) × (p + 2) table. Each storage server stores one column of the resulting table.
generator matrix. The codeword contains 3 symbols, and each storage server stores one of them. After the central authority retrieves any 2 of the 3 codeword symbols, it performs the decoding process and sends the messages back to the user.
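The 2-message, 3-server example can be sketched in Python as follows. The generator matrix G and the field GF(257) are hypothetical choices made for illustration (the text does not specify them); G is picked so that any two of its columns are linearly independent, which is what makes decoding from any two servers possible.

```python
P = 257  # prime field size; an assumption of this sketch

# Hypothetical 2x3 generator matrix: any two columns are linearly independent.
G = [[1, 0, 1],
     [0, 1, 1]]

def encode(messages):
    """Codeword (C1, C2, C3) = (m1, m2) * G over GF(P), one symbol per server."""
    return [sum(m * G[r][c] for r, m in enumerate(messages)) % P
            for c in range(3)]

def decode(symbols):
    """Recover (m1, m2) from two pairs (server_index, symbol) by Cramer's rule."""
    (i, ci), (j, cj) = symbols
    a, b = G[0][i], G[1][i]   # ci = a*m1 + b*m2
    c, d = G[0][j], G[1][j]   # cj = c*m1 + d*m2
    det_inv = pow((a * d - b * c) % P, P - 2, P)  # inverse of the determinant
    m1 = (ci * d - b * cj) * det_inv % P
    m2 = (a * cj - c * ci) * det_inv % P
    return [m1, m2]

# Hypothetical messages; server 0 fails and servers 1 and 2 answer.
codeword = encode([42, 200])
recovered = decode([(1, codeword[1]), (2, codeword[2])])
```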
XOR Operation based Erasure Codes
Some erasure codes are designed for high performance and use only exclusive-or (XOR) operations. As a result, the storage systems that adopt them also have good
Figure 2.7: A system using the EVENODD encoding.
Each storage server stores a column of the encoded table. This storage system tolerates the failure of two servers.
performance in the data storage and retrieval processes. The EVENODD code [17, 18] and the STAR code [19] tolerate 2 and 3 erasure errors, respectively. Encoding with the EVENODD and STAR codes is efficient, but the codes tolerate only a constant number of erasure errors. Figure 2.6 illustrates the EVENODD encoding, and Figure 2.7 shows how a system stores the encoded data. There are $p + 2$ storage servers $SS_1, SS_2, \ldots, SS_{p+2}$ in the system, and each of them stores one column of the encoded table, which is $p - 1$ bits. The STAR codes use an additional parity column to tolerate one more erasure error. Figure 2.8 shows the STAR encoding and how a system stores the encoded data.
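As a rough illustration of how the two EVENODD parity columns can be produced with XOR alone, the sketch below follows the commonly stated construction: a horizontal parity per row, plus a diagonal parity adjusted by a common bit S. This section does not spell out the construction, so the index conventions, the function name, and the sample table are assumptions; p must be prime.

```python
def evenodd_encode(data, p):
    """Encode a (p-1) x p table of bits into a (p-1) x (p+2) table.

    Column p holds the horizontal (row) parity and column p+1 holds the
    diagonal parity. An imaginary all-zero row p-1 completes the diagonals.
    """
    rows = [list(r) for r in data] + [[0] * p]  # append the imaginary row
    # S is the XOR along the one diagonal that has no parity cell of its own.
    s = 0
    for t in range(1, p):
        s ^= rows[p - 1 - t][t]
    encoded = []
    for i in range(p - 1):
        horizontal = 0
        diagonal = s
        for t in range(p):
            horizontal ^= rows[i][t]
            diagonal ^= rows[(i - t) % p][t]  # cells with row + column = i (mod p)
        encoded.append(list(data[i]) + [horizontal, diagonal])
    return encoded

# Hypothetical 2 x 3 data table for p = 3; each of the 5 servers gets one column.
table = evenodd_encode([[1, 0, 1], [0, 1, 1]], 3)
columns = [[row[c] for row in table] for c in range(5)]
```

Each server stores one entry of `columns`. Decoding after two column losses is omitted here; it alternates between row and diagonal equations to recover the missing bits.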
Many low-density parity-check (LDPC) codes also have more efficient encoding and decoding algorithms than erasure codes that use linear algebraic operations. A networked storage system that uses a systematic LDPC code works as follows; Figure 2.9 shows an overview of the storage system. Assume that there are n messages. The storage servers are divided into two groups, R and L. The R group consists of n storage servers
Figure 2.8: The STAR encoding and a system storing the encoded data.
STAR codes tolerate the failure of 3 servers. There are 3 columns of parity bits.
Figure 2.9: A networked storage system that uses an LDPC code.
The system contains 7 storage servers, and 4 of them store the original data.
The other 3 storage servers store the parity results of the data.
and each of them keeps one message. The L group contains the remaining storage servers, and each of them stores a parity over a subset of the n messages.
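The division into the R and L groups can be sketched as below, using the 7-server layout of Figure 2.9 (4 message servers and 3 parity servers). The particular subsets assigned to the parity servers are hypothetical; a concrete LDPC code would define its own.

```python
# Hypothetical policy for the L group: which messages feed each parity server.
PARITY_SUBSETS = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]

def ldpc_store(messages):
    """Return (r_group, l_group): the messages themselves and their XOR parities."""
    r_group = list(messages)          # systematic part: one message per R server
    l_group = []
    for subset in PARITY_SUBSETS:
        parity = 0
        for idx in subset:
            parity ^= messages[idx]   # XOR over the chosen subset of messages
        l_group.append(parity)
    return r_group, l_group

# Four hypothetical messages, here small integers standing in for data blocks.
r_group, l_group = ldpc_store([1, 2, 4, 8])
```

Because the code is systematic, a read that finds all R servers alive needs no decoding at all; the L servers are consulted only after a failure.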
Tornado codes [20, 21] are another class of LDPC erasure codes with fast encoding and decoding algorithms. Tornado codes, illustrated in Figure 2.10, use irregular bipartite graphs as the encoding structures. The archival storage system in [22] uses Tornado codes as its fault-tolerance technique.
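The decoding rule stated in the caption of Figure 2.10 (a parity node with exactly one unknown neighbor reveals that neighbor) can be sketched as a simple peeling loop. The sketch below works on a single bipartite layer and treats each parity together with its message neighbors as one check whose symbols XOR to zero; the check sets and symbol values are hypothetical, and a real Tornado code would use cascaded irregular graphs.

```python
def peel_decode(symbols, checks):
    """Iteratively fill erased symbols (marked None).

    checks: tuples of symbol indices whose values XOR to zero, i.e. one
    parity symbol together with the message symbols it covers.
    """
    symbols = list(symbols)
    progress = True
    while progress:
        progress = False
        for check in checks:
            missing = [i for i in check if symbols[i] is None]
            if len(missing) == 1:          # all but one neighbor known
                value = 0
                for i in check:
                    if symbols[i] is not None:
                        value ^= symbols[i]
                symbols[missing[0]] = value
                progress = True
    return symbols                          # may still contain None if stuck

# Hypothetical layout: indices 0-3 are messages, 4-6 are parities.
CHECKS = [(0, 1, 2, 4), (1, 2, 3, 5), (0, 2, 3, 6)]
stored = [1, 2, 4, 8, 7, 14, 13]

# Servers 2 and 3 fail; the first check recovers 2, which unlocks the second.
decoded = peel_decode([1, 2, None, None, 7, 14, 13], CHECKS)
```

The loop stops either when every symbol is known or when no check has exactly one unknown, in which case the erasure pattern is not decodable by peeling.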
Different LDPC codes define different policies for the L group, i.e., how many and which messages are used to produce each parity. The experimental results in [23] show how differently LDPC codes perform in terms of robustness. When the number of message symbols is below 100, the
Figure 2.10: The encoding structure of Tornado codes.
Tornado codes use cascaded irregular bipartite graphs. The k input message bits are listed as the k left vertices of the first bipartite graph. The right vertices of the first bipartite graph are ak parity bits. A similar encoding process is performed on the remaining L − 1 cascaded bipartite graphs. The decoding algorithm is simple.
For each right node that has all but one of its neighbors known, the missing neighbor can be recovered. The decoding algorithm terminates successfully when all k input message bits are recovered.
systematic LDPC codes perform better (fewer codeword symbols are required for successful decoding). However, when the number of message symbols is 100 or greater, irregular repeat-accumulate LDPC codes perform better.