An Area-Efficient Architecture of Reed-Solomon Codec for Advanced
RAID Systems
Min-An Song, I-Feng Lan, and Sy-Yen Kuo
Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan E-mail: [email protected]
Abstract
In this paper, a simple codec algorithm based on Reed-Solomon (RS) codes is proposed for erasure correction in RAID level 6 systems. Unlike conventional RS codes, this scheme uses a mathematical reduction method, called the Reduced Static-Checksum Table Approach, to improve coding performance. The RS codes are designed according to the characteristics of advanced RAID systems so that two disk failures can be handled. In addition, the scheme performs all computations with only simple exclusive-OR (XOR) operators, just as Even-Odd codes do. The new XOR-based RS codes lend themselves to implementation while improving reliability and flexibility.
1. Introduction
In storage systems, especially large disk arrays, reliability becomes critical as the systems scale up. In [1], it has been demonstrated that disk failures would be a daily event in petabyte-scale file systems. How to improve the capability of detecting, or even correcting, failures has therefore become a significant issue for large storage systems. RAID systems, which are classified from level 0 to level 6, are commonly used to address this issue. Unlike the other levels, RAID level 6 (RAID 6) not only provides correcting capability but can also recover from two simultaneous disk failures. Usually each specific RAID 6 algorithm uses its own parity distribution [2]. Over the last two decades, many erasure-correcting codes for RAID 6 have been proposed, such as Even-Odd codes [3], Reed-Solomon (RS) codes [4], and X-codes [5].
The RS code, a very popular error-control code, has been studied in various applications, especially in communication systems [6]. Researchers have also suggested RS-based solutions for failures in RAID-like systems [7-10], but those schemes may not recover the system as quickly as Even-Odd codes.
Therefore, in this paper, we present an XOR-based RS codec scheme, which uses a reduced static-checksum table approach, to handle the erasures-only case. The new scheme is similar to the conventional RS codec algorithm [4], a pipelined procedure consisting of Syndrome Calculation (SC), Key Equation Solver (KES), Chien Search (CS), and Forney Algorithm (FA), but it omits both KES and CS. Both KES and CS are used to locate errors and incur the extra cost of finite-field operators. In an erasure-only RAID system, the system controller does not need to locate such errors, since the individual disk devices have their own error-control coding mechanisms to recover from errors [2]. Moreover, in large disk arrays, the failure of a single storage device can usually be detected by the storage system controller and then marked [11]. Since device failures can be marked as erasures, erasure-correcting codes are usually employed for information recovery: the failed data can be recovered and the system can keep working without interruption. Compared with the traditional RS codec scheme, the scheme proposed in this paper is simpler, costs less, and improves flexibility and reliability.
The rest of this paper is organized as follows. Section 2 describes the general ideas of our encoding algorithm for the erasure-only RS-RAID system and introduces its main feature, the reduced static-checksum table approach. Our decoding approach is shown in Section 3. Section 4 gives the results of the performance analysis and compares the number of XOR operations among the Even-Odd code, the traditional RS-RAID structure, and the XOR-based RS code. The hardware implementation of the proposed RS encoder/decoder is described in Section 5. Finally, Section 6 gives the conclusions.
2. RS-RAID Encoder
The encoding procedure in our scheme not only follows the rules of a systematic mapping, but also builds a look-up table based on constant multipliers. Most RS encoders generate systematic codes, in which the message bits of a symbol appear explicitly in the corresponding codeword. The equation
$$cw(x) = b(x) + x^{n-k}\, m(x)$$
shows the result of applying the systematic coding method, where cw(x), b(x), and m(x) are the codeword, checksum, and message, respectively. W is the codeword length in the RS code [8]. If W=4, there could be 4 checksum drives and 11 data drives in this system, i.e. b(x)={C1, C2, C3, C4} and m(x)={D1, D2, D3, ..., D10, D11}.
2.1 Basic Scheme in RS Encoding
The RS code is a class of linear block codes [4], so its computation satisfies a linear property; that is, each data symbol (drive) can be treated independently. In other words, a change in any data drive affects the checksum symbols (drives) independently of the other drives. Here, we derive the linear property of our RS-RAID model using constant multipliers as follows:
$$\begin{aligned}
b(x) &= x^{n-k}\, m(x) \bmod g(x) \\
     &= x^{n-k}\,(m_0 + m_1 x + m_2 x^2 + \cdots + m_{k-1} x^{k-1}) \bmod g(x) \\
     &= (m_0 x^{n-k} \bmod g(x)) + (m_1 x^{n-k+1} \bmod g(x)) + \cdots + (m_{k-1} x^{n-1} \bmod g(x)) \\
     &= m_0\,(x^{n-k} \bmod g(x)) + m_1\,(x^{n-k+1} \bmod g(x)) + \cdots + m_{k-1}\,(x^{n-1} \bmod g(x))
\end{aligned} \qquad (1)$$
For that reason, the effect of each data symbol (drive) on the checksum symbols can first be computed separately. The complete checksum symbols are then obtained by accumulating the effects of all the independent drives.
Algorithm for Building Checksum Symbols (Encoding Procedure):
Step 1. Premultiply (or shift) the message polynomial m(x) by $x^{n-k}$.
Step 2. Construct a static-checksum table: compute the items $m_i x^{n-k+i} \bmod g(x)$, where each $m_i$ equals the multiplicative identity 1 of the field, to find the effect of each location (drive).
Step 3. Using the table built in Step 2, obtain the checksum symbols b(x) by multiplying each value of the static-checksum table by the actual value of the corresponding coefficient of m(x). It can be presented as $m_0\,(x^{n-k} \bmod g(x)) + m_1\,(x^{n-k+1} \bmod g(x)) + \cdots$. Because all values of the table from Step 2 are fixed, constant multipliers can carry out all computations once the static-checksum table has been constructed.
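The three steps above can be sketched in Python for GF(2^4) with two checksum drives. This is a minimal illustration, not the authors' implementation: the primitive polynomial x^4 + x + 1 and the integer encoding of field elements are assumptions (the text does not state them), chosen because they reproduce the coefficient $\alpha^4$ in $g(x) = (x - \alpha^0)(x - \alpha^1)$ given in Section 3.1. All function names are hypothetical.

```python
# Sketch of the encoding procedure over GF(2^4) with n - k = 2.
# Assumptions (not stated in the text): primitive polynomial x^4 + x + 1,
# elements stored as 4-bit integers with bit i = coefficient of alpha^i.

PRIM_POLY = 0b10011  # x^4 + x + 1 (assumed)


def gf_mul(a, b):
    """Multiply two GF(2^4) elements by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM_POLY
    return result


def alpha_pow(e):
    """Return alpha^e as a 4-bit integer."""
    value = 1
    for _ in range(e % 15):
        value = gf_mul(value, 2)
    return value


# g(x) = (x + alpha^0)(x + alpha^1) = x^2 + G1*x + G0
G1 = 1 ^ alpha_pow(1)  # alpha^0 + alpha^1
G0 = alpha_pow(1)      # alpha^0 * alpha^1


def build_static_checksum_table(num_data_drives):
    """Step 2: entry i is (C1, C2) with x^(2+i) mod g(x) = C2*x + C1."""
    table = []
    c2, c1 = 0, 1  # residue of x^0
    for power in range(1, num_data_drives + 2):
        # Multiply the residue by x, then replace x^2 with G1*x + G0.
        c2, c1 = c1 ^ gf_mul(c2, G1), gf_mul(c2, G0)
        if power >= 2:  # Step 1: data drives start at x^(n-k) = x^2
            table.append((c1, c2))
    return table


def encode(message, table):
    """Step 3: accumulate each drive's contribution to C1 and C2."""
    c1 = c2 = 0
    for m_i, (t1, t2) in zip(message, table):
        c1 ^= gf_mul(m_i, t1)
        c2 ^= gf_mul(m_i, t2)
    return c1, c2


table = build_static_checksum_table(13)
message = [0] * 13
message[0] = alpha_pow(1)  # D1 = alpha
message[4] = alpha_pow(4)  # D5 = alpha^4
c1, c2 = encode(message, table)
```

With the message $m(x) = \alpha + \alpha^4 x^4$ used in the example of Section 2.2, the sketch yields $C_1 = \alpha^9$ and $C_2 = \alpha^7$, in agreement with the values given there.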
2.2 Reduced Static-Checksum Table Approach
The encoding process is still costly because it uses too many XOR gates, even after the previous look-up table has been constructed. Therefore, a further step that reduces the number of XOR gates required during encoding is proposed in this paper. In the case of GF(2^4), for example, applying the constant-multiplier idea to a single variable yields a table, called the constant-multiplier-coefficient table (abbreviated as CMC table), shown as Table 1, where $A = a_1 + a_2\alpha + a_3\alpha^2 + a_4\alpha^3$ is the variable with the 4 coefficients $a_1$~$a_4$, and $a_1'$~$a_4'$ are the coefficients of A', which is generated by multiplying A by $\alpha^z$, where z = 0~14:
Table 1. The constant-multiplier-coefficient table
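The rows of a CMC table can be generated symbolically. The sketch below is an illustration under the assumption that the field is built on the primitive polynomial x^4 + x + 1 (so that $\alpha^4 = \alpha + 1$); the text does not state the polynomial, and the function names are hypothetical. Each coefficient $a_1'$ to $a_4'$ is represented as the set of input coefficients that are XORed together.

```python
# Symbolic CMC table for GF(2^4): row z gives the coefficients of A * alpha^z.
# Assumption: primitive polynomial x^4 + x + 1, i.e. alpha^4 = alpha + 1.


def cmc_table():
    """Return 15 rows; each row holds four frozensets of input indices 1..4."""
    # Row 0 is A itself: a1' = a1, a2' = a2, a3' = a3, a4' = a4.
    row = [frozenset([1]), frozenset([2]), frozenset([3]), frozenset([4])]
    table = []
    for _ in range(15):
        table.append(row)
        c0, c1, c2, c3 = row
        # Multiply by alpha: the alpha^3 coefficient wraps into 1 and alpha.
        # The ^ operator on frozensets is symmetric difference, i.e. XOR.
        row = [c3, c0 ^ c3, c1, c2]
    return table


def format_part(part):
    """Render one coefficient, e.g. {1, 4} -> 'a1 ^ a4'."""
    return " ^ ".join(f"a{i}" for i in sorted(part))


if __name__ == "__main__":
    for z, row in enumerate(cmc_table()):
        print(f"A*alpha^{z}: " + " | ".join(format_part(p) for p in row))
```

Representing each coefficient as a set makes the later comparison of rows (looking for common XOR terms) a matter of set operations.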
Now, if we take the generator polynomial
$$g(x) = (x - \alpha^0)(x - \alpha^1) = x^2 + (1 + \alpha)\,x + \alpha = x^2 + \alpha^4 x + \alpha,$$
which can tolerate up to two erasures, the checksums $b(x) = C_2 x + C_1$ are as shown in Table 2. To obtain the sixth column of Table 2, which gives the number of XOR operations after the reduction, our approach consists of the following steps:
Step 1. For each location of the static-checksum table, the two checksum values, C1 and C2, are marked first. In the CMC table (Table 1), each marked value is then represented by the 4 parts of a single row.
Step 2. The parts of the two rows marked in Step 1 are compared; they may share some common terms. If so, each common term needs to be computed only once, so half of its occurrences can be removed. This is repeated until no common term remains between the two rows.
Step 3. Finally, the value of the sixth column in Table 2 is obtained by accumulating the remaining XOR operations in each part of the two marked rows in Table 1.
Table 2. The reduced static-checksum table with m(x) = 1
For instance, to reduce location D2 in the static-checksum table:
Step 1. $C_1 = \alpha^5$ and $C_2 = \alpha^{10}$; therefore, we mark the rows $A \cdot \alpha^5$ and $A \cdot \alpha^{10}$.
Step 2. Comparing the two marked rows, we see that $a_3 \oplus a_4$, $a_1 \oplus a_3$, $a_1 \oplus a_2 \oplus a_4$, and $a_2 \oplus a_3$ are the common terms between the two rows. Hence, after applying this approach, the total number of XOR operations is reduced by 5.
Step 3. The number of required XOR operations after Step 2 is 14 - 5 = 9.
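The D2 example can be checked with a short Python sketch. It is a simplification of the three reduction steps, not the full procedure: it only detects sharing where a whole part of one row is contained in the part at the same position of the other row, which is enough for D2. The primitive polynomial x^4 + x + 1 is an assumption, and the function names are hypothetical.

```python
# Count XOR operations for a pair of CMC rows, before and after sharing
# common terms. Assumption: alpha^4 = alpha + 1 (polynomial x^4 + x + 1).


def cmc_row(z):
    """Coefficients of A * alpha^z as sets of input indices 1..4."""
    row = [frozenset([1]), frozenset([2]), frozenset([3]), frozenset([4])]
    for _ in range(z):
        c0, c1, c2, c3 = row
        row = [c3, c0 ^ c3, c1, c2]  # multiply by alpha
    return row


def xor_count(row):
    """A part that XORs t inputs costs t - 1 two-input XOR operations."""
    return sum(len(part) - 1 for part in row)


def reduced_xor_count(z1, z2):
    """Return (total, saved, remaining) XORs for rows A*alpha^z1, A*alpha^z2."""
    row1, row2 = cmc_row(z1), cmc_row(z2)
    total = xor_count(row1) + xor_count(row2)
    saved = 0
    for p1, p2 in zip(row1, row2):
        small, large = sorted((p1, p2), key=len)
        # If the smaller expression appears inside the larger one,
        # it is computed once and reused.
        if len(small) > 1 and small <= large:
            saved += len(small) - 1
    return total, saved, total - saved


total, saved, remaining = reduced_xor_count(5, 10)
print(total, saved, remaining)
```

For the rows $A \cdot \alpha^5$ and $A \cdot \alpha^{10}$ this gives 14 XORs before the reduction, 5 saved, and 9 remaining, matching Steps 2 and 3 above. Other locations of Table 2 may contain common terms that this simplified same-position check does not find.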
In addition, this scheme applies the shortened-code method to achieve better coding performance [4]. With this method, the active drives are placed first on the locations that cost the fewest XOR gates after our reduction. That is, in the case of Table 2, to maximize computational performance, the locations must be used in the order D8, D13, D1, D9, etc.
Let us assume that a message polynomial, $m(x) = \alpha + \alpha^4 x^4$, has to be stored into an empty RS-RAID in GF(2^4). All data in the checksum drives can be computed as follows. For the coefficient $\alpha$, from the location D1 ($x^2$) of Table 2, we put the data $\alpha \cdot \alpha$ and $\alpha^4 \cdot \alpha$ into the two checksum drives, respectively. Similarly, for $\alpha^4$, from D5 ($x^6$) in Table 2, the data stored in the two checksum drives are $\alpha^4 \cdot \alpha^7$ and $\alpha^4 \cdot \alpha^9$. Therefore, the values stored in the two checksum drives after the above processes are:
$$C_1 = (\alpha \cdot \alpha) \oplus (\alpha^4 \cdot \alpha^7) = \alpha^9$$
$$C_2 = (\alpha \cdot \alpha^4) \oplus (\alpha^4 \cdot \alpha^9) = \alpha^7$$
Figure 1 illustrates the data placement in our RS-RAID system, where C1 and C2 are checksum drives, D1~D13 are data drives, and, for each column, the values in the second row are the symbols corresponding to the binary values.
Figure 1. Data allocation in RS-RAID System with Shortened Code method
3. RS-RAID Decoder
In this section, two cases of the decoding algorithm are discussed over GF(2^4); both are carried out directly by an equation-solving method, Cramer's rule.
3.1 Single Failed Disk
We take $\alpha^0 = 1$ to be one of the roots, with consecutive powers, of our generator polynomial, i.e., $g(x) = (x - \alpha^0)(x - \alpha^1)$. Therefore, in the case of a single failed disk, decoding is as easy as the parity scheme of RAID level 5. From the equation Failed-Drive = $S_0$ = Σ(All Normal Drives), recovering the failed disk only requires XORing the remaining active disks together. Assume that only the data drive D1 has been erased, as in Figure 2.
Figure 2. A RAID with only one failed disk
The original information of D1 can be recovered as:
D(1,1) = C(1,1) + C(2,1) + D(5,1) = 0 + 1 + 1 = 0
D(1,2) = C(1,2) + C(2,2) + D(5,2) = 1 + 1 + 1 = 1
D(1,3) = C(1,3) + C(2,3) + D(5,3) = 0 + 0 + 0 = 0
D(1,4) = C(1,4) + C(2,4) + D(5,4) = 1 + 1 + 0 = 0
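The bit-by-bit recovery above amounts to one XOR over all surviving drives. A minimal Python sketch, using the bit patterns of this example (C1 = 0101, C2 = 1101, D5 = 1100, all other data drives empty); the function name is hypothetical:

```python
from functools import reduce


def recover_single(surviving):
    """XOR all surviving drives (data and checksum) to rebuild the failed one."""
    return reduce(lambda x, y: x ^ y, surviving, 0)


# Bit patterns from the example; the remaining data drives hold 0000
# and therefore do not change the result.
c1, c2, d5 = 0b0101, 0b1101, 0b1100
d1 = recover_single([c1, c2, d5])
print(format(d1, "04b"))  # prints 0100
```

Because XOR acts on each bit position independently, the result does not depend on which bit is taken as the most significant one.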
3.2 Two Failed Disks at the Same Time
In this case, to recover two disks that fail simultaneously, the decoding procedure in our scheme can be treated as solving a system of two linear equations in two unknowns. The matrix form of this system is
$$\begin{bmatrix} 1 & 1 \\ \alpha^i & \alpha^j \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} S_0 \\ S_1 \end{bmatrix},$$
where i and j are the positions of the two failed disks, and the syndrome
$$S_k = \sum_{z=0}^{n-1} cw_z\, \alpha^{kz}$$
is computed from all normal drives. By Cramer's rule, the two unknowns A and B are, respectively,
$$A = \frac{\alpha^j S_0 + S_1}{\alpha^i + \alpha^j}, \qquad B = \frac{\alpha^i S_0 + S_1}{\alpha^i + \alpha^j}. \qquad (2)$$
Furthermore, applying the same idea as the CMC table to build, in advance, a table of the inverse elements of $(\alpha^i + \alpha^j)$ is more efficient. This table avoids the extra implementation cost of designing an ALU. In the circuit implementation, $S_0$ can be computed by XORing all the normal drives. As for the syndrome $S_1$, if it is implemented in a VLSI chip, it can share the constant-multiplier circuit with the encoder.
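The two-erasure decoding can be sketched in Python as follows. This is an illustration under stated assumptions, not the hardware design: the primitive polynomial x^4 + x + 1 is assumed, positions are numbered by the power of x (C1 at 0, C2 at 1, D1 at 2, and so on), the inverse is found by brute force rather than by the precomputed table suggested above, and all names are hypothetical.

```python
# Two-erasure decoding over GF(2^4) with Cramer's rule.
# Assumption: primitive polynomial x^4 + x + 1; position z = power of x.

PRIM_POLY = 0b10011


def gf_mul(a, b):
    """Multiply two GF(2^4) elements by shift-and-add with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= PRIM_POLY
    return result


def gf_inv(a):
    """Brute-force inverse; adequate for a 16-element field."""
    return next(x for x in range(1, 16) if gf_mul(a, x) == 1)


def alpha_pow(e):
    """Return alpha^e as a 4-bit integer."""
    value = 1
    for _ in range(e % 15):
        value = gf_mul(value, 2)
    return value


def decode_two(codeword, i, j):
    """Recover the erased symbols at positions i and j."""
    s0 = s1 = 0
    for z, symbol in enumerate(codeword):
        if z in (i, j):
            continue  # failed drives contribute nothing to the syndromes
        s0 ^= symbol
        s1 ^= gf_mul(symbol, alpha_pow(z))
    ai, aj = alpha_pow(i), alpha_pow(j)
    inv = gf_inv(ai ^ aj)  # 1 / (alpha^i + alpha^j)
    a = gf_mul(gf_mul(aj, s0) ^ s1, inv)
    b = gf_mul(gf_mul(ai, s0) ^ s1, inv)
    return a, b


# Codeword of the encoding example: C1 = alpha^9, C2 = alpha^7,
# D1 = alpha, D5 = alpha^4, all other data drives zero.
codeword = [alpha_pow(9), alpha_pow(7), alpha_pow(1), 0, 0, 0, alpha_pow(4)]
codeword += [0] * 8
damaged = list(codeword)
damaged[2] = damaged[6] = None  # D1 and D5 fail at the same time
d1, d5 = decode_two(damaged, 2, 6)
```

Erasing D1 and D5 from the codeword of the encoding example returns $\alpha$ and $\alpha^4$, the original data. In a circuit, the 15 possible inverses would be tabulated in advance instead of searched.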
4. Results and Comparisons
To demonstrate the encoding performance of our XOR-based RS algorithm, we implement both the CMC table and the reduced static-checksum table in GF(2^8) and count the total number of XOR operators. The set of disk-drive counts {7, 11, 13, 17, 23, 29, 31, 41, 43} is our experimental example. Figure 3 shows the curves corresponding to Table 3.
Table 3. # of XOR gates while encoding with the XOR-based RS, the conventional RS and the Even-Odd codes
# of Disk Drives   Even-Odd codes   XOR-based Reed-Solomon codes   Conventional Reed-Solomon codes
 7                    664              1068                            954
11                   1752              3020                           3250
13                   2488              4392                           5112
17                   4344              7968                          10624
23                   8088             15554                          24442
29                  12948             25704                          46648
31                  14872             29700                          56250
41                  26232             54200                         124000
43                  28888             60018                         142002
From both Table 3 and Figure 3, we can see that the Even-Odd codes encode more efficiently than the XOR-based RS code does. However, except for the smallest array of 7 drives, our approach needs fewer XOR operators than the conventional RS codes in [3].
Figure 3. Curves plotted by the # of XOR gates while encoding with the XOR-based RS, the conventional RS
and the Even-Odd code.
Moreover, Figure 4 shows that Even-Odd codes traditionally need to be implemented by coding through a 3-dimensional structure, while our algorithm can be easily implemented through a 2-dimensional array structure. In Figure 4, if one byte of data changes, we need to deal with eight codewords, from Page 0 to Page 7. Therefore, a data update may be an overhead for Even-Odd codes, but this does not happen in our scheme because we chose 8 bits as the length of a codeword in the 2D array structure.
Figure 4. The 3-dimension structure of Even-Odd implementation (n 8)
In order to compare RS codes with Even-Odd codes, we directly use the Even-Odd code proposed in [3], which encodes (m-1) bytes/disk over m data drives; the estimated number of XOR operations of this Even-Odd code is $8(2m^2 - 2m - 1)$. That is, m*(m-1) bytes are encoded. In order to process the same amount of data with the RS code, we multiply the RS figures above by (m-1) directly, so that the amount of data is also equal to m*(m-1) bytes. The comparisons are listed as follows.
[Figure 5 plot: "RS code & Even-Odd code"; x-axis: # of data drives (1 to 253); y-axis: # of XORs (0 to 3,000,000); series: RS code, Even-Odd code.]
Figure 5. The number of XOR gates to encode m*(m-1) bytes
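The Even-Odd estimate $8(2m^2 - 2m - 1)$ can be evaluated for the drive counts of Table 3 with a few lines of Python. This is only a check of the formula as reconstructed from the Even-Odd column of Table 3; the function name is hypothetical.

```python
def even_odd_xor_count(m):
    """Estimated XOR operations of the Even-Odd code for m data drives."""
    return 8 * (2 * m * m - 2 * m - 1)


drive_counts = [7, 11, 13, 17, 23, 29, 31, 41, 43]
for m in drive_counts:
    print(m, even_odd_xor_count(m))
```

For m = 7 the expression gives 664 and for m = 43 it gives 28888, as in the Even-Odd column of Table 3. For m = 29 it gives 12984, whereas Table 3 lists 12948; the two values differ by a transposition of digits.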
Apparently, the RS code computes more slowly than Even-Odd codes. For 5 to 253 hard disks, the calculation amount of the RS code is 1.4 to 2.75 times that of the Even-Odd code. However, the main point here is that, in terms of the coding framework, the Reed-Solomon codes proposed in this paper can process data in parallel, because the encoding process calculates the effect of each data drive on the checksum drives separately and then adds up these effects. Hence, the framework is suitable for parallel processing, which speeds up the calculation and saves time.
Table 4. A comparison sheet between RS code and Even-Odd codes
Item                        Reed Solomon codes              Even-Odd codes
MDS code                    Yes                             Yes
Calculating Complexity      Medium                          Easy
Encoding                    Mapping is easy and intuitive   Mapping is done in three dimensions, hard to do data addressing
Decoding                    Processes are simple            Large amount of buffers (memory)
Flexibility                 Yes                             Yes
Frameworks                  Parallel process                Multi-array parallel process
Update complexity           # of checksum drives            >2
Fault-tolerant capability   Design free                     Only 2
5. Hardware Implementation of This RS-RAID Codec
In this section, we use an Altera Stratix FPGA device (EP1S10F484C5) to implement the RS codec. Figure 6 is the functional diagram.
[Figure 6 diagram: the Codec block contains an Encoder and a Decoder. Inputs: clk, nrst, wr_en, wr_done, drive_no, data, fail, fail_no. Outputs: C1, C2, data_1A, data_2A, data_2B.]
Figure 6. GF(2^4) Codec functional diagram
5.1 The Encoder Block
In the encoder block, we create a "const_MUL" module (a multiplication table) that generates the checksum data (C1, C2) as soon as any data is written to a hard drive.
[Figure 7 diagram: each data drive D1 to D13 feeds a const_MUL module that produces C1_dN_new and C2_dN_new; two XOR blocks combine these values into the 4-bit outputs C1 and C2. Inputs: clk, nrst, wr_en, wr_done, drive_no, data.]
Figure 7. Encoder block diagram
5.2 The Decoder Block
The decoder block includes two sub-modules:
(a) FSM_decoder: a state machine that controls the data path for one or two hard drive data errors.
(b) datapath: the function (the P-G-Z algorithm) that calculates the correct data from C1, C2, and the other correct hard drive data when any hard drive data has failed.
→ data_1A: the correct data of the failed hard drive;
→ data_2A, data_2B: the two correct data of the 2 failed hard drives.
[Figure 8 diagram: FSM_decoder drives the datapath through the enable signals en_1i, en_1A, en_2i, en_2j, en_S0, en_S1, en_2A, en_2B. Inputs: clk, nrst, wr_en, wr_done, drive_no, data, C1, C2, fail, fail_no. Outputs: data_1A, data_2A, data_2B.]
Figure 8. Decoder block diagram
During the FPGA implementation, we use the EDA tools in Table 5. The details are as follows:
(a) RTL Coding: We use Verilog HDL to create all the design files.
(b) Function Simulation: Use ModelSim to verify the function of the codec design.
(c) Synthesis and P&R: Use Quartus II to map the Verilog HDL to an Altera atom-format netlist and perform the timing analysis.
(d) Timing Simulation: Use ModelSim to verify the timing of the codec design.
(e) Power Estimation: Use Quartus II to estimate the internal and I/O power.
Table 5. EDA Tools
Tool Name     Function Description
Text Editor   RTL Coding
ModelSim      Function and Timing Simulation
Quartus II    Altera FPGA Compiler for Synthesis, P&R, Timing Analysis and Power Estimation
Figure 9. FPGA floorplan
[Figure 9 floorplan: pin placement for clk, nrst, wr_en, wr_done, the data and drive number inputs, fail, fail_no, and the outputs C1, C2, data_1A, data_2A, data_2B.]
Table 6. Pin name description
Pin name   I/O   Description
clk        I     System Clock
nrst       I     Reset Signal
wr_en      I     Write Enable Signal
wr_done    I     Write Done Signal
drive_no   I     Hard Drive No for Write Data
data       I     Data for Write to Hard Drive
fail       I     Hard Drive Fail Signal
fail_no    I     Failed Hard Drive No
C1         O     Encoder Checksum Data1
C2         O     Encoder Checksum Data2
data_1A    O     Decoder Recovery Data
data_2A    O     Decoder Recovery Data1
data_2B    O     Decoder Recovery Data2
Table 7 summarizes the chip characteristics and indicates the power efficiency of the structure.
Table 7. Chip characteristics
Device                        EP1S10F484C5
Total logic elements          1,511
Actual Time                   108.66 MHz (period = 9.203 ns)
Simulation End Time           9.0 us
Simulation Netlist Size       1576 nodes
Total Number of Transitions   4022
Total Power                   114.48 mW
5.3 Simulation Waveform
(A) Encoder
If $m(x) = \alpha + \alpha^4 x^4$:
Write D1: 0100; D5: 1100. The results are C1: 0101; C2: 1101, as shown in Figure 10.
Figure 10. Encoder simulation waveform
(B) Decoder
If the 5th hard drive (Drive_NO: 4) fails, the decoder calculates the correct data (DATA_1A: 1100) from all other hard drive data and C1, C2, as shown in Figure 11.
Figure 11. Decoder simulation waveform
6. Conclusion
In this paper, we proposed an XOR-only RS-RAID algorithm with two auxiliary tables, the CMC table and the reduced static-checksum table, which not only make the XOR-based RS algorithm possible but also speed our scheme up. These features also allow advanced RAID systems using our scheme to be built with regular industrial RAID level 5 controllers, which perform XOR calculations very well. Therefore, a lower-cost controller can be used for our RAID 6 algorithm instead of the specially designed, and usually expensive, controller needed by other RAID 6 algorithms. The proposed XOR-only RS-RAID algorithm, which optimizes the coding circuits, is suitable for RAID system applications where accuracy, power, speed, and area are crucial.
References
[1] Qin Xin, E.L. Miller, T. Schwarz, D.D.E. Long, S.A. Brandt, W. Litwin, "Reliability mechanisms for very large storage systems", Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), 7-10 April 2003, pp. 146-156.
[2] P.M. Chen, E.K. Lee, G.A. Gibson, R.H. Katz, and D.A. Patterson, "RAID: High-Performance, Reliable Secondary Storage", ACM Computing Surveys, June 1994, pp. 145-185.
[3] M. Blaum, J. Brady, J. Bruck, and J. Menon, "EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures", IEEE Transactions on Computers, Feb. 1995, pp. 192-202.
[4] Irving S. Reed, Xuemin Chen, "Error-Control Coding For Data Networks", Kluwer Academic Publishers, 1999.
[5] L. Xu and J. Bruck, "X-code: MDS array codes with optimal encoding", IEEE Transactions on Information Theory, Jan. 1999, pp. 272-276.
[6] Telemetry Channel Coding, Recommendation for Space Data Systems Standards, CCSDS 101.0-B-3, Blue Book, Issue 3, May 1992.
[7] J.S. Plank, "Correction to the 1997 Tutorial on Reed-Solomon Coding", Technical Report UT-CS-03-504, University of Tennessee, April 2003.
[8] J.S. Plank, "A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems", Software – Practice & Experience, September 1997, 27(9):995–1012.
[9] T.K. Truong, J.H. Jeng, T.C. Cheng, "A New Decoding Algorithm for Correcting Both Erasures and Errors of Reed-Solomon Codes", IEEE Transactions on Communications, March 2003, pp. 381-388.
[10] D.V. Sarwate, N.R. Shanbhag, "High-Speed Architectures for Reed-Solomon Decoders", IEEE Transactions on VLSI, 2001, pp. 641-655.
[11] Lihao Xu, "Highly Available Distributed Storage Systems", Ph.D. Dissertation, 1999.