An Area-Efficient Architecture of Reed-Solomon Codec for Advanced RAID Systems


Min-An Song, I-Feng Lan, and Sy-Yen Kuo

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan E-mail: [email protected]


Abstract

In this paper, a simple codec algorithm based on Reed-Solomon (RS) codes is proposed for erasure correction in RAID level 6 systems. Unlike conventional RS codes, this scheme uses a mathematical reduction method, called the Reduced Static-Checksum Table Approach, to improve coding performance. The Reed-Solomon codes we use are designed according to the characteristics of advanced RAID systems and can handle two disk failures in a RAID system. Moreover, like Even-Odd codes, this scheme performs all computations with only simple exclusive-OR (XOR) operations. These XOR-based RS codes are well suited to implementation and improve reliability and flexibility.

1. Introduction

In storage systems, especially large disk arrays, reliability becomes critical as systems scale up. It has been shown in [1] that disk failures would be a daily event in petabyte-scale file systems. Thus, improving the capability to detect and even correct failures has become a significant issue for large storage systems. RAID systems, classified from level 0 to level 6, are commonly used to address this issue. Unlike the other levels, RAID level 6 (RAID 6) not only provides correction capability but can also recover from two simultaneous disk failures. Each RAID 6 algorithm usually uses its own particular parity distribution [2]. Over the last two decades, many erasure-correcting codes for RAID 6 have been proposed, such as Even-Odd codes [3], Reed-Solomon (RS) codes [4], and X-codes [5].

The RS code, a very popular error-control code, has been studied in various applications, especially communication systems [6]. Researchers have also suggested RS-based solutions for protecting RAID-like systems against failures [7-10], but those schemes may not recover a system as quickly as Even-Odd codes do.

Therefore, in this paper, we present an XOR-based RS codec scheme that uses a reduced static-checksum table approach to handle erasure-only failures. The new scheme is similar to the conventional RS codec algorithm [4], whose pipeline consists of Syndrome Calculation (SC), Key Equation Solver (KES), Chien Search (CS), and Forney Algorithm (FA), but it omits both the KES and CS stages. KES and CS are used to locate errors and require additional finite-field operators. In an erasure-only RAID system, the system controller does not need to locate such errors, because the individual disk devices have their own error-control coding mechanisms to recover from errors [2]. Moreover, in large disk arrays the failure of a single storage device can usually be detected and marked by the storage system controller [11]. Since device failures can be marked as erasures, erasure-correcting codes are usually employed to recover the information: the data on the failed disks can be recovered and the system keeps working as usual. Compared with the traditional RS codec scheme, the scheme proposed in this paper is simpler, costs less, and improves flexibility and reliability.

The rest of this paper is organized as follows. Section 2 describes the general ideas of our encoding algorithm for the erasure-only RS-RAID system and presents the main feature of our scheme, the reduced static-checksum table approach. Our decoding approach is shown in Section 3. Section 4 gives the results of the performance analysis, including a comparison of the number of XOR operations for the Even-Odd codes, the traditional RS-RAID structure, and the XOR-based RS code. The hardware implementation of the proposed RS encoder/decoder is described in Section 5. Finally, Section 6 gives the conclusions.

2. RS-RAID Encoder

The encoding procedure in our scheme not only follows the rules of systematic coding but also builds a look-up table based on constant multipliers. Most RS encoders generate systematic codes; that is, the message symbols appear explicitly in the corresponding codeword. The equation

$$cw(x) = b(x) + x^{n-k} m(x)$$

shows the result of applying the systematic coding method, where cw(x), b(x), and m(x) are the codeword, checksum, and message polynomials, respectively. W is the word (symbol) size in bits [8]; for W = 4 the code is defined over GF(2^4) and a codeword has up to 2^4 - 1 = 15 symbols, so there could be 4 checksum drives and 11 data drives in this system, i.e., b(x) = {C1, C2, C3, C4} and m(x) = {D1, D2, D3, ..., D10, D11}.
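A minimal Python sketch of the systematic rule follows. It is illustrative only, not the authors' code: it assumes GF(2^4) with primitive polynomial x^4 + x + 1, symbols stored as 4-bit integers whose bit i is the coefficient of α^i, n = 15, k = 13, and the two-checksum generator g(x) = (x - α^0)(x - α^1) used in Section 2.2. All function names are hypothetical.

```python
# Sketch: systematic RS encoding cw(x) = b(x) + x^(n-k) m(x) over GF(2^4).
# Assumptions: x^4 + x + 1, alpha = 0b0010, n = 15, k = 13, g(x) = (x - 1)(x - alpha).

PRIM_POLY = 0b10011  # x^4 + x + 1
N, K = 15, 13


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:  # an x^4 term appeared: reduce modulo PRIM_POLY
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """alpha^e by repeated multiplication by alpha = 0b0010."""
    r = 1
    for _ in range(e % 15):
        r = gf_mul(r, 0b0010)
    return r


def poly_mul(p, q):
    """Product of two polynomials stored as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] ^= gf_mul(pi, qj)
    return out


def poly_mod(num, gen):
    """Remainder of num(x) divided by the monic polynomial gen(x)."""
    rem = list(num)
    deg = len(gen) - 1
    for i in range(len(rem) - 1, deg - 1, -1):
        coef = rem[i]
        if coef:
            for j, gj in enumerate(gen):
                rem[i - deg + j] ^= gf_mul(coef, gj)
    return rem[:deg]


def poly_eval(poly, x):
    """Horner evaluation of poly at the field element x."""
    acc = 0
    for coef in reversed(poly):
        acc = gf_mul(acc, x) ^ coef
    return acc


# (x - alpha^0)(x - alpha^1); subtraction is XOR, so -alpha^e = alpha^e.
G = poly_mul([alpha_pow(0), 1], [alpha_pow(1), 1])


def encode(message):
    """Return (b, cw) for m(x) = message[0] + message[1] x + ... (K symbols)."""
    shifted = [0] * (N - K) + list(message)  # x^(n-k) m(x)
    b = poly_mod(shifted, G)                 # b(x) = C1 + C2 x
    return b, b + list(message)              # cw(x) = b(x) + x^(n-k) m(x)


msg = [0] * K
msg[0], msg[4] = alpha_pow(1), alpha_pow(4)  # m(x) = alpha + alpha^4 x^4
b, cw = encode(msg)
print(b)  # [10, 11], i.e. C1 = alpha^9 and C2 = alpha^7
```

The remainder is obtained here by ordinary polynomial long division; the table-driven procedure below replaces that division with fixed per-drive entries.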

2.1 Basic Scheme in RS Encoding

RS codes are linear block codes [4], so their computation satisfies the linearity property: each data symbol (drive) can be treated independently, and a change in any data drive affects the checksum symbols (drives) independently of the other data drives. Using constant multipliers, we derive the linear property of our RS-RAID model as follows:

$$\begin{aligned} b(x) &= x^{n-k} m(x) \bmod g(x) \\ &= x^{n-k}\left(m_0 + m_1 x + \cdots + m_{k-1} x^{k-1}\right) \bmod g(x) \\ &= \left[m_0 x^{n-k} \bmod g(x)\right] + \left[m_1 x^{n-k+1} \bmod g(x)\right] + \cdots + \left[m_{k-1} x^{n-1} \bmod g(x)\right]. \end{aligned} \tag{1}$$

For that reason, the effect of each data symbol (drive) on the checksum symbols can first be computed separately; the complete checksum symbols are then obtained by accumulating the effects of all the independent drives.

Algorithm for Building Checksum Symbols (Encoding Procedure):

Step 1. Premultiply (or shift) the message polynomial m(x) by $x^{n-k}$.

Step 2. Construct a static-checksum table: for each location (drive) i, compute $[m_i x^{n-k+i} \bmod g(x)]$ with $m_i$ set to the multiplicative identity 1 of GF(2^W). Each entry shows the effect of that location on the checksums.

Step 3. Using the table built in Step 2, obtain the checksum symbols b(x) by multiplying each entry of the static-checksum table by the actual value of the corresponding message symbol and accumulating the results:

$$b(x) = m_0\left[x^{n-k} \bmod g(x)\right] + m_1\left[x^{n-k+1} \bmod g(x)\right] + \cdots + m_{k-1}\left[x^{n-1} \bmod g(x)\right].$$

Because all values of the table from Step 2 are fixed, every computation can be carried out with constant multipliers once the static-checksum table has been constructed.
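Steps 1 to 3 can be sketched in Python under assumed parameters: GF(2^4) with x^4 + x + 1, the generator g(x) = x^2 + α^4 x + α adopted in Section 2.2, n = 15, k = 13, and data drive Di at codeword position i + 1 (so D1 corresponds to x^2, as in the worked example of Section 2.2). build_static_table and encode_with_table are hypothetical names.

```python
# Sketch of Steps 1-3: static-checksum table, then constant-multiplier accumulation.
# Assumptions: GF(2^4) with x^4 + x + 1, g(x) = x^2 + alpha^4 x + alpha,
# n = 15, k = 13, data drive D(i+1) holds m_i at codeword position n - k + i.

PRIM_POLY = 0b10011
N, K = 15, 13
G = [0b0010, 0b0011, 0b0001]  # alpha + alpha^4 x + x^2, lowest power first


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """alpha^e with alpha = 0b0010."""
    r = 1
    for _ in range(e % 15):
        r = gf_mul(r, 0b0010)
    return r


def poly_mod(num, gen):
    """Remainder of num(x) modulo the monic gen(x); lists are lowest power first."""
    rem = list(num)
    deg = len(gen) - 1
    for i in range(len(rem) - 1, deg - 1, -1):
        coef = rem[i]
        if coef:
            for j, gj in enumerate(gen):
                rem[i - deg + j] ^= gf_mul(coef, gj)
    return rem[:deg]


def build_static_table():
    """Steps 1-2: entry i is x^(n-k+i) mod g(x) as (C1, C2), i.e. m_i = 1."""
    table = []
    for i in range(K):
        unit = [0] * N
        unit[N - K + i] = 1  # the unit message x^i shifted by x^(n-k)
        table.append(tuple(poly_mod(unit, G)))
    return table


def encode_with_table(table, message):
    """Step 3: scale each fixed entry by m_i and XOR-accumulate the results."""
    c1 = c2 = 0
    for m_i, (t1, t2) in zip(message, table):
        c1 ^= gf_mul(m_i, t1)
        c2 ^= gf_mul(m_i, t2)
    return c1, c2


table = build_static_table()
msg = [0] * K
msg[0], msg[4] = alpha_pow(1), alpha_pow(4)  # m(x) = alpha + alpha^4 x^4
print(table[1])                       # D2: (6, 7), i.e. (alpha^5, alpha^10)
print(encode_with_table(table, msg))  # (10, 11), i.e. (alpha^9, alpha^7)
```

In hardware, each multiplication by a fixed table entry would become a constant multiplier, that is, a small XOR network; the generic gf_mul only stands in for it here.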

2.2 Reduced Static-Checksum Table Approach

Even after the look-up table has been constructed, encoding still requires many XOR gates. Therefore, we further propose a method to reduce the number of XOR gates required during encoding. In the case of GF(2^4), for example, applying constant multipliers to a single variable yields a table, called the constant-multiplier-coefficient table (abbreviated as the CMC table), shown in Table 1, where $A = a_1 + a_2\alpha + a_3\alpha^2 + a_4\alpha^3$ is the variable with four coefficients $a_1$ to $a_4$, and $a_1'$ to $a_4'$ are the coefficients of $A'$, obtained by multiplying $A$ by $\alpha^z$ for $z = 0, \dots, 14$:

Table 1. The constant-multiplier-coefficient (CMC) table
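The rows of a CMC table can be generated mechanically. The Python sketch below assumes the primitive polynomial x^4 + x + 1 (consistent with g(x) = x^2 + α^4 x + α) and represents each part $a_k'$ as the set of input coefficients XORed into it; cmc_row, xor_count, and format_row are hypothetical names.

```python
# Sketch: regenerate CMC rows for A' = A * alpha^z, z = 0..14, over GF(2^4).
# Assumptions: primitive polynomial x^4 + x + 1 and A = a1 + a2*alpha + a3*alpha^2 + a4*alpha^3.

PRIM_POLY = 0b10011


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """alpha^e with alpha = 0b0010."""
    r = 1
    for _ in range(e % 15):
        r = gf_mul(r, 0b0010)
    return r


def cmc_row(z: int):
    """Parts a1'..a4' of A * alpha^z, each as the set of indices j of the a_j XORed into it."""
    # A * alpha^z = sum_j a_j * alpha^(z + j - 1): column j is the bit pattern of alpha^(z + j - 1).
    columns = [alpha_pow(z + j) for j in range(4)]
    return [frozenset(j + 1 for j in range(4) if (columns[j] >> k) & 1) for k in range(4)]


def xor_count(parts) -> int:
    """Two-input XOR gates for one row: a part with w inputs needs w - 1 gates."""
    return sum(max(len(p) - 1, 0) for p in parts)


def format_row(parts) -> str:
    return ", ".join(" ^ ".join(f"a{j}" for j in sorted(p)) for p in parts)


for z in range(15):
    row = cmc_row(z)
    print(f"A*alpha^{z:<2}: {format_row(row)}  [{xor_count(row)} XOR]")
```

For example, the z = 5 row reads a3 ^ a4, a1 ^ a3, a1 ^ a2 ^ a4, a2 ^ a3 and costs 5 XOR gates.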

Now, take the generator polynomial

$$g(x) = (x - \alpha^0)(x - \alpha^1) = x^2 + (\alpha^0 + \alpha^1)x + \alpha^0\alpha^1 = x^2 + \alpha^4 x + \alpha,$$

which can tolerate up to two erasures. The checksums $b(x) = C_2 x + C_1$ are then as shown in Table 2. To obtain the sixth column of Table 2, which gives the number of XOR operations after the reduction, our approach consists of the following steps:

Step 1. For each location in the static-checksum table, first mark its two checksum values, C1 and C2. In the CMC table (Table 1), each marked value corresponds to a row consisting of four parts.

Step 2. Compare the corresponding parts of the two marked rows. A term common to both rows needs to be computed only once and can be shared, which halves the XOR operations spent on that term; repeat until no common term remains between the two rows.

Step 3. Finally, the value in the sixth column of Table 2 is the total number of XOR operations that remain in the parts of the two marked rows.

Table 2. The reduced static-checksum table with m(x) = 1

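For instance, to reduce location D2 in the static-checksum table:

Step 1. $C_1 = \alpha^5$ and $C_2 = \alpha^{10}$; therefore, we mark the rows $A \cdot \alpha^5$ and $A \cdot \alpha^{10}$, whose four parts are

$$A \cdot \alpha^{5}:\ (a_3 \oplus a_4,\ a_1 \oplus a_3,\ a_1 \oplus a_2 \oplus a_4,\ a_2 \oplus a_3),$$
$$A \cdot \alpha^{10}:\ (a_1 \oplus a_3 \oplus a_4,\ a_1 \oplus a_2 \oplus a_3,\ a_1 \oplus a_2 \oplus a_3 \oplus a_4,\ a_2 \oplus a_3 \oplus a_4).$$

Computed separately, these two rows need 5 + 9 = 14 XOR operations.

Step 2. Comparing the two marked rows, we see that $a_3 \oplus a_4$, $a_1 \oplus a_3$, $a_1 \oplus a_2 \oplus a_4$, and $a_2 \oplus a_3$ are all common terms of the two rows. Hence, after applying this approach, the total number of XOR operations is reduced by 5.

Step 3. The number of XOR operations required after Step 2 is 14 - 5 = 9.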
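The three steps can be reproduced for location D2 with the Python sketch below. It reads "common terms" as the inputs shared by the same part of the two rows, so a shared set of s inputs costs s - 1 XORs once instead of twice; this is our interpretation, and it reproduces the 14, 5, and 9 above. The field assumptions (x^4 + x + 1, $A = a_1 + a_2\alpha + a_3\alpha^2 + a_4\alpha^3$) and all names are hypothetical.

```python
# Sketch: XOR count before/after sharing common terms between the C1 and C2 rows.
# Assumptions: GF(2^4) with x^4 + x + 1; A = a1 + a2*alpha + a3*alpha^2 + a4*alpha^3;
# "common terms" read as inputs shared by the same part position of both rows.

PRIM_POLY = 0b10011


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """alpha^e with alpha = 0b0010."""
    r = 1
    for _ in range(e % 15):
        r = gf_mul(r, 0b0010)
    return r


def cmc_row(z: int):
    """Parts of A * alpha^z as sets of input indices j (1-based)."""
    columns = [alpha_pow(z + j) for j in range(4)]
    return [frozenset(j + 1 for j in range(4) if (columns[j] >> k) & 1) for k in range(4)]


def xor_count(parts) -> int:
    """A part with w inputs needs w - 1 two-input XOR gates."""
    return sum(max(len(p) - 1, 0) for p in parts)


def reduced_xor_count(z1: int, z2: int):
    """Return (before, saved, after) for the rows A*alpha^z1 (C1) and A*alpha^z2 (C2)."""
    row1, row2 = cmc_row(z1), cmc_row(z2)
    before = xor_count(row1) + xor_count(row2)
    # A shared set of s inputs is XORed once (s - 1 gates) and reused by the other row.
    saved = sum(max(len(p & q) - 1, 0) for p, q in zip(row1, row2))
    return before, saved, before - saved


print(reduced_xor_count(5, 10))  # location D2: (14, 5, 9)
```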

This scheme also applies the shortened-code method [4] to achieve better coding performance. With this method, the active drives are first placed at selected locations, chosen according to which locations cost the fewest XOR gates after our reduction approach. That is, in the case of Table 2, the locations should be filled in the order D8, D13, D1, D9, and so on, for higher computational performance.
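Under the same per-part reading of the reduction, the placement order can be computed by ranking all 13 locations by their reduced XOR count. The Python sketch assumes GF(2^4) with x^4 + x + 1, g(x) = x^2 + α^4 x + α, and Di at codeword position i + 1; all names are hypothetical. With a stable sort (ties keep the D1 to D13 order), it ranks D8, D13, D1, D9 first, which agrees with the order stated above.

```python
# Sketch: rank data-drive locations by XOR cost after the reduction (shortened-code placement).
# Assumptions: GF(2^4) with x^4 + x + 1, g(x) = x^2 + alpha^4 x + alpha, Di at position i + 1,
# and the per-part reading of "common terms" (shared inputs counted once).

PRIM_POLY = 0b10011
G0, G1 = 0b0010, 0b0011  # g(x) = x^2 + G1*x + G0


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def static_table(k: int = 13, shift: int = 2):
    """(C1, C2) = x^(shift + i) mod g(x) for data drives D1..Dk."""
    c1, c2 = 1, 0  # x^0 mod g(x)
    entries = []
    for p in range(shift + k):
        if p >= shift:
            entries.append((c1, c2))
        c1, c2 = gf_mul(c2, G0), c1 ^ gf_mul(c2, G1)  # multiply by x, using x^2 = G1*x + G0
    return entries


def cmc_parts(value: int):
    """Four parts of A * value; column j is value * alpha^(j - 1)."""
    columns, col = [], value
    for _ in range(4):
        columns.append(col)
        col = gf_mul(col, 0b0010)
    return [frozenset(j + 1 for j in range(4) if (columns[j] >> k) & 1) for k in range(4)]


def reduced_cost(c1: int, c2: int) -> int:
    """XOR gates for one location after sharing the common terms of its two rows."""
    r1, r2 = cmc_parts(c1), cmc_parts(c2)
    before = sum(max(len(p) - 1, 0) for p in r1 + r2)
    saved = sum(max(len(p & q) - 1, 0) for p, q in zip(r1, r2))
    return before - saved


costs = {f"D{i + 1}": reduced_cost(c1, c2) for i, (c1, c2) in enumerate(static_table())}
order = sorted(costs, key=costs.get)  # stable sort: ties keep the D1..D13 order
print(order[:4])  # ['D8', 'D13', 'D1', 'D9']
```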

Let us assume that the message polynomial $m(x) = \alpha + \alpha^4 x^4$ has to be stored into an empty RS-RAID over GF(2^4). The data in the checksum drives can be computed as follows. Because $m_0 = \alpha$, from location D1 ($x^2$) of Table 2 we put the data $\alpha \cdot \alpha$ and $\alpha \cdot \alpha^4$ into the two checksum drives, respectively. Similarly, because $m_4 = \alpha^4$, from location D5 ($x^6$) of Table 2 the contributions to the two checksum drives are $\alpha^4 \cdot \alpha^7$ and $\alpha^4 \cdot \alpha^9$. Therefore, the values stored in the two checksum drives after the above steps are:

$$C_1 = (\alpha \cdot \alpha) + (\alpha^4 \cdot \alpha^7) = \alpha^2 + \alpha^{11} = \alpha^9, \qquad C_2 = (\alpha \cdot \alpha^4) + (\alpha^4 \cdot \alpha^9) = \alpha^5 + \alpha^{13} = \alpha^7.$$

Figure 1 illustrates the data placement in our RS-RAID system, where C1 and C2 are the checksum drives and D1 to D13 are the data drives; in each column, the second row gives the field symbol corresponding to the binary value above it.

Figure 1. Data allocation in RS-RAID System with Shortened Code method

3. RS-RAID Decoder

In this section, two decoding cases over GF(2^4) are discussed; both are carried out directly by solving equations with Cramer's rule.

3.1 Single Failed Disk

We take 1 as one of the consecutive-power roots of our generator polynomial, i.e., $g(x) = (x - \alpha^0)(x - \alpha^1)$. Because $g(1) = 0$, every codeword satisfies $cw(1) = \sum_z cw_z = 0$, so for a single failed disk, decoding is as easy as in the parity scheme of RAID level 5. From the equation Failed drive $= S_0 = \sum$ (all normal drives), recovering the failed disk only requires XORing the remaining active disks together. Assume that only data drive D1 has been erased, as shown in Figure 2.

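Figure 2. A RAID with only one failed disk

The original information of D1 can be recovered bit by bit, where D(1,b) denotes bit b of drive D1 (likewise C(1,b), C(2,b), and D(5,b) for C1, C2, and D5), "+" denotes XOR, and the other drives contribute zeros in this example:
D(1,1) = C(1,1) + C(2,1) + D(5,1) = 0 + 1 + 1 = 0
D(1,2) = C(1,2) + C(2,2) + D(5,2) = 1 + 1 + 1 = 1
D(1,3) = C(1,3) + C(2,3) + D(5,3) = 0 + 0 + 0 = 0
D(1,4) = C(1,4) + C(2,4) + D(5,4) = 1 + 1 + 0 = 0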
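The single-failure rule can be checked on the Figure 2 values with the short Python sketch below. It assumes, as the bit values in Figure 2 and Section 5.3 suggest, that each 4-bit string lists the coefficients of α^0 to α^3 from left to right, while the integers in the code use bit i for α^i; recover_single and lsb_first are hypothetical names.

```python
# Sketch: RAID-5-style recovery of one failed drive, since S0 = XOR of all symbols = 0.
# Symbols are 4-bit ints with bit i = coefficient of alpha^i (assumed encoding).


def recover_single(drives: dict, failed: str) -> int:
    """XOR every surviving drive; the result is the lost symbol."""
    value = 0
    for name, symbol in drives.items():
        if name != failed:
            value ^= symbol
    return value


def lsb_first(symbol: int) -> str:
    """Render a symbol as in Figure 2: coefficients of alpha^0..alpha^3, left to right."""
    return format(symbol, "04b")[::-1]


# Worked example: C1 = alpha^9, C2 = alpha^7, D1 = alpha, D5 = alpha^4, other data drives 0.
drives = {"C1": 0b1010, "C2": 0b1011, "D1": 0b0010, "D5": 0b0011}
drives.update({f"D{i}": 0 for i in range(2, 14) if i != 5})
print(lsb_first(recover_single(drives, "D1")))  # 0100
```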

3.2 Two Failed Disks at the Same Time

In this case, to recover two disks that fail simultaneously, the decoding procedure in our scheme can be treated as solving a system of two linear equations in two unknowns. In matrix form:

$$\begin{bmatrix} 1 & 1 \\ \alpha^{i} & \alpha^{j} \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} S_0 \\ S_1 \end{bmatrix},$$

where i and j are the positions of the two failed disks in the codeword, A and B are their unknown contents, and the syndromes

$$S_k = \sum_{\substack{z=0 \\ z \neq i,j}}^{n-1} cw_z\, \alpha^{kz}, \qquad k = 0, 1,$$

are computed from all normal drives. By Cramer's rule, the two variables A and B are

$$A = \frac{S_1 + \alpha^{j} S_0}{\alpha^{i} + \alpha^{j}}, \qquad B = \frac{S_1 + \alpha^{i} S_0}{\alpha^{i} + \alpha^{j}}. \tag{2}$$
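Equation (2) translates directly into code. The Python sketch below is a minimal illustration under assumptions the text does not fix: GF(2^4) with x^4 + x + 1, codeword positions C1 = 0, C2 = 1, and Di = i + 1 (our reading of Table 2), and a brute-force inverse in place of a lookup table; recover_two and gf_inv are hypothetical names.

```python
# Sketch of equation (2): two-erasure recovery by Cramer's rule over GF(2^4).
# Assumptions: x^4 + x + 1; codeword positions C1 = 0, C2 = 1, Di = i + 1;
# syndromes S_k = sum over surviving positions z of cw_z * alpha^(k*z), k = 0, 1.

PRIM_POLY = 0b10011


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """alpha^e with alpha = 0b0010."""
    r = 1
    for _ in range(e % 15):
        r = gf_mul(r, 0b0010)
    return r


def gf_inv(a: int) -> int:
    """Brute-force inverse; GF(2^4) has only 15 nonzero elements."""
    for cand in range(1, 16):
        if gf_mul(a, cand) == 1:
            return cand
    raise ZeroDivisionError("0 has no inverse in GF(2^4)")


def recover_two(cw, i: int, j: int):
    """Return (A, B), the lost symbols at positions i and j (i != j)."""
    s0 = s1 = 0
    for z, symbol in enumerate(cw):
        if z in (i, j):
            continue                         # failed drives are excluded from the syndromes
        s0 ^= symbol                         # S0: plain XOR of the normal drives
        s1 ^= gf_mul(symbol, alpha_pow(z))   # S1: constant multiply by alpha^z, then XOR
    inv_den = gf_inv(alpha_pow(i) ^ alpha_pow(j))       # 1 / (alpha^i + alpha^j)
    a = gf_mul(s1 ^ gf_mul(alpha_pow(j), s0), inv_den)  # A = (S1 + alpha^j S0) / (alpha^i + alpha^j)
    b = gf_mul(s1 ^ gf_mul(alpha_pow(i), s0), inv_den)  # B = (S1 + alpha^i S0) / (alpha^i + alpha^j)
    return a, b


# Codeword of the worked example: C1 = alpha^9, C2 = alpha^7, D1 = alpha, D5 = alpha^4.
cw = [alpha_pow(9), alpha_pow(7), alpha_pow(1), 0, 0, 0, alpha_pow(4)] + [0] * 8
print(recover_two(cw, 2, 6))  # D1 and D5 lost -> (2, 3), i.e. (alpha, alpha^4)
```

Losing both checksum drives (positions 0 and 1), or one checksum drive and one data drive, is handled by the same formulas, since they treat every position alike.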

Furthermore, it is more efficient to apply the idea of the CMC table and build, in advance, a table of the inverses of $(\alpha^i + \alpha^j)$. This table avoids the extra implementation cost of designing a finite-field ALU. In the circuit implementation, $S_0$ can be computed by XORing all the normal drives. For the syndrome $S_1$, if it is implemented in a VLSI chip, it can share hardware with the encoder, because both use the same constant-multiplier circuits.
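The inverse table can be precomputed once for every pair of positions. This Python sketch assumes n = 15 positions over GF(2^4) with x^4 + x + 1 and uses a brute-force search only at build time; at decode time, division by $(\alpha^i + \alpha^j)$ then becomes a lookup followed by a multiplication. build_inverse_table is a hypothetical name.

```python
# Sketch: precomputed table of (alpha^i + alpha^j)^(-1) for all failed-position pairs i < j.
# Assumptions: GF(2^4) with x^4 + x + 1 and n = 15 codeword positions.

PRIM_POLY = 0b10011


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """alpha^e with alpha = 0b0010."""
    r = 1
    for _ in range(e % 15):
        r = gf_mul(r, 0b0010)
    return r


def gf_inv(a: int) -> int:
    """Brute-force inverse, used only while building the table."""
    for cand in range(1, 16):
        if gf_mul(a, cand) == 1:
            return cand
    raise ZeroDivisionError("0 has no inverse in GF(2^4)")


def build_inverse_table(n: int = 15):
    """Map (i, j), i < j, to the inverse of alpha^i + alpha^j (nonzero since alpha^i != alpha^j)."""
    table = {}
    for i in range(n):
        for j in range(i + 1, n):
            table[(i, j)] = gf_inv(alpha_pow(i) ^ alpha_pow(j))
    return table


inv_table = build_inverse_table()
print(len(inv_table))  # 105 pairs of failed positions
```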

4. Results and Comparisons

To evaluate the encoding performance of our XOR-based RS algorithm, we implemented both the CMC table and the reduced static-checksum table in GF(2^8) and counted the total number of XOR operations. The disk drive set {7, 11, 13, 17, 23, 29, 31, 41, 43} is used as our experimental example. Figure 3 plots the curves corresponding to Table 3.

Table 3. # of XOR gates while encoding with the XOR-based RS, the conventional RS, and the Even-Odd codes

# of disk drives | Even-Odd codes | XOR-based Reed-Solomon codes | Conventional Reed-Solomon codes
7 | 664 | 1068 | 954
11 | 1752 | 3020 | 3250
13 | 2488 | 4392 | 5112
17 | 4344 | 7968 | 10624
23 | 8088 | 15554 | 24442
29 | 12948 | 25704 | 46648
31 | 14872 | 29700 | 56250
41 | 26232 | 54200 | 124000
43 | 28888 | 60018 | 142002

Table 3 and Figure 3 show that the Even-Odd codes encode more efficiently than the XOR-based RS code. However, for 11 or more drives our approach needs fewer XOR operations than the conventional RS codes in [3].

Figure 3. Curves of the # of XOR gates while encoding with the XOR-based RS, the conventional RS, and the Even-Odd codes

Moreover, Figure 4 shows that Even-Odd codes are traditionally implemented with a three-dimensional structure, whereas our algorithm can easily be implemented with a two-dimensional array structure. In Figure 4, if one byte of data changes, eight codewords, from Page 0 to Page 7, must be processed. Data updates can therefore be an overhead for Even-Odd codes, but this does not happen in our scheme, because we chose 8 bits as the length of a codeword symbol in the 2D array structure.

Figure 4. The three-dimensional structure of the Even-Odd implementation (n = 8)
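Because the code is linear, a one-symbol update can be sketched as a read-modify-write: only the changed drive's static-checksum entry is needed to patch both checksums. This is a hedged illustration of that property, not a procedure given in the paper; it uses the GF(2^4) example of Section 2 (x^4 + x + 1, g(x) = x^2 + α^4 x + α) rather than 8-bit symbols, and update_checksums and the other names are hypothetical.

```python
# Sketch: small-write checksum update by linearity, C_new = C_old + entry * (old + new).
# Assumptions: GF(2^4) with x^4 + x + 1, g(x) = x^2 + alpha^4 x + alpha, 13 data drives.

PRIM_POLY = 0b10011
G0, G1 = 0b0010, 0b0011  # g(x) = x^2 + G1*x + G0


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def static_table(k: int = 13, shift: int = 2):
    """(C1, C2) = x^(shift + i) mod g(x) for data drives D1..Dk."""
    c1, c2 = 1, 0
    entries = []
    for p in range(shift + k):
        if p >= shift:
            entries.append((c1, c2))
        c1, c2 = gf_mul(c2, G0), c1 ^ gf_mul(c2, G1)  # multiply by x, using x^2 = G1*x + G0
    return entries


def full_encode(message, table):
    """Recompute both checksums from every data drive."""
    c1 = c2 = 0
    for m_i, (t1, t2) in zip(message, table):
        c1 ^= gf_mul(m_i, t1)
        c2 ^= gf_mul(m_i, t2)
    return c1, c2


def update_checksums(c1, c2, entry, old, new):
    """Patch the stored checksums using only the changed drive's table entry."""
    delta = old ^ new
    t1, t2 = entry
    return c1 ^ gf_mul(delta, t1), c2 ^ gf_mul(delta, t2)


table = static_table()
msg = [0] * 13
msg[0], msg[4] = 0b0010, 0b0011   # m(x) = alpha + alpha^4 x^4
c1, c2 = full_encode(msg, table)  # (alpha^9, alpha^7) = (10, 11)
new_msg = list(msg)
new_msg[4] = 0b0110               # rewrite drive D5 with alpha^5
c1, c2 = update_checksums(c1, c2, table[4], msg[4], new_msg[4])
print((c1, c2) == full_encode(new_msg, table))  # True
```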

To compare RS codes with Even-Odd codes, we directly use the Even-Odd code proposed in [3], which encodes (m - 1) bytes per disk over m data drives; its estimated number of XOR operations is $8(2m^2 - 2m - 1)$. That is, m(m - 1) bytes are encoded. To process the same amount of data with the RS code, we directly multiply the RS counts above by (m - 1), so that the amount of data also equals m(m - 1) bytes. The comparison is plotted in Figure 5.
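The Even-Odd estimate is easy to tabulate. The short Python check below (function and list names are hypothetical) evaluates $8(2m^2 - 2m - 1)$ for the drive counts of Table 3.

```python
# Sketch: Even-Odd XOR estimate 8 * (2m^2 - 2m - 1) for m data drives, (m - 1) bytes per disk.

TABLE3_DRIVES = [7, 11, 13, 17, 23, 29, 31, 41, 43]
TABLE3_EVEN_ODD = [664, 1752, 2488, 4344, 8088, 12948, 14872, 26232, 28888]


def evenodd_xor_estimate(m: int) -> int:
    """XOR gates to encode m * (m - 1) bytes; the factor 8 is assumed to count bits per byte."""
    return 8 * (2 * m * m - 2 * m - 1)


for m, listed in zip(TABLE3_DRIVES, TABLE3_EVEN_ODD):
    print(f"m = {m:2d}: formula {evenodd_xor_estimate(m):6d}, Table 3 {listed:6d}")
```

The formula reproduces the Even-Odd column of Table 3 for every listed drive count except m = 29, where it gives 12,984 while the table lists 12,948.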

[Figure 3 plot: # of XOR operations versus # of data drives for XOR-based RS codes, EvenOdd codes, and conventional RS codes]

[Figure 5 plot: # of XORs versus # of data drives (up to 253) for the RS code and the EvenOdd code]

Figure 5. The number of XOR gates to encode m*(m-1) bytes

As Figure 5 shows, the RS code requires more computation than the Even-Odd codes: for 5 to 253 hard disks, it needs about 1.4 to 2.75 times as many XOR operations. However, the key point is that the proposed Reed-Solomon coding framework can process data in parallel. The encoder computes the effect of each data drive on the checksum drives independently and then adds these effects together, so the framework is suitable for parallel processing, which speeds up the calculation and saves time.
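The independence argument can be illustrated with a thread pool: each worker computes one drive's effect on (C1, C2), and the effects are XOR-combined afterwards in any order. This is a structural sketch only, with assumed GF(2^4) parameters (x^4 + x + 1, g(x) = x^2 + α^4 x + α) and hypothetical names. In CPython the global interpreter lock means these threads will not actually speed up pure-Python arithmetic; real gains would come from hardware or separate processes.

```python
# Sketch: per-drive effects computed independently, then XOR-combined.
# Assumptions: GF(2^4) with x^4 + x + 1, g(x) = x^2 + alpha^4 x + alpha, 13 data drives.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

PRIM_POLY = 0b10011
G0, G1 = 0b0010, 0b0011  # g(x) = x^2 + G1*x + G0


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def static_table(k: int = 13, shift: int = 2):
    """(C1, C2) = x^(shift + i) mod g(x) for data drives D1..Dk."""
    c1, c2 = 1, 0
    entries = []
    for p in range(shift + k):
        if p >= shift:
            entries.append((c1, c2))
        c1, c2 = gf_mul(c2, G0), c1 ^ gf_mul(c2, G1)  # multiply by x, using x^2 = G1*x + G0
    return entries


def drive_effect(job):
    """Effect of one data drive on (C1, C2); it depends on no other drive."""
    m_i, (t1, t2) = job
    return gf_mul(m_i, t1), gf_mul(m_i, t2)


def combine(acc, effect):
    """XOR is associative and commutative, so effects can be combined in any order."""
    return acc[0] ^ effect[0], acc[1] ^ effect[1]


def parallel_encode(message, table, workers: int = 4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        effects = list(pool.map(drive_effect, zip(message, table)))
    return reduce(combine, effects, (0, 0))


table = static_table()
msg = [0] * 13
msg[0], msg[4] = 0b0010, 0b0011  # m(x) = alpha + alpha^4 x^4
print(parallel_encode(msg, table))  # (10, 11)
```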

Table 4. A comparison sheet between RS code and Even-Odd codes

 | Reed-Solomon codes | Even-Odd codes
MDS code | Yes | Yes
Calculating complexity | Medium | Easy
Encoding | Mapping is easy and intuitive | Mapping is done in three dimensions; data addressing is hard
Decoding | Processes are simple | Large amount of buffers (memory)
Flexibility | Yes | Yes
Frameworks | Parallel process | Multi-array parallel process
Update complexity | |
# of checksum drives | >2 |
Fault-tolerant capability | Design free | Only 2

5. Hardware Implementation of the RS-RAID Codec

In this section, we implement the RS codec on an Altera Stratix FPGA device (EP1S10F484C5). Figure 6 shows the functional diagram.

[Diagram: the Codec contains an Encoder, which produces C1 and C2, and a Decoder, which uses C1, C2, fail, and fail_no to produce data_1A, data_2A, and data_2B; both share clk, nrst, wr_en, wr_done, drive_no, and data.]

Figure 6. GF(2^4) Codec functional diagram

5.1 The Encoder Block

In the encoder block, we create a “const_MUL” module (a constant-multiplication table) that generates the checksum data (C1, C2) as soon as any data is written to a hard drive.

[Diagram: the data of each drive D1, D2, ..., D12, D13 passes through a const_MUL module; the per-drive outputs (C1_d1_new, C2_d1_new, ..., C1_d13_new, C2_d13_new) are XORed to form C1 and C2.]

Figure 7. Encoder block diagram
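The internals of const_MUL are not given in the text. One way to obtain such a constant multiplier is to turn a CMC row into an XOR network; the Python sketch below emits a hypothetical Verilog module for A·α^z, assuming GF(2^4) with x^4 + x + 1 and that input bit a[j-1] carries coefficient $a_j$ (the coefficient of α^(j-1)). The module name const_mul_a{z} is illustrative only.

```python
# Sketch: emit a hypothetical Verilog XOR network for multiplying a 4-bit symbol by alpha^z.
# Assumptions: GF(2^4) with x^4 + x + 1; input bit a[j-1] carries coefficient a_j of alpha^(j-1).

PRIM_POLY = 0b10011


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4); bit i of a symbol is the coefficient of alpha^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM_POLY
    return result


def alpha_pow(e: int) -> int:
    """alpha^e with alpha = 0b0010."""
    r = 1
    for _ in range(e % 15):
        r = gf_mul(r, 0b0010)
    return r


def cmc_row(z: int):
    """Parts of A * alpha^z as sets of input indices j (1-based)."""
    columns = [alpha_pow(z + j) for j in range(4)]
    return [frozenset(j + 1 for j in range(4) if (columns[j] >> k) & 1) for k in range(4)]


def const_mul_verilog(z: int) -> str:
    """One continuous assignment per output bit; every '^' is one two-input XOR gate."""
    lines = [f"module const_mul_a{z} (input [3:0] a, output [3:0] y);"]
    for k, part in enumerate(cmc_row(z)):
        terms = " ^ ".join(f"a[{j - 1}]" for j in sorted(part))
        lines.append(f"  assign y[{k}] = {terms};")
    lines.append("endmodule")
    return "\n".join(lines)


print(const_mul_verilog(5))
```

Each ^ in the output is one two-input XOR gate, so the gate count of the emitted module equals the row's XOR count in the CMC table.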

5.2 The Decoder Block

The decoder block includes two submodules:
(a) FSM_decoder: a state machine that controls the datapath for one or two failed hard drives.
(b) datapath: implements the P-G-Z algorithm to calculate the correct data from C1, C2, and the data on the other, healthy hard drives when a hard drive fails.

• data_1A: the recovered data of the single failed hard drive;
• data_2A, data_2B: the recovered data of the two failed hard drives.

[Diagram: FSM_decoder controls the datapath through the enables en_1i, en_1A, en_2i, en_2j, en_S0, en_S1, en_2A, and en_2B; the datapath outputs data_1A, data_2A, and data_2B.]

Figure 8. Decoder block diagram

During the FPGA implementation, we use the EDA tools listed in Table 5.

Table 5. EDA Tools

Tool Name | Function Description
Text Editor | RTL Coding
ModelSim | Function and Timing Simulation
Quartus II | Altera FPGA Compiler for Synthesis, P&R, Timing Analysis, and Power Estimation

The details are as follows:
(a) RTL Coding: We use Verilog HDL to create all the design files.
(b) Function Simulation: We use ModelSim to verify the functionality of the codec design.
(c) Synthesis and P&R: We use Quartus II to map the Verilog HDL design to an Altera atom netlist and to perform timing analysis.
(d) Timing Simulation: We use ModelSim to verify the timing of the codec design.
(e) Power Estimation: We use Quartus II to estimate the internal and I/O power.

Figure 9. FPGA floorplan

Table 6. Pin name description

Pin name | I/O | Description
clk | I | System clock
nrst | I | Reset signal
wr_en | I | Write enable signal
wr_done | I | Write done signal
drive_no | I | Hard drive number for write data
data | I | Data to write to the hard drive
fail | I | Hard drive fail signal
fail_no | I | Failed hard drive number
C1 | O | Encoder checksum data 1
C2 | O | Encoder checksum data 2
data_1A | O | Decoder recovery data
data_2A | O | Decoder recovery data 1
data_2B | O | Decoder recovery data 2

Table 7 summarizes the chip characteristics and indicates whether the structure is power efficient.

Table 7. Chip characteristics

Device | EP1S10F484C5
Total logic elements | 1,511
Actual time | 108.66 MHz (period = 9.203 ns)
Simulation end time | 9.0 µs
Simulation netlist size | 1,576 nodes
Total number of transitions | 4,022
Total power | 114.48 mW

5.3 Simulation Waveform

(A) Encoder

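If $m(x) = \alpha + \alpha^4 x^4$, write D1: 0100 and D5: 1100 (each 4-bit value lists the coefficients of $\alpha^0$ to $\alpha^3$ from left to right, so 0100 is $\alpha$ and 1100 is $\alpha^4$). The results are C1: 0101 ($\alpha^9$) and C2: 1101 ($\alpha^7$), as shown in Figure 10.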

Figure 10. Encoder simulation waveform

(B) Decoder

If the 5th hard drive (Drive_NO: 4) fails, the decoder calculates its correct data (DATA_1A: 1100) from the data on all the other hard drives and C1, C2, as shown in Figure 11.

Figure 11. Decoder simulation waveform

6. Conclusion

In this paper, we proposed an XOR-only RS-RAID algorithm with two auxiliary tables, the CMC table and the reduced static-checksum table, which not only form the XOR-based RS algorithm but also speed it up. These features also allow advanced RAID systems using our scheme to be built with regular industrial RAID level 5 controllers, which perform XOR calculations very well. Therefore, a lower-cost controller can be used for our RAID 6 algorithm instead of the specially designed, usually expensive controllers needed by other RAID 6 algorithms. The proposed XOR-only RS-RAID algorithm optimizes the coding circuits and is suitable for RAID system applications where accuracy, power, speed, and area are crucial.

References

[1] Q. Xin, E. L. Miller, T. Schwarz, D. D. E. Long, S. A. Brandt, and W. Litwin, “Reliability mechanisms for very large storage systems,” in Proc. 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST 2003), 7-10 April 2003, pp. 146-156.

[2] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson, “RAID: High-Performance, Reliable Secondary Storage,” ACM Computing Surveys, June 1994, pp. 145-185.

[3] M. Blaum, J. Brady, J. Bruck, and J. Menon, “EVENODD: An efficient scheme for tolerating double disk failures in RAID architectures,” IEEE Transactions on Computers, Feb. 1995, pp. 192-202.

[4] I. S. Reed and X. Chen, Error-Control Coding for Data Networks, Kluwer Academic Publishers, 1999.

[5] L. Xu and J. Bruck, “X-code: MDS array codes with optimal encoding,” IEEE Transactions on Information Theory, Jan. 1999, pp. 272-276.

[6] Telemetry Channel Coding, Recommendation for Space Data Systems Standards, CCSDS 101.0-B-3, Blue Book, Issue 3, May 1992.

[7] J. S. Plank, “Correction to the 1997 Tutorial on Reed-Solomon Coding,” Technical Report UT-CS-03-504, University of Tennessee, April 2003.

[8] J. S. Plank, “A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems,” Software – Practice & Experience, 27(9):995-1012, September 1997.

[9] T. K. Truong, J. H. Jeng, and T. C. Cheng, “A new decoding algorithm for correcting both erasures and errors of Reed-Solomon codes,” IEEE Transactions on Communications, March 2003, pp. 381-388.

[10] D. V. Sarwate and N. R. Shanbhag, “High-speed architectures for Reed-Solomon decoders,” IEEE Transactions on VLSI Systems, 2001, pp. 641-655.

[11] L. Xu, “Highly Available Distributed Storage Systems,” Ph.D. Dissertation, 1999.
