A VERY LOW-COST MULTI-MODE REED SOLOMON DECODER BASED ON PETERSON-GORENSTEIN-ZIERLER ALGORITHM
Sheng-Feng Wang
Department of Electrical Engineering, National Taiwan University, Taipei 106, TAIWAN, R.O.C.
Huai-Yi Hsu and An-Yeu Wu
Institute of Electronics Engineering, National Taiwan University, Taipei 106, TAIWAN, R.O.C.
Abstract
Reed-Solomon (RS) codes play an important role in providing error protection and data integrity. Among the various RS decoding algorithms, the Peterson-Gorenstein-Zierler (PGZ) algorithm in general has the least computational complexity for small t values. However, unlike the iterative approaches (e.g., the Berlekamp-Massey algorithm), it encounters divide-by-zero problems when one circuit has to handle multiple t values. In this paper, we propose a multi-mode hardware architecture for error numbers ranging from zero to three. We first propose cost-down techniques to reduce the hardware complexity of a t=3 decoder. Then, we perform an algorithmic-level derivation to identify the configurable feature of our design. With these manipulations, we are able to perform multi-mode RS decoding in one unified VLSI architecture with a very simple control scheme. The very low cost and simple datapath make our design a good choice for small-footprint embedded VLSI systems such as Error Control Coding (ECC) in memory systems.
INTRODUCTION
Reed-Solomon (RS) codes are widely used for forward error correction in digital transmission and storage systems. They are a special case of BCH codes and have become a popular choice for providing data integrity because of their good correction capability for burst transmission errors [1][2][3].
Among the various RS decoding algorithms, the Peterson-Gorenstein-Zierler (PGZ) algorithm [4][5] provides the simplest way to realize an RS decoder for t ≤ 3. It is very cost-effective for systems that require only a small correcting capability, e.g., Error Control Coding (ECC) in processor-memory systems and digital answering machines. Unlike the iterative RS decoding methods (e.g., the Berlekamp-Massey algorithm [6][7]), the major drawback of the conventional PGZ algorithm is that a given circuit works for only a single correction capability. That is, the PGZ circuit built to solve t=3 cannot function correctly if the actual error number is 1 or 2. As a result, a t ≤ 3 PGZ decoder needs three copies of hardware components to compute t=1, t=2, and t=3, respectively. The whole circuit is shown in Figure 1(a).
Figure 1. (a) Three copies of PGZ decoders based on the conventional design approach. (b) The proposed multi-mode PGZ decoder for t=0,1,2,3.
Obviously, placing three copies of the decoder on one circuit wastes silicon area and cost. We seek a simple way to merge the three different decoders into one unified VLSI circuit. In this paper, we derive a configurable VLSI architecture that performs multi-mode RS decoding for various correction capabilities (i.e., different t values) based on the Peterson-Gorenstein-Zierler (PGZ) algorithm. We call it the multi-mode PGZ decoder, as illustrated in Figure 1(b). The reconfigurable feature of the proposed multi-mode PGZ decoder handles t=0,1,2,3 errors altogether, which leads to a significant saving in hardware cost.
The rest of this paper is organized as follows. In Sec. 2, we go through the details of the PGZ decoding algorithm. Then, we derive the reduced-complexity RS decoder for t=3. In Sec. 3 and Sec. 4, we present the multi-mode RS decoder. In Sec. 5, we discuss the hardware complexity to illustrate the hardware saving of our approach. Finally, we conclude our work in Sec. 6.
REVIEW OF PGZ ALGORITHM
Syndrome Calculation
Let the polynomial $c(x)$ denote the transmitted code word. Then the received code word, $r(x)$, can be represented as

$$r(x) = c(x) + e(x), \tag{1}$$

where $e(x)$ represents the error pattern. The syndrome values, denoted by $S_i$, are obtained by evaluating the received polynomial $r(x)$ at $\alpha^i$. That is,

$$S_i = r(\alpha^i) = \sum_{j} r_j (\alpha^i)^j, \tag{2}$$

where $r_j$ is the coefficient of $x^j$ in $r(x)$. We also define the syndrome polynomial as

$$S(x) = \sum_{i=0}^{2t-1} S_{i+1} x^i. \tag{3}$$
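As a rough illustration of Eq. (2), the sketch below evaluates a received word at $\alpha^1, \dots, \alpha^{2t}$ with Horner's rule. The paper does not fix the field or the code length, so the field GF(2^4), the primitive polynomial x^4 + x + 1, and the helper names `gf_mul`, `gf_pow`, and `syndromes` are assumptions made only for this example.

```python
M = 4                  # assumed field GF(2^4); not specified in the paper
PRIM_POLY = 0b10011    # assumed primitive polynomial x^4 + x + 1
ALPHA = 0b0010         # primitive element alpha = x

def gf_mul(a, b):
    """Multiply two GF(2^M) elements (shift-and-add, reduced by PRIM_POLY)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result

def gf_pow(a, e):
    """Raise a to the non-negative integer power e by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def syndromes(received, t):
    """Return [S_1, ..., S_2t] with S_i = r(alpha^i); received[j] is r_j."""
    out = []
    for i in range(1, 2 * t + 1):
        point = gf_pow(ALPHA, i)
        acc = 0
        for coeff in reversed(received):    # Horner's rule, highest degree first
            acc = gf_mul(acc, point) ^ coeff
        out.append(acc)
    return out
```

Because the all-zero word is a code word of any linear code, a received word with a single nonzero symbol can be used as a pure error pattern when trying the function out.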
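PGZ algorithm
The PGZ algorithm includes two main steps. Solving the Newton identity is the first step:

$$\begin{bmatrix} S_{t+1} & S_t & \cdots & S_2 \\ S_{t+2} & S_{t+1} & \cdots & S_3 \\ \vdots & \vdots & \ddots & \vdots \\ S_{2t} & S_{2t-1} & \cdots & S_{t+1} \end{bmatrix} \begin{bmatrix} \sigma_0 \\ \sigma_1 \\ \vdots \\ \sigma_{t-1} \end{bmatrix} = \begin{bmatrix} -S_1 \\ -S_2 \\ \vdots \\ -S_t \end{bmatrix}. \tag{4}$$

That is, the syndrome values are used to solve for the $\sigma$ values in Eq. (4). Define the error location polynomial as

$$\sigma(x) = \sigma_0 + \sigma_1 x + \cdots + \sigma_{t-1} x^{t-1} + x^t. \tag{5}$$

Then, we can solve the key equation

$$\sigma(x) S(x) = -\omega(x) + \mu \cdot x^{2t}, \tag{6}$$

where the error value polynomial is defined as

$$\omega(x) = \omega_0 + \omega_1 x + \cdots + \omega_{t-1} x^{t-1}. \tag{7}$$

PGZ Algorithm for t=1
Given $t=1$, from Eq. (4), we have

$$\sigma_0 S_2 = -S_1 \;\Rightarrow\; \sigma_0 = \frac{S_1}{S_2}, \tag{8}$$

where the sign can be dropped because addition and subtraction coincide in the characteristic-2 finite field. Then we can compute the error location polynomial as

$$\sigma(x) = \sigma_0 + x. \tag{9}$$

Next, we can solve the key equation for $t=1$:

$$\sigma(x) S(x) = -\omega(x) + \mu \cdot x^2, \tag{10}$$

$$\omega(x) = -(\sigma_0 + x)(S_1 + S_2 x) \bmod x^2, \tag{11}$$

where the error value polynomial is

$$\omega(x) = \omega_0, \quad \text{and} \quad \omega_0 = \sigma_0 S_1. \tag{12}$$

PGZ Algorithm for t=2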
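For $t=2$, Eq. (4) is reduced to

$$\begin{bmatrix} S_3 & S_2 \\ S_4 & S_3 \end{bmatrix} \begin{bmatrix} \sigma_0 \\ \sigma_1 \end{bmatrix} = \begin{bmatrix} -S_1 \\ -S_2 \end{bmatrix}. \tag{13}$$

Then, we have

$$\sigma_0 = \frac{S_1 S_3 + S_2 S_2}{S_2 S_4 + S_3 S_3}, \qquad \sigma_1 = \frac{S_2 S_3 + S_1 S_4}{S_2 S_4 + S_3 S_3}. \tag{14}$$

Then the error location polynomial can be written as

$$\sigma(x) = \sigma_0 + \sigma_1 x + x^2. \tag{15}$$

Solving the key equation for $t=2$ yields

$$\sigma(x) S(x) = -\omega(x) + \mu \cdot x^4, \tag{16}$$

$$\omega(x) = -(\sigma_0 + \sigma_1 x + x^2)(S_1 + S_2 x + S_3 x^2 + S_4 x^3) \bmod x^4. \tag{17}$$

Then, the error value polynomial can be represented as

$$\omega(x) = \omega_0 + \omega_1 x, \tag{18}$$

$$\omega_0 = \sigma_0 S_1, \qquad \omega_1 = \sigma_0 S_2 + \sigma_1 S_1. \tag{19}$$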
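The t=1 and t=2 closed forms can be sketched in software as follows. This is only an illustration under stated assumptions: the field GF(2^4) with primitive polynomial x^4 + x + 1 is assumed (the paper does not specify one), the helper names are hypothetical, and `error_syndromes` builds syndromes directly from a known error pattern on top of the all-zero code word. The formulas used are $\sigma_0 = S_1/S_2$ for t=1, and $\sigma_0 = (S_1S_3 + S_2S_2)/(S_2S_4 + S_3S_3)$, $\sigma_1 = (S_2S_3 + S_1S_4)/(S_2S_4 + S_3S_3)$ for t=2.

```python
M, PRIM_POLY, ALPHA = 4, 0b10011, 0b0010   # assumed GF(2^4), x^4 + x + 1, alpha = x
N = 2**M - 1                               # multiplicative order of alpha

def gf_mul(a, b):
    """Multiply two GF(2^M) elements."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result

def gf_pow(a, e):
    """a**e for nonzero a; the exponent is reduced modulo N, so e may be negative."""
    result = 1
    for _ in range(e % N):
        result = gf_mul(result, a)
    return result

def gf_div(a, b):
    """a / b in GF(2^M); raises on b == 0, which is the PGZ failure case."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^M)")
    return gf_mul(a, gf_pow(b, N - 1))     # b^(N-1) is the inverse of b

def error_syndromes(errors, count):
    """S_i = sum_k Y_k * X_k^i for i = 1..count, with errors = {location: value}."""
    out = []
    for i in range(1, count + 1):
        s = 0
        for loc, val in errors.items():
            s ^= gf_mul(val, gf_pow(ALPHA, loc * i))
        out.append(s)
    return out

def pgz_t1(S1, S2):
    """Return (sigma0, omega0) for a single error."""
    sigma0 = gf_div(S1, S2)
    return sigma0, gf_mul(sigma0, S1)

def pgz_t2(S1, S2, S3, S4):
    """Return (sigma0, sigma1, omega0, omega1) for exactly two errors."""
    den = gf_mul(S2, S4) ^ gf_mul(S3, S3)
    sigma0 = gf_div(gf_mul(S1, S3) ^ gf_mul(S2, S2), den)
    sigma1 = gf_div(gf_mul(S2, S3) ^ gf_mul(S1, S4), den)
    omega0 = gf_mul(sigma0, S1)
    omega1 = gf_mul(sigma0, S2) ^ gf_mul(sigma1, S1)
    return sigma0, sigma1, omega0, omega1
```

Feeding single-error syndromes to `pgz_t2` makes the denominator vanish and raises `ZeroDivisionError`, which is the software analogue of the divide-by-zero problem that motivates the multi-mode design.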
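PGZ Algorithm for t=3
Similarly, for $t=3$, we have

$$\begin{bmatrix} S_4 & S_3 & S_2 \\ S_5 & S_4 & S_3 \\ S_6 & S_5 & S_4 \end{bmatrix} \begin{bmatrix} \sigma_0 \\ \sigma_1 \\ \sigma_2 \end{bmatrix} = \begin{bmatrix} -S_1 \\ -S_2 \\ -S_3 \end{bmatrix}. \tag{20}$$

Then, writing the common denominator as $\Delta = S_4S_4S_4 + S_3S_4S_5 + S_3S_4S_5 + S_3S_3S_6 + S_2S_5S_5 + S_2S_4S_6$, we have

$$\begin{aligned}
\sigma_0 &= \frac{S_1S_4S_4 + S_1S_3S_5 + S_2S_3S_4 + S_3S_3S_3 + S_2S_2S_5 + S_2S_3S_4}{\Delta}, \\
\sigma_1 &= \frac{S_2S_4S_4 + S_3S_3S_4 + S_1S_4S_5 + S_1S_3S_6 + S_2S_3S_5 + S_2S_2S_6}{\Delta}, \\
\sigma_2 &= \frac{S_3S_4S_4 + S_2S_4S_5 + S_3S_3S_5 + S_2S_3S_6 + S_1S_5S_5 + S_1S_4S_6}{\Delta}.
\end{aligned} \tag{21}$$

Then, the error location polynomial can be written as

$$\sigma(x) = \sigma_0 + \sigma_1 x + \sigma_2 x^2 + x^3. \tag{22}$$

The key equation for $t=3$ can be written as

$$\sigma(x) S(x) = -\omega(x) + \mu \cdot x^6 \tag{23}$$

and

$$\omega(x) = -(\sigma_0 + \sigma_1 x + \sigma_2 x^2 + x^3)(S_1 + S_2 x + S_3 x^2 + S_4 x^3 + S_5 x^4 + S_6 x^5) \bmod x^6, \tag{24}$$

where the error value polynomial is

$$\omega(x) = \omega_0 + \omega_1 x + \omega_2 x^2 \tag{25}$$

with

$$\omega_0 = \sigma_0 S_1, \qquad \omega_1 = \sigma_0 S_2 + \sigma_1 S_1, \qquad \omega_2 = \sigma_0 S_3 + \sigma_1 S_2 + \sigma_2 S_1. \tag{26}$$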
Obviously, Eq. (21) is much more complicated than Eqs. (8) and (14), and its direct implementation would be tedious and costly. Hence, in what follows, we provide a method to calculate $\sigma_0$, $\sigma_1$, and $\sigma_2$ in a cost-efficient way.
The Reduced-Complexity Decoder Architecture for t=3
According to our observation, the denominator in Eq. (21) has two $S_3S_4S_5$ terms, which cancel each other under finite-field addition. The same holds for the numerator of $\sigma_0$, which contains two $S_2S_3S_4$ terms. We also observe that the term $S_2S_5$ appears quite often in Eq. (21); e.g., it is the common factor of $S_2S_2S_5$, $S_2S_3S_5$, $S_2S_4S_5$, and $S_2S_5S_5$. Thus, if we calculate $S_2S_5$ first, the overall computation complexity can be reduced significantly. Similarly, we can identify other common two-syndrome products, such as $S_3S_3$, and calculate them first, which leads to the cost-efficient architecture illustrated in Figure 2. When $\sigma_0$, $\sigma_1$, and $\sigma_2$ are available, $\omega_0$, $\omega_1$, and $\omega_2$ can be obtained from Eq. (26), as illustrated in Figure 3.
Figure 2. The block diagram of the t=3 PGZ architecture (σ part).
Figure 3. The block diagram of the t=3 PGZ architecture (ω part).
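The term-sharing idea can be sketched in software as follows. The sketch solves the t=3 system with rows $(S_4, S_3, S_2)$, $(S_5, S_4, S_3)$, $(S_6, S_5, S_4)$ and right-hand side $(S_1, S_2, S_3)$ by Cramer's rule, with the duplicated $S_3S_4S_5$ and $S_2S_3S_4$ terms already cancelled. It assumes GF(2^4) with primitive polynomial x^4 + x + 1; the choice of which two-syndrome products to precompute is also an assumption, so the code is not claimed to reproduce the multiplier count of Figure 2.

```python
M, PRIM_POLY, ALPHA = 4, 0b10011, 0b0010   # assumed GF(2^4), x^4 + x + 1, alpha = x
N = 2**M - 1

def gf_mul(a, b):
    """Multiply two GF(2^M) elements."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result

def gf_pow(a, e):
    """a**e for nonzero a; the exponent is reduced modulo N, so e may be negative."""
    result = 1
    for _ in range(e % N):
        result = gf_mul(result, a)
    return result

def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^M)")
    return gf_mul(a, gf_pow(b, N - 1))

def error_syndromes(errors, count):
    """S_i = sum_k Y_k * X_k^i for i = 1..count, with errors = {location: value}."""
    out = []
    for i in range(1, count + 1):
        s = 0
        for loc, val in errors.items():
            s ^= gf_mul(val, gf_pow(ALPHA, loc * i))
        out.append(s)
    return out

def pgz_t3_sigma(S):
    """Return (sigma0, sigma1, sigma2) for exactly three errors, S = [S1..S6]."""
    S1, S2, S3, S4, S5, S6 = S
    # Two-syndrome products computed once and reused in several terms.
    s13, s14 = gf_mul(S1, S3), gf_mul(S1, S4)
    s25, s26 = gf_mul(S2, S5), gf_mul(S2, S6)
    s33, s44 = gf_mul(S3, S3), gf_mul(S4, S4)
    den = gf_mul(s44, S4) ^ gf_mul(s33, S6) ^ gf_mul(s25, S5) ^ gf_mul(s26, S4)
    n0 = gf_mul(s14, S4) ^ gf_mul(s13, S5) ^ gf_mul(s33, S3) ^ gf_mul(s25, S2)
    n1 = (gf_mul(s44, S2) ^ gf_mul(s33, S4) ^ gf_mul(s14, S5)
          ^ gf_mul(s13, S6) ^ gf_mul(s25, S3) ^ gf_mul(s26, S2))
    n2 = (gf_mul(s44, S3) ^ gf_mul(s25, S4) ^ gf_mul(s33, S5)
          ^ gf_mul(s26, S3) ^ gf_mul(gf_mul(S1, S5), S5) ^ gf_mul(s14, S6))
    return gf_div(n0, den), gf_div(n1, den), gf_div(n2, den)
```

With syndromes from exactly three errors, the returned coefficients make $\sigma_0 + \sigma_1 x + \sigma_2 x^2 + x^3$ vanish at each $x = \alpha^{-l}$; with fewer errors the denominator is zero and the function raises.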
MULTI-MODE PGZ ALGORITHM AND ARCHITECTURE
Problems of t=3 PGZ Architecture when t=1 or 2
The block diagram introduced in Sec. 2.D functions correctly only when the received code word has exactly three errors. If the error number is less than three, a divide-by-zero problem occurs. Specifically, for $t=3$, we have to solve

$$\begin{bmatrix} S_4 & S_3 & S_2 \\ S_5 & S_4 & S_3 \\ S_6 & S_5 & S_4 \end{bmatrix} \begin{bmatrix} \sigma_0 \\ \sigma_1 \\ \sigma_2 \end{bmatrix} = \begin{bmatrix} -S_1 \\ -S_2 \\ -S_3 \end{bmatrix}. \tag{27}$$

If the error number is less than 3, the three columns of the 3-by-3 matrix become linearly dependent, that is,

$$\begin{bmatrix} S_4 \\ S_5 \\ S_6 \end{bmatrix} = \alpha \begin{bmatrix} S_3 \\ S_4 \\ S_5 \end{bmatrix} = \beta \begin{bmatrix} S_2 \\ S_3 \\ S_4 \end{bmatrix}, \quad \text{where } \alpha \text{ and } \beta \text{ are constants.} \tag{28}$$

Consequently, the denominator and the three numerators of Eq. (21) are all equal to zero:

$$\begin{aligned}
S_2S_4S_6 + S_4S_4S_4 + S_3S_3S_6 + S_2S_5S_5 &= 0, \\
S_1S_3S_5 + S_1S_4S_4 + S_3S_3S_3 + S_2S_2S_5 &= 0, \\
S_2S_4S_4 + S_3S_3S_4 + S_1S_4S_5 + S_1S_3S_6 + S_2S_3S_5 + S_2S_2S_6 &= 0, \\
S_3S_4S_4 + S_2S_4S_5 + S_3S_3S_5 + S_2S_3S_6 + S_1S_5S_5 + S_1S_4S_6 &= 0.
\end{aligned} \tag{29}$$

Similarly, the denominator and the two numerators of $\sigma$ in Eq. (14) also become zero when the error number is less than 2:

$$\begin{aligned}
S_2S_4 + S_3S_3 &= 0, \\
S_2S_3 + S_1S_4 &= 0, \\
S_1S_3 + S_2S_2 &= 0.
\end{aligned} \tag{30}$$

The $\sigma$ values then take the indeterminate form 0/0 and cannot be computed. Hence, the t=3 architecture above cannot guarantee the right result when t=1 or 2. To overcome this situation, three copies of hardware (Figure 1(a)) are needed, together with a specific state machine to check the error status.
The Proposed Multi-mode Decoding Algorithm
In fact, the zero values contain information that facilitates our derivation. That is, by examining one of the four terms in Eq. (29) and one of the three terms in Eq. (30), the error number can be decided. For instance, $(S_2S_4S_6 + S_4S_4S_4 + S_3S_3S_6 + S_2S_5S_5)$ equals zero when t=0,1,2; $(S_2S_4 + S_3S_3)$ equals zero when t=0,1; and $S_2$ equals zero when t=0. Consequently, we employ these three terms to detect the error number t. Figure 4 shows the flowchart to detect the error number.
Figure 4. The flowchart to detect the error number in the proposed RS decoder (decision: how many terms are equal to zero; result: the error number).
By examining Eq. (14) carefully, we find that $S_1S_3$, $S_2S_2$, $S_2S_4$, $S_3S_3$, $S_2S_3$, $S_1S_4$, and $S_1S_3+S_2S_2$, $S_2S_4+S_3S_3$, $S_2S_3+S_1S_4$ are generated when calculating $\sigma$ for t=2. Our approach is to compute $\sigma$ for t=3 using these terms as a basis. Meanwhile, as mentioned in Sec. 2.D, the two $S_3S_4S_5$ terms and the two $S_2S_3S_4$ terms can be dropped, which helps considerably in reducing the overall complexity. Although some additional hardware is required, the multi-mode PGZ decoder generates the terms needed to calculate $\sigma$ for t=1,2,3 at the same time. Provided that we know the error number, the correct terms for calculating the $\sigma$ values can be chosen. Multiplexers in the multi-mode decoder perform this selection. Figure 5 and Figure 6 show the block diagrams of the proposed multi-mode PGZ decoder architecture. The controller follows the flowchart in Figure 4.
Figure 5. The block diagram of the multi-mode PGZ architecture (σ part).
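A software sketch of the detection and selection steps is given below. It tests the three terms $S_2S_4S_6 + S_4S_4S_4 + S_3S_3S_6 + S_2S_5S_5$, $S_2S_4 + S_3S_3$, and $S_2$ in that order and then picks the matching closed form for $\sigma$. Checking the terms in priority order is an assumption of this sketch; it gives the same answer as counting zeros whenever the zeros are nested as described in the text. The field GF(2^4) with x^4 + x + 1 and all function names are likewise assumptions. The returned tuple holds the coefficients of $1, x, x^2, x^3$ of one cubic, so that a single root-search circuit could serve every mode.

```python
M, PRIM_POLY, ALPHA = 4, 0b10011, 0b0010   # assumed GF(2^4), x^4 + x + 1, alpha = x
N = 2**M - 1

def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result

def gf_pow(a, e):
    result = 1
    for _ in range(e % N):                  # exponent reduced modulo N
        result = gf_mul(result, a)
    return result

def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^M)")
    return gf_mul(a, gf_pow(b, N - 1))

def mul3(a, b, c):
    return gf_mul(gf_mul(a, b), c)

def error_syndromes(errors, count):
    """S_i = sum_k Y_k * X_k^i for i = 1..count, with errors = {location: value}."""
    out = []
    for i in range(1, count + 1):
        s = 0
        for loc, val in errors.items():
            s ^= gf_mul(val, gf_pow(ALPHA, loc * i))
        out.append(s)
    return out

def detection_terms(S):
    """Return the t=3 denominator, the t=2 denominator, and S2."""
    S1, S2, S3, S4, S5, S6 = S
    d3 = mul3(S2, S4, S6) ^ mul3(S4, S4, S4) ^ mul3(S3, S3, S6) ^ mul3(S2, S5, S5)
    d2 = gf_mul(S2, S4) ^ gf_mul(S3, S3)
    return d3, d2, S2

def detect_error_number(S):
    """Priority check: the first nonzero term decides the error number."""
    d3, d2, d1 = detection_terms(S)
    if d3:
        return 3
    if d2:
        return 2
    if d1:
        return 1
    return 0

def multi_mode_sigma(S):
    """Return (t, (c0, c1, c2, c3)), the unified cubic coefficients (None if t=0)."""
    S1, S2, S3, S4, S5, S6 = S
    t = detect_error_number(S)
    if t == 0:
        return 0, None
    if t == 1:
        return 1, (gf_div(S1, S2), 1, 0, 0)
    if t == 2:
        d2 = gf_mul(S2, S4) ^ gf_mul(S3, S3)
        s0 = gf_div(gf_mul(S1, S3) ^ gf_mul(S2, S2), d2)
        s1 = gf_div(gf_mul(S2, S3) ^ gf_mul(S1, S4), d2)
        return 2, (s0, s1, 1, 0)
    d3 = detection_terms(S)[0]
    n0 = mul3(S1, S4, S4) ^ mul3(S1, S3, S5) ^ mul3(S3, S3, S3) ^ mul3(S2, S2, S5)
    n1 = (mul3(S2, S4, S4) ^ mul3(S3, S3, S4) ^ mul3(S1, S4, S5)
          ^ mul3(S1, S3, S6) ^ mul3(S2, S3, S5) ^ mul3(S2, S2, S6))
    n2 = (mul3(S3, S4, S4) ^ mul3(S2, S4, S5) ^ mul3(S3, S3, S5)
          ^ mul3(S2, S3, S6) ^ mul3(S1, S5, S5) ^ mul3(S1, S4, S6))
    return 3, (gf_div(n0, d3), gf_div(n1, d3), gf_div(n2, d3), 1)
```

In hardware the three candidate results would be produced in parallel and a multiplexer would pick one; the `if` chain here plays the role of that multiplexer and of the controller.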
MULTI-MODE CHIEN'S SEARCH & FORNEY'S METHOD
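After obtaining all $\sigma$ and $\omega$ values, the error location polynomial of Eq. (5) and the error value polynomial of Eq. (7) can be formed. According to Chien's search, an error location $l$ satisfies the equation below:

$$\sigma(\alpha^{-l}) = \sigma_0 + \sigma_1 x + \sigma_2 x^2 + \cdots + \sigma_t x^t = 0, \quad \sigma_t = 1, \quad 0 \le l \le 2^m - 1, \tag{31}$$

where $x = \alpha^{-l}$ and $l$ denotes the error location. The error location polynomial reduces to Eqs. (32), (33), and (34) for t=1, t=2, and t=3, respectively:

$$\sigma(x) = \sigma_0 + 1 \cdot x + 0 \cdot x^2 + 0 \cdot x^3, \quad t=1, \tag{32}$$

$$\sigma(x) = \sigma_0 + \sigma_1 x + 1 \cdot x^2 + 0 \cdot x^3, \quad t=2, \tag{33}$$

$$\sigma(x) = \sigma_0 + \sigma_1 x + \sigma_2 x^2 + 1 \cdot x^3, \quad t=3. \tag{34}$$

Suppose that we build a circuit to solve the equation for the t=3 case. By deliberately setting $\sigma_2$ to 1 for the t=2 case, and $\sigma_2$ to 0 and $\sigma_1$ to 1 for the t=1 case (with the $x^3$ coefficient set to 0 in both cases), the roots of Eqs. (32), (33), and (34) can be searched with the same circuit, no matter what t is. The multiplexers of the multi-mode decoder select the appropriate $\sigma$ values.
Meanwhile, Forney's method is applied to find the error value $E_l$, which corresponds to the error location $l$. Then we have

$$E_l = \left. \frac{\omega(x)}{\sigma'(x)} \right|_{x = \alpha^{-l}}, \tag{35}$$

where $\sigma'(x)$ is the formal derivative of $\sigma(x)$. In a characteristic-2 field only the odd-power terms of $\sigma(x)$ contribute to it, so that for the cubic of Eq. (34)

$$\sigma'(x) = \sigma_1 + x^2. \tag{36}$$

Setting t=1, 2, or 3, the error value equations for the three special cases can be expressed as

$$E_l = \frac{\omega_0 + 0 \cdot x + 0 \cdot x^2}{1 + 0 \cdot x^2}, \quad t=1, \tag{37}$$

$$E_l = \frac{\omega_0 + \omega_1 x + 0 \cdot x^2}{\sigma_1 + 0 \cdot x^2}, \quad t=2, \tag{38}$$

$$E_l = \frac{\omega_0 + \omega_1 x + \omega_2 x^2}{\sigma_1 + 1 \cdot x^2}, \quad t=3. \tag{39}$$

The equation for the t=3 case is obviously the most complicated. Therefore, once an architecture can resolve it, the equations for t=2 and t=1 can be calculated simply by changing coefficients. The controller controls the multiplexers to select the proper $\omega$ values in Figure 6.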
Figure 7 shows the implementation of Chien’s search & Forney’s method. The offset is the corrupted data, to which the corresponding error value $E_l$ must be added to produce the corrected data.
Figure 7. The block diagram of the proposed Chien’s search & Forney’s method.
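The unified search and evaluation can be sketched in software as follows. One cubic is evaluated at every $x = \alpha^{-l}$, and at each root the error value is computed as $(\omega_0 + \omega_1 x + \omega_2 x^2)/(c_1 + c_3 x^2)$, where $c_1$ and $c_3$ are the $x$ and $x^3$ coefficients of the unified cubic; this denominator equals 1, $\sigma_1$, and $\sigma_1 + x^2$ for t=1, 2, and 3. The field GF(2^4) with x^4 + x + 1, the code length n = 15, and the names `chien_forney` and `unified_inputs` are assumptions for illustration; `unified_inputs` only builds test vectors from a known error pattern and is not part of the decoder.

```python
M, PRIM_POLY, ALPHA = 4, 0b10011, 0b0010   # assumed GF(2^4), x^4 + x + 1, alpha = x
N = 2**M - 1                               # assumed code length n = 15

def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM_POLY
    return result

def gf_pow(a, e):
    result = 1
    for _ in range(e % N):                  # exponent reduced modulo N
        result = gf_mul(result, a)
    return result

def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^M)")
    return gf_mul(a, gf_pow(b, N - 1))

def chien_forney(sigma, omega):
    """sigma = (c0, c1, c2, c3), omega = (w0, w1, w2). Returns {location: value}."""
    c0, c1, c2, c3 = sigma
    w0, w1, w2 = omega
    found = {}
    for loc in range(N):
        x = gf_pow(ALPHA, -loc)             # candidate root alpha^(-l)
        x2 = gf_mul(x, x)
        x3 = gf_mul(x2, x)
        if c0 ^ gf_mul(c1, x) ^ gf_mul(c2, x2) ^ gf_mul(c3, x3) == 0:
            numerator = w0 ^ gf_mul(w1, x) ^ gf_mul(w2, x2)
            denominator = c1 ^ gf_mul(c3, x2)   # formal derivative of the cubic
            found[loc] = gf_div(numerator, denominator)
    return found

def unified_inputs(errors):
    """Test-vector builder: unified sigma and omega for known errors (at most 3)."""
    poly = [1]                              # sigma(x), lowest degree first
    for loc in errors:
        root = gf_pow(ALPHA, -loc)          # multiply poly by (x + root)
        poly = [(gf_mul(root, poly[i]) if i < len(poly) else 0)
                ^ (poly[i - 1] if i > 0 else 0)
                for i in range(len(poly) + 1)]
    S = [0, 0, 0]                           # S_1, S_2, S_3
    for loc, val in errors.items():
        for i in range(3):
            S[i] ^= gf_mul(val, gf_pow(ALPHA, loc * (i + 1)))
    omega = [0, 0, 0]
    for k in range(len(errors)):            # omega_k = sum_{i<=k} sigma_i * S_{k-i+1}
        for i in range(k + 1):
            omega[k] ^= gf_mul(poly[i], S[k - i])
    return tuple(poly + [0] * (4 - len(poly))), tuple(omega)
```

For example, `chien_forney(*unified_inputs({2: 3, 7: 9}))` recovers the two locations together with their error values, which would then be added to the corrupted symbols to correct them.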
COMPARISON OF COMPLEXITY
The main drawback of the PGZ algorithm is that its hardware complexity rises rapidly when t is larger than three. Direct implementation of the PGZ algorithm for t=3 without any cost-down techniques requires 40 finite-field multipliers (FFMs) and 16 finite-field adders (FFAs). By exploiting the special properties of the finite-field operations described in Sec. 2.D, we derived a reduced-complexity PGZ decoder for t=3. It requires only 21 FFMs and 11 FFAs, which cuts the multiplier count by roughly half (47.5%) and the adder count by about 31%. Furthermore, the design techniques of the reduced-complexity t=3 design are applied to our multi-mode PGZ decoder for any t ≤ 3. It needs only 24 FFMs and 12 FFAs. The comparison of hardware complexity is shown in Table 1. As we can see, compared with the reduced-complexity design, only three additional FFMs and one additional FFA are required in our multi-mode PGZ architecture. That is, our multi-mode PGZ architecture can solve for t=0,1,2,3 errors in one unified VLSI architecture with very small hardware overhead.

Table 1. Comparison of hardware complexity.
Architecture type                                       Number of FFM   Number of FFA
Direct implementation of PGZ algorithm for t = 3        40              16
The derived reduced-complexity PGZ for t = 3            21              11
The proposed multi-mode PGZ for t = 0, 1, 2, 3          24              12

CONCLUSIONS
In this paper, we presented the algorithm derivation and VLSI architecture of a multi-mode PGZ-based RS decoder. It can compute the correct error locations and error values for any t less than four with very small complexity. Thanks to the configurable architecture, RS decoding for different values of t can be performed without redesigning the hardware.
References
[1] S. B. Wicker, Error Control Systems for Digital Communication and Storage. Prentice Hall, 1995.
[2] Wicker and Bhargava, Reed-Solomon Codes and Applications, IEEE Press, 1994.
[3] S. Whitaker, J. Canaris, and K. Cameron, "Reed-Solomon VLSI codec for advanced television," IEEE Trans. Circuits Syst. Video Technol., vol. 1, pp. 230-236, June 1991.
[4] Meera Srinivasan and Dilip V. Sarwate, "Malfunction in the Peterson-Gorenstein-Zierler Decoder," IEEE Trans. on Information Theory, vol. 40, no. 5, September 1994.
[5] Son Le-Ngoc and Z. Young, "An approach to double error correcting Reed-Solomon decoding without Chien search," Proceedings of the 36th Midwest Symposium, vol. 1, pp. 534-537, 1993.
[6] Kuang Yung Liu, "Architecture for VLSI Design of Reed-Solomon Decoders," IEEE Trans. on Computers, vol. C-33, no. 2, Feb. 1984.
[7] L. Song and K. K. Parhi, "Low-Energy Software Reed-Solomon Codecs Using Specialized Finite Field Datapath and Division-Free Berlekamp-Massey Algorithm," IEEE Symp. on Circuits and Systems, June 1999.