A very low-cost multi-mode Reed Solomon decoder based on Peterson-Gorenstein-Zierler algorithm


Sheng-Feng Wang

Department of Electrical Engineering, National Taiwan University

Taipei 106, TAIWAN, R.O.C.

Huai-Yi Hsu and An-Yeu Wu Institute of Electronics Engineering, National Taiwan University Taipei 106, TAIWAN, R.O.C.

Abstract Reed-Solomon (RS) codes play an important role in providing error protection and data integrity. Among the various RS decoding algorithms, the Peterson-Gorenstein-Zierler (PGZ) algorithm in general has the least computational complexity for small t values. However, unlike iterative approaches (e.g., the Berlekamp-Massey algorithm), it encounters divide-by-zero problems when a single circuit must handle multiple t values. In this paper, we propose a multi-mode hardware architecture for error numbers ranging from zero to three. We first propose a cost-down technique to reduce the hardware complexity of a t=3 decoder. Then, we perform an algorithmic-level derivation to identify the configurable feature of our design. With these manipulations, we are able to perform multi-mode RS decoding in one unified VLSI architecture with a very simple control scheme. The very low cost and simple datapath make our design a good choice for small-footprint embedded VLSI systems, such as Error Control Coding (ECC) in memory systems.

INTRODUCTION

Reed-Solomon (RS) codes are widely used for forward error correction in digital transmission and storage systems. They are a special case of BCH codes and have become a popular choice for providing data integrity because of their good error-correction capability for burst transmission errors [1][2][3].

Among the various RS decoding algorithms, the Peterson-Gorenstein-Zierler (PGZ) algorithm [4][5] provides the simplest way to realize an RS decoder for t ≤ 3. It is very cost-effective for systems that require only a small correcting capability, e.g., Error Control Coding (ECC) in processor-memory systems and digital answering machines. Unlike the iterative RS decoding methods (e.g., the Berlekamp-Massey algorithm [6][7]), the major drawback of the conventional PGZ algorithm is that it works for only a single correction capability. That is, a PGZ circuit built to solve t=3 cannot function correctly if t is 1 or 2. As a result, a t ≤ 3 PGZ decoder needs three copies of hardware components to handle t=1, t=2, and t=3, respectively. The whole circuit is shown in Figure 1(a).

Figure 1. (a) Three copies of PGZ decoders based on the conventional design approach. (b) The proposed multi-mode PGZ decoder for t=0,1,2,3.

Obviously, placing three copies of the decoder on a chip wastes silicon area and cost. We seek a simple way to merge the three decoders into one unified VLSI circuit. In this paper, we derive a configurable VLSI architecture that performs multi-mode RS decoding for various correction capabilities (i.e., different t values) based on the PGZ algorithm. We call it the multi-mode PGZ decoder, as illustrated in Figure 1(b). The reconfigurable feature of the proposed multi-mode PGZ decoder handles t=0,1,2,3 errors altogether, which leads to a significant saving in hardware cost.

The rest of this paper is organized as follows. In Sec. 2, we go through the details of the PGZ decoding algorithm and then derive the reduced-complexity RS decoder for t=3. In Sec. 3 and Sec. 4, we present the multi-mode RS decoder. In Sec. 5, we discuss the hardware complexity to illustrate the hardware saving of our approach. Finally, we conclude our work in Sec. 6.

REVIEW OF PGZ ALGORITHM

Syndrome Calculation

Let the polynomial c(x) denote the transmitted code word. Then the received code word, r(x), can be represented as

$$r(x) = c(x) + e(x), \tag{1}$$

where e(x) represents the error pattern. The syndrome values, denoted by $S_i$, are obtained by evaluating the received polynomial r(x) at $\alpha^i$. That is,

$$S_i = r(\alpha^i) = \sum_{j} r_j (\alpha^i)^j. \tag{2}$$

We also define the syndrome polynomial as

$$S(x) = \sum_{i=0}^{2t-1} S_{i+1} x^i. \tag{3}$$
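The syndrome calculation above can be sketched in software as follows. This is a minimal illustration, not the paper's hardware: the field GF(2^4) with primitive polynomial x^4 + x + 1, the code length 15, and the names `mul`, `poly_eval`, and `syndromes` are all assumptions made for the example.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative assumption).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = v   # EXP[i] = alpha^i, stored twice to skip a modulo
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= 0x13

def mul(a, b):
    """Finite-field multiplication; finite-field addition is XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(coeffs, x):
    """Evaluate r_0 + r_1*x + r_2*x^2 + ... with Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = mul(acc, x) ^ c
    return acc

def syndromes(received, t):
    """Return [S_1, ..., S_2t] with S_i = r(alpha^i), as in Eq. (2)."""
    return [poly_eval(received, EXP[i % 15]) for i in range(1, 2 * t + 1)]

# All-zero code word hit by one error of value 7 at location 4.
received = [0] * 15
received[4] = 7
print(syndromes(received, t=3))
```

For a single error of value Y at location l, each syndrome reduces to S_i = Y·α^(il), which is the property the later sections rely on.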

PGZ Algorithm

The PGZ algorithm includes two main steps. The first step is to solve the Newton identities of Eq. (4), in which the syndrome values are used to solve for the σ values. Define the error location polynomial as

$$\sigma(x) = \sigma_0 + \sigma_1 x + \cdots + \sigma_{t-1} x^{t-1} + x^t. \tag{5}$$

Then, we can solve the key equation

$$\sigma(x) S(x) = -\omega(x) + \mu \cdot x^{2t}, \tag{6}$$

where the error value polynomial is defined as

$$\omega(x) = \omega_0 + \omega_1 x + \cdots + \omega_{t-1} x^{t-1}. \tag{7}$$

PGZ Algorithm for t=1

Given t=1, Eq. (4) yields the value of $\sigma_0$ in Eq. (8). Then we can compute the error location polynomial as

$$\sigma(x) = \sigma_0 + x.$$

Next, we can solve the key equation for t=1,

$$\sigma(x) S(x) = -\omega(x) + \mu \cdot x^{2},$$

$$\omega(x) = -(\sigma_0 + x)(S_1 + S_2 x) \bmod x^{2},$$

where the error value polynomial is

$$\omega(x) = \omega_0,$$

and $\omega_0 = \sigma_0 S_1$.
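The single-error case can be sketched as below. The closed form σ0 = S1/S2 is an assumption, obtained by requiring the x term of (σ0 + x)(S1 + S2·x) to vanish so that ω(x) is a constant; it agrees with the later remark that S2 is the term tested for t=1. The field GF(2^4) and the names `mul`, `div`, and `pgz_t1` are illustrative.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative assumption).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = v   # EXP[i] = alpha^i
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= 0x13

def mul(a, b):
    """Finite-field multiplication; finite-field addition is XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def div(a, b):
    """Finite-field division; raises if the divisor is zero."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(16)")
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 15]

def pgz_t1(S1, S2):
    """Assumed t=1 solution: sigma_0 = S1/S2, then omega_0 = sigma_0 * S1."""
    sigma0 = div(S1, S2)
    omega0 = mul(sigma0, S1)
    return sigma0, omega0

# One error of value 9 at location 6, so S_i = 9 * alpha^(6i).
S1, S2 = mul(9, EXP[6]), mul(9, EXP[12])
sigma0, omega0 = pgz_t1(S1, S2)
print(sigma0 == EXP[15 - 6], omega0 == 9)
```

In this sketch σ0 comes out as α^(-6), the root that the Chien search later looks for, and ω0 equals the error value itself.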

PGZ Algorithm for t=2

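For t=2, Eq. (4) is reduced to a 2-by-2 linear system in $\sigma_0$ and $\sigma_1$, whose solution is given by Eq. (14). Then the error location polynomial can be written as

$$\sigma(x) = \sigma_0 + \sigma_1 x + x^2.$$

Solving the key equation for t=2 yields

$$\sigma(x) S(x) = -\omega(x) + \mu \cdot x^{4},$$

$$\omega(x) = -(\sigma_0 + \sigma_1 x + x^2)(S_1 + S_2 x + S_3 x^2 + S_4 x^3) \bmod x^{4}.$$

Then, the error value polynomial can be represented as

$$\omega(x) = \omega_0 + \omega_1 x, \qquad \omega_0 = \sigma_0 S_1, \quad \omega_1 = \sigma_0 S_2 + \sigma_1 S_1.$$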
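The double-error case can be sketched with Cramer's rule. The explicit formulas below are an assumption: they follow from requiring the x^2 and x^3 coefficients of σ(x)S(x) to vanish, and they use exactly the products (S1S3 + S2S2, S2S4 + S3S3, S2S3 + S1S4) that the paper lists for t=2. The field GF(2^4) and the names `error_syndromes` and `pgz_t2` are illustrative.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative assumption).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = v   # EXP[i] = alpha^i
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= 0x13

def mul(a, b):
    """Finite-field multiplication; finite-field addition is XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def div(a, b):
    """Finite-field division; raises if the divisor is zero."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(16)")
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 15]

def error_syndromes(errors, count=6):
    """S_1..S_count of an all-zero code word hit by errors {location: value}."""
    out = []
    for i in range(1, count + 1):
        s = 0
        for loc, val in errors.items():
            s ^= mul(val, EXP[(loc * i) % 15])
        out.append(s)
    return out

def pgz_t2(S1, S2, S3, S4):
    """Assumed t=2 solution by Cramer's rule (all signs vanish in GF(2^m))."""
    det = mul(S2, S4) ^ mul(S3, S3)
    sigma0 = div(mul(S1, S3) ^ mul(S2, S2), det)
    sigma1 = div(mul(S2, S3) ^ mul(S1, S4), det)
    omega0 = mul(sigma0, S1)
    omega1 = mul(sigma0, S2) ^ mul(sigma1, S1)
    return (sigma0, sigma1), (omega0, omega1)

# Two errors: value 5 at location 2 and value 11 at location 9.
S = error_syndromes({2: 5, 9: 11}, count=4)
(sigma0, sigma1), _ = pgz_t2(*S)
x = EXP[(-2) % 15]                      # alpha^(-2) should be a root of sigma(x)
print(sigma0 ^ mul(sigma1, x) ^ mul(x, x) == 0)
```

With only one error present, `det` evaluates to zero and `pgz_t2` raises, which is the malfunction the multi-mode design is meant to avoid.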

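PGZ Algorithm for t=3

Similarly, for t=3, Eq. (4) becomes a 3-by-3 linear system in $\sigma_0$, $\sigma_1$, and $\sigma_2$ with right-hand side $(-S_1, -S_2, -S_3)^T$, whose solution is given by Eq. (21). Then, the error location polynomial can be written as

$$\sigma(x) = \sigma_0 + \sigma_1 x + \sigma_2 x^2 + x^3. \tag{22}$$

The key equation for t=3 can be written as

$$\sigma(x) S(x) = -\omega(x) + \mu \cdot x^{6} \tag{23}$$

and

$$\omega(x) = -(\sigma_0 + \sigma_1 x + \sigma_2 x^2 + x^3)(S_1 + S_2 x + S_3 x^2 + S_4 x^3 + S_5 x^4 + S_6 x^5) \bmod x^{6}, \tag{24}$$

where the error value polynomial is

$$\omega(x) = \omega_0 + \omega_1 x + \omega_2 x^2, \tag{25}$$

with

$$\omega_0 = \sigma_0 S_1, \qquad \omega_1 = \sigma_0 S_2 + \sigma_1 S_1, \qquad \omega_2 = \sigma_0 S_3 + \sigma_1 S_2 + \sigma_2 S_1. \tag{26}$$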

Obviously, Eq. (21) turns out to be very complicated compared with Eqs. (8) and (14), and its direct implementation would be tedious and costly. Hence, in what follows, we provide a method to calculate $\sigma_0$, $\sigma_1$, and $\sigma_2$ in a cost-efficient way.

The Reduced-Complexity Decoder Architecture for t=3

According to our observation, the denominator of Eq. (21) has two $S_3S_4S_5$ terms, which cancel out under finite-field addition (a + a = 0 in a field of characteristic 2). The same holds for the numerator of $\sigma_0$, which contains two $S_2S_3S_4$ terms. We also discover that the term $S_2S_5$ appears quite often in Eq. (21); e.g., it is the common factor of $S_2S_2S_5$, $S_2S_3S_5$, $S_2S_4S_5$, and $S_2S_5S_5$. Thus, if we calculate $S_2S_5$ first, the overall computational complexity can be reduced significantly. Similarly, we can identify other common two-syndrome products, such as $S_3S_3$, and calculate them first, which leads to the cost-efficient architecture illustrated in Figure 2. When $\sigma_0$, $\sigma_1$, and $\sigma_2$ are available, $\omega_0$, $\omega_1$, and $\omega_2$ can be obtained from Eq. (26), as illustrated in Figure 3.
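The cancellation and term sharing described above can be checked numerically. The sketch below assumes the t=3 system has the matrix rows (S4, S3, S2), (S5, S4, S3), (S6, S5, S4) and right-hand side (S1, S2, S3), which is one arrangement consistent with Eq. (22) and with the terms named in the text. It compares a full six-product determinant expansion with reduced four-term forms that reuse S2S5, S3S3, and S4S4. The sharing scheme here is illustrative and does not reproduce the multiplier count of Figure 2; all function names are hypothetical.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative assumption).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = v   # EXP[i] = alpha^i
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= 0x13

def mul(a, b):
    """Finite-field multiplication; finite-field addition is XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def div(a, b):
    """Finite-field division; raises if the divisor is zero."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(16)")
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 15]

def error_syndromes(errors, count=6):
    """S_1..S_count of an all-zero code word hit by errors {location: value}."""
    out = []
    for i in range(1, count + 1):
        s = 0
        for loc, val in errors.items():
            s ^= mul(val, EXP[(loc * i) % 15])
        out.append(s)
    return out

def det3(m):
    """Full 3x3 determinant expansion; every sign is + in characteristic 2."""
    (a, b, c), (d, e, f), (g, h, k) = m
    return (mul(a, mul(e, k)) ^ mul(a, mul(f, h)) ^ mul(b, mul(d, k))
            ^ mul(b, mul(f, g)) ^ mul(c, mul(d, h)) ^ mul(c, mul(e, g)))

def sigma_t3_direct(S):
    """Cramer's rule on the assumed t=3 system; returns [sigma_0, sigma_1, sigma_2]."""
    S1, S2, S3, S4, S5, S6 = S
    A = [[S4, S3, S2], [S5, S4, S3], [S6, S5, S4]]
    rhs = [S1, S2, S3]
    den = det3(A)
    out = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = rhs[r]
        out.append(div(det3(M), den))
    return out

def reduced_terms(S):
    """Denominator and sigma_0 numerator after cancelling the duplicated terms."""
    S1, S2, S3, S4, S5, S6 = S
    s2s5, s3s3, s4s4 = mul(S2, S5), mul(S3, S3), mul(S4, S4)   # shared products
    den = mul(mul(S2, S4) ^ s3s3, S6) ^ mul(s4s4, S4) ^ mul(s2s5, S5)
    num0 = mul(s4s4 ^ mul(S3, S5), S1) ^ mul(s3s3, S3) ^ mul(s2s5, S2)
    return den, num0

S = error_syndromes({1: 3, 5: 7, 11: 12})
den, num0 = reduced_terms(S)
print(div(num0, den) == sigma_t3_direct(S)[0])
```

The reduced denominator is the same four-term expression that Sec. 3 later tests against zero, so computing it once serves both the solver and the error-number detection.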

Figure 2. The block diagram of the t=3 PGZ architecture (σ part).

Figure 3. The block diagram of the t=3 PGZ architecture (ω part).

MULTI-MODE PGZ ALGORITHM AND ARCHITECTURE

Problems of the t=3 PGZ Architecture when t=1 or 2

The block diagram introduced in Sec. 2.D functions correctly only when the received code word has exactly three errors. If the error number is less than three, a divide-by-zero problem occurs. Specifically, for t=3, we have to solve the 3-by-3 linear system of Eq. (27). If the error number is less than 3, the three columns $\mathbf{c}_1$, $\mathbf{c}_2$, $\mathbf{c}_3$ of the 3-by-3 matrix become linearly dependent, that is,

$$\mathbf{c}_1 = \alpha\,\mathbf{c}_2 = \beta\,\mathbf{c}_3, \tag{28}$$

where α and β are constants. Consequently, the denominator term and the three numerator terms of Eq. (21) are all equal to zero:

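$$\begin{aligned}
S_2S_4S_6 + S_4S_4S_4 + S_3S_3S_6 + S_2S_5S_5 &= 0 \\
S_1S_4S_4 + S_1S_3S_5 + S_3S_3S_3 + S_2S_2S_5 &= 0 \\
S_2S_4S_4 + S_3S_3S_4 + S_1S_4S_5 + S_1S_3S_6 + S_2S_3S_5 + S_2S_2S_6 &= 0 \\
S_3S_4S_4 + S_2S_4S_5 + S_3S_3S_5 + S_2S_3S_6 + S_1S_5S_5 + S_1S_4S_6 &= 0
\end{aligned} \tag{29}$$

Similarly, the denominator term and the two numerator terms of σ in Eq. (14) also become zero as long as the error number is less than 2:

$$\begin{aligned}
S_2S_4 + S_3S_3 &= 0 \\
S_1S_3 + S_2S_2 &= 0 \\
S_2S_3 + S_1S_4 &= 0
\end{aligned} \tag{30}$$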

Apparently, the σ values then take the indeterminate form 0/0 and cannot be computed. Hence, the t=3 architecture above cannot guarantee the right result when t=1 or 2. To overcome this situation, three copies of hardware (Figure 1(a)) are needed, together with a specific state machine to check the error status.
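The 0/0 situation can be reproduced with a small experiment. The sketch assumes the same arrangement of the t=3 system as a matrix with rows (S4, S3, S2), (S5, S4, S3), (S6, S5, S4) and right-hand side (S1, S2, S3); the field GF(2^4) and the name `eq29_terms` are illustrative. It evaluates the denominator and the three numerators of the Cramer solution for a two-error and a three-error pattern.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative assumption).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = v   # EXP[i] = alpha^i
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= 0x13

def mul(a, b):
    """Finite-field multiplication; finite-field addition is XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def error_syndromes(errors, count=6):
    """S_1..S_count of an all-zero code word hit by errors {location: value}."""
    out = []
    for i in range(1, count + 1):
        s = 0
        for loc, val in errors.items():
            s ^= mul(val, EXP[(loc * i) % 15])
        out.append(s)
    return out

def det3(m):
    """Full 3x3 determinant expansion; every sign is + in characteristic 2."""
    (a, b, c), (d, e, f), (g, h, k) = m
    return (mul(a, mul(e, k)) ^ mul(a, mul(f, h)) ^ mul(b, mul(d, k))
            ^ mul(b, mul(f, g)) ^ mul(c, mul(d, h)) ^ mul(c, mul(e, g)))

def eq29_terms(S):
    """[denominator, numerator of sigma_0, of sigma_1, of sigma_2] for t=3."""
    S1, S2, S3, S4, S5, S6 = S
    A = [[S4, S3, S2], [S5, S4, S3], [S6, S5, S4]]
    rhs = [S1, S2, S3]
    terms = [det3(A)]
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = rhs[r]
        terms.append(det3(M))
    return terms

two_errors = error_syndromes({3: 6, 8: 1})
three_errors = error_syndromes({3: 6, 8: 1, 12: 4})
print(eq29_terms(two_errors))     # every entry is zero: sigma would be 0/0
print(eq29_terms(three_errors))   # the denominator (first entry) is nonzero
```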

The Proposed Multi-mode Decoding Algorithm

In fact, the zero values contain information that facilitates our derivation. That is, by examining one of the four terms in Eq. (29) and one of the three terms in Eq. (30), the error number can be decided. For instance, $(S_2S_4S_6 + S_4S_4S_4 + S_3S_3S_6 + S_2S_5S_5)$ equals zero when t=0,1,2; $(S_2S_4 + S_3S_3)$ equals zero when t=0,1; and $S_2$ equals zero when t=0. Consequently, we employ these three terms to detect the error number t. Figure 4 shows the flowchart to detect the error number.
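The detection rule can be sketched as a short function. Testing the three terms in priority order (largest first) is an assumption about how the flowchart is evaluated, and the sketch further assumes that at most three errors occurred. The field GF(2^4) and the name `detect_error_count` are illustrative.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative assumption).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = v   # EXP[i] = alpha^i
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= 0x13

def mul(a, b):
    """Finite-field multiplication; finite-field addition is XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def error_syndromes(errors, count=6):
    """S_1..S_count of an all-zero code word hit by errors {location: value}."""
    out = []
    for i in range(1, count + 1):
        s = 0
        for loc, val in errors.items():
            s ^= mul(val, EXP[(loc * i) % 15])
        out.append(s)
    return out

def detect_error_count(S):
    """Estimate t from the three test terms, assuming at most three errors."""
    S1, S2, S3, S4, S5, S6 = S
    d2 = mul(S2, S4) ^ mul(S3, S3)                        # S2S4 + S3S3
    d3 = (mul(d2, S6) ^ mul(mul(S4, S4), S4)
          ^ mul(mul(S2, S5), S5))                         # S2S4S6 + S3S3S6 + S4S4S4 + S2S5S5
    if d3 != 0:
        return 3
    if d2 != 0:
        return 2
    if S2 != 0:
        return 1
    return 0

for pattern in ({}, {7: 3}, {2: 5, 9: 11}, {1: 3, 5: 7, 11: 12}):
    print(len(pattern), detect_error_count(error_syndromes(pattern)))
```

Note that the t=2 term is reused inside the t=3 term, so the detector adds little arithmetic beyond what the σ computation already needs.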

Figure 4. The flowchart to detect the error number in the proposed RS decoder (decision: how many terms are equal to zero; output: the error number).

By examining Eq. (14) carefully, we can discover that $S_1S_3$, $S_2S_2$, $S_2S_4$, $S_3S_3$, $S_2S_3$, $S_1S_4$, and $S_1S_3+S_2S_2$, $S_2S_4+S_3S_3$, $S_2S_3+S_1S_4$ are generated when calculating σ for t=2. Our approach is to compute σ for t=3 using these terms as a basis. Meanwhile, as mentioned in Sec. 2.D, the two $S_3S_4S_5$ terms and the two $S_2S_3S_4$ terms can be neglected, which helps a lot in reducing the overall complexity. Although it uses slightly more hardware, the multi-mode PGZ decoder generates the terms needed to calculate the different σ values for t=1,2,3 at the same time. Provided that we know the error number, the correct terms for calculating the σ values can be chosen. Multiplexers in the multi-mode decoder perform this selection. Figure 5 and Figure 6 show the block diagrams of the proposed multi-mode PGZ decoder architecture. The controller follows the flowchart in Figure 4.

Figure 5. The block diagram of the multi-mode PGZ architecture (σ part).

MULTI-MODE CHIEN'S SEARCH & FORNEY'S METHOD

After all σ and ω values are found, the error location polynomial of Eq. (5) and the error value polynomial of Eq. (7) can be formed. According to Chien's search, the error location l satisfies the equation below:

$$\sigma(\alpha^{-l}) = \sigma_0 + \sigma_1 x + \sigma_2 x^2 + \cdots + \sigma_t x^t = 0, \qquad \sigma_t = 1, \quad 0 \le l \le 2^m - 1, \tag{31}$$

where $x = \alpha^{-l}$ and l denotes the error location. The error location polynomial reduces to Eqs. (32), (33), and (34) for t=1, t=2, and t=3, respectively:

$$\sigma(x) = \sigma_0 + 1x^1 + 0x^2 + 0x^3, \quad t=1, \tag{32}$$

$$\sigma(x) = \sigma_0 + \sigma_1 x^1 + 1x^2 + 0x^3, \quad t=2, \tag{33}$$

$$\sigma(x) = \sigma_0 + \sigma_1 x^1 + \sigma_2 x^2 + 1x^3, \quad t=3. \tag{34}$$

Suppose that we build a circuit to solve the equation for the t=3 case. By deliberately setting $\sigma_2$ to 1 for the t=2 case, and $\sigma_2$ to 0 and $\sigma_1$ to 1 for the t=1 case (with the $x^3$ coefficient set to 0 in both cases), the roots of Eqs. (32), (33), and (34) can be searched no matter what t is. The multiplexer output of the multi-mode decoder picks the appropriate σ values.
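The coefficient substitution of Eqs. (32) to (34) can be sketched as a single cubic evaluator fed with padded coefficients. The field GF(2^4), the search range of 15 locations, and the names `pad_sigma` and `chien_search` are assumptions made for illustration.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative assumption).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = v   # EXP[i] = alpha^i
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= 0x13

def mul(a, b):
    """Finite-field multiplication; finite-field addition is XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def pad_sigma(sigma):
    """Map [sigma_0, ..., sigma_{t-1}] to the four cubic coefficients."""
    t = len(sigma)
    if t == 3:
        return (sigma[0], sigma[1], sigma[2], 1)   # Eq. (34)
    if t == 2:
        return (sigma[0], sigma[1], 1, 0)          # Eq. (33)
    if t == 1:
        return (sigma[0], 1, 0, 0)                 # Eq. (32)
    raise ValueError("t must be 1, 2, or 3")

def chien_search(coeffs):
    """Return every location l with c0 + c1*x + c2*x^2 + c3*x^3 = 0 at x = alpha^-l."""
    c0, c1, c2, c3 = coeffs
    hits = []
    for loc in range(15):
        x = EXP[(-loc) % 15]
        x2 = mul(x, x)
        if c0 ^ mul(c1, x) ^ mul(c2, x2) ^ mul(c3, mul(x2, x)) == 0:
            hits.append(loc)
    return hits

# Locator of a single error at location 6: sigma(x) = alpha^-6 + x.
print(chien_search(pad_sigma([EXP[9]])))
```

Because only the coefficient inputs change, the same evaluation loop serves all three modes, which mirrors the role of the multiplexer in the text.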

Meanwhile, Forney's method is applied to find the error value $E_l$, which corresponds to the error location l; it is given by Eqs. (35) and (36). Setting t=1, 2, or 3, the error value equations for the three special cases can be expressed as

$$E_l = \frac{\omega_0 + 0x^1 + 0x^2}{1 + 0x^2}, \quad t=1, \tag{37}$$

$$E_l = \frac{\omega_0 + \omega_1 x^1 + 0x^2}{\sigma_1 + 0x^2}, \quad t=2, \tag{38}$$

$$E_l = \frac{\omega_0 + \omega_1 x^1 + \omega_2 x^2}{\sigma_1 + 1x^2}, \quad t=3. \tag{39}$$

The equation for the t=3 case is obviously the most complicated. Therefore, once an architecture can evaluate it, the equations for t=2 and t=1 can be calculated simply by changing coefficients. The controller drives the multiplexer to select the proper ω values in Figure 6.
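The three error-value equations can be sketched as one evaluator with padded coefficients, evaluated at x = α^(-l). The denominators are read here as the formal derivative of σ(x) in characteristic 2, which is an interpretation. The helper `locator`, which builds σ(x) directly from known error locations, exists only to exercise the sketch and is not part of the decoder. The field GF(2^4) and all function names are assumptions.

```python
# GF(2^4) from the primitive polynomial x^4 + x + 1 (an illustrative assumption).
EXP, LOG = [0] * 30, [0] * 16
v = 1
for i in range(15):
    EXP[i] = EXP[i + 15] = v   # EXP[i] = alpha^i
    LOG[v] = i
    v <<= 1
    if v & 0x10:
        v ^= 0x13

def mul(a, b):
    """Finite-field multiplication; finite-field addition is XOR."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def div(a, b):
    """Finite-field division; raises if the divisor is zero."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(16)")
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + 15]

def error_syndromes(errors, count=6):
    """S_1..S_count of an all-zero code word hit by errors {location: value}."""
    out = []
    for i in range(1, count + 1):
        s = 0
        for loc, val in errors.items():
            s ^= mul(val, EXP[(loc * i) % 15])
        out.append(s)
    return out

def locator(locs):
    """Test helper: monic sigma(x) = prod (x + alpha^-l) as [sigma_0, ..., sigma_{t-1}]."""
    poly = [1]                              # low-order coefficient first
    for loc in locs:
        root = EXP[(-loc) % 15]
        nxt = [0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k] ^= mul(c, root)          # contribution of the constant term
            nxt[k + 1] ^= c                 # contribution of the x term
        poly = nxt
    return poly[:-1]                        # drop the leading 1

def omega_coeffs(sigma, S):
    """omega_k = sum over j <= k of sigma_j * S_{k-j+1}; S[0] holds S_1."""
    out = []
    for k in range(len(sigma)):
        acc = 0
        for j in range(k + 1):
            acc ^= mul(sigma[j], S[k - j])
        out.append(acc)
    return out

def forney_value(omega, sigma, loc):
    """Evaluate Eqs. (37)-(39) with padded coefficients at x = alpha^-loc."""
    t = len(sigma)
    w0, w1, w2 = list(omega) + [0] * (3 - t)
    d0 = 1 if t == 1 else sigma[1]          # constant term of the denominator
    d2 = 1 if t == 3 else 0                 # x^2 coefficient of the denominator
    x = EXP[(-loc) % 15]
    x2 = mul(x, x)
    return div(w0 ^ mul(w1, x) ^ mul(w2, x2), d0 ^ mul(d2, x2))

for errors in ({6: 9}, {2: 5, 9: 11}, {1: 3, 5: 7, 11: 12}):
    sigma = locator(errors)
    omega = omega_coeffs(sigma, error_syndromes(errors))
    print({loc: forney_value(omega, sigma, loc) for loc in errors})
```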

Figure 7 shows the implementation of Chien's search and Forney's method. The offset is the corrupted data, which must be added to the corresponding error value $E_l$ to produce the corrected data.

Figure 7. The block diagram of the proposed Chien's search & Forney's method.

COMPARISON OF COMPLEXITY

The main drawback of the PGZ algorithm is that its hardware complexity rises rapidly when t is larger than three. Direct implementation of the PGZ algorithm for t=3 without any cost-down technique requires 40 finite-field multipliers (FFMs) and 16 finite-field adders (FFAs). By exploiting the special properties of finite-field operations described in Sec. 2.D, we derived a reduced-complexity PGZ decoder for t=3. It requires only 21 FFMs and 11 FFAs, so the hardware complexity is reduced by approximately 50%. Furthermore, the design techniques of the reduced-complexity t=3 design are applied to our multi-mode PGZ decoder for any t ≤ 3, which needs only 24 FFMs and 12 FFAs. The comparison of hardware complexity is shown in Table 1. Compared with the reduced-complexity design, only three additional FFMs and one additional FFA are required in our multi-mode PGZ architecture. That is, our multi-mode PGZ architecture can solve for t=0,1,2,3 errors in one unified VLSI architecture with very small hardware overhead.
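The savings quoted above follow from simple arithmetic on the component counts, which can be tabulated as below. The dictionary keys and the helper name `saving_percent` are illustrative; the counts are the ones stated in the text.

```python
# (FFM, FFA) counts as stated in the text for the three architectures.
COUNTS = {
    "direct t=3": (40, 16),
    "reduced t=3": (21, 11),
    "multi-mode t=0..3": (24, 12),
}

def saving_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

direct, reduced, multi = (COUNTS[k] for k in
                          ("direct t=3", "reduced t=3", "multi-mode t=0..3"))
print("FFM saving of the reduced design:", saving_percent(direct[0], reduced[0]))
print("FFA saving of the reduced design:", saving_percent(direct[1], reduced[1]))
print("multi-mode overhead (FFM, FFA):", (multi[0] - reduced[0], multi[1] - reduced[1]))
```

The multiplier count drops by a little under half, which matches the "approximately 50%" figure, while the adder count drops by roughly a third.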

Table 1. Comparison of hardware complexity.

| Architecture type | Number of FFM | Number of FFA |
|---|---|---|
| Direct implementation of the PGZ algorithm for t = 3 | 40 | 16 |
| The derived reduced-complexity PGZ for t = 3 | 21 | 11 |
| The proposed multi-mode PGZ for t = 0, 1, 2, 3 | 24 | 12 |

CONCLUSION

In this paper, we proposed the algorithm derivation and VLSI architecture of a multi-mode PGZ-based RS decoder. It computes the correct error locations and error values for any t less than four with very small complexity. Thanks to the configurable architecture, we can easily perform RS decoding for different values of t without redesigning the hardware architecture.

References

[1] S. B. Wicker, Error Control Systems for Digital Communication and Storage. Prentice Hall, 1995.

[2] Wicker and Bhargava, Reed-Solomon codes and applications, IEEE Press, 1994.

[3] S. Whitaker, J. Canaris, and K. Cameron, "Reed-Solomon VLSI codec for advanced television," IEEE Trans. Circuits Syst. Video Technol., vol. 1, pp. 230-236, June 1991.

[4] Meera Srinivasan and Dilip V. Sarwate, "Malfunction in the Peterson-Gorenstein-Zierler Decoder," IEEE Trans. on Information Theory, vol. 40, no. 5, September 1994.

[5] Son Le-Ngoc and Z. Young, "An approach to double error correcting Reed-Solomon decoding without Chien search," Proceedings of the 36th Midwest Symposium, vol. 1, pp. 534-537, 1993.

[6] Kuang Yung Liu, "Architecture for VLSI Design of Reed-Solomon Decoders," IEEE Trans. on Computers, vol. C-33, no. 2, Feb. 1984.

[7] L. Song and K. K. Parhi, "Low-Energy Software Reed-Solomon Codecs Using Specialized Finite Field Datapath and Division-Free Berlekamp-Massey Algorithm," IEEE Symp. on Circuits and Systems, June 1999.