
An Efficient Low Complexity LDPC Encoder Based On LU Factorization With Pivoting

Jia-ning Su, Zhou Jiang, Ke Liu, Xiao-yang Zeng, Hao Min

(ASIC System State Key Lab, Fudan University, Shanghai 200433, China) E-mail: insu@fudan.edu.cn

Abstract

In this paper, we present an efficient encoder for regular and irregular low-density parity-check (LDPC) codes whose complexity is linear in the code length. Inspired by the idea of Neal, we further exploit the sparsity of the parity-check matrix of LDPC codes and use an extended LU factorization with pivoting in the encoding process, which is flexible and supports arbitrary matrices, code rates and block lengths. An FPGA implementation of an encoder for a rate-1/2 regular (3,6) length-1536 LDPC code is provided, with a throughput of 31 Mbps. An efficient memory organization for storing and performing computations on sparse matrices is also presented.

1 Introduction

In the past few years, LDPC codes [1] have received much attention due to their excellent performance and the inherent parallelism of their decoders. LDPC codes are now considered the most promising candidate forward error correction (FEC) scheme for a wide range of applications in telecommunications and storage devices. In 2004, Europe's Digital Video Broadcasting (DVB) standards group selected LDPC codes over Turbo codes for the next-generation digital satellite broadcasting standard.

Despite their better performance and lower decoding complexity compared to Turbo codes, LDPC codes have high encoding complexity, which effectively constitutes the main obstacle to their hardware implementation in modern communication systems. Few attempts have been made to implement the encoder directly through dense matrix operations, because doing so results in complexity quadratic in the code length.

The first encoding method with linear complexity was introduced by Neal [2], who used LU decomposition to free the encoding process from dense inverse operations. However, it is not easy to find a good sparse LU decomposition for arbitrary H matrices.

Richardson also developed an efficient means of encoding an LDPC code [3], but his method requires extensive preprocessing before the encoder can do its job. Various other encoding methods, such as MacKay's lower triangular method [4], Haley's iterative encoding method [5] and the quasi-cyclic method [6], all target LDPC codes with specific constructions and are not suited to arbitrary LDPC codes.

In the rest of this paper, we first briefly review encoding using LU decomposition, and then introduce a simple and efficient method for finding a sparse LU decomposition, followed by a complexity analysis. We then present a pipelined encoder architecture and describe its main components; the last part gives the implementation results.

2 Overview

In systematic encoding, the parity-check matrix $H$ of an $(n,k)$ LDPC code can be partitioned into two submatrices $B$ and $A$ with dimensions $[(n-k) \times k]$ and $[(n-k) \times (n-k)]$ respectively, that is, $H = [B\ A]$.

Correspondingly, a codeword may be written in systematic form, with the first $k$ bits being the source message bits $s$ and the remaining $(n-k)$ bits being the parity-check bits $c$, i.e., codeword $= [s\ c]$.

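In the general encoding method, we have $H \cdot \mathrm{codeword}^T = 0$, that is, $[B\ A]\,[s\ c]^T = B s^T + A c^T = 0$. Since subtraction equals addition over GF(2), solving for the parity bits gives

$$c^T = A^{-1} B s^T \qquad (1)$$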

In the above formula, $A^{-1}$ is a dense matrix, and multiplication by a dense matrix is costly in hardware.

Our proposed encoding method takes advantage of the sparsity of $A$. Let us rewrite (1) as $A c^T = B s^T$ and let $y = B s^T$; we then have

$$A c^T = y \qquad (2)$$

0-7803-9210-8/05/$20.00

Ideally, the nonsingular matrix $A$ would be decomposed into a lower triangular matrix $L$ and an upper triangular matrix $U$, that is, $A = LU$. However, such a decomposition exists only when all the leading principal minors of $A$ are nonsingular, which is not always the case. Therefore an extended LU factorization with pivoting is used here:

$$PA = LU \qquad (3)$$

Here $P$ is the permutation matrix that records the row permutations performed during LU factorization. Multiplying both sides of (2) by $P$ and substituting (3), we get

$$L U c^T = P y \qquad (4)$$

Letting $U c^T = z$, we have $L z = P y$.

Thus $z$ can be obtained through forward substitution, and $c^T$ can then be obtained through backward substitution. As $L$, $U$ and $P$ preserve the sparsity of $A$ well, encoding can be completed without any costly dense matrix operation.
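The chain $y \to Py \to z \to c^T$ can be sketched in a few lines of Python over GF(2). This is a minimal illustration under stated assumptions, not the authors' hardware: it uses dense Python lists for readability, and the helper names (`mat_vec_gf2`, `forward_sub_gf2`, `backward_sub_gf2`, `encode_parity`) are hypothetical. The matrices are the 5 × 5 example $A$, $P$, $L$, $U$ worked out in Section 3, and $y$ is an arbitrary test value standing in for $B s^T$.

```python
# Sketch: parity computation from P*A = L*U over GF(2) (dense lists for clarity).

def mat_vec_gf2(M, v):
    """Return M*v over GF(2)."""
    return [sum(m & x for m, x in zip(row, v)) % 2 for row in M]

def forward_sub_gf2(L, b):
    """Solve L*z = b for unit lower-triangular L over GF(2)."""
    z = []
    for i, row in enumerate(L):
        acc = b[i]
        for j in range(i):
            acc ^= row[j] & z[j]  # subtraction is XOR in GF(2)
        z.append(acc)
    return z

def backward_sub_gf2(U, z):
    """Solve U*c = z for unit upper-triangular U over GF(2)."""
    n = len(U)
    c = [0] * n
    for i in range(n - 1, -1, -1):
        acc = z[i]
        for j in range(i + 1, n):
            acc ^= U[i][j] & c[j]
        c[i] = acc
    return c

def encode_parity(P, L, U, y):
    """Return c with A*c = y, via L*z = P*y and then U*c = z."""
    z = forward_sub_gf2(L, mat_vec_gf2(P, y))
    return backward_sub_gf2(U, z)

# 5x5 example from Section 3 (P*A = L*U).
A = [[1,1,1,0,0],[1,0,1,1,0],[0,0,1,0,1],[1,0,0,1,0],[0,0,1,1,1]]
P = [[1,0,0,0,0],[0,1,0,0,0],[0,0,1,0,0],[0,0,0,0,1],[0,0,0,1,0]]
L = [[1,0,0,0,0],[1,1,0,0,0],[0,0,1,0,0],[0,0,1,1,0],[1,1,1,0,1]]
U = [[1,1,1,0,0],[0,1,0,1,0],[0,0,1,0,1],[0,0,0,1,0],[0,0,0,0,1]]

y = [1, 0, 1, 1, 0]             # stands in for B*s^T (arbitrary test value)
c = encode_parity(P, L, U, y)
print(c)                         # [0, 0, 1, 1, 0]
assert mat_vec_gf2(A, c) == y    # the parity bits satisfy equation (2)
```

In a hardware encoder the inner loops would visit only the nonzero entries, which is what keeps each substitution proportional to the number of ones rather than to $n^2$.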

All the steps in our proposed encoding method and their computational complexity with regard to the code length $n$ are listed in Table 1.

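Tab. 1 Computation of $c^T = U^{-1}[L^{-1}(P B s^T)]$

| Step | Operation | Comment | Complexity |
|---|---|---|---|
| 1 | $B s^T$ | Matrix × vector | $O(n)$ |
| 2 | $P (B s^T)$ | Matrix × vector | $O(n)$ |
| 3 | $L^{-1}(P B s^T)$ | Forward substitution | $O(n)$ |
| 4 | $U^{-1}[L^{-1}(P B s^T)]$ | Backward substitution | $O(n)$ |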

In the actual encoding process, the LU factorization is precomputed in software, while the matrix multiplications and the forward and backward substitutions are done in hardware.

The entire encoding framework is shown in Fig. 1.

Fig. 1 The encoding framework

3 Preprocessing

In the preprocessing step, the nonsingular submatrix $A$ is decomposed into a lower triangular matrix $L$ and an upper triangular matrix $U$, and the row permutations are stored in $P$ at the same time.

Here we introduce a simple and efficient algorithm to obtain $L$, $U$ and $P$. First, a small example illustrates the extended LU factorization algorithm; all arithmetic is modulo 2, so eliminating an entry means adding (XOR-ing) the pivot row.

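Let

$$A = \begin{pmatrix}1&1&1&0&0\\1&0&1&1&0\\0&0&1&0&1\\1&0&0&1&0\\0&0&1&1&1\end{pmatrix}$$

Step 1: Initialization ($L = I$, $U = A$, $P = I$)

$$L = \begin{pmatrix}1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&1&0\\0&0&0&0&1\end{pmatrix}, \quad U = \begin{pmatrix}1&1&1&0&0\\1&0&1&1&0\\0&0&1&0&1\\1&0&0&1&0\\0&0&1&1&1\end{pmatrix}$$

Step 2: Gaussian elimination of column 1 in $U$

$$L = \begin{pmatrix}1&0&0&0&0\\1&1&0&0&0\\0&0&1&0&0\\1&0&0&1&0\\0&0&0&0&1\end{pmatrix}, \quad U = \begin{pmatrix}1&1&1&0&0\\0&1&0&1&0\\0&0&1&0&1\\0&1&1&1&0\\0&0&1&1&1\end{pmatrix}$$

Step 3: Gaussian elimination of columns 2 and 3 in $U$

$$L = \begin{pmatrix}1&0&0&0&0\\1&1&0&0&0\\0&0&1&0&0\\1&1&1&1&0\\0&0&1&0&1\end{pmatrix}, \quad U = \begin{pmatrix}1&1&1&0&0\\0&1&0&1&0\\0&0&1&0&1\\0&0&0&0&1\\0&0&0&1&0\end{pmatrix}$$

Step 4: Pivoting by swapping rows 4 and 5 in $U$ (and in $P$); the first three elements of rows 4 and 5 in $L$ are exchanged as well

$$L = \begin{pmatrix}1&0&0&0&0\\1&1&0&0&0\\0&0&1&0&0\\0&0&1&1&0\\1&1&1&0&1\end{pmatrix}, \quad U = \begin{pmatrix}1&1&1&0&0\\0&1&0&1&0\\0&0&1&0&1\\0&0&0&1&0\\0&0&0&0&1\end{pmatrix}$$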

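At this point we have obtained $L$, $U$ and $P$:

$$L = \begin{pmatrix}1&0&0&0&0\\1&1&0&0&0\\0&0&1&0&0\\0&0&1&1&0\\1&1&1&0&1\end{pmatrix}, \quad U = \begin{pmatrix}1&1&1&0&0\\0&1&0&1&0\\0&0&1&0&1\\0&0&0&1&0\\0&0&0&0&1\end{pmatrix}, \quad P = \begin{pmatrix}1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&0&0&0&1\\0&0&0&1&0\end{pmatrix}$$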
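As a quick sanity check, worked by hand modulo 2: writing $a_j$ for row $j$ of $A$ and $u_j$ for row $j$ of $U$, row $i$ of $LU$ is the sum of the rows $u_j$ for which $L_{ij} = 1$. This reproduces the rows of $PA$, i.e. $A$ with rows 4 and 5 exchanged:

$$\begin{aligned}
(LU)_1 &= u_1 = (1,1,1,0,0) = a_1\\
(LU)_2 &= u_1 + u_2 = (1,0,1,1,0) = a_2\\
(LU)_3 &= u_3 = (0,0,1,0,1) = a_3\\
(LU)_4 &= u_3 + u_4 = (0,0,1,1,1) = a_5\\
(LU)_5 &= u_1 + u_2 + u_3 + u_5 = (1,0,0,1,0) = a_4
\end{aligned}$$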

Now we give the full LU decomposition algorithm. Suppose $A$ is a nonsingular matrix of size $[n \times n]$.

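(1) Initialization: $L = I_n$, $U = A$, $P = I_n$, $i = 1$.

(2) Check $U_{ii}$.
If $U_{ii} = 0$: find the first $U_{ki} \neq 0$ ($k = i+1, i+2, \ldots, n$); exchange rows $i$ and $k$ in $U$ and $P$, and, when $i > 1$, exchange the first $(i-1)$ elements of rows $i$ and $k$ in $L$; then go to (3). If no $U_{ki} \neq 0$ ($i < k \le n$) is found, $A$ is singular and the algorithm stops.
Else, if $U_{ii} \neq 0$: go to (3).

(3) Perform Gaussian elimination in column $i$ of $U$, i.e., add row $i$ to every row $k > i$ with $U_{ki} = 1$, and put a 1 in the corresponding positions $L_{ki}$ of $L$. If $i < n$, set $i \leftarrow i + 1$ and go to (2); stop when $i = n$.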
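The steps above can be sketched in Python as follows. This is a hedged illustration that assumes dense 0/1 lists over GF(2); the function names `lu_pivot_gf2` and `matmul_gf2` are hypothetical, and the authors' Matlab preprocessor is not reproduced here. Indices are 0-based, so "the first $(i-1)$ elements" of step (2) becomes the slice `[:i]`.

```python
def lu_pivot_gf2(A):
    """Extended LU factorization with row pivoting over GF(2).

    Returns (P, L, U) with P*A = L*U. Rows of U and P are swapped whole,
    while only the entries left of column i in the two L rows are swapped,
    mirroring step (2). Raises ValueError if A is singular.
    """
    n = len(A)
    U = [row[:] for row in A]
    L = [[int(r == c) for c in range(n)] for r in range(n)]
    P = [[int(r == c) for c in range(n)] for r in range(n)]
    for i in range(n):
        if U[i][i] == 0:
            # first row below i with a 1 in column i
            k = next((k for k in range(i + 1, n) if U[k][i]), None)
            if k is None:
                raise ValueError("A is singular over GF(2)")
            U[i], U[k] = U[k], U[i]
            P[i], P[k] = P[k], P[i]
            L[i][:i], L[k][:i] = L[k][:i], L[i][:i]
        # Gaussian elimination in column i: XOR the pivot row into rows below
        for j in range(i + 1, n):
            if U[j][i]:
                U[j] = [a ^ b for a, b in zip(U[j], U[i])]
                L[j][i] = 1
    return P, L, U

def matmul_gf2(X, Y):
    """Matrix product over GF(2)."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*Y)]
            for row in X]

A = [[1,1,1,0,0],[1,0,1,1,0],[0,0,1,0,1],[1,0,0,1,0],[0,0,1,1,1]]
P, L, U = lu_pivot_gf2(A)
assert matmul_gf2(P, A) == matmul_gf2(L, U)
print(L[3], L[4])   # [0, 0, 1, 1, 0] [1, 1, 1, 0, 1]
```

Run on the example matrix, this sketch yields the same $L$, $U$ and $P$ as the hand computation, including the pivot at $i = 4$.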


Fig. 2 Overview of hardware encoder architecture

4 Encoder architecture and sub-circuits

4.1 Pipelined encoder architecture

The job left for the hardware encoder is to compute the parity-check bits according to the operations listed in Table 1: in total, two matrix-by-vector multiplications, one forward substitution and one backward substitution. An overview of our hardware encoder architecture is shown in Fig. 2. The operations are grouped into four stages that run in parallel, and double buffering is used between the stages. The stages have been carefully partitioned to balance the workloads between them while minimizing the overall latency, the idle time and the buffer memory. This flexible pipelined encoder structure can support any code rate and block length.

In stage 1, we simply write the message block into buffers; as the message block length is $k$, this stage takes $k$ clock cycles. In stage 2, we perform operations 1 and 2 listed in Table 1, and the results are buffered before they are fed to the next stage. In stage 3, all the remaining operations needed to compute the parity-check bits are performed. In stage 4, the parity-check bits and the buffered message bits are combined to generate the codeword.

This architecture is optimized for rate-1/2 codes: because their message and parity-check parts have the same length, the workloads and cycle counts of stages 1 and 4 and of stages 2 and 3 are naturally balanced without special alteration, which minimizes the memory sizes and idle cycles.

4.2 Matrix-vector multiplication

The main operation involved in stage 2 is matrix-vector multiplication (MVM). MVM computes $X \cdot y = z$, where $X$ is a sparse matrix, $y$ and $z$ are vectors, and $z$ is what we want to compute. Since $X$ is sparse, it is efficient to store the locations of its ones instead of storing the whole matrix. In our implementation, the locations of the ones in each row are stored together with an extra bit indicating the end of the row. For example, if

$$A = \begin{pmatrix}1&1&1&0&0\\1&0&1&1&0\\0&0&1&0&1\\1&0&0&1&0\\0&0&1&1&1\end{pmatrix}$$

it is stored as shown in Table 2.

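Tab. 2 The storage of $A$ in memory

| Address | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data | 1 | 2 | 3 | 1 | 3 | 4 | 3 | 5 | 1 | 4 | 3 | 4 | 5 |
| End flag | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |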
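The row-wise storage format can be sketched as below. This is a minimal sketch assuming a dense 0/1 input in which every row contains at least one 1 (otherwise a row would have no entry to carry its end flag); the function name `to_row_list` is hypothetical, and column indices are 1-based as in Table 2.

```python
def to_row_list(M):
    """Return (data, end_flag): the 1-based column index of every 1 in M,
    row by row, with end_flag = 1 on the last entry of each row."""
    data, end_flag = [], []
    for row in M:
        cols = [j + 1 for j, bit in enumerate(row) if bit]
        if not cols:
            raise ValueError("a row without any 1 cannot carry an end flag")
        data.extend(cols)
        end_flag.extend([0] * (len(cols) - 1) + [1])
    return data, end_flag

A = [[1,1,1,0,0],[1,0,1,1,0],[0,0,1,0,1],[1,0,0,1,0],[0,0,1,1,1]]
data, end_flag = to_row_list(A)
print(data)      # [1, 2, 3, 1, 3, 4, 3, 5, 1, 4, 3, 4, 5]
print(end_flag)  # [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1]
```

The memory footprint therefore scales with the number of ones rather than with the number of matrix entries.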

Fig. 3 Circuit for matrix-vector multiplication

The circuit for MVM is shown in Fig. 3. The locations of the ones are used as the index for $y$, the end flags are used as the index for $z$, and an XOR gate and a D flip-flop accumulate the results.
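A cycle-by-cycle software model of this datapath might look like the sketch below. The name `mvm_stream` is hypothetical, and two details are assumptions about Fig. 3 rather than statements from the text: each end flag is taken to advance the write address of $z$, and the accumulating flip-flop is cleared after each row. The input is the row-list storage of the example $A$.

```python
def mvm_stream(data, end_flag, y):
    """Model of the MVM datapath, one stored entry per clock cycle.

    Each entry of data selects an element of y (1-based); an XOR gate and a
    one-bit register accumulate it. On an end flag the register value is
    written as the next element of z and the register is cleared (assumed).
    """
    z = []
    acc = 0  # plays the role of the D flip-flop
    for col, last in zip(data, end_flag):
        acc ^= y[col - 1]
        if last:
            z.append(acc)
            acc = 0
    return z

# Row-list storage of the example A (1-based columns plus end flags).
data = [1, 2, 3, 1, 3, 4, 3, 5, 1, 4, 3, 4, 5]
end_flag = [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1]
print(mvm_stream(data, end_flag, [0, 0, 1, 1, 0]))  # [1, 0, 1, 1, 0]
```

Under this model one product takes as many cycles as there are stored ones, which for a sparse matrix grows linearly with its size.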


4.3 Forward and backward substitution

Consider the equation $X \cdot z = y$, where $X$ is a sparse triangular matrix and $z$ is the vector we want to compute. Normally we would compute $z = X^{-1} \cdot y$; however, matrix inversion is a nightmare for hardware implementation. A better way is to use forward or backward substitution, since $X$ is triangular. The circuit is shown in Fig. 4.

Fig. 4 Circuit for forward and backward substitution

This circuit is similar to that in Fig. 3, except that previously computed elements of $z$ are also needed during processing. The index calculator computes the location of the element of $y$ to be read and of the element of $z$ to be written, and the data from the $X$ matrix indicate the locations of the elements of $z$ that are needed to compute the current element of $z$.
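A software model of the forward-substitution case can reuse the row-list format of Table 2. This is a sketch under assumptions, not the circuit itself: the name `forward_sub_stream` is hypothetical, and it assumes the diagonal 1 of each row is stored like any other 1, so that reaching it means "read $y$" while every other index means "read an already computed element of $z$". The input is the row-list storage of $L$ from the Section 3 example, with an arbitrary test vector.

```python
def forward_sub_stream(data, end_flag, y):
    """Solve X*z = y over GF(2) for unit lower-triangular X stored in
    row-list form (1-based column indices plus end flags).

    Assumption: the diagonal 1 of each row is stored like any other 1.
    When it is reached, y[row] is read; any other column index points to
    an already computed element of z.
    """
    z = []
    acc = 0
    row = 1  # 1-based index of the element of z being computed
    for col, last in zip(data, end_flag):
        acc ^= y[row - 1] if col == row else z[col - 1]
        if last:
            z.append(acc)  # write z[row]
            acc = 0
            row += 1
    return z

# Row-list storage of L from the Section 3 example.
data = [1, 1, 2, 3, 3, 4, 1, 2, 3, 5]
end_flag = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]
print(forward_sub_stream(data, end_flag, [1, 0, 1, 0, 1]))  # [1, 1, 1, 1, 0]
```

Backward substitution with $U$ would be the mirror-image loop, walking the rows from the last to the first; it is omitted here.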

5 Implementation results

The preprocessor has been implemented in Matlab. Scatter plots of a 768 × 1536 H matrix after preprocessing are shown in Fig. 5 (a)-(d), where ones appear as dots.

Fig. 5 (a) Scatter plot of H; Fig. 5 (b) Scatter plot of L

The generated codewords have been verified against Matlab for correctness. An encoder for a rate-1/2 regular (3,6) length-1536 LDPC code has been implemented on an Altera Stratix EP1S80B596C device. The design uses 5% of the logic resources and 20% of the memory resources of the device. The encoder runs steadily at a clock rate of 64 MHz, and the equivalent codeword throughput is 31 Mbps.

6 Conclusion

We have presented the hardware design of a pipelined LDPC encoder based on extended LU factorization with pivoting, which has complexity linear in the block length and can support arbitrary H matrices. Efficient software has been written to preprocess the H matrix and generate L, U and P for hardware encoders.

Our ongoing work includes implementing a parameterized LDPC decoder that supports arbitrary H matrices and optimizing it for low-power applications.

References

[1] R. Gallager. Low-density parity-check codes. IRE Trans. on Inform. Theory, 1962, 8:21-28

[2] R. M. Neal. Sparse matrix methods and probabilistic algorithm. IMA Program on Codes, Systems, and Graphical Models, 1999

[3] T. Richardson, R. Urbanke. Efficient encoding of low-density parity-check codes. IEEE Trans. on Inform. Theory, 2001, 47:638-656

[4] D. J. C. MacKay, S. T. Wilson, M. C. Davey. Comparison of constructions of irregular Gallager codes. IEEE Trans. on Communications, 1999, 47:1449-1454

[5] D. Haley, A. Grant, J. Buetefer. Iterative encoding of low-density parity-check codes. Australian Communication Theory Workshop, 2002, Feb.

[6] , A family of irregular LDPC codes with low encoding complexity. IEEE Communications Letters, 2003, Feb, 7:79-81

[7] R. L. Townsend, E. J. Weldon. Self-orthogonal quasi-cyclic codes. IEEE Trans. on Inform. Theory, 1967, 13:183-195

Fig. 5 (c) Scatter plot of U; Fig. 5 (d) Scatter plot of P
