An Efficient Low Complexity LDPC Encoder Based On LU Factorization with Pivoting

Jia-ning Su, Zhou Jiang, Ke Liu, Xiao-yang Zeng, Hao Min

(ASIC & System State Key Lab, Fudan University, Shanghai 200433, China)  E-mail: jnsu@fudan.edu.cn

Abstract

In this paper, we present an efficient encoder for regular and irregular low-density parity-check (LDPC) codes with complexity linear in the code length. Inspired by the idea of Neal, we further exploit the sparsity of the parity check matrix of LDPC codes and use extended LU factorization with pivoting in the encoding process, which is flexible and supports arbitrary matrices, code rates and block lengths. An FPGA implementation of a rate 1/2 regular (3,6) length 1536 LDPC code encoder is provided with a throughput of 31 Mbps. An efficient memory organization for storing and performing computations on sparse matrices is also presented.

1 Introduction

In the past few years, LDPC codes [1] have received much attention due to their excellent performance and the inherent parallelism of the decoder. LDPC codes are now being considered as the most promising candidate forward error correction (FEC) scheme for a wide range of applications in telecommunications and storage devices. In 2004, Europe's digital video broadcasting (DVB) standards group selected LDPC codes over Turbo codes for the next generation digital satellite broadcasting standard.

Despite the better performance and lower decoding complexity compared to Turbo codes, LDPC codes have large encoding complexity, which virtually constitutes the main obstacle preventing their hardware implementation in modern communications.

Few attempts have been made to implement the encoder directly through dense matrix operations, for that would result in complexity quadratic in the code length. The first encoding method with linear complexity was introduced by Neal [2], who used LU decomposition to free the encoding process of dense inverse operations; however, it is not easy to find a good sparse LU decomposition for arbitrary H matrices. Richardson also developed an efficient means of encoding an LDPC code [3], but his method requires a lot of preprocessing before the encoder can do its job. Other encoding methods, including MacKay's lower triangular method [4], Haley's iterative encoding method [5] and the quasi-cyclic method [6], all aim at LDPC codes with specific constructions and are not adapted to arbitrary LDPC codes.

In the rest of this paper, we will first briefly review encoding using LU decomposition; then a simple and efficient method for finding a sparse LU decomposition will be introduced, followed by the complexity analysis. We will also present a pipelined encoder architecture with a description of its main components, and the last part gives the implementation results.

2 Overview

In systematic encoding, the parity check matrix of an (n, k) LDPC encoder can be partitioned into two submatrices B and A with dimensions [(n-k) × k] and [(n-k) × (n-k)] respectively, that is H = [B A]. Correspondingly, a codeword may be split into systematic form with the first k bits as source message bits s and the remaining (n-k) bits c as parity check bits, such that codeword = [s c].

In the general encoding method, we have H·codeword^T = 0, i.e. [B A]·[s c]^T = 0, that is

c^T = A^-1 · B · s^T    (1)

In the above formula, A^-1 is a dense matrix, and multiplication with a dense matrix is costly in hardware. Our proposed encoding method takes advantage of the sparsity of A. Let us rewrite (1) as A·c^T = B·s^T, and let y = B·s^T; we have:

A·c^T = y.    (2)

Notice the fact that when A is nonsingular, it can be decomposed into a lower triangular matrix L and an upper triangular matrix U, that is A = L·U. But since it is not always the case that all the principal minors of A are nonsingular, the LU decomposition may not exist; thus an extended LU factorization with pivoting will be used here:

P·A = L·U    (3)

Here P is the permutation matrix that records the row permutations during LU factorization. Substituting (3) into (2) we get:

L·U·c^T = P·y    (4)

Let U·c^T = z; we have L·z = P·y.

z can be obtained through forward substitution, and c^T can then be obtained through backward substitution. As L, U and P well preserve the sparsity of A, encoding can be completed without any costly dense matrix operation.

All the steps in our proposed encoding method and their computational complexity with regard to code length n are listed in Table 1.

Tab.1 Computation of c^T = U^-1·[L^-1·(P·B·s^T)]

Step  Operation                  Comment                  Complexity
1     B·s^T                      Matrix × Vector          O(n)
2     P·(B·s^T)                  Matrix × Vector          O(n)
3     L^-1·(P·B·s^T)             Forward substitution     O(n)
4     U^-1·[L^-1·(P·B·s^T)]      Backward substitution    O(n)

In the actual encoding process, the LU factorization will be preprocessed by software, and the matrix multiplications and the forward and backward substitutions will be done by hardware.

The entire encoding framework is shown in Fig. 1.

Fig.1 The Encoding Framework
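To make the software/hardware split concrete, the following is a minimal software reference model of the four steps of Table 1 over GF(2), written in Python with dense 0/1 numpy arrays for readability; the hardware instead works on the sparse row format of Section 4.2, so the function name encode_ldpc and the dense representation are ours, not the paper's implementation.

import numpy as np

def encode_ldpc(B, L, U, P, s):
    # Compute c^T = U^-1 [L^-1 (P.B.s^T)] over GF(2), following Table 1.
    # B, L, U, P are 0/1 numpy arrays; s is the 0/1 message vector.
    y = (B @ s) % 2                       # step 1: matrix x vector
    y = (P @ y) % 2                       # step 2: apply the row permutation
    m = L.shape[0]
    z = np.zeros(m, dtype=int)
    for i in range(m):                    # step 3: forward substitution, L.z = P.y
        z[i] = (y[i] + L[i, :i] @ z[:i]) % 2
    c = np.zeros(m, dtype=int)
    for i in reversed(range(m)):          # step 4: backward substitution, U.c = z
        c[i] = (z[i] + U[i, i+1:] @ c[i+1:]) % 2
    return c

A quick sanity check is that H·[s c]^T = 0 (mod 2) for the codeword [s c] produced this way.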

3 Preprocessing

In the preprocessing step, the nonsingular submatrix A will be decomposed into a lower triangular matrix L and an upper triangular matrix U, and the row permutations are stored into P at the same time.

Here we introduce a simple and efficient algorithm to get L, U and P. First a little example will be given to illustrate the extended LU factorization algorithm.

Let A =
1 1 1 0 0
1 0 1 1 0
0 0 1 0 1
1 0 0 1 0
0 0 1 1 1

Step 1: Initialization
L =            U =
1 0 0 0 0      1 1 1 0 0
0 1 0 0 0      1 0 1 1 0
0 0 1 0 0      0 0 1 0 1
0 0 0 1 0      1 0 0 1 0
0 0 0 0 1      0 0 1 1 1

Step 2: Gaussian elimination of column 1 in U
L =            U =
1 0 0 0 0      1 1 1 0 0
1 1 0 0 0      0 1 0 1 0
0 0 1 0 0      0 0 1 0 1
1 0 0 1 0      0 1 1 1 0
0 0 0 0 1      0 0 1 1 1

Step 3: Gaussian elimination of columns 2, 3 in U
L =            U =
1 0 0 0 0      1 1 1 0 0
1 1 0 0 0      0 1 0 1 0
0 0 1 0 0      0 0 1 0 1
1 1 1 1 0      0 0 0 0 1
0 0 1 0 1      0 0 0 1 0

Step 4: Pivoting by swapping rows 4 and 5 in U
L =            U =
1 0 0 0 0      1 1 1 0 0
1 1 0 0 0      0 1 0 1 0
0 0 1 0 0      0 0 1 0 1
0 0 1 1 0      0 0 0 1 0
1 1 1 0 1      0 0 0 0 1

By now we get L, U and P.

P =
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 0 1
0 0 0 1 0

Now we give the full LU decomposition algorithm; suppose A is a nonsingular matrix of size [n × n].

1) Initialization: L = I_n, U = A, P = I_n.
2) For i = 1 : n, check U_ii.
   If U_ii = 0, find the first U_ki ≠ 0 (k = i+1, i+2, ..., n); exchange rows i and k in U and P, exchange the first (i-1) elements of rows i and k in L (when i > 1), and go to 3). If no U_ki ≠ 0 (i < k ≤ n) is found, A is singular.
   Else (U_ii ≠ 0), go to 3).
3) Perform Gaussian elimination in column i and add 1 in the corresponding positions of L. Stop when i = n; otherwise return to 2) with the next i.
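As a cross-check of the algorithm above, here is a minimal dense GF(2) version in Python. The name lu_gf2 is ours, and the real preprocessor is a Matlab program working on sparse matrices, so this is only an illustrative sketch of the same steps.

import numpy as np

def lu_gf2(A):
    # Extended LU factorization with pivoting over GF(2): P.A = L.U (mod 2).
    n = A.shape[0]
    U = np.array(A, dtype=int) % 2
    L = np.eye(n, dtype=int)
    P = np.eye(n, dtype=int)
    for i in range(n):
        if U[i, i] == 0:
            # find the first row k > i with a 1 in column i (raises if A is singular)
            k = next(r for r in range(i + 1, n) if U[r, i])
            U[[i, k]] = U[[k, i]]             # exchange rows i and k in U
            P[[i, k]] = P[[k, i]]             # ... and in P
            L[[i, k], :i] = L[[k, i], :i]     # exchange only the parts of rows i, k left of the diagonal
        for r in range(i + 1, n):
            if U[r, i]:
                U[r] ^= U[i]                  # Gaussian elimination in column i (XOR = GF(2) addition)
                L[r, i] = 1                   # record the elimination in L
    return L, U, P

Running it on the 5 × 5 example matrix A above reproduces the L, U and P obtained by hand.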


Fig. 2 Overview of hardware encoder architecture

4 Encoder architecture and sub-circuits

4.1 Pipelined encoder architecture

The job left for the hardware encoder is to compute the parity check bits according to the operations listed in Table 1: in total, two matrix-by-vector multiplications, one forward substitution and one backward substitution. An overview of our hardware encoder architecture is shown in Fig. 2. The operations are grouped into four stages that run in parallel, and double buffering is used between the stages. The stages have been carefully partitioned to balance the workloads between the stages while minimizing the overall latency, idle times and memory requirements in buffering. This flexible pipelined encoder structure can support any rate and block lengths.

In stage 1, we simply write the message block into buffers; as the message block length is k, this stage takes k clock cycles. In stage 2, we perform operations 1 and 2 listed in Table 1; the results are also buffered before they are fed to the next stage. In stage 3, all the remaining operations needed to compute the parity check bits are performed. In stage 4, the parity check bits and the buffered message bits are combined to generate the codeword.

This architecture is optimized for rate 1/2 codes, for the lengths of their message bits and parity check bits are the same, which naturally balances the workloads and cycles of stages 1, 4 and stages 2, 3 without special alteration, and minimizes the memory sizes and idle cycles.

4.2 Matrix-vector multiplication

The main operation involved in stage 2 is matrix-vector multiplication (MVM). MVM computes y = A·x, where A is a sparse matrix, x and y are vectors, and y is what we want to compute. Since A is sparse, it is more efficient to store the locations of the ones instead of storing the whole matrix directly. In our implementation, the locations of the ones in each row are stored, with an extra bit indicating the end of a row. For example, if

A =
1 1 1 0 0
1 0 1 1 0
0 0 1 0 1
1 0 0 1 0
0 0 1 1 1

it will be stored as shown in Table 2.

Tab. 2 The storage of A in memory

Address   0  1  2  3  4  5  6  7  8  9  10  11  12
Data      1  2  3  1  3  4  3  5  1  4  3   4   5
End flag  0  0  1  0  0  1  0  1  0  1  0   0   1
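A small software model of this storage scheme may help; pack_rows is our own illustrative helper, assuming 1-based column indices as in Table 2 and at least one 1 per row.

def pack_rows(A):
    # Store only the 1-based column indices of the ones in each row,
    # with an end-of-row flag set on the last entry of the row (cf. Table 2).
    data, end = [], []
    for row in A:
        cols = [j + 1 for j, bit in enumerate(row) if bit]
        data += cols
        end += [0] * (len(cols) - 1) + [1]
    return data, end

For the 5 × 5 example matrix A above, pack_rows returns exactly the Data and End flag rows of Table 2.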

Fig.3 Circuit for matrix-vector multiplication

The circuit for MVM is shown in Fig. 3: the locations of the ones are used as read indices into the input vector, the end flags are used as write indices into the result vector, and an XOR gate and a D flip-flop accumulate the results.
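In software, the behaviour of this circuit can be sketched as follows; mvm_gf2 is our name for it, and x is the input vector addressed by the stored column locations.

def mvm_gf2(data, end, x):
    # Matrix-vector product over GF(2) as in Fig. 3: one XOR accumulator,
    # written out and cleared whenever the end-of-row flag is seen.
    y, acc = [], 0
    for col, last in zip(data, end):
        acc ^= x[col - 1]        # XOR gate + D flip-flop accumulate one row
        if last:
            y.append(acc)        # end of row: output the result bit
            acc = 0
    return y

With data and end produced by pack_rows above, mvm_gf2(data, end, x) returns A·x mod 2 one row at a time, which is how stage 2 computes B·s^T.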


4.3 Forward and backward substitution

Consider the equation M·x = y, where M is a sparse triangular matrix and x is the vector we want to compute.

Normally we could compute x = M^-1·y; however, matrix inversion would be a nightmare for a hardware implementation. A better way is to use forward or backward substitution, since the matrix is triangular. The circuit is shown in Fig. 4.

Fig.4 Circuit for forward and backward substitution

The above circuit is similar to that in Fig. 3, except that the previously computed elements of x are also needed in processing: the index calculator computes the locations of x to be read and to be written, and the data from the stored matrix indicates the locations of x that are needed to compute the current element of x.

5 Implementation results

The preprocessor has been implemented using Matlab; the scatter plots of a 768×1536 H matrix after preprocessing are shown in Fig. 5 (a)-(d), where ones appear as dots.

Fig.5 (a) Scatter plot of H    Fig.5 (b) Scatter plot of L

The codewords generated have been verified against Matlab for correctness. An encoder for a rate 1/2 regular (3,6) length 1536 LDPC code has been implemented on an Altera Stratix EP1S80B596C device. The design takes 5% of the logic resources and 20% of the memory resources of the device. The encoder runs steadily at a clock rate of 64 MHz, and the equivalent codeword throughput is 31 Mbps.

6 Conclusion

We have presented the hardware design of a pipelined LDPC encoder based on extended LU factorization with pivoting, which has complexity linear to block length and can support arbitrary H matrices. Efficient software has been written to preprocess the H matrix and generate L, U and P for the hardware encoder.

Our ongoing work includes the implementation of a parameterized LDPC decoder supporting arbitrary H matrices, and optimizing it for low power applications.

References

[1] R. Gallager. Low-density parity-check codes. IRE Trans. on Inform. Theory, 1962, 8:21-28
[2] R. M. Neal. Sparse matrix methods and probabilistic inference algorithms. IMA Program on Codes, Systems, and Graphical Models, 1999
[3] T. Richardson, R. Urbanke. Efficient encoding of low-density parity-check codes. IEEE Trans. on Inform. Theory, 2001, 47:638-656
[4] D. J. C. MacKay, S. T. Wilson, M. C. Davey. Comparison of constructions of irregular Gallager codes. IEEE Trans. on Communications, 1999, 47:1449-1454
[5] D. Haley, A. Grant, J. Buetefuer. Iterative encoding of low-density parity-check codes. Australian Communication Theory Workshop, 2002, Feb.
[6] A family of irregular LDPC codes with low encoding complexity. IEEE Communications Letters, 2003, Feb, 7:79-81
[7] R. L. Townsend, E. J. Weldon. Self-orthogonal quasi-cyclic codes. IEEE Trans. on Inform. Theory, 1967, 13:183-195

Fig.5 (c) Scatter plot of U Fig.5 (d) Scatter plot of P

