An Efficient Low Complexity LDPC Encoder Based On LU Factorization With Pivoting
Jia-ning Su, Zhou Jiang, Ke Liu, Xiao-yang Zeng, Hao Min
(ASIC & System State Key Lab, Fudan University, Shanghai 200433, China) Email: insufudan.edu.cn
Abstract
In this paper, we present an efficient encoder for regular and irregular low-density parity-check (LDPC) codes whose complexity is linear in the code length. Inspired by the idea of Neal, we further exploit the sparsity of the parity check matrix of LDPC codes and use extended LU factorization with pivoting in the encoding process, which is flexible and supports arbitrary matrices, code rates and block lengths. An FPGA implementation of a rate 1/2 regular (3,6) length 1536 LDPC code encoder is provided with a throughput of 31 Mbps. An efficient memory organization for storing and performing computations on sparse matrices is also presented.
1 Introduction
In the past few years, LDPC codes [1] have received much attention due to their excellent performance and the inherent parallelism of their decoders. LDPC codes are now being considered as the most promising candidate forward error correction (FEC) scheme for a wide range of applications in telecommunications and storage devices. In 2004, Europe's digital video broadcasting (DVB) standards group selected LDPC codes over Turbo codes for the next-generation digital satellite broadcasting standard.
Despite their better performance and lower decoding complexity compared to Turbo codes, LDPC codes have large encoding complexity, which constitutes the main obstacle to their hardware implementation in modern communications. Few attempts have been made to implement the encoder directly through dense matrix operations, since this results in complexity quadratic in the code length.
The first encoding method with linear complexity was introduced by Neal [2], who used LU decomposition to free the encoding process of dense inverse operations; however, it is not easy to find a good sparse LU decomposition for arbitrary H matrices. Richardson also developed an efficient means of encoding an LDPC code [3], but his method requires a lot of preprocessing before the encoder can do its job. Other encoding methods, including MacKay's lower triangular method [4], Haley's iterative encoding method [5] and the quasi-cyclic method [6], all aim at LDPC codes with specific constructions and are not suited to arbitrary LDPC codes.
In the rest of this paper, we first briefly review encoding using LU decomposition. A simple and efficient method for finding a sparse LU decomposition is then introduced, followed by the complexity analysis. We also present a pipelined encoder architecture with a description of its main components, and the last part gives the implementation results.
2 Overview
In systematic encoding, the parity check matrix $H$ of an $(n,k)$ LDPC encoder can be partitioned into two submatrices $B$ and $A$ with dimensions $[(n-k) \times k]$ and $[(n-k) \times (n-k)]$ respectively, that is, $H = [B\ A]$. Correspondingly, a codeword may be split into systematic form with the first $k$ bits as source message bits $s$ and the remaining $(n-k)$ bits $c$ as parity check bits, such that codeword $= [s\ c]$. In the general encoding method, we have $H \cdot \text{codeword}^T = 0$, i.e. $[B\ A]\begin{bmatrix} s^T \\ c^T \end{bmatrix} = 0$, that is

$$c^T = A^{-1} B s^T \qquad (1)$$

In the above formula, $A^{-1}$ is a dense matrix, and multiplication by a dense matrix is costly in hardware. Our proposed encoding method takes advantage of the sparsity of $A$. Let us rewrite (1) as $A c^T = B s^T$, and let $y = B s^T$; we have

$$A c^T = y \qquad (2)$$
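The step from the parity check condition to (1) and (2) can be spelled out as follows. This is only a restatement of the derivation above, relying on the fact that addition and subtraction coincide in GF(2):

$$H \cdot \text{codeword}^T = [B\ A]\begin{bmatrix} s^T \\ c^T \end{bmatrix} = B s^T + A c^T = 0 \;\Longrightarrow\; A c^T = B s^T = y \;\Longrightarrow\; c^T = A^{-1} B s^T .$$

No minus sign appears because $-x = x$ for every $x$ in GF(2), and the last implication requires $A$ to be nonsingular.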
Note that when $A$ is nonsingular, it may be decomposed into a lower triangular matrix $L$ and an upper triangular matrix $U$, that is, $A = LU$. However, as it is usually the case that not all the principal minors of $A$ are nonsingular, the LU decomposition may not exist. An extended LU factorization with pivoting is therefore used here:

$$P A = L U \qquad (3)$$

Here $P$ is the permutation matrix that records the row permutations during LU factorization. Substituting (3) into (2), we get

$$L U c^T = P y \qquad (4)$$

Let $U c^T = z$; we have $L z = P y$. $z$ can be obtained through forward substitution, and $c^T$ can then be obtained through backward substitution. As $L$, $U$ and $P$ preserve the sparsity of $A$ well, encoding can be completed without any costly dense matrix operation.

0-7803-9210-8/05/$20.00 ©2005 IEEE
All the steps in our proposed encoding method and their computational complexity with regard to code length $n$ are listed in Table 1.

Tab. 1 Computation of $c^T = U^{-1}[L^{-1}(P B s^T)]$

Step  Operation                      Comment                Complexity
1     $B s^T$                        Matrix × Vector        $O(n)$
2     $P B s^T$                      Matrix × Vector        $O(n)$
3     $L^{-1}(P B s^T)$              Forward substitution   $O(n)$
4     $U^{-1}[L^{-1}(P B s^T)]$      Backward substitution  $O(n)$

In the actual encoding process, LU factorization is preprocessed by software, and the matrix multiplications and the forward and backward substitutions are done by hardware.
The entire encoding framework is shown in Fig. 1.

Fig. 1 The encoding framework
3 Preprocessing
In the preprocessing step, the nonsingular submatrix $A$ is decomposed into a lower triangular matrix $L$ and an upper triangular matrix $U$, and the row permutations are stored in $P$ at the same time.
Here we introduce a simple and efficient algorithm to get $L$, $U$ and $P$. First, a small example is given to illustrate the extended LU factorization algorithm.
11100 10110 LetA= 0010 1 1 00 1 0 0011 1 Step1:Initialization
1 0000 1 1 1A 0 0
0 1 000 1 0 1 1 0
00 1 00 00 1 0 1
000 1 0 1 00 1 0
00001 00111
Step2:Gaussianelimination of colunm 1 in 1 0 0 0 0 1 1 1 0 0
1 1 000 0 1 0 1 0
00 1 00 00 1 0 1
1 00 1 0 0 1 1 1 0
00001 00111
Step3: Gaussian elimination ofcolumn2,3in
1 0000 1 1 1 0 0
1 1 000 0 1 0 1 0
00 1 00 00 1 0 1
1 1 1 1 0 0000 1
O 0 1 0 1 0 0 01 0
Step4: Pivoting byswapping row 4 and 5 in U 1 0 0 0 0 1 1 1 0 0
1 1 000 0 1 0 1 0
00 1 00 00 1 0 1
O 0 1 1 0 0 00 1 0
1 1 1 0 1 0000 1
By now we get , and P.
p
10 0 0 0 0 1 0 0 0 00 1 00 000 1 0 0000 1 p10
0100 0000
p10 0100 0000
p10 0100 0000
00 10 0
00 10 0
00 0 00 0 0 11 0
0 00 0 0 01 0 0 1
0 0 0 0 0 0 1 00 00 10 1 0
Now we give the full LU decomposition algorithm. Suppose $A$ is a nonsingular matrix of size $[n \times n]$.
(1) Initialization: $L = I_n$, $U = A$, $P = I_n$.
(2) For $i = 1:n$, check $U_{ii}$.
    If $U_{ii} = 0$:
        Find the first $U_{ki} \neq 0$ ($k = i+1, i+2, \ldots, n$).
        Exchange rows $i$ and $k$ in $U$ and $P$.
        Exchange the first $(i-1)$ elements of rows $i$ and $k$ in $L$ when $i > 1$.
        Go to (3).
        If no $U_{ki} \neq 0$ ($i < k \le n$) is found, $A$ is singular.
    Else if $U_{ii} \neq 0$, go to (3).
(3) Perform Gaussian elimination in column $i$.
    Add 1 in the corresponding positions of $L$.
    Go to (2) while $i < n$.
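The algorithm above can be sketched in a few lines of Python. This is an illustrative dense implementation over GF(2), not the authors' Matlab preprocessor; the function name `lu_gf2` is hypothetical, and indices are 0-based, so "the first (i-1) elements" of the 1-based description become the slice `[:i]`.

```python
def lu_gf2(A):
    """Factor a nonsingular 0/1 matrix so that P A = L U over GF(2).

    Returns (L, U, P) as lists of rows. Raises ValueError if A is singular.
    """
    n = len(A)
    U = [row[:] for row in A]
    L = [[int(i == j) for j in range(n)] for i in range(n)]
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        if U[i][i] == 0:
            # Pivoting: find the first row below i with a 1 in column i.
            k = next((r for r in range(i + 1, n) if U[r][i]), None)
            if k is None:
                raise ValueError("A is singular over GF(2)")
            U[i], U[k] = U[k], U[i]
            P[i], P[k] = P[k], P[i]
            # Swap only the already-filled part of L (columns before i).
            L[i][:i], L[k][:i] = L[k][:i], L[i][:i]
        # Gaussian elimination in column i: XOR the pivot row into each
        # lower row that has a 1 there, and record it in L.
        for r in range(i + 1, n):
            if U[r][i]:
                U[r] = [a ^ b for a, b in zip(U[r], U[i])]
                L[r][i] = 1
    return L, U, P

# The 5x5 matrix from the worked example above.
A = [[1,1,1,0,0],[1,0,1,1,0],[0,0,1,0,1],[1,0,0,1,0],[0,0,1,1,1]]
L, U, P = lu_gf2(A)
```

Run on the example matrix, the sketch swaps rows 4 and 5 when it reaches column 4, exactly as in Step 4 of the worked example.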
Fig. 2 Overview of hardware encoder architecture
4 Encoder architecture and sub-circuits
4.1 Pipelined encoder architecture
The job left for the hardware encoder is to compute the parity check bits according to the operations listed in Table 1: in total, two matrix-by-vector multiplications, one forward substitution and one backward substitution. An overview of our hardware encoder architecture is shown in Fig. 2. The operations are grouped into four stages that run in parallel, and double buffering is used between the stages. The stages have been carefully partitioned to balance the workloads between the stages, while minimizing the overall latency, idle times and memory requirements in buffering. This flexible pipelined encoder structure can support any rate and block length.
In stage 1, we simply write the message block into buffers; as the message block length is k, this stage takes k clock cycles. In stage 2, we perform operations 1 and 2 listed in Table 1, and the results are buffered before they are fed to the next stage. In stage 3, all the remaining operations needed to compute the parity check bits are performed. In stage 4, the parity check bits and the buffered message bits are combined to generate the codeword.
This architecture is optimized for rate 1/2 codes, because the lengths of their message bits and parity check bits are the same, which naturally balances the workloads and cycles of stages 1 and 4 and of stages 2 and 3 without special alteration, and minimizes the memory sizes and idle cycles.
4.2 Matrix-vector multiplication
The main operation involved in stage 2 is matrix-vector multiplication (MVM). MVM computes $X \cdot y = z$, where $X$ is a matrix, $y$ and $z$ are vectors, and $z$ is what we want to compute. Since $X$ is sparse, it is efficient to store the locations of the ones instead of storing the whole matrix directly. In our implementation, the locations of the ones in each row are stored together with an extra bit indicating the end of a row. For example, if

$$A = \begin{pmatrix} 1&1&1&0&0 \\ 1&0&1&1&0 \\ 0&0&1&0&1 \\ 1&0&0&1&0 \\ 0&0&1&1&1 \end{pmatrix},$$

it will be stored as shown in Table 2.
Tab. 2 The storage of A in memory

Address   0  1  2  3  4  5  6  7  8  9  10 11 12
Data      1  2  3  1  3  4  3  5  1  4  3  4  5
End flag  0  0  1  0  0  1  0  1  0  1  0  0  1

Fig. 3 Circuit for matrix-vector multiplication

The circuit for MVM is shown in Fig. 3. The locations of the ones are used as the index for $y$, the end flags are used as the index for $z$, and an XOR gate and a D flip-flop accumulate the results.
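The storage scheme and the accumulate-until-end-flag behaviour of the MVM circuit can be modelled in software as follows. This is a behavioural sketch only, with hypothetical function names; it assumes 1-based column locations in the style of Table 2 and that every row contains at least one 1, since an all-zero row has no word on which to carry an end flag.

```python
def pack_rows(matrix):
    """Flatten a 0/1 matrix into parallel lists (data, end_flags).

    data[a] is the 1-based column of a one; end_flags[a] is 1 on the
    last one of each row. Rows are assumed to be non-empty.
    """
    data, end_flags = [], []
    for row in matrix:
        cols = [j + 1 for j, bit in enumerate(row) if bit]
        for pos, col in enumerate(cols):
            data.append(col)
            end_flags.append(int(pos == len(cols) - 1))
    return data, end_flags

def mvm(data, end_flags, y):
    """Compute z = X y over GF(2) from the packed representation of X."""
    z, acc = [], 0
    for col, end in zip(data, end_flags):
        acc ^= y[col - 1]      # XOR gate plus flip-flop: accumulate one term
        if end:                # end flag: emit this element of z, reset
            z.append(acc)
            acc = 0
    return z

# The 5x5 example matrix A from the text.
A = [[1,1,1,0,0],[1,0,1,1,0],[0,0,1,0,1],[1,0,0,1,0],[0,0,1,1,1]]
data, end_flags = pack_rows(A)
z = mvm(data, end_flags, [1, 0, 1, 1, 0])
```

The memory cost is one word per nonzero entry, so for a sparse matrix with a bounded number of ones per row both storage and the number of clock cycles grow linearly with the number of rows.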
4.3 Forward and backward substitution
Consider the equation $X \cdot z = y$, where $X$ is a sparse triangular matrix and $z$ is the vector we want to compute. Normally we could compute $z = X^{-1} \cdot y$; however, matrix inversion would be a nightmare for hardware implementation.
A better way is to use forward or backward substitution, as $X$ is triangular. The circuit is shown in Fig. 4.

Fig. 4 Circuit for forward and backward substitution

The above circuit is similar to that in Fig. 3, except that the previous elements of $z$ are also needed in processing. The index calculator computes the location of $y$ to be read and of $z$ to be written, and the data from the $X$ matrix indicates the locations of $z$ that are needed to compute the current element of $z$.
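The substitution step can be modelled along the same lines. The sketch below is an assumption-laden software analogue of the circuit, not a description of it: the function name and the row-list representation (0-based column indices of the ones in each row) are hypothetical, and the triangular matrix is assumed to have ones on its diagonal, as L and U from the worked example in Section 3 do.

```python
def substitute(rows, y, backward=False):
    """Solve X z = y over GF(2) for a sparse triangular X with unit diagonal.

    rows[i] lists the 0-based column indices of the ones in row i of X.
    Use backward=False for lower triangular X, backward=True for upper.
    """
    n = len(rows)
    z = [0] * n
    order = range(n - 1, -1, -1) if backward else range(n)
    for i in order:
        acc = y[i]                 # read the current element of y
        for j in rows[i]:
            if j != i:
                acc ^= z[j]        # previously computed elements of z
        z[i] = acc                 # write the current element of z
    return z

# Sparse row lists of L and U from the worked example in Section 3.
L_rows = [[0], [0, 1], [2], [2, 3], [0, 1, 2, 4]]
U_rows = [[0, 1, 2], [1, 3], [2, 4], [3], [4]]

z = substitute(L_rows, [1, 0, 1, 1, 0])       # forward substitution
c = substitute(U_rows, z, backward=True)      # backward substitution
```

Only the nonzero entries of each row are visited, so the work per output element is proportional to the row weight rather than to the matrix dimension.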
5 Implementation results
The preprocessor has been implemented using Matlab. The scatter plots of a 768×1536 H matrix after preprocessing are shown in Fig. 5 (a)-(d), where ones appear as dots.

Fig. 5 (a) Scatter plot of H    Fig. 5 (b) Scatter plot of L

The codewords generated have been verified against Matlab for correctness. An encoder for a rate 1/2 regular (3,6) length 1536 LDPC code has been implemented on an Altera Stratix EP1S80B596C device. The design takes 5% of the logic resources and 20% of the memory resources of the device. The encoder runs steadily at a clock rate of 64 MHz, and the equivalent codeword throughput is 31 Mbps.
6 Conclusion
We have presented the hardware design of a pipelined LDPC encoder based on extended LU factorization with pivoting, which has complexity linear in the block length and can support arbitrary H matrices. Efficient software has been written to preprocess the H matrix and generate L, U and P for hardware encoders.
Our ongoing work includes the implementation of a parameterized LDPC decoder that supports arbitrary H matrices, and its optimization for low-power applications.
References
[1] R. Gallager. Low-density parity-check codes. IRE Trans. on Inform. Theory, 1962, 8:21-28
[2] R. M. Neal. Sparse matrix methods and probabilistic algorithm. IMA Program on Codes, Systems, and Graphical Models, 1999
[3] T. Richardson, R. Urbanke. Efficient encoding of low-density parity-check codes. IEEE Trans. on Inform. Theory, 2001, 47:638-656
[4] D. J. C. MacKay, S. T. Wilson, M. C. Davey. Comparison of constructions of irregular Gallager codes. IEEE Trans. on Communications, 1999, 47:1449-1454
[5] D. Haley, A. Grant, J. Buetefer. Iterative encoding of low-density parity-check codes. In Australian Communication Theory Workshop, 2002, Feb.
[6] A family of irregular LDPC codes with low encoding complexity. IEEE Communication Letters, 2003, Feb, 7:79-81
[7] R. L. Townsend, E. J. Weldon. Self-orthogonal quasi-cyclic codes. IEEE Trans. on Inform. Theory, 1967, 13:183-195

Fig. 5 (c) Scatter plot of U    Fig. 5 (d) Scatter plot of P