

Arbitrary Bit Generation and Correction Technique for Encoding QC-LDPC Codes with Dual-Diagonal Parity Structure

Chanho Yoon, Eunyoung Choi, Minho Cheong and Sok-kyu Lee
Next Generation WLAN Research Team, ETRI
161 Gajeong-dong, Yuseong-gu, Daejeon, 305-700, KOREA
Email: [email protected]

Abstract: In this paper, we propose a simple, low-complexity systematic encoding method for a class of quasi-cyclic low-density parity-check (QC-LDPC) codes, which admit efficient encoding and decoding algorithms owing to the simple structure of their parity-check matrices. The proposed encoding method is applicable to parity-check matrices having a dual-diagonal parity structure with a single column of weight 3. Instead of solving directly for the parity bits as in the schemes of [5][6], the proposed scheme first generates arbitrary parity bits for the first sub-block. Then, given these bits and exploiting the dual-diagonal structure, all remaining parity bits are found and subsequently corrected. With a slight modification of the parity-check matrix H, the proposed LDPC encoding scheme is directly applicable to the matrices defined in IEEE physical layer standards [1][2] with almost negligible performance loss. Moreover, the overall computational complexity of the encoding process is lower than that of Richardson’s well-known efficient encoding scheme [5].

I. INTRODUCTION

Low-density parity-check (LDPC) codes were first discovered by Gallager and rediscovered by MacKay et al. [4]. LDPC codes have attracted attention mainly due to their performance and low decoding complexity. Their performance is very close to the Shannon capacity limit under the assumption of a long codeword length.

In many ways, LDPC codes exhibit asymptotically better performance than turbo codes at lower decoding complexity. Recently, many practical efforts to implement LDPC codes have been observed. LDPC may end up as the standard scheme in a host of sectors including communications, broadcasting and even hard disk drives (HDD). In the area of WLAN (Wireless Local Area Network) applications, the IEEE 802.11n Task Group (TGn) has approved the Enhanced Wireless Consortium (EWC) proposal as the draft version of the 802.11n standard, and the upcoming 802.11n standard adopts LDPC channel coding as an optional coding scheme to offer higher reliability at the PHY level [1].

Alongside these advantages, LDPC codes have some weak points. They require large memory for processing, and their encoding complexity is generally higher than that of convolutional turbo codes [4].

To reduce both memory size and encoding complexity, quasi-cyclic LDPC (QC-LDPC) codes [7] have been employed; their performance is almost the same as that of general LDPC codes. Another advantage of QC-LDPC codes is that their code properties are easier to analyze than those of random LDPC codes. In fact, the LDPC parity-check matrices in standards such as IEEE 802.11n [1] and IEEE 802.16e [2] are based on QC-LDPC codes.

Compared to general LDPC codes, quasi-cyclic LDPC codes are memory efficient due to their algebraic structure. They can be encoded in linear time with shift registers, and the memory required for storing them is reduced by a factor of 1/Z if Z × Z circulant permutation matrices are used. Moreover, further efforts have been made to reduce the encoding complexity. A low-complexity LDPC encoding method with near-linear complexity was introduced in [5], and it can be further simplified by employing LDPC codes whose binary base parity-check matrices have a dual-diagonal structure with a single weight-3 column, as found in standards such as IEEE 802.11n and IEEE 802.16e.

In this paper, we propose a novel low-complexity correction-based QC-LDPC encoding technique. The proposed encoding method is directly applicable to the usual dual-diagonal QC-LDPC codes if a small modification is allowed in the parity part of the mother matrix H. Instead of solving directly for the parity bits as in the schemes of [5][6], the proposed scheme first generates arbitrary parity bits for the first sub-block. Then, given these bits and exploiting the dual-diagonal structure, all remaining parity bits are found. Lastly, the arbitrarily generated bits and the bits derived sequentially from them are corrected. We report that the proposed encoding scheme is applicable to the transmission standards in [1][2] if a small modification of the standard H matrices is allowed, while its complexity is lower than that of Richardson’s well-known efficient encoding scheme [5].

The rest of this paper is organized as follows. In Section II, we describe the general structure of QC-LDPC codes. Next, we present the conventional and proposed LDPC encoding schemes in Section III. In Section IV, we evaluate and compare their complexity. In Section V, the link-level performance of the modified draft 802.11n parity-check matrices H is compared with that of the original H matrices. Lastly, conclusions are drawn in Section VI.


II. QC-LDPC CODES WITH DUAL-DIAGONAL PARITY STRUCTURE

In this section, we first give an overview of QC-LDPC codes based on circulant sub-matrices. QC-LDPC codes can be encoded with low complexity even for long codes. The encoder of a quasi-cyclic LDPC code can be implemented using shift registers, with encoding complexity linearly proportional to the code length.

In QC-LDPC codes, the parity-check matrix H can be partitioned into square sub-blocks (sub-matrices) of size Z × Z. For example, in [1], three sub-block sizes are suggested: Z=27, Z=54, and Z=81. Let P_{i,j}^{k_{i,j}} denote the sub-block located at the i-th block row and j-th block column, which is either the Z × Z zero sub-block or the Z × Z identity matrix I cyclically shifted to the right k_{i,j} times, 0 ≤ k_{i,j} < Z. With these basic sub-blocks P, the mZ × nZ parity-check matrix H is defined as

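$$
H = \begin{bmatrix}
P_{0,0}^{k_{0,0}} & P_{0,1}^{k_{0,1}} & \cdots & P_{0,n-1}^{k_{0,n-1}} \\
P_{1,0}^{k_{1,0}} & P_{1,1}^{k_{1,1}} & \cdots & P_{1,n-1}^{k_{1,n-1}} \\
\vdots & \vdots & \ddots & \vdots \\
P_{m-1,0}^{k_{m-1,0}} & P_{m-1,1}^{k_{m-1,1}} & \cdots & P_{m-1,n-1}^{k_{m-1,n-1}}
\end{bmatrix} \qquad (1)
$$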

The QC-LDPC codes suggested in the latest high-throughput PHY standards such as [1][2] are systematic, i.e., the encoder maps an information block of size k, s = (s₀, s₁, ..., s_{k−1})^T, into a codeword vector c of size n, c = (s₀, s₁, ..., s_{k−1}, p₀, p₁, ..., p_{n−k−1})^T, by adding n − k parity bits chosen so that c satisfies Eq. (2):

H· c = 0 (2)

where H is the parity-check matrix. The matrix H is divided into two regions: H_s is the sub-matrix that multiplies the systematic bits, and H_p is the sub-matrix that multiplies the parity portion of the codeword, such that H = [H_s H_p].

The QC-LDPC codes in [1][2] have a dual-diagonal parity structure. The parity portion H_p of the matrix can be further decomposed into two sub-matrices as

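$$
H_p = [\, h_p \;\; H'_p \,] =
\begin{bmatrix}
h_0 & 0 & - & - & \cdots & - & - \\
h_1 & 0 & 0 & - & \cdots & - & - \\
h_2 & - & 0 & 0 & \cdots & - & - \\
\vdots & \vdots & & \ddots & \ddots & & \vdots \\
h_{m-2} & - & - & - & \cdots & 0 & 0 \\
h_{m-1} & - & - & - & \cdots & - & 0
\end{bmatrix} \qquad (3)
$$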

where 0 denotes the identity matrix I_{Z×Z} with zero cyclic shift and − denotes the Z × Z zero sub-block. The vector-like sub-matrix h_p is a block column of weight 3 (e.g., h_p = [1, −, ..., 0, −, ..., 1]^T), where h_0 denotes the cyclic shift at the first block row. Consequently, matrix H_p has a dual-diagonal structure.

Fig. 1. H matrix for codeword length 1944 of 802.11n draft with R=1/2 (systematic part: 972 bits; parity part: 972 bits).

Fig. 2. Modified H matrix for codeword length 1944 of 802.11n draft with R=1/2 (systematic part: 972 bits; parity part: 972 bits; the entries of the weight-3 parity column are replaced with zero cyclic shift).

III. ENCODING PROCEDURES FOR QC-LDPC CODES

With a large codeword length n, LDPC coding also imposes a significant increase in computational overhead, and this problem has been addressed in the literature. During encoding, the LDPC computational overhead is proportional to n² or n³, so as n increases the overhead grows significantly faster than with Reed-Solomon coding. This problem was alleviated by T. Richardson et al. [5], who in 2001 presented a method that effectively reduces the amount of computation required in a number of special situations.

With a dual-diagonal parity structure, the modified parity-check matrix H greatly reduces the encoding complexity, so that the computational overhead is proportional to n, as in Reed-Solomon coding. In this section, we compare our proposed encoding scheme with Richardson’s well-known scheme.

A. Conventional Efficient Encoding Scheme

The efficient encoding method proposed by Richardson et al. [5] assumes that H is in approximate lower triangular form. The parity-check matrix H has the form

$$
H = \begin{bmatrix} A & B & T \\ C & D & E \end{bmatrix} \qquad (4)
$$

where A is Z(m − 1) × Z(n − m), B is Z(m − 1) × Z, T is Z(m − 1) × Z(m − 1), C is Z × Z(n − m), D is Z × Z, and, finally, E is Z × Z(m − 1). Therefore, sub-matrices A and C correspond to the systematic part H_s, and sub-matrices B, D, T, and E belong to the parity part H_p. Further, the vector h_p in Eq. (3) becomes h_p = [B^T D^T]^T. All these sub-matrices are sparse, and T is lower triangular with identity matrices along the diagonal.

After some matrix manipulation, the relation between the H matrix in Eq. (4) and the codeword vector c of Eq. (2), written as c = (s^T, p₀^T, p₁^T)^T, is summarized as

As + Bp₀ + Tp₁ = 0 (5)

(−ET⁻¹A + C)s + (−ET⁻¹B + D)p₀ = 0 (6)

where −ET⁻¹ is a Z × Z sub-block by sub-block addition operation (i.e., I − I + I − I ... − I + I) which accumulates the columns of sub-matrix A. Note that −ET⁻¹B + D = I, since the sum of all sub-blocks in the weight-3 column of the matrix H_p suggested in standards such as [1] is simply the Z × Z identity matrix I.

Solving Eqs. (5) and (6) leads to a direct solution for the parity vectors p₀ and p₁. Thus, the parity bit vectors are obtained as

p₀ = (−ET⁻¹A+ C)s (7)

Tp₁ = As + Bp₀ (8)

In summary, the first parity vector p₀ is obtained through accumulation of the input bits. The second parity vector p₁ is obtained through a block accumulation operation, by plugging p₀ into Eq. (8) and exploiting the dual-diagonal lower triangular matrix T.
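For the dual-diagonal structure of Eq. (3), Eqs. (7) and (8) reduce to a block accumulation followed by a forward substitution. The sketch below illustrates this under stated assumptions rather than reproducing the procedure of [5] in general: the weight-3 column carries shift h at the first and last block rows and zero shift at block row m/2 (the pattern h_p = [1, −, ..., 0, −, ..., 1]^T quoted in Section II), m is even with m ≥ 4, and the vector x = H_s s (the stacked results of As and Cs) is already available. All function names and the test input are hypothetical.

```python
def xor(a, b):
    """Bitwise modulo-2 addition of two equal-length bit lists."""
    return [u ^ v for u, v in zip(a, b)]


def rot(v, k):
    """Multiply by P^k (identity shifted right k times): entry r reads v[(r + k) % Z]."""
    k %= len(v)
    return v[k:] + v[:k]


def direct_parity(x, Z, m, h=1):
    """Return the m parity blocks for x = H_s s under the assumed shift pattern."""
    xb = [x[i * Z:(i + 1) * Z] for i in range(m)]
    # Eq. (7): summing all block rows cancels the dual diagonal and leaves p0.
    p0 = [0] * Z
    for block in xb:
        p0 = xor(p0, block)
    # Eq. (8): forward substitution along the dual diagonal.
    p = [p0, xor(xb[0], rot(p0, h))]
    for i in range(1, m - 1):
        nxt = xor(xb[i], p[i])
        if i == m // 2:              # middle entry of the weight-3 column (zero shift)
            nxt = xor(nxt, p0)
        p.append(nxt)
    return p


def syndromes(x, p, Z, m, h=1):
    """Evaluate every block row of H c; all entries are zero for a valid codeword."""
    xb = [x[i * Z:(i + 1) * Z] for i in range(m)]
    rows = [xor(xor(xb[0], rot(p[0], h)), p[1])]
    for i in range(1, m - 1):
        row = xor(xor(xb[i], p[i]), p[i + 1])
        rows.append(xor(row, p[0]) if i == m // 2 else row)
    rows.append(xor(xor(xb[m - 1], rot(p[0], h)), p[m - 1]))
    return rows


x = [0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0]   # hypothetical x with Z = 4, m = 4
p = direct_parity(x, 4, 4)
```

The last block row is never used to solve for a parity block; it is satisfied automatically because p₀ was obtained from the sum of all block rows.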

B. Proposed Arbitrary Bit-Generation and Correction Encoding

To describe the encoding process with the standard H matrices in [1], a modification of the parity-check matrix is required. As shown in Fig. 2, all entries of the weight-3 column in the parity portion are set to zero cyclic shift. Although modifying the parity matrix H_p could cause performance loss due to short cycles, which would introduce an error floor, we later show by simulation in Section V that no considerable performance degradation occurs even for a short codeword length with a high code rate.

There are three main phases in our encoding approach: the first is arbitrary parity-bit generation, the second is a sequential process that finds the remaining parity bits by exploiting the dual-diagonal structure, and the third is a correction process for the parity bits.

As is true for all types of LDPC codes, the output codeword vector c must satisfy the parity check H · c = 0.

After modification of the rate R=1/2 mother matrix H, it can be sectorized into three sub-matrices as shown in Fig. 3: the information bit region A, the parity bit region Q where bit flipping is applied, and the parity bit region U where no bit flipping is applied,

H = [A Q U] (9)

x = As (10)

The parity part of matrix H is partitioned into two parts, Q and U. The boundary line is placed between the second and third sub-blocks in the block row that contains three identity matrices.

Fig. 3. Sectorized H matrix for codeword length 1944 (systematic part A: 972 bits; parity part, split into regions Q and U: 972 bits).

For example, three sub-blocks with zero cyclic shift are located in the (m/2)-th block row in Fig. 3. Thus, the boundary line between Q and U is set between the (m/2)-th and (m/2 + 1)-th block columns of the parity part.

The vector x is formed by multiplying the information bit vector s by sub-matrix A, as defined in Eq. (10). The proposed LDPC encoder starts by generating Z arbitrary parity bits p₀, p₁, ..., p_{Z−1} for the first block column in region Q. For example, p₀, p₁, ..., p_{Z−1} can all be set to zero. Assuming the all-zero guess is correct, the parity-bit values p_Z, p_{Z+1}, ..., p_{2Z−1} are determined, since (x₀, x₁, ..., x_{Z−1})^T + Q_{0...2Z−1} · (p₀, p₁, ..., p_{2Z−1})^T = 0 holds for the first sub-block row. Next, p_{2Z}, p_{2Z+1}, ..., p_{3Z−1} are determined sequentially, since (x_Z, x_{Z+1}, ..., x_{2Z−1})^T + Q_{Z...3Z−1} · (p_Z, p_{Z+1}, ..., p_{3Z−1})^T = 0. Note that p_Z, p_{Z+1}, ..., p_{2Z−1} as well as x_Z, x_{Z+1}, ..., x_{2Z−1} have already been found. Exploiting the dual-diagonal parity structure, this recursive procedure continues until all parity bits (i.e., p₀, ..., p_{mZ−1}) are determined.

After the recursion, the validity of the last sub-block of parity bits, p_{(m−1)Z}, p_{(m−1)Z+1}, ..., p_{mZ−1}, located at the (m−1)-th block row, is checked. The last Z parity bits must satisfy (x_{(m−1)Z}, x_{(m−1)Z+1}, ..., x_{mZ−1})^T + (p₀, p₁, ..., p_{Z−1})^T + (p_{(m−1)Z}, p_{(m−1)Z+1}, ..., p_{mZ−1})^T = 0.

If some parity bits are not correctly generated, their check results are nonzero and the check fails for those specific bits. The final check results are stored in a vector f.

f = (x_{(m−1)Z}, ..., x_{mZ−1})^T + (p₀, p₁, ..., p_{Z−1})^T + (p_{(m−1)Z}, p_{(m−1)Z+1}, ..., p_{mZ−1})^T (11)

Thus, vector f is f = (x₈₉₁, ..., x₉₇₁)^T + (p₀, p₁, ..., p₈₀)^T + (p₈₉₁, p₈₉₂, ..., p₉₇₁)^T in the case of R=1/2, codeword length n=1944, and sub-block size Z=81 in [1]. With this final check result vector f, the parity bits located in region Q are corrected by observing the locations of the ones in f. For example, if f₁₃ and f₃₀ are one, then parity bits p₁₃, p₃₀, p₉₄, p₁₁₁, p₁₇₅, p₁₉₂, p₂₅₆, p₂₇₃, p₃₃₇, p₃₅₄, p₄₁₈, p₄₃₅, p₄₉₉, and p₅₁₆ in region Q are flipped. This bit-by-bit flipping can be replaced by a single XOR of region Q with a vector v, an augmented version of f whose length equals the row length of sub-matrix Q. Thus, v is expressed as v = (f^T, f^T, f^T, f^T, f^T, f^T, f^T)^T.
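The index pattern in this example can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `correct_region_q` is hypothetical, and the all-zero parity vector is a placeholder used to make the flipped positions visible.

```python
def correct_region_q(p, f, copies):
    """XOR the first copies*len(f) parity bits with v = (f, f, ..., f); leave the rest as is."""
    v = f * copies                      # augmented version of f
    head = [bit ^ e for bit, e in zip(p[:len(v)], v)]
    return head + p[len(v):]


Z, copies = 81, 7                       # R = 1/2, n = 1944 case: region Q spans 7 sub-blocks
f = [0] * Z
f[13] = f[30] = 1                       # check failures at f13 and f30

p = [0] * 972                           # placeholder parity vector
corrected = correct_region_q(p, f, copies)
flipped = [i for i, bit in enumerate(corrected) if bit == 1]
# flipped lists the positions j*Z + 13 and j*Z + 30 for j = 0, ..., 6
```

Every flipped index is smaller than 7 × 81 = 567, so region U (bits 567 to 971) is untouched.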

Fig. 4. Simple example of proposed encoding process with Z=4 (4 × 4) sub-block size. (Annotations in the figure: first recursion step p₄ = x₀ + p₀, p₅ = x₁ + p₁, p₆ = x₂ + p₂, p₇ = x₃ + p₃; final check x₁₂ + p₀ + p₁₂ = 1, x₁₃ + p₁ + p₁₃ = 0, x₁₄ + p₂ + p₁₄ = 0, x₁₅ + p₃ + p₁₅ = 0, giving the check vector f; parity regions Q and U.)

Alternatively, the parity bits in region Q can be corrected by the vector addition (p₀, ..., p₅₆₆)^T = (p₀, ..., p₅₆₆)^T + v. Note that no bit-flipping or XOR operation is required for the parity bits in region U. The proposed LDPC encoding is summarized in the following steps.

Step 1) Form the accumulated information-bit vector x by the matrix operation x = As.

Step 2) Set parity bits p₀, p₁, ..., p_{Z−1} to arbitrary binary values. Exploiting the dual-diagonal parity structure, solve for the unknown parity bits in H · (s^T, p₀, ..., p_{mZ−1})^T = 0 by recursion.

Step 3) Store the final check result vector f, (f₀, ..., f_{Z−1})^T = (x_{(m−1)Z}, ..., x_{mZ−1})^T + (p₀, ..., p_{Z−1})^T + (p_{(m−1)Z}, ..., p_{mZ−1})^T, for correction of the initially calculated parity bits, and create a vector v, an augmented version of f with the column length of block Q: v = (f^T, f^T, ..., f^T)^T. The number of copies of f in v is m/2 + 1 in the case of the 802.11n draft standard.

Step 4) Add vector v to parity bits p₀, p₁, ..., p_{(m/2+1)Z−1} in region Q to correct them: (p₀, ..., p_{(m/2+1)Z−1})^T = (p₀, ..., p_{(m/2+1)Z−1})^T + v. Parity bits in block U are not changed.
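Steps 2 to 4 can be sketched in Python as follows, taking the vector x of Step 1 as input. This is a minimal illustration, not a reference implementation: it assumes the zero-shift weight-3 column of Fig. 2 with its three entries at block rows 0, m/2 and m−1, an even m ≥ 4, and parity blocks indexed from 0. The names `encode_parity` and `guess` and the input vector x are hypothetical; x is chosen here so that one of the four final checks fails, as in the Z=4, m=4 example of Fig. 4.

```python
def xor(a, b):
    """Bitwise modulo-2 addition of two equal-length bit lists."""
    return [u ^ v for u, v in zip(a, b)]


def encode_parity(x, Z, m, guess=None):
    """Return (parity_bits, f) for x = A s under the assumed block structure.

    Block row 0 checks parity blocks 0 and 1, block rows 1 .. m-2 check blocks
    i and i+1 (block row m/2 additionally checks block 0), and block row m-1
    checks blocks 0 and m-1.
    """
    xb = [x[i * Z:(i + 1) * Z] for i in range(m)]
    mid = m // 2

    # Step 2: arbitrary first block, then recursion over block rows 0 .. m-2.
    p = [list(guess) if guess is not None else [0] * Z]
    for i in range(m - 1):
        nxt = xor(xb[i], p[i])
        if i == mid:                     # block row m/2 also contains block 0
            nxt = xor(nxt, p[0])
        p.append(nxt)

    # Step 3: final check on block row m-1, Eq. (11).
    f = xor(xor(xb[m - 1], p[0]), p[m - 1])

    # Step 4: add f to blocks 0 .. m/2 (region Q); region U is left unchanged.
    for j in range(mid + 1):
        p[j] = xor(p[j], f)

    return [bit for block in p for bit in block], f


x = [0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0]   # hypothetical x = A s
parity, f = encode_parity(x, Z=4, m=4)
# f      -> [1, 0, 0, 0]
# parity -> [1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
```

Calling `encode_parity` with any other `guess` returns the same parity bits, which reflects that the initial bits are arbitrary and only f changes.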

We note that the proposed encoding scheme works if the cyclic shift values in the weight-3 column h_p of the parity matrix H_p in Eq. (3) are all the same. In the H matrix of Fig. 2, zero cyclic shifts are set for column h_p, in contrast to Fig. 1.
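One way to see why a single check vector is enough to correct region Q is sketched below for the zero-shift case of Fig. 2. The symbols are introduced only for this note and are not the paper's notation: p_j is the true j-th parity block, \hat{p}_j the block produced by the recursion, x_j the j-th block of x, and e the unknown error of the arbitrary first block. All additions are modulo 2.

```latex
\begin{align*}
\hat{p}_0 &= p_0 + e \\
\hat{p}_{j+1} &= x_j + \hat{p}_j = p_{j+1} + e, && 0 \le j < m/2 \\
\hat{p}_{m/2+1} &= x_{m/2} + \hat{p}_0 + \hat{p}_{m/2} = p_{m/2+1} + e + e = p_{m/2+1} \\
\hat{p}_{j+1} &= x_j + \hat{p}_j = p_{j+1}, && m/2 < j < m-1 \\
f &= x_{m-1} + \hat{p}_0 + \hat{p}_{m-1} = (x_{m-1} + p_0 + p_{m-1}) + e = e
\end{align*}
```

Under these assumptions, blocks 0 to m/2 all carry the same error e, the error cancels at block row m/2, and the final check returns f = e, so adding f to the m/2 + 1 blocks of region Q restores the true parity bits.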

An example of the proposed encoding scheme with Z=4, m=4 is illustrated in Fig. 4. The arbitrary bits p₀, ..., p₃ are set to all zeros. Given the pre-calculated block-accumulated information-bit vector x, the final check result vector f, after going through Step 2, has one check failure out of four bits.

TABLE I
COMPLEXITY COMPARISON (NUMBER OF MODULO-2 ADDITIONS)

Parameter   Richardson’s Scheme   Proposed Scheme
p₀          4941                  -
As          4455                  4941
Bp₀         -                     -
Tp₁         972                   972
f           -                     162
v           -                     486
Total       10368                 6561

Therefore, parity bits p₀, p₄ and p₈ are flipped. Alternatively, the bit-correction vector v = (f^T, f^T, f^T)^T is added to the parity bit vector p = (p₀, ..., p₁₁)^T. Thus, the final parity bits from p₀ to p₁₅ are 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, respectively.

IV. COMPLEXITY COMPARISON

In this section, we compare the complexity of our proposed scheme with that of Richardson’s scheme [5]. We analyze the number of modulo-2 additions required during the encoding process. Since a general expression for the number of additions depends on the average number of edges, we compare the complexity through direct numerical analysis with the parity-check matrices H of Figs. 1 and 2.

As shown in Table I, p₀ is the parity bit vector corresponding to the weight-3 column in H_p. In Eq. (7), row vector addition is required. There are 61 non-zero sub-blocks in matrices A and C, as described in Eq. (4). Thus, the systematic part requires 61 × 81 = 4941 additions to calculate vector p₀. The matrix operation As takes 55 × 81 = 4455 additions for Richardson’s scheme and 61 × 81 = 4941 additions for the proposed scheme. The Bp₀ operation merely constructs a 972 × 1 column vector from p₀ and its cyclically shifted version, requiring no additions but memory. The matrix operation Tp₁ is simply a sequential process for solving for the parity bits, since any codeword c must satisfy Eq. (2); it therefore takes a total of 12 × 81 = 972 additions. The proposed scheme requires extra additions: the final check result vector f takes 162 additions, and the correction vector v takes 486 additions (Table I). As listed in Table I, the simple correction-based LDPC encoding scheme requires fewer additions in total than the conventional scheme.
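The totals of Table I follow directly from the per-operation counts quoted above; the short Python check below only reproduces that arithmetic. The dictionary names are illustrative, and the factorizations 162 = 2 × 81 and 486 = 6 × 81 are arithmetic observations rather than a statement of how those counts were derived.

```python
Z = 81  # sub-block size for n = 1944

# Number of modulo-2 additions per operation, as listed in Table I.
richardson = {"p0": 61 * Z, "As": 55 * Z, "Bp0": 0, "Tp1": 12 * Z}
proposed = {"As": 61 * Z, "Tp1": 12 * Z, "f": 2 * Z, "v": 6 * Z}

total_richardson = sum(richardson.values())   # 4941 + 4455 + 0 + 972
total_proposed = sum(proposed.values())       # 4941 + 972 + 162 + 486
ratio = total_proposed / total_richardson

print(total_richardson, total_proposed, round(ratio, 3))
```

With these counts the proposed scheme needs 6561 additions against 10368, i.e. roughly 63% of the additions of the conventional scheme for this matrix.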

V. SIMULATION RESULTS

In this section, we present the performance of the LDPC codes by comparing simulation results for the modified H matrix against the cycle-optimized standard H matrix in [1]. The encoding process itself has no effect on the performance of dual-diagonal QC-LDPC codes, but the modification of the weight-3 column in H_p, illustrated in Fig. 2, degrades performance because it introduces short cycles, especially of length four, in the Tanner graph, which deteriorate the performance of the decoding algorithm.

The simulation results are based on the prototype parity-check matrices defined in the draft version of the 802.11n standard [1]. All block sizes, n=648, 1296, and 1944, are simulated with various code rates. We apply the AWGN channel model, and the modulation is fixed to BPSK. The iterative min-sum algorithm, which does not take the noise-variance factor L_c = 2/σ² into account, is applied for decoding, and the maximum number of iterations is set to 50.
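For reference, the check-node update at the core of a min-sum decoder can be sketched as below. This is the generic textbook form, shown only to illustrate the decoding rule named above; it is not the simulator used for the results in this section, and the function name is hypothetical.

```python
def min_sum_check_update(incoming):
    """Return the outgoing check-to-variable messages for one check node.

    incoming[i] is the log-likelihood ratio received on edge i. The message sent
    back on edge i uses every input except incoming[i]: its sign is the product
    of the other signs and its magnitude is the smallest of the other magnitudes.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(sign * min(abs(value) for value in others))
    return outgoing


messages = min_sum_check_update([2.0, -1.0, 3.0])
scaled = min_sum_check_update([4.0, -2.0, 6.0])   # same inputs multiplied by 2
```

Because each output is a signed minimum of the inputs, multiplying all inputs by a positive constant multiplies all outputs by the same constant and leaves their signs unchanged. This is one reason a plain min-sum decoder can operate without the factor L_c.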

Fig. 5 shows simulation results for the different encoding schemes using the parity-check matrices defined for codeword size 648. As expected, an error floor is observable, and it becomes noticeable at a BLER of 10⁻⁴. In Figs. 6 and 7 as well, the BLER performance is close to that of the standard H matrix for low code rates, but as the code rate increases or the codeword length decreases, the error floor due to deviation from the cycle-optimized design becomes apparent. However, we estimate that with an error floor at a BLER of 10⁻⁴, the target PER, typically set at the 10⁻² level in general cases [3], can easily be achieved.

In other words, the particular pattern of the parity part of the H matrix required by the proposed encoding scheme does not induce any noticeable performance degradation from a practical point of view.

VI. CONCLUSIONS

In this paper, we proposed a new low-complexity encoding method for QC-LDPC codes. We have demonstrated that its overall computational complexity is lower than that of the conventional efficient encoding scheme. Moreover, with a slight modification of the parity part of H, the proposed LDPC encoding scheme is applicable to current WLAN and WiMAX standards whose matrices have a dual-diagonal structure with one weight-3 parity column, and its performance in practical situations is comparable to that of the dedicated irregular LDPC codes suggested in the standards. For classes of quasi-cyclic LDPC codes based on circulant permutation matrices, we expect the proposed encoding scheme to be suitable for hardware implementation.

VII. ACKNOWLEDGEMENT

This work is supported by Ministry of Information and Communications (MIC) of Korea under Grant 2006-S-002-01.

REFERENCES

[1] IEEE P802.11n^TM/D1.02, “Draft Amendment to STANDARD Information Technology Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: Enhancements for Higher Throughput,” IEEE 802.11 document, July 2006.

[2] IEEE P802.16e^TM, “Part 16: Air Interface for Fixed and Mobile Broadband Wireless Access Systems,” IEEE 802.16 document, Feb. 2005.

[3] A. P. Stephens, “IEEE 802.11 TGn comparison criteria,” IEEE 802.11 document, doc. no. 11-03-0814-31-000n, July 2004.

[4] D.J.C. MacKay, “Good error correcting codes based on very sparse matrices,” IEEE Trans. Inform. Theory, vol.45, pp. 399-431, Mar. 1999.

[5] T.J. Richardson and R.L. Urbanke, “Efficient encoding of low-density parity-check codes,” IEEE Trans. Inform. Theory, vol.47, no.2, pp. 638-656, Feb. 2001.

[6] B. Classon and Y. Blankenship, “Modified LDPC Matrix providing improved performance,” IEEE 802.16 document C802.16e-04/102r1, May 2004.

[7] M.P.C. Fossorier, “Quasi-cyclic low density parity check codes from circulant permutation matrices,” IEEE Trans. Inform. Theory, vol.50, pp. 1788-1794, Aug. 2004.

Fig. 5. BLER of Codeword Size 648 with Different Encoding Schemes. (Plot of BLER versus Eb/No [dB] for the draft 802.11n H matrix with Z=27 and codeword size 648; curves for R=1/2, 2/3, 3/4 and 5/6, each with the standard H and the modified H.)
