Design of Error Control Mechanisms for Broadband Wireless Communication Systems (III)


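National Science Council, Executive Yuan: Research Project Final Report

Design of Error Control Mechanisms for Broadband Wireless Communication Systems (3/3)

Project type: Individual project
Project number: NSC93-2213-E-009-021-
Project period: August 1, 2004 to July 31, 2005
Executing institution: Department of Communication Engineering, National Chiao Tung University
Principal investigator: Yu T. Su (蘇育德)
Report type: Complete report
Availability: This project report is open to public inquiry

October 25, 2005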


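Abstract (translated from the Chinese)

The turbo decoding algorithm developed by Pyndiah is a milestone in the development of product codes; it applies to any product code built from linear block codes. Turbo-decoded product codes, now usually called turbo product codes (TPC) or block turbo codes (BTC), have distinct advantages over other schemes, such as simple decoding, fast convergence, high code rate, and a very low or even nonexistent error floor, although their performance in the waterfall region is not as good as that of conventional turbo codes.

This project explores the feasibility of combining block turbo codes with orthogonal frequency division multiplexing (OFDM) signals. We focus on the receiver design issues that dominate the error rate performance of the system, namely channel estimation and the decoding algorithm, and we examine the performance of a joint iterative channel estimation and decoding approach. As for the transmitted waveform design, we examine the effect of the interleaver and propose a new class of block turbo codes that we call parallel concatenated product codes (PCPC). A PCPC has a structure similar to that of a conventional turbo code but is built from systematic product codes. We also prove that the interleaver we adopt does help increase the minimum Hamming distance of the PCPC. Simulation results show that the PCPC outperforms the corresponding block turbo code of comparable code rate.

Keywords: orthogonal frequency division multiplexing, turbo product codes, interleaver, channel estimation.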


Abstract

A benchmark in the development of product code is the invention of the iterative (turbo) decoding algorithm by Pyndiah. Pyndiah’s algorithm works for any product code using linear block component codes. The turbo decoded product codes, now often referred to as turbo product codes (TPC) or block turbo codes (BTC), have the distinct merits of simple decoding, fast convergence, high code rate and very low (or no) error floor although their performance at the waterfall region is not as impressive as that of conventional turbo codes.

This report summarizes our effort in the past year to explore the feasibility of using BTC in conjunction with orthogonal frequency division multiplexing (OFDM) signals. We focus on the major receiver design concerns that dominate the error rate performance of the system, namely, channel estimation and the decoding algorithm. We examine the performance of a joint iterative channel estimation and decoding approach. As for the transmitted waveform design, we check the impact of the interleaver and present a new class of BTC that we refer to as parallel concatenated product codes (PCPC). The PCPC has a structure similar to the conventional turbo code with the component codes replaced by systematic product codes. Some of its algebraic properties and related decoding issues are investigated. Numerical results indicate that it does outperform its BTC counterpart with a comparable code rate.


List of Figures

2.1 A two-dimensional product code $P = C_1 \otimes C_2$. (reprinted from [8])

2.2 Shift-register division of $a(x)$ by $g(x)$

2.3 (a) Shift-register division of $1 + x + x^4 + x^6$ by $1 + x + x^3$ (b) Shift-register cell contents during division of $1 + x + x^4 + x^6$ by $1 + x + x^3$

2.4 Example of single error correction using Extended Euclidean algorithm

2.5 Block diagram of elementary block turbo decoder. (reprinted from [8])

3.1 An OFDM signal with $N$ subcarriers and bandwidth $B$ Hz. (reprinted from [11])

3.2 A two-dimensional view of a multi-carrier signal; the inter-carrier separation $f_0$ = any integer multiple of $1/T$. (reprinted from [11])

3.3 Structure of OFDM: (a) Modulator (b) Demodulator (reprinted from [11])

3.4 Effect of multipath with zero signal in the guard time; the delayed subcarrier 2 causes ICI on subcarrier 1 and vice versa. (reprinted from [11])

3.5 (a) The dotted curve is a delayed replica of the solid curve. (b) Cyclic prefix (CP): A copy of the last part of OFDM signal is attached to the front of itself, a copy of the first part of OFDM signal is attached to the back of itself. (reprinted from [11])

3.6 Power spectral density (PSD) without windowing for 16, 64, and 256 subcarriers. (reprinted from [11])

3.7 An extended OFDM symbol with cyclic extension and windowing. $T_s$ is the symbol interval, $T$ the FFT interval, $T_G$ the guard time, $T_{prefix}$ the preguard interval, $T_{postfix}$ the postguard interval, and $\beta$ is the rolloff factor. (reprinted from [11])

4.1 A typical pilot symbol distribution in the time-frequency plane of an OFDM signal. (reprinted from [12])

4.2 Two typical OFDM channel responses. They are plotted in the same figure for the convenience of comparison. The vertical coordinate does not represent the absolute magnitude of each CR surface. (reprinted from [12])

4.3 (a) A conventional OFDM system structure. (b) An OFDM system with iterative receiver structure.

4.4 A typical pilot arrangement in the time-frequency plane.

4.5 Double helical interleaving of (64, 57, 4) × (64, 57, 4) TPC

4.6 A block diagram of a double helical interleaved OFDM system with iterative joint channel estimation and TPC decoding.

4.7 Performance of DHI-permuted system in a fading channel with $f_m T = 0.0001$.

4.8 (a) Time-frequency channel response with $f_m T = 0.0001$; (b) Channel response with DHI permutation.

4.9 Time-frequency response of the Proakis C five-path fading channel with $f_m T = 0.001$.

4.10 BER performance comparison for the channel shown in the above figure.

4.11 Performance curve of different pilot insert position in $f_m T = 0.001$ fading …

4.12 Estimated CR using a third-order channel model using pilot patterns (4.16) and (4.17); the true CR (solid star markers) is included for comparison; $f_m T = 0.001$.

4.13 Estimated CR using a fourth-order channel model using pilot patterns (4.16) and (4.17). The true CR is marked by solid stars; $f_m T = 0.001$.

4.14 BER performance curves of the receiver with iterative joint channel estimation and TPC decoding in a fading channel with $f_m T = 0.001$.

5.1 The structure of the PCPC encoder.

5.2 Block diagram of elementary APP decoder $\mathrm{APP}_t^{\ell}$.

5.3 A flow chart showing the message-passing of various component APP decoders and related decoding schedule.

5.4 Bit error rate performance of two PCPCs and a TPC.


Chapter 1 Introduction

Since its invention in 1993, the turbo code [9] has revolutionized the field of error-correcting codes. Over the years, we have come to realize that the extraordinary performance offered by such a simple-looking encoding structure is brought about by the efficient iterative decoding algorithm and the random-like interleaver. The iterative algorithm provides a message-exchange mechanism that collects extrinsic information from related samples to help enhance the decision reliability. The interleaver makes sure that the range of message exchange is as large as possible and that short cycles in the associated bipartite graph are eliminated or their number is minimized.

The original turbo code uses two identical recursive systematic convolutional codes as its component codes. The idea of using block codes as component codes was first proposed by Pyndiah [2], who realized that a class of block codes called product codes has an inherent parallel concatenation feature with a simple block interleaver. Pyndiah [2] then applied the concept of iterative (turbo) decoding to decode a product code. To distinguish this approach from the classic turbo code, product codes that use an iterative decoding scheme are referred to as block turbo codes (BTC) or turbo product codes (TPC).

The report concentrates on the designs of some receiver baseband subsystems of an orthogonal frequency division multiplexing (OFDM) system that employs a BTC. In particular, we consider the issues of channel estimation, tracking and BTC decoding. Our approach follows the popular turbo paradigm, applying the turbo principle to BTC decoding as well as channel estimation.

To employ turbo decoding, the constituent block decoder must generate soft outputs. When the component codes are BCH or RS codes, one might reduce the decoding complexity by using, instead of the MAP algorithm, the Chase decoder [7] in conjunction with the extended Euclidean algorithm [5] to produce reliability estimates associated with the hard-decision decoded bits or symbols. With this hybrid algorithm, iterative decoding of product codes becomes possible, although one can expect only suboptimal performance and the iterative decoding gain is more significant in the high-SNR region. On the other hand, since the minimum distance of a typical BTC is usually much larger than that of a classic turbo code, the associated error performance curve has a much lower error floor, if one exists at all.

Cyclic product codes were first introduced by Burton and Weldon in 1965 [1]. These codes enjoy the implementation advantages of cyclic codes and, in addition, possess some important structural properties:

1. Conditions are given which ensure that the product of two, three or any finite number of cyclic codes is itself a cyclic code.

2. Cyclic product codes are shown to be capable of correcting both burst and random errors.

3. The generator polynomial of the cyclic product code is derived and shown to be a simple function of the generator polynomials of the constituent codes.

4. The minimum distance of the resulting code is the product of those of the constituent codes.

In order to achieve efficient forced erasure decoders, Hirst et al. [3] re-order the Chase algorithm's repeated decodings such that the inherent computational redundancy is greatly reduced without degrading performance. The result is a highly efficient Fast Chase implementation.

Another key issue related to our investigation is that of channel estimation. We adopt a model-based approach as it is more robust in time-varying fading channels and requires less channel state information, such as the channel correlation matrix and the average signal-to-noise ratio (SNR) per bit. A two-dimensional (2D) model-based channel estimator is used in our receiver. The 2D model exploits the frequency- and time-domain correlation so that a better channel estimate is obtained. The channel estimation algorithm consists of four steps. After dividing the time-frequency plane into sub-blocks, and using the received samples at the pilot locations, the algorithm first estimates the coefficients of a 2D surface model by a least-squares (LS) fit on the pilots in each sub-block. Once the model coefficients are known, the frequency responses at non-pilot locations can then be computed.

To further enhance the performance of BTCs, we propose a new class of turbo-like codes called parallel concatenated product codes (PCPC). PCPC improves the minimum distance property while retaining the low decoding complexity of product codes. We prove that using the Fibonacci interleaver does help increase the minimum distance. The regularity of the interleaver also reduces the implementation complexity and makes parallel interleaving feasible. A decoding method based on a modified Pyndiah algorithm and a proper schedule is suggested. Numerical examples indicate that the new PCPC outperforms Pyndiah's product code with a comparable code rate.

In summary, this report investigates the feasibility and performance of a product coded OFDM system with simple iterative joint channel estimation and decoding algorithms. In Chapter 2, we provide an elementary introduction to TPC, followed by a brief discourse on the OFDM scheme in Chapter 3. The OFDM channel estimator and the iterative decoding algorithm used are presented in Chapter 4, where we also introduce the helical interleaving scheme and explain its effectiveness. In Chapter 5, the class of parallel concatenated product codes and related properties are presented, together with the related decoding algorithm and numerical performance examples. Finally, Chapter 6 draws some concluding remarks and suggests a few research topics for further investigation.


Chapter 2 Introduction to Turbo Product Codes

This chapter provides some background material for the class of product codes, its constituent codes and related decoding methods.

2.1 Product codes

Product codes offer a simple and relatively efficient way to build very long block codes from two or more short block codes. Fig. 2.1 shows a typical (two-dimensional) product coding scheme that arranges the information symbols in a rectangular array and encodes each row and column individually by two linear block codes $C_1$ and $C_2$. The resulting augmented rectangular array of Fig. 2.1 forms a codeword of the product code $C_1 \otimes C_2$, with rows being $C_2$ codewords and columns being $C_1$ codewords.

In summary, given two systematic linear block codes $C_1$ and $C_2$ with parameters $(n_1, k_1, \delta_1)$ and $(n_2, k_2, \delta_2)$, the product code $P = C_1 \otimes C_2$ is obtained (see Fig. 2.1) by

1. placing $(k_1 \times k_2)$ information bits in an array of $k_1$ rows and $k_2$ columns,

2. coding the $k_1$ rows using code $C_2$,

3. coding the $n_2$ columns using code $C_1$.

The parameters of the product code $P$ are thus given by $n = n_1 \times n_2$, $k = k_1 \times k_2$ and $\delta = \delta_1 \times \delta_2$, and the code rate $R$ is given by $R = R_1 \times R_2$, where $R_i$ is the code rate of code $C_i$.

The resulting code has a minimum distance equal to the product of the minimum distances of the constituent codes. Therefore, one can build very long block codes with large minimum Hamming distance by combining short codes with small minimum Hamming distance. Note that each information symbol in the rectangular array is encoded by both the $C_1$ and $C_2$ codes and is related to different sets of information symbols. Such an encoding scheme is similar to that of classic turbo codes and, as a result, can be iteratively (turbo) decoded. When a turbo-like decoder is used, the class of product codes is referred to as block turbo codes (BTC) or turbo product codes (TPC).
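The product construction and its distance property can be checked by brute force on a toy case. The sketch below is only an illustration under stated assumptions: both constituent codes are taken to be single-parity-check codes (minimum distance 2), and the helper names `spc_encode`, `product_encode` and `min_distance` are hypothetical, not part of the report.

```python
from itertools import product

def spc_encode(bits):
    """Systematic (k+1, k, 2) single-parity-check code: append an even-parity bit."""
    return list(bits) + [sum(bits) % 2]

def product_encode(info, k1, k2):
    """Encode k1*k2 information bits: the k1 rows first, then all n2 columns."""
    rows = [spc_encode(info[r * k2:(r + 1) * k2]) for r in range(k1)]
    n2 = len(rows[0])
    cols = [spc_encode([rows[r][c] for r in range(k1)]) for c in range(n2)]
    n1 = len(cols[0])
    # Return the n1 x n2 codeword array row by row.
    return [[cols[c][r] for c in range(n2)] for r in range(n1)]

def min_distance(k1, k2):
    """Minimum weight over all nonzero codewords (equals the minimum distance of a linear code)."""
    weights = [sum(map(sum, product_encode(list(info), k1, k2)))
               for info in product([0, 1], repeat=k1 * k2) if any(info)]
    return min(weights)

array = product_encode([1, 0, 1, 1], 2, 2)
print(array)               # every row and every column has even parity
print(min_distance(2, 2))  # 4 = 2 x 2
```

With two (3, 2, 2) constituent codes the product code has parameters (9, 4, 4), in agreement with $\delta = \delta_1 \times \delta_2$.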

In contrast to the classic convolutional turbo codes (CTC), BTC has the attractive feature that codes with high rate and large minimum distance can be easily found. There is no need for special interleaver design and it is not necessary to search for favorable puncture patterns. If elementary linear codes such as Hamming or extended Hamming codes are used as the constituent codes, then the corresponding decoding complexity is far less than that of a conventional CTC and the convergence rate is often faster. Moreover, because the rows and columns are encoded independently, the decoding procedure can be carried out in parallel and high decoding throughput is readily realizable.

2.2 A brief review of BCH codes

Using binary BCH codes instead of Hamming codes gives us a more flexible range of choices in code rate R, codeword length and minimum distance.

An $(n, k, \delta)$ binary BCH code has codeword length $n$, $k$ information bits per codeword, and minimum Hamming distance $\delta = 2t + 1$. Its generator polynomial $g(x)$ of degree $r = n - k$ is an irreducible factor of $x^n - 1$, $n = 2^q - 1$ for some $q$, with …


Figure 2.1: A two-dimensional product code $P = C_1 \otimes C_2$. (reprinted from [8])

2.2.1 Systematic encoder for BCH codes

Consider the binary shift-register (SR) based BCH encoder shown in Fig. 2.2. Let a $k$-symbol message block $m = (m_0, m_1, \ldots, m_{k-1})$ be associated with the message polynomial $m(x) = m_0 + m_1 x + \cdots + m_{k-1} x^{k-1}$, and the $n$-symbol codeword $(c_0, c_1, \ldots, c_{n-1})$ be associated with the codeword polynomial $c_m(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$. Then a systematic encoding procedure can be described as follows.

Step 1. Multiply the message polynomial $m(x)$ by $x^{n-k}$.

Step 2. Divide the result of Step 1 by the generator polynomial $g(x)$. Let $p(x)$ be the remainder. Polynomial division is performed through the use of a linear feedback shift register (LFSR). It divides $a(x) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} = m_0 x^{n-k} + \cdots + m_{k-1} x^{n-1}$ by $g(x) = g_0 + g_1 x + \cdots + g_{r-1} x^{r-1} + x^r$ and retains the remainder $p(x) = p_0 + p_1 x + \cdots + p_{r-1} x^{r-1}$. The symbols $a_0, a_1, \ldots, a_{n-2}, a_{n-1}$ are fed into the shift register one at a time in order of decreasing index. When the last symbol ($a_0$) has been fed into the rightmost SR cell, the SR cells will contain the remainder $p(x)$.

Step 3. Set $c(x) = x^{n-k} m(x) - p(x)$. The codeword output is thus given by $(c_{n-1}, c_{n-2}, \ldots, c_1, c_0) = (m_{k-1}, m_{k-2}, \ldots, m_0, p_{n-k-1}, p_{n-k-2}, \ldots, p_0)$.

Figure 2.2: Shift-register division of a(x) by g(x)

Fig. 2.3 shows an example for a (7,4,3) BCH code. The information bits are 1010 and the output codeword is 1010011.
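The three encoding steps can be reproduced in a few lines. The following is a minimal sketch, not the shift-register circuit of Fig. 2.2: binary polynomials are stored as Python integers (bit $i$ holds the coefficient of $x^i$), and the function names are illustrative assumptions.

```python
def gf2_remainder(dividend, divisor):
    """Remainder of binary polynomial division; bit i of an int is the coefficient of x^i."""
    deg = divisor.bit_length() - 1
    while dividend.bit_length() - 1 >= deg:
        # Cancel the leading term of the dividend with a shifted copy of the divisor.
        dividend ^= divisor << (dividend.bit_length() - 1 - deg)
    return dividend

def systematic_encode(message_bits, g, n):
    """message_bits is (m_{k-1}, ..., m_0); returns (c_{n-1}, ..., c_0)."""
    k = len(message_bits)
    m = int("".join(map(str, message_bits)), 2)
    shifted = m << (n - k)                 # Step 1: x^(n-k) m(x)
    parity = gf2_remainder(shifted, g)     # Step 2: p(x) = remainder modulo g(x)
    codeword = shifted ^ parity            # Step 3: subtraction is XOR over GF(2)
    return [int(b) for b in format(codeword, f"0{n}b")]

G_POLY = 0b1011  # g(x) = 1 + x + x^3
codeword = systematic_encode([1, 0, 1, 0], G_POLY, 7)
print(codeword)  # [1, 0, 1, 0, 0, 1, 1]
```

The result matches the codeword 1010011 quoted for the (7,4,3) example, and dividing the codeword polynomial by $g(x)$ leaves a zero remainder.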

2.2.2 Extended Euclidean algorithm

Let $\alpha^1, \alpha^2, \ldots, \alpha^{\delta-1}$ be the $(\delta - 1)$ roots of $g(x)$, where $\alpha$ is a primitive $n$th root of unity, and denote by $r(x)$ the received polynomial associated with the received vector $R = (r_0, r_1, \ldots, r_{n-1}) = C + E$, where $E = (e_0, e_1, \ldots, e_{n-1})$ is the error vector whose nonzero entries are at the $i_l$th positions, $l = 1, \ldots, v$, $v \le t$. We further define the syndrome generating polynomial by $S(x) = \sum_j S_j x^j$, where $S_j = r(\alpha^j)$ is the $j$th syndrome, and the error locator polynomial by

$$\Lambda(x) = \prod_{k=1}^{v} (1 - X_k x) = \Lambda_0 + \Lambda_1 x + \Lambda_2 x^2 + \cdots + \Lambda_v x^v, \tag{2.1}$$

where $X_k = \alpha^{i_k}$ are the error locations. Then one can show that

$$\Lambda(x)[1 + S(x)] \equiv \Omega(x) \pmod{x^{2t+1}} \tag{2.2}$$

where $\Omega(x)$ is the error-evaluator polynomial of degree $v - 1$. The above identity is called the key equation for decoding BCH codes. Once we solve the key equation, finding $\Lambda(x)$ and $\Omega(x)$ that satisfy it, the error locations can be found by finding the roots of $\Lambda(x)$ (via the so-called Chien search) and the corresponding error magnitudes (for non-binary codes) can be obtained with Forney's algorithm, which involves evaluating both $\Lambda(x)$ and $\Omega(x)$.

Figure 2.3: (a) Shift-register division of $1 + x + x^4 + x^6$ by $1 + x + x^3$ (b) Shift-register cell contents during division of $1 + x + x^4 + x^6$ by $1 + x + x^3$

2.2.3 Decoding BCH codes

The extended Euclidean algorithm is a simple (but not the most efficient) BCH/RS decoding algorithm. It operates on two elements $(\hat a, \hat b)$ from a Euclidean domain $E$ at a time. Given the initial conditions $\hat r_{-1} = \hat a$, $\hat r_0 = \hat b$, $\hat s_{-1} = 1$, $\hat s_0 = 0$, $\hat t_{-1} = 0$, $\hat t_0 = 1$, it proceeds according to the following set of recursion relations:

$$\hat r_i = \hat r_{i-2} - \hat q_i \hat r_{i-1}, \qquad \hat s_i = \hat s_{i-2} - \hat q_i \hat s_{i-1}, \qquad \hat t_i = \hat t_{i-2} - \hat q_i \hat t_{i-1}, \tag{2.3}$$

where $\hat r_i, \hat s_i, \hat t_i, \hat q_i$ are all elements of $E$. The algorithm terminates when the remainder $\hat r_n = 0$. The remainder $\hat r_{n-1}$ is then the GCD of $\hat a$ and $\hat b$. The recursion relations ensure that, at any given point in the algorithm, we have the relation

$$\hat s_i \hat a + \hat t_i \hat b = \hat r_i. \tag{2.4}$$
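As a quick illustration of the recursion (2.3) and the invariant (2.4), the sketch below runs the algorithm over the integers, which form a Euclidean domain just as the polynomial rings used for decoding do. The function name `extended_euclid` and the test values are illustrative choices.

```python
def extended_euclid(a, b):
    """Return (gcd, s, t) with s*a + t*b == gcd, following recursion (2.3)."""
    r_prev, r = a, b      # r_{-1}, r_0
    s_prev, s = 1, 0      # s_{-1}, s_0
    t_prev, t = 0, 1      # t_{-1}, t_0
    while r != 0:
        q = r_prev // r
        r_prev, r = r, r_prev - q * r
        s_prev, s = s, s_prev - q * s
        t_prev, t = t, t_prev - q * t
        assert s * a + t * b == r          # invariant (2.4) holds at every step
    return r_prev, s_prev, t_prev

g, s, t = extended_euclid(240, 46)
print(g, s, t)  # 2 -9 47, since (-9)*240 + 47*46 = 2
```

The last nonzero remainder is the GCD, and the accompanying pair $(\hat s, \hat t)$ expresses it as a combination of the two inputs.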

The reason why the extended Euclidean algorithm can be used to decode BCH codes can be seen by first noting that the key equation implies

$$\Theta(x) x^{2t+1} + \Lambda(x)[1 + S(x)] = \Omega(x) \tag{2.5}$$

and if $\Omega'(x)$ is the GCD of the two polynomials $x^{2t+1}$ and $1 + S(x)$ then there exist $\Lambda'(x)$ and $\Theta'(x)$ such that

$$\Theta'(x) x^{2t+1} + \Lambda'(x)[1 + S(x)] = \Omega'(x). \tag{2.6}$$

It can be proved that the error-evaluator polynomial $\Omega(x)$ and the error-locator polynomial $\Lambda(x)$ are proportional to $\Omega'(x)$ and $\Lambda'(x)$, respectively. Hence one can apply the Euclidean algorithm, with the finite-degree polynomials over GF(2) as the Euclidean domain of concern, to $x^{2t+1}$ and $1 + S(x)$. The pair $(\hat t_j, \hat r_j) \stackrel{\text{def}}{=} (\Lambda^j(x), \Omega^j(x))$ for some proper $j$ will then be our solution. The particular solution that corresponds to the error locator and magnitude polynomials is obtained when $\Omega^j(x)$ has degree less than or equal to that of $\Lambda^j(x)$. The decoding procedure is summarized in the following steps.

D1 Compute the syndromes $S_i$, $i = 1 \sim 2t$, and form the syndrome polynomial $S(x)$.

D2 Set the following initial conditions: $\hat r_{-1}(x) = x^{2t+1}$, $\hat r_0(x) = 1 + S(x)$, $\hat t_{-1}(x) = 0$, $\hat t_0(x) = 1$.

D3 Using the extended Euclidean algorithm, compute the successive remainders $\hat r_i(x)$ and the corresponding $\hat t_i(x)$ until the following stopping condition is reached: $\deg[\hat r_i(x)] \le t$.

D4 Find the roots of $\hat t_i(x) = \Lambda(x)$, thus determining the error locations.

D5 Determine the magnitude of the errors.

Example 1. Suppose that we have received the vector $\tilde r = (1011011)$. The received polynomial is then given by $\tilde r(x) = 1 + x + x^3 + x^4 + x^6$. Applying the extended Euclidean algorithm with the initial conditions $\hat r_{-1}(x) = x^3$ and $\hat r_0(x) = 1 + S(x) = 1 + \alpha^3 x + \alpha^6 x^2$ and following the above five steps, we obtain $\Lambda(x) = \alpha^5 + \alpha x$ and find its root at $\alpha^{-3}$ (see Fig. 2.4). Since the code is binary, we can simply invert the bit at the error position $\tilde r_3$, changing it back to 0.

Figure 2.4: Example of single error correction using Extended Euclidean algorithm
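Example 1 can be retraced numerically. The sketch below is a minimal illustration, assuming GF(8) is generated by the primitive polynomial $1 + x + x^3$ and that the received word is the codeword 1010011 of Fig. 2.3 with its bit 3 inverted, i.e. $\tilde r(x) = 1 + x + x^3 + x^4 + x^6$. Field elements are integers 0..7, polynomials are coefficient lists (index = degree), and all function names are hypothetical.

```python
# GF(8) from the primitive polynomial 1 + x + x^3; EXP[i] = alpha^i as a 3-bit integer.
EXP = [1, 2, 4, 3, 6, 7, 5]
LOG = {v: i for i, v in enumerate(EXP)}

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 7]

def inv(a):
    return EXP[(-LOG[a]) % 7]

def trim(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_eval(p, x):
    acc = 0
    for c in reversed(p):                  # Horner's rule; addition in GF(8) is XOR
        acc = mul(acc, x) ^ c
    return acc

def poly_add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([x ^ y for x, y in zip(a, b)])

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] ^= mul(x, y)
    return trim(out)

def poly_divmod(a, b):
    a, b = trim(list(a)), trim(list(b))
    q = [0] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b) and a != [0]:
        shift = len(a) - len(b)
        coef = mul(a[-1], inv(b[-1]))
        q[shift] = coef
        a = poly_add(a, poly_mul([0] * shift + [coef], b))   # cancel the leading term
    return trim(q), a

def decode(received, t):
    """Steps D1-D4; received[i] is the coefficient of x^i. Returns (Lambda, error positions)."""
    synd = [poly_eval(received, EXP[j % 7]) for j in range(1, 2 * t + 1)]   # D1
    r_prev, r = [0] * (2 * t + 1) + [1], trim([1] + synd)                   # D2
    t_prev, t_cur = [0], [1]
    while len(r) - 1 > t:                                                   # D3
        q, rem = poly_divmod(r_prev, r)
        r_prev, r = r, rem
        t_prev, t_cur = t_cur, poly_add(t_prev, poly_mul(q, t_cur))         # minus = plus here
    positions = [i for i in range(len(received))
                 if poly_eval(t_cur, EXP[(-i) % 7]) == 0]                   # D4: root alpha^(-i)
    return t_cur, positions

locator, errors = decode([1, 1, 0, 1, 1, 0, 1], t=1)
print(locator, errors)   # [7, 2] [3], i.e. Lambda(x) = alpha^5 + alpha x, error at position 3
```

Here 7 and 2 are the integer labels of $\alpha^5$ and $\alpha$, so the locator agrees with $\Lambda(x) = \alpha^5 + \alpha x$ and its root $\alpha^{-3}$ points to position 3. Step D5 is not needed for a binary code.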

2.2.4 Iterative decoding of product codes

As suggested by Pyndiah, a product code can be iteratively (turbo) decoded. There are a variety of soft-decision decoding algorithms for block codes. The Chase algorithm and its variations offer a good tradeoff between complexity and performance. List decoding often consists of two stages: (i) finding candidate codewords based on the received samples and (ii) generating soft output. As in a turbo decoder, the soft output is then passed to the ensuing decoding round as the extrinsic (a priori) information. We describe the decoding algorithm for a linear block code as follows.

2.2.4.1 Selecting candidate codewords

Suppose an $(n, k, \delta)$ linear block code $C$ is BPSK-modulated and transmitted over an additive white Gaussian noise (AWGN) channel. Denote by $X = (x_1, \ldots, x_l, \ldots, x_n)$ and $R = (r_1, \ldots, r_l, \ldots, r_n)$ the transmitted codeword and received vector, where $x_l \in \{+1, -1\}$. Then $R = X + E$, where the noise vector is $E = (e_1, e_2, \ldots, e_n)$ in which the $e_i$ are i.i.d. zero-mean Gaussian random variables with variance $\sigma^2$. Denote by $A = (a_1, \ldots, a_l, \ldots, a_n)$ the a priori information of the codeword bits, where

$$a_l = \ln \frac{\Pr\{c_l = +1\}}{\Pr\{c_l = -1\}}. \tag{2.7}$$

Following the spirit of the Chase-II list decoding algorithm [7], we suggest the following three-step algorithm.

A1. Use $R$ and $A$, if available, to obtain the hard decision vector $Y = (y_1, y_2, \ldots, y_n)$ as well as the reliabilities of its elements (extrinsic information). Determine the $p = \lfloor \delta/2 \rfloor$ positions associated with the least reliable binary elements of $Y$, where the reliability of $y_j$ is given by $\Lambda(x_j)$ and is related to the log-likelihood ratio (LLR)

$$\Lambda(x_j) = \ln \frac{\Pr\{r_j | x_j = +1\}}{\Pr\{r_j | x_j = -1\}} + a_j = \frac{2}{\sigma^2} r_j + a_j. \tag{2.8}$$

A2. Flip the bits at the $p$ most unreliable positions of $Y$ to form the set of $2^p$ test patterns $T^q$ ($0 \le q \le 2^p - 1$).

A3. Form the test sequence $Z^q$ where $z_l^q = y_l \oplus t_l^q$, decode $Z^q$ using an algebraic (or hard) decoder, and add the decoded codeword $C^q$ to the subset $\Omega$. The decision $D = (d_1, d_2, \ldots, d_n)$ of a row (or column) of the product code is then obtained by the following rule: $D \in \Omega$ is a local maximum likelihood codeword if

$$|R - D|^2 \le |R - C^i|^2 \quad \forall\, C^i \in \Omega, \tag{2.9}$$

where

$$|R - C^i|^2 = \sum_{l=1}^{n} \left[ \frac{(r_l - c_l^i)^2}{2\sigma^2} - \frac{c_l^i}{2} a_l \right] \tag{2.10}$$

is the metric between $R$ and $C^i$.

2.2.4.2 Soft output generation

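The reliability of decision $d_j$ about the transmitted symbol $x_j$, given the observation $R$, is

$$\Lambda'(x_j) = \ln \frac{\Pr\{x_j = +1 | R\}}{\Pr\{x_j = -1 | R\}}. \tag{2.11}$$

Let $S_j^{+1} \subset C$ be the set of codewords whose $j$th coordinate $c_j^i$ is $+1$ and $S_j^{-1} \subset C$ be the set of codewords with $-1$ in their $j$th coordinate. Then we have

$$\Pr\{x_j = +1 | R\} = \sum_{C^i \in S_j^{+1}} \Pr\{X = C^i | R\} \tag{2.12}$$

$$\Pr\{x_j = -1 | R\} = \sum_{C^i \in S_j^{-1}} \Pr\{X = C^i | R\}, \tag{2.13}$$

and (2.11) becomes

$$\Lambda'(x_j) = \ln \left( \frac{\sum_{C^i \in S_j^{+1}} \Pr\{X = C^i | R\}}{\sum_{C^i \in S_j^{-1}} \Pr\{X = C^i | R\}} \right) \tag{2.14}$$

$$\approx \ln \left( \frac{\sum_{C^i \in S_j^{+1} \cap \Omega} \Pr\{X = C^i | R\}}{\sum_{C^i \in S_j^{-1} \cap \Omega} \Pr\{X = C^i | R\}} \right) \tag{2.15}$$

$$\approx \ln \left( \frac{\max_{C^i \in S_j^{+1} \cap \Omega} \Pr\{X = C^i | R\}}{\max_{C^i \in S_j^{-1} \cap \Omega} \Pr\{X = C^i | R\}} \right). \tag{2.16}$$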


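At high SNRs, (2.16) can be approximated by

$$\Lambda''(x_j) = |R - C^{-1(j)}|^2 - |R - C^{+1(j)}|^2 \tag{2.17}$$

where $C^{+1(j)} \in S_j^{+1} \cap \Omega$ and $C^{-1(j)} \in S_j^{-1} \cap \Omega$ are the codewords with the minimum metric, and one of $C^{+1(j)}$ and $C^{-1(j)}$ is $D$. Substituting (2.10) into (2.17), we obtain

$$\Lambda''(x_j) = \frac{2 r_j}{\sigma^2} + a_j + \sum_{l=1, l \ne j}^{n} \left( \frac{2 r_l}{\sigma^2} + a_l \right) c_l^{+1(j)} p_l \tag{2.18}$$

where

$$p_l = \begin{cases} 0, & \text{if } c_l^{+1(j)} = c_l^{-1(j)} \\ 1, & \text{if } c_l^{+1(j)} \ne c_l^{-1(j)}. \end{cases} \tag{2.19}$$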
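The substitution step can be written out coordinate by coordinate. The following is a sketch of the intermediate algebra, using only the metric (2.10), the difference (2.17) and the fact that every coordinate satisfies $(c_l)^2 = 1$:

```latex
\begin{aligned}
\Lambda''(x_j)
 &= \sum_{l=1}^{n}\left[
      \frac{(r_l - c_l^{-1(j)})^2 - (r_l - c_l^{+1(j)})^2}{2\sigma^2}
      + \frac{c_l^{+1(j)} - c_l^{-1(j)}}{2}\, a_l \right] \\
 &\text{If } c_l^{+1(j)} = c_l^{-1(j)}:\ \text{the $l$th term is } 0 . \\
 &\text{If } c_l^{-1(j)} = -c_l^{+1(j)}:\quad
      \frac{4\, r_l\, c_l^{+1(j)}}{2\sigma^2} + c_l^{+1(j)} a_l
      = \left(\frac{2 r_l}{\sigma^2} + a_l\right) c_l^{+1(j)} . \\
 &\text{At } l = j:\ c_j^{+1(j)} = +1,\ c_j^{-1(j)} = -1
      \ \Rightarrow\ \frac{2 r_j}{\sigma^2} + a_j .
\end{aligned}
```

Collecting the $l = j$ term and the terms with differing coordinates gives the channel LLR of position $j$ plus a sum over the positions where the two competing codewords disagree, which is the extrinsic part of the soft output.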

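If there is a competing codeword, the soft output corresponding to $d_j$ is

$$\Lambda''(x_j) = \frac{2 r_j}{\sigma^2} + a_j + w_j \tag{2.20}$$

and the extrinsic information is

$$w_j = \sum_{l=1, l \ne j}^{n} \left( \frac{2}{\sigma^2} r_l + a_l \right) c_l^{+1(j)} p_l \tag{2.21}$$

$$= \Lambda''(x_j) - \frac{2 r_j}{\sigma^2} - a_j. \tag{2.22}$$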

Before giving the extrinsic information for a position without a competing codeword, we define the coordinate set

$$V = \{\, j \mid \exists\, C^i \in \Omega \text{ s.t. } c_j^i \ne d_j \,\}. \tag{2.23}$$

The mean value $\bar w$ of the extrinsic information $w_j$, $j \in V$, can then be computed by

$$\bar w = \frac{1}{|V|} \sum_{j \in V} |w_j|. \tag{2.24}$$

The extrinsic information for a position without a competing codeword can be taken as

$$w_j = \beta \times \bar w, \quad j \notin V, \tag{2.25}$$

where β is a constant varying with iterations and will be given in the next section. With the extrinsic information derived above, the iterative decoding can apply the information as the a priori information of the next APP decoding run.
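The candidate selection and soft-output rules above can be put together in a compact sketch. It is an illustration under several assumptions that are not taken from the report: the component code is a (7,4,3) Hamming code given by a hypothetical generator matrix `G`, a brute-force nearest-codeword search stands in for the algebraic decoder, bit 1 is mapped to $+1$, $p = 2$ and $\beta = 0.5$ are arbitrary choices, and for positions outside $V$ the sign of the decision $d_j$ is attached to $\beta \bar w$ (as in Pyndiah's original rule), since (2.25) specifies only the magnitude.

```python
from itertools import product

# Assumed systematic generator matrix of a (7,4,3) Hamming code.
G = [(1, 0, 0, 0, 1, 1, 0), (0, 1, 0, 0, 0, 1, 1),
     (0, 0, 1, 0, 1, 1, 1), (0, 0, 0, 1, 1, 0, 1)]
CODEBOOK = [tuple(sum(m[i] * G[i][l] for i in range(4)) % 2 for l in range(7))
            for m in product((0, 1), repeat=4)]

def to_bpsk(bits):
    return [2 * b - 1 for b in bits]          # assumed mapping: 1 -> +1, 0 -> -1

def hard_decode(bits):
    # Brute-force stand-in for an algebraic decoder: nearest codeword in Hamming distance.
    return min(CODEBOOK, key=lambda c: sum(x != y for x, y in zip(c, bits)))

def metric(r, c, a, sigma2):
    # Metric (2.10); c is a codeword in +/-1 form.
    return sum((rl - cl) ** 2 / (2 * sigma2) - cl * al / 2 for rl, cl, al in zip(r, c, a))

def chase_siso(r, a, sigma2, p=2, beta=0.5):
    """Return (decision bits D, extrinsic information w) for one row or column."""
    n = len(r)
    llr = [2 * rl / sigma2 + al for rl, al in zip(r, a)]            # (2.8)
    y = [1 if v >= 0 else 0 for v in llr]                           # A1: hard decisions
    weak = sorted(range(n), key=lambda l: abs(llr[l]))[:p]          # p least reliable positions
    omega = set()
    for flips in product((0, 1), repeat=p):                         # A2: 2^p test patterns
        z = list(y)
        for pos, f in zip(weak, flips):
            z[pos] ^= f
        omega.add(hard_decode(z))                                   # A3: candidate list
    cands = [(metric(r, to_bpsk(c), a, sigma2), c) for c in omega]
    d_metric, d = min(cands)                                        # decision rule (2.9)
    w = [None] * n
    for j in range(n):
        rivals = [m for m, c in cands if c[j] != d[j]]
        if rivals:                                                  # j belongs to V
            sign = 1 if d[j] == 1 else -1
            soft = sign * (min(rivals) - d_metric)                  # (2.17)
            w[j] = soft - llr[j]                                    # (2.22)
    known = [abs(v) for v in w if v is not None]
    mean_w = sum(known) / len(known) if known else 1.0              # (2.24); 1.0 is a fallback
    for j in range(n):
        if w[j] is None:                                            # (2.25), sign is an assumption
            w[j] = beta * mean_w * (1 if d[j] == 1 else -1)
    return d, w

sent = CODEBOOK[5]
x = to_bpsk(sent)
r = [float(v) for v in x]
r[2] = -0.1 * x[2]                       # one weak sample with the wrong sign
decision, extrinsic = chase_siso(r, [0.0] * 7, sigma2=1.0)
print(decision == sent)                  # True
```

In this toy run the single unreliable sample is corrected, and the extrinsic values reinforce the decision at every position.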


2.3 Turbo decoding of product code

Let us consider the decoding of the rows and columns of a product code P transmitted over a Gaussian channel using QPSK signaling. On receiving the matrix [R] corresponding to a transmitted codeword [X], the first decoder performs the soft decoding of the rows (or columns) of P using the matrix [R] as input. Soft-input decoding is performed using the Chase algorithm and the extrinsic information [W(2)] is computed using (2.22) or (2.25), where the index 2 indicates that this extrinsic information, computed during the first decoding of P, is intended for the second decoding of P. The soft input for the decoding of the columns (or rows) at the second decoding of P is given by

[R(2)] = [R] + α(2)[W(2)] (2.26)

where α(2) is a scaling factor which takes into account the fact that the standard deviations of the samples in matrix [R] and in matrix [W] are different (see [9] and [10]). This scaling factor α is also used to reduce the effect of the extrinsic information in the soft decoder in the first decoding steps, when the BER is relatively high. It takes a small value in the first decoding steps and increases as the BER tends to zero. The decoding procedure described above is then generalized by cascading the elementary decoders illustrated in Fig. 2.5. Several rules of thumb for turbo decoding of product codes, obtained through numerical experiments, are [8]

1. Test sequences: Sixteen test patterns are generated from the four least reliable bits (p = 4).

2. Weighting factor α: To reduce the dependency of α on the product code, the mean absolute value of the extrinsic information |w| derived using (2.24) is normalized to one. The evolution of α with the decoding step is

α(m) = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0]. (2.27)

3. Reliability factor β: To operate under optimal conditions, the reliability factor should be determined as a function of the BER. For practical considerations, we have fixed the evolution of β with the decoding step to the following values

β(m) = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0]. (2.28)

In the first decoding step β is set to 20% of the mean of the normalized extrinsic information computed using (2.24) and is gradually increased to 100%. Note that experimental results indicate no significant performance degradation if the values of β are modified by ±10%.
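The update (2.26) with the schedule (2.27) amounts to a per-step scaling of the extrinsic matrix. A minimal sketch follows; the function names are hypothetical, matrices are plain lists of rows, and reusing the last table entry beyond eight decoding steps is an assumption of the sketch.

```python
ALPHA = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0]   # (2.27)
BETA = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0]    # (2.28)

def normalize(W):
    """Scale W so that the mean absolute extrinsic value equals one."""
    total = sum(abs(w) for row in W for w in row)
    count = sum(len(row) for row in W)
    return W if total == 0 else [[w * count / total for w in row] for row in W]

def soft_input(R, W, m):
    """[R(m)] = [R] + alpha(m) [W(m)] for decoding step m = 1, 2, ... as in (2.26)."""
    alpha = ALPHA[min(m, len(ALPHA)) - 1]           # assumption: hold the last value afterwards
    return [[r + alpha * w for r, w in zip(r_row, w_row)]
            for r_row, w_row in zip(R, normalize(W))]

R = [[1.0, -1.0], [0.5, -0.5]]
W = [[2.0, -4.0], [1.0, -1.0]]
first = soft_input(R, W, 1)    # alpha(1) = 0: the first decoding uses the channel samples only
second = soft_input(R, W, 2)   # alpha(2) = 0.1
```

Because α(1) = 0, the first elementary decoder ignores the extrinsic matrix entirely, and later steps trust it progressively more.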


Chapter 3 Orthogonal Frequency Division Multiplexing Modulation

OFDM is an effective transmission technology for combating frequency-selective fading. By inserting a guard interval longer than the expected maximum multipath delay into each regular OFDM symbol, the inter-symbol interference (ISI) caused by multipath propagation can be eliminated at the receiving end by discarding the guard interval. A one-tap compensation in the frequency domain is then all one needs to demodulate the transmitted data. Of course, OFDM also has some shortcomings, such as its sensitivity to frequency offset and phase distortion.

3.1 OFDM Basics

3.1.1 Transmitter and receiver

A frequency-domain view of an OFDM signal is shown in Fig. 3.1. Each subcarrier can be modulated by signals from different constellations such as BPSK, QPSK, 16-QAM, etc. Denote by $N$, $T$, $f_k = k/T$ and $X_k$ the number of subcarriers, the carrier frequency, the …

We express an OFDM symbol within the interval $[0, T]$ as

$$s(t) = \begin{cases} \sum_{k=-N/2}^{N/2-1} X_k \phi_k(t), & 0 \le t \le T \\ 0, & \text{otherwise} \end{cases} \tag{3.1}$$

$$= \left[ \sum_{k=-N/2}^{N/2-1} X_k \phi_k(t) \right] u_T(t), \tag{3.2}$$

where $\phi_k(t) = \frac{1}{\sqrt{T}} e^{j 2\pi f_k t}$ is the $k$th subcarrier and

$$u_T(t) = \begin{cases} 1, & 0 \le t \le T \\ 0, & \text{otherwise} \end{cases} \tag{3.3}$$

is a time-domain windowing function. Each OFDM symbol contains subcarriers that are nonzero over a $T$-second interval. Hence, the spectrum of a single symbol is a convolution of a group of Dirac pulses located at the subcarrier frequencies with the spectrum of a square pulse of duration $T$ seconds, $\mathrm{sinc}(\pi f T)$, which is equal to zero for all frequencies $f$ that are a nonzero integer multiple of $1/T$.


Figure 3.1: An OFDM signal with N subcarriers and bandwidth B Hz. (reprinted from [11])

… shown in Fig. 3.2, which reminds us of a TPC codeword. In every symbol time, one column of the modulated symbols $\{X_k\}$ is sent.

Figure 3.2: A two-dimensional view of a multi-carrier signal; the inter-carrier separation $f_0$ = any integer multiple of $1/T$. (reprinted from [11])

The structures of the OFDM modulator and demodulator are shown in Fig. 3.3. Denote by $r(t) = s(t) + n(t)$ the received signal, where $n(t)$ is a complex additive white Gaussian random process. Projecting $r(t)$ onto the $j$th subcarrier subspace,

$$Y_j = \langle r(t), \phi_j(t) \rangle = \int_0^T r(t) \phi_j^*(t)\, dt, \tag{3.4}$$

and ignoring $n(t)$ for the moment, we recover the frequency-domain data

$$Y_j = \int_0^T s(t) \phi_j^*(t)\, dt = \frac{1}{T} \sum_{k=-N/2}^{N/2-1} X_k \int_0^T e^{j 2\pi \frac{k-j}{T} t}\, dt = X_j. \tag{3.5}$$

3.1.2 Implementation of OFDM system

Multiplexing $N$ orthogonal subcarriers into an OFDM symbol can be realized by taking the $N$-point inverse discrete Fourier transform (IDFT) of a set of $N$ modulation signals, followed by parallel-to-serial (P/S) and digital-to-analog (D/A) conversions. Demultiplexing at the receiving end can be analogously realized by a discrete (fast) Fourier transform (DFT) after sampling the received baseband waveform at a rate $R_s = 1/T_d = N/T$. Using the fast algorithms of the transform pair, the required complexity is of order $\frac{N}{2} \log_2 N$.

Figure 3.3: Structure of OFDM: (a) Modulator (b) Demodulator (reprinted from [11])

… process, $n[n] = 0$, we have

$$s[n] = s(t)\big|_{t = nT_d} = \begin{cases} \frac{1}{\sqrt{N}} \sum_{k=-N/2}^{N/2-1} X_k e^{j 2\pi \frac{k}{N} n}, & 0 \le n \le N - 1 \\ 0, & \text{otherwise} \end{cases} = \mathrm{IFFT}\{X_k\} \tag{3.6}$$

$$Y_j = \mathrm{FFT}\{s[n]\} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} s[n] e^{-j 2\pi \frac{j}{N} n} = \sum_{k=-N/2}^{N/2-1} X_k \delta[k - j] = X_j \tag{3.7}$$
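The transform pair (3.6) and (3.7) can be verified numerically in the noiseless case. The sketch below evaluates the sums directly with the standard library rather than a fast algorithm, so it costs on the order of $N^2$ operations; the function names and the QPSK test vector are illustrative assumptions, and $N$ is taken to be even.

```python
import cmath

def ofdm_modulate(X):
    """(3.6): X is ordered as k = -N/2, ..., N/2 - 1; returns s[0..N-1]."""
    N = len(X)
    ks = range(-N // 2, N // 2)
    return [sum(Xk * cmath.exp(2j * cmath.pi * k * n / N) for Xk, k in zip(X, ks)) / N ** 0.5
            for n in range(N)]

def ofdm_demodulate(s):
    """(3.7): returns Y_j for j = -N/2, ..., N/2 - 1."""
    N = len(s)
    return [sum(sn * cmath.exp(-2j * cmath.pi * j * n / N) for n, sn in enumerate(s)) / N ** 0.5
            for j in range(-N // 2, N // 2)]

X = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]   # QPSK symbols
Y = ofdm_demodulate(ofdm_modulate(X))
print(max(abs(y - x) for y, x in zip(Y, X)))   # numerically zero
```

With the $1/\sqrt{N}$ scaling on both sides the pair is unitary, so the symbol energy is the same in the time and frequency domains.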

3.2 Guard time and cyclic extension

A guard interval is introduced, as mentioned before, to eliminate ISI, and its duration is often about two to four times the root-mean-squared channel delay spread. Inserting a silent (virtual subcarriers) guard time between two OFDM symbols results in inter-carrier interference (ICI), which induces crosstalk among different subcarriers and destroys the subcarrier orthogonality. The effect is illustrated in Fig. 3.4. In this example, a delayed version of subcarrier 2 causes ICI when the OFDM receiver tries to demodulate subcarrier 1, because the difference between subcarriers 1 and 2 within the FFT interval is no longer an integer number of cycles. To avoid ICI, OFDM symbols are cyclically extended as shown in Fig. 3.5. Although an OFDM receiver only receives the sum of all these component signals, the figure shows the component signals separately to illustrate the ISI effect. As long as the multipath delay is smaller than the guard interval, there are no phase transitions during the DFT (or FFT) interval. Hence, an OFDM receiver "sees" the sum of pure sine waves with different phase offsets. That is to say, the summation does not destroy the orthogonality among subcarriers; it only introduces a different phase shift into each subcarrier.

Let $N_g$ be the guard interval length and denote the extended OFDM symbol (a regular OFDM symbol with its cyclic extension) by $\tilde s[n]$. (3.2) will become

$$\tilde s[n] = \begin{cases} \frac{1}{\sqrt{N}} \sum_{k=-N/2}^{N/2-1} X_k e^{j 2\pi \frac{k}{N}(n - N_g)}, & 0 \le n \le N + N_g - 1 \\ 0, & \text{otherwise} \end{cases} \tag{3.8}$$


Figure 3.4: Effect of multipath with zero signal in the guard time; the delayed subcarrier 2 causes ICI on subcarrier 1 and vice versa. (reprinted from [11])

The receiver removes CP from ˜r[n] before performing FFT demodulation.
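The effect of the cyclic prefix can be seen in a small noiseless simulation: once the prefix is removed, a multipath channel shorter than the guard interval reduces to one complex gain per subcarrier. The sketch below is an illustration only; the three-tap channel `h`, the prefix length `Ng` and the function names are arbitrary assumptions, and subcarriers are indexed $k = 0, \ldots, N-1$ for brevity.

```python
import cmath

def dft(x, sign):
    """Unitary DFT (sign = -1) or inverse DFT (sign = +1), evaluated directly."""
    N = len(x)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n, v in enumerate(x)) / N ** 0.5
            for k in range(N)]

def add_cp(s, Ng):
    return s[-Ng:] + s                     # copy the last Ng samples to the front (Ng >= 1)

def channel(x, h):
    """Linear convolution with the taps h, truncated to the length of x."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0) for n in range(len(x))]

X = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
h = [1.0, 0.5, 0.25j]                      # assumed channel; delay spread of 2 samples
Ng = 2                                     # guard interval >= len(h) - 1

s = dft(X, +1)                             # modulator: IDFT
r = channel(add_cp(s, Ng), h)              # multipath channel, no noise
Y = dft(r[Ng:], -1)                        # receiver: remove the CP, then DFT

N = len(X)
H = [sum(hm * cmath.exp(-2j * cmath.pi * k * m / N) for m, hm in enumerate(h)) for k in range(N)]
X_hat = [y / Hk for y, Hk in zip(Y, H)]    # one-tap compensation per subcarrier
print(max(abs(a - b) for a, b in zip(X_hat, X)))   # numerically zero
```

Shortening `Ng` below the channel memory breaks the exact recovery, which is the ISI and ICI situation that the guard interval is meant to prevent.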

3.3 Windowing

Figure 3.5: (a) The dotted curve is a delayed replica of the solid curve. (b) Cyclic prefix (CP): a copy of the last part of the OFDM signal is attached to its front, and a copy of the first part of the OFDM signal is attached to its back. (reprinted from [11])

In the previous sections, we have described the basic OFDM system building blocks. Since, as shown in Fig. 3.5(a), each OFDM signal consists of a number of unfiltered subcarriers and there are discontinuities from symbol to symbol, the out-of-band spectrum decreases rather slowly, following the sinc(·) envelope. Fig. 3.6 plots the spectra for 16, 64, and 256 subcarriers. We notice that as the number of subcarriers increases, the corresponding out-of-band power decreases. This is because the sidelobe spacing of each subcarrier has become smaller accordingly. Basic digital filter design theory tells us that windowing is an effective means to reduce both the sidelobe height and the out-of-band power. It also results in a smoother PSD. A popular window is the raised cosine window

defined by

$$w_T(t) = \begin{cases} 0.5 + 0.5\cos\!\big(\pi + t\pi/(\beta T_s)\big), & 0 \le t \le \beta T_s \\ 1.0, & \beta T_s \le t \le T_s \\ 0.5 + 0.5\cos\!\big((t - T_s)\pi/(\beta T_s)\big), & T_s \le t \le (1+\beta)T_s \end{cases} \tag{3.9}$$

The symbol interval $T_s$ is shorter than the total symbol duration because we allow adjacent symbols to partially overlap in the roll-off region. The time structure of the OFDM signal now looks like that given in Fig. 3.7 and can be expressed as

$$s(t) = \frac{1}{\sqrt{T}}\, w_T(t) \sum_{k=-N/2}^{N/2-1} X_k\, e^{j2\pi f_k (t - T_{\mathrm{prefix}})}, \qquad 0 \le t \le (1+\beta)T_s. \tag{3.10}$$
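A short sketch of the window (3.9) is given below. The function name and the sample values of $T_s$ and $\beta$ are illustrative assumptions. The check at the end shows why adjacent symbols may overlap in the roll-off region: the falling edge of one window and the rising edge of the next add up to one.

```python
import math

def raised_cosine_window(t, t_s, beta):
    """Raised cosine window w_T(t) of (3.9); zero outside [0, (1 + beta) * t_s]."""
    if 0 <= t <= beta * t_s:
        return 0.5 + 0.5 * math.cos(math.pi + t * math.pi / (beta * t_s))
    if beta * t_s < t <= t_s:
        return 1.0
    if t_s < t <= (1 + beta) * t_s:
        return 0.5 + 0.5 * math.cos((t - t_s) * math.pi / (beta * t_s))
    return 0.0

T_S, BETA = 1.0, 0.1   # illustrative values only

# In the overlap region the tail of one symbol (shifted by T_S) and the
# head of the next symbol sum to a constant envelope.
overlap_sums = [
    raised_cosine_window(0.01 * k, T_S, BETA)
    + raised_cosine_window(0.01 * k + T_S, T_S, BETA)
    for k in range(1, 10)
]
```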

In summary, OFDM symbols are generated as follows:

O1 Nc input (modulation) values are padded with zeros to form an N input sample

Figure 3.6: Power spectral density (PSD) without windowing for 16, 64, and 256 subcarriers. (reprinted from [11])

O2 The last $T_{\mathrm{prefix}}$ samples of the IFFT output are inserted at the start of the OFDM symbol, and the first $T_{\mathrm{postfix}}$ samples are appended at the end.

Figure 3.7: An extended OFDM symbol with cyclic extension and windowing. $T_s$ is the symbol interval, $T$ the FFT interval, $T_G$ the guard time, $T_{\mathrm{prefix}}$ the preguard interval, $T_{\mathrm{postfix}}$

Chapter 4 A Product Code Based OFDM System

This chapter considers a product code based OFDM system and the related channel estimation issue; the influence of imperfect channel estimation is considered. A model-based approach [12] for channel estimation that exploits the correlation in both the frequency and time domains is used. The coefficients of the model-based channel estimator are obtained by least-squares (LS) fitting.

To further improve the receiver performance, an iterative procedure is introduced for joint channel estimation and data detection. The tentative bit (symbol) decisions from the channel decoder output are used as pilots to re-estimate the channel response, and the new channel estimates are used by the decoder to update its decoding decisions.

To begin with, we provide a short review of the model-based channel estimator.

4.1 Model-based multicarrier channel estimation

4.1.1 A mathematical model

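Consider an OFDM system that transmits $\bar{N}$ OFDM symbols and is composed of $\bar{M}$ parallel channels. Denote by $X_{\bar{m}\bar{n}}$ the symbol of the $\bar{m}$th sub-channel at the $\bar{n}$th symbol interval. The corresponding sample after the FFT demodulator is

$$Y_{\bar{m}\bar{n}} = H_{\bar{m}\bar{n}} X_{\bar{m}\bar{n}} + N_{\bar{m}\bar{n}}, \tag{4.1}$$

where $N_{\bar{m}\bar{n}} = N_{I,\bar{m}\bar{n}} + iN_{Q,\bar{m}\bar{n}}$ is a zero-mean complex Gaussian random variable with independent in-phase and quadrature-phase components of identical variance $\mathrm{var}(N_I) = \mathrm{var}(N_Q) = N_0/2T \triangleq \sigma_n^2/2$.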

The channel response (CR) can in general be modelled as an LTI system

$$h(t) = \sum_{j=1}^{L_p} h_j(t)\,\delta\big(t - \tau_j(t)\big), \tag{4.2}$$

where $h_j(t)$ and $\tau_j(t)$ remain constant during an extended symbol interval $T_s$ without ICI. Hence,

$$H_{\bar{m}\bar{n}} = \sum_{j=1}^{L_p} h_j[\bar{n}] \exp\!\left(\frac{j2\pi \bar{m}\,\tau_j[\bar{n}]}{T}\right) \tag{4.3}$$

represents the channel frequency response at the $\bar{m}$th subcarrier during the $\bar{n}$th symbol interval.

Eq. (4.1) implies that, if the channel response $H_{\bar{m}\bar{n}}$ is known, a maximum likelihood (ML) detector would make its decision based on the statistic $\hat{X}_{\bar{m}\bar{n}} = Y_{\bar{m}\bar{n}}/H_{\bar{m}\bar{n}}$. When the channel response is unknown, the receiver has to compute an estimate $\hat{H}_{\bar{m}\bar{n}}$ of the channel response.

The conventional approach calls for the use of a pilot structure like that given in Fig. 4.1 to assist channel estimation. The initial channel estimate based on pilots located at $(\bar{m}, \bar{n})$ is obtained by an LS approach [4]:

$$\hat{H}_{\bar{m}\bar{n},LS} = \frac{Y_{\bar{m}\bar{n}}}{X_{\bar{m}\bar{n}}} = H_{\bar{m}\bar{n}} + \frac{N_{\bar{m}\bar{n}}}{X_{\bar{m}\bar{n}}} = H_{\bar{m}\bar{n}} + V_{\bar{m}\bar{n}} \tag{4.4}$$

where $V_{\bar{m}\bar{n}}$ is the error term due to the presence of Gaussian noise, whose conditional variance is given by $E[|V_{\bar{m}\bar{n}}|^2 \mid X_{\bar{m}\bar{n}}] = 2\sigma_n^2/|X_{\bar{m}\bar{n}}|^2$.

4.1.2 Regression model based approach

The discrete channel response (CR) $H_{\bar{m}\bar{n}}$ can be viewed as a sampled version of

Figure 4.1: A typical pilot symbol distribution in the time-frequency plane of an OFDM signal. (reprinted from [12])

local $H_{\bar{m}\bar{n}}$, obtained by computer simulation, are shown in Fig. 4.2. We first select an operating block in the time-frequency plane in which $\bar{N}' \times \bar{M}'$ pilot symbols are uniformly inserted at every $r_f$th sub-channel and every $r_t$th symbol; see Fig. 4.1. Then the receiver models the true sampled fading process $H_{\bar{m}\bar{n}}$ in this region by a quadratic surface

$$F(\bar{m}, \bar{n}) = a\bar{m}^2 + b\bar{m}\bar{n} + c\bar{n}^2 + d\bar{m} + e\bar{n} + f = H_{\bar{m}\bar{n}} + g(\bar{m}, \bar{n}) \tag{4.5}$$

where $g(\bar{m}, \bar{n})$ represents the modeling error. For Rician or Rayleigh fading channels, $H_{\bar{m}\bar{n}}$ is a complex Gaussian process, hence $g(\bar{m}, \bar{n})$ is also complex Gaussian-distributed.

Figure 4.2: Two typical OFDM channel responses. They are plotted in the same figure for the convenience of comparison. The vertical coordinate does not represent the absolute magnitude of each CR surface. (reprinted from [12])

The frequency-domain model of the received samples (4.1) implies that the ML estimates of the coefficients $(a, b, c, d, e, f) \triangleq \mathbf{c}^H$ are chosen such that

$$\sum_{(\bar{m},\bar{n})\in\mathcal{P}} \big|Y_{\bar{m}\bar{n}} - \hat{F}(\bar{m}, \bar{n})X_{\bar{m}\bar{n}}\big|^2 = \sum_{(\bar{m},\bar{n})\in\mathcal{P}} |X_{\bar{m}\bar{n}}|^2\, \big|\hat{H}_{\bar{m}\bar{n},LS} - \hat{F}(\bar{m}, \bar{n})\big|^2 \tag{4.6}$$

is minimized, where $\mathcal{P}$

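contains the pilot locations in the operating block (region)

$$\mathcal{P} = \left\{ (\bar{m}, \bar{n}) \;\middle|\; \begin{array}{l} \bar{m} = 0, r_f, 2r_f, \cdots, (\bar{M}' - 1)r_f \\ \bar{n} = 0, r_t, 2r_t, \cdots, (\bar{N}' - 1)r_t \end{array} \right\}. \tag{4.7}$$

Rewriting (4.5) as $F(\bar{m}, \bar{n}) = \mathbf{c}^H\mathbf{q}_{\bar{m}\bar{n}}$, where $\mathbf{q}^T_{\bar{m}\bar{n}} \triangleq (\bar{m}^2, \bar{m}\bar{n}, \bar{n}^2, \bar{m}, \bar{n}, 1)$, we restate the problem of finding the ML solution of (4.6) as solving

$$\min_{\hat{\mathbf{c}}} \sum_{(\bar{m},\bar{n})\in\mathcal{P}} \big|Y_{\bar{m}\bar{n}} - \hat{\mathbf{c}}^H\mathbf{q}_{\bar{m}\bar{n}}X_{\bar{m}\bar{n}}\big|^2 \tag{4.8}$$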

4.1.3 Channel estimation procedure

The model-based approach described above leads to the following 2-step channel estimation procedure.

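E1 Taking the derivative of (4.8) with respect to $\hat{\mathbf{c}}$ and invoking the definitions

$$\mathbf{Q} \triangleq \sum_{(\bar{m},\bar{n})\in\mathcal{P}} \cdots$$

$$\hat{\mathbf{b}} \triangleq \sum_{(\bar{m},\bar{n})\in\mathcal{P}} \mathbf{q}_{\bar{m}\bar{n}} X_{\bar{m}\bar{n}} Y^*_{\bar{m}\bar{n}} = \sum_{(\bar{m},\bar{n})\in\mathcal{P}} \mathbf{q}_{\bar{m}\bar{n}} |X_{\bar{m}\bar{n}}|^2 \hat{H}^*_{\bar{m}\bar{n},LS} \tag{4.10}$$

where $X^*_{\bar{m}\bar{n}}$ is the complex conjugate of $X_{\bar{m}\bar{n}}$, we obtain the solution

$$\hat{\mathbf{c}} = \mathbf{P}\hat{\mathbf{b}} = \sum_{(\bar{m},\bar{n})\in\mathcal{P}} \big(\mathbf{P}\mathbf{q}_{\bar{m}\bar{n}} |X_{\bar{m}\bar{n}}|^2\big)\, \hat{H}^*_{\bar{m}\bar{n},LS} \tag{4.11}$$

E2 The CR estimate, $\hat{H}_{\bar{m}\bar{n}}$, for the position $(\bar{m}, \bar{n})$ is

$$\hat{H}_{\bar{m}\bar{n}} = \hat{F}(\bar{m}, \bar{n}) = \mathbf{q}^T_{\bar{m}\bar{n}}\hat{\mathbf{c}}^* = \mathbf{q}^T_{\bar{m}\bar{n}} \sum_{(k,l)\in\mathcal{P}} \big(\mathbf{P}\mathbf{q}_{kl} |X_{kl}|^2\big)\, \hat{H}_{kl,LS} \tag{4.12}$$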
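As a consistency check on steps E1 and E2, the normal equations that follow from (4.8) can be sketched as below. This is a derivation under stated assumptions rather than a quotation of the report: it assumes that $\mathbf{q}_{\bar{m}\bar{n}}$ is real-valued, that $\mathbf{Q}$ is the weighted Gram matrix shown in the first line, and that $\mathbf{P}$ in (4.11) denotes $\mathbf{Q}^{-1}$.

```latex
% Assumed definition of Q (weighted Gram matrix of the regressors):
\mathbf{Q} = \sum_{(\bar m,\bar n)\in\mathcal{P}} |X_{\bar m\bar n}|^2\,
             \mathbf{q}_{\bar m\bar n}\mathbf{q}_{\bar m\bar n}^T
% Setting the gradient of (4.8) with respect to \hat{c} to zero,
% with q real so that \hat{c}^H q = q^T \hat{c}^*:
\sum_{(\bar m,\bar n)\in\mathcal{P}} X_{\bar m\bar n}^{*}\,\mathbf{q}_{\bar m\bar n}
  \left( Y_{\bar m\bar n} - X_{\bar m\bar n}\,\mathbf{q}_{\bar m\bar n}^T \hat{\mathbf{c}}^{*} \right) = \mathbf{0}
\;\Longrightarrow\;
\mathbf{Q}\,\hat{\mathbf{c}}^{*} = \sum_{(\bar m,\bar n)\in\mathcal{P}}
  \mathbf{q}_{\bar m\bar n} X_{\bar m\bar n}^{*} Y_{\bar m\bar n} = \hat{\mathbf{b}}^{*}
% Conjugating both sides (Q is real) gives the form of (4.11) with P = Q^{-1}:
\hat{\mathbf{c}} = \mathbf{Q}^{-1}\hat{\mathbf{b}}
```

Since $X_{\bar{m}\bar{n}} Y^*_{\bar{m}\bar{n}} = |X_{\bar{m}\bar{n}}|^2 \hat{H}^*_{\bar{m}\bar{n},LS}$ by (4.4), the two expressions for $\hat{\mathbf{b}}$ in (4.10) agree.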

The above algorithm can be modified to estimate the channel response of either a single-carrier system or any sub-channel of a multicarrier system. This 1-D scheme models the fading process by a single-variable regression function, e.g., $F(\bar{n}) = a\bar{n}^2 + b\bar{n} + c$. The corresponding parameters are given by $\mathbf{c}^H \triangleq (a, b, c)$, $\mathbf{q}^T_{\bar{n}} \triangleq (\bar{n}^2, \bar{n}, 1)$ and $\mathcal{P} = \{\bar{n} \mid \bar{n} = 0, r_t, \ldots, (\bar{N}' - 1)r_t\}$, respectively.
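The 1-D scheme can be sketched in a few lines. The code below is an illustrative implementation under assumptions, not the program used for the reported simulations: it takes $\mathbf{Q}$ to be the weighted Gram matrix $\sum |X|^2 \mathbf{q}\mathbf{q}^T$ and solves $\mathbf{Q}\hat{\mathbf{c}} = \hat{\mathbf{b}}$ directly instead of forming $\mathbf{P}$; the function names, the pilot symbols and the test channel are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(complex, A[r])) + [complex(b[r])] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]

def q_vec(n):
    """Regressor q_n = (n^2, n, 1) of the quadratic model F(n)."""
    return [n * n, n, 1]

def estimate_channel(pilot_pos, pilot_sym, received, positions):
    """Steps E1 and E2 for one sub-channel with a quadratic model."""
    h_ls = [y / x for y, x in zip(received, pilot_sym)]      # LS estimates (4.4)
    Q = [[0.0] * 3 for _ in range(3)]                        # assumed Gram matrix
    b = [0j] * 3                                             # b-hat of (4.10)
    for n, x, h in zip(pilot_pos, pilot_sym, h_ls):
        q, w = q_vec(n), abs(x) ** 2
        for r in range(3):
            b[r] += q[r] * w * h.conjugate()
            for c in range(3):
                Q[r][c] += w * q[r] * q[c]
    c_hat = solve(Q, b)                                      # (4.11) with P = Q^-1
    # (4.12): H-hat = q^T c-hat^*
    return [sum(qk * ck.conjugate() for qk, ck in zip(q_vec(n), c_hat))
            for n in positions]

def true_channel(n):
    """Hypothetical noiseless channel that is exactly quadratic in n."""
    return (1 + 2j) + (0.1 - 0.05j) * n + 0.01j * n * n

pilots = [0, 4, 8, 12, 16, 20]                 # r_t = 4, six pilots
symbols = [1 + 0j, -1 + 0j, 1j, -1j, 1 + 0j, -1 + 0j]
rx = [true_channel(n) * x for n, x in zip(pilots, symbols)]
estimate = estimate_channel(pilots, symbols, rx, range(22))
```

With noiseless pilots and a channel that is exactly quadratic, the estimate reproduces the channel at every position, including the non-pilot ones; with noise or a higher-order channel the residual is the modeling error $g$ of (4.5) plus the noise-induced error.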

4.2 Joint channel estimation and TPC decoding

The invention (or re-invention) of the turbo principle by Berrou et al. [10] has far-reaching impacts on many fronts of science and engineering. In particular, we have seen the proliferation of the iterative joint estimation and detection method in designing various communication receivers. The block diagram shown in Fig. 4.3(b) is but one such application example.

Figure 4.3: (a) A conventional OFDM system structure. (b) An OFDM system with iterative receiver structure.

Depending on the operating scenario and environment, an OFDM channel may exhibit stronger frequency selectivity or stronger time selectivity. In a static and small area application like indoor wireless communications, the frequency responses at adjacent subcarriers are not highly correlated and the coherence bandwidth is only a few subcarrier

spacings while the coherence time is likely to be of the order of many symbol durations. On the other hand, for cellular-like applications, we have to expect much smaller coherence times. We shall consider both cases subsequently. Taking the $(32, 26, 4)^2$ TPC as an example, pilots are inserted as shown in Fig. 4.4 and the size of a modelling block is $(1 \times \bar{n})$ (1-D channel estimation); namely, the pilot set $\mathcal{P}$ is separated into $\mathcal{P}_i$ ($i = 0, 1, \ldots, 31$), where

$$\mathcal{P}_i = \left\{ (\bar{m}, \bar{n}) \;\middle|\; \begin{array}{l} \bar{m} = i \\ \bar{n} = 0, 4, 8, 13, 17, 21 \end{array} \right\}, \qquad i = 0, 1, \ldots, 31. \tag{4.13}$$

The CR estimates $\hat{H}_{\bar{m}\bar{n}}$ obtained by (4.4) and (4.12) are used to give soft input for the TPC decoder. After a few decoding iterations, the TPC decoder outputs hard decisions which are then fed back to the channel estimator to carry out another channel estimation.

Figure 4.4: A typical pilot arrangement in the time-frequency plane.

In a new channel estimation round, the channel estimation algorithm uses both original pilots and pseudo-pilots (the decisions of the TPC decoder).

4.3 Double helical interleaver

A double helical interleaver can be inserted between the channel encoder and the OFDM modulator, permuting TPC codewords before OFDM modulation, to reduce the influence of long deep fades and to decorrelate the channel effects on each row and column of a TPC codeword. The receiver permutes the received samples with the reverse double helical mapping and then forwards the permuted samples to the TPC decoder.

4.3.1 Structure of double helical interleaver

The double helical interleaver (DHI) can be viewed as a 2D interleaver that permutes a 2D array. For convenience, we number the entries of an $n_{\bar{m}} \times n_{\bar{n}}$ 2D array via rows and columns such that the position $(i, j)$ in the $i$th row and the $j$th column is indexed by $j + i \times n_{\bar{n}}$. Then a DHI permutes the $i$th entry into the $j$th entry by the following two steps:

$$j' = i(n_{\bar{n}} + 1) \pmod{n_{\bar{n}}n_{\bar{m}}}, \qquad i \in \{0, 1, 2, \ldots, n_{\bar{n}}n_{\bar{m}} - 1\}, \tag{4.14}$$

$$j = j'(n_{\bar{m}} + 1) \pmod{n_{\bar{n}}n_{\bar{m}}}, \qquad j' \in \{0, 1, 2, \ldots, n_{\bar{n}}n_{\bar{m}} - 1\}. \tag{4.15}$$

Fig. 4.5(a) and (b) give an example of the above two DHI processing steps. Fig. 4.5(a) shows the first round of DHI process using (4.14). Entries 0 to 63 are read into the diagonal row, the nearest lower diagonal-parallel (LDP) row consists of entries 64 to 126, and the next symbol is read into upper right corner of farthest upper diagonal-parallel (UDP) row, followed by the row just below the previous LDP row and then one below the previous UDP row. The process continues, alternating between lower and upper diagonal-parallel rows, until the last entry has been filled. The second DHI step follows an order which is “complementary” to the first step. As illustrated in Fig. 4.5(b), it starts at the diagonal row, wrapped around to the nearest UDP row then to the farthest LDP row, followed by the row just above the previous UDP row, ... etc. The process continues, alternating between UDP and LDP rows, until all entries in the array are filled.
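The two steps (4.14) and (4.15) can be sketched as follows. The function names are hypothetical, and the 64 × 64 array size is taken from the example of Fig. 4.5. The mapping is a valid permutation whenever $n_{\bar{n}} + 1$ and $n_{\bar{m}} + 1$ are both coprime to $n_{\bar{n}}n_{\bar{m}}$, which holds for the square power-of-two arrays used here.

```python
def dhi_permutation(n_rows, n_cols):
    """Return perm such that entry i of the array moves to position perm[i]."""
    size = n_rows * n_cols
    first = [(i * (n_cols + 1)) % size for i in range(size)]      # step (4.14)
    return [(j1 * (n_rows + 1)) % size for j1 in first]           # step (4.15)

def interleave(block, perm):
    """Write block[i] to position perm[i]."""
    out = [None] * len(block)
    for i, j in enumerate(perm):
        out[j] = block[i]
    return out

def deinterleave(block, perm):
    """Inverse mapping used at the receiver."""
    return [block[j] for j in perm]

perm = dhi_permutation(64, 64)        # (64, 57, 4) x (64, 57, 4) TPC array
data = list(range(64 * 64))
restored = deinterleave(interleave(data, perm), perm)
```

After the first step alone, entries 0 to 63 land on indices 0, 65, 130, ..., that is, on the main diagonal of the array, and entry 64 lands on index 64 (row 1, column 0), which starts the nearest lower diagonal-parallel row.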

4.4 Simulation results

When the double helical interleaver is used in the TPC coded OFDM system of Fig. 4.3(b) as the channel interleaver, we have the system shown in Fig. 4.6. As mentioned above, the OFDM outputs are deinterleaved before TPC decoding commences and the decoded hard decisions are sent back to the channel estimator via the double helical interleaver for iterative channel estimation.

Figure 4.5: Double helical interleaving of (64, 57, 4) × (64, 57, 4) TPC

Numerical examples presented in this section assume that the TPC used is composed of two extended (32, 26, 4) BCH codes, which are the same as the extended Hamming codes, and the code rate is 0.5625. We use the parameter p = 4 and the Proakis C five-path channel model [14] with power profile {0.227, 0.460, 0.688, 0.460, 0.227} at delays equal

to {0, T, 2T, 3T, 4T }. Jakes’ model [13] is used to generate independent fading processes associated with each path.

The effectiveness of helical interleaving on the BER performance is demonstrated in Fig. 4.7, assuming $f_mT = 0.0001$, where $f_m$ is the maximum Doppler shift and $T$ is the sample period. The dashed curves represent the performance of the system without channel interleaver and the solid curves represent that of the system with the double helical interleaver. The inclusion of a helical interleaver gives a 3 dB performance gain at BER $= 10^{-5}$. The slope of the performance curve is sharper because the interleaver de-correlates the time and frequency correlations of the slow fading process; see Fig. 4.8. Fig. 4.8 plots the time-frequency channel response without (a) and with (b) the DHI. It is clear that the channel responses associated with different BCH codewords and code bits are less correlated after interleaving.

Figure 4.6: A block diagram of a double helical interleaved OFDM system with iterative joint channel estimation and TPC decoding.

In general, a higher modelling order provides a more accurate channel description at the cost of larger noise-induced variance, as more parameters are involved in the estimation process. At high SNRs the perturbation caused by noise is negligible; hence

higher-order estimates are preferred if complexity is not of high concern. When the SNR is low, however, one would rather use a lower-order estimate because the received samples are heavily corrupted by noise. The choice of the channel model order also depends on the channel dynamics and the modelling block size. For slow fading channels or small modelling blocks, a lower-order estimate is a better choice, while for other scenarios one might consider higher-order estimates. Hence, for a particular channel condition (SNR and coherence time/bandwidth) and modelling block size, there always exists an optimal modelling order.

Figure 4.7: Performance of DHI-permuted system in a fading channel with $f_mT = 0.0001$.

Fig. 4.9 plots the system performance in a fast-fading channel ($f_mT = 0.001$). The system performance for this channel is better than that in the same channel with the smaller Doppler-time product $f_mT = 0.0001$. When perfect channel estimation is available, it gives more than 5 dB improvement at BER $= 10^{-5}$ without DHI and more than 4 dB improvement at BER $= 10^{-5}$ with DHI. For such a fast-fading channel, the first-order model is clearly not sufficient to characterize the channel response. A higher-order model is needed and the simulation results indicate that a 3rd-order model yields the best performance.

At high SNRs, modelling error dominates the performance of the channel estimator [15] and the performance curves shown in Fig. 4.9 suggest that the BER performance is bounded at approximately $2 \times 10^{-5}$ and $5 \times 10^{-5}$ for the 3rd-order and 4th-order

channel estimators when there is no channel interleaving. With the DHI in place, both the performance and the corresponding bounds are improved, from $5 \times 10^{-5}$ to $3 \times 10^{-6}$ for the 4th-order channel estimator.

Figure 4.8: (a) Time-frequency channel response with $f_mT = 0.0001$; (b) Channel response with DHI permutation.

Pilot planning influences the performance of a system using pilot-assisted synchronization and channel estimation. The proposed system uses polynomials to model the true CR. Such a model tends to become less reliable when extrapolating beyond the pilot distribution boundary. For a given time-frequency block (or a time or frequency interval) within which pilots are inserted, the estimated CR tends to be less reliable in places close to the block or interval boundary. One way to remedy such a shortcoming is to place pilots on the modelling block boundary. The following equations, (4.16) and (4.17), define two pilot patterns $\mathcal{P}_a(i)$ and $\mathcal{P}_b(i)$. The former pattern has pilots on its

modelling block edges but not the latter.

$$\mathcal{P}_a(i) = \left\{ (\bar{m}, \bar{n}) \;\middle|\; \begin{array}{l} \bar{m} = i \\ \bar{n} = 0, 4, 8, 13, 17, 21 \end{array} \right\}, \qquad i = 0, 1, \ldots, 31 \tag{4.16}$$

$$\mathcal{P}_b(i) = \left\{ (\bar{m}, \bar{n}) \;\middle|\; \begin{array}{l} \bar{m} = i \\ \bar{n} = 2, 5, 9, 13, 16, 19 \end{array} \right\}, \qquad i = 0, 1, \ldots, 31 \tag{4.17}$$

Fig. 4.11 compares the BER performance of the above two pilot patterns. The former pilot pattern results in 0.5 and 4 dB gains at BER $= 10^{-4}$ when 3rd-order and 4th-order models are used, respectively. To validate our claim that the pilot pattern does affect the system performance and to examine the relation between model order and channel dynamics, we plot in Fig. 4.12 the CRs obtained with the two pilot patterns and the 3rd-order channel estimate (i.e., the one using a degree-3 polynomial channel model) together with the true CR (marked by solid stars). Fig. 4.13 is similar to Fig. 4.12 except that a 4th-order model is used. The square and cross markers in both figures represent CR estimates using the two different pilot patterns. It can be seen that a high modelling order tends to incur larger estimation errors at positions close to the edge. Using a pilot pattern with pilots in the boundary positions does help reduce the estimation error.

Figure 4.9: Time-frequency response of the Proakis C five-path fading channel with $f_mT = 0.001$.

Finally, in Fig. 4.14 we plot the BER performance when the receiver employs joint iterative channel estimation and TPC decoding. A 3rd-order channel estimator is used

and both two- and four-iteration detection are considered. The performance gain with respect to the no-iteration receiver is about 1.0 dB at BER $= 10^{-4}$ when two iterations are used and is increased by another 0.3 dB with a four-iteration receiver. As expected, iterative channel estimation does enhance the system performance, bringing the performance closer to that of the ideal receiver with perfect channel estimates.

[Plot: bit error rate versus average $E_b/N_0$ (dB) for the (32,26,4)×(32,26,4) code. Helical interleaving: perfect CE and orders 1 to 4; no interleaving: perfect CE and orders 3 and 4.]

Figure 4.11: Performance curves for different pilot insertion positions in an $f_mT = 0.001$ fading channel.


Figure 4.12: Estimated CR using a third-order channel model using pilot patterns (4.16) and (4.17); the true CR (solid star markers) is included for comparison; fmT = 0.001.


Figure 4.13: Estimated CR using a fourth-order channel model using pilot patterns (4.16) and (4.17). The true CR is marked by solid stars; fmT = 0.001.

Figure 4.14: BER performance curves of the receiver with iterative joint channel estimation and TPC decoding in a fading channel with $f_mT = 0.001$.

Chapter 5 Parallel Concatenated Product Codes

We extend the concept of Pyndiah's block turbo codes and propose a new class of codes based on product codes, called parallel concatenated product codes, whose structure is similar to that of turbo codes [9] with the constituent convolutional codes replaced by product codes. This new class of codes provides more flexible choices of code rates and component codes. When simple systematic product codes are used as component codes, the corresponding decoding complexity remains relatively low and the achievable performance exceeds that of a turbo product code with comparable rate at high SNR.

The interleaver we use is the Fibonacci interleaver. Besides its simplicity of implementation, we prove that such an interleaver guarantees that the resulting PCPC has a minimum distance larger than that of its constituent product codes if a proper code length is selected.

5.1 Encoder

Figure 5.1: The structure of the PCPC encoder.

The structure of parallel concatenated product codes (PCPC) is shown in Fig. 5.1 in which "TPC encoder" is the encoder of a product code given in Fig. 1. A codeword

consists of the product codeword of the upper branch and the parity part of the lower branch output. The only interleaver considered here is the Fibonacci interleaver.

5.2 Decoder

The decoder consists of four a posteriori probability (APP) component decoders, each responsible for decoding a component block code. Fig. 5.2 depicts the structure of a component decoder, which is similar to that presented in [8]. Denote by $[R]$ the received matrix and by $[W_t(m)]$ the output extrinsic information matrix of the $t$th APP decoder at the end of the $m$th APP decoding round. The a priori information matrix given at the input of the $t$th APP decoder is

$$W_{\mathrm{sum}}(m) = \sum_{l \ne t} W_l(m). \tag{5.1}$$

The message passing operations among these four APP decoders are shown in Fig. 5.3, where I and P are the extrinsic information corresponding to information bits and parity check bits, respectively. A complete iteration follows the APP decoding schedule $\mathrm{APP}_0 \to \mathrm{APP}_1 \to \mathrm{APP}_2 \to \mathrm{APP}_3$. Each APP decoder also sends related extrinsic information to the other two decoders that do not sit next to it in the schedule. Decoding of a component TPC usually converges after 8 APP decoding rounds, but a PCPC may take an average of 16 APP decoding rounds to converge.

Figure 5.2: Block diagram of the elementary APP decoder $\mathrm{APP}_{t'}$, with inputs $[R]$, $[W_{\mathrm{sum}}(m)]$ and $[\alpha(m)]$ and output $[W_{t'}(m+1)]$.

Similar to [8], the choice of the weighting factor $\alpha$ and the reliability factor $\beta$ is critical in determining the performance. The sets of $\alpha$ and $\beta$ we used for different decoding rounds are given by

$$\alpha(m) = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0], \tag{5.2}$$

$$\beta(m) = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0], \tag{5.3}$$

and

$$\alpha(m) = [0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.7, 0.8, 1.0], \tag{5.4}$$

$$\beta(m) = [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.8, 0.9, 1.0]. \tag{5.5}$$
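The bookkeeping of (5.1) and of the per-round factors can be sketched as below. Only the sum over the other decoders and the eight-round tables (5.2) and (5.3) come from the text; the function names, the zero-based round index, and the choice to hold the last table value for later rounds are assumptions made for illustration. The sketch does not implement the APP decoder itself.

```python
ALPHA_8 = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0]   # (5.2)
BETA_8 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0]    # (5.3)

def a_priori_sum(extrinsic, t):
    """(5.1): element-wise sum of the extrinsic matrices W_l(m) over all l != t."""
    rows, cols = len(extrinsic[t]), len(extrinsic[t][0])
    return [[sum(w[r][c] for l, w in enumerate(extrinsic) if l != t)
             for c in range(cols)]
            for r in range(rows)]

def factors_for_round(m, alpha=ALPHA_8, beta=BETA_8):
    """Return (alpha, beta) for zero-based round m.

    Assumption of this sketch: rounds beyond the table reuse the last entry.
    """
    idx = min(m, len(alpha) - 1)
    return alpha[idx], beta[idx]

# Four hypothetical 2 x 2 extrinsic matrices, decoder l filled with l + 1.
extrinsic = [[[float(l + 1)] * 2 for _ in range(2)] for l in range(4)]
w_sum_for_app0 = a_priori_sum(extrinsic, 0)
```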

5.3 Fibonacci interleaver

The position of each bit in the $(k_1 \times k_2)$ information array is represented by $(\bar{i}, \bar{j})$, where $0 \le \bar{i} < k_1$ and $0 \le \bar{j} < k_2$. Using the definition $|W|_Z = W \pmod{Z}$, the Fibonacci interleaver is characterized by the permutation rule

Figure 5.3: A flow chart showing the message passing among the component APP decoders and the related decoding schedule.

For a product code with extended Hamming codes as component codes, a minimum-weight codeword has its 16 nonzero entries at the intersections of four rows and four columns, each of which is a minimum-weight component codeword. These entries may all lie within the information array, in which case the parity part of the product codeword has weight 0. The basic requirement of the interleaver used in a PCPC is to make sure the interleaved information array will not contain such a 4 × 4 all-1 subarray. The Fibonacci interleaver does possess such a desired property. In particular, one can show that

Lemma 1. A PCPC using $(n_1, k_1, 4) \times (n_2, k_2, 4)$ product codes as constituent codes has a minimum Hamming distance greater than 16 if $k_1$ or $k_2$ is not a multiple of 4.

Proof. Since each component block code of a constituent product code has minimum distance 4, we use a 4 × 4 matrix called minimum-weight error event array to represent the nonzero positions of a minimum weight product codeword. As it is possible that all entries of this matrix lie within the information array of a product codeword, we want the interleaver to guarantee that the permuted information array will not produce an all-zero parity part.

for four columns and $(\bar{i}, \bar{i} + h_1, \bar{i} + h_2, \bar{i} + h_3)$ for four rows, where $0 < h_1 < h_2 < h_3 < k_1$ and $0 < g_1 < g_2 < g_3 < k_2$. We have the error-position array $[\gamma_{w,z}]_{4\times4}$ denoted by

$$\begin{bmatrix} (\bar{i}, \bar{j}) & \cdots & (\bar{i}, \bar{j} + g_3) \\ \vdots & \ddots & \vdots \\ (\bar{i} + h_3, \bar{j}) & \cdots & (\bar{i} + h_3, \bar{j} + g_3) \end{bmatrix} \tag{5.7}$$

where $1 \le w, z \le 4$, $\gamma_{w,z} = (\bar{i} + u_w, \bar{j} + v_z)$, $\mathbf{u} = (u_1, u_2, u_3, u_4) = (0, h_1, h_2, h_3)$ and $\mathbf{v} = (v_1, v_2, v_3, v_4) = (0, g_1, g_2, g_3)$.

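After Fibonacci interleaving, the positions will be permuted to

$$\tilde{\gamma}_{w,z} = \big(|\bar{i} + u_w + \bar{j} + v_z|_{k_1},\; \big||\bar{i} + u_w + \bar{j} + v_z|_{k_1} + \bar{j} + v_z\big|_{k_2}\big),$$

which can be decomposed as

$$\tilde{\gamma}_{w,z} = \big(|\bar{i} + \bar{j}|_{k_1},\; \big||\bar{i} + \bar{j}|_{k_1} + \bar{j}\big|_{k_2}\big) \underset{k_1\times k_2}{\oplus} \big(|u_w + v_z|_{k_1},\; \big||u_w + v_z|_{k_1} + v_z\big|_{k_2}\big) = \dot{\gamma}_{w,z} \underset{k_1\times k_2}{\oplus} \ddot{\gamma}_{w,z},$$

where $\underset{k_1\times k_2}{\oplus}$ represents the operation that takes modulo $k_1$ and modulo $k_2$ on the first and the second entries, respectively. Because all entries of the array $[\dot{\gamma}_{w,z}]_{4\times4}$ are the same, $[\tilde{\gamma}_{w,z}]_{4\times4}$ fails to form a 4 by 4 position array only if $[\ddot{\gamma}_{w,z}]_{4\times4}$ does. Therefore, we only need to consider an error-position array

$$[\ddot{\gamma}_{w,z}]_{4\times4} = \big[|u_w + v_z|_{k_1},\; \big||u_w + v_z|_{k_1} + v_z\big|_{k_2}\big]_{4\times4} = [\hat{\gamma}_{w,z}, \check{\gamma}_{w,z}]_{4\times4},$$

where $\hat{\gamma}_{w,z}$ and $\check{\gamma}_{w,z}$ represent the rows and columns of this position array, respectively. As we know, the array $[\ddot{\gamma}_{w,z}]_{4\times4}$ will form a 4 × 4 position array without generating parity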

firstly consider an array of the form

$$[\hat{\gamma}_{w,z}]_{4\times4} = \begin{bmatrix} |0|_{k_1} & |g_1|_{k_1} & |g_2|_{k_1} & |g_3|_{k_1} \\ |h_1|_{k_1} & |h_1 + g_1|_{k_1} & |h_1 + g_2|_{k_1} & |h_1 + g_3|_{k_1} \\ |h_2|_{k_1} & |h_2 + g_1|_{k_1} & |h_2 + g_2|_{k_1} & |h_2 + g_3|_{k_1} \\ |h_3|_{k_1} & |h_3 + g_1|_{k_1} & |h_3 + g_2|_{k_1} & |h_3 + g_3|_{k_1} \end{bmatrix}.$$

If these 16 positions are permuted to the same four rows $0, g_1, g_2, g_3$, then $h_1, h_2, h_3$ should be moved to $g_1, g_2, g_3$. If $h_1$ is moved to $g_2$ or $g_3$, either $h_2$ or $h_3$ should be mapped to $g_1$, which is obviously a contradiction. Therefore, we have $h_1 = g_1$. If $h_2$ is mapped to $g_3$ then $h_3$ should be mapped to $g_2$, which is not possible; therefore, $h_2 = g_2$ and $h_3 = g_3$ and the resulting array becomes

$$[\hat{\gamma}_{w,z}]_{4\times4} = \begin{bmatrix} |0|_{k_1} & |g_1|_{k_1} & |g_2|_{k_1} & |g_3|_{k_1} \\ |g_1|_{k_1} & |2g_1|_{k_1} & |g_1 + g_2|_{k_1} & |g_1 + g_3|_{k_1} \\ |g_2|_{k_1} & |g_2 + g_1|_{k_1} & |2g_2|_{k_1} & |g_2 + g_3|_{k_1} \\ |g_3|_{k_1} & |g_3 + g_1|_{k_1} & |g_3 + g_2|_{k_1} & |2g_3|_{k_1} \end{bmatrix}.$$

If $\hat{\gamma}_{w,w} \ne 0$ for $w = 2, 3, 4$, then there are three entries $\hat{\gamma}_{w,z} = 0$, $w \ne z$, $0 < w, z \le 4$, which again is a contradiction. Therefore, at least one of $\hat{\gamma}_{w,w}$ is 0 for $w = 2, 3, 4$. We now proceed to discuss these three cases.

1. $|2g_1|_{k_1} = 0$: $g_1 = \frac{k_1}{2}$ implies $g_1 + g_2 \ne 0$ and $g_1 + g_3 \ne 0$. Thus $g_2 + g_3$ must be equal to 0, but then we have $k_1 > g_3 > g_2 > g_1 = \frac{k_1}{2}$, a contradiction.

2. $|2g_2|_{k_1} = 0$: $g_2 = \frac{k_1}{2}$ implies $g_2 + g_3 \ne 0$, $g_1 + g_2 \ne 0$ and $g_1 + g_3 = 0$. Therefore $\mathbf{v} = (0, g_1, \frac{k_1}{2}, k_1 - g_1)$ and $[\hat{\gamma}_{w,z}]_{4\times4}$ becomes

$$\begin{bmatrix} 0 & g_1 & \frac{k_1}{2} & k_1 - g_1 \\ g_1 & 2g_1 & g_1 + \frac{k_1}{2} & 0 \\ \frac{k_1}{2} & g_1 + \frac{k_1}{2} & 0 & \frac{k_1}{2} - g_1 \\ k_1 - g_1 & 0 & \frac{k_1}{2} - g_1 & -2g_1 \end{bmatrix}. \tag{5.8}$$

Because there should be four entries equal to "$\frac{k_1}{2}$", $2g_1$ and $-2g_1$ are equal to "$\frac{k_1}{2}$"

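3. $|2g_3|_{k_1} = 0$: $g_3 = \frac{k_1}{2}$ implies $g_1 + g_3 \ne 0$ and $g_2 + g_3 \ne 0$, which forces $g_1 + g_2$ to be equal to 0, but then we have the contradictory result $0 < g_1 < g_2 < g_3 = \frac{k_1}{2}$.

Next let us consider the sixteen values of the column error-position array $[\check{\gamma}_{w,z}]_{4\times4} = \big[\big||v_w + v_z|_{k_1} + v_z\big|_{k_2}\big]_{4\times4}$. By substituting $\mathbf{v} = (0, \frac{k_1}{4}, \frac{k_1}{2}, \frac{3k_1}{4})$ into the array, we have

$$\begin{bmatrix} 0 & |\frac{k_1}{2}|_{k_2} & |k_1|_{k_2} & |\frac{3k_1}{2}|_{k_2} \\ |\frac{k_1}{4}|_{k_2} & |\frac{3k_1}{4}|_{k_2} & |\frac{5k_1}{4}|_{k_2} & |\frac{3k_1}{4}|_{k_2} \\ |\frac{k_1}{2}|_{k_2} & |k_1|_{k_2} & |\frac{k_1}{2}|_{k_2} & |k_1|_{k_2} \\ |\frac{3k_1}{4}|_{k_2} & |\frac{k_1}{4}|_{k_2} & |\frac{3k_1}{4}|_{k_2} & |\frac{5k_1}{4}|_{k_2} \end{bmatrix}. \tag{5.9}$$

There are four "$|\frac{3k_1}{4}|_{k_2}$", three "$|k_1|_{k_2}$", three "$|\frac{k_1}{2}|_{k_2}$", two "$|\frac{k_1}{4}|_{k_2}$", two "$|\frac{5k_1}{4}|_{k_2}$", one "$|\frac{6k_1}{4}|_{k_2}$" and one "0" in these entries. Because only four kinds of values are allowed in these entries and $|\frac{k_1}{4}|_{k_2}$, $|\frac{5k_1}{4}|_{k_2}$ and $|\frac{6k_1}{4}|_{k_2}$ cannot be "0", the remaining situations are

1. $|k_1|_{k_2} = 0$: $k_1 = l \times k_2$, $l \in \mathbb{Z}$. The values $0$, $|\frac{lk_2}{4}|_{k_2}$, $|\frac{lk_2}{2}|_{k_2}$, $|\frac{3lk_2}{4}|_{k_2}$ should be different, which implies that $l$ is odd.

2. $|\frac{k_1}{2}|_{k_2} = 0$: $k_1 = 2l \times k_2$, $l \in \mathbb{Z}$. This implies $|\frac{k_1}{4}|_{k_2} = |\frac{3k_1}{4}|_{k_2}$, a contradiction.

However, either $k_1$ or $k_2$ is not a multiple of 4 and there is no 4 × 4 array resulting in the PCPC codeword weight of 16.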
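The permutation used in the proof above, $(\bar{i}, \bar{j}) \mapsto (|\bar{i} + \bar{j}|_{k_1}, \big||\bar{i} + \bar{j}|_{k_1} + \bar{j}\big|_{k_2})$, can be explored with a small brute-force sketch. The function names are hypothetical and the exhaustive search is only practical for small arrays; it is an illustration of the property stated in Lemma 1, not a substitute for the proof.

```python
from itertools import combinations

def fibonacci_interleave(i, j, k1, k2):
    """Map position (i, j) of the k1 x k2 information array to its new position."""
    a = (i + j) % k1
    b = (a + j) % k2
    return a, b

def is_permutation(k1, k2):
    """Check that every position of the array is hit exactly once."""
    images = {fibonacci_interleave(i, j, k1, k2)
              for i in range(k1) for j in range(k2)}
    return len(images) == k1 * k2

def maps_some_4x4_onto_4x4(k1, k2):
    """True if some 4-row by 4-column position array is mapped onto another one.

    Such an array corresponds to a weight-16 pattern whose interleaved
    version again has an all-zero parity part.
    """
    for rows in combinations(range(k1), 4):
        for cols in combinations(range(k2), 4):
            image = [fibonacci_interleave(i, j, k1, k2) for i in rows for j in cols]
            if len({a for a, _ in image}) == 4 and len({b for _, b in image}) == 4:
                return True
    return False
```

For example, with $k_1 = k_2 = 8$ the rows and columns $\{0, 2, 4, 6\}$ are mapped onto themselves, whereas with $k_1 = k_2 = 5$ (not a multiple of 4) no such array exists.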

Employing an analogous argument we can also prove

Lemma 2. A PCPC using $(n_1, k_1, 4) \times (n_2, k_2, 4)$ product codes as constituent codes cannot have a minimum Hamming distance equal to 17, 18 or 19.

Lemmas 1 and 2 imply

Theorem 1. A PCPC using $(n_1, k_1, 4) \times (n_2, k_2, 4)$ product codes as constituent codes

5.4 Simulation results

We give computer-simulated performance of two PCPCs. The first PCPC is based on (32, 26, 4) extended Hamming codes and is denoted by $(32, 26, 4)^4$. The second one is built on the product code using the (64, 57, 4) and (16, 11, 4) extended Hamming codes as component codes, hence it is denoted by $(64, 57, 4)^2 \times (16, 11, 4)^2$. These two codes have code rates $26^2/[2(32^2) - 26^2] = .493$ and $(57 \times 11)/[2(64 \times 16) - (57 \times 11)] = .441$, which are comparable to that of the $(16, 11, 4)^2$ product code, which is based on the (16, 11, 4) extended Hamming code.
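The rate figures above follow from counting bits: a PCPC codeword carries the full product codeword of the upper branch plus only the parity part of the lower branch. A short sketch of the arithmetic, with hypothetical function names, is:

```python
def product_code_rate(k1, n1, k2, n2):
    """Rate of an (n1, k1) x (n2, k2) product code."""
    return (k1 * k2) / (n1 * n2)

def pcpc_rate(k1, n1, k2, n2):
    """Rate of a PCPC built from two such product codes.

    The information block is sent once, so the codeword length is
    2 * n1 * n2 minus one copy of the k1 * k2 information bits.
    """
    info = k1 * k2
    return info / (2 * n1 * n2 - info)

rate_pcpc_32 = pcpc_rate(26, 32, 26, 32)        # (32, 26, 4)^4
rate_pcpc_64_16 = pcpc_rate(57, 64, 11, 16)     # (64, 57, 4)^2 x (16, 11, 4)^2
rate_tpc_16 = product_code_rate(11, 16, 11, 16) # (16, 11, 4)^2
```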

For fair comparison, we define a decoding round as one-pass decoding of a product code, $\mathrm{APP}_i \to \mathrm{APP}_{i+1}$. Thus, a decoding iteration of a PCPC needs two consecutive decoding rounds. The performance curves shown in Fig. 5.4 include those obtained by 2 and 4 decoding iterations of the $(32, 26, 4)^4$ PCPC and that by 4 decoding rounds of the $(16, 11, 4)^2$ product code. The performance of the $(32, 26, 4)^4$ PCPC with 4 decoding iterations is superior to that with 2 decoding iterations by 0.5 dB at BER $= 10^{-5}$. The $(32, 26, 4)^4$ PCPC with 4 decoding iterations also outperforms the $(16, 11, 4)^2$ TPC with 4 decoding iterations by 0.5 dB at BER $= 10^{-5}$. The performance gain improves if we are interested in lower BER performance. For the $(16, 11, 4)^2$ TPC, the performance improvement becomes negligible when the number of decoding iterations is greater than 4. The PCPCs, however, require longer convergence times. We also find that with the same decoding iterations, the $(64, 57, 4)^2 \times (16, 11, 4)^2$ PCPC outperforms the $(32, 26, 4)^4$
