A Turbo Decoder Design Using Reciprocal Dual Trellis

National Chiao Tung University

Department of Electronics Engineering & Institute of Electronics, Master's Program

Master's Thesis

Student: Chen-Yang Lin

Advisor: Prof. Hsie-Chia Chang

A Turbo Decoder Design Using Reciprocal Dual Trellis

Student: Chen-Yang Lin

Advisor: Hsie-Chia Chang

A Thesis

Submitted to Department of Electronics Engineering & Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronics Engineering

July 2009

Hsinchu, Taiwan, Republic of China

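A Turbo Decoder Design Using Reciprocal Dual Trellis

Student: Chen-Yang Lin. Advisor: Prof. Hsie-Chia Chang

National Chiao Tung University

Department of Electronics Engineering & Institute of Electronics, Master's Program

Abstract (translated from the Chinese)

Error-correcting codes generally need a flexibly selectable code rate to adapt to different channel conditions. To meet this need, decoders for different code rates must be provided, and high-code-rate error-correcting codes are needed to raise channel utilization and transmission speed. Conventional turbo decoders usually use a high-radix trellis structure to decode high-rate codes, so the trellis complexity grows exponentially as the code rate rises. In this thesis, we adopt the reciprocal dual trellis structure to reduce the trellis complexity of high-code-rate decoders. In addition, we adopt the sign-magnitude number representation to further reduce hardware complexity. We apply puncturing to the WCDMA turbo encoder to generate codes of different rates. After studying punctured turbo codes at four code rates, 1/3, 1/2, 2/3, and 4/5, the simulation results show that the error-correction performance improves as the code rate decreases. The synthesis results of the SISO decoders for the four code rates show that the gate count increases only slightly as the code rate rises. Finally, we propose a multiple code-rate turbo decoder hardware architecture whose throughput rises with the operating code rate. According to experimental results in a 90 nm process, the proposed decoder contains 370k logic gates and 58 kb of storage. At a supply voltage of 0.9 V, it consumes 80 mW at code rate 4/5 and achieves a throughput of 101 Mb/s.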

A Turbo Decoder Design Using Reciprocal Dual Trellis

Student：Chen-Yang Lin

Advisor：Dr. Hsie-Chia Chang

Department of Electronics Engineering

Institute of Electronics

National Chiao Tung University

Abstract

Error-correcting codes (ECC) generally require a flexible choice of code rate to meet different channel characteristics. In addition, high code rate schemes are required to increase channel efficiency in high throughput systems. Since conventional turbo decoders for high code rates usually apply a high-radix trellis structure, the complexity of the trellis increases exponentially as the code rate rises. In this thesis, we introduce the reciprocal dual trellis to reduce the trellis complexity at high code rates. The sign-magnitude representation is introduced to lower the hardware complexity. We apply puncturing to the WCDMA turbo code to generate different code rates. After investigating punctured turbo codes at four code rates, 1/3, 1/2, 2/3, and 4/5, the simulation results reveal that the error performance improves as the code rate is lowered.

The synthesis results of the SISO decoders for the four code rates show that there is a moderate increase in logic gates as the code rate rises. Finally, a multiple code-rate turbo decoder architecture using the reciprocal dual trellis is proposed. As the operating code rate rises, the throughput also increases. Fabricated in the UMC 90 nm 1P9M CMOS process, the proposed decoder, which contains 370K gates and 58 kb of storage elements, can achieve 101 Mb/s at 80 mW under code rate 4/5.

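Acknowledgments

Time flies, and two years of life as a master's student have passed in the blink of an eye. At last, I too get to write this page. First, I would like to thank Professor Hsie-Chia Chang. Beyond his guidance in research, what I value even more is that he showed me the bearing and attitude a leader should have. As a Western saying goes: "If you want sailors to build a ship, teach them to long for the sea." He always reminded us to be sheep that go and find grass for themselves rather than wait for the shepherd to feed them. He approaches research with childlike curiosity, and whenever I was confused he taught me how to think. Apart from professional knowledge, this is one of the best treasures I found in Professor Chang. Once again, thank you, Professor Chang. Next, I thank the members of Ocean and Oasis; I was truly happy to learn here. I first thank my senior labmate 大頭, who always played the role of a teacher: whenever I shared research results or ran into a bottleneck, he was there as soon as I called and offered far-sighted suggestions. I also thank my senior labmate 國光: besides often handling workstation problems for us, you were someone with whom I could share the small things of daily life. I also thank the companions who went through the master's years with me. Whether we were doing research together or celebrating birthdays, you were truly warm-hearted partners. Thank you for keeping me company. I also thank my friends in the Hsinchu-area Zen Society (禪學社) and Leadership Society (領袖社). I feel deeply that knowing you is my great fortune. We worked hard together to organize many activities, which let this old graduate student still feel the passion, energy, and a bit of the naivety of young people. With your company, my graduate life was never too dull; I could keep a young heart and face every challenge with optimism. Thank you. Finally, I thank my dearest family and my girlfriend, who pulled me back when I wanted to give up and supported me when I was discouraged. I dedicate this thesis to everyone who has helped me, with a heart full of gratitude. Thank you.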

4 Dual-MAP Turbo Decoder Architecture

4.1 Architecture overview

4.2 Dual-MAP decoder

4.2.1 SMM unit

4.2.2 SMA unit

4.2.3 Extrinsic unit

4.3 Throughput evaluation

5 Implementation Results

5.1 Implementation of Dual-MAP turbo decoders

5.1.1 Comparison of different high radix MAP decoders

5.1.2 Layout specification

5.2 Comparison of different turbo decoders

6 Conclusion and Future Works

6.1 Conclusion

List of Figures

1.1 Block diagram of digital communication system.

2.1 Turbo encoder with puncture.

2.2 Interleaver of turbo decoder.

2.3 Conventional Turbo decoder.

2.4 A rate 1/2 memory order 2 RSC encoder and its state transition diagram.

2.5 The trellis diagram of the (2,1,2) RSC encoder.

2.6 Original trellis and reciprocal dual trellis comparison.

2.7 The process diagram of sliding window algorithm.

2.8 Different sliding window length comparison.

3.1 Correction term of $\bar{E}$ operation.

3.2 Puncture procedure.

3.3 Trellis transformation relationship.

3.4 Performance of using Dual-MAP and MAP under punctured code.

3.5 Turbo encoder for WCDMA.

3.6 Performance of different code rate with block length 2048.

3.7 Comparison of different sliding window size.

3.8 Comparison of different iteration.

3.9 Fixed point comparison.

3.10 Segment method.

3.11 Performance comparison of different segment method.

3.12 Performance of each code rate with design parameters.

4.1 Iterative decoding of turbo decoder.

4.3 Architecture of SMM unit.

4.4 Architecture of SMA unit.

4.5 Radix-2 and radix-4 Log MAP recursion units.

4.6 Architecture of Extrinsic unit.

5.1 Area distribution of modules in turbo decoder.

List of Tables

3.1 Puncture tables

3.2 Summary of fixed representation in MAP decoder

5.1 Summary of synthesis result

5.2 Compare with radix-4×4 MAX-LOG-MAP circuits

5.3 Proposed turbo decoder chip summary


Chapter 1 Introduction

1.1 Motivation

The fundamental block diagram of a traditional digital communication system is illustrated in Fig. 1.1. The system transmits information from a source to a destination through an unknown channel. Generally, the communication system is simplified to three components: transmitter, receiver, and channel. The transmitter, which includes the source encoder, channel encoder, and modulator, is used to transmit the information more effectively and more reliably over unknown channels. The receiver reverses these operations on the received signal using the demodulator, channel decoder, and source decoder. Since channel impairments such as noise, interference, and distortion may cause errors in the received signal, the channel encoder adds redundancy to the source codeword in order to minimize transmission errors. These redundant bits can be used for error detection and correction. Thus, channel coding reduces the effects of noise disturbances compared with an uncoded communication system.

However, most channel coding schemes are not flexible and efficient enough for modern data transmission. For video data, some important parts must be well protected to ensure the reconstructed quality. Therefore, we can use more check bits to protect the important parts of the video data and fewer check bits when transmitting less important data. Further, this encoding manner can also be applied to meet different channel situations. For example, we can use more check bits when data will be transmitted through a noisy channel and fewer check bits when it is transmitted through a better channel.

Figure 1.1: Block diagram of digital communication system. (Blocks: information source, source encoder, channel encoder, modulator, channel, demodulator, channel decoder, source decoder, information destination.)

From the viewpoint of the communication system, we combine the features of the data type and the channel to design a flexible channel decoder. We focus on turbo decoders because of their excellent error correction ability. However, high code rate turbo decoder design is a real challenge because of the complexity of the trellis structure. In this thesis, we apply another decoding concept, described in Ref. [1], to solve this problem. Furthermore, we apply various code rates to protect different kinds of data and to achieve unequal error protection. Finally, we aim to design a multiple code rate turbo decoder with low hardware complexity.

1.2 Thesis organization

This thesis consists of 6 chapters. In chapter 2, the concepts of several iterative decoding algorithms for turbo codes are introduced. In chapter 3, we apply puncture tables to the turbo code of the WCDMA system to achieve high code rates. The simulation analysis and hardware architecture parameters are also described there. Chapter 4 introduces the design of the reciprocal dual trellis turbo decoder, including the hardware architecture and the characteristics of the decoder. In chapter 5, the hardware implementation results are shown. Finally, the conclusions and future works are given in chapter 6.

Chapter 2 Turbo Code

The parallel concatenated convolutional code, also named turbo code, was invented by C. Berrou, A. Glavieux, and P. Thitimajshima in 1993. It has been shown to achieve excellent performance near the Shannon limit. The common turbo encoder is composed of two recursive systematic convolutional codes concatenated in parallel and separated by a pseudo-random interleaver. Turbo codes are adopted in the 3GPP, 3GPP2, DVB-RCS, and WiMAX standards due to their excellent error correction ability. In this chapter, turbo decoding with the original trellis and with the reciprocal dual trellis is introduced. The error floor effect in turbo decoding and some decoding techniques are also explained.

2.1 Turbo principle

2.1.1 Encoder of turbo code

The turbo encoder is composed of two recursive systematic convolutional (RSC) encoders, which are connected in parallel but separated by an interleaver. The block diagram of the turbo encoder is illustrated in Fig. 2.1. The puncture table is used to select the bits to send and is an option for increasing the data rate. In the first encoder, the information bits are encoded into the systematic part $c_0(D)$ and the parity part $c_1(D)$; thus, $c_0(D) = x(D)$. The second encoder encodes the bit stream $\tilde{x}(D)$, which is the information bits after passing through the interleaver. However, the interleaved systematic part $\tilde{x}(D)$ is not sent during transmission. The code rate of an encoder is defined as $R$ = (information bits in a codeword)/(total codeword bits).

Figure 2.1: Turbo encoder with puncture. (Blocks: RSC1, RSC2, interleaver, puncture table; signals: $x(D)$, $\tilde{x}(D)$, $c_0(D)$, $c_1(D)$, $c_2(D)$, OutStream.)

In the following derivation of $R$, we do not consider the puncture table. Hence, the OutStream in Fig. 2.1 is the codeword. Encoder 1 produces $p_1$ check bits and encoder 2 produces $p_2$ check bits, and their code rates are $R_1$ and $R_2$, respectively. If $k$ information bits pass through the turbo encoder, the overall code rate $R$ is

$$R = \frac{k}{k + p_1 + p_2}.$$

We substitute for $p_1$ and $p_2$ using the code rate equations of encoder 1 and encoder 2,

$$R_1 = \frac{k}{k + p_1}, \qquad R_2 = \frac{k}{k + p_2},$$

which give $p_1/k = 1/R_1 - 1$ and $p_2/k = 1/R_2 - 1$. Finally, the code rate of the overall turbo encoder is

$$\frac{1}{R} = \frac{1}{R_1} + \frac{1}{R_2} - 1. \tag{2.1}$$
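As a quick numerical check of (2.1), the following minimal sketch evaluates the overall rate both from the component rates and directly from the bit counts. The function names `overall_rate` and `rate_from_bits` are illustrative only and do not come from the thesis.

```python
from fractions import Fraction


def overall_rate(r1: Fraction, r2: Fraction) -> Fraction:
    """Overall rate of the parallel concatenation, 1/R = 1/R1 + 1/R2 - 1."""
    return 1 / (1 / r1 + 1 / r2 - 1)


def rate_from_bits(k: int, p1: int, p2: int) -> Fraction:
    """Overall rate from k information bits and p1, p2 check bits."""
    return Fraction(k, k + p1 + p2)


if __name__ == "__main__":
    # Two rate-1/2 component encoders without puncturing.
    print(overall_rate(Fraction(1, 2), Fraction(1, 2)))
    print(rate_from_bits(k=4, p1=4, p2=4))
```

Both calls describe the same unpunctured configuration, so they return the same fraction.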

After encoding the information sequence, a terminating method is applied to stop the encoding process. We briefly describe three terminating methods. The first and simplest method is to truncate the information directly after encoding one block length. However, this approach results in some performance loss. The second method appends dummy information bits to the end of the information sequence to force the registers in the encoder back to the all-zero state. This approach maintains the decoding performance; however, because the additional dummy bits are transmitted, the code rate decreases. The third is the tail-biting method, which encodes the information bits twice. The first encoding pass aims to find the final register state of the encoder. The second encoding pass is the actual encoding, and the encoder starts from the final state of the first pass. Hence, with tail-biting the registers do not necessarily end in the all-zero state. This method ensures the same error protection as the second method, but the code rate is not changed. If the ending state of the decoding trellis is known, the initial condition can be set more precisely and, theoretically, the decoding performance can be improved.

2.1.2 Turbo interleaver

Interleaving is a process of rearranging the order of a data sequence in a one-to-one deterministic format. In turbo codes, the interleaver, such as the one in Fig. 2.2, is an essential component for bit-error-rate performance. A good coding gain can be achieved with small-memory encoders since the interleaver scrambles a long block of input symbols. The interleaver de-correlates the input symbols of the two encoders; therefore, an iterative decoding algorithm can be applied between the two component decoders. The performance upper bound corresponding to a uniform random interleaver has been evaluated in [2]. Theoretically, as the block size $N$ of the interleaver increases, the bit-error-rate performance is expected to improve; the factor $1/N$ is called the interleaver gain.

Figure 2.2: Interleaver of turbo decoder. (Input order: U1 U2 U3 U4 U5 U6; after interleaving: U3 U5 U4 U1 U6 U2.)
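The permutation shown in Fig. 2.2 can be reproduced with a short sketch. The index list `PERM` is read off the figure; the helper names `interleave` and `deinterleave` are assumptions made for illustration.

```python
def interleave(seq, perm):
    """Output position i takes the input symbol at index perm[i]."""
    return [seq[p] for p in perm]


def deinterleave(seq, perm):
    """Inverse mapping: put each symbol back at its original index."""
    out = [None] * len(seq)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out


# Zero-based indices for the order U3 U5 U4 U1 U6 U2 of Fig. 2.2.
PERM = [2, 4, 3, 0, 5, 1]

if __name__ == "__main__":
    data = ["U1", "U2", "U3", "U4", "U5", "U6"]
    scrambled = interleave(data, PERM)
    print(scrambled)
    print(deinterleave(scrambled, PERM))
```

Because the mapping is one-to-one, de-interleaving restores the original order, which is what lets the two constituent decoders exchange extrinsic information.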

2.1.3 Decoder of turbo code

A common iterative turbo decoder is shown in Fig. 2.3, where $r_s$ is the received systematic information, $r_{p1}$ is the received parity generated by the first component RSC encoder, and $r_{p2}$ is the received parity generated by the second component RSC encoder. The iterative turbo decoder consists of two serially connected constituent SISO (soft-in soft-out) decoders. The interleaver is used to permute the systematic information and deliver the scrambled data to the second SISO decoder. During this iterative decoding procedure, each constituent SISO decoder delivers its extrinsic output $L_{ex}$, which is the a priori input $L_{in}$ of the next constituent SISO decoder; therefore, $L_{in1} = L_{ex2}$ and $L_{in2} = L_{ex1}$ after the interleaving or de-interleaving process. Generally, the bit-error-rate performance improves as the number of decoding iterations increases; however, there is no obvious improvement once the iteration number reaches a threshold.

Figure 2.3: Conventional Turbo decoder. (Blocks: SISO Decoder 1, SISO Decoder 2, interleavers, deinterleavers, hard decision; signals: $r_s$, $r_{p1}$, $r_{p2}$, $L_{ex1}$, $L_{in2}$, $L_{ex2}$, $L_{in1}$.)

2.1.4 Error ﬂoor eﬀect

Although turbo coding provides excellent performance, the bit-error-rate (BER) decreases quite slowly and almost saturates at high signal-to-noise ratio (SNR). This phenomenon is due to the relatively small free distance of turbo codes and is called the error floor [3]. The relation between the minimum free distance and the bit error probability in turbo coding can be expressed by

$$P_b \propto Q\!\left(\sqrt{\frac{2\, d_{free}\, R\, E_b}{N_0}}\right), \tag{2.2}$$

where $d_{free}$ is the minimum free distance of the codeword space, $R$ is the code rate, and $E_b/N_0$ is the SNR.
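To see how the free distance enters (2.2), the sketch below evaluates the Q-function term for given parameters. It only computes the proportional term, not an actual BER, and the name `error_floor_term` and the sample values are illustrative assumptions.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def error_floor_term(d_free: int, rate: float, ebn0_db: float) -> float:
    """Q(sqrt(2 * d_free * R * Eb/N0)) with Eb/N0 given in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return q_function(math.sqrt(2.0 * d_free * rate * ebn0))


if __name__ == "__main__":
    # Hypothetical free distances, only to show the trend.
    for d_free in (6, 10):
        print(d_free, error_floor_term(d_free, rate=1 / 3, ebn0_db=2.0))
```

A smaller free distance gives a larger term at the same SNR, which matches the explanation of the error floor above.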


2.2 Decoding algorithm

In turbo decoding, the maximum a posteriori probability (MAP) algorithm [4] and the soft-output Viterbi algorithm (SOVA) [5] are commonly applied in the SISO decoders. The SOVA uses the maximum likelihood (ML) criterion to minimize the word error probability, whereas the MAP algorithm exploits the information of the codeword to minimize the symbol error probability. In this section, we focus on the MAP algorithm, because it has been shown to be the optimal decoding method for turbo codes compared with SOVA [6]. Moreover, some algorithms that are useful for hardware implementation, such as Log-MAP and Max-Log-MAP, are introduced briefly. Finally, an effective decoding algorithm for high code rates is introduced [7].

2.2.1 The MAP algorithm

The MAP decoding algorithm (also called the BCJR algorithm) was introduced in 1974 by Bahl, Cocke, Jelinek, and Raviv [4]. For each transmitted information symbol $u_t$, the MAP algorithm estimates its a posteriori probabilities (APP) based on the whole received codeword sequence $r$ over a discrete memoryless channel (DMC) and computes the log-likelihood ratio (LLR), which is defined as

$$L(u_t) = L(u_t \mid r) = \log \frac{P(u_t = +1 \mid r)}{P(u_t = -1 \mid r)}, \tag{2.3}$$

for $1 \le t \le N$, where $N$ is the received codeword length. This value is compared with a zero threshold to determine the hard estimate $\hat{u}_t$ of $u_t$:

$$\hat{u}_t = \begin{cases} +1, & \text{if } L(u_t) \ge 0 \\ -1, & \text{otherwise} \end{cases} \tag{2.4}$$
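The definitions in (2.3) and (2.4) amount to a log-ratio followed by a sign test. A minimal sketch, assuming the a posteriori probability of $u_t = +1$ is already available (the helper names are illustrative):

```python
import math


def llr(p_plus: float) -> float:
    """L(u_t) = log(P(u_t = +1 | r) / P(u_t = -1 | r)) for a binary symbol."""
    return math.log(p_plus / (1.0 - p_plus))


def hard_decision(l: float) -> int:
    """Zero-threshold decision of (2.4): +1 if L >= 0, otherwise -1."""
    return 1 if l >= 0 else -1


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.2):
        value = llr(p)
        print(p, value, hard_decision(value))
```

The sign of the LLR carries the decision and its magnitude carries the reliability, which is why SISO decoders exchange LLRs rather than hard bits.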

As an example, a rate 1/2 memory order 2 RSC encoder and its state transition diagram are illustrated in Fig. 2.4. Note that the solid lines represent the state transitions corresponding to an information bit $u_t$ of $+1$, while the dotted lines represent the state transitions corresponding to an information bit $u_t$ of $-1$. Its decoding trellis diagram is shown in Fig. 2.5.

Figure 2.4: A rate 1/2 memory order 2 RSC encoder and its state transition diagram. (States 00, 01, 10, 11; branch labels 1/11, 1/10, 0/00, 0/01.)

Therefore, the LLR can be further expressed as

$$L(u_t) = \log \frac{P(u_t = +1 \mid r)}{P(u_t = -1 \mid r)} = \log \frac{\sum_{(m', m) \in B_t^{+1}} P(S_{t-1} = m', S_t = m \mid r)}{\sum_{(m', m) \in B_t^{-1}} P(S_{t-1} = m', S_t = m \mid r)} = \log \frac{\sum_{(m', m) \in B_t^{+1}} P(S_{t-1} = m', S_t = m, r)}{\sum_{(m', m) \in B_t^{-1}} P(S_{t-1} = m', S_t = m, r)}, \tag{2.5}$$

where $P(S_{t-1} = m', S_t = m, r)$ is the joint probability of the received sequence $r$ and the transition from state $S_{t-1} = m'$ to state $S_t = m$. $B_t^{+1}$ and $B_t^{-1}$ are the sets of state pairs $(m', m)$ whose transitions are due to the input bits $u_t = +1$ and $u_t = -1$, respectively.

Figure 2.5: The trellis diagram of the (2,1,2) RSC encoder. (One trellis section between $S_{t-1}$ and $S_t$ over states 00, 01, 10, 11; branches for $u_t = +1$ and $u_t = -1$; forward path computing α, backward path computing β.)

In order to compute the joint probability required for $L(u_t)$ in (2.5), we define the following metrics:

• The forward recursion metric α:

$$\alpha_t(m) = P\{S_t = m, r_0^{t}\} \tag{2.6}$$

• The backward recursion metric β:

$$\beta_t(m) = P\{r_{t+1}^{N-1} \mid S_t = m\} \tag{2.7}$$

• The branch metric γ:

$$\gamma_t(m', m) = P\{S_t = m, r_t \mid S_{t-1} = m'\} \tag{2.8}$$

• The joint probability λ:

$$\lambda_t(m', m) = P(S_{t-1} = m', S_t = m, r) \tag{2.9}$$

Since we assume that the codeword sequence is transmitted through a discrete memoryless channel, the joint probability can be expressed as

$$\begin{aligned}
\lambda_t(m', m) &= P(S_{t-1} = m', S_t = m, r_0^{t-1}, r_t, r_{t+1}^{N-1}) \\
&= P(r_{t+1}^{N-1} \mid S_{t-1} = m', S_t = m, r_0^{t-1}, r_t) \cdot P(S_t = m, r_t \mid S_{t-1} = m', r_0^{t-1}) \cdot P(S_{t-1} = m', r_0^{t-1}) \\
&= P(r_{t+1}^{N-1} \mid S_t = m) \cdot P(S_t = m, r_t \mid S_{t-1} = m') \cdot P(S_{t-1} = m', r_0^{t-1}).
\end{aligned} \tag{2.10}$$

Here, $r_0^{t-1}$ represents the received codeword sequence from time instance 0 to $t-1$, while $r_{t+1}^{N-1}$ is the sequence from time instance $t+1$ to the end. The second line of (2.10) results from Bayes' rule, and the third line is due to the Markov property of the state transitions. Therefore, the joint probability defined in (2.9) can be expressed in terms of (2.6), (2.7), and (2.8):

$$\lambda_t(m', m) = \alpha_{t-1}(m') \cdot \gamma_t(m', m) \cdot \beta_t(m). \tag{2.11}$$

Now we derive the recursions for (2.6), (2.7), and (2.8) as follows:

$$\begin{aligned}
\alpha_t(m) &= P(S_t = m, r_0^{t}) \\
&= \sum_{m' \in S} P(S_{t-1} = m', S_t = m, r_0^{t}) \\
&= \sum_{m' \in S} P(S_t = m, r_t \mid S_{t-1} = m', r_0^{t-1}) \cdot P(S_{t-1} = m', r_0^{t-1}) \\
&= \sum_{m' \in S} P(S_t = m, r_t \mid S_{t-1} = m') \cdot P(S_{t-1} = m', r_0^{t-1}) \\
&= \sum_{m' \in S} \alpha_{t-1}(m') \cdot \gamma_t(m', m).
\end{aligned} \tag{2.12}$$

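Since the registers of the encoder are all zero at the beginning of the encoding process, the initial conditions of $\alpha_t$ are

$$\alpha_0(0) = 1, \qquad \alpha_0(m) = 0 \ \text{for } m \ne 0. \tag{2.13}$$

Similarly, we have

$$\begin{aligned}
\beta_t(m) &= P(r_{t+1}^{N-1} \mid S_t = m) \\
&= \sum_{m' \in S} P(S_{t+1} = m', r_{t+1}^{N-1} \mid S_t = m) \\
&= \sum_{m' \in S} \frac{P(S_{t+1} = m', r_{t+1}, r_{t+2}^{N-1}, S_t = m)}{P(S_t = m)} \\
&= \sum_{m' \in S} P(r_{t+2}^{N-1} \mid S_{t+1} = m', r_{t+1}, S_t = m) \cdot P(S_{t+1} = m', r_{t+1} \mid S_t = m) \\
&= \sum_{m' \in S} P(r_{t+2}^{N-1} \mid S_{t+1} = m') \cdot P(S_{t+1} = m', r_{t+1} \mid S_t = m) \\
&= \sum_{m' \in S} \gamma_{t+1}(m, m') \cdot \beta_{t+1}(m'),
\end{aligned} \tag{2.14}$$

where $S$ represents the set of all states. If the encoding trellis finally converges to the zero state at $t = N - 1$, the initial conditions of $\beta_t$ are

$$\beta_N(0) = 1, \qquad \beta_N(m) = 0 \ \text{for } m \ne 0. \tag{2.15}$$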

Note that in Fig. 2.5, the forward metric α and the backward metric β are computed recursively in opposite directions; furthermore, their calculation requires the branch metric first. Hence, for any existing transition from state $m'$ to $m$ in a trellis stage, the branch transition probability $\gamma_t(m', m)$ can be derived as

$$\begin{aligned}
\gamma_t(m', m) &= P(S_t = m, r_t \mid S_{t-1} = m') \\
&= \frac{P(S_{t-1} = m', S_t = m, r_t)}{P(S_{t-1} = m')} \\
&= \frac{P(S_{t-1} = m', S_t = m)}{P(S_{t-1} = m')} \cdot \frac{P(S_{t-1} = m', S_t = m, r_t)}{P(S_{t-1} = m', S_t = m)} \\
&= P(S_t = m \mid S_{t-1} = m') \cdot P(r_t \mid S_{t-1} = m', S_t = m) \\
&= P(u_t) \cdot P(r_t \mid v_t).
\end{aligned} \tag{2.16}$$

Note that $P(u_t)$ is the a priori probability of $u_t$ and $v_t$ is the codeword associated with the transition $m' \to m$.

As a summary of the MAP algorithm, with the computation of $\gamma_t(m', m)$ in (2.16), we can derive α and β for each state at different time instances. As a result, the joint probability in (2.11) is also available for $t = 0, 1, \cdots, N - 1$. The log-likelihood ratio $L(u_t)$ can be calculated by

$$L(u_t) = \log \frac{\sum_{(m', m) \in B_t^{+1}} \alpha_{t-1}(m') \cdot \gamma_t(m', m) \cdot \beta_t(m)}{\sum_{(m', m) \in B_t^{-1}} \alpha_{t-1}(m') \cdot \gamma_t(m', m) \cdot \beta_t(m)}. \tag{2.17}$$
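The forward recursion, backward recursion, and LLR computation summarized above can be sketched in a few lines in the probability domain. This is an illustrative sketch, not the decoder of this thesis: the branch list format `(m_prev, m_next, u)`, the function name `map_llrs`, and the `terminated` switch (zero-state termination versus a truncated trellis with all end states equally allowed) are assumptions, and the branch metrics are taken as given numbers rather than computed from channel values.

```python
import math


def map_llrs(branches, gammas, num_states, terminated=True):
    """Probability-domain forward-backward (MAP) recursion on a trellis.

    branches: list of (m_prev, m_next, u) with u in {+1, -1}; the same
              branch set is assumed at every stage.
    gammas:   gammas[t - 1][b] is the branch metric of branch b at stage t.
    """
    n = len(gammas)
    alpha = [[0.0] * num_states for _ in range(n + 1)]
    beta = [[0.0] * num_states for _ in range(n + 1)]
    alpha[0][0] = 1.0  # encoder starts in the all-zero state
    if terminated:
        beta[n][0] = 1.0  # trellis ends in the zero state
    else:
        beta[n] = [1.0] * num_states  # truncated trellis: any end state

    # Forward recursion: alpha_t(m) = sum alpha_{t-1}(m') * gamma_t(m', m)
    for t in range(1, n + 1):
        for b, (mp, m, _) in enumerate(branches):
            alpha[t][m] += alpha[t - 1][mp] * gammas[t - 1][b]

    # Backward recursion: beta_t(m) = sum gamma_{t+1}(m, m') * beta_{t+1}(m')
    for t in range(n - 1, -1, -1):
        for b, (m, mn, _) in enumerate(branches):
            beta[t][m] += gammas[t][b] * beta[t + 1][mn]

    # LLR: log ratio of joint probabilities over the +1 and -1 branch sets
    llrs = []
    for t in range(1, n + 1):
        num = den = 0.0
        for b, (mp, m, u) in enumerate(branches):
            lam = alpha[t - 1][mp] * gammas[t - 1][b] * beta[t][m]
            if u == +1:
                num += lam
            else:
                den += lam
        llrs.append(math.log(num / den))
    return llrs
```

A practical implementation would normalize α and β at each stage, or work in the logarithmic domain, to avoid numerical underflow on long blocks.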

2.2.2 The Log-MAP and MAX-Log-MAP algorithm

The MAP algorithm requires large memory and a large number of operations involving exponentiations and multiplications, so a hardware realization of the MAP decoder would be quite complex. The Log-MAP algorithm was proposed to solve this problem. First, we transfer the branch metrics defined in the MAP algorithm to the logarithmic domain; that is,

$$\bar{\gamma}_t(m', m) = \log \gamma_t(m', m). \tag{2.18}$$

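Referring to (2.12) and (2.14), the forward path metric $\bar{\alpha}_t$ can be expressed as

$$\bar{\alpha}_t(m) = \log \alpha_t(m) = \log \sum_{m' \in S} e^{\bar{\alpha}_{t-1}(m') + \bar{\gamma}_t(m', m)}, \tag{2.19}$$

and the backward path metric $\bar{\beta}_t$ can be expressed as

$$\bar{\beta}_t(m) = \log \beta_t(m) = \log \sum_{m' \in S} e^{\bar{\gamma}_{t+1}(m, m') + \bar{\beta}_{t+1}(m')}. \tag{2.20}$$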

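Note that the initial conditions of the path metrics also change, since all computations are carried out in the logarithmic domain:

$$\begin{aligned}
\bar{\alpha}_0(0) &= 0, & \bar{\alpha}_0(m) &= -\infty \ \text{for } m \ne 0, \\
\bar{\beta}_N(0) &= 0, & \bar{\beta}_N(m) &= -\infty \ \text{for } m \ne 0.
\end{aligned} \tag{2.21}$$

After substituting (2.18), (2.19), and (2.20), the APP information $L(u_t)$ in (2.17) can be rewritten as

$$L(u_t) = \log \frac{\sum_{(m', m) \in B_t^{+1}} e^{\bar{\alpha}_{t-1}(m') + \bar{\gamma}_t(m', m) + \bar{\beta}_t(m)}}{\sum_{(m', m) \in B_t^{-1}} e^{\bar{\alpha}_{t-1}(m') + \bar{\gamma}_t(m', m) + \bar{\beta}_t(m)}}. \tag{2.22}$$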

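Consider the following Jacobian logarithm [8]:

$$\log(e^{\delta_1} + e^{\delta_2}) \equiv \max{}^{*}(\delta_1, \delta_2) = \max(\delta_1, \delta_2) + \log\left(1 + e^{-|\delta_2 - \delta_1|}\right) = \max(\delta_1, \delta_2) + f_c(|\delta_2 - \delta_1|), \tag{2.23}$$

where $f_c(\cdot)$ is a correction function that compensates for the error of taking only the maximum. By applying (2.23) recursively, the expression $\log(e^{\delta_1} + e^{\delta_2} + \cdots + e^{\delta_n})$ can be computed exactly, as follows:

$$\begin{aligned}
\log(e^{\delta_1} + e^{\delta_2} + \cdots + e^{\delta_n}) &= \log(\Delta + e^{\delta_n}), \qquad \Delta = e^{\delta_1} + \cdots + e^{\delta_{n-1}} = e^{\delta} \\
&= \max(\log \Delta, \delta_n) + f_c(|\log \Delta - \delta_n|) \\
&= \max(\delta, \delta_n) + f_c(|\delta - \delta_n|).
\end{aligned} \tag{2.24}$$

Now we can use (2.23) to represent the forward metrics in (2.19) and the backward metrics in (2.20) as

$$\bar{\alpha}_t(m) = \max_{m' \in S}{}^{*} \left\{ \bar{\alpha}_{t-1}(m') + \bar{\gamma}_t(m', m) \right\}, \tag{2.25}$$

and

$$\bar{\beta}_t(m) = \max_{m' \in S}{}^{*} \left\{ \bar{\gamma}_{t+1}(m, m') + \bar{\beta}_{t+1}(m') \right\}. \tag{2.26}$$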

Therefore, (2.22) can be expressed as

$$L(u_t) = \max_{(m', m) \in B_t^{+1}}{}^{*} \left\{ \bar{\alpha}_{t-1}(m') + \bar{\gamma}_t(m', m) + \bar{\beta}_t(m) \right\} - \max_{(m', m) \in B_t^{-1}}{}^{*} \left\{ \bar{\alpha}_{t-1}(m') + \bar{\gamma}_t(m', m) + \bar{\beta}_t(m) \right\}. \tag{2.27}$$

The Log-MAP algorithm in (2.27) reduces the hardware complexity compared with the MAP algorithm. However, some difficulty for hardware implementation remains, since computing $f_c(\cdot)$ still involves an exponential and a logarithm. This problem can be solved by using a look-up table, but this approach may cause a slight bit-error-rate degradation and increases the hardware size.
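The Jacobian logarithm and its look-up-table variant can be illustrated as follows. This is a minimal sketch: the table size (8 entries) and the step (0.5) are arbitrary assumptions chosen for illustration and are not the quantization used in this thesis.

```python
import math


def max_star(a: float, b: float) -> float:
    """Exact Jacobian logarithm: log(e^a + e^b) = max(a, b) + f_c(|a - b|)."""
    if a == -math.inf and b == -math.inf:
        return -math.inf  # both terms represent zero probability
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_star_n(values):
    """Recursive application of max* to n terms, as in (2.24)."""
    acc = values[0]
    for v in values[1:]:
        acc = max_star(acc, v)
    return acc


# Hypothetical look-up table for f_c(x) = log(1 + e^-x), sampled every 0.5.
LUT_STEP = 0.5
LUT = [math.log1p(math.exp(-i * LUT_STEP)) for i in range(8)]


def max_star_lut(a: float, b: float) -> float:
    """max* with the correction term read from the table (0 beyond its range)."""
    idx = int(abs(a - b) / LUT_STEP)
    correction = LUT[idx] if idx < len(LUT) else 0.0
    return max(a, b) + correction


if __name__ == "__main__":
    exact = math.log(math.exp(0.0) + math.exp(0.3))
    print(exact, max_star(0.0, 0.3), max_star_lut(0.0, 0.3))
```

The table version trades a small error in the correction term for the removal of the exponential and the logarithm from the recursion datapath.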

In order to further reduce the complexity, consider the approximation in (2.28). Since this approximation is used to reduce the complexity of the MAP algorithm, the performance of the Max-Log-MAP algorithm is sub-optimal.

$$\log(e^{\delta_1} + e^{\delta_2} + \cdots + e^{\delta_n}) \approx \max_{i \in \{1, 2, \cdots, n\}} \delta_i \tag{2.28}$$

Note that the term $f_c(\cdot)$ is ignored in comparison with (2.24). Then we can simplify (2.22) as follows:

$$L(u_t) = \max_{(m', m) \in B_t^{+1}} \left\{ \bar{\alpha}_{t-1}(m') + \bar{\gamma}_t(m', m) + \bar{\beta}_t(m) \right\} - \max_{(m', m) \in B_t^{-1}} \left\{ \bar{\alpha}_{t-1}(m') + \bar{\gamma}_t(m', m) + \bar{\beta}_t(m) \right\}. \tag{2.29}$$

Therefore, compared with the MAP algorithm, the Max-Log-MAP algorithm replaces the multiplications with additions and avoids the complicated exponentiations. However, the performance degrades because of the information loss in (2.28).
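The information loss of the max approximation can be quantified on a small example. The sketch below compares the exact log-sum with the maximum alone and forms an LLR in the Max-Log manner; the metric values and function names are made up for illustration.

```python
import math


def log_sum_exp(values):
    """Exact log(e^d1 + ... + e^dn), computed in a numerically stable way."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def max_log_llr(plus_metrics, minus_metrics):
    """Max-Log LLR: difference of the best path metrics of each branch set."""
    return max(plus_metrics) - max(minus_metrics)


def log_map_llr(plus_metrics, minus_metrics):
    """Log-MAP LLR: difference of the exact log-sums of each branch set."""
    return log_sum_exp(plus_metrics) - log_sum_exp(minus_metrics)


if __name__ == "__main__":
    plus = [1.0, 3.0]    # hypothetical alpha + gamma + beta sums, u_t = +1
    minus = [2.0, 0.5]   # hypothetical sums for u_t = -1
    print(max_log_llr(plus, minus), log_map_llr(plus, minus))
```

For $n$ terms the exact log-sum exceeds the maximum by at most $\log n$, so the approximation error per operation is bounded, though it accumulates over the recursions.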

2.3 Decoding with reciprocal dual trellis

To meet the growing demand for high data rates with high bandwidth and power efficiency, some research has focused on high code rate codes and on decoding algorithms that are powerful in terms of correction ability yet reasonable in complexity. However, for a high rate $k/n$ convolutional code ($n - k < k$), the branch calculation of the normal MAP algorithm on the trellis constructed from the encoder polynomial is highly complex (radix-$2^k$). In such a case, the MAP algorithm working on the trellis of the corresponding reciprocal dual code is less complex (radix-$2^{n-k}$), since the number of codewords in the reciprocal dual code space is smaller than that of the original code.

The decoding trellis shown in Fig. 2.5 can be treated as a linear block code. All of its paths are possible codewords generated by one of the RSC encoders, and, since the code is linear, a codeword can be computed from other codewords. For example, for an ($n = 3$, $k = 2$) encoder with 2 registers, the decoding trellises using the original and the reciprocal dual trellis are illustrated in Fig. 2.6. Each state in the reciprocal dual trellis is connected to 2 branches, whereas each state in the original trellis is connected to 4 branches. Thus, if the interleaver length is fixed and $k$ gets larger, decoding becomes difficult due to the complexity of the original decoding trellis.

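In [7], a new MAP decoding algorithm for high code rate convolutional codes using the reciprocal dual convolutional code is presented. The advantage of this approach is a reduction of the computational complexity, since the number of codewords to be evaluated decreases for code rates higher than 1/2. According to [7], the log-likelihood ratio of the a posteriori probability $L(u_l)$ can alternatively be calculated from the reciprocal dual codewords $\tilde{c}_i^{\perp}$, $1 \le i \le 2^{N-K}$, that is:

$$L(u_l) = L(c_l; y_l) + \log \frac{\sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp}} \prod_{j=0, j \ne l}^{N-1} \left[\tanh\left(L(c_j; y_j)/2\right)\right]^{\tilde{c}_{ij}^{\perp}}}{\sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp}} (-1)^{\tilde{c}_{il}^{\perp}} \prod_{j=0, j \ne l}^{N-1} \left[\tanh\left(L(c_j; y_j)/2\right)\right]^{\tilde{c}_{ij}^{\perp}}} \tag{2.30}$$

Figure 2.6: Original trellis and reciprocal dual trellis comparison. (Panels: original decoding trellis, reciprocal dual trellis; encoder states 00, 01, 10, 11.)

Note that the rightmost term of (2.30) is the log-likelihood ratio of the extrinsic value, denoted here by $L_e(u_l)$:

$$L_e(u_l) = \log \frac{\sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp}} \prod_{j=0, j \ne l}^{N-1} \left[\tanh\left(L(c_j; y_j)/2\right)\right]^{\tilde{c}_{ij}^{\perp}}}{\sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp}} (-1)^{\tilde{c}_{il}^{\perp}} \prod_{j=0, j \ne l}^{N-1} \left[\tanh\left(L(c_j; y_j)/2\right)\right]^{\tilde{c}_{ij}^{\perp}}} \tag{2.31}$$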

where $c = (c_0, c_1, \ldots, c_{N-1})$ is a codeword of a systematic block code $C$, $\tilde{c}^{\perp} = (\tilde{c}_0^{\perp}, \tilde{c}_1^{\perp}, \ldots, \tilde{c}_{N-1}^{\perp})$ is a codeword of the reciprocal dual code $\tilde{C}^{\perp}$ of $C$, and $y_l$ refers to the matched filter output associated with $c_l$. The combined LLR is

$$L(c_j; y_j) = \begin{cases} L(y_j \mid c_j) + L(c_j), & \text{if } c_j \text{ is an information bit} \\ L(y_j \mid c_j), & \text{if } c_j \text{ is a parity check bit} \end{cases} \tag{2.32}$$

Under additive white Gaussian noise (AWGN) with variance $\sigma^2 = N_0/(2E_s)$, the term $L(y_j \mid c_j)$ can be written as

$$L(y_j \mid c_j) = 4 \frac{E_s}{N_0} \cdot y_j, \tag{2.33}$$

where $E_s/N_0$ is the signal-to-noise ratio. $L(c_j)$ is the LLR of the a priori probability, which is given by

$$L(c_j) = \log \frac{P(c_j = 0)}{P(c_j = 1)}. \tag{2.34}$$
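The dual-code expression for the extrinsic LLR can be checked numerically on a short block code. The sketch below is an illustration under stated assumptions: it uses a plain block code and its ordinary dual (the time reversal that makes the dual "reciprocal" only matters for convolutional codes), it enumerates codewords by brute force instead of using a trellis, and all function names are hypothetical. It compares the sum over dual codewords with a direct sum over the codewords of the original code.

```python
import itertools
import math


def span_gf2(rows):
    """All binary words generated by the given rows over GF(2)."""
    n = len(rows[0])
    words = set()
    for coeffs in itertools.product([0, 1], repeat=len(rows)):
        w = [0] * n
        for c, row in zip(coeffs, rows):
            if c:
                w = [a ^ b for a, b in zip(w, row)]
        words.add(tuple(w))
    return sorted(words)


def extrinsic_dual(llrs, dual_words, l):
    """Extrinsic LLR of bit l from the dual codewords (tanh form)."""
    g = [math.tanh(x / 2.0) for x in llrs]
    num = den = 0.0
    for w in dual_words:
        prod = 1.0
        for j, bit in enumerate(w):
            if j != l and bit:
                prod *= g[j]
        num += prod
        den += (-1) ** w[l] * prod
    return math.log(num / den)


def extrinsic_direct(llrs, code_words, l):
    """Extrinsic LLR of bit l by direct summation over the original code.

    Uses L = log(P(c=0) / P(c=1)), so bit 0 has weight e^{+L/2} and
    bit 1 has weight e^{-L/2}; position l itself is excluded.
    """
    p0 = p1 = 0.0
    for c in code_words:
        weight = math.exp(sum((0.5 if bit == 0 else -0.5) * llrs[j]
                              for j, bit in enumerate(c) if j != l))
        if c[l] == 0:
            p0 += weight
        else:
            p1 += weight
    return math.log(p0 / p1)
```

For a rate above 1/2 the dual code has fewer codewords than the original code (8 versus 16 for a (7,4) code), which is the source of the complexity reduction discussed above.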

2.3.1 Construct reciprocal dual trellis

In this section, we introduce the reciprocal dual trellis starting from some basic algebraic properties of convolutional codes. A rate $R = k/n$ convolutional encoder over the field $F = GF(2)$ generates the codeword $v_t$ at time $t$,

$$v_t = (v_t^0, \ldots, v_t^{n-1}) \in F^n,$$

given the information bits $u_t = (u_t^0, \ldots, u_t^{k-1})$. The sequences of $u_t$ and $v_t$ can be written as

$$u(D) = \sum_{t=0}^{\infty} u_t D^t, \qquad v(D) = \sum_{t=0}^{\infty} v_t D^t,$$

and the encoder realizes the mapping through the polynomial matrix $G(D)$ such that $v(D) = u(D)G(D)$.

The dual convolutional code $C^{\perp}$ of a convolutional code $C$ is an $(n - k)$-dimensional code that consists of all code sequences $v^{\perp}(D)$ orthogonal to all $v(D) \in C$. Hence, $C^{\perp}$ is an $(n, n - k)$ convolutional code generated by $H$ with the property $G(D)H^T(D) = 0$.

For a code $C$, a reciprocal convolutional code $\tilde{C}$ can be obtained by substituting $D^{-1}$ for $D$ in $G$ and multiplying the $j$-th row by $D^{d(j)}$, where $1 \le j \le k$ and $d(j)$ is the degree of the $j$-th row of $G(D)$. As a result, $\tilde{v}(D) \in \tilde{C}$ is equal to the time-reversed sequence $v(D^{-1})$.

We summarize the steps to construct reciprocal dual trellis for convolutional codes when its encoder is given by G(D) :

1. Transfer G(D) to equivalent systematic encoder Gsys(D) if G(D) is not systematic.

2. Apply the property $G(D)H^T(D) = 0$ to find the corresponding parity check matrix $H(D)$.

3. Calculate the reciprocal polynomial of H(D), and denote it as ˜H(D).

4. The reciprocal dual trellis of G(D) can be constructed by using ˜H(D).

Here, we show an example. A rate 2/3 convolutional code $C$ is described by the nonsystematic polynomial generator matrix

$$G(D) = \begin{pmatrix} 1 + D & D & 1 + D \\ D & 1 & 1 \end{pmatrix}$$

and is generated by the equivalent systematic matrix

$$G_{sys}(D) = \begin{pmatrix} 1 & 0 & \dfrac{1}{1 + D + D^2} \\ 0 & 1 & \dfrac{1 + D^2}{1 + D + D^2} \end{pmatrix}.$$

Then the rate 1/3 dual code $C^{\perp}$ is encoded by

$$H(D) = \begin{pmatrix} 1 & 1 + D^2 & 1 + D + D^2 \end{pmatrix}.$$

Note that $H(D)$ is the parity check matrix of $G(D)$ with the property $G(D)H^T(D) = 0$. Hence, the reciprocal dual code $\tilde{C}^{\perp}$ is generated by $\tilde{H}(D)$:

$$\tilde{H}(D) = D^2 \cdot H(D^{-1}) = \begin{pmatrix} D^2 & 1 + D^2 & 1 + D + D^2 \end{pmatrix}.$$
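The example above can be verified mechanically. In the sketch below, GF(2) polynomials are stored as Python integers whose bit $i$ is the coefficient of $D^i$; this encoding and the helper names are assumptions made for illustration. The code checks $G(D)H^T(D) = 0$ for both rows of $G(D)$ and forms the reciprocal polynomials $D^2 \cdot H(D^{-1})$.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials (carry-less multiplication)."""
    out = 0
    while b:
        if b & 1:
            out ^= a
        a <<= 1
        b >>= 1
    return out


def reciprocal(p: int, degree: int) -> int:
    """D^degree * p(1/D): reverse the coefficients within degree + 1 bits."""
    out = 0
    for i in range(degree + 1):
        if (p >> i) & 1:
            out |= 1 << (degree - i)
    return out


def syndrome(row, h):
    """One entry of G(D) H^T(D): sum of the element-wise products over GF(2)."""
    s = 0
    for g_poly, h_poly in zip(row, h):
        s ^= gf2_mul(g_poly, h_poly)
    return s


# Bit i of each integer is the coefficient of D^i.
G = [[0b011, 0b010, 0b011],   # 1+D, D, 1+D
     [0b010, 0b001, 0b001]]   # D, 1, 1
H = [0b001, 0b101, 0b111]     # 1, 1+D^2, 1+D+D^2

H_REC = [reciprocal(p, 2) for p in H]  # D^2 * H(1/D)

if __name__ == "__main__":
    print([syndrome(row, H) for row in G])
    print([bin(p) for p in H_REC])
```

Only the first entry of $H(D)$ changes under the reciprocal operation here, because $1 + D^2$ and $1 + D + D^2$ are symmetric in their coefficients.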

2.3.2 Decoding based on reciprocal dual trellis structure

In [7], (2.31) is expressed in terms of the encoder state transitions. First, we define two sets $S_A(s)$ and $S_B(s)$ to describe the possible transitions into and out of a state $s$ within one trellis stage. $S_A(s)$ contains the states $s_i$ for which the transition $s_i \to s$ exists, and $S_B(s)$ is the set of destination states $s_j$ reachable from state $s$ ($s \to s_j$). $S_A$ and $S_B$ therefore serve the forward recursion α and the backward recursion β, respectively. Here, we use the parameters α, β, and γ to represent the forward and backward recursions.

Moreover, the bits associated with the transition $s_1 \to s_2$ are combined in the $n$-tuple $(b_0(s_1, s_2), \ldots, b_{n-1}(s_1, s_2))$. Using the substitution $g_j = \tanh(L(c_j; y_j)/2)$, we define the partial products at trellis stage $t$:

$$\gamma_t(s_1, s_2) = \prod_{j=0}^{n-1} g_{t \cdot n + j}^{\,b_j(s_1, s_2)} \tag{2.35}$$

The forward recursion is

$$\alpha_{t+1}(s) = \sum_{s' \in S_A(s)} \alpha_t(s') \cdot \gamma_t(s', s), \qquad 0 \le t < N - 1, \tag{2.36}$$

and the backward recursion is

$$\beta_{t-1}(s) = \sum_{s' \in S_B(s)} \beta_t(s') \cdot \gamma_{t-1}(s, s'), \qquad 2 \le t \le N. \tag{2.37}$$

If we directly truncate the trellis at the end of a received codeword, the boundary conditions are $\alpha_0(s) = 1$, $0 \le s < 2^v$, together with a corresponding condition on β, where $v$ is the number of registers in one RSC encoder and $2^v$ is the number of encoder states. Thus (2.31) can be rewritten as (2.39), using the special products

$$\tilde{\gamma}_t(l, s_1, s_2) = \prod_{j=0,\, j \ne l - t \cdot n}^{n-1} g_{t \cdot n + j}^{\,b_j(s_1, s_2)}, \tag{2.38}$$

where the time instant $t = \lfloor l/n \rfloor$ depends on the index $l$:

$$L_e(u_l) = \log \frac{\sum_{s_1 = 0}^{2^v - 1} \sum_{s_2 \in S_B(s_1)} \alpha_t(s_1) \cdot \tilde{\gamma}_t(l, s_1, s_2) \cdot \beta_{t+1}(s_2)}{\sum_{s_1 = 0}^{2^v - 1} \sum_{s_2 \in S_B(s_1)} \alpha_t(s_1) \cdot (-1)^{b_{l - t \cdot n}(s_1, s_2)} \cdot \tilde{\gamma}_t(l, s_1, s_2) \cdot \beta_{t+1}(s_2)} \tag{2.39}$$

Finally, the LLR of the a posteriori probability (APP) can be obtained by combining (2.33), (2.34), and (2.39):

$$L(u_l) = L(c_l) + 4 \frac{E_s}{N_0} \cdot y_l + L_e(u_l). \tag{2.40}$$

2.4 Sliding window method of turbo code

In the traditional SISO decoding algorithm, the computation of the APP LLR requires the path metric values generated by the forward and backward processes. Furthermore, since the backward recursion starts from the end of the decoding trellis, as shown in Fig. 2.5, the decoding process can start only after the entire block has been received. If the received sequence is long, this leads to long output latency and a huge memory requirement for hardware implementation. For example, the maximum block length of the 3GPP standard is 5114, which means that 5114 LLR values and the corresponding path metrics must be stored. This is the main disadvantage of turbo codes in real applications.

The main problem is that a long block cannot simply be divided into several short sub-blocks, since the unknown initial condition of the backward recursion would degrade the performance of the turbo code. The sliding window approach was proposed [9] to overcome this. The algorithm exploits the fact that the backward metrics become highly reliable even without a known initial condition, provided the backward recursion runs long enough. Fig. 2.7 shows the process of the sliding window algorithm, which is described as follows. First, the received codeword sequence is divided into several sub-blocks of length W. W is called the convergence length and is normally set to five times the constraint length of the component encoder of the turbo code to ensure reliable initialization.

Figure 2.7: The process diagram of sliding window algorithm. (Sub-blocks i to i+3 of length W over time slots t1 to t4, showing the forward recursion α, the dummy backward recursion β₁, the true backward recursion β₂, and the outputs L(ut).)

In the sliding window approach, the end of one sub-block is the starting point of the next sub-block, for both the forward and the backward recursions. Thus, the initial metric values are inherited from the last metrics calculated in the previous sub-block. Note that the dummy backward recursion β₁ is employed to establish the initial condition for the true backward recursion β₂. Although the initial condition for β₁ is unknown except in the last sub-block, we assume equally likely states for the β₁ values at time instant (i + 1)·W:

\[
\beta_1(m) = \frac{1}{M}, \quad \text{for all } m \in S \tag{2.41}
\]

where S represents the set of all possible states and M is the total number of states. While the forward recursion α proceeds through the i-th sub-block and stores its values in memory, the dummy backward recursion β₁ is performed concurrently on the (i + 1)-th sub-block. As soon as the β₁ computation is finished, the initial metrics for the β₂ recursion in the i-th sub-block are available. L(ˆut) can then be calculated from the α metrics in memory, the β₂ metrics being computed, and the corresponding branch metrics of the i-th sub-block.
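The claim that a dummy backward recursion started from the uniform condition (2.41) converges to the true backward metrics can be illustrated with a minimal sketch. It works in the plain probability domain on a toy 4-state shift-register trellis with random branch metrics; the trellis, the metric range, and the names `backward_step`, `dummy_beta`, and `exact_beta` are assumptions made for illustration and are not taken from the thesis.

```python
import random

def backward_step(beta_next, gamma_t):
    """beta_t(s) = sum over both successors s2 of gamma_t[s][bit] * beta_{t+1}(s2)."""
    num_states = len(beta_next)
    beta = []
    for s in range(num_states):
        acc = 0.0
        for bit in (0, 1):
            s2 = (2 * s + bit) % num_states  # toy shift-register trellis (assumption)
            acc += gamma_t[s][bit] * beta_next[s2]
        beta.append(acc)
    total = sum(beta)
    return [b / total for b in beta]  # normalise to avoid underflow

def dummy_beta(gammas, window_end, W, num_states):
    """Dummy recursion beta_1: uniform start (2.41) at window_end + W, run back W steps."""
    beta = [1.0 / num_states] * num_states
    for t in range(window_end + W - 1, window_end - 1, -1):
        beta = backward_step(beta, gammas[t])
    return beta

def exact_beta(gammas, t_stop, num_states):
    """Reference: backward recursion from the true end of a block terminated in state 0."""
    beta = [1.0] + [0.0] * (num_states - 1)
    for t in range(len(gammas) - 1, t_stop - 1, -1):
        beta = backward_step(beta, gammas[t])
    return beta

random.seed(0)
M, T, W = 4, 200, 40   # states, block length, warm-up length (all hypothetical)
gammas = [[[random.uniform(0.5, 1.0) for _ in (0, 1)] for _ in range(M)] for _ in range(T)]
approx = dummy_beta(gammas, 64, W, M)
exact = exact_beta(gammas, 64, M)
max_gap = max(abs(a - e) for a, e in zip(approx, exact))
print(max_gap)
```

In this toy setting the normalised metrics at the window boundary agree closely even though the dummy recursion never sees the end of the block, which is the property the sliding window method relies on.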

For example, we concatenate two truncated component codes defined by

\[
G(D) = \begin{pmatrix} 1 & 0 & \dfrac{1}{1+D+D^2} \\[2mm] 0 & 1 & \dfrac{1+D^2}{1+D+D^2} \end{pmatrix}
\]

and use a block interleaver of length 400 to obtain an (800, 400) code of rate 1/2. The bit error rate (BER) after six iterations for different sliding window lengths is shown in Fig. 2.8, where SW25 denotes a window length of 25 trellis stages and SW200 denotes decoding without a sliding window. Reciprocal dual trellis decoding with a sliding window is significantly less complex than its optimum counterpart in terms of memory cost, yet it achieves virtually the same bit error performance.

[Figure 2.8: BER versus Eb/N0 (dB); curves SW25 and SW200.]

Chapter 3 Reciprocal Dual Trellis Algorithm

3G mobile multimedia communication systems based on WCDMA support flexible transmission to provide packet data service as well as voice service. Mobile multimedia communication requires a coding scheme that can accommodate various code rate requirements. In this thesis, we adopt the WCDMA turbo code as the mother code, apply puncturing to obtain various code rates, and use the reciprocal dual trellis as the decoding trellis.

3.1 Log domain approach

For high code rates, the MAP algorithm working on the reciprocal dual trellis is preferable. However, (2.39) requires both adders and multipliers, which implies considerable hardware complexity. Hence, we further reduce the hardware cost by taking the logarithm of all metrics, similar to the Log-MAP method. The challenge in a log domain implementation of this algorithm arises when the bit metric value tanh(L(cj; yj)/2) is negative. To represent such metrics, a numerical transformation is applied.

3.1.1 Sign magnitude scheme

In [10], the sign-magnitude scheme is introduced as a simple way to represent the reciprocal dual trellis metrics. A real number x is represented as $x = X_S e^{-X_M} \equiv [X_S, X_M]$, where $X_S = \mathrm{sign}(x)$ and $X_M = -\log(|x|)$. The arithmetic operations in this representation are:

Negation:
\[
-x \equiv [-X_S,\ X_M] \tag{3.1}
\]

Addition:
\[
x + y \equiv \min{}^*(x, y) = \big[S(\min(X_M, Y_M)),\ \min(X_M, Y_M) - \log(1 + X_S Y_S e^{-|X_M - Y_M|})\big] \tag{3.2}
\]
where S(·) returns the sign that belongs to the selected magnitude, i.e. $S(X_M) = X_S$ or $S(Y_M) = Y_S$.

Multiplication:
\[
x \times y \equiv \mathrm{Sum}^*(x, y) = [X_S Y_S,\ X_M + Y_M] \tag{3.3}
\]

Division:
\[
x / y \equiv [X_S Y_S,\ X_M - Y_M] \tag{3.4}
\]

In the normal Log-MAP algorithm, addition is implemented by the equivalent E operation. For the addition of two non-negative real numbers a and b, represented in the log domain as A and B, we have

\[
a + b \equiv A\,E\,B = \min(A, B) - \log(1 + e^{-|A-B|}) \tag{3.5}
\]

The second term, $-\log(1 + e^{-|d|})$, where d is the difference between the log domain values, is called the correction term. Another log domain operator gives the absolute difference between two non-negative real numbers:

\[
|a - b| \equiv A\,\bar{E}\,B = \min(A, B) - \log(1 - e^{-|A-B|}) \tag{3.6}
\]

Its second term is also called the correction term. In the sign-magnitude representation, the algebraic addition of two real numbers can involve either an E or an Ē operation, depending on the relative signs of the two values. The correction term of the E operation only takes values in the range [−log(2), 0], whereas the correction term of the Ē operation, $-\log(1 - e^{-|d|})$, shown in Fig. 3.1, can take any value in the range [0, ∞). This makes the fixed point hardware implementation more complex, since the representation range must be chosen carefully, and it calls for a large look-up table (LUT) for the correction terms.
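The sign-magnitude operations (3.1) to (3.4) can be sketched in a few lines. This is a floating point illustration only, not the fixed point hardware datapath; the helper names (`to_sm`, `from_sm`, `sm_add`, and so on) and the convention of representing zero by an infinite magnitude are assumptions made here.

```python
import math

def to_sm(x):
    """Real number -> (sign, magnitude) with x = sign * exp(-magnitude)."""
    if x == 0.0:
        return (1, math.inf)          # zero: assumed convention, magnitude = +inf
    return (1 if x > 0 else -1, -math.log(abs(x)))

def from_sm(v):
    sign, mag = v
    return sign * math.exp(-mag)

def sm_neg(x):
    return (-x[0], x[1])                          # (3.1)

def sm_mul(x, y):
    return (x[0] * y[0], x[1] + y[1])             # (3.3), Sum*

def sm_div(x, y):
    return (x[0] * y[0], x[1] - y[1])             # (3.4)

def sm_add(x, y):
    """(3.2), min*: E correction for equal signs, E-bar correction for opposite signs."""
    (xs, xm), (ys, ym) = x, y
    if math.isinf(xm):
        return y
    if math.isinf(ym):
        return x
    sign = xs if xm <= ym else ys                 # sign of the larger absolute value
    arg = 1.0 + xs * ys * math.exp(-abs(xm - ym))
    if arg <= 0.0:
        return (1, math.inf)                      # exact cancellation gives zero
    return (sign, min(xm, ym) - math.log(arg))

total = sm_add(to_sm(0.3), to_sm(-0.75))
print(from_sm(total))
```

Note that the opposite-sign branch is exactly where the unbounded Ē correction term appears: as the two magnitudes approach each other, `arg` approaches zero and the logarithm grows without limit.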

3.1.2 Proposed equation for calculating extrinsic value

In this section, we discuss the computation of the extrinsic value in the log domain scheme. According to (2.31), let us define the bit metric $g_j$ and the summation metric $U_{lb}$:

\[
U_{lb} = \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = b}\ \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} \tag{3.7}
\]

Figure 3.1: Correction term of Ē operation. (Plot of −ln(1 − exp(−|d|)) against |d|.)

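• The forward recursion metric α:

\[
\alpha_t(s_t) = \min{}^*_{S}\big(\mathrm{Sum}^*(\alpha_{t-1}(s_{t-1}),\ \gamma_t(s_{t-1}, s_t))\big)
\]

• The backward recursion metric β:

\[
\beta_{t-1}(s_{t-1}) = \min{}^*_{S}\big(\mathrm{Sum}^*(\gamma_t(s_{t-1}, s_t),\ \beta_t(s_t))\big)
\]

• The branch metric γ:

\[
\gamma_t(s_{t-1}, s_t) = \mathrm{Sum}^*\big(g_j^{\,b_j(s_{t-1}, s_t)},\ \ldots,\ g_{j+n-1}^{\,b_{j+n-1}(s_{t-1}, s_t)}\big)
\]

where t = [j/n] is the time index ([x] denotes the integer part of x), b is the transition bit from $s_{t-1}$ to $s_t$ at decoding index l, and S is the set of possible state transitions.

The denominator of (2.31) is:

\[
\frac{1}{g_l^{0}} \cdot \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = 0}\ \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} + \frac{1}{g_l^{1}} \cdot \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = 1} (-1)^{1} \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} = U_{l0} - \frac{U_{l1}}{g_l}
\]

The numerator of (2.31) can be written as:

\[
\frac{1}{g_l^{0}} \cdot \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = 0}\ \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} + \frac{1}{g_l^{1}} \cdot \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = 1}\ \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} = U_{l0} + \frac{U_{l1}}{g_l}
\]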

Therefore, (2.31) can be simplified to:

\[
L(\hat{u}_l) = \log \frac{1 + \dfrac{U_{l1}}{U_{l0}\, g_l}}{1 - \dfrac{U_{l1}}{U_{l0}\, g_l}} \tag{3.8}
\]

Equation (3.8) is also described in [10]. In fact, $U_{l1}/(U_{l0}\, g_l)$ can be represented as $[U_S^l, U_M^l]$ in the sign-magnitude representation. Hence, (3.8) becomes

\[
L(\hat{u}_l) = \log \frac{1 + U_S^l e^{-U_M^l}}{1 - U_S^l e^{-U_M^l}} \tag{3.9}
\]

Using the relation $\tanh(x/2) = (1 - e^{-x})/(1 + e^{-x})$, (2.31) can also be written as

\[
L(\hat{u}_l) = \begin{cases} -\log(|\tanh(U_M^l/2)|), & \text{if } U_S^l = 1 \\ \log(|\tanh(U_M^l/2)|), & \text{if } U_S^l = -1 \end{cases} \tag{3.10}
\]

Equation (3.10) is the proposed equation for calculating extrinsic values. It is well suited to hardware design, because it requires only a single function of the magnitude $U_M^l$ and a test of the sign $U_S^l$. According to (2.40), the LLR of the a posteriori probability $L(u_l)$ is written as

\[
L(u_l) = \log \frac{p_l(0)}{p_l(1)} + C \times r_s + L(\hat{u}_l) \tag{3.11}
\]

where C is a constant and $r_s$ is a received symbol. Hence, the decision rule is:

\[
\text{decision} = \begin{cases} 1, & \text{if } L(u_l) < 0 \\ 0, & \text{if } L(u_l) \geq 0 \end{cases} \tag{3.12}
\]

For convenience, we refer to the sign-magnitude algorithm as SM and to the reciprocal dual trellis algorithm as Dual-MAP in the following sections.
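The equivalence of the ratio form (3.9) and the tanh form (3.10) is easy to check numerically. The sketch below is a floating point illustration; the function names are hypothetical and the magnitude is assumed to be strictly positive, so that the ratio in (3.9) is well defined.

```python
import math

def extrinsic_from_ratio(u_sign, u_mag):
    """(3.9): L = log((1 + u) / (1 - u)) with u = U_S * exp(-U_M), |u| < 1."""
    u = u_sign * math.exp(-u_mag)
    return math.log((1.0 + u) / (1.0 - u))

def extrinsic_sm(u_sign, u_mag):
    """(3.10): only the magnitude enters the tanh; the sign selects the polarity."""
    value = math.log(abs(math.tanh(u_mag / 2.0)))
    return -value if u_sign == 1 else value

def decide(app_llr):
    """(3.12): hard decision on the a posteriori LLR."""
    return 1 if app_llr < 0 else 0

for sign in (1, -1):
    for mag in (0.1, 1.0, 3.0):
        print(sign, mag, extrinsic_from_ratio(sign, mag), extrinsic_sm(sign, mag))
```

Both columns agree up to rounding, which is why a single look-up of log|tanh(·)| on the magnitude, followed by a sign selection, is enough to produce the extrinsic value.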


3.2 Punctured convolutional codes

In this section, we generate high-rate encoder polynomials. A common way to obtain a high-rate convolutional code is to use a low-rate (1/2) encoder and delete some of its parity check bits; this is called puncturing. Codeword bits are punctured according to a deterministic puncture pattern, or puncture matrix. The bit error rate may differ slightly between puncture patterns. The puncture procedure is shown in Fig. 3.2. Assume that a source sequence (X₀, X₁, ..., X₈, ..., XK−1) is sent to an encoder of code rate 1/2 and encoded into the codeword sequence (X₀, B₀, X₁, B₁, ..., X₈, B₈, ..., XK−1, BK−1). Before modulation, some bits of the codeword are removed according to a puncture pattern. In Fig. 3.2, the puncture pattern is defined as

\[
P = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

The number of columns of this matrix is the puncture period. An element 1 means that the corresponding codeword bit must be sent, whereas an element 0 means that the corresponding bit is punctured. This puncture procedure increases the code rate from 1/2 to 4/5 and produces another codeword sequence (X₀, X₁, X₂, X₃, B₃, ..., X₈, ..., XK−3, XK−2, XK−1, BK−1).

Figure 3.2: Puncture procedure. (Rows: source data X0 to X8; encoded data X0 to X8 with parity B0 to B8; punctured codeword; sent codeword X0 X1 X2 X3 B3 X4 X5 X6 X7 B7 X8.)
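The puncture procedure with the pattern P above can be written as a short routine. The function name `puncture` and the use of labelled strings instead of bits are choices made for this illustration only.

```python
def puncture(systematic, parity, pattern):
    """Keep the codeword bits whose entry in the puncture matrix is 1.

    pattern[0] applies to the systematic bits and pattern[1] to the parity bits;
    the number of columns is the puncture period.
    """
    period = len(pattern[0])
    sent = []
    for k, (x, b) in enumerate(zip(systematic, parity)):
        col = k % period
        if pattern[0][col]:
            sent.append(x)
        if pattern[1][col]:
            sent.append(b)
    return sent

P = [[1, 1, 1, 1],
     [0, 0, 0, 1]]
X = [f"X{k}" for k in range(8)]
B = [f"B{k}" for k in range(8)]
sent = puncture(X, B, P)
print(sent)                  # ['X0', 'X1', 'X2', 'X3', 'B3', 'X4', 'X5', 'X6', 'X7', 'B7']
print(len(X) / len(sent))    # 0.8, i.e. code rate 4/5
```

Eight source bits yield ten transmitted bits, which matches the increase of the code rate from 1/2 to 4/5 described in the text.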

As an example, assume the original systematic encoder of code rate 1/2 is

\[
G_{sys}(D) = \begin{pmatrix} 1 & \dfrac{1+D^2}{1+D+D^2} \end{pmatrix}
\]

With the puncture pattern

\[
P = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}
\]

an equivalent punctured encoder with rate 2/3 is defined by

\[
G(D) = \begin{pmatrix} 1+D & 1+D & 1 \\ D & 0 & 1+D \end{pmatrix}
\]

Hence, the reciprocal dual code can be generated by the polynomial generator matrix

\[
\tilde{H}(D) = \begin{pmatrix} 1+D^2 & 1+D+D^2 & 1+D \end{pmatrix}
\]

The relationship between the trellises of Gsys(D), G(D), and H̃(D) is shown in Fig. 3.3. The G(D) trellis is radix-4 for turbo decoders, whereas the trellis of the reciprocal dual code H̃(D) is radix-2. Fig. 3.4 shows the performance of decoding with the H̃(D) trellis (Dual-MAP) and with the G(D) trellis (MAP). The interleaver length is 400, which results in a (600, 400) punctured block code. As expected, both decoding algorithms reach the same performance. However, decoding on the original punctured trellis is less efficient than decoding on the reciprocal dual trellis.
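The relation between G(D) and H̃(D) in this example can be checked with GF(2) polynomial arithmetic. The sketch assumes that "reciprocal" means reversing the coefficient order of each dual polynomial up to degree 2 (the encoder memory); polynomials are stored as bit masks, and the helper names are hypothetical.

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials stored as bit masks (bit i = coefficient of D^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def reciprocal(poly, degree):
    """Reverse the coefficient order: D^degree * poly(1/D)."""
    return sum(((poly >> i) & 1) << (degree - i) for i in range(degree + 1))

# Punctured rate-2/3 generator G(D) from the example (one row per input).
G = [[0b011, 0b011, 0b001],        # 1+D, 1+D, 1
     [0b010, 0b000, 0b011]]        # D,   0,   1+D
# Reciprocal dual generator from the example.
H_tilde = [0b101, 0b111, 0b011]    # 1+D^2, 1+D+D^2, 1+D
H = [reciprocal(h, 2) for h in H_tilde]   # undo the reversal (assumed degree 2)

def syndrome(row):
    """Inner product of one generator row with H over GF(2)[D]."""
    acc = 0
    for g, h in zip(row, H):
        acc ^= gf2_mul(g, h)
    return acc

print([syndrome(row) for row in G])   # [0, 0]
```

Both rows of G(D) are orthogonal to the un-reversed polynomials, so H̃(D) does describe the reciprocal of the dual code under the stated assumption. The single row of H̃(D) is what makes the dual trellis radix-2.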

3.2.1 Apply rate compatible punctured turbo code to WCDMA

RCPC (Rate Compatible Punctured Convolutional) codes are one practical solution to adaptive coding, as mentioned in [11]. RCPC codes use a single rate-1/n convolutional encoder/decoder pair and only need to share a puncturing table. RCPT (Rate Compatible Punctured Turbo) codes can be used in a similar way as RCPC codes. This section focuses on applying the RCPC approach to the turbo code in WCDMA. The puncture tables used are shown in Table 3.1.

Table 3.1: Puncture tables

Table        PT1   PT2   PT3    PT4
Systematic   1     11    1111   11111111
Parity 1     1     01    0001   00000001
Parity 2     1     01    0001   00000001
Code rate    1/3   1/2   2/3    4/5

Figure 3.3: Trellis transformation relationship. (Panels: nonpunctured trellis, code rate 1/2; punctured trellis, code rate 2/3; reciprocal dual trellis, dual code rate 1/3.)

The systematic bits are reserved for all code rates, whereas the parity bits are reserved only at some positions of the period. The first row is the systematic information, the second row is the first parity bit, and the third row is the second parity bit. The characteristic of the tables is that the codeword of a lower code rate must contain the codeword of a higher code rate.
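The code rates in Table 3.1 and the rate-compatibility property can be verified directly from the table entries. In the sketch below, the dictionary layout and the function names are assumptions for illustration; the patterns themselves are copied from Table 3.1.

```python
from fractions import Fraction

# Rows per table: (systematic, parity 1, parity 2), as in Table 3.1.
PUNCTURE_TABLES = {
    "PT1": ("1", "1", "1"),
    "PT2": ("11", "01", "01"),
    "PT3": ("1111", "0001", "0001"),
    "PT4": ("11111111", "00000001", "00000001"),
}

def code_rate(table):
    """Information bits per period divided by transmitted bits per period."""
    info_bits = len(table[0])
    sent_bits = sum(row.count("1") for row in table)
    return Fraction(info_bits, sent_bits)

def tiled(row, period):
    return row * (period // len(row))

def is_rate_compatible(low_rate, high_rate, period=8):
    """Every bit kept by the higher-rate table must also be kept by the lower-rate one."""
    for lo_row, hi_row in zip(low_rate, high_rate):
        lo, hi = tiled(lo_row, period), tiled(hi_row, period)
        if any(h == "1" and l == "0" for l, h in zip(lo, hi)):
            return False
    return True

for name, table in PUNCTURE_TABLES.items():
    print(name, code_rate(table))
```

Checking consecutive tables with `is_rate_compatible` confirms that the transmitted bits of PT4 are a subset of those of PT3, which are a subset of those of PT2 and PT1.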

In the 3GPP (WCDMA) standard, the turbo encoder is a Parallel Concatenated Convolutional Code (PCCC) with two 8-state constituent encoders and one interleaver. The unpunctured code rate of the turbo encoder is 1/3. The structure of the turbo encoder is shown in Fig. 3.5. The transfer function of the 8-state constituent code for the PCCC is

\[
G(D) = \begin{pmatrix} 1 & \dfrac{1+D+D^3}{1+D^2+D^3} \end{pmatrix}
\]

The bit sequence input to channel coding for a given code block is denoted by $c_0, c_1, c_2, c_3, \ldots, c_{K-1}$, where K is the number of bits to encode. After encoding, the bits are denoted by $d^{(i)}_0, d^{(i)}_1, d^{(i)}_2, d^{(i)}_3, \ldots, d^{(i)}_{D-1}$, where D is the number of encoded bits per output stream and i indexes the encoder output stream.

Figure 3.4: Performance of using Dual-MAP and MAP under punctured code. (BER versus Eb/N0 (dB); curves MAP and Dual-MAP.)

The initial value of the shift registers of the 8-state constituent encoders shall be all zero when starting to encode the input bits. The output from the turbo encoder is $d^{(0)}_k = x_k$, $d^{(1)}_k = z_k$, $d^{(2)}_k = z'_k$ for k = 0, 1, 2, ..., K − 1. The bits input to the turbo encoder are denoted by $c_0, c_1, c_2, c_3, \ldots, c_{K-1}$, and the bits output from the first and second encoder are denoted by $z_0, z_1, z_2, z_3, \ldots, z_{K-1}$ and $z'_0, z'_1, z'_2, z'_3, \ldots, z'_{K-1}$, respectively.
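A single 8-state constituent encoder can be sketched as a three-stage shift register. The sketch assumes the feedback polynomial 1 + D² + D³ and the feedforward polynomial 1 + D + D³, corresponding to the transfer function (1 + D + D³)/(1 + D² + D³); trellis termination and the interleaver are left out, and the function name is hypothetical.

```python
def rsc_encode(bits):
    """8-state recursive systematic encoder: feedback 1+D^2+D^3, feedforward 1+D+D^3.

    Returns (systematic, parity); trellis termination is not modelled in this sketch.
    """
    s1 = s2 = s3 = 0                      # shift register starts in the all-zero state
    systematic, parity = [], []
    for u in bits:
        a = u ^ s2 ^ s3                   # feedback taps D^2 and D^3
        z = a ^ s1 ^ s3                   # feedforward taps 1, D and D^3
        systematic.append(u)
        parity.append(z)
        s1, s2, s3 = a, s1, s2            # shift by one position
    return systematic, parity

x, z = rsc_encode([1, 0, 0, 0, 0, 0, 0, 0])
print(z)   # [1, 1, 1, 1, 0, 0, 1, 0]
```

The printed parity sequence is the start of the impulse response of (1 + D + D³)/(1 + D² + D³); because the encoder is recursive, the response does not die out but repeats with period 7.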

3.2.2 Decoding with SM and Dual-MAP algorithm

When puncture table PT1 is applied, the result is the unpunctured generator G₁(D), which is the same as G(D). We define G₂(D), G₃(D), and G₄(D) as the punctured generators obtained by applying PT2, PT3, and PT4, respectively. The systematic punctured generator polynomials are:

\[
G_1(D) = \begin{pmatrix} 1 & \dfrac{1+D+D^3}{1+D^2+D^3} \end{pmatrix}
\]

\[
G_2(D) = \begin{pmatrix} 1 & 0 & \dfrac{1+D+D^2}{1+D^2+D^3} \\[2mm] 0 & 1 & \dfrac{1+D+D^2+D^3}{1+D^2+D^3} \end{pmatrix}
\]

Figure 3.5: Turbo encoder for WCDMA. (Blocks: two constituent encoders of three delay elements D each and a QPP interleaver; signals U (information sequence), Xk, Zk, X'k, Z'k.)

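\[
G_3(D) = \begin{pmatrix}
1 & 0 & 0 & 0 & \dfrac{1+D^2}{1+D^2+D^3} \\[2mm]
0 & 1 & 0 & 0 & \dfrac{1+D}{1+D^2+D^3} \\[2mm]
0 & 0 & 1 & 0 & \dfrac{1}{1+D^2+D^3} \\[2mm]
0 & 0 & 0 & 1 & \dfrac{1+D^3}{1+D^2+D^3}
\end{pmatrix}
\]

\[
G_4(D) = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \dfrac{D+D^2}{1+D^2+D^3} \\[2mm]
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \dfrac{1}{1+D^2+D^3} \\[2mm]
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & \dfrac{D}{1+D^2+D^3} \\[2mm]
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & \dfrac{D^2}{1+D^2+D^3} \\[2mm]
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & \dfrac{1+D^2}{1+D^2+D^3} \\[2mm]
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & \dfrac{1+D+D^2}{1+D^2+D^3} \\[2mm]
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \dfrac{1+D}{1+D^2+D^3} \\[2mm]
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \dfrac{1+D+D^3}{1+D^2+D^3}
\end{pmatrix}
\]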

Their corresponding reciprocal dual trellises are generated by H̃₁(D), H̃₂(D), H̃₃(D), and H̃₄(D), respectively:

\[
\tilde{H}_1(D) = \begin{pmatrix} 1+D+D^3 & 1+D^2+D^3 \end{pmatrix}
\]

\[
\tilde{H}_2(D) = \begin{pmatrix} D+D^2+D^3 & 1+D+D^2+D^3 & 1+D+D^3 \end{pmatrix}
\]

\[
\tilde{H}_3(D) = \begin{pmatrix} D+D^3 & D^2+D^3 & D^3 & 1+D^3 & 1+D+D^3 \end{pmatrix}
\]

\[
\tilde{H}_4(D) = \begin{pmatrix} D+D^2 & D^3 & D^2 & D & D+D^3 & D+D^2+D^3 & D^2+D^3 & 1+D^2+D^3 & 1+D+D^3 \end{pmatrix}
\]

Notice that each of these matrices has only one row, even when the original code rate is higher than 1/2. We use these reciprocal dual trellises as the decoding trellises; the BER performance is shown in Fig. 3.6.

Figure 3.6: Performance of different code rate with block length 2048. (BER versus Eb/N0 (dB); curves for code rates 1/3, 1/2, 2/3, and 4/5.)

3.3 Performance analysis

In this section, we present the simulation results and the parameter settings for hardware implementation. All simulation results are given as BER versus signal-to-noise ratio (SNR) under BPSK modulation over an AWGN channel. In Fig. 3.7, there is about 0.05 dB loss between the sliding window sizes of 32 and 64 at BER = 10^−5 with 6 iterations. For code rates 1/3, 1/2, 2/3, and 4/5, each with block length 2048, the performance for different numbers of iterations is presented in Fig. 3.8. The BER performance of each code rate is almost saturated at 6 iterations. Therefore, we choose 6 iterations for our design.

Figure 3.7: Comparison of different sliding window size. (BER versus Eb/N0 (dB); curves for window sizes 16, 32, and 64.)

The fixed point representation of the internal variables in the SISO decoder is determined from the quantization of the received symbols. Fig. 3.9 shows the simulation results for different input symbol quantizations with block length 2048, code rate 1/2, window size 32, and 6 iterations. In the figure, a.b denotes the quantization scheme, where a is the number of bits of the integer part and b is the number of bits of the fractional part. We observe that the performance loss of the 3.3 format is quite small compared with the 3.4 format. We therefore choose the 6-bit 3.3 format for our design. In addition, the widths of the extrinsic information, branch metrics, and path metrics can be derived; the fixed point representations are summarized in Table 3.2.
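The a.b quantization of the received symbols can be illustrated with a small helper. The thesis does not state how the sign is encoded, so this sketch assumes a two's complement format in which the integer part includes the sign bit; the function name and the rounding rule (round half up) are likewise assumptions.

```python
import math

def quantize(value, int_bits=3, frac_bits=3):
    """Round to a signed fixed point grid and saturate.

    Assumption: the integer part includes the sign bit (two's complement),
    so the 3.3 format covers [-4, 4 - 1/8] in steps of 1/8.
    """
    scale = 1 << frac_bits
    lo = -(1 << (int_bits + frac_bits - 1))
    hi = (1 << (int_bits + frac_bits - 1)) - 1
    code = math.floor(value * scale + 0.5)      # round half up
    code = max(lo, min(hi, code))               # saturate instead of wrapping
    return code / scale

print(quantize(0.3))          # 0.25   (3.3 format)
print(quantize(0.3, 3, 4))    # 0.3125 (3.4 format)
print(quantize(10.0))         # 3.875  (saturated)
```

The extra fractional bit of the 3.4 format halves the step size, which is the resolution difference compared in Fig. 3.9.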

Figure 3.8: Comparison of different iteration.

Table 3.2: Summary of fixed representation in MAP decoder

Quantity       Input symbols   Extrinsic information   Branch metrics   State metrics
Width (bits)   6               10                      12               12

Figure 3.9: Fixed point comparison. (BER versus Eb/N0 (dB); curves Fix3.3, Fix3.4, and Floating.)

Finally, we discuss the look-up table approximation of the correction term shown in Fig. 3.1. Since the correction term −log(1 − e^−|d|) can take any value in the range [0, ∞) as d goes from ∞ to 0, it changes sharply when d is close to zero. Therefore, how the range of d is segmented is a significant issue. Fig. 3.11 shows the performance of two ways to segment the range of d. The simulations use interleaver length 2048, code rate 4/5, and a sliding window size of 32 trellis stages. As shown in Fig. 3.10, uniform segmentation divides the range into equal intervals, whereas nonuniform segmentation divides the range according to the slope of the correction term function. Based on the bit error rate performance, we choose the nonuniform segmentation to divide the range of d.
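The effect of the segmentation can be illustrated by comparing two look-up tables with the same number of entries. The segment edges below (eight equal segments on [0, 8] versus power-of-two edges that become denser near zero) and the midpoint rule for the table entries are assumptions chosen for this sketch; the thesis does not list its actual segment boundaries.

```python
import bisect
import math

def correction(d):
    """E-bar correction term -log(1 - exp(-|d|)) for d != 0."""
    return -math.log(1.0 - math.exp(-abs(d)))

def build_lut(edges):
    """One table entry per segment, evaluated at the segment midpoint."""
    return [correction(0.5 * (lo + hi)) for lo, hi in zip(edges, edges[1:])]

def lookup(d, edges, lut):
    idx = bisect.bisect_right(edges, abs(d)) - 1
    return lut[max(0, min(idx, len(lut) - 1))]

def max_error(edges, samples):
    lut = build_lut(edges)
    return max(abs(lookup(d, edges, lut) - correction(d)) for d in samples)

uniform_edges = [float(k) for k in range(9)]                  # 8 equal segments on [0, 8]
nonuniform_edges = [0.0] + [2.0 ** k for k in range(-4, 4)]   # 0, 1/16, 1/8, ..., 4, 8
samples = [0.01 * k for k in range(1, 801)]                   # d in (0, 8]
print(max_error(uniform_edges, samples), max_error(nonuniform_edges, samples))
```

With the same table size, the nonuniform table has a clearly smaller worst-case error, because most of its entries are spent where the correction term is steep.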

The design parameters and methodology above are applied to our decoder. The performance for each code rate is shown in Fig. 3.12, where the floating point simulations are also listed. The fixed point performance loss is 0.3 to 0.5 dB compared with floating point at BER = 10^−5.

Figure 3.10: Segment method. (Panels: uniform segmentation and non-uniform segmentation.)

[Figure 3.11: BER versus Eb/N0 (dB); curves uniform, nonuniform, and floating.]

Chapter 4 Dual-MAP Turbo Decoder Architecture

According to equation (2.39), 1, 2, 4, and 8 extrinsic values can be calculated during one trellis stage when the puncture tables PT1, PT2, PT3, and PT4 are used, respectively. In this chapter, we discuss the architecture of the reciprocal dual trellis turbo decoder.

4.1 Architecture overview

The architecture of the proposed turbo decoder using the reciprocal dual trellis is shown in Fig. 4.1. The SISO decoder performs the Dual-MAP algorithm and outputs extrinsic values. The Input Buffers store the received values from the channel. The extrinsic memory stores the extrinsic values from the SISO decoder. In each iteration, the SISO decoder computes the extrinsic values, which serve as the a priori estimates for the next iteration. There are two stages in an iteration. In the normal stage, data is read from the Input Buffer and the extrinsic memory in normal order, and the extrinsic values are written into the extrinsic memory in interleaved order. In the interleaved stage, data is read from the Input Buffer and the extrinsic memory in interleaved order, and the extrinsic values are written into the extrinsic memory in normal order. Moreover, by using the reciprocal dual trellis, a radix-2 branch calculation circuit is sufficient to realize the turbo decoder.

We apply the sliding window approach of [9] to the SISO decoder. In Fig. 4.1, the Window BUFs store the input soft values for evaluating α and β. SMM stands for sign-magnitude multiplication; it calculates the branch metrics. SMA stands for sign-magnitude addition; each SMA calculates the path metrics for one recursion. To avoid waiting for the whole codeword before evaluating β, we use SMA-βd, starting from the end of the next sliding window, to evaluate the initial values of β. The α-Buffer operates as a Last-In/First-Out (LIFO) buffer to reverse the output order of α.

Figure 4.1: Iterative decoding of turbo decoder. (Blocks: Input Buffer for systematic bit, parity 1, and parity 2; two Window Buffers; SMM units; SMA-α, SMA-β, and SMA-βd; α-Buffer; Extrinsic sets; Extrinsic Memory with normal-order and interleaver read and write addresses; Decision; Output Buffer.)

Fig. 4.2 is the decoding schedule of the SISO decoder. In the first time interval T₀, the input values are written into Window BUF1 in reverse order. In the second time interval T₁, the second input window (W₁) is written into Window BUF2, also in reverse order, and SMA-βd calculates βd to evaluate the initial conditions of β. Simultaneously, SMA-α uses the data in W₀ to compute α in reverse order and saves the results in the α-Buffer. Finally, the first window data is read in interval T₂ for the β calculation, while the third window (W₂) is written into Window BUF1. As soon as the SMA-β unit starts, the extrinsic values can be calculated by the Extrinsic sets. As a result, the latency of the SISO decoder is about two sliding window lengths.
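The decoding schedule described above can be tabulated with a short sketch. It assumes that in interval T_i the incoming window W_i is written, SMA-βd warms up on W_i (from T₁ on), SMA-α works on W_(i−1), and SMA-β together with the Extrinsic sets works on W_(i−2); the function and key names are hypothetical, and buffer addressing is not modelled.

```python
def siso_schedule(num_windows):
    """List, for each time interval T_i, the window index each unit works on (None = idle)."""
    def window(k):
        return k if 0 <= k < num_windows else None

    schedule = []
    for i in range(num_windows + 2):
        schedule.append({
            "interval": i,
            "write_window_buffer": window(i),             # incoming window W_i
            "sma_beta_d": window(i) if i >= 1 else None,  # dummy recursion for the previous window
            "sma_alpha": window(i - 1),                   # forward recursion, results to alpha buffer
            "sma_beta_and_extrinsic": window(i - 2),      # true backward recursion and outputs
        })
    return schedule

for row in siso_schedule(4):
    print(row)
```

The first extrinsic values appear in interval T₂, which reflects the latency of about two sliding window lengths stated in the text.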
