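National Chiao Tung University
Department of Electronics Engineering & Institute of Electronics, Master's Program
Master's Thesis
A Turbo Decoder Design Using Reciprocal Dual Trellis
Student: Chen-Yang Lin (林振揚)
Advisor: Prof. Hsie-Chia Chang (張錫嘉)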
A Thesis
Submitted to the Department of Electronics Engineering & Institute of Electronics,
College of Electrical and Computer Engineering,
National Chiao Tung University,
in Partial Fulfillment of the Requirements
for the Degree of Master of Science
in
Electronics Engineering
July 2009
Hsinchu, Taiwan, Republic of China
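Abstract (Chinese Version, Translated)
Error-correcting codes generally need a flexibly selectable code rate to adapt to different channel environments. To meet this requirement, decoders for different code rates must be developed, and high-rate error-correcting codes must be adopted to improve channel utilization and transmission speed. Conventional turbo decoders usually use high-radix trellis structures to decode high-rate codes, so the trellis complexity grows exponentially as the code rate increases. In this thesis, we adopt the reciprocal dual trellis structure to reduce the trellis complexity of high code rate decoders. In addition, we adopt the sign-magnitude number representation to further reduce the hardware complexity. We use puncturing on the WCDMA turbo encoder to generate codes with different code rates. After studying punctured turbo codes with four code rates, 1/3, 1/2, 2/3, and 4/5, the simulation results show that the error-correction performance improves as the code rate decreases.
At the end of this thesis, the synthesis results of the SISO decoders for the four code rates show that the number of logic gates increases only slightly as the code rate rises. Finally, we propose a hardware architecture for a multiple code rate turbo decoder whose throughput increases as the operating code rate rises. According to experimental results in a 90nm process, the proposed decoder contains 370K logic gates and 58 kb of storage elements. At a supply voltage of 0.9 V, the power consumption at code rate 4/5 is 80 mW, and a throughput of 101 Mb/s is achieved.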
Abstract
Error-correcting codes (ECC) generally require a flexible choice of code rate to meet different channel characteristics. In addition, high code rate schemes are required to increase channel efficiency in high-throughput systems. Since conventional turbo decoders for high code rates usually apply high-radix trellis structures, the trellis complexity increases exponentially as the code rate rises. In this thesis, we introduce the reciprocal dual trellis to reduce the trellis complexity for high code rates. The sign-magnitude representation is introduced to lower the hardware complexity. We apply puncturing to the turbo code of WCDMA to generate different code rates. After investigating punctured turbo codes with four code rates, 1/3, 1/2, 2/3, and 4/5, the simulation results reveal that the performance improves as the code rate decreases.
The synthesis results of the SISO decoders for the four code rates show only a moderate increase in logic gates as the code rate rises. Finally, a multiple code-rate turbo decoder architecture using the reciprocal dual trellis is proposed; its throughput increases as the operating code rate rises. Fabricated in the UMC 90nm 1P9M CMOS process, the proposed decoder, which contains 370K gates and 58 kb of storage elements, achieves 101 Mb/s at 80 mW under code rate 4/5.
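Acknowledgements
Time flies, and my two years as a master's student have passed in the blink of an eye; at last, I too get to write this page. First, I would like to thank Prof. Hsie-Chia Chang, not only for guiding my research but, even more valuably, for showing me the bearing and attitude a leader should have. As a Western proverb says, "If you want sailors to build a ship, rather than teaching them, make them long for the sea." Prof. Chang always reminded us to be sheep that go out to find grass to eat, rather than waiting for the shepherd to feed us. He always approaches research with a childlike curiosity, and whenever I was confused, he taught me how to think. Besides professional knowledge, I believe this is one of the greatest treasures I found in him. Once again, thank you, Prof. Chang.
Next, I would like to thank the members of Ocean and Oasis; it has been a real pleasure to learn here. First, thanks to my senior labmate Da-Tou (大頭), who always played the role of a teacher: whenever I shared my research results or hit a bottleneck, he was always available and offered farsighted suggestions. Thanks also to my senior labmate Kuo-Kuang (國光), not only for often handling problems on the workstations for us, but also for being someone I could share the small things of life with. I also thank my fellow master's students who went through these years with me; whether doing research together or celebrating birthdays, you have been truly warm-hearted companions. Thank you for being with me.
I would also like to thank my friends in the Hsinchu-area Zen Club and Leadership Club. I feel deeply fortunate to know you. We worked hard together to organize many activities, which let this old graduate student often experience the enthusiasm, energy, and a bit of the naivety of young people, haha! With your company, my graduate life was never too dull; you helped me keep a young heart and face each challenge optimistically. Thank you.
Finally, I thank my dear family and my girlfriend, who pulled me back when I wanted to retreat and supported me when I was discouraged. I dedicate this thesis to everyone who has helped me; with a heart full of gratitude, thank you all.
Contents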
1 Introduction
1.1 Motivation
1.2 Thesis organization
2 Turbo Code
2.1 Turbo principle
2.1.1 Encoder of turbo code
2.1.2 Turbo interleaver
2.1.3 Decoder of turbo code
2.1.4 Error floor effect
2.2 Decoding algorithm
2.2.1 The MAP algorithm
2.2.2 The Log-MAP and MAX-Log-MAP algorithm
2.3 Decoding with reciprocal dual trellis
2.3.1 Construct reciprocal dual trellis
2.3.2 Decoding based on reciprocal dual trellis structure
2.4 Sliding window method of turbo code
3 Reciprocal Dual Trellis Algorithm
3.1 Log domain approach
3.1.1 Sign magnitude scheme
3.1.2 Proposed equation for calculating extrinsic value
3.2 Punctured convolutional codes
3.2.1 Apply rate compatible punctured turbo code to WCDMA
3.3 Performance analysis
4 Dual-MAP Turbo Decoder Architecture
4.1 Architecture overview
4.2 Dual-MAP decoder
4.2.1 SMM unit
4.2.2 SMA unit
4.2.3 Extrinsic unit
4.3 Throughput evaluation
5 Implementation Results
5.1 Implementation of Dual-MAP turbo decoders
5.1.1 Comparison of different high radix MAP decoders
5.1.2 Layout specification
5.2 Comparison of different turbo decoders
6 Conclusion and Future Works
6.1 Conclusion
List of Figures
1.1 Block diagram of digital communication system.
2.1 Turbo encoder with puncture.
2.2 Interleaver of turbo decoder.
2.3 Conventional Turbo decoder.
2.4 A rate 1/2 memory order 2 RSC encoder and its state transition diagram.
2.5 The trellis diagram of the (2,1,2) RSC encoder.
2.6 Original trellis and reciprocal dual trellis comparison.
2.7 The process diagram of sliding window algorithm.
2.8 Different sliding window length comparison.
3.1 Correction term of Ē operation.
3.2 Puncture procedure.
3.3 Trellis transformation relationship
3.4 Performance of using Dual-MAP and MAP under punctured code
3.5 Turbo encoder for WCDMA.
3.6 Performance of different code rate with block length 2048.
3.7 Comparison of different sliding window size.
3.8 Comparison of different iteration.
3.9 Fixed point comparison.
3.10 Segment method.
3.11 Performance comparison of different segment method.
3.12 Performance of each code rate with design parameters.
4.1 Iterative decoding of turbo decoder.
4.3 Architecture of SMM unit.
4.4 Architecture of SMA unit.
4.5 Radix-2 and radix-4 Log MAP recursion units.
4.6 Architecture of Extrinsic unit.
5.1 Area distribution of modules in turbo decoder.
List of Tables
3.1 Puncture tables
3.2 Summary of fixed representation in MAP decoder
5.1 Summary of synthesis result
5.2 Compare with radix-4×4 MAX-LOG-MAP circuits
5.3 Proposed turbo decoder chip summary
Chapter 1
Introduction
1.1 Motivation
The fundamental block diagram of a traditional digital communication system is illustrated in Fig. 1.1. The system transmits information from a source to a destination through an unknown channel. Generally, the communication system is simplified into three components: the transmitter, the receiver, and the channel. The transmitter, which includes the source encoder, the channel encoder, and the modulator, is used to transmit the information more efficiently and more reliably over unknown channels. The receiver reverses these operations on the received signal by means of the demodulator, the channel decoder, and the source decoder. Since channel impairments such as noise, interference, and distortion may cause errors in the received signal, the channel encoder adds certain redundancy to the source codeword in order to minimize transmission errors. These redundant bits can be used for error detection and correction. Thus, compared with an uncoded communication system, channel coding mitigates the effects of noise disturbances.
However, most channel coding schemes are neither flexible nor efficient enough for modern data transmission. For video data, some important parts must be well protected to ensure the reconstructed quality. Therefore, we can use more check bits to protect the important parts of the video data and fewer check bits when transmitting less important data. This encoding approach can also be applied to meet different channel conditions. For example, we can use more check bits when data are transmitted through a noisy channel and fewer check bits when they are transmitted through a better channel.
Figure 1.1: Block diagram of digital communication system. (Figure: information source → source encoder → channel encoder → modulator → channel → demodulator → channel decoder → source decoder → information destination.)
From the viewpoint of the communication system, we combine the characteristics of the data type and the channel to design a flexible channel decoder. We focus on turbo decoders because of their excellent error-correction ability. However, designing a high code rate turbo decoder is a real challenge because of the complexity of its trellis structure. In this thesis, we apply another decoding concept, presented in Ref. [1], to solve this problem. Furthermore, we apply various code rates to protect different kinds of data and to achieve unequal error protection. Finally, we aim to design a multiple code rate turbo decoder with low hardware complexity.
1.2 Thesis organization
This thesis consists of six chapters. In Chapter 2, the concepts of several iterative decoding algorithms for turbo codes are introduced. In Chapter 3, we apply puncture tables to the turbo code of the WCDMA system to achieve high code rates; the simulation analysis and hardware architecture parameters are also described. Chapter 4 introduces the design of the reciprocal dual trellis turbo decoder, including its hardware architecture and characteristics. In Chapter 5, the hardware implementation results are shown. Finally, conclusions and future work are given in Chapter 6.
Chapter 2
Turbo Code
The parallel concatenated convolutional code, also named turbo code, was invented by C. Berrou, A. Glavieux, and P. Thitimajshima in 1993. It has been shown to achieve excellent performance near the Shannon limit. A common turbo encoder is composed of two recursive systematic convolutional codes concatenated in parallel and separated by a pseudo-random interleaver. Turbo codes are adopted in the 3GPP, 3GPP2, DVB-RCS, and WiMAX standards because of their excellent error-correction ability. In this chapter, turbo decoding with the original trellis and with the reciprocal dual trellis is introduced. The error floor effect in turbo decoding and some decoding techniques are also discussed.
2.1 Turbo principle
2.1.1 Encoder of turbo code
The turbo encoder is composed of two recursive systematic convolutional (RSC) encoders, which are connected in parallel but separated by an interleaver. The block diagram of the turbo encoder is illustrated in Fig. 2.1. An optional puncture table selects which bits are sent, which increases the code rate. In the first encoder, the information bits are encoded into the systematic part $c_0(D)$ and the parity part $c_1(D)$; thus, $c_0(D) = x(D)$. The second encoder encodes the bit stream $\tilde{x}(D)$, which consists of the information bits after passing through the interleaver. However, the interleaved systematic part $\tilde{x}(D)$ is not sent during transmission. The code rate of an encoder is defined as R = (number of information bits in a codeword)/(total number of codeword bits).
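Figure 2.1: Turbo encoder with puncture. (Figure: $x(D)$ feeds RSC1, which produces $c_0(D)$ and $c_1(D)$; the interleaver output $\tilde{x}(D)$ feeds RSC2, which produces $c_2(D)$; the outputs pass through the puncture table to form OutStream.)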
In the following derivation of R, we ignore the puncture table; hence, OutStream in Fig. 2.1 is the complete codeword. Suppose encoder 1 produces $p_1$ check bits and encoder 2 produces $p_2$ check bits, with code rates $R_1$ and $R_2$, respectively. If $k$ information bits pass through the turbo encoder, the overall code rate R of the turbo encoder is

$$R = \frac{k}{k + p_1 + p_2}.$$

The check-bit counts $p_1$ and $p_2$ follow from the code-rate equations of encoder 1 and encoder 2:

$$R_1 = \frac{k}{k + p_1}, \qquad R_2 = \frac{k}{k + p_2}.$$

Substituting $1/R_1 = 1 + p_1/k$ and $1/R_2 = 1 + p_2/k$ into $1/R = 1 + (p_1 + p_2)/k$ gives the code rate of the overall turbo encoder:

$$\frac{1}{R} = \frac{1}{R_1} + \frac{1}{R_2} - 1. \tag{2.1}$$
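As a quick check of (2.1), the short Python sketch below computes the overall rate both directly from the bit counts $k$, $p_1$, $p_2$ and from the component rates, using exact fractions. The function names are illustrative only, and puncturing is ignored as in the derivation above.

```python
from fractions import Fraction


def rate_from_counts(k, p1, p2):
    """Overall turbo code rate R = k / (k + p1 + p2), puncturing ignored."""
    return Fraction(k, k + p1 + p2)


def rate_from_components(r1, r2):
    """Overall turbo code rate from (2.1): 1/R = 1/R1 + 1/R2 - 1."""
    r1, r2 = Fraction(r1), Fraction(r2)
    return 1 / (1 / r1 + 1 / r2 - 1)


# Two rate-1/2 RSC encoders whose shared systematic stream is sent once.
k = 400
print(rate_from_counts(k, k, k))                             # 1/3
print(rate_from_components(Fraction(1, 2), Fraction(1, 2)))  # 1/3
```

Both routes agree because (2.1) is only a rearrangement of the bit-count definition.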
After the information sequence is encoded, a terminating method is applied to end the encoding process. We briefly describe three terminating methods. The first and simplest method is to truncate the encoding directly after a block has been encoded; this approach results in some performance loss. The second method appends dummy information bits to the end of the information sequence to force the registers in the encoder back to the all-zero state. This approach maintains the decoding performance, but because the dummy bits must also be transmitted, the code rate decreases. The third is the tail-biting method, which encodes the information bits twice. The first encoding pass aims to find the final register state of the encoder. The second pass is the actual encoding, and the encoder starts from the final state of the first pass. Hence, with tail-biting, the final register state is not necessarily all-zero. This method provides the same error protection as the second method, but the code rate is not changed. When the ending state of the decoding trellis is known, the decoder can set its initial conditions more precisely and, in theory, the decoding performance improves.
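The second terminating method (dummy tail bits) needs some care for a recursive encoder: because of the feedback, the tail input must equal the feedback value so that zeros are shifted into the register. The minimal sketch below assumes the 8-state RSC constituent encoder of 3GPP TS 25.212 (feedback $g_0 = 1 + D^2 + D^3$, feedforward $g_1 = 1 + D + D^3$); the function name and its return layout are illustrative.

```python
def rsc_encode(bits, terminate=True):
    """Encode with an 8-state RSC (assumed: g0 = 1 + D^2 + D^3 feedback,
    g1 = 1 + D + D^3 feedforward). Returns (systematic, parity, final_state)."""
    s = [0, 0, 0]  # s[0], s[1], s[2] hold a_{t-1}, a_{t-2}, a_{t-3}
    systematic, parity = [], []

    def step(u):
        fb = s[1] ^ s[2]        # feedback taps D^2 and D^3
        a = u ^ fb              # bit entering the shift register
        p = a ^ s[0] ^ s[2]     # feedforward taps 1, D and D^3
        s[2], s[1], s[0] = s[1], s[0], a
        systematic.append(u)
        parity.append(p)

    for u in bits:
        step(u)
    if terminate:
        # Dummy tail bits: choosing u = feedback makes a = 0, so three
        # steps flush the register back to the all-zero state.
        for _ in range(3):
            step(s[1] ^ s[2])
    return systematic, parity, tuple(s)


sys_bits, par_bits, final_state = rsc_encode([1])
print(sys_bits, par_bits, final_state)  # [1, 0, 1, 1] [1, 1, 0, 1] (0, 0, 0)
```

The three extra systematic and parity bits are exactly the rate loss mentioned for this method.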
2.1.2 Turbo interleaver
Interleaving is the process of rearranging the order of a data sequence in a one-to-one deterministic manner. In a turbo code, the interleaver (see Fig. 2.2) is an essential component for bit-error-rate performance. Because the interleaver scrambles a long block of input symbols, a proper coding gain can be achieved with small-memory encoders. The interleaver decorrelates the inputs of the two encoders, so an iterative decoding algorithm can be applied between the two component decoders. The performance upper bound corresponding to a uniform random interleaver has been evaluated in [2]. Theoretically, as the interleaver block size N increases, the bit-error rate is expected to improve; the factor 1/N is called the interleaver gain.
Figure 2.2: Interleaver of turbo decoder. (Figure: input order U1 U2 U3 U4 U5 U6; after interleaving: U3 U5 U4 U1 U6 U2.)
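The one-to-one rearrangement of Fig. 2.2 can be written as a permutation vector. In the minimal sketch below, position i of the interleaved output takes input element perm[i]; the permutation is read off Fig. 2.2 with 0-based indices, and the deinterleaver applies the inverse mapping, as the turbo decoder does between its two SISO decoders. The helper names are illustrative.

```python
def interleave(seq, perm):
    """Output position i takes input element perm[i]."""
    return [seq[p] for p in perm]


def deinterleave(seq, perm):
    """Inverse of interleave(): put element i back at position perm[i]."""
    out = [None] * len(seq)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out


# Permutation of Fig. 2.2 (0-based): U1..U6 -> U3 U5 U4 U1 U6 U2
perm = [2, 4, 3, 0, 5, 1]
u = ["U1", "U2", "U3", "U4", "U5", "U6"]
print(interleave(u, perm))                           # ['U3', 'U5', 'U4', 'U1', 'U6', 'U2']
print(deinterleave(interleave(u, perm), perm) == u)  # True
```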
2.1.3 Decoder of turbo code
A common iterative turbo decoder is shown in Fig. 2.3, where $r_s$ is the received systematic information, $r_{p1}$ is the received parity generated by the first component RSC encoder, and $r_{p2}$ is the received parity generated by the second component RSC encoder. Iterative turbo decoding uses two serially connected constituent SISO (soft-in soft-out) decoders. The interleaver permutes the systematic information and delivers the scrambled data to the second SISO decoder. During the iterative decoding procedure, each constituent SISO decoder outputs extrinsic information $L_{ex}$, which serves as the a priori input $L_{in}$ of the next constituent SISO decoder; therefore, $L_{in2} = L_{ex1}$ after interleaving and $L_{in1} = L_{ex2}$ after deinterleaving. Generally, the bit-error rate improves as the number of decoding iterations increases; however, beyond a certain number of iterations there is no obvious further improvement.
Figure 2.3: Conventional Turbo decoder. (Figure: SISO Decoder 1 and SISO Decoder 2 with inputs s, p1, p2; $L_{ex1}$ is interleaved to form $L_{in2}$, $L_{ex2}$ is deinterleaved to form $L_{in1}$; the final output goes to a hard decision.)
2.1.4 Error floor effect
Although turbo coding provides excellent performance, the bit-error rate (BER) decreases quite slowly and almost saturates at high signal-to-noise ratio (SNR). This phenomenon, called the error floor [3], is due to the relatively small free distance of turbo codes. The relation between the minimum free distance and the bit-error probability in turbo coding can be expressed as

$$P_b \propto Q\left(\sqrt{2\, d_{free}\, R\, \frac{E_b}{N_0}}\right), \tag{2.2}$$

where $d_{free}$ is the minimum free distance of the code, $R$ is the code rate, and $E_b/N_0$ is the SNR.
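To see why a small $d_{free}$ flattens the BER curve, the term inside (2.2) can be evaluated numerically. The sketch below evaluates only $Q(\sqrt{2 d_{free} R E_b/N_0})$, not a complete BER bound (the multiplicity factor in front of this term is omitted), and the $d_{free}$ values in the example are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def floor_term(d_free, rate, ebn0_db):
    """Q(sqrt(2 * d_free * R * Eb/N0)) from (2.2); Eb/N0 is given in dB."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return q_function(math.sqrt(2.0 * d_free * rate * ebn0))


# Hypothetical free distances for a rate-1/3 code: a smaller d_free makes
# the term fall more slowly as the SNR increases.
for d_free in (6, 18):
    print(d_free, [f"{floor_term(d_free, 1/3, snr):.1e}" for snr in (1, 2, 3)])
```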
2.2 Decoding algorithm
In turbo decoding, the maximum a posteriori probability (MAP) algorithm [4] and the soft-output Viterbi algorithm (SOVA) [5] are commonly applied in the SISO decoders. The SOVA is based on the maximum likelihood (ML) criterion, which minimizes the word error probability, whereas the MAP algorithm exploits the information of the whole codeword to minimize the symbol error probability. Therefore, in this section we focus on the MAP algorithm, because it has been shown to be the optimal decoding method for turbo codes compared with SOVA [6]. Moreover, hardware-friendly variants such as the Log-MAP and Max-Log-MAP algorithms are introduced briefly. Finally, an effective decoding algorithm for high code rates is introduced [7].
2.2.1 The MAP algorithm
The MAP decoding algorithm (also called the BCJR algorithm) was introduced in 1974 by Bahl, Cocke, Jelinek, and Raviv [4]. For each transmitted information symbol $u_t$, the MAP algorithm estimates its a posteriori probability (APP) based on the whole received codeword sequence $\mathbf{r}$ over a discrete memoryless channel (DMC) and computes the log-likelihood ratio (LLR), defined as

$$L(u_t) = L(u_t \mid \mathbf{r}) = \log\frac{P(u_t = +1 \mid \mathbf{r})}{P(u_t = -1 \mid \mathbf{r})}, \tag{2.3}$$

for $1 \le t \le N$, where $N$ is the received codeword length. This value is compared with a zero threshold to determine the hard estimate of $u_t$:

$$\hat{u}_t = \begin{cases} +1, & \text{if } L(u_t) \ge 0 \\ -1, & \text{otherwise.} \end{cases} \tag{2.4}$$
As an example, a rate 1/2, memory order 2 RSC encoder and its state transitions are illustrated in Fig. 2.4. The solid lines represent the state transitions corresponding to an information bit $u_t = +1$, while the dotted lines represent the state transitions corresponding to $u_t = -1$. Its decoding trellis diagram is shown in Fig. 2.5.

Figure 2.4: A rate 1/2 memory order 2 RSC encoder and its state transition diagram. (Figure: encoder with two delay elements D; states 00, 01, 10, 11; branches labeled input/output 1/11, 1/10, 0/00, 0/01.)

Since each information bit $u_t$ determines the state transition from $S_{t-1}$ to $S_t$, the APP of $u_t$ can be expressed as a sum of transition probabilities. Therefore, the LLR can be further expressed as

$$L(u_t) = \log\frac{P(u_t=+1\mid\mathbf r)}{P(u_t=-1\mid\mathbf r)} = \log\frac{\sum_{(m',m)\in B_t^{+1}} P(S_{t-1}=m', S_t=m \mid \mathbf r)}{\sum_{(m',m)\in B_t^{-1}} P(S_{t-1}=m', S_t=m \mid \mathbf r)} = \log\frac{\sum_{(m',m)\in B_t^{+1}} P(S_{t-1}=m', S_t=m, \mathbf r)}{\sum_{(m',m)\in B_t^{-1}} P(S_{t-1}=m', S_t=m, \mathbf r)}, \tag{2.5}$$

where $P(S_{t-1}=m', S_t=m, \mathbf r)$ is the joint probability of the transition from state $m'$ at time $t-1$ to state $m$ at time $t$, and $B_t^{+1}$ and $B_t^{-1}$ are the sets of state transitions $(m', m)$ caused by the input bit $u_t = +1$ and $u_t = -1$, respectively.
In order to compute the joint probability required for $L(u_t)$ in (2.5), we define the following metrics:

• The forward recursion metric α:
$$\alpha_t(m) = P(S_t = m, \mathbf r_0^{t}) \tag{2.6}$$
• The backward recursion metric β:
$$\beta_t(m) = P(\mathbf r_{t+1}^{N-1} \mid S_t = m) \tag{2.7}$$
• The branch metric γ:
$$\gamma_t(m', m) = P(S_t = m, \mathbf r_t \mid S_{t-1} = m') \tag{2.8}$$
• The joint probability λ:
$$\lambda_t(m', m) = P(S_{t-1} = m', S_t = m, \mathbf r) \tag{2.9}$$

Figure 2.5: The trellis diagram of the (2,1,2) RSC encoder. (Figure: states 00, 01, 10, 11 at $S_{t-1}$ and $S_t$; branches for $u_t = +1$ and $u_t = -1$; forward path computing α and backward path computing β.)
Since we assume that the encoded codeword sequence is transmitted through a discrete memoryless channel, the joint probability can be expressed as

$$\begin{aligned} \lambda_t(m', m) &= P(S_{t-1}=m', S_t=m, \mathbf r_0^{t-1}, \mathbf r_t, \mathbf r_{t+1}^{N-1}) \\ &= P(\mathbf r_{t+1}^{N-1} \mid S_{t-1}=m', S_t=m, \mathbf r_0^{t-1}, \mathbf r_t)\cdot P(S_t=m, \mathbf r_t \mid S_{t-1}=m', \mathbf r_0^{t-1})\cdot P(S_{t-1}=m', \mathbf r_0^{t-1}) \\ &= P(\mathbf r_{t+1}^{N-1} \mid S_t=m)\cdot P(S_t=m, \mathbf r_t \mid S_{t-1}=m')\cdot P(S_{t-1}=m', \mathbf r_0^{t-1}). \end{aligned} \tag{2.10}$$

Here, $\mathbf r_0^{t-1}$ denotes the received codeword sequence from time instance 0 to $t-1$, while $\mathbf r_{t+1}^{N-1}$ denotes the sequence from time instance $t+1$ to the end. The second line of (2.10) follows from Bayes' rule, and the third line follows from the Markov property of the state transitions. Therefore, the joint probability defined in (2.9) can be expressed in terms of (2.6), (2.7), and (2.8):

$$\lambda_t(m', m) = \alpha_{t-1}(m')\cdot\gamma_t(m', m)\cdot\beta_t(m). \tag{2.11}$$
We now derive the recursions for (2.6), (2.7), and (2.8) as follows:

$$\begin{aligned} \alpha_t(m) &= P(S_t = m, \mathbf r_0^{t}) = \sum_{m'\in S} P(S_{t-1}=m', S_t=m, \mathbf r_0^{t-1}, \mathbf r_t) \\ &= \sum_{m'\in S} P(S_t=m, \mathbf r_t \mid S_{t-1}=m', \mathbf r_0^{t-1})\cdot P(S_{t-1}=m', \mathbf r_0^{t-1}) \\ &= \sum_{m'\in S} P(S_t=m, \mathbf r_t \mid S_{t-1}=m')\cdot P(S_{t-1}=m', \mathbf r_0^{t-1}) = \sum_{m'\in S} \alpha_{t-1}(m')\cdot\gamma_t(m', m). \end{aligned} \tag{2.12}$$

Since the registers of the encoder are all zero at the beginning of the encoding process, the initial conditions of $\alpha_t$ are

$$\alpha_0(0) = 1, \qquad \alpha_0(m) = 0 \ \text{for } m \ne 0. \tag{2.13}$$

Similarly,

$$\begin{aligned} \beta_t(m) &= P(\mathbf r_{t+1}^{N-1} \mid S_t=m) = \sum_{m'\in S} P(S_{t+1}=m', \mathbf r_{t+1}^{N-1} \mid S_t=m) \\ &= \sum_{m'\in S} \frac{P(S_{t+1}=m', \mathbf r_{t+1}, \mathbf r_{t+2}^{N-1}, S_t=m)}{P(S_t=m)} \\ &= \sum_{m'\in S} P(\mathbf r_{t+2}^{N-1} \mid S_{t+1}=m', \mathbf r_{t+1}, S_t=m)\cdot P(S_{t+1}=m', \mathbf r_{t+1} \mid S_t=m) \\ &= \sum_{m'\in S} P(\mathbf r_{t+2}^{N-1} \mid S_{t+1}=m')\cdot P(S_{t+1}=m', \mathbf r_{t+1} \mid S_t=m) = \sum_{m'\in S} \gamma_{t+1}(m, m')\cdot\beta_{t+1}(m'), \end{aligned} \tag{2.14}$$

where $S$ denotes the set of all states. If the encoding trellis finally converges to the zero state at the end of the block, the initial conditions of $\beta_t$ are

$$\beta_N(0) = 1, \qquad \beta_N(m) = 0 \ \text{for } m \ne 0. \tag{2.15}$$

As Fig. 2.5 shows, the forward metric α and the backward metric β are computed recursively in opposite directions, and both require the branch metrics first. For any existing transition from state $m'$ to state $m$ in a trellis stage, the branch transition probability $\gamma_t(m', m)$ can be derived as

$$\begin{aligned} \gamma_t(m', m) &= P(S_t=m, \mathbf r_t \mid S_{t-1}=m') = \frac{P(S_{t-1}=m', S_t=m, \mathbf r_t)}{P(S_{t-1}=m')} \\ &= \frac{P(S_{t-1}=m', S_t=m)}{P(S_{t-1}=m')}\cdot\frac{P(S_{t-1}=m', S_t=m, \mathbf r_t)}{P(S_{t-1}=m', S_t=m)} \\ &= P(S_t=m \mid S_{t-1}=m')\cdot P(\mathbf r_t \mid S_{t-1}=m', S_t=m) = P(u_t)\cdot P(\mathbf r_t \mid \mathbf v_t), \end{aligned} \tag{2.16}$$

where $P(u_t)$ is the a priori probability of $u_t$ and $\mathbf v_t$ is the codeword associated with the transition from $m'$ to $m$.
In summary, with $\gamma_t(m', m)$ computed by (2.16), we can derive α and β for each state at every time instance. As a result, the joint probability in (2.11) is available for $t = 0, 1, \ldots, N-1$, and the log-likelihood ratio $L(u_t)$ can be calculated by

$$L(u_t) = \log\frac{\sum_{(m',m)\in B_t^{+1}} \alpha_{t-1}(m')\cdot\gamma_t(m',m)\cdot\beta_t(m)}{\sum_{(m',m)\in B_t^{-1}} \alpha_{t-1}(m')\cdot\gamma_t(m',m)\cdot\beta_t(m)}. \tag{2.17}$$
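The recursions (2.12), (2.14), (2.16), and (2.17) can be exercised end to end on a small example. The sketch below makes several assumptions: a 4-state rate-1/2 RSC encoder with feedback $1 + D + D^2$ and feedforward $1 + D^2$ (a common textbook choice; the polynomials of Fig. 2.4 are not given in the text), BPSK mapping bit 0 → +1 so that $L(u_t) > 0$ favors $u_t = +1$, an AWGN channel with noise variance `sigma2`, equiprobable information bits, and an unterminated trellis (uniform final β). The metrics are normalized at each step to avoid underflow, which does not change the LLR ratios; `alpha[t]` is the forward metric before trellis section t, playing the role of $\alpha_{t-1}$ in (2.17).

```python
import math

NUM_STATES = 4  # state = (s1 << 1) | s2, contents of the two delay elements


def rsc_step(state, u):
    """One trellis step of the assumed RSC: returns (next_state, parity bit)."""
    s1, s2 = state >> 1, state & 1
    a = u ^ s1 ^ s2               # feedback 1 + D + D^2
    p = a ^ s2                    # feedforward 1 + D^2
    return (a << 1) | s1, p


def encode(bits):
    state, parity = 0, []
    for u in bits:
        state, p = rsc_step(state, u)
        parity.append(p)
    return parity


def branch(t, m_prev, u, ys, yp, sigma2):
    """gamma_t = P(u_t) * P(r_t | v_t) with P(u_t) = 1/2 (no a priori info)."""
    m_next, p = rsc_step(m_prev, u)
    xs, xp = 1 - 2 * u, 1 - 2 * p            # BPSK: bit 0 -> +1, bit 1 -> -1
    d2 = (ys[t] - xs) ** 2 + (yp[t] - xp) ** 2
    return m_next, 0.5 * math.exp(-d2 / (2.0 * sigma2))


def bcjr_llr(ys, yp, sigma2):
    n = len(ys)
    alpha = [[1.0] + [0.0] * (NUM_STATES - 1)]   # encoder starts in state 0
    for t in range(n):                           # forward recursion (2.12)
        nxt = [0.0] * NUM_STATES
        for m in range(NUM_STATES):
            for u in (0, 1):
                m2, g = branch(t, m, u, ys, yp, sigma2)
                nxt[m2] += alpha[t][m] * g
        total = sum(nxt)
        alpha.append([x / total for x in nxt])
    beta = [None] * n + [[1.0 / NUM_STATES] * NUM_STATES]  # unterminated end
    for t in range(n - 1, -1, -1):               # backward recursion (2.14)
        cur = [0.0] * NUM_STATES
        for m in range(NUM_STATES):
            for u in (0, 1):
                m2, g = branch(t, m, u, ys, yp, sigma2)
                cur[m] += g * beta[t + 1][m2]
        total = sum(cur)
        beta[t] = [x / total for x in cur]
    llr = []
    for t in range(n):                           # combine as in (2.17)
        num = den = 0.0
        for m in range(NUM_STATES):
            for u in (0, 1):
                m2, g = branch(t, m, u, ys, yp, sigma2)
                lam = alpha[t][m] * g * beta[t + 1][m2]
                if u == 0:
                    num += lam
                else:
                    den += lam
        llr.append(math.log(num / den))
    return llr


bits = [1, 0, 1, 1, 0, 0, 1, 0]
par = encode(bits)
ys = [1.0 - 2 * b for b in bits]      # noiseless received systematic symbols
yp = [1.0 - 2 * p for p in par]
llr = bcjr_llr(ys, yp, sigma2=0.5)
print([0 if l > 0 else 1 for l in llr] == bits)   # True
```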
2.2.2 The Log-MAP and MAX-Log-MAP algorithm
The MAP algorithm requires a large memory and a large number of operations involving exponentiations and multiplications, so a hardware realization of the MAP decoder is complex and difficult. The Log-MAP algorithm was proposed to solve this problem. First, we transform the branch metrics defined in the MAP algorithm to the logarithmic domain:

$$\bar\gamma_t(m', m) = \log \gamma_t(m', m). \tag{2.18}$$

Referring to (2.12) and (2.14), the forward path metric $\bar\alpha_t$ can be expressed as

$$\bar\alpha_t(m) = \log\alpha_t(m) = \log\sum_{m'\in S} e^{\bar\alpha_{t-1}(m') + \bar\gamma_t(m', m)}, \tag{2.19}$$

and the backward path metric $\bar\beta_t$ can be expressed as

$$\bar\beta_t(m) = \log\beta_t(m) = \log\sum_{m'\in S} e^{\bar\gamma_{t+1}(m, m') + \bar\beta_{t+1}(m')}. \tag{2.20}$$

Since all computations now work in the logarithmic domain, the initial conditions of the path metrics also change:

$$\begin{aligned} &\bar\alpha_0(0) = 0, \qquad \bar\alpha_0(m) = -\infty \ \text{for } m \ne 0, \\ &\bar\beta_N(0) = 0, \qquad \bar\beta_N(m) = -\infty \ \text{for } m \ne 0. \end{aligned} \tag{2.21}$$

After substituting (2.18), (2.19), and (2.20), the APP information $L(u_t)$ in (2.17) can be rewritten as

$$L(u_t) = \log\frac{\sum_{(m',m)\in B_t^{+1}} e^{\bar\alpha_{t-1}(m') + \bar\gamma_t(m',m) + \bar\beta_t(m)}}{\sum_{(m',m)\in B_t^{-1}} e^{\bar\alpha_{t-1}(m') + \bar\gamma_t(m',m) + \bar\beta_t(m)}}. \tag{2.22}$$
Consider the following Jacobian logarithm [8]:

$$\log(e^{\delta_1} + e^{\delta_2}) \equiv \max{}^{*}(\delta_1, \delta_2) = \max(\delta_1, \delta_2) + \log\left(1 + e^{-|\delta_2 - \delta_1|}\right) = \max(\delta_1, \delta_2) + f_c(|\delta_2 - \delta_1|), \tag{2.23}$$

where $f_c(\cdot)$ is a compensation (correction) function; keeping it makes max* exact, which improves performance compared with taking the plain maximum. By applying (2.23) recursively, the expression $\log(e^{\delta_1} + e^{\delta_2} + \cdots + e^{\delta_n})$ can be computed exactly, as follows:

$$\begin{aligned} \log(e^{\delta_1} + e^{\delta_2} + \cdots + e^{\delta_n}) &= \log(\Delta + e^{\delta_n}), \qquad \Delta = e^{\delta_1} + \cdots + e^{\delta_{n-1}} = e^{\delta} \\ &= \max(\log\Delta, \delta_n) + f_c(|\log\Delta - \delta_n|) \\ &= \max(\delta, \delta_n) + f_c(|\delta - \delta_n|). \end{aligned} \tag{2.24}$$

Now we can use (2.23) to represent the forward metrics in (2.19) and the backward metrics in (2.20) as

$$\bar\alpha_t(m) = \max_{m'\in S}{}^{*}\{\bar\alpha_{t-1}(m') + \bar\gamma_t(m', m)\}, \tag{2.25}$$

and

$$\bar\beta_t(m) = \max_{m'\in S}{}^{*}\{\bar\gamma_{t+1}(m, m') + \bar\beta_{t+1}(m')\}. \tag{2.26}$$

Therefore, (2.22) can be expressed as

$$L(u_t) = \max_{(m',m)\in B_t^{+1}}{}^{*}\{\bar\alpha_{t-1}(m') + \bar\gamma_t(m',m) + \bar\beta_t(m)\} - \max_{(m',m)\in B_t^{-1}}{}^{*}\{\bar\alpha_{t-1}(m') + \bar\gamma_t(m',m) + \bar\beta_t(m)\}. \tag{2.27}$$

The Log-MAP algorithm (2.27) reduces the hardware complexity compared with the MAP algorithm. However, some difficulty for hardware implementation remains, since computing $f_c(\cdot)$ still involves an exponentiation and a logarithm. This problem can be solved by using a look-up table, but this approach may cause a slight bit-error-rate degradation and increases the hardware size.
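The Jacobian logarithm (2.23) and its recursive use in (2.24) are easy to check numerically. The minimal sketch below computes $f_c$ exactly with `math.log1p`; a hardware decoder would typically replace $f_c$ with a small look-up table, as noted above. The function names are illustrative.

```python
import math


def f_c(x):
    """Correction term f_c(x) = log(1 + e^{-x}) for x >= 0."""
    return math.log1p(math.exp(-x))


def max_star(d1, d2):
    """Jacobian logarithm (2.23): log(e^d1 + e^d2)."""
    return max(d1, d2) + f_c(abs(d2 - d1))


def max_star_n(deltas):
    """Recursive form (2.24): fold max* over the list of terms."""
    acc = deltas[0]
    for d in deltas[1:]:
        acc = max_star(acc, d)
    return acc


deltas = [1.0, 2.0, 3.0]
exact = math.log(sum(math.exp(d) for d in deltas))
print(abs(max_star_n(deltas) - exact) < 1e-12)  # True
```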
To further reduce the complexity, consider the approximation

$$\log(e^{\delta_1} + e^{\delta_2} + \cdots + e^{\delta_n}) \approx \max_{i\in\{1,2,\ldots,n\}} \delta_i. \tag{2.28}$$

Because this approximation is used to reduce the complexity of the MAP algorithm, the resulting Max-Log-MAP algorithm is sub-optimal. Note that the term $f_c(\cdot)$ is ignored in comparison with (2.24). We can then simplify (2.22) as follows:

$$L(u_t) = \max_{(m',m)\in B_t^{+1}}\{\bar\alpha_{t-1}(m') + \bar\gamma_t(m',m) + \bar\beta_t(m)\} - \max_{(m',m)\in B_t^{-1}}\{\bar\alpha_{t-1}(m') + \bar\gamma_t(m',m) + \bar\beta_t(m)\}. \tag{2.29}$$

Therefore, compared with the MAP algorithm, the Max-Log-MAP algorithm replaces multiplications with additions and avoids the complicated exponentiations. However, its performance degrades because of the information lost in the approximation (2.28).
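The cost of dropping $f_c$ in (2.28) is bounded: for $n$ terms, $\max_i \delta_i \le \log\sum_i e^{\delta_i} \le \max_i \delta_i + \log n$, with the largest error when all terms are equal. The sketch below (illustrative names, not the thesis's hardware) compares the exact value with the Max-Log approximation.

```python
import math


def log_sum_exp(deltas):
    """Exact log(sum_i e^{delta_i}), computed stably around the maximum."""
    m = max(deltas)
    return m + math.log(sum(math.exp(d - m) for d in deltas))


def max_log(deltas):
    """Max-Log approximation (2.28): keep only the largest term."""
    return max(deltas)


for deltas in ([0.0, 0.0, 0.0, 0.0], [5.0, 0.0, -1.0, -3.0]):
    err = log_sum_exp(deltas) - max_log(deltas)
    print(deltas, round(err, 4), "<= log n =", round(math.log(len(deltas)), 4))
```

The error is small when one path metric dominates, which is why Max-Log-MAP usually loses only a fraction of a dB.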
2.3 Decoding with reciprocal dual trellis
To meet the growing demand for high data rates with high bandwidth and power efficiencies, research has focused on high-rate codes and decoding algorithms that offer strong correction ability at reasonable complexity. However, for a high-rate $k/n$ convolutional code ($n-k < k$), the branch computation of the conventional MAP algorithm on the trellis constructed from the encoder polynomials is highly complex, because each state has $2^k$ branches (radix-$2^k$). In this case, running the MAP algorithm on the trellis of the corresponding reciprocal dual code requires only radix-$2^{n-k}$, since the reciprocal dual code has fewer codewords than the original code.

The decoding trellis in Fig. 2.5 can be regarded as a representation of a linear block code: every path is a codeword that the RSC encoder can generate, and any codeword can be obtained as a linear combination of other codewords. For example, for an ($n = 3$, $k = 2$) encoder with 2 registers, the decoding trellises using the original and the reciprocal dual trellis are illustrated in Fig. 2.6. Each state in the reciprocal dual trellis is connected to 2 branches, whereas each state in the original trellis is connected to 4 branches. Thus, if the interleaver length is fixed and $k$ grows, decoding on the original trellis becomes increasingly difficult because of its complexity.
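The branch counts quoted above follow from the radix of each trellis: each state has $2^k$ branches in the original trellis of a rate $k/n$ code and $2^{n-k}$ in the reciprocal dual trellis. The short sketch below tabulates both for the code rates considered in this thesis; it counts branches per state only and, as a simplifying assumption for illustration, ignores the number of states.

```python
def branches_per_state(k, n):
    """(original trellis, reciprocal dual trellis) branches per state
    for a rate k/n code: radix-2^k versus radix-2^(n-k)."""
    return 2 ** k, 2 ** (n - k)


for k, n in [(1, 3), (1, 2), (2, 3), (4, 5)]:
    orig, dual = branches_per_state(k, n)
    print(f"rate {k}/{n}: original {orig}, reciprocal dual {dual}")
```

The output shows the trade-off directly: the dual trellis only pays off when $n - k < k$, that is, for rates above 1/2, and at rate 4/5 it needs 2 branches per state instead of 16.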
In [7], a new MAP decoding algorithm for high code rate convolutional codes using the reciprocal dual convolutional code is presented. The advantage of this approach is a reduction of the computational complexity, since fewer codewords need to be evaluated for code rates higher than 1/2. According to [7], the log-likelihood ratio of the a posteriori probability $L(u_l)$ can alternatively be calculated from the reciprocal dual codewords $\tilde c^\perp_i$, $1 \le i \le 2^{N-K}$, that is:

$$L(u_l) = L(c_l; y_l) + \log\frac{\sum_{\tilde c^\perp_i\in\tilde C^\perp}\prod_{j=0,\,j\ne l}^{N-1}\tanh\left(L(c_j;y_j)/2\right)^{\tilde c^\perp_{ij}}}{\sum_{\tilde c^\perp_i\in\tilde C^\perp}(-1)^{\tilde c^\perp_{il}}\prod_{j=0,\,j\ne l}^{N-1}\tanh\left(L(c_j;y_j)/2\right)^{\tilde c^\perp_{ij}}}. \tag{2.30}$$

Note that the rightmost term of (2.30) is the extrinsic log-likelihood ratio $L_e(u_l)$:

$$L_e(u_l) = \log\frac{\sum_{\tilde c^\perp_i\in\tilde C^\perp}\prod_{j=0,\,j\ne l}^{N-1}\tanh\left(L(c_j;y_j)/2\right)^{\tilde c^\perp_{ij}}}{\sum_{\tilde c^\perp_i\in\tilde C^\perp}(-1)^{\tilde c^\perp_{il}}\prod_{j=0,\,j\ne l}^{N-1}\tanh\left(L(c_j;y_j)/2\right)^{\tilde c^\perp_{ij}}}. \tag{2.31}$$

Figure 2.6: Original trellis and reciprocal dual trellis comparison. (Figure: original decoding trellis and reciprocal dual trellis over encoder states 00, 01, 10, 11.)
where $\mathbf c = (c_0, c_1, \ldots, c_{N-1})$ is a codeword of a systematic block code $C$, $\tilde{\mathbf c}^\perp = (\tilde c^\perp_0, \tilde c^\perp_1, \ldots, \tilde c^\perp_{N-1})$ is a codeword of the reciprocal dual code $\tilde C^\perp$ of $C$, and $y_l$ is the matched filter output associated with $c_l$. The term $L(c_j; y_j)$ is

$$L(c_j;y_j) = \begin{cases} L(y_j\mid c_j) + L(c_j), & \text{if } c_j \text{ is an information bit} \\ L(y_j\mid c_j), & \text{if } c_j \text{ is a parity check bit.}\end{cases} \tag{2.32}$$

Over an additive white Gaussian noise (AWGN) channel with variance $\sigma^2 = N_0/(2E_s)$, the term $L(y_j\mid c_j)$ can be written as

$$L(y_j\mid c_j) = 4\frac{E_s}{N_0}\cdot y_j, \tag{2.33}$$

where $E_s/N_0$ is the signal-to-noise ratio. Finally, $L(c_j)$ is the LLR of the a priori probability:

$$L(c_j) = \log\frac{P(c_j = 0)}{P(c_j = 1)}. \tag{2.34}$$
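The identity behind (2.30) and (2.31) can be checked numerically on a small block code, where the reciprocal dual code reduces to the ordinary dual code. The minimal sketch below assumes the systematic (7,4) Hamming code (an example chosen here, not taken from [7]), uses the convention $L = \log P(c=0)/P(c=1)$ of (2.34) with $g_j = \tanh(L_j/2)$, and compares the extrinsic value computed from the $2^{N-K} = 8$ dual codewords with a brute-force sum over the 16 codewords of $C$.

```python
import itertools
import math

# Systematic (7,4) Hamming code: G = [I4 | P], H = [P^T | I3]
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + P[i] for i in range(4)]
H = [[P[i][r] for i in range(4)] + [int(r == j) for j in range(3)] for r in range(3)]


def span(rows):
    """All GF(2) linear combinations of the given rows."""
    n = len(rows[0])
    words = []
    for coeffs in itertools.product((0, 1), repeat=len(rows)):
        words.append([sum(c * r[j] for c, r in zip(coeffs, rows)) % 2 for j in range(n)])
    return words


CODE, DUAL = span(G), span(H)


def extrinsic_dual(L, l):
    """Extrinsic LLR of bit l from the dual codewords, as in (2.31)."""
    g = [math.tanh(x / 2) for x in L]
    num = den = 0.0
    for c in DUAL:
        prod = math.prod(g[j] ** c[j] for j in range(len(L)) if j != l)
        num += prod
        den += (-1) ** c[l] * prod
    return math.log(num / den)


def extrinsic_primal(L, l):
    """Same quantity by brute force over the codewords of C."""
    p0 = [1 / (1 + math.exp(-x)) for x in L]     # P(c_j = 0)
    sums = [0.0, 0.0]
    for c in CODE:
        sums[c[l]] += math.prod(p0[j] if c[j] == 0 else 1 - p0[j]
                                for j in range(len(L)) if j != l)
    return math.log(sums[0] / sums[1])


L = [0.8, -1.5, 2.1, 0.3, -0.7, 1.2, -2.4]   # arbitrary example LLRs
print(all(abs(extrinsic_dual(L, l) - extrinsic_primal(L, l)) < 1e-9 for l in range(7)))
```

For a high-rate code the dual sum runs over far fewer codewords than the primal sum, which is the same saving the reciprocal dual trellis exploits.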
2.3.1 Construct reciprocal dual trellis
In this section, we introduce the reciprocal dual trellis from some basic algebraic properties of convolutional codes. A rate $R = k/n$ convolutional encoder over the field $F = GF(2)$ generates the codeword $v_t = (v_t^0, \ldots, v_t^{n-1}) \in F^n$ at time $t$, given the information bits $u_t = (u_t^0, \ldots, u_t^{k-1})$. The sequences of $u_t$ and $v_t$ can be written as

$$u(D) = \sum_{t=0}^{\infty} u_t D^t, \qquad v(D) = \sum_{t=0}^{\infty} v_t D^t,$$

and the encoder realizes the mapping through the polynomial generator matrix $G(D)$ such that $v(D) = u(D)G(D)$.
The dual convolutional code $C^\perp$ of a convolutional code $C$ is the $(n-k)$-dimensional code consisting of all code sequences $v^\perp(D)$ orthogonal to every $v(D) \in C$. Hence, $C^\perp$ is an $(n, n-k)$ convolutional code generated by a matrix $H(D)$ with the property $G(D)H^T(D) = 0$.
Given a code $C$, the reciprocal convolutional code $\tilde C$ can be obtained by substituting $D^{-1}$ for $D$ in $G(D)$ and multiplying the $j$-th row by $D^{d(j)}$, where $1 \le j \le k$ and $d(j)$ is the degree of the $j$-th row of $G(D)$. As a result, each $\tilde v(D) \in \tilde C$ equals the time-reversed sequence $v(D^{-1})$.
We summarize the steps to construct the reciprocal dual trellis of a convolutional code whose encoder is given by $G(D)$:
1. Transform $G(D)$ into an equivalent systematic encoder $G_{sys}(D)$ if $G(D)$ is not systematic.
2. Apply the property $G(D)H^T(D) = 0$ to find the corresponding parity check matrix $H(D)$.
3. Calculate the reciprocal polynomial of $H(D)$, and denote it as $\tilde H(D)$.
4. Construct the reciprocal dual trellis of $G(D)$ by using $\tilde H(D)$.
Here, we show an example. A rate 2/3 convolutional code $C$ is described by the nonsystematic polynomial generator matrix

$$G(D) = \begin{pmatrix} 1+D & D & 1+D \\ D & 1 & 1 \end{pmatrix}$$

and is equivalently generated by the systematic generator matrix

$$G_{sys}(D) = \begin{pmatrix} 1 & 0 & \dfrac{1}{1+D+D^2} \\[4pt] 0 & 1 & \dfrac{1+D^2}{1+D+D^2} \end{pmatrix}.$$

Then the rate 1/3 dual code $C^\perp$ is generated by

$$H(D) = \begin{pmatrix} 1 & 1+D^2 & 1+D+D^2 \end{pmatrix}.$$

Note that $H(D)$ is the parity check matrix of $G(D)$, with the property $G(D)H^T(D) = 0$. Hence, the reciprocal dual code $\tilde C^\perp$ is generated by $\tilde H(D)$:

$$\tilde H(D) = D^2\cdot H(D^{-1}) = \begin{pmatrix} D^2 & 1+D^2 & 1+D+D^2 \end{pmatrix}.$$
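Steps 2 to 4 for this example can be verified with GF(2) polynomial arithmetic. In the sketch below, a polynomial is stored as an integer whose bit i is the coefficient of $D^i$; the helper names are illustrative. It checks $G(D)H^T(D) = 0$, forms $\tilde H(D) = D^2 \cdot H(D^{-1})$ by reversing the coefficients up to degree 2, and checks that the reciprocal of each row of $G(D)$ (substitute $D \to D^{-1}$, multiply by $D^{d(j)}$) is orthogonal to $\tilde H(D)$.

```python
def gf2_mul(a, b):
    """Multiply two GF(2)[D] polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def dot(row, h):
    """Inner product over GF(2)[D]: sum_i row[i] * h[i]."""
    acc = 0
    for g, p in zip(row, h):
        acc ^= gf2_mul(g, p)
    return acc


def reciprocal(row):
    """Substitute D -> D^-1 and multiply by D^d, d = max degree in the row."""
    d = max(p.bit_length() for p in row) - 1
    return [sum(((p >> i) & 1) << (d - i) for i in range(d + 1)) for p in row]


G = [[0b11, 0b10, 0b11],   # (1+D, D, 1+D)
     [0b10, 0b01, 0b01]]   # (D, 1, 1)
H = [0b001, 0b101, 0b111]  # (1, 1+D^2, 1+D+D^2)

H_rec = reciprocal(H)
print([bin(p) for p in H_rec])                             # ['0b100', '0b101', '0b111']
print(all(dot(row, H) == 0 for row in G))                  # True
print(all(dot(reciprocal(row), H_rec) == 0 for row in G))  # True
```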
2.3.2 Decoding based on reciprocal dual trellis structure
In [7], (2.31) is expressed in terms of the encoder state transitions. First, we define two sets $S_A(s)$ and $S_B(s)$ to describe the possible transitions into and out of a state $s$ within one trellis stage. $S_A(s)$ contains the states $s_i$ for which the transition $s_i \to s$ exists, and $S_B(s)$ is the set of destination states $s_j$ reachable from state $s$ ($s \to s_j$). Therefore, $S_A$ and $S_B$ serve the forward ($\alpha$) and backward ($\beta$) recursions, respectively. Here, we use the $\alpha$, $\beta$, and $\gamma$ parameters to represent the forward and backward recursions.
Moreover, the bits associated with the transition $s_1 \to s_2$ are combined in the $n$-tuple $(b_0(s_1, s_2), \ldots, b_{n-1}(s_1, s_2))$. Using the substitution $g_j = \tanh(L(c_j; y_j)/2)$ at trellis stage $t$, we define the partial products

$$\gamma_t(s_1, s_2) = \prod_{j=0}^{n-1} g_{t\cdot n + j}^{\,b_j(s_1, s_2)}. \tag{2.35}$$

The forward recursion is

$$\alpha_{t+1}(s) = \sum_{s'\in S_A(s)} \alpha_t(s')\cdot\gamma_t(s', s), \quad 0 \le t < N-1, \tag{2.36}$$

and the backward recursion is

$$\beta_{t-1}(s) = \sum_{s'\in S_B(s)} \beta_t(s')\cdot\gamma_{t-1}(s, s'), \quad 2 \le t \le N. \tag{2.37}$$

If we directly truncate the trellis at the end of the received codeword, the boundary conditions are $\alpha_0(s) = 1$ for $0 \le s < 2^v$, and for β [...], where $v$ is the number of registers in one RSC encoder and $2^v$ is the number of encoder states. Thus (2.31) can be rewritten as (2.39), using the special products

$$\tilde\gamma_t(l, s_1, s_2) = \prod_{j=0,\, j\ne l-t\cdot n}^{n-1} g_{t\cdot n + j}^{\,b_j(s_1, s_2)}, \tag{2.38}$$

where the time instant $t = \lfloor l/n \rfloor$ depends on the index $l$:

$$L_e(u_l) = \log\frac{\sum_{s_1=0}^{2^v-1}\sum_{s_2\in S_B(s_1)} \alpha_t(s_1)\cdot\tilde\gamma_t(l,s_1,s_2)\cdot\beta_{t+1}(s_2)}{\sum_{s_1=0}^{2^v-1}\sum_{s_2\in S_B(s_1)} \alpha_t(s_1)\cdot(-1)^{b_{l-t\cdot n}(s_1,s_2)}\cdot\tilde\gamma_t(l,s_1,s_2)\cdot\beta_{t+1}(s_2)}. \tag{2.39}$$

Finally, the LLR of the a posteriori probability (APP) can be obtained from (2.33), (2.34), and (2.39):

$$L(u_l) = L(c_l) + 4\frac{E_s}{N_0}\cdot y_l + L_e(u_l). \tag{2.40}$$
2.4 Sliding window method of turbo code
In the traditional SISO decoding algorithm, computing the LLR of the APP requires the path metric values generated by the forward and backward recursions. Furthermore, since the backward recursion starts from the end of the decoding trellis, as shown in Fig. 2.5, decoding can start only after the entire block has been received. If the received sequence is long, this leads to long output latency and a huge memory requirement for hardware implementation. For example, the maximum block length of the 3GPP standard is 5114, which means that 5114 LLR values and their path metrics must be stored. This is the main disadvantage of turbo codes in real applications.
The main problem is that a long block cannot simply be divided into several short sub-blocks, since the unknown initial conditions of the backward recursions would damage the performance of turbo codes. Therefore, the sliding window approach was proposed [9] to overcome this problem. This algorithm exploits the fact that the backward metrics become highly reliable even without a known initial condition if the backward recursion runs long enough. Fig. 2.7 shows the process of the sliding window algorithm, which is explained as follows. First, the received codeword sequence is divided into several sub-blocks of length $W$. $W$ is called the convergence length and is normally set to five times the constraint length of the component encoder of the turbo code to ensure reliable initialization.

Figure 2.7: The process diagram of sliding window algorithm. (Figure: sub-blocks i, i+1, i+2, i+3 of length W along the time axis t1 to t4, with the forward recursion α, the dummy backward recursion β1, the true backward recursion β2, and the output L(u_t) for each sub-block.)

In the sliding window approach, the end of one sub-block is the starting point of the next sub-block for both the forward and the backward recursions. Thus, the initial metric values are inherited from the last metrics calculated in the previous sub-block. Note that the dummy backward recursion $\beta_1$ is employed to establish the initial condition for the true backward recursion $\beta_2$. Since the initial condition for $\beta_1$ is unknown except in the last sub-block, we use equally likely values for $\beta_1$ at time instance $(i+1)\cdot W$:
β1(m) = 1
M, for all m∈ S (2.41)
where S is the set of all possible states and M is the total number of states. While the forward recursion α proceeds in the i-th sub-block and stores its values in memory, the dummy backward recursion β1 is performed concurrently in the (i + 1)-th sub-block. As soon as the β1 computation is finished, the initial metrics of the i-th sub-block are available for the β2 recursion. L(ût) can then be calculated from the α metrics in memory, the β2 metrics being computed, and the corresponding branch metrics of the i-th sub-block.
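The effect of the dummy recursion can be checked numerically. The Python sketch below uses a toy 4-state trellis (next state = ((s << 1) | u) & 3) with random positive branch weights in the probability domain. This is an assumption made only to keep the example short; it is not the WCDMA trellis. The window length W = 48 is also deliberately generous so that the check can use a tight tolerance. For each window, the sketch starts a dummy recursion at stage t + W from the equally likely values 1/M of (2.41). It compares the β value handed over at stage t with the β metrics of a full-block backward recursion on a terminated block.

```python
import random

M = 4  # toy trellis with 4 states; next_state = ((s << 1) | u) & 3


def toy_gammas(T, rng):
    """Random positive branch weights gamma_t[s][s2], non-zero only on valid transitions."""
    gammas = []
    for _ in range(T):
        g = [[0.0] * M for _ in range(M)]
        for s in range(M):
            for u in (0, 1):
                g[s][((s << 1) | u) & 3] = rng.uniform(0.5, 1.0)
        gammas.append(g)
    return gammas


def backward(gammas, beta_end):
    """beta_t(s) = sum_s2 gamma_t[s][s2] * beta_{t+1}(s2), normalized to sum 1 at every stage."""
    betas = [beta_end]
    for g in reversed(gammas):
        b = betas[-1]
        new = [sum(g[s][s2] * b[s2] for s2 in range(M)) for s in range(M)]
        total = sum(new)
        betas.append([x / total for x in new])
    betas.reverse()
    return betas  # betas[t] belongs to stage t; betas[-1] is beta_end


rng = random.Random(2009)
T, W = 400, 48
gammas = toy_gammas(T, rng)
full = backward(gammas, [1.0, 0.0, 0.0, 0.0])  # terminated block: ends in state 0


def windowed_beta(t):
    """beta at stage t from a dummy recursion started at stage t + W with beta1(m) = 1/M."""
    return backward(gammas[t:t + W], [1.0 / M] * M)[0]


err = max(abs(a - b)
          for t in range(0, T - W, W)
          for a, b in zip(full[t], windowed_beta(t)))
print(f"max |beta_full - beta_window| = {err:.2e}")
```

With a shorter W the reported error grows. That is the trade-off behind choosing the convergence length as a multiple of the constraint length.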
For example, we concatenate two truncated component codes defined by

$$G(D) = \begin{pmatrix} 1 & 0 & \dfrac{1}{1+D+D^2} \\ 0 & 1 & \dfrac{1+D^2}{1+D+D^2} \end{pmatrix}$$

and use a block interleaver of length 400 to obtain an (800, 400) code of rate 1/2. The bit error rate (BER) after six iterations for different sliding window lengths is shown in Fig. 2.8 (SW25 denotes a window length of 25 trellis stages and SW200 denotes no sliding window). Reciprocal dual trellis decoding with a sliding window requires significantly less memory than its full-block counterpart, yet it achieves virtually the same bit error performance.

Figure 2.8: BER versus Eb/N0 (dB) for SW25 and SW200.
Chapter 3
Reciprocal Dual Trellis Algorithm
3G mobile multimedia communication systems based on WCDMA provide a flexible transmission capability that supports packet data services as well as voice services. Mobile multimedia communication requires a coding scheme that can accommodate various code rate requirements. In this thesis, we apply puncturing to obtain various code rates, adopt the WCDMA turbo code as the mother code, and use the reciprocal dual trellis as the decoding trellis.
3.1 Log domain approach
For high code rates, the MAP algorithm operating on the reciprocal dual trellis is preferable. However, (2.39) requires both adders and multipliers, which results in high hardware complexity. We therefore reduce the hardware cost by taking the logarithm of all metrics, similar to the Log-MAP method. The challenge in a log-domain implementation of this algorithm is that the bit metric value tanh(L(c_j; y_j)/2) can be negative, so its logarithm is not defined. To represent such metrics, a numerical transformation is applied.
3.1.1 Sign magnitude scheme
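In [10], a sign-magnitude format is used as a simple representation of the reciprocal dual trellis metrics. In this format, a real number x is represented as $x = X_S e^{-X_M} \equiv [X_S, X_M]$, where $X_S = \operatorname{sign}(x)$ and $X_M = -\log|x|$. The arithmetic operations in this representation are:

Negation:
$$-x \equiv [-X_S, X_M] \qquad (3.1)$$

Addition:
$$x + y \equiv \min{}^*(x, y) = \Big[S\big(\min(X_M, Y_M)\big),\ \min(X_M, Y_M) - \log\big(1 + X_S Y_S e^{-|X_M - Y_M|}\big)\Big] \qquad (3.2)$$
where $S(X_M) = X_S$ and $S(Y_M) = Y_S$, i.e., the sum takes the sign of the operand with the larger magnitude.

Multiplication:
$$x \times y \equiv \mathrm{Sum}^*(x, y) = [X_S Y_S,\ X_M + Y_M] \qquad (3.3)$$

Division:
$$x / y \equiv [X_S Y_S,\ X_M - Y_M] \qquad (3.4)$$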
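A floating-point model of (3.1) to (3.4) is useful as a reference when checking a fixed-point design. The Python sketch below is an illustration, not the hardware described later. The class name SM and its methods are hypothetical. Zero is represented by an infinite magnitude-log. The log(1 + ·) term, which hardware would read from a LUT, is computed exactly here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SM:
    """Sign-magnitude log-domain number: x = s * exp(-m), with s in {+1, -1} and m = -log|x|."""
    s: int
    m: float

    @staticmethod
    def from_real(x: float) -> "SM":
        if x == 0:
            return SM(1, math.inf)        # zero: infinite magnitude-log
        return SM(1 if x > 0 else -1, -math.log(abs(x)))

    def to_real(self) -> float:
        return self.s * math.exp(-self.m)

    def __neg__(self) -> "SM":            # negation, (3.1)
        return SM(-self.s, self.m)

    def __add__(self, other: "SM") -> "SM":   # addition via min*, (3.2)
        big, small = (self, other) if self.m <= other.m else (other, self)
        if math.isinf(small.m):           # adding zero
            return big
        r = 1 + big.s * small.s * math.exp(-(small.m - big.m))
        if r <= 0:                        # exact cancellation: x + (-x) = 0
            return SM(1, math.inf)
        return SM(big.s, big.m - math.log(r))

    def __mul__(self, other: "SM") -> "SM":   # multiplication via Sum*, (3.3)
        return SM(self.s * other.s, self.m + other.m)

    def __truediv__(self, other: "SM") -> "SM":   # division, (3.4)
        return SM(self.s * other.s, self.m - other.m)


x, y = SM.from_real(0.7), SM.from_real(-1.25)
print((x + y).to_real(), (x * y).to_real(), (x / y).to_real(), (-x).to_real())
```

Multiplication and division reduce to a sign product and an addition or subtraction of magnitudes. Only addition needs the correction term. This asymmetry makes the representation attractive for the product-heavy dual trellis metrics.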
In the conventional Log-MAP algorithm, addition is implemented by the equivalent E operation. For two non-negative real numbers a and b, represented in the log domain by $A = -\log a$ and $B = -\log b$, we have

$$a + b \equiv A \,E\, B = \min(A, B) - \log\big(1 + e^{-|A-B|}\big) \qquad (3.5)$$

The second term, $-\log(1 + e^{-|d|})$, where d is the difference between the two log-domain values, is called the correction term. Another log-domain operator gives the absolute difference between two non-negative real numbers:

$$|a - b| \equiv A \,\bar{E}\, B = \min(A, B) - \log\big(1 - e^{-|A-B|}\big) \qquad (3.6)$$

and its second term is likewise called the correction term. In the sign-magnitude representation, the algebraic addition of two real numbers involves either an E or an $\bar{E}$ operation, depending on the relative signs of the two values. The correction term of the E operation only takes values in the range $[-\log 2, 0)$, whereas the correction term of the $\bar{E}$ operation, $-\log(1 - e^{-|d|})$, shown in Fig. 3.1, can take any value in $(0, \infty)$ and diverges as $|d| \to 0$. This makes the fixed-point hardware implementation more complex, since the representation range must be chosen carefully, and it requires a large look-up table (LUT) for the correction terms.
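The contrast between the two correction terms can be checked directly. In this Python sketch (function names hypothetical), corr_e and corr_ebar evaluate the correction terms of (3.5) and (3.6) in floating point. The functions e_op and ebar_op rebuild a + b and |a − b| from the log-domain values A = −log a and B = −log b. The calls math.log1p and math.expm1 are used only for numerical accuracy near zero.

```python
import math


def corr_e(d):
    """Correction term of the E operation, -log(1 + e^-|d|); lies in [-log 2, 0)."""
    return -math.log1p(math.exp(-abs(d)))


def corr_ebar(d):
    """Correction term of the E-bar operation, -log(1 - e^-|d|); undefined at d = 0."""
    return -math.log(-math.expm1(-abs(d)))


def e_op(A, B):
    """Log-domain value of a + b, where A = -log a and B = -log b, (3.5)."""
    return min(A, B) + corr_e(A - B)


def ebar_op(A, B):
    """Log-domain value of |a - b|, where A = -log a and B = -log b, (3.6)."""
    return min(A, B) + corr_ebar(A - B)


for d in (0.01, 0.1, 1.0, 4.0):
    print(f"|d| = {d:<5} E term {corr_e(d):+.4f}   E-bar term {corr_ebar(d):+.4f}")
```

The E term stays within a narrow band, while the Ē term grows without bound as |d| shrinks. At |d| = 0 the Ē term is undefined, because a = b makes the difference zero. This unbounded range is why the Ē table needs the careful range and segmentation choices discussed in this section.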
3.1.2 Proposed equation for calculating extrinsic value
In this section, we discuss the computation of the extrinsic value in the log domain. Based on (2.31), we define the bit metric $g_j = \tanh(L(c_j; y_j)/2)$ and the summation metric $U_l^b$:

$$U_l^b = \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = b} \ \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} \qquad (3.7)$$

Figure 3.1: Correction term of Ē operation, −ln(1 − exp(−|d|)) versus |d|.
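• The forward recursion metric α:
$$\alpha_t(s_t) = \min{}^*_{S}\Big(\mathrm{Sum}^*\big(\alpha_{t-1}(s_{t-1}),\ \gamma_t(s_{t-1}, s_t)\big)\Big)$$
• The backward recursion metric β:
$$\beta_{t-1}(s_{t-1}) = \min{}^*_{S}\Big(\mathrm{Sum}^*\big(\gamma_t(s_{t-1}, s_t),\ \beta_t(s_t)\big)\Big)$$
• The branch metric γ:
$$\gamma_t(s_{t-1}, s_t) = \mathrm{Sum}^*\Big(g_j^{\,b_j(s_{t-1}, s_t)}, \ldots, g_{j+n-1}^{\,b_{j+n-1}(s_{t-1}, s_t)}\Big)$$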
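where $t = \lfloor j/n \rfloor$ is the time index ($\lfloor x \rfloor$ denotes the integer part of x), $b_j(s_{t-1}, s_t)$ is the transition bit from $s_{t-1}$ to $s_t$ at decoding index j, and S is the set of possible state transitions. With these definitions, the denominator of (2.31) is

$$\frac{1}{g_l^{0}} \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = 0} \ \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} + \frac{1}{g_l^{1}} \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = 1} (-1)^{1} \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} = U_l^0 - \frac{U_l^1}{g_l}$$

and the numerator of (2.31) can be written as

$$\frac{1}{g_l^{0}} \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = 0} \ \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} + \frac{1}{g_l^{1}} \sum_{\tilde{c}_i^{\perp} \in \tilde{C}^{\perp};\ \tilde{c}_{il}^{\perp} = 1} \ \prod_{j=0}^{N-1} g_j^{\tilde{c}_{ij}^{\perp}} = U_l^0 + \frac{U_l^1}{g_l}$$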
Therefore, (2.31) can be simplified to

$$L(\tilde{u}_l) = \log \frac{1 + \dfrac{U_l^1}{U_l^0 g_l}}{1 - \dfrac{U_l^1}{U_l^0 g_l}} \qquad (3.8)$$

Equation (3.8) is also described in [10]. In the sign-magnitude representation, $U_l^1 / (U_l^0 g_l)$ can be written as $[U_S^l, U_M^l]$. Hence, (3.8) becomes

$$L(\tilde{u}_l) = \log \frac{1 + U_S^l e^{-U_M^l}}{1 - U_S^l e^{-U_M^l}} \qquad (3.9)$$

Using the relation $\tanh(x/2) = (1 - e^{-x})/(1 + e^{-x})$, (3.9) can also be written as

$$L(\tilde{u}_l) = \begin{cases} -\log\big|\tanh(U_M^l/2)\big|, & \text{if } U_S^l = 1 \\ \log\big|\tanh(U_M^l/2)\big|, & \text{if } U_S^l = -1 \end{cases} \qquad (3.10)$$

Equation (3.10) is the proposed equation for calculating extrinsic values. It depends on $U_M^l$ only through the single function $\log|\tanh(U_M^l/2)|$, whose sign is selected by $U_S^l$, which makes it well suited to hardware design. According to (2.40), the LLR of the a posteriori probability, $L(u_l)$, is written as
$$L(u_l) = \log\frac{p_l(0)}{p_l(1)} + C \cdot r_s + L(\tilde{u}_l) \qquad (3.11)$$

where C is a constant and $r_s$ is the received symbol. The decision rule is then

$$\text{decision} = \begin{cases} 1, & \text{if } L(u_l) < 0 \\ 0, & \text{if } L(u_l) \ge 0 \end{cases} \qquad (3.12)$$

For convenience, in the following sections we refer to the sign-magnitude algorithm as SM and to the reciprocal dual trellis algorithm as Dual-MAP.
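The following Python sketch puts the pieces of this section together under simplifying assumptions. Tuples (s, m) stand for [X_S, X_M], values are floating point rather than fixed point, and the helper names and toy branch weights are hypothetical. The function alpha_recursion applies the forward recursion defined above to branch metrics that may be negative, as reciprocal dual trellis metrics can be. The function extrinsic_from_ratio evaluates (3.10) from the sign-magnitude form of U_l^1/(U_l^0 g_l), and decide implements (3.12).

```python
import math


def to_sm(x):
    """Real number -> (sign, magnitude-log); zero maps to an infinite magnitude-log."""
    return (1 if x >= 0 else -1, -math.log(abs(x)) if x != 0 else math.inf)


def from_sm(v):
    return v[0] * math.exp(-v[1])


def sm_mul(a, b):                       # Sum*, (3.3)
    return (a[0] * b[0], a[1] + b[1])


def sm_div(a, b):                       # (3.4)
    return (a[0] * b[0], a[1] - b[1])


def sm_add(a, b):                       # min*, (3.2)
    (sa, ma), (sb, mb) = (a, b) if a[1] <= b[1] else (b, a)
    if math.isinf(mb):                  # adding zero
        return (sa, ma)
    r = 1 + sa * sb * math.exp(-(mb - ma))
    return (sa, ma - math.log(r)) if r > 0 else (1, math.inf)


def alpha_recursion(gammas, alpha0):
    """alpha_t(s2) = min*_{s1} Sum*(alpha_{t-1}(s1), gamma_t(s1, s2)); gamma entries are SM pairs."""
    alphas = [alpha0]
    for g in gammas:
        prev = alphas[-1]
        nxt = []
        for s2 in range(len(prev)):
            acc = (1, math.inf)         # SM form of 0
            for s1 in range(len(prev)):
                acc = sm_add(acc, sm_mul(prev[s1], g[s1][s2]))
            nxt.append(acc)
        alphas.append(nxt)
    return alphas


def extrinsic_from_ratio(v):
    """(3.10): v = [U_S, U_M] is the SM form of U_l^1 / (U_l^0 * g_l)."""
    s, m = v
    t = math.log(abs(math.tanh(m / 2)))
    return -t if s == 1 else t


def decide(llr):                        # (3.12)
    return 1 if llr < 0 else 0


# Usage on a 2-state toy trellis with signed branch weights (hypothetical values).
gam = [[[0.6, -0.3], [-0.8, 0.4]], [[-0.5, 0.7], [0.2, -0.9]]]
sm_alphas = alpha_recursion([[[to_sm(w) for w in row] for row in g] for g in gam],
                            [to_sm(1.0), to_sm(0.0)])
print([[round(from_sm(v), 6) for v in a] for a in sm_alphas])
ratio = sm_div(to_sm(-0.12), sm_mul(to_sm(0.9), to_sm(0.6)))   # U1 / (U0 * g)
ext = extrinsic_from_ratio(ratio)
print(round(ext, 6), decide(ext))
```

The α values converted back to real numbers match an ordinary sum-of-products recursion, negative entries included. This is the property that the plain Log-MAP representation lacks.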
3.2 Punctured convolutional codes
In this section, we generate high-rate encoder polynomials. A common way to generate a high-rate convolutional code is to use a low-rate (1/2) encoder and delete some of its parity check bits; this is called puncturing. Codeword bits are punctured according to a deterministic puncture pattern, or puncture matrix, and different puncture patterns can yield slightly different bit error rates. The puncture procedure is shown in Fig. 3.2. Assume that a source sequence (X0, X1, ..., X8, ..., XK−1) is fed to an encoder of code rate 1/2 and encoded into the codeword sequence (X0, B0, X1, B1, ..., X8, B8, ..., XK−1, BK−1). Before modulation, some bits of the codeword are removed according to the puncture pattern. In Fig. 3.2, the puncture pattern is defined as

$$P = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

where the number of columns of the matrix is the puncture period, an element 1 means that the corresponding codeword bit is sent, and an element 0 means that the corresponding bit is punctured (stolen).

Figure 3.2: Puncture procedure (source data, encoded data, punctured codeword, and sent codeword).

This puncture procedure therefore increases the code rate from 1/2 to 4/5 and produces another codeword sequence (X0, X1, X2, X3, B3, ..., X8, ..., XK−3, XK−2, XK−1, BK−1).
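The puncturing rule fits in a few lines of Python. The sketch below (function and variable names hypothetical) applies the puncture matrix P of Fig. 3.2 periodically to a systematic stream X and a parity stream B. It sends the surviving bits in time order and reproduces the sent sequence X0, X1, X2, X3, B3, X4, ... described above.

```python
def puncture(streams, pattern):
    """Send bit k of stream r only if pattern[r][k % period] == 1, in time order."""
    period = len(pattern[0])
    sent = []
    for k in range(len(streams[0])):
        for r, stream in enumerate(streams):
            if pattern[r][k % period]:
                sent.append(stream[k])
    return sent


K = 12
X = [f"X{k}" for k in range(K)]          # systematic bits
B = [f"B{k}" for k in range(K)]          # parity bits
P = [[1, 1, 1, 1],                       # systematic row: always sent
     [0, 0, 0, 1]]                       # parity row: one parity bit per period of 4
sent = puncture([X, B], P)
print(sent[:10])   # ['X0', 'X1', 'X2', 'X3', 'B3', 'X4', 'X5', 'X6', 'X7', 'B7']
print(f"code rate = {K}/{len(sent)}")    # 12/15 = 4/5
```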
As an example, assume that the original systematic encoder of code rate 1/2 is

$$G_{sys}(D) = \begin{pmatrix} 1 & \dfrac{1+D^2}{1+D+D^2} \end{pmatrix}$$

Using the puncture pattern

$$P = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$$

an equivalent punctured encoder of rate 2/3 is defined by

$$G(D) = \begin{pmatrix} 1+D & 1+D & 1 \\ D & 0 & 1+D \end{pmatrix}$$

Hence, the reciprocal dual code can be generated by the polynomial generator matrix

$$\tilde{H}(D) = \begin{pmatrix} 1+D^2 & 1+D+D^2 & 1+D \end{pmatrix}$$
The relationship among the trellises of $G_{sys}(D)$, $G(D)$, and $\tilde{H}(D)$ is shown in Fig. 3.3. The $G(D)$ trellis is radix-4 for turbo decoders, whereas the trellis of the reciprocal dual code $\tilde{H}(D)$ is radix-2. Fig. 3.4 compares the performance of decoding on the $\tilde{H}(D)$ trellis (Dual-MAP) and on the $G(D)$ trellis (MAP). The interleaver length is 400, which results in a (600, 400) punctured block code. As expected, both decoding algorithms reach the same performance; however, decoding on the original punctured trellis is less efficient than decoding on the reciprocal dual trellis.
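The relation between G(D) and H̃(D) can be verified with GF(2) polynomial arithmetic. In this Python sketch, polynomials are stored as integer bit masks (bit i ↔ D^i), an encoding chosen here for brevity, and the function names are hypothetical. Reversing each entry of H̃(D), i.e. taking its reciprocal, recovers a dual generator H(D). Every row of G(D) must then be orthogonal to H(D) over GF(2).

```python
def pmul(a, b):
    """Product of two GF(2) polynomials stored as bit masks (bit i <-> D^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reciprocal(p, degree):
    """D^degree * p(1/D): reverse the coefficients over degree + 1 positions."""
    return int(format(p, f"0{degree + 1}b")[::-1], 2)


G = [[0b11, 0b11, 0b1],                  # [1+D, 1+D, 1]
     [0b10, 0b0, 0b11]]                  # [D,   0,   1+D]
H_tilde = [0b101, 0b111, 0b11]           # [1+D^2, 1+D+D^2, 1+D]
H = [reciprocal(h, 2) for h in H_tilde]  # [1+D^2, 1+D+D^2, D+D^2]

syndromes = []
for row in G:
    acc = 0
    for g, h in zip(row, H):
        acc ^= pmul(g, h)                # GF(2) sum of g_k(D) * h_k(D)
    syndromes.append(acc)
print(syndromes)                         # [0, 0]: each row of G(D) is orthogonal to H(D)
```

The trellis radix follows from the number of rows. G(D) has two input rows, so each state has 2^2 = 4 branches, while H̃(D) has a single row and therefore two branches per state.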
3.2.1 Apply rate compatible punctured turbo code to WCDMA
Rate-compatible punctured convolutional (RCPC) codes are one practical solution to adaptive coding, as mentioned in [11]. RCPC codes use a single rate-1/n convolutional encoder/decoder pair and only need to share a puncturing table. Rate-compatible punctured turbo (RCPT) codes can be used in a similar way to RCPC codes. This section focuses on applying rate-compatible puncturing to the WCDMA turbo code. The puncture tables used are shown in Table 3.1.

Table 3.1: Puncture tables

| Table | PT1 | PT2 | PT3 | PT4 |
|---|---|---|---|---|
| Systematic | 1 | 11 | 1111 | 11111111 |
| Parity 1 | 1 | 01 | 0001 | 00000001 |
| Parity 2 | 1 | 01 | 0001 | 00000001 |
| Code rate | 1/3 | 1/2 | 2/3 | 4/5 |

Figure 3.3: Trellis transformation relationship (nonpunctured trellis of code rate 1/2, punctured trellis of code rate 2/3, and reciprocal dual trellis of dual code rate 1/3).

The systematic bits are kept for all code rates, whereas the parity bits are kept only at some positions of each period. The first row is the systematic information, the second row is the first parity bit, and the third row is the second parity bit. The key property of these tables is that the codeword of a lower code rate contains the codeword of every higher code rate.
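Table 3.1 can be checked mechanically. The Python sketch below (names hypothetical) encodes each table as three strings (systematic, parity 1, parity 2) and computes the overall code rate as period divided by the number of sent bits per period. It also reports the period, which equals the number of information bits covered by one stage of the corresponding dual trellis. Finally, it confirms the rate-compatibility property: every bit sent at a higher rate is also sent at each lower rate.

```python
from fractions import Fraction

# Table 3.1: (systematic, parity 1, parity 2); '1' = bit sent at that position of the period.
TABLES = {
    "PT1": ("1", "1", "1"),
    "PT2": ("11", "01", "01"),
    "PT3": ("1111", "0001", "0001"),
    "PT4": ("11111111", "00000001", "00000001"),
}


def code_rate(table):
    """Information bits per period divided by transmitted bits per period."""
    period = len(table[0])
    sent = sum(row.count("1") for row in table)
    return Fraction(period, sent)


def expand(table, length):
    """Repeat each row's pattern up to `length` (a multiple of the period)."""
    return [[int(bit) for bit in row * (length // len(row))] for row in table]


def contains(low, high, length=8):
    """True if every bit sent under table `high` is also sent under table `low`."""
    return all(b_low >= b_high
               for row_low, row_high in zip(expand(low, length), expand(high, length))
               for b_low, b_high in zip(row_low, row_high))


names = list(TABLES)
for name in names:
    print(name, "rate", code_rate(TABLES[name]), "period", len(TABLES[name][0]))
print(all(contains(TABLES[a], TABLES[b]) for a, b in zip(names, names[1:])))
```

The periods 1, 2, 4 and 8 match the number of extrinsic values produced per trellis stage for PT1 to PT4.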
In the 3GPP (WCDMA) standard, the turbo encoder is a parallel concatenated convolutional code (PCCC) with two 8-state constituent encoders and one interleaver. The unpunctured code rate of the turbo encoder is 1/3. The structure of the turbo encoder is shown in Fig. 3.5. The transfer function of the 8-state constituent code of the PCCC is

$$G(D) = \begin{pmatrix} 1 & \dfrac{g_1(D)}{g_0(D)} \end{pmatrix} = \begin{pmatrix} 1 & \dfrac{1+D+D^3}{1+D^2+D^3} \end{pmatrix}$$

The bits input to channel coding for a given code block are denoted by $c_0, c_1, c_2, c_3, \ldots, c_{K-1}$, where K is the number of bits to encode. After encoding, the bits are denoted by $d_0^{(i)}, d_1^{(i)}, d_2^{(i)}, d_3^{(i)}, \ldots, d_{D-1}^{(i)}$, where D is the number of encoded bits per output stream and i indexes the encoder output stream.
Figure 3.4: Performance of using Dual-MAP and MAP under punctured code (BER versus Eb/N0 in dB).

The initial value of the shift registers of the 8-state constituent encoders shall be all zeros when starting to encode the input bits. The output from the turbo encoder is $d_k^{(0)} = x_k$, $d_k^{(1)} = z_k$, $d_k^{(2)} = z'_k$ for $k = 0, 1, 2, \ldots, K-1$. The bits input to the turbo encoder are denoted by $c_0, c_1, c_2, c_3, \ldots, c_{K-1}$, and the bits output from the first and second constituent encoders are denoted by $z_0, z_1, z_2, z_3, \ldots, z_{K-1}$ and $z'_0, z'_1, z'_2, z'_3, \ldots, z'_{K-1}$, respectively.
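A behavioural model of the encoder helps when generating test vectors. The Python sketch below implements the 8-state constituent code G(D) = [1, (1+D+D^3)/(1+D^2+D^3)] with all registers starting at zero. It combines two such encoders into the three output streams d(0) = x, d(1) = z, d(2) = z'. This is a sketch under stated assumptions. The interleaver is passed in as an arbitrary permutation; the 3GPP interleaver itself is not implemented. Trellis termination is omitted, and the function names are hypothetical.

```python
def rsc_encode(bits, fb_taps=(2, 3), ff_taps=(1, 3)):
    """Parity of the RSC code [1, g1(D)/g0(D)], g0 = 1 + D^2 + D^3, g1 = 1 + D + D^3.

    reg[i] holds a_{k-1-i}; a_k = u_k + sum of a_{k-t} over the feedback taps t, and
    z_k = a_k + sum of a_{k-t} over the feed-forward taps t (all modulo 2).
    """
    reg = [0, 0, 0]                      # all-zero initial state
    parity = []
    for u in bits:
        a_k = u
        for t in fb_taps:
            a_k ^= reg[t - 1]
        z_k = a_k
        for t in ff_taps:
            z_k ^= reg[t - 1]
        parity.append(z_k)
        reg = [a_k] + reg[:-1]           # shift the register
    return parity


def turbo_encode(c, interleaver):
    """Return (x, z, z') for input bits c; interleaver[k] is the input index read at time k."""
    x = list(c)
    z = rsc_encode(c)
    z_prime = rsc_encode([c[i] for i in interleaver])
    return x, z, z_prime


c = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
perm = [3, 7, 0, 9, 5, 1, 8, 2, 6, 4]    # hypothetical permutation, not the 3GPP interleaver
x, z, z_prime = turbo_encode(c, perm)
print(x, z, z_prime, sep="\n")
print(rsc_encode([1, 0, 0, 0, 0, 0, 0, 0]))   # impulse response: [1, 1, 1, 1, 0, 0, 1, 0]
```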
3.2.2 Decoding with SM and Dual-MAP algorithm
When puncture table PT1 is applied, the resulting generator $G_1(D)$ is nonpunctured and identical to $G(D)$. We define $G_2(D)$, $G_3(D)$, and $G_4(D)$ as the punctured generators obtained by applying PT2, PT3, and PT4, respectively. The systematic punctured generator matrices are

$$G_1(D) = \begin{pmatrix} 1 & \dfrac{1+D+D^3}{1+D^2+D^3} \end{pmatrix}$$

$$G_2(D) = \begin{pmatrix} 1 & 0 & \dfrac{1+D+D^2}{1+D^2+D^3} \\ 0 & 1 & \dfrac{1+D+D^2+D^3}{1+D^2+D^3} \end{pmatrix}$$
Figure 3.5: Turbo encoder for WCDMA (information sequence X_k; constituent encoder outputs Z_k and Z'_k; the second encoder is fed with X'_k through the QPP interleaver).
$$G_3(D) = \begin{pmatrix}
1 & 0 & 0 & 0 & \dfrac{1+D^2}{1+D^2+D^3} \\
0 & 1 & 0 & 0 & \dfrac{1+D}{1+D^2+D^3} \\
0 & 0 & 1 & 0 & \dfrac{1}{1+D^2+D^3} \\
0 & 0 & 0 & 1 & \dfrac{1+D^3}{1+D^2+D^3}
\end{pmatrix}$$

$$G_4(D) = \begin{pmatrix}
1&0&0&0&0&0&0&0&\dfrac{D+D^2}{1+D^2+D^3} \\
0&1&0&0&0&0&0&0&\dfrac{1}{1+D^2+D^3} \\
0&0&1&0&0&0&0&0&\dfrac{D}{1+D^2+D^3} \\
0&0&0&1&0&0&0&0&\dfrac{D^2}{1+D^2+D^3} \\
0&0&0&0&1&0&0&0&\dfrac{1+D^2}{1+D^2+D^3} \\
0&0&0&0&0&1&0&0&\dfrac{1+D+D^2}{1+D^2+D^3} \\
0&0&0&0&0&0&1&0&\dfrac{1+D}{1+D^2+D^3} \\
0&0&0&0&0&0&0&1&\dfrac{1+D+D^3}{1+D^2+D^3}
\end{pmatrix}$$
Their corresponding reciprocal dual trellises are generated by $\tilde{H}_1(D)$, $\tilde{H}_2(D)$, $\tilde{H}_3(D)$, and $\tilde{H}_4(D)$, respectively:

$$\tilde{H}_1(D) = \begin{pmatrix} 1+D^2+D^3 & 1+D+D^3 \end{pmatrix}$$

$$\tilde{H}_2(D) = \begin{pmatrix} D+D^2+D^3 & 1+D+D^2+D^3 & 1+D+D^3 \end{pmatrix}$$

$$\tilde{H}_3(D) = \begin{pmatrix} D+D^3 & D^2+D^3 & D^3 & 1+D^3 & 1+D+D^3 \end{pmatrix}$$

$$\tilde{H}_4(D) = \begin{pmatrix} D+D^2 & D^3 & D^2 & D & D+D^3 & D+D^2+D^3 & D^2+D^3 & 1+D^2+D^3 & 1+D+D^3 \end{pmatrix}$$

Notice that each of these matrices has only one row, even though the original code rates are higher than 1/2. We use these reciprocal dual trellises as the decoding trellises, and the BER performance is shown in Fig. 3.6.
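The reciprocal dual generators can also be derived mechanically from g0(D) and g1(D). The Python sketch below is one way to do it, under the assumptions of Table 3.1: the puncture period P is a power of two, and each constituent keeps only the parity bit at the last position of the period. Because g0(D)^P = g0(D^P) over GF(2), the parity can be written as q(D)/g0(D^P) with q = g1·g0^(P−1). The kept parity bit is then the sum over j of q_{P−1−j}(Δ)·u_j(Δ), where the q_r are the polyphase components of q and Δ = D^P. Collecting those numerators together with g0 and reversing every entry gives H̃. Polynomials are integer bit masks, and all function names are hypothetical.

```python
def pmul(a, b):
    """Product of GF(2) polynomials stored as bit masks (bit i <-> D^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def polyphase(q, r, P):
    """Coefficients of D^(r + P*i) in q, returned as a polynomial in Delta = D^P."""
    out, i = 0, 0
    while r + P * i < q.bit_length():
        if (q >> (r + P * i)) & 1:
            out |= 1 << i
        i += 1
    return out


def reciprocal(p, degree):
    """D^degree * p(1/D)."""
    return int(format(p, f"0{degree + 1}b")[::-1], 2)


def reciprocal_dual(g1, g0, P):
    """H~(D) for the constituent [1, g1/g0] punctured to keep parity only at phase P-1."""
    q = g1
    for _ in range(P - 1):
        q = pmul(q, g0)                  # q = g1 * g0^(P-1)
    numerators = [polyphase(q, P - 1 - j, P) for j in range(P)]
    H = numerators + [g0]                # [n_0, ..., n_{P-1}, g0(Delta)]
    degree = max(h.bit_length() for h in H) - 1
    return [reciprocal(h, degree) for h in H]


def poly_str(p):
    terms = ["1" if i == 0 else "D" if i == 1 else f"D^{i}"
             for i in range(p.bit_length()) if (p >> i) & 1]
    return " + ".join(terms) or "0"


G1, G0 = 0b1011, 0b1101                  # g1 = 1 + D + D^3, g0 = 1 + D^2 + D^3
for P in (2, 4, 8):
    print(P, [poly_str(h) for h in reciprocal_dual(G1, G0, P)])
```

For P = 2, 4 and 8 the printed lists reproduce H̃2(D), H̃3(D) and H̃4(D), including their column order.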
Figure 3.6: Performance of different code rates (1/3, 1/2, 2/3, 4/5) with block length 2048 (BER versus Eb/N0 in dB).
3.3 Performance analysis
In this section, we present the simulation results and some parameter settings for the hardware implementation. All simulation results show BER versus signal-to-noise ratio (SNR) under BPSK modulation over an AWGN channel. In Fig. 3.7, there is a loss of about 0.05 dB between sliding window sizes of 32 and 64 at BER = $10^{-5}$ with a fixed number of 6 iterations. For code rates 1/3, 1/2, 2/3, and 4/5, each with block length 2048, the performance for different numbers of iterations is presented in Fig. 3.8. The BER performance of every code rate almost saturates at 6 iterations; therefore, we choose 6 iterations for our design.

Figure 3.7: Comparison of different sliding window size (window sizes 16, 32, and 64; BER versus Eb/N0 in dB).
The fixed-point representation of the internal variables in the SISO decoder is determined from the quantization of the received symbols. Fig. 3.9 shows the simulation results for different input symbol quantizations with block length 2048, code rate 1/2, window size 32, and 6 iterations. In the figure, the format Fix a.b uses a bits for the integer part and b bits for the fractional part. The performance loss of the Fix3.3 format is quite small compared with the Fix3.4 format, so we adopt the 6-bit Fix3.3 quantization for our design. The widths of the extrinsic information, branch metrics, and path metrics can then be derived; the fixed-point representations are summarized in Table 3.2.
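The a.b notation can be made concrete with a small quantizer. The Python sketch below makes several assumptions for illustration, since the text does not state the rounding or overflow rules. It uses two's-complement values with the sign bit counted in the integer part, rounds to nearest (Python's round, ties to even), and saturates at the range limits. The function name is hypothetical. Under these assumptions the 6-bit Fix3.3 format covers [−4, 3.875] in steps of 1/8.

```python
def quantize(x, int_bits=3, frac_bits=3):
    """Quantize x to a signed (int_bits.frac_bits) fixed-point value with saturation."""
    scale = 1 << frac_bits
    total = int_bits + frac_bits
    lo, hi = -(1 << (total - 1)), (1 << (total - 1)) - 1   # integer code range
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


for r in (1.3, -0.06, 5.0, -7.2):
    print(r, "->", quantize(r), quantize(r, 3, 4))
```

Adding a fractional bit (Fix3.4) halves the step size but keeps the same saturation range. The simulations above suggest that the finer step buys little BER for its extra bit.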
Figure 3.8: Comparison of different iterations.

Table 3.2: Summary of fixed-point representation in MAP decoder

| Quantity | Input symbols | Extrinsic information | Branch metrics | State metrics |
|---|---|---|---|---|
| Width (bits) | 6 | 10 | 12 | 12 |
Figure 3.9: Fixed point comparison (Fix3.3, Fix3.4, and floating point; BER versus Eb/N0 in dB).
Finally, we discuss the look-up table approximation of the correction term shown in Fig. 3.1. Since the correction term $-\log(1 - e^{-|d|})$ can take any value in $(0, \infty)$ as |d| goes from ∞ to 0, it changes rapidly when |d| is close to zero. Therefore, how the range of d is segmented is a significant issue. Fig. 3.11 shows the performance of two ways of segmenting the range of d, simulated with interleaver length 2048, code rate 4/5, and a sliding window size of 32 trellis stages. As illustrated in Fig. 3.10, uniform segmentation divides the range into equal intervals, whereas nonuniform segmentation divides the range according to the slope of the correction term function. Based on the bit error rate performance, we choose the nonuniform segmentation method to divide the range of d.
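The benefit of nonuniform segmentation can be illustrated with a piecewise-constant LUT for the Ē correction term. In this Python sketch, the range of |d|, the number of LUT entries, and the rule for the nonuniform edges are all assumptions. The nonuniform edges are placed so that every segment covers the same drop of the correction term, which makes the segments narrow where the slope is steep. The construction uses the fact that f(d) = −log(1 − e^{−d}) is its own inverse for d > 0. Each entry stores the midpoint of the function over its segment, and the sketch reports the worst-case approximation error.

```python
import bisect
import math


def corr(d):
    """E-bar correction term -log(1 - e^-d) for d > 0 (decreasing, self-inverse)."""
    return -math.log(-math.expm1(-d))


def uniform_edges(lo, hi, n):
    return [lo + (hi - lo) * i / n for i in range(n + 1)]


def nonuniform_edges(lo, hi, n):
    """Edges giving every segment the same drop in corr(d): narrow where the slope is steep."""
    y_lo, y_hi = corr(lo), corr(hi)
    ys = [y_lo + (y_hi - y_lo) * i / n for i in range(n + 1)]
    edges = [corr(y) for y in ys]        # corr is its own inverse, so d = corr(y)
    edges[0], edges[-1] = lo, hi         # pin the end points against rounding
    return edges


def build_lut(edges):
    """One entry per segment: the midpoint of corr over the segment."""
    return [(corr(a) + corr(b)) / 2 for a, b in zip(edges, edges[1:])]


def lookup(d, edges, lut):
    i = bisect.bisect_right(edges, d) - 1
    return lut[min(max(i, 0), len(lut) - 1)]


def max_error(edges, samples=20000):
    lut = build_lut(edges)
    lo, hi = edges[0], edges[-1]
    ds = [lo + (hi - lo) * k / samples for k in range(samples + 1)]
    return max(abs(lookup(d, edges, lut) - corr(d)) for d in ds)


LO, HI, N = 0.05, 8.0, 8                 # hypothetical |d| range and LUT size
print("uniform    max error:", round(max_error(uniform_edges(LO, HI, N)), 3))
print("nonuniform max error:", round(max_error(nonuniform_edges(LO, HI, N)), 3))
```

With the same number of entries, the uniform table is dominated by its first segment near d = 0. The nonuniform table spreads the error evenly, to half the per-segment drop of the function.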
The design parameters and methodology above are applied to our decoder. The performance of each code rate is shown in Fig. 3.12, together with the floating-point simulations. The fixed-point performance loss is 0.3 to 0.5 dB compared with floating point at BER = $10^{-5}$.
Figure 3.10: Segment method (uniform and nonuniform segmentation).
Figure 3.11: Comparison of uniform and nonuniform segmentation with floating-point reference (BER versus Eb/N0 in dB).
Chapter 4
Dual-MAP Turbo Decoder Architecture
According to (2.39), 1, 2, 4, and 8 extrinsic values can be calculated per trellis stage when the PT1, PT2, PT3, and PT4 puncture tables are used, respectively. In this chapter, we discuss the architecture of the reciprocal dual trellis turbo decoder.
4.1 Architecture overview
The architecture of the proposed turbo decoder using the reciprocal dual trellis is shown in Fig. 4.1. The SISO decoder performs the Dual-MAP algorithm and outputs extrinsic values. The Input Buffers store the values received from the channel, and the extrinsic memory stores the extrinsic values from the SISO decoder. In each iteration, the SISO decoder computes extrinsic values that serve as a priori estimates for the next iteration. An iteration consists of two stages. In the normal stage, data is read from the Input Buffer and the extrinsic memory in normal order, and the extrinsic values are written into the extrinsic memory in interleaved order. In the interleaver stage, data is read from the Input Buffer and the extrinsic memory in interleaved order, and the extrinsic values are written into the extrinsic memory in normal order. Moreover, because the reciprocal dual trellis is used, only a radix-2 branch metric circuit is needed to realize the turbo decoder.
We apply the sliding window approach of [9] to the SISO decoder. In Fig. 4.1, the Window BUFs store the input soft values used to evaluate α and β. SMM stands for sign-magnitude multiplication and computes the branch metrics. Each SMA (sign-magnitude addition) unit computes the path metrics of one recursion. To avoid waiting for the whole codeword before evaluating β, SMA-βd starts from the end of the next sliding window to evaluate the initial values of β. The α-Buffer is a last-in/first-out (LIFO) buffer that reverses the output order of α.

Figure 4.1: Iterative decoding of turbo decoder (Input Buffer; Dual_MAP SISO with Window Buffers, SMM units, SMA-α, SMA-β, SMA-βd, and Extrinsic sets; Extrinsic Memory; Interleaver; Decision; Output Buffer).
Fig. 4.2 shows the decoding schedule of the SISO decoder. In the first time interval T0, the input values of the first window (W0) are written into Window BUF1 in reverse order. In the second interval T1, the second window (W1) is written into Window BUF2, also in reverse order, and SMA-βd calculates βd to evaluate the initial conditions of β. Simultaneously, SMA-α uses the data of W0, read back in reverse order, to compute α and saves the results in the α-Buffer. Finally, in interval T2 the first window is read for the β calculation while the third window (W2) is written into Window BUF1. As soon as the SMA-β unit starts, the extrinsic values can also be calculated by the Extrinsic sets. As a result, the latency of the SISO decoder is about two sliding window sizes.
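The schedule above can be tabulated per time interval. The Python sketch below is a behavioural illustration, not a description of the actual control logic. It assumes that the pattern of T0 to T2 simply repeats. At T_k, window W_k is written into alternating buffers while SMA-βd processes it, SMA-α works on W_{k−1}, and SMA-β with the Extrinsic sets works on W_{k−2}. The dictionary keys are hypothetical labels, and termination of the last window is not modelled.

```python
def siso_schedule(num_windows):
    """One dict per time interval T_k, naming the window each unit works on."""
    schedule = []
    for k in range(num_windows + 2):       # two extra intervals drain the pipeline
        step = {}
        if k < num_windows:
            step["write"] = f"W{k} -> BUF{1 + k % 2}"   # Window BUF1 and BUF2 alternate
            if k >= 1:
                step["beta_d"] = f"W{k}"   # dummy recursion on the window being written
        if 1 <= k <= num_windows:
            step["alpha"] = f"W{k - 1}"    # forward recursion, results pushed to the LIFO
        if 2 <= k <= num_windows + 1:
            step["beta+ext"] = f"W{k - 2}" # true backward recursion and extrinsic output
        schedule.append(step)
    return schedule


schedule = siso_schedule(4)
for k, step in enumerate(schedule):
    print(f"T{k}: {step}")
first = next(k for k, step in enumerate(schedule) if "beta+ext" in step)
print(f"first extrinsic values appear in T{first}, about two windows after the start")
```

In steady state all three recursion units are busy in every interval. Only two window buffers are needed, because each window is read for β in the same interval in which its buffer is refilled.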