THESIS ORGANIZATION - Design and Implementation of Gbps High-Speed Turbo Codes

CHAPTER 1 INTRODUCTION

1.2 THESIS ORGANIZATION

This thesis consists of 7 chapters. Chapter 2 reviews turbo encoding and decoding algorithms and the related techniques. Chapter 3 presents a complete solution for a high-speed turbo decoder with a parallel architecture, including the design of a contention-free interleaver, a high-radix turbo decoder, and several techniques applied in our design.

Chapter 4 explains how we improve the utilization of the previous chip; a modified interleaver control that supports multiple block lengths is introduced. Chapter 5 presents two architectures. First, a two-dimensional parallel architecture is proposed.

In addition, a simplified intra-codeword parallel architecture and the related issues are discussed. Finally, conclusions and future work are given in Chapter 6.

Chapter 2 Turbo Code

The parallel concatenated convolutional code (PCCC), known as the turbo code, was first proposed by C. Berrou, A. Glavieux, and P. Thitimajshima in 1993 [1]. It has been shown to achieve performance close to the Shannon limit using simple constituent codes concatenated through an interleaver. Because of its excellent error correction ability, this technique has been adopted in the 3GPP, 3GPP2, and WiMAX standards. In this chapter, we describe the principles of both turbo encoding and turbo decoding. The sliding-window approach and the tail-biting coding structure are also explained.

2.1 Principle of Turbo code

2.1.1 Turbo Encoding

The turbo encoder is composed of two recursive systematic convolutional (RSC) encoders, which are connected in parallel but separated by a turbo interleaver. The two RSC encoders are also called the constituent codes of the turbo code. The block diagram of the turbo encoder is illustrated in Fig. 2.1. Note that each RSC encoder encodes the same input data, but in a different order. In the 3GPP2 standard, each RSC encoder encodes each input bit into one systematic bit and two parity-check bits. Thus, the code rate of each component encoder is 1/3. To increase the code rate of the turbo code, the systematic bits of the second RSC encoder are not transmitted, since they are only an interleaved copy of X. Therefore, the output encoded sequence is {X, Y0, Y1, Y0’, Y1’}, and the overall code rate is 1/5.

Fig. 2.1 Turbo encoder for 3GPP2 standard
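The encoder structure described above can be sketched in Python. This is a minimal sketch, not the thesis design. The constituent-code polynomials are an assumption taken from the cdma2000/3GPP2 turbo encoder definition rather than from this section: feedback 1 + D^2 + D^3, and parities 1 + D + D^3 and 1 + D + D^2 + D^3. The permutation `perm` is a placeholder, not the 3GPP2 interleaver, and trellis termination is omitted.

```python
def rsc_encode(bits):
    """Encode a 0/1 list with the assumed 3GPP2 constituent RSC code.

    Feedback d(D) = 1 + D^2 + D^3, parities n0(D) = 1 + D + D^3 and
    n1(D) = 1 + D + D^2 + D^3. Returns the parity lists (Y0, Y1).
    No tail bits are generated here.
    """
    s1 = s2 = s3 = 0                 # s1 = w[k-1], s2 = w[k-2], s3 = w[k-3]
    y0, y1 = [], []
    for u in bits:
        w = u ^ s2 ^ s3              # feedback taps D^2 and D^3
        y0.append(w ^ s1 ^ s3)       # taps 1, D, D^3
        y1.append(w ^ s1 ^ s2 ^ s3)  # taps 1, D, D^2, D^3
        s1, s2, s3 = w, s1, s2       # shift the register
    return y0, y1


def turbo_encode(bits, perm):
    """Rate-1/5 parallel concatenation: returns X, Y0, Y1, Y0', Y1'.

    perm[i] is the natural-order index fed to the second encoder at step i.
    """
    x = list(bits)
    y0, y1 = rsc_encode(x)
    y0p, y1p = rsc_encode([x[p] for p in perm])  # second encoder sees interleaved data
    return x, y0, y1, y0p, y1p


if __name__ == "__main__":
    import random
    K = 16
    msg = [random.randint(0, 1) for _ in range(K)]
    perm = random.sample(range(K), K)  # placeholder, NOT the 3GPP2 interleaver
    out = turbo_encode(msg, perm)
    print(sum(len(s) for s in out) / K)  # 5.0 coded bits per information bit
```

The systematic stream of the second encoder is never formed, which is exactly where the rate gain from 1/6 to 1/5 comes from.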

After all input messages are encoded, several tail bits have to be generated to drive both component encoders back to the zero state. However, because of its feedback path, an RSC encoder cannot be returned to the zero state simply by feeding dummy zeros into it. A simple solution is shown in Fig. 2.2. While the input messages are encoded, the switch is set to position “A”. Once the whole block has been encoded, the switch is moved to position “B” for three additional cycles. This forces all registers to zero, so the encoder returns to the zero state.

Fig. 2.2 Trellis Termination (signals: input message, systematic bit, parity-check bit)
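The switch in Fig. 2.2 can be mimicked in software. During the tail, the encoder input is taken from the feedback path, so the register input becomes zero. This sketch assumes the 3GPP2 constituent polynomials (feedback 1 + D^2 + D^3; parities 1 + D + D^3 and 1 + D + D^2 + D^3), which this section does not state. `rsc_step` and `encode_terminated` are illustrative names.

```python
def rsc_step(u, state):
    """One clock of the assumed RSC encoder; returns ((y0, y1), next_state)."""
    s1, s2, s3 = state
    w = u ^ s2 ^ s3                                # register input (after feedback)
    return (w ^ s1 ^ s3, w ^ s1 ^ s2 ^ s3), (w, s1, s2)


def encode_terminated(bits):
    """Encode a block, then append 3 tail cycles that force the zero state."""
    state = (0, 0, 0)
    sys_out, par_out = [], []
    for u in bits:                 # switch at position "A": input from the message
        par, state = rsc_step(u, state)
        sys_out.append(u)
        par_out.append(par)
    for _ in range(3):             # switch at position "B": input from the feedback
        _, s2, s3 = state
        u = s2 ^ s3                # cancels the feedback, so w = 0 is shifted in
        par, state = rsc_step(u, state)
        sys_out.append(u)          # tail bits are transmitted like systematic bits
        par_out.append(par)
    return sys_out, par_out, state
```

Feeding plain zeros instead does not work. For example, after input [1, 0, 0, 0] the state is (1, 1, 0), not all-zero, because the feedback keeps re-injecting ones.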

2.1.2 Turbo Interleaver

The interleaver plays a very important role in the turbo encoder. First, a proper coding gain can be achieved with small-memory RSC encoders, since the interleaver scrambles a long message block. In addition, the interleaver de-correlates the inputs of the two RSC encoders, so that an iterative decoding algorithm can be applied between the two component decoders. Theoretically, the interleaver block size is one of the major factors in lowering the upper bound on the bit error probability of a turbo code system. The performance upper bound of a turbo code with a uniform random interleaver has been evaluated in [9]. The result shows that the upper bound on the bit error probability is approximately proportional to 1/N, where N is the block size of the turbo interleaver. The factor “1/N” is also called the interleaver gain.
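A uniform random interleaver, like the one assumed in the analysis of [9], can be modeled as a random permutation. The sketch below builds one together with the matching de-interleaver, which the decoder needs when it exchanges extrinsic information. The function names are illustrative. Standardized interleavers, such as the 3GPP2 one, are generated by deterministic rules instead.

```python
import random


def make_random_interleaver(n, seed=None):
    """Uniform random permutation of 0..n-1 (output i reads input perm[i])."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm


def interleave(seq, perm):
    """Reorder seq so that position i holds seq[perm[i]]."""
    return [seq[p] for p in perm]


def deinterleave(seq, perm):
    """Inverse of interleave(): value at position i goes back to position perm[i]."""
    out = [None] * len(perm)
    for i, p in enumerate(perm):
        out[p] = seq[i]
    return out
```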

2.1.3 Turbo Decoding

A general idea for iterative turbo decoding is illustrated in Fig. 2.3, where rs is the received systematic information, rp1 is the received parity information generated by the first RSC encoder, and rp2 is the received parity information generated by the second RSC encoder.

The iterative turbo decoder consists of two constituent decoders. These are soft-in/soft-out (SISO) decoders concatenated serially via one interleaver and one de-interleaver. An additional interleaver interleaves the received systematic information and provides the interleaved data to the second SISO decoder. The two component decoders can be implemented with either the soft-output Viterbi algorithm (SOVA) [21] or the maximum a posteriori probability (MAP) algorithm [2]; the MAP algorithm is discussed in detail in the next section. During the iterative decoding process, each constituent decoder delivers the extrinsic information Lex(u), which is taken as the a priori information for the other constituent decoder.

That is, $L_{in1}(u_k) = L_{ex2}(u_k)$ and $L_{in2}(u_k) = L_{ex1}(u_k)$, with the de-interleaver and interleaver applied in between. As the number of iterations increases, better coding gain is expected. However, the correlation between the two SISO decoders also increases. Therefore, once the number of iterations reaches a threshold, further iterations bring no significant performance improvement.

Fig. 2.3 Turbo decoding flowchart
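The exchange shown in Fig. 2.3 can be written as a short loop. This is a sketch of the schedule only. The SISO decoder is passed in as a hypothetical callable that maps (systematic LLRs, parity LLRs, a priori LLRs) to extrinsic LLRs, so any MAP-family decoder could be plugged in. The inputs are assumed to be channel LLRs already scaled by L_c, and the decisions are returned as ±1 values of u_k.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical SISO interface: (systematic, parity, a priori) LLRs -> extrinsic LLRs
Siso = Callable[[Sequence[float], Sequence[float], Sequence[float]], List[float]]


def turbo_decode(rs: Sequence[float], rp1: Sequence[float], rp2: Sequence[float],
                 perm: Sequence[int], siso: Siso,
                 n_iter: int = 8) -> Tuple[List[int], List[float]]:
    """Iterate two SISO decoders, exchanging extrinsic information.

    perm[i] is the natural-order index read at interleaved position i.
    rp2 is assumed to be in interleaved order already (second encoder output).
    """
    if n_iter < 1:
        raise ValueError("need at least one iteration")
    n = len(rs)
    rs_int = [rs[p] for p in perm]        # extra interleaver for the systematic input
    l_in1 = [0.0] * n                     # no a priori information in the first pass
    for _ in range(n_iter):
        l_ex1 = siso(rs, rp1, l_in1)
        l_in2 = [l_ex1[p] for p in perm]  # L_in2 = interleaved L_ex1
        l_ex2 = siso(rs_int, rp2, l_in2)
        l_in1 = [0.0] * n
        for i, p in enumerate(perm):      # L_in1 = de-interleaved L_ex2
            l_in1[p] = l_ex2[i]
    # A posteriori LLR of decoder 2 (channel + a priori + extrinsic), de-interleaved
    llr = [0.0] * n
    for i, p in enumerate(perm):
        llr[p] = rs_int[i] + l_in2[i] + l_ex2[i]
    return [1 if v > 0 else -1 for v in llr], llr
```

Keeping the SISO decoder as a parameter separates the iteration schedule from the constituent algorithm, which is the point of the flowchart.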

2.1.4 Error floor effect

Although turbo coding provides excellent performance, the bit error rate (BER) curve flattens and decreases quite slowly at high signal-to-noise ratio (SNR). This phenomenon, which can be observed in [19], is caused by the relatively small free distance of turbo codes and is called the “error floor” [22]. The relation between the minimum free distance and the bit error probability of a turbo code can be expressed by the free-distance asymptote

$$P_b \approx \frac{N_{free}\,\tilde{w}_{free}}{N}\,Q\!\left(\sqrt{d_{free}\,\frac{2RE_b}{N_0}}\right),$$

where $d_{free}$ is the free distance, $N_{free}$ is the number of codewords at that distance, $\tilde{w}_{free}$ is their average information weight, $N$ is the interleaver size, and $R$ is the code rate. At low $E_b/N_0$, errors can be corrected by iterative decoding, since the systematic information and the parity information can be regarded as highly independent. However, as the channel becomes more reliable, the dependency between the systematic and parity information grows and the interleaver contributes little to iterative decoding. Thus, the error correction ability is limited by the weak constituent code alone. To overcome this issue, we can increase the interleaver size to lower the error floor, or concatenate a block code, e.g., a BCH code, as an outer code to remove the remaining error bits. For more details, please refer to [9] [23].
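The effect of the free distance and the 1/N interleaver gain on the error floor can be seen numerically. This sketch assumes the standard free-distance asymptote for BPSK over AWGN, P_b ≈ (N_free · w_free / N) · Q(√(2 · d_free · R · E_b/N_0)). All parameter values below are hypothetical and do not describe any particular standard code.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def free_distance_asymptote(ebn0_db, d_free, n_free, w_free, n_block, rate):
    """Error-floor estimate Pb ~ (N_free * w_free / N) * Q(sqrt(2 d_free R Eb/N0))."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return (n_free * w_free / n_block) * q_function(math.sqrt(2.0 * d_free * rate * ebn0))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to illustrate the trends.
    for n in (1024, 4096):
        for snr in (1.0, 2.0, 3.0):
            pb = free_distance_asymptote(snr, d_free=6, n_free=1, w_free=2,
                                         n_block=n, rate=1 / 3)
            print(f"N={n:5d}  Eb/N0={snr:.1f} dB  Pb~{pb:.2e}")
```

Quadrupling N lowers the floor by a factor of four. The slope with respect to E_b/N_0 is set by d_free alone, and a small d_free is what makes the curve flatten.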

2.2 Decoding Algorithms for Turbo Code

It has been shown that the MAP algorithm outperforms SOVA for turbo decoding [10]. The Viterbi algorithm finds the maximum-likelihood (ML) codeword and thus minimizes the codeword (sequence) error probability. The MAP algorithm instead minimizes the symbol (or bit) error probability. In this section, we focus on turbo decoding methods based on the MAP algorithm [2][3]. Although SOVA is also a commonly used technique for turbo decoding, we skip it since it is not adopted in our proposed design. For more details about SOVA, please refer to [21]. Some comparisons of the MAP algorithm and SOVA applied in turbo code systems are given in [10].

2.2.1 The MAP algorithm

The main idea of the MAP algorithm is to compute the log-likelihood ratio (LLR) of each transmitted information bit $u_k$, $1 \le k \le N$, conditioned on the received sequence $\mathbf{r}$:

$$L(\hat{u}_k) = \ln\frac{P(u_k=+1 \mid \mathbf{r})}{P(u_k=-1 \mid \mathbf{r})},$$

where $N$ is the block length of the encoded message, $\mathbf{r} = (r_1, r_2, \ldots, r_N)$, $r_k = (r_{k,1}, \ldots, r_{k,n})$, and $n$ is the number of output bits for each encoded bit in the constituent code. As an example, consider the trellis diagram of the turbo code in the 3GPP2 standard, shown in Fig. 2.4.

Note that the solid lines represent the transitions corresponding to an information bit uk of -1, while the dotted lines represent the transitions corresponding to an information bit uk of +1.

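Then, since $P(\mathbf{r})$ cancels in the ratio, the equation can be further expressed as

$$L(\hat{u}_k) = \ln\frac{\sum_{(s_{k-1},s_k):\,u_k=+1} P(s_{k-1}, s_k, \mathbf{r})}{\sum_{(s_{k-1},s_k):\,u_k=-1} P(s_{k-1}, s_k, \mathbf{r})},$$

where the numerator and denominator are the sums of the joint probabilities over all existing transitions from state $s_{k-1}$ to state $s_k$ that correspond to an information bit $u_k$ of +1 and -1, respectively.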

Fig. 2.4 Trellis diagram of turbo code in 3GPP2 standard

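Assume the encoded data is transmitted through a discrete memoryless channel (DMC). Then the term $P(s_{k-1}, s_k, \mathbf{r})$ can be decomposed into three terms:

$$P(s_{k-1}, s_k, \mathbf{r}) = e^{\alpha_{k-1}(s_{k-1})}\, e^{\gamma_k(s_{k-1}, s_k)}\, e^{\beta_k(s_k)}.$$

Here $e^{\gamma_k(s_{k-1},s_k)} = P(s_k, r_k \mid s_{k-1})$ is the branch transition probability. The term $e^{\alpha_{k-1}(s_{k-1})} = P(s_{k-1}, r_1, \ldots, r_{k-1})$ is the joint probability of state $s_{k-1}$ and the received symbols $r_j$ from the beginning of the block up to time index $k-1$. Similarly, $e^{\beta_k(s_k)} = P(r_{k+1}, \ldots, r_N \mid s_k)$ is the probability of the received symbols $r_j$ from the end of the block back to time index $k+1$, given state $s_k$. By shifting the index $k$, it can be seen that $\alpha$ is obtained by the forward recursion of the MAP algorithm:

$$e^{\alpha_k(s_k)} = \sum_{s_{k-1}} e^{\alpha_{k-1}(s_{k-1})}\, e^{\gamma_k(s_{k-1}, s_k)}.$$

In the same way, the backward recursion $\beta$ can be formulated as

$$e^{\beta_{k-1}(s_{k-1})} = \sum_{s_k} e^{\gamma_k(s_{k-1}, s_k)}\, e^{\beta_k(s_k)}.$$

Note that since the trellis of the turbo code diverges from state zero and converges to state zero, the initial conditions of the forward and backward recursions should be set as

$$e^{\alpha_0(0)} = 1,\; e^{\alpha_0(s)} = 0 \text{ for } s \neq 0; \qquad e^{\beta_N(0)} = 1,\; e^{\beta_N(s)} = 0 \text{ for } s \neq 0.$$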
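The forward and backward recursions, with their zero-state initial conditions, can be checked with a small probability-domain sketch. It takes the branch terms e^{γ_k} as given numbers on a generic trellis; the (s_prev, s_next, u) transition-list format is an illustrative assumption. It forms the LLR as the ratio of summed joint probabilities. There is no normalization, so the sketch only suits short blocks. Practical decoders normalize the metrics or work in the log domain, as in the following subsections.

```python
import math


def forward_backward(gammas, n_states, transitions):
    """Probability-domain MAP recursions on a terminated trellis.

    gammas[k][(sp, s)] holds e^{gamma_{k+1}(sp, s)} for the (k+1)-th symbol;
    transitions is a list of (sp, s, u) with u in {-1, +1}.
    Returns (alpha, beta, llr); alpha[k][s] and beta[k][s] follow the text.
    """
    n = len(gammas)
    alpha = [[0.0] * n_states for _ in range(n + 1)]
    beta = [[0.0] * n_states for _ in range(n + 1)]
    alpha[0][0] = 1.0                    # trellis starts in state 0
    beta[n][0] = 1.0                     # and is terminated in state 0
    for k in range(1, n + 1):            # forward recursion
        for sp, s, _ in transitions:
            alpha[k][s] += alpha[k - 1][sp] * gammas[k - 1][(sp, s)]
    for k in range(n, 0, -1):            # backward recursion
        for sp, s, _ in transitions:
            beta[k - 1][sp] += gammas[k - 1][(sp, s)] * beta[k][s]
    llr = []
    for k in range(1, n + 1):            # ratio of summed joint probabilities
        num = den = 0.0
        for sp, s, u in transitions:
            p = alpha[k - 1][sp] * gammas[k - 1][(sp, s)] * beta[k][s]
            if u > 0:
                num += p
            else:
                den += p
        llr.append(math.log(num / den))
    return alpha, beta, llr
```

A useful consistency check is that the sum over s of α_k(s)β_k(s) equals the same total probability for every k.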

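The branch transition probability can be factored as

$$e^{\gamma_k(s_{k-1}, s_k)} = P(u_k)\,P(r_k \mid u_k).$$

Here, the term $P(u_k)$ is known as the a priori probability. According to the definition of the a priori LLR,

$$L(u_k) = \ln\frac{P(u_k=+1)}{P(u_k=-1)},$$

$P(u_k)$ can be written as

$$P(u_k) = A_k\, e^{u_k L(u_k)/2}, \qquad A_k = \frac{e^{-L(u_k)/2}}{1+e^{-L(u_k)}},$$

where the term $A_k$ is equal for all transitions at the same time index and thus cancels out in (2.3). On the other hand, the value of $P(r_k \mid u_k)$ depends on the channel characteristics. For an additive white Gaussian noise (AWGN) channel with BPSK signaling, it can be expressed as

$$P(r_k \mid u_k) = B_k \exp\left(\frac{L_c}{2}\sum_{v=1}^{n} r_{k,v}\, x_{k,v}\right),$$

where $L_c = 4E_s/N_0$ is called the channel reliability and $x_{k,v}$ is the $v$-th transmitted symbol when encoding $u_k$. The factor $B_k$ does not depend on the transmitted symbols, so it also cancels out. For systematic codes, $x_{k,1}$ is equal to $u_k$. Now we can obtain the value of

$$e^{\gamma_k(s_{k-1}, s_k)} = A_k B_k \exp\left(\frac{1}{2}u_k L(u_k) + \frac{L_c}{2}\sum_{v=1}^{n} r_{k,v}\, x_{k,v}\right).$$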
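The channel and a priori factors can be sanity-checked numerically. The sketch assumes BPSK with E_s = 1, so the noise variance is N_0/(2E_s). It compares L_c · r with the log ratio of the two Gaussian likelihoods. It also checks that A_k · e^{u_k L(u_k)/2} is a valid probability whose log ratio equals L(u_k). The function names are illustrative.

```python
import math


def channel_reliability(es_n0_db: float) -> float:
    """L_c = 4 Es/N0, with Es/N0 given in dB."""
    return 4.0 * 10.0 ** (es_n0_db / 10.0)


def gaussian_pdf(r: float, mean: float, var: float) -> float:
    return math.exp(-(r - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def direct_llr(r: float, es_n0_db: float) -> float:
    """ln p(r | x=+1) / p(r | x=-1) for BPSK with Es = 1, noise variance N0/(2Es)."""
    var = 1.0 / (2.0 * 10.0 ** (es_n0_db / 10.0))
    return math.log(gaussian_pdf(r, 1.0, var) / gaussian_pdf(r, -1.0, var))


def apriori_prob(u: int, l_apriori: float) -> float:
    """P(u_k) = A_k * exp(u_k L(u_k) / 2), with A_k = e^{-L/2} / (1 + e^{-L})."""
    a_k = math.exp(-l_apriori / 2.0) / (1.0 + math.exp(-l_apriori))
    return a_k * math.exp(u * l_apriori / 2.0)
```

The first check confirms that the whole effect of the AWGN channel on one symbol reduces to the linear term L_c · r. This linearity is what the log-domain algorithms later exploit.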

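Since $x_{k,1} = u_k$, the systematic and a priori terms can be factored out of the sums in (2.3), and the LLR splits into three parts:

$$L(\hat{u}_k) = L_c\, r_{k,1} + L(u_k) + L_{ex}(u_k), \qquad L_{ex}(u_k) = L(\hat{u}_k) - L_c\, r_{k,1} - L(u_k).$$

The term $L_{ex}(u_k)$ is called extrinsic information, since it is a function of the redundant information introduced by the encoder. It removes the information about the systematic input and the a priori information from $L(\hat{u}_k)$. Therefore, this term is useful for estimating the a priori probability for the next component decoder, and a great performance improvement in iterative MAP decoding can be achieved.

2.2.2 The Log-MAP algorithm

As described in Section 2.2.3, the Max-Log-MAP algorithm is a sub-optimal solution for turbo decoding, since it uses an approximation of (2.21) to reduce the complexity of the MAP algorithm. This problem can be solved by the Log-MAP algorithm [24], which employs the Jacobian logarithm

$$\ln(e^{\delta_1} + e^{\delta_2}) = \max(\delta_1, \delta_2) + \ln\left(1 + e^{-|\delta_1-\delta_2|}\right) = \max(\delta_1, \delta_2) + f_c(|\delta_1-\delta_2|),$$

where $f_c(|\delta_1-\delta_2|)$ is a correction function that improves the performance. It has been proved that (2.21) can be computed exactly by recursively applying (2.25) [10].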

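Substituting (2.18) and (2.19) into (2.25), the forward and backward recursions can be represented as

$$\alpha_k(s_k) = \max^{*}_{s_{k-1}}\left[\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k)\right],$$

$$\beta_{k-1}(s_{k-1}) = \max^{*}_{s_k}\left[\beta_k(s_k) + \gamma_k(s_{k-1}, s_k)\right],$$

where the max*(·) operation is defined as

$$\max{}^{*}(\delta_1, \delta_2) = \max(\delta_1, \delta_2) + f_c(|\delta_1-\delta_2|)$$

and is applied recursively when there are more than two arguments.

The performance of the Log-MAP algorithm is identical to that of the MAP algorithm. However, its complexity is higher than that of the Max-Log-MAP algorithm, since computing $f_c(\cdot)$ still involves exponential and logarithm evaluations. Thus, the values of $f_c(\cdot)$ are usually stored in a pre-computed table, and the Log-MAP algorithm can be implemented by table look-up. It has been found that excellent performance can be obtained with 8 stored values and $|\delta_1-\delta_2|$ ranging between 0 and 5, and that a finer representation brings no further improvement [10].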
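The table look-up idea can be sketched as follows. The sketch compares the exact max* operation with a version whose correction term comes from an 8-entry table covering |δ1 − δ2| in [0, 5). Storing f_c at the centre of each bin is an assumed quantization choice, not one specified here. Beyond 5, the correction is taken as zero.

```python
import math


def correction(diff: float) -> float:
    """Exact Jacobian correction f_c(|d1 - d2|) = ln(1 + e^{-|d1 - d2|})."""
    return math.log1p(math.exp(-abs(diff)))


def max_star(d1: float, d2: float) -> float:
    """Exact max*: ln(e^d1 + e^d2)."""
    return max(d1, d2) + correction(d1 - d2)


# 8 stored values for |d1 - d2| in [0, 5); each holds f_c at its bin centre.
LUT_SIZE, LUT_RANGE = 8, 5.0
LUT_STEP = LUT_RANGE / LUT_SIZE
FC_TABLE = [correction((i + 0.5) * LUT_STEP) for i in range(LUT_SIZE)]


def max_star_lut(d1: float, d2: float) -> float:
    """Table-based max*: the correction is zero once |d1 - d2| >= 5."""
    idx = int(abs(d1 - d2) / LUT_STEP)
    return max(d1, d2) + (FC_TABLE[idx] if idx < LUT_SIZE else 0.0)


def max_star_many(values, op=max_star):
    """Apply max* recursively: max*(a, b, c) = max*(max*(a, b), c)."""
    acc = values[0]
    for v in values[1:]:
        acc = op(acc, v)
    return acc
```

With this bin layout, the worst-case table error is about 0.144, at |δ1 − δ2| = 0. Dropping the correction entirely (Max-Log-MAP) would cost up to ln 2 ≈ 0.693 per operation.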

2.2.3 The Max-Log-MAP algorithm

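As we can see, the MAP algorithm involves too many exponentiations and multiplications, which are quite complex for hardware realization. Thus, an approximation of the MAP algorithm, termed the Max-Log-MAP algorithm [24], was derived to simplify the implementation of MAP decoders. Instead of calculating $e^{\gamma_k}$, $e^{\alpha_k}$, and $e^{\beta_k}$ directly, all computations are done in the logarithm domain. Here we define $\gamma_k$, $\alpha_k$, and $\beta_k$ as the transition metric, the forward path metric, and the backward path metric, respectively. Dropping the constant $\ln(A_k B_k)$, $\gamma_k$ can be formulated as

$$\gamma_k(s_{k-1}, s_k) = \frac{1}{2}u_k L(u_k) + \frac{L_c}{2}\sum_{v=1}^{n} r_{k,v}\,x_{k,v}.$$

Similarly, $\alpha_k$ and $\beta_{k-1}$ can be formulated as

$$\alpha_k(s_k) = \ln\sum_{s_{k-1}} e^{\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k)}, \qquad \beta_{k-1}(s_{k-1}) = \ln\sum_{s_k} e^{\beta_k(s_k) + \gamma_k(s_{k-1}, s_k)},$$

respectively. After substituting (2.17), (2.18), and (2.19), (2.15) can be rewritten as

$$L(\hat{u}_k) = \ln\sum_{u_k=+1} e^{\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) + \beta_k(s_k)} - \ln\sum_{u_k=-1} e^{\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) + \beta_k(s_k)}.$$

By utilizing the approximation

$$\ln(e^{\delta_1} + e^{\delta_2} + \cdots + e^{\delta_n}) \approx \max(\delta_1, \delta_2, \ldots, \delta_n),$$

(2.27) can be further simplified to

$$L(\hat{u}_k) \approx \max_{u_k=+1}\left[\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) + \beta_k(s_k)\right] - \max_{u_k=-1}\left[\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) + \beta_k(s_k)\right].$$

The forward and backward recursions that repeatedly compute $\alpha_k$ and $\beta_k$ can be expressed by

$$\alpha_k(s_k) \approx \max_{s_{k-1}}\left[\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k)\right]$$

and

$$\beta_{k-1}(s_{k-1}) \approx \max_{s_k}\left[\beta_k(s_k) + \gamma_k(s_{k-1}, s_k)\right].$$

Both equations are add-compare-select (ACS) operations, which are similar to the path metric updating of the Viterbi algorithm.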
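The Max-Log-MAP equations can be turned into a compact decoder sketch. The 8-state trellis is built from the assumed 3GPP2 constituent polynomials (feedback 1 + D^2 + D^3; parities 1 + D + D^3 and 1 + D + D^2 + D^3), which this section does not spell out. Bits map to BPSK as 0 → −1 and 1 → +1. By default the final state is treated as unknown, with equal backward metrics. `build_trellis` and `max_log_map` are illustrative names.

```python
from typing import List, Tuple

NEG_INF = -1.0e9  # large negative metric for unreachable states


def build_trellis() -> List[Tuple[int, int, int, int, int]]:
    """Transitions (s_prev, s_next, u, x0, x1) of the assumed 3GPP2 RSC code.

    State s = 4*s1 + 2*s2 + s3 holds w[k-1], w[k-2], w[k-3]; u, x0, x1 are
    BPSK values in {-1, +1}.
    """
    trans = []
    for s in range(8):
        s1, s2, s3 = (s >> 2) & 1, (s >> 1) & 1, s & 1
        for b in (0, 1):
            w = b ^ s2 ^ s3              # feedback 1 + D^2 + D^3
            y0 = w ^ s1 ^ s3             # parity 1 + D + D^3
            y1 = w ^ s1 ^ s2 ^ s3        # parity 1 + D + D^2 + D^3
            nxt = (w << 2) | (s1 << 1) | s2
            trans.append((s, nxt, 2 * b - 1, 2 * y0 - 1, 2 * y1 - 1))
    return trans


def max_log_map(l_sys, l_par0, l_par1, l_apriori, terminated=False):
    """Max-Log-MAP LLRs; channel inputs are L_c * r, l_apriori holds L(u_k)."""
    trans = build_trellis()
    n = len(l_sys)

    def gamma(k, u, x0, x1):
        # gamma_k = u L(u)/2 + (L_c/2) sum_v r_{k,v} x_{k,v}, with x_{k,1} = u
        return 0.5 * (u * (l_apriori[k] + l_sys[k]) + x0 * l_par0[k] + x1 * l_par1[k])

    alpha = [[NEG_INF] * 8 for _ in range(n + 1)]
    alpha[0][0] = 0.0                    # encoder starts in state 0
    beta = [[NEG_INF] * 8 for _ in range(n + 1)]
    if terminated:
        beta[n][0] = 0.0
    else:
        beta[n] = [0.0] * 8              # unknown final state: equal metrics
    for k in range(n):                   # forward ACS
        for sp, s, u, x0, x1 in trans:
            alpha[k + 1][s] = max(alpha[k + 1][s], alpha[k][sp] + gamma(k, u, x0, x1))
    for k in range(n - 1, -1, -1):       # backward ACS
        for sp, s, u, x0, x1 in trans:
            beta[k][sp] = max(beta[k][sp], beta[k + 1][s] + gamma(k, u, x0, x1))
    llr = []
    for k in range(n):                   # max over u=+1 branches minus max over u=-1
        best = {+1: NEG_INF, -1: NEG_INF}
        for sp, s, u, x0, x1 in trans:
            best[u] = max(best[u], alpha[k][sp] + gamma(k, u, x0, x1) + beta[k + 1][s])
        llr.append(best[+1] - best[-1])
    return llr
```

In a turbo decoder, the extrinsic part, llr − l_sys − l_apriori, is what would be passed to the other constituent decoder.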

2.2.4 SNR sensitivity of Max-Log-MAP and Log-MAP algorithms

Referring to (2.13) and the subsequent derivations, both the MAP and Log-MAP algorithms clearly require SNR estimation to obtain the value of the channel reliability, i.e., $L_c$. Unfortunately, accurate estimation is not easy to achieve. Several papers have discussed the effect of SNR mismatch in turbo decoding. In [25], the simulations show that an SNR estimation offset of about -3 to +6 dB is tolerable before significant performance degradation occurs.

However, the Max-Log-MAP algorithm can provide an SNR-independent scheme if the a priori information is initialized with a reasonable value, such as all zeros [26]. Because the max(·) operations are linear, $L_c$ only scales all path metrics and LLRs by a common factor. It can therefore be factored out when computing $L(\hat{u}_k)$ without changing the hard decisions. A comparison of the Max-Log-MAP and Log-MAP algorithms under different SNR estimation offsets was made in [26].

Although the Log-MAP algorithm performs better than the Max-Log-MAP algorithm, it risks serious degradation under SNR mismatch. Thus, channel characteristics play an important role in practical implementation. It has been concluded in [26] that if the channel characteristics change over time, the Max-Log-MAP decoder is suitable as the constituent decoder in turbo decoding. Otherwise, the Log-MAP decoder is preferable in terms of coding gain.
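The L_c argument can be checked on a toy example. This sketch uses a hypothetical 3-bit single-parity code and decodes it by brute force over its codebook rather than by a trellis, with zero a priori information. It compares max-based bit LLRs with log-sum-exp-based ones. Scaling L_c scales the Max-Log LLRs exactly, but it changes the Log-MAP LLRs nonlinearly.

```python
import itertools
import math


def codebook():
    """BPSK codewords (x1, x2, x3) of a toy code (u1, u2, u1 xor u2)."""
    words = []
    for b1, b2 in itertools.product((0, 1), repeat=2):
        words.append(tuple(2 * b - 1 for b in (b1, b2, b1 ^ b2)))
    return words


def log_sum_exp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def bit_llrs(r, lc, use_max):
    """LLRs of u1, u2 from path metrics (L_c/2) * sum_v r_v x_v, zero a priori."""
    llrs = []
    for k in range(2):
        metrics = {+1: [], -1: []}
        for x in codebook():
            metrics[x[k]].append(0.5 * lc * sum(rv * xv for rv, xv in zip(r, x)))
        combine = max if use_max else log_sum_exp
        llrs.append(combine(metrics[+1]) - combine(metrics[-1]))
    return llrs
```

For received values r = [0.8, -0.3, -0.5], tripling L_c triples the Max-Log LLRs exactly, while the Log-MAP LLR of u1 grows by less than a factor of three.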

2.3 Sliding Window Approach

As described in the previous section, the MAP-based algorithms (the MAP, Max-Log-MAP, and Log-MAP algorithms) require both the forward and backward path metrics to calculate the log-likelihood ratio. Since the forward and backward recursions start from opposite ends of the block, the entire block has to be received and stored before both recursions can be computed. Furthermore, the path metrics of one recursion (forward or backward) have to be stored while waiting for the other. These restrictions enlarge the memory requirement of a hardware turbo decoder. For example, the maximum block length of the 3GPP standard is 5114, which means that 5114 received codewords and the corresponding path metrics have to be stored. In addition, a long output latency is introduced, which limits the speed and throughput of the turbo decoder design.

The main problem is that a long block cannot simply be divided into several short sub-blocks. The sub-blocks lack boundary path metrics for the recursion that runs opposite to the input sequence, and this degrades the performance. Thus, a sliding window approach was proposed in [27] and later in [28] to overcome this drawback. This approach relies on the fact that the backward path metrics can be highly reliable even without knowing the initial state, provided the backward recursion runs long enough. Fig. 2.5 and Fig. 2.6 show the process of this approach in both directions, and the detailed operating flow is described as follows.

Fig. 2.5 The process diagram of sliding window approach in the forward direction

First, the received codeword is divided into many sub-blocks of length W. W is called the convergence length and is typically five times the constraint length of the encoder. For each sub-block i, the initial path metric values of both the forward and backward recursions are inherited from the neighboring sub-blocks. Note that in Fig. 2.5 the dummy backward recursion β1 is employed to obtain the initial path metric values for the true backward recursion β2. The initial condition for β1 is unknown except for the last sub-block, so we introduce the equal-probability condition for the β1 values:

$$\beta(x_t^j) = \frac{1}{M}, \quad \text{for all } j = 0, 1, \ldots, M-1, \tag{2.31}$$

where $x_t^j$ denotes the $j$-th state at time $t$, the last trellis section of β1, and $M$ is the total number of states. While the forward recursion α proceeds in the i-th sub-block and stores its values in memory, the dummy backward recursion β1 is performed concurrently in the (i+1)-th sub-block. As soon as β1 is finished, its final metrics serve as the initial metrics for β2. β2 then proceeds through the i-th sub-block using the corresponding branch metrics.

Fig. 2.6 shows the process diagram of the sliding window approach in the backward direction. The operation flow is similar to that of the forward direction, except that two forward recursions α and one backward recursion β are used.
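The forward-direction flow of Fig. 2.5 can be expressed as a time-slot schedule. This is a sketch of one assumed pipeline consistent with the description, not necessarily the exact timing of the figure. In slot t, α runs on window t, the dummy β1 runs on window t+1, and the true β2 (with LLR output) runs on window t−1, using initial metrics that β1 produced in slot t−1. The helper names are hypothetical.

```python
from typing import Dict, List, Optional


def dummy_beta_init(num_states: int) -> List[float]:
    """Equal-probability start for the dummy backward recursion beta1, as in (2.31)."""
    return [1.0 / num_states] * num_states


def sliding_window_schedule(n_windows: int) -> List[Dict[str, Optional[int]]]:
    """Which window each unit works on in every time slot (None = idle).

    The last window needs no beta1 because its end state is known from the
    trellis termination.
    """
    slots = []
    for t in range(n_windows + 1):
        slots.append({
            "alpha": t if t < n_windows else None,
            "beta1": t + 1 if t + 1 < n_windows else None,
            "beta2": t - 1 if t >= 1 else None,
        })
    return slots


if __name__ == "__main__":
    for t, slot in enumerate(sliding_window_schedule(4)):
        print(t, slot)
```

Only W forward metrics per window need to be buffered. The output latency grows with the window length W rather than with the block length.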

Fig. 2.6 The process diagram of sliding window approach in the backward direction

2.4 Tail-Biting Approach

Tail-biting convolutional codes were first developed by G. Solomon and H. C. A. van Tilborg [5] and were recognized as equivalent to quasi-cyclic block codes [6]. By the strict definition of convolutional codes (CCs), CCs can only be applied to semi-infinite sequences, i.e., encoding starts at time t = 0 in the all-zero state and goes on continuously. However, almost every communication system is block-oriented, so we must find methods to obtain finite-length code blocks from CCs. The standard solution is to append tail bits to the information sequence to force the encoder back to the all-zero state. This method avoids weak error protection for the last codeword bits; however, the tail bits cause some rate loss.

Tail-biting avoids the rate loss without suffering from degraded error protection at the end of the codeword. With the tail-biting technique, the starting state of the encoder is not necessarily the all-zero state; it can be any of the other states. The fundamental idea behind tail-biting is that the starting state should be the same as the ending state after encoding the information sequence, i.e., $x_0 = x_N$. In the trellis representation of tail-biting codes, only those paths that start and end at the same state are valid codewords.

2.4.1 Encoding tail-biting codes using feedback encoders

Let us consider a feedforward encoder first. For such an encoder, we only have to consider the last m input $k_0$-tuples of the information sequence to fulfill the tail-biting boundary condition $x_0 = x_N$. The situation is more complicated for feedback encoders, because the last encoding state $x_N$ depends on the entire information vector $\mathbf{u} = (u_0, \ldots, u_{N-1})$. Thus, for a given information vector $\mathbf{u}$, we must calculate the initial state $x_0$ that leads to the same state after N cycles. To solve this problem, we consider the state-space representation

$$x_{t+1} = A\,x_t + B\,u_t^T. \tag{2.32}$$

Solving the recursion by repeated substitution shows that the complete solution of (2.32) is the superposition of the zero-input solution and the zero-state solution:

$$x_t = A^t x_0 + \sum_{j=0}^{t-1} A^{t-1-j} B\,u_j^T = x_t^{[zi]} + x_t^{[zs]}. \tag{2.33}$$

If we demand that the state at time t = N equals the initial state $x_0$, we obtain from (2.33):

$$\left(A^N + I_m\right) x_0 = x_N^{[zs]}, \tag{2.34}$$

where $I_m$ denotes the m-by-m identity matrix and all operations are over GF(2). If, for a given information length N, the feedback encoder yields an invertible matrix $(A^N + I_m)$, the correct initial state $x_0$ can be calculated from the zero-state response $x_N^{[zs]}$.

The encoding process of the tail-biting convolutional code, shown in Fig. 2.7, is divided into two steps. First, the encoder starts from the all-zero state with the given information sequence to determine the zero-state response $x_N^{[zs]}$. From this response, the corresponding initial state $x_0$ is calculated by (2.34). Second, the encoder starts again from the correct initial state $x_0$, and a valid codeword results.

Fig. 2.7 The encoder process of tail-biting convolutional code

Since the matrix $(A^N + I_m)$ has to be invertible, not every code length is legal for a given feedback encoder. Moreover, some feedback encoders cannot be used for tail-biting at all. More detailed discussions can be found in [7], [8], and [9].
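The two-step procedure can be sketched over GF(2). The feedback encoder here is a hypothetical example whose feedback polynomial 1 + D^2 + D^3 is borrowed from the 3GPP2 constituent code as an assumption. Only the feedback part determines the state, so the parity outputs are omitted. Because 1 + D^2 + D^3 is primitive with period 7, (A^N + I_m) is singular exactly when N is a multiple of 7. This illustrates why not every length is legal.

```python
from typing import List, Optional

# Assumed feedback encoder with d(D) = 1 + D^2 + D^3 (illustrative choice).
# State x = [s1, s2, s3]; next s1 = u + s2 + s3, s2 = s1, s3 = s2 (mod 2).
A = [[0, 1, 1],
     [1, 0, 0],
     [0, 1, 0]]
B = [1, 0, 0]
M = 3  # number of memory elements (m in the text)


def mat_vec(m, v):
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in m]


def mat_mul(p, q):
    return [[sum(p[i][k] & q[k][j] for k in range(len(q))) % 2
             for j in range(len(q[0]))] for i in range(len(p))]


def mat_pow(m, n):
    result = [[int(i == j) for j in range(len(m))] for i in range(len(m))]
    for _ in range(n):                  # repeated multiplication is enough here
        result = mat_mul(result, m)
    return result


def solve_gf2(m, rhs) -> Optional[List[int]]:
    """Gauss-Jordan elimination over GF(2); returns None if m is singular."""
    size = len(m)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(m)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[col])]
    return [aug[i][size] for i in range(size)]


def run_encoder(x0, bits) -> List[int]:
    """Iterate x_{t+1} = A x_t + B u_t over GF(2); return the final state."""
    x = list(x0)
    for u in bits:
        x = [(ax + bv * u) % 2 for ax, bv in zip(mat_vec(A, x), B)]
    return x


def tail_biting_start(bits) -> Optional[List[int]]:
    """Initial state x0 with x_N = x0, i.e. solve (A^N + I_m) x0 = x_N^[zs]."""
    x_zs = run_encoder([0] * M, bits)   # step 1: zero-state response
    lhs = mat_pow(A, len(bits))
    for i in range(M):
        lhs[i][i] ^= 1                  # A^N + I_m (addition is XOR)
    return solve_gf2(lhs, x_zs)
```

Step 2 then simply re-runs the encoder from the returned x0, which ends in the same state and thus yields a valid tail-biting codeword.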

Chapter 3 The High Speed Turbo Decoder Design I

3.1 Introduction

Presented by Berrou et al. in 1993 [1], turbo codes have been recognized as a milestone in channel coding theory. Owing to their outstanding error-correcting capability, turbo codes have been widely adopted in wireless communications, where signal-to-noise ratios (SNRs) are generally low. Two commonly used soft-input–soft-output (SISO) turbo decoding algorithms are the maximum a posteriori probability (MAP) algorithm [2] and the soft-output Viterbi algorithm (SOVA) [4]. MAP-based turbo decoders are known to perform better than SOVA-based turbo decoders at the cost of slightly higher complexity.

Many studies have aimed to improve the speed of turbo decoders. Bickerstaff proposed a high-radix decoder [11]; Bougard introduced a full-duplex design [12]; Urard implemented a turbo decoder with five iterations in series [16]. These works increase the throughput by refining the architectures of the SISO decoders. A highly parallel structure might bring substantial improvement, but two difficulties have to be overcome.

One is the memory contention problem that results from high-radix processing and multiple processing elements; the other is the critical path residing in the add-compare-select (ACS) circuit. We propose a high-speed solution that resolves these two problems by using a novel interleaving method and modified MAP decoders. Some interleaving algorithms with contention-free properties have been published [9], and our design adopts the inter-block permutation (IBP) interleaver [13]. Then we exploit a high-radix MAP decoder with shorter
