The MAP algorithm

CHAPTER 2 TURBO CODE

2.2 Decoding Algorithms for Turbo Code

2.2.1 The MAP algorithm

The main idea of MAP algorithm is to compute the log-likelihood ratio (LLR) of the transmitted information bit u_k conditioned on the received information r_k for 1≦k≦N, where N is the block length of encoded message.

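The LLR is defined as

$$L(\hat{u}_k) = \log\frac{P(u_k=+1\mid \mathbf{r})}{P(u_k=-1\mid \mathbf{r})}$$

where $\mathbf{r}$ is the received sequence, and each received group $r_k$ contains as many values as the number of output bits for each encoded bit in the constituent code. Let's consider the trellis diagram of the turbo code in the 3GPP2 standard, which is shown in Fig. 2.4, as an example.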

Note that the solid lines represent the transitions corresponding to an information bit $u_k$ of -1, while the dotted lines represent the transitions corresponding to an information bit $u_k$ of +1.

Then, the equation can be further expressed as

$$L(\hat{u}_k) = \log\frac{\sum_{(s_{k-1},s_k):\,u_k=+1} P(s_{k-1},s_k,\mathbf{r})}{\sum_{(s_{k-1},s_k):\,u_k=-1} P(s_{k-1},s_k,\mathbf{r})}$$

where the numerator and denominator are the sums of joint probabilities over all existing transitions from state $s_{k-1}$ to state $s_k$ that correspond to an information bit $u_k$ of +1 and -1, respectively.

Fig. 2.4 Trellis diagram of turbo code in 3GPP2 standard

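Assume the encoded data is transmitted through a discrete memoryless channel (DMC); then the term $P(s_{k-1},s_k,\mathbf{r})$ can be decomposed into three terms:

$$P(s_{k-1},s_k,\mathbf{r}) = e^{\alpha_{k-1}(s_{k-1})}\cdot e^{\gamma_k(s_{k-1},s_k)}\cdot e^{\beta_k(s_k)}$$

Here $e^{\gamma_k(s_{k-1},s_k)}$ is the branch transition probability $P(s_k, r_k \mid s_{k-1})$, and $e^{\alpha_{k-1}(s_{k-1})}$ is the joint probability of state $s_{k-1}$ and the received symbols $r_j$ from the beginning of the block up to time index "k-1". Similarly, $e^{\beta_k(s_k)}$ is that of state $s_k$ and the received symbols $r_j$ from the end of the block back to time index "k". By shifting the value "k", it can be perceived that α is the forward recursion of the MAP algorithm, and can be formulated as

$$e^{\alpha_k(s_k)} = \sum_{s_{k-1}} e^{\alpha_{k-1}(s_{k-1})}\, e^{\gamma_k(s_{k-1},s_k)}$$

In the same way, the backward recursion β can be formulated as

$$e^{\beta_{k-1}(s_{k-1})} = \sum_{s_k} e^{\gamma_k(s_{k-1},s_k)}\, e^{\beta_k(s_k)}$$

Note that since the trellis of the turbo code diverges from state zero and converges to state zero, the initial conditions of the forward recursion and backward recursion should be set as

$$e^{\alpha_0(s_0)} = \begin{cases} 1, & s_0 = 0 \\ 0, & s_0 \neq 0 \end{cases} \qquad e^{\beta_N(s_N)} = \begin{cases} 1, & s_N = 0 \\ 0, & s_N \neq 0 \end{cases}$$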
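The forward and backward recursions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-state trellis, the function name `map_llr`, and the layout of `gamma` (one branch probability per transition and per step) are hypothetical and do not correspond to the 3GPP2 trellis of Fig. 2.4.

```python
import math

# Hypothetical 2-state trellis (an accumulator), not the 3GPP2 trellis of Fig. 2.4.
# Each entry is (previous state, next state, information bit u_k in {+1, -1}).
TRANSITIONS = [(0, 0, -1), (0, 1, +1), (1, 1, -1), (1, 0, +1)]
NUM_STATES = 2


def map_llr(gamma):
    """Return the LLR of every information bit.

    gamma[k][t] is the branch probability (e^gamma in the text) of
    transition t at trellis step k. The block needs at least two steps
    so that both hypotheses have a surviving path.
    """
    n = len(gamma)

    # Forward recursion: the trellis diverges from state zero.
    alpha = [[0.0] * NUM_STATES for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for k in range(n):
        for t, (s_prev, s_next, _) in enumerate(TRANSITIONS):
            alpha[k + 1][s_next] += alpha[k][s_prev] * gamma[k][t]

    # Backward recursion: the trellis converges to state zero.
    beta = [[0.0] * NUM_STATES for _ in range(n + 1)]
    beta[n][0] = 1.0
    for k in range(n - 1, -1, -1):
        for t, (s_prev, s_next, _) in enumerate(TRANSITIONS):
            beta[k][s_prev] += gamma[k][t] * beta[k + 1][s_next]

    # LLR: ratio of the joint probabilities of the +1 and -1 transitions.
    llrs = []
    for k in range(n):
        num = den = 0.0
        for t, (s_prev, s_next, u) in enumerate(TRANSITIONS):
            joint = alpha[k][s_prev] * gamma[k][t] * beta[k + 1][s_next]
            if u == +1:
                num += joint
            else:
                den += joint
        llrs.append(math.log(num / den))
    return llrs


# Uninformative branch probabilities, then a first step that favours u = +1.
uniform = [[1.0] * 4 for _ in range(3)]
biased = [[0.1, 0.9, 0.1, 0.9]] + uniform[1:]
```

With uninformative branch probabilities every LLR is zero, while biasing the first step towards $u_k = +1$ makes the first LLR positive.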

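The transition probability can be written as

$$e^{\gamma_k(s_{k-1},s_k)} = P(s_k, r_k \mid s_{k-1}) = P(r_k \mid u_k)\,P(u_k)$$

Here, the term $P(u_k)$ is well known as the a priori probability. According to the definition of LLR, which is

$$L(u_k) = \log\frac{P(u_k=+1)}{P(u_k=-1)}$$

the a priori probability can be expressed as

$$P(u_k) = A_k\, e^{u_k L(u_k)/2}$$

where the term $A_k$ is equal for all transitions at the same time index, and thus will cancel out in (2.3). On the other hand, the value of $P(r_k|u_k)$ depends on the channel characteristics. For an additive white Gaussian noise (AWGN) channel, the probability of $r_k$ conditioned on $u_k$ can be expressed as

$$P(r_k \mid u_k) = B_k \exp\!\Big(\frac{L_c}{2}\sum_{v} r_{k,v}\, x_{k,v}\Big)$$

where $B_k$, like $A_k$, is common to all transitions at the same time index,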

where $L_c = 4E_s/N_0$ is called the channel reliability. Here, $x_{k,v}$ is the v-th transmitted symbol while encoding $u_k$. For systematic codes, $x_{k,1}$ is equal to $u_k$. Now we can obtain the value of $L(\hat{u}_k)$, which can be written as

$$L(\hat{u}_k) = L_c\, r_{k,1} + L(u_k) + L_{ex}(u_k)$$

The term $L_{ex}(u_k)$ is called extrinsic information since it is a function of the redundant information that comes from the encoder. It removes the information about the systematic input and the a priori information from $L(\hat{u}_k)$. Therefore, this term is useful for estimating the a priori probability for the next component decoder, and great performance improvement in iterative MAP decoding can be achieved.

2.2.2 The Log-MAP algorithm

It can be figured out easily that the Max-Log-MAP algorithm is a sub-optimal solution for turbo decoding, since the approximation of (2.21) is used to reduce the complexity of the MAP algorithm. This problem can be solved by the Log-MAP algorithm [24]. It employs the Jacobian algorithm

$$\log(e^{\delta_1}+e^{\delta_2}) = \max(\delta_1,\delta_2) + \log\big(1+e^{-|\delta_1-\delta_2|}\big) = \max(\delta_1,\delta_2) + f_c(|\delta_1-\delta_2|)$$

where $f_c(|\delta_1-\delta_2|)$ is a correction function, and thus the performance can be improved. It has been proved that (2.21) can be computed exactly by a recursive operation of (2.25) [10].

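Substituting (2.18) and (2.19) into (2.25), the forward and backward recursions can be represented as

$$\alpha_k(s_k) = \max^{*}_{s_{k-1}}\big(\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1},s_k)\big)$$

$$\beta_{k-1}(s_{k-1}) = \max^{*}_{s_k}\big(\beta_k(s_k) + \gamma_k(s_{k-1},s_k)\big)$$

where the max*(.) operation is defined as

$$\max{}^{*}(\delta_1,\delta_2) = \max(\delta_1,\delta_2) + f_c(|\delta_1-\delta_2|)$$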

The performance of the Log-MAP algorithm is identical to that of the MAP algorithm. However, its complexity is higher than that of the Max-Log-MAP algorithm, since computing $f_c(\cdot)$ still involves complicated exponentiations and multiplications. Thus, the values of $f_c(\cdot)$ are usually stored in a pre-computed table, and the Log-MAP algorithm can be implemented by table look-up. It has been found that excellent performance can be obtained with 8 stored values and $|\delta_1-\delta_2|$ ranging between 0 and 5, and no improvement is achieved with a finer representation [10].
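The table look-up idea can be illustrated with a short Python sketch. The exact form uses the Jacobian logarithm, and the look-up version stores 8 values of the correction term over the range 0 to 5 as the text describes. How those 8 values are placed (uniform bins sampled at their midpoints) and all function names are assumptions made for this example only.

```python
import math


def max_star_exact(d1, d2):
    """Jacobian logarithm: log(e^d1 + e^d2) = max(d1, d2) + log(1 + e^-|d1 - d2|)."""
    return max(d1, d2) + math.log1p(math.exp(-abs(d1 - d2)))


# Assumed table layout: 8 uniform bins on [0, 5), each sampled at its midpoint.
TABLE_SIZE = 8
TABLE_RANGE = 5.0
STEP = TABLE_RANGE / TABLE_SIZE
CORRECTION_TABLE = [
    math.log1p(math.exp(-(i + 0.5) * STEP)) for i in range(TABLE_SIZE)
]


def max_star_lut(d1, d2):
    """max* with the correction term read from the pre-computed table."""
    diff = abs(d1 - d2)
    if diff >= TABLE_RANGE:
        return max(d1, d2)  # correction is negligible beyond the table range
    return max(d1, d2) + CORRECTION_TABLE[int(diff / STEP)]


def max_star_n(values, op=max_star_exact):
    """Apply max* recursively so that more than two terms can be combined."""
    acc = values[0]
    for value in values[1:]:
        acc = op(acc, value)
    return acc
```

Applying `max_star_n` with the exact operator reproduces the logarithm of a sum of exponentials, while passing `op=max_star_lut` trades a small error for the removal of the exponentiation.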

2.2.3 The Max-Log-MAP algorithm

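As we can see, the MAP algorithm involves too many exponentiations and multiplications. These are quite complex for hardware realization. Thus, an approximation of the MAP algorithm termed the Max-Log-MAP algorithm [24] was derived for simple implementation of MAP decoders. Instead of calculating $e^{\gamma_k}$, $e^{\alpha_k}$, and $e^{\beta_k}$ directly, all computations are done in the logarithm domain. Here we define $\gamma_k$, $\alpha_k$, and $\beta_k$ as the transition metric, forward path metric, and backward path metric, respectively. $\gamma_k$ can be formulated as

$$\gamma_k(s_{k-1},s_k) = \log P(s_k, r_k \mid s_{k-1})$$

and $\alpha_k$ and $\beta_k$ are the logarithms of the forward and backward probabilities of the MAP algorithm, respectively. After substituting (2.17), (2.18), and (2.19), $L(\hat{u}_k)$ in (2.15) can be rewritten as

$$L(\hat{u}_k) = \log\sum_{u_k=+1} e^{\alpha_{k-1}(s_{k-1})+\gamma_k(s_{k-1},s_k)+\beta_k(s_k)} - \log\sum_{u_k=-1} e^{\alpha_{k-1}(s_{k-1})+\gamma_k(s_{k-1},s_k)+\beta_k(s_k)}$$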

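By utilizing the approximation of

$$\log(e^{\delta_1}+e^{\delta_2}+\cdots+e^{\delta_n}) \approx \max(\delta_1,\delta_2,\ldots,\delta_n),$$

(2.27) can be further simplified to

$$L(\hat{u}_k) \approx \max_{u_k=+1}\big(\alpha_{k-1}(s_{k-1})+\gamma_k(s_{k-1},s_k)+\beta_k(s_k)\big) - \max_{u_k=-1}\big(\alpha_{k-1}(s_{k-1})+\gamma_k(s_{k-1},s_k)+\beta_k(s_k)\big)$$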

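The forward and backward recursions that repetitively compute $\alpha_k$ and $\beta_k$ can be expressed by

$$\alpha_k(s_k) = \max_{s_{k-1}}\big(\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1},s_k)\big)$$

and

$$\beta_{k-1}(s_{k-1}) = \max_{s_k}\big(\beta_k(s_k) + \gamma_k(s_{k-1},s_k)\big)$$

Both equations are add-compare-select (ACS) operations, which are similar to the path metric updating of the Viterbi algorithm.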
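A minimal sketch of the Max-Log-MAP recursions as add-compare-select operations is given below. The 2-state trellis and the names `max_log_map_llr` and `gamma` are hypothetical choices for illustration; `gamma[k][t]` is assumed to hold the log-domain transition metric of transition `t` at step `k`.

```python
import math

NEG_INF = float("-inf")
# Hypothetical 2-state trellis: (previous state, next state, bit u_k in {+1, -1}).
TRANSITIONS = [(0, 0, -1), (0, 1, +1), (1, 1, -1), (1, 0, +1)]
NUM_STATES = 2


def max_log_map_llr(gamma):
    """Max-Log-MAP LLRs; the block needs at least two trellis steps."""
    n = len(gamma)

    # Forward path metrics, starting in state zero.
    alpha = [[NEG_INF] * NUM_STATES for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for k in range(n):
        for t, (s_prev, s_next, _) in enumerate(TRANSITIONS):
            candidate = alpha[k][s_prev] + gamma[k][t]                    # add
            alpha[k + 1][s_next] = max(alpha[k + 1][s_next], candidate)  # compare, select

    # Backward path metrics, ending in state zero.
    beta = [[NEG_INF] * NUM_STATES for _ in range(n + 1)]
    beta[n][0] = 0.0
    for k in range(n - 1, -1, -1):
        for t, (s_prev, s_next, _) in enumerate(TRANSITIONS):
            candidate = gamma[k][t] + beta[k + 1][s_next]             # add
            beta[k][s_prev] = max(beta[k][s_prev], candidate)         # compare, select

    # LLR: best +1 path metric minus best -1 path metric.
    llrs = []
    for k in range(n):
        best = {+1: NEG_INF, -1: NEG_INF}
        for t, (s_prev, s_next, u) in enumerate(TRANSITIONS):
            metric = alpha[k][s_prev] + gamma[k][t] + beta[k + 1][s_next]
            best[u] = max(best[u], metric)
        llrs.append(best[+1] - best[-1])
    return llrs


# Flat metrics, then a first step that favours u = +1.
flat = [[0.0] * 4 for _ in range(3)]
biased = [[math.log(0.1), math.log(0.9), math.log(0.1), math.log(0.9)]] + flat[1:]
```

No exponentiation or logarithm appears inside the recursions, which is what makes this form attractive for hardware.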

2.2.4 SNR sensitivity of Max-Log-MAP and Log-MAP algorithm

Referring to (2.13) and the deductions that follow it, it is evident that both the MAP and Log-MAP algorithms require SNR estimation to obtain the value of the channel reliability, i.e. $L_c$. Unfortunately, accurate estimation cannot be achieved easily. Several papers have discussed the effect of SNR mismatch in turbo decoding. In [25], the simulations show that an SNR estimation offset of about -3 to +6 dB is tolerable before significant performance degradation occurs.

However, the Max-Log-MAP algorithm is able to provide an SNR-independent scheme if the a priori information is initialized with a reasonable value, such as all zeros for each state [26]. Due to the linearity of the max(.) operation, the term $L_c$ can be canceled out while computing $L(\hat{u}_k)$. The comparison of the Max-Log-MAP and Log-MAP algorithms under different SNR estimation offsets was made in [26].

Although the Log-MAP algorithm performs better than the Max-Log-MAP algorithm, it runs the risk of a serious SNR mismatch offset. Thus, channel characteristics play an important role in practical implementation. It has been concluded in [26] that if the channel characteristics change over time, the Max-Log-MAP decoder is suitable as the constituent decoder in turbo decoding. Otherwise, the Log-MAP decoder should be preferable in terms of coding gain.
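The linearity argument can be checked numerically. In the sketch below, the lists of path metrics and the factor `scale` (standing in for a mismatched channel reliability) are made-up numbers; the two helper functions are simplified stand-ins that operate on already computed path metrics rather than on a full decoder.

```python
import math


def max_log_llr(metrics_plus, metrics_minus):
    """Max-Log-MAP output: best +1 path metric minus best -1 path metric."""
    return max(metrics_plus) - max(metrics_minus)


def log_map_llr(metrics_plus, metrics_minus):
    """Log-MAP output: log-sum-exp over the +1 paths minus that over the -1 paths."""
    def log_sum_exp(values):
        return math.log(sum(math.exp(v) for v in values))
    return log_sum_exp(metrics_plus) - log_sum_exp(metrics_minus)


# Hypothetical path metrics and a mismatch factor applied to every metric.
plus, minus = [1.0, 0.2], [0.8, 0.9]
scale = 2.0
plus_scaled = [scale * v for v in plus]
minus_scaled = [scale * v for v in minus]

max_log_error = abs(max_log_llr(plus_scaled, minus_scaled) - scale * max_log_llr(plus, minus))
log_map_error = abs(log_map_llr(plus_scaled, minus_scaled) - scale * log_map_llr(plus, minus))
```

Scaling every metric by a positive constant only scales the Max-Log-MAP output, so its sign and therefore the hard decision are unchanged, whereas the Log-MAP output does not scale linearly.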

2.3 Sliding Window Approach

As described in the previous section, the MAP-based algorithms (including the MAP, Max-Log-MAP, and Log-MAP algorithms) require both forward and backward path metrics to calculate the log-likelihood ratio. Since the forward and backward recursions start from different initial points, the entire block has to be received and stored before the forward and backward recursions can be computed. Furthermore, we have to store the path metrics of either the forward or the backward recursion and wait for the other. These restrictions enlarge the memory requirement for hardware implementation of a turbo decoder. For example, the maximum block length of the 3GPP standard is 5114, which means 5114 codewords and path metrics should be stored. Besides, long output latency is also introduced. It limits the speed and throughput of the turbo decoder design.

The main problem is that a long block cannot be divided into several short sub-blocks directly, since the lack of boundary path metrics of the sub-blocks in the direction opposite to the input sequence will degrade the performance. Thus, a sliding window approach was proposed in [27] and later in [28] to overcome this drawback. This approach utilizes the fact that the backward path metrics can be highly reliable even without knowing the initial state if the backward recursion goes long enough. Fig. 2.5 and Fig. 2.6 show the process of this approach in both directions, and the detailed operating flow is described as follows.


Fig. 2.5 The process diagram of sliding window approach in the forward direction

First, the received codeword is divided into many sub-blocks, with a sub-block length of W. W is called the convergence length and is typically five times the constraint length of the encoder. For each sub-block i, the initial path metric values are inherited from the neighboring sub-blocks for both forward and backward recursion operations. Note that in Fig. 2.5 the dummy backward recursion β1 is employed to obtain the initial path metric values for the true backward recursion β2. Although the initial condition for β1 is unknown except in the last sub-block, we introduce the equal probability condition for the β1 values:

$$\beta_1(x_t^{j}) = \frac{1}{M}, \quad \text{for all } j = 0, 1, \ldots, M-1 \qquad (2.31)$$

where $\beta_1(x_t^{j})$ denotes the path metric of the j-th state at time t, the last trellis section of β1, and M is equal to the total number of states. While the forward recursion α proceeds in the i-th sub-block and stores its values into memory, the dummy backward recursion β1 is performed in the (i+1)-th sub-block concurrently. As soon as the β1 computation is finished, the initial metrics obtained from the (i+1)-th sub-block are available for computing the β2 metrics, together with the corresponding branch metrics, in the i-th sub-block.
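The flow for the backward metrics can be sketched as follows, under several assumptions: a hypothetical 2-state trellis, max-log arithmetic, a dummy recursion that spans exactly one following sub-block, and equal start metrics written as zeros (the log-domain counterpart of the equal probability condition with the common constant dropped). The function names are invented for this example.

```python
NEG_INF = float("-inf")
# Hypothetical 2-state trellis; each entry is (previous state, next state).
TRANSITIONS = [(0, 0), (0, 1), (1, 1), (1, 0)]
NUM_STATES = 2


def boundary_metrics(time, n):
    """Known termination in state zero at the block end, equal metrics elsewhere."""
    if time == n:
        return [0.0] + [NEG_INF] * (NUM_STATES - 1)
    return [0.0] * NUM_STATES  # equal probabilities, constant offset dropped


def backward(gamma, start, stop, init):
    """Max-log backward recursion from time `stop` down to time `start`.

    Returns the metrics for times start..stop (index 0 is time `start`).
    """
    betas = [None] * (stop - start + 1)
    betas[-1] = list(init)
    for k in range(stop - 1, start - 1, -1):
        current = [NEG_INF] * NUM_STATES
        for t, (s_prev, s_next) in enumerate(TRANSITIONS):
            candidate = gamma[k][t] + betas[k + 1 - start][s_next]
            current[s_prev] = max(current[s_prev], candidate)
        betas[k - start] = current
    return betas


def sliding_window_beta(gamma, window):
    """Backward metrics of the whole block, computed one sub-block at a time."""
    n = len(gamma)
    result = [None] * (n + 1)
    for start in range(0, n, window):
        stop = min(start + window, n)
        if stop == n:
            init = boundary_metrics(n, n)  # last sub-block: condition is known
        else:
            # Dummy recursion (beta1) over the next sub-block supplies the
            # initial metrics of the true recursion (beta2).
            dummy_stop = min(stop + window, n)
            dummy = backward(gamma, stop, dummy_stop, boundary_metrics(dummy_stop, n))
            init = dummy[0]
        result[start:stop + 1] = backward(gamma, start, stop, init)
    return result
```

In this sketch the last two sub-blocks reproduce the full-block recursion exactly because their dummy recursion starts from the known termination; earlier sub-blocks are approximations whose quality depends on the window length.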

Fig. 2.6 shows the process diagram of sliding window approach in the backward direction. The operation flow is similar to the forward direction type except for two forward recursions α and one backward recursion β.

Fig. 2.6 The process diagram of sliding window approach in the backward direction

2.4 Tail-Biting Approach

Tail-biting convolutional codes were first developed by G. Solomon and H. C. A. van Tilborg [5] and recognized as equivalent to quasi-cyclic block codes [6]. From the strict definition of convolutional codes (CCs), it is clear that CCs can only be applied to semi-infinite sequences, i.e., encoding starts at time t = 0 in the all-zero state and goes on continuously. But since almost every communication system is block-oriented, we must find methods to obtain finite-length code blocks of CCs. The standard solution is to add some bits at the tail of the information sequence to force the encoder back to the all-zero state. This method avoids weak error protection for the last codeword bits; however, it causes some rate loss due to the tail bits.

Tail-biting avoids the rate loss without suffering from degraded error protection at the end of the codeword. With the tail-biting technique, the starting state of the encoder is not necessarily the all-zero state; it can also be any one of the other states. The fundamental idea behind tail-biting is that the starting state should be the same as the ending state after encoding the information sequence, i.e., $x_0 = x_N$. In the trellis representation of tail-biting codes, only those paths that start and end at the same state are valid codewords.

2.4.1 Encoding tail-biting codes using feedback encoders

Let us consider a feedforward encoder first. It is obvious that we only have to consider the last m input $k_0$-tuples of the information sequence to fulfill the tail-biting boundary condition $x_0 = x_N$. But the situation is more complicated for feedback encoders. The last encoding state $x_N$ depends on the entire information vector $u = (u_0, \ldots, u_{N-1})$. Thus, we must calculate, for a given information vector u, the initial state $x_0$ that will lead to the same state after N cycles. To solve this problem, we consider the state representation:

$$x_{t+1} = A\,x_t + B\,u_t \qquad (2.32)$$

Solving this iterated function by substitution, we find that the complete solution of (2.32) equals the superposition of the zero-input solution $x_t^{[zi]}$ and the zero-state solution $x_t^{[zs]}$:

$$x_t = x_t^{[zi]} + x_t^{[zs]} = A^{t} x_0 + \sum_{j=0}^{t-1} A^{(t-1)-j} B\,u_j \qquad (2.33)$$

If we demand that the state at time t = N is equal to the initial state $x_0$, we obtain from (2.33), with all operations over GF(2) so that subtraction coincides with addition:

$$(A^{N} + I_m)\,x_0 = x_N^{[zs]} \qquad (2.34)$$

where $I_m$ denotes the m-by-m identity matrix. If a feedback encoder with a certain information length N provides an invertible matrix $(A^{N} + I_m)$, the correct initial state $x_0$ can be calculated from the zero-state response $x_N^{[zs]}$.

The encoding process of the tail-biting convolutional code, shown in Fig. 2.7, is divided into two steps. First, the encoder starts from the all-zero state with the given information sequence to determine the zero-state response $x_N^{[zs]}$. Knowing the zero-state response, we can calculate the corresponding initial state $x_0$ by (2.34). Second, the encoder starts from the correct initial state $x_0$, and a valid codeword results.

Fig. 2.7 The encoder process of tail-biting convolutional code

Since the matrix $(A^{N} + I_m)$ has to be invertible, not every code length is legal for a given feedback encoder. Moreover, some feedback encoders cannot be made tail-biting at all. Detailed discussions can be found in [7], [8], and [9].
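The two-step procedure can be sketched in Python over GF(2). The encoder below, with two memory elements and feedback polynomial 1 + D + D^2, is a hypothetical example chosen because its state matrix is small; the exhaustive search over initial states is a simplification that is only reasonable for small m and stands in for an explicit matrix inversion.

```python
from itertools import product

# Hypothetical feedback encoder with m = 2 memory elements (feedback 1 + D + D^2).
A = [[1, 1], [1, 0]]   # state matrix of x_{t+1} = A x_t + B u_t over GF(2)
B = [1, 0]             # input vector
M_STATE = len(A)


def mat_vec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) % 2 for row in mat]


def mat_mul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(M_STATE)) % 2
             for j in range(M_STATE)] for i in range(M_STATE)]


def final_state(x0, bits):
    """Run the state recursion over all information bits."""
    x = list(x0)
    for bit in bits:
        ax = mat_vec(A, x)
        x = [(ax[i] + B[i] * bit) % 2 for i in range(M_STATE)]
    return x


def tail_biting_initial_state(bits):
    """Step 1: zero-state response. Step 2: solve (A^N + I_m) x0 = x_N^[zs]."""
    x_zs = final_state([0] * M_STATE, bits)
    a_pow = [[int(i == j) for j in range(M_STATE)] for i in range(M_STATE)]
    for _ in range(len(bits)):
        a_pow = mat_mul(a_pow, A)
    system = [[(a_pow[i][j] + int(i == j)) % 2 for j in range(M_STATE)]
              for i in range(M_STATE)]
    # Exhaustive search over the 2^m candidate states (fine for small m).
    solutions = [list(c) for c in product((0, 1), repeat=M_STATE)
                 if mat_vec(system, list(c)) == x_zs]
    if len(solutions) != 1:
        raise ValueError("A^N + I_m is not invertible for this block length")
    return solutions[0]
```

For this particular state matrix, A^3 is the identity, so block lengths that are multiples of 3 are rejected, which illustrates why not every code length is legal for a given feedback encoder.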

Chapter 3 The High Speed Turbo Decoder Design I

3.1 Introduction

Presented by Berrou et al. in 1993 [1], turbo codes have been recognized as a milestone in the channel coding theory. Due to their outstanding error-correcting capabilities, turbo codes have been highly appreciated in wireless communications, where signal-to-noise ratios (SNRs) are generally low. Two commonly used soft-input–soft-output (SISO) turbo decoding algorithms are maximum a posteriori probability (MAP) algorithm [2] and soft-output Viterbi algorithm (SOVA) [4]. MAP-based turbo decoders are known to have better performance than SOVA-based turbo decoders while having slightly larger complexity.

Much research has been devoted to improving the speed of turbo decoders. Bickerstaff proposed a high-radix decoder [11]; Bougard introduced a full-duplex design [12]; Urard implemented a five-iteration series turbo decoder [16]. These works increase the throughput by refining the architectures of the SISO decoders. A highly parallel structure might be a solution for substantial improvement, but there are two difficulties that have to be overcome.

One is the memory contention problem resulting from high-radix processing and multiple processing elements; the other is the critical path residing in the add-compare-select (ACS) circuit. We propose a high-speed solution that resolves these two problems by using a novel interleaving method and modifying the MAP decoders. Some interleaving algorithms with contention-free properties have been published [9], and our design adopts the inter-block permutation (IBP) interleaver [13]. Then we exploit a high-radix MAP decoder with a shorter critical path to increase the data rate [14]. The proposed turbo decoder provides both high throughput and outstanding energy efficiency while maintaining performance equivalent to that of the 3GPP turbo code.

3.2 Decoder Structure

For high-speed turbo decoder design, two types of architectures are generally proposed in the state of the art. Fig. 3.1 shows these architectures: the series architecture and the parallel architecture. The series architecture instantiates as many processing elements as there are iterations, and each processing element decodes the codeword for only one iteration. After decoding, each processing element passes the extrinsic values to the next element. This architecture is easy to implement, but the hardware cost is very high. The parallel architecture decodes one codeword with multiple decoders. This architecture is more flexible since the number of decoders can vary with the specification. The major problem of this architecture is how to decode one block codeword with multiple decoders. The forward recursion and the backward recursion run through the whole codeword, so some technique must be applied to separate them. In the following, we introduce our proposed design, which uses the parallel architecture and solves this problem.

Fig. 3.1 Block diagram of proposed turbo decoder

Fig. 3.2 shows the block diagram of the proposed decoder, which consists of 32 parallel MAP decoders and 32 parallel memory sets. We separate a codeword into 32 sub-codewords of length 128. Each sub-codeword is assigned to one decoder and decoded separately. These sub-codewords are connected by a well-designed inter-block permutation (IBP) interleaver. This method avoids the forward and backward recursion problem while using the parallel architecture.

The decoding process is as follows. First, each memory collects a 128-bit sub-codeword from the input buffer until the whole 4096-bit codeword is received. The memory stores the received symbols and extrinsic information, and is divided into two banks to support the radix-4 design. Second, the 32 memories deliver the required data to the 32 MAP decoders through the IBP network, which is part of the interleaver. The interleaver is implemented with the address generators in each memory and the network controller. The MAP decoders perform the primary decoding procedures, and each one is responsible for 128 bits. After 8 iterations, the design outputs the decisions of the current block and starts to decode the next block.

Fig. 3.2 Block diagram of proposed turbo decoder

3.3 Interleaver Design for High Speed Turbo Code

3.3.1 Contention-free Interleaver

To increase throughput, a log-MAP decoder is parallelized by dividing a size-N trellis into M size-W windows (N = MW) and employing M synchronous MAP-based decoders with M separate memory banks. Interleaving latency is eliminated by writing the M values generated in each clock cycle directly to their interleaved positions. However, if the interleaver is not designed carefully, two or more MAP-based decoders may require access to the same memory bank in a given clock cycle, resulting in memory contention. Moreover, a high-radix decoding structure also suffers from the memory contention problem when accessing multiple codeword symbols from the memories. Fig. 3.3 shows an example of the memory contention problem in a parallel decoding structure. We store a codeword sequence in order in four different memory banks. With the pre-permutation order, the access is obviously contention-free at every time instant, but memory contention may occur once an interleaver is applied. Post-permutation 1 is a contention-free interleaver design, because the four symbols accessed at each time instant come from different memory banks. The interleaver design of post-permutation 2 suffers two contention collisions, at times t0 and t3.

Fig. 3.3 Example of a contention-free permutation
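The contention condition of this section can be checked mechanically. The sketch below assumes that symbol address a is stored in bank a // W (the sequence is stored in order across the banks) and that decoder d reads the interleaved address of its t-th symbol at time t. The function name and the two small permutations are made up for illustration and are not the ones drawn in Fig. 3.3.

```python
def has_contention(interleaver, num_banks):
    """Return True if two decoders ever address the same bank in one cycle.

    interleaver[i] is the memory address read for trellis position i; the
    block length is assumed to be a multiple of num_banks.
    """
    window = len(interleaver) // num_banks
    for t in range(window):
        # Bank hit by each of the parallel decoders at time t.
        banks = [interleaver[d * window + t] // window for d in range(num_banks)]
        if len(set(banks)) != num_banks:
            return True
    return False


# Hypothetical permutations of 8 symbols over 4 banks (window length 2).
contention_free = [2, 5, 4, 7, 6, 1, 0, 3]
colliding = [0, 2, 1, 3, 4, 6, 5, 7]
```

In the second permutation, the first two decoders both read from bank 0 at time 0, so a parallel decoder would have to stall or buffer one of the accesses.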

3.3.2 IBP Interleaver

The IBP interleaver in [13] favors both the performance and the throughput of the turbo decoder. This method guarantees no hazards when multiple MAP decoders try to access multiple memories concurrently. The IBP interleaver consists of two permutation steps: intra-block permutation and inter-block permutation. The first step rearranges the symbol sequence in each sub-block with the same rule. The second step swaps the sequences between blocks periodically. The destination can be derived by executing a bit-wise exclusive-or between the original block index and the IBP parameter. Fig. 3.4 demonstrates an example of an IBP interleaver with four sub-blocks. First, all sub-blocks are individually reordered by a right rotation; second, data are exchanged among these permuted sequences.
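The two permutation steps can be sketched as follows. The text does not specify how the IBP parameter changes over time, so this example assumes, purely for illustration, that the parameter at position t is t modulo the number of sub-blocks; the rotation amount and the function name are likewise hypothetical, and the number of sub-blocks is assumed to be a power of two.

```python
def ibp_interleave(blocks, rotate=1):
    """Intra-block right rotation followed by an XOR-based inter-block swap."""
    num_blocks = len(blocks)      # assumed to be a power of two
    length = len(blocks[0])

    # Step 1: intra-block permutation, the same right rotation in every sub-block.
    rotated = [blk[-rotate:] + blk[:-rotate] for blk in blocks]

    # Step 2: inter-block permutation; destination = block index XOR parameter.
    out = [[None] * length for _ in range(num_blocks)]
    for t in range(length):
        parameter = t % num_blocks   # assumed periodic IBP parameter
        for b in range(num_blocks):
            out[b ^ parameter][t] = rotated[b][t]
    return out


# Four sub-blocks of four symbols each, labelled 0..15.
example = [[4 * b + t for t in range(4)] for b in range(4)]
permuted = ibp_interleave(example)
```

Because XOR with a fixed parameter is a bijection on the block indices, every destination sub-block receives exactly one symbol at each position, which is the property that keeps concurrent accesses free of hazards.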

Fig. 3.4 An example of IBP interleaver with four sub-blocks

3.3.3 Butterfly network

The butterfly network is designed to perform the inter-block permutation in the IBP interleaver. This structure also avoids the memory contention problem between sub-blocks and reduces the circuit complexity. Fig. 3.5 shows the corresponding structure for the example illustrated in Fig. 3.4. The network is divided into two levels, and each level has one external signal to control the multiplexers. S0 and S1 define four possible connections. In general, the butterfly network links N memories to N MAP decoders by $\log_2 N$ levels of switches. Each level requires a 1-bit control signal to manage its N multiplexers; the total $\log_2 N$ bits establish N possible connections.
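A behavioural model of such a network fits in a few lines. The assignment of control bit `level` to index bit `level` is an assumption of this sketch (the actual wiring is defined by Fig. 3.5), and the function name is hypothetical.

```python
def butterfly(inputs, control_bits):
    """Route N inputs through log2(N) levels of 2-to-1 multiplexers.

    control_bits[level] == 1 makes every multiplexer of that level take the
    input whose index differs in bit `level`; 0 passes the data straight through.
    """
    data = list(inputs)
    for level, bit in enumerate(control_bits):
        if bit:
            data = [data[i ^ (1 << level)] for i in range(len(data))]
    return data


# The four connections of a 4x4 network, one per (S0, S1) setting.
memories = ["m0", "m1", "m2", "m3"]
connections = {(s0, s1): butterfly(memories, [s0, s1])
               for s0 in (0, 1) for s1 in (0, 1)}
```

Under this model the output at port i is the input at index i XOR the control word, which matches the exclusive-or rule used by the inter-block permutation.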

Fig. 3.5 A 4x4 butterfly network for IBP interleaver

3.3.4 Double prime interleaver

All the data inside each block are divided into two groups and stored in two separate memory banks. When the radix-4 MAP decoders request two symbols in each cycle, these two symbols must come from different memory banks. This is another contention problem that we should be aware of. Our design uses the double prime interleaver to resolve this problem. The double prime interleaver is constructed from two prime interleavers whose function is expressed by

$$\pi(i) = \begin{cases} \Big(\big(p \times \lfloor i/2 \rfloor + s\big) \bmod \dfrac{L}{2}\Big) \times 2 + 1, & i \text{ is odd} \\[2mm] \Big(\big(p \times \lfloor i/2 \rfloor + s\big) \bmod \dfrac{L}{2}\Big) \times 2, & i \text{ is even} \end{cases} \qquad (3.1)$$

Here L is the block length, and it must be an even number. Note that p must be relatively prime to L/2, and s is a constant shift. Both the interleaver and de-interleaver can be expressed by (3.1) with different parameters. A double prime interleaver with well-searched parameters would outperform the interleaver in 3GPP turbo coding. Most important of all, a well-designed double prime interleaver is a fully contention-free interleaver for certain sub-block lengths. For example, we can choose any factor of the sub-block length as the parallel access number and the memory bank number, and the interleaver remains contention-free.
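As an illustration, the sketch below assumes that (3.1) has the form π(i) = 2·((p·⌊i/2⌋ + s) mod L/2) + (i mod 2), and that the two memory banks hold the even and the odd addresses respectively. Both points are assumptions of this example, as are the parameter values and the function name.

```python
from math import gcd


def double_prime_interleaver(length, p, s):
    """Return the permutation pi(0..length-1) under the assumed form of (3.1)."""
    if length % 2 != 0:
        raise ValueError("block length L must be even")
    half = length // 2
    if gcd(p, half) != 1:
        raise ValueError("p must be relatively prime to L/2")
    # Pairs (2k, 2k+1) move together; the parity of the address is preserved.
    return [((p * (i // 2) + s) % half) * 2 + (i % 2) for i in range(length)]


# Hypothetical parameters: L = 8, p = 3, s = 1.
pi = double_prime_interleaver(8, 3, 1)

# A radix-4 decoder reads positions 2k and 2k+1 in the same cycle; with banks
# split by address parity the two reads never collide.
pair_banks = [(pi[2 * k] % 2, pi[2 * k + 1] % 2) for k in range(4)]
```

Since p is relatively prime to L/2, the map on pair indices is a bijection, so the result is a valid permutation in which every pair of simultaneously requested symbols lands in different banks.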

3.4 High-Throughput MAP Decoders

3.4.1 Retimed radix-2x2 ACS unit

For trellis-based decoders, the branch number of conventional high-radix design increases exponentially however the branch number of the two-stage structure increases
