
CHAPTER 1 INTRODUCTION

1.3 Thesis Organization

This thesis consists of seven chapters. Chapter 2 explains the turbo coding and decoding algorithms and their related techniques. The reader is assumed to be familiar with the Viterbi algorithm, so only a brief description is given in Chapter 3. Chapter 4 explains how we decide the fixed-point resolution in our design and presents some simulation results. Chapter 5 presents the proposed architecture. For clarity, the operating flows for turbo mode and Viterbi mode are discussed separately, and several characteristics of our design are also described. Chapter 6 outlines the specification of our implemented chip and provides comparisons with similar works. Finally, conclusions and future work are given in Chapter 7.

Chapter 2

Turbo Coding and Decoding

The parallel concatenated convolutional code (PCCC), named turbo code [3], was first proposed by C. Berrou, A. Glavieux, and P. Thitimajshima in 1993. It has been shown to achieve performance close to the Shannon limit using simple constituent codes concatenated through an interleaver. Owing to its excellent error-correction ability, this technique has been adopted in both the 3GPP and 3GPP2 standards. In this chapter, we describe the principles of turbo encoding and turbo decoding. The error floor effect in turbo decoding and several decoding techniques are also explained.

2.1 Principle of Turbo codec

2.1.1 Turbo Encoding

The turbo encoder is composed of two recursive systematic convolutional (RSC) encoders, which are connected in parallel but separated by a turbo interleaver. The two RSC encoders are also called the constituent codes of the turbo code. The block diagram of the turbo encoder is illustrated in Fig. 2.1. Note that the same input data are encoded by each RSC encoder but in a different order. In the 3GPP2 standard, each RSC encoder encodes every input bit into one systematic bit and two parity-check bits, so the code rate of each component encoder is 1/3. To increase the code rate of the turbo code, the systematic bits of the second RSC encoder (X') are not transmitted. Therefore, the output encoded sequence is {X, Y0, Y1, Y0', Y1'}, and the overall code rate is 1/5.

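[Figure: the input message feeds the first constituent encoder (outputs X, Y0, Y1) and, through the turbo interleaver, the second constituent encoder (outputs X', Y0', Y1'); each encoder has a control input]

Fig. 2.1 Turbo encoder for 3GPP2 standard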

After encoding all input messages, we have to generate several tail bits to set both component encoders back to the zero state. However, because of its feedback path, an RSC encoder generally cannot be returned to the zero state by simply feeding dummy zeros into it. A simple solution is shown in Fig. 2.2. While encoding input messages, the switch is set to position “A”. Once the messages of the whole block are encoded, the switch is changed to position “B” for three additional cycles. In this position the feedback bit is also used as the encoder input, so the bit entering the shift register is zero; after three cycles all registers are zero and the encoder is back in the zero state.

[Figure: RSC encoder with input message, switch positions A and B, and systematic-bit and parity-check-bit outputs]

Fig. 2.2 Trellis Termination
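The switch behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feedback polynomial d(D) = 1 + D^2 + D^3 and the feedforward polynomials n0(D) = 1 + D + D^3 and n1(D) = 1 + D + D^2 + D^3 are not given in the text and are assumed here, and the function name `rsc_encode` is hypothetical.

```python
def rsc_encode(bits, tail=3):
    """Encode bits with an 8-state RSC encoder and terminate the trellis.

    Assumed polynomials (not stated in the text):
    d(D) = 1 + D^2 + D^3, n0(D) = 1 + D + D^3, n1(D) = 1 + D + D^2 + D^3.
    """
    s1 = s2 = s3 = 0                   # shift register, s1 holds the newest bit
    out = []
    for k in range(len(bits) + tail):
        fb = s2 ^ s3                   # feedback bit from d(D)
        # Switch at "A": take the message bit.
        # Switch at "B": feed the feedback bit back in as the input.
        u = bits[k] if k < len(bits) else fb
        a = u ^ fb                     # bit entering the register (0 during the tail)
        y0 = a ^ s1 ^ s3               # parity from n0(D)
        y1 = a ^ s1 ^ s2 ^ s3          # parity from n1(D)
        out.append((u, y0, y1))
        s1, s2, s3 = a, s1, s2         # shift
    return out, (s1, s2, s3)


symbols, final_state = rsc_encode([1, 0, 1, 1, 0, 1, 1, 1])
print(final_state)                     # register contents after the three tail cycles
```

During the three tail cycles the register input `a` is always zero, so the register empties regardless of the state reached at the end of the message.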

2.1.2 Turbo Interleaver

The interleaver plays a very important role in the turbo encoder. First, a good coding gain can be achieved with small-memory RSC encoders since the interleaver scrambles a long block message. Second, the interleaver de-correlates the inputs of the two RSC encoders so that an iterative decoding algorithm can be applied between the two component decoders. Theoretically, the block size of the interleaver is one of the major factors that lower the upper bound on the bit error probability of the turbo code system. The performance upper bound of a turbo code with a uniform random interleaver has been evaluated in [4]. The result shows that the bit-error-probability upper bound is approximately proportional to 1/N, where N is the block size of the turbo interleaver. The factor 1/N is also called the interleaver gain.

Fig. 2.3 shows the address generator of the turbo interleaver in the 3GPP2 standard. It supports a maximum block size of 20,730 and a minimum block size of 378. The supported block sizes and their corresponding values of the parameter n are listed in Table 2.1.

[Figure: interleaver address generator; the only label that survives is an “Add 1” block]

Fig. 2.3 Turbo Interleaver for 3GPP2 standard

Table 2.1 Turbo interleaver parameters

Turbo interleaver block size    Turbo interleaver parameter (n)
378                             4
402                             4
570                             5
762                             5
786                             5
1,146                           6
1,530                           6
1,554                           6
2,298                           7
2,322                           7
3,066                           7
3,090                           7
3,858                           7
4,602                           8
6,138                           8
9,210                           9
12,282                          9
20,730                          10
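The values in Table 2.1 follow a simple pattern: each n is the smallest integer for which 2^(n+5) is at least the block size. This is an observation about the table, not something the text states, and the helper name `smallest_n` is hypothetical. A short Python check, with the table copied into a dictionary:

```python
# Block sizes and n values copied from Table 2.1.
TABLE_2_1 = {
    378: 4, 402: 4, 570: 5, 762: 5, 786: 5, 1146: 6, 1530: 6, 1554: 6,
    2298: 7, 2322: 7, 3066: 7, 3090: 7, 3858: 7, 4602: 8, 6138: 8,
    9210: 9, 12282: 9, 20730: 10,
}


def smallest_n(block_size):
    """Smallest n such that block_size <= 2 ** (n + 5)."""
    # (block_size - 1).bit_length() equals ceil(log2(block_size)) for block_size >= 1.
    return (block_size - 1).bit_length() - 5


mismatches = [size for size, n in TABLE_2_1.items() if smallest_n(size) != n]
print(mismatches)   # block sizes whose tabulated n differs from the pattern
```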

2.1.3 Turbo Decoding

A general idea of iterative turbo decoding is illustrated in Fig. 2.4, where rs is the received systematic information, rp1 is the received parity information generated by the first RSC encoder, and rp2 is the received parity information generated by the second RSC encoder.

The iterative turbo decoder consists of two constituent decoders, which are soft-in/soft-out (SISO) decoders concatenated serially via one interleaver and one de-interleaver. An additional interleaver reorders the input systematic information and provides the interleaved data to the second SISO decoder. The two component decoders can be implemented based on either the soft-output Viterbi algorithm (SOVA) [5] or the maximum a posteriori probability (MAP) algorithm [6]; the latter is discussed in detail in the next section. During the iterative decoding process, each constituent decoder delivers the extrinsic information Lex(u), which is taken as a priori information by the other constituent decoder.

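That is, the extrinsic information of the first decoder, after interleaving, is the a priori information of the second decoder, and the extrinsic information of the second decoder, after de-interleaving, is the a priori information of the first decoder. As the number of iterations increases, better coding gain is expected. However, the correlation between the two SISO decoders also rises, so there is no significant performance improvement once the number of iterations reaches a threshold. Fig. 2.5 shows the performance comparison under different iteration numbers in the 3GPP2 standard.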

Fig. 2.4 Turbo decoding flowchart
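The exchange of extrinsic information in the flowchart can be sketched as a short loop. This is a skeleton under stated assumptions, not the decoder of this thesis: `siso` is a hypothetical callable that returns extrinsic LLRs, the inputs are assumed to be channel LLRs already scaled by the channel reliability, and the default of six iterations is only an example value.

```python
import numpy as np


def turbo_decode(r_s, r_p1, r_p2, perm, siso, num_iter=6):
    """Exchange extrinsic information between two SISO decoders.

    r_s, r_p1, r_p2: channel LLRs of the systematic and the two parity streams
        (r_p2 is in interleaved order).
    perm: interleaver pattern, interleaved[i] = natural[perm[i]].
    siso: hypothetical callable (sys, par, a_priori) -> extrinsic LLRs.
    """
    r_s = np.asarray(r_s, dtype=float)
    perm = np.asarray(perm)
    inv = np.argsort(perm)                    # de-interleaver pattern
    l_a1 = np.zeros_like(r_s)                 # no a priori knowledge at the start
    for _ in range(num_iter):
        l_ex1 = siso(r_s, r_p1, l_a1)         # first decoder, natural order
        l_a2 = l_ex1[perm]                    # interleave: a priori for decoder 2
        l_ex2 = siso(r_s[perm], r_p2, l_a2)   # second decoder, interleaved order
        l_a1 = l_ex2[inv]                     # de-interleave: a priori for decoder 1
    # A posteriori LLR in natural order: channel term plus both extrinsic terms.
    return r_s + l_ex1 + l_a1
```

Hard decisions would be taken from the sign of the returned values. Any SISO algorithm, such as those in Section 2.2, can be plugged in as `siso`.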

[Figure: performance of the turbo decoder under different iteration numbers (N=20730, 16-QAM, code rate 1/5); horizontal axis from -1 to 4]

Fig. 2.5 Performance comparison under different iteration numbers in 3GPP2 standard

2.1.4 Error floor effect

Although turbo coding provides excellent performance, the bit error rate decreases quite slowly at high signal-to-noise ratio (SNR). This phenomenon can be observed in Fig. 2.5. It is due to the relatively small free distance of turbo codes, and is called an “error floor” [7]. The floor is governed by the relation between the minimum free distance and the bit error probability of the turbo code. When the channel is noisy, errors can be corrected by iterative decoding since the systematic information and parity information can be regarded as highly independent events. However, when the channel provides reliable transmission, the dependency between the systematic and parity information grows and the interleaver contributes little to iterative decoding. Thus, the error-correction ability is limited to that of the weak constituent code alone. To overcome this issue, we can increase the interleaver size to lower the error floor, or concatenate a block code, e.g. a BCH code, as an outer code to remove the residual error bits. For more details, please refer to [4] [8].

2.2 MAP Decoding algorithm for Turbo Decoding

It has been shown that, compared with SOVA, the MAP algorithm is the optimal decoding method for turbo codes [9]. Unlike the Viterbi algorithm, which uses maximum likelihood (ML) decoding to find the codeword with the minimum error probability, the MAP algorithm minimizes the symbol (or bit) error probability. In this section, we focus on turbo decoding methods based on the MAP algorithm [6] [10]. Although SOVA is also commonly used for turbo decoding, we skip it since it is not adopted in our proposed design. For more details about SOVA, please refer to [5]; some comparisons of the MAP algorithm and SOVA in turbo code systems are given in [9].

2.2.1 The MAP algorithm

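The main idea of the MAP algorithm is to compute the log-likelihood ratio (LLR) of the transmitted information bit $u_k$ conditioned on the received sequence $\mathbf{r}$ for 1 ≤ k ≤ N, where N is the block length of the encoded message:

$$L(u_k \mid \mathbf{r}) = \log \frac{P(u_k = +1 \mid \mathbf{r})}{P(u_k = -1 \mid \mathbf{r})}$$

Here $\mathbf{r} = (r_1, \ldots, r_N)$, $r_k = (r_{k,1}, \ldots, r_{k,n})$, and n is the number of output bits for each encoded bit in the constituent code. Let us consider the trellis diagram of the turbo code in the 3GPP2 standard, which is shown in Fig. 2.6, as an example.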

Note that the solid lines represent the transitions corresponding to an information bit uk of -1, while the dotted lines represent the transitions corresponding to an information bit uk of +1.

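Then, the equation can be further expressed as

$$L(u_k \mid \mathbf{r}) = \log \frac{\sum_{(s_{k-1}, s_k):\, u_k = +1} P(s_{k-1}, s_k, \mathbf{r})}{\sum_{(s_{k-1}, s_k):\, u_k = -1} P(s_{k-1}, s_k, \mathbf{r})} \qquad (2.3)$$

where the numerator and denominator are the sums of the joint probabilities over all existing transitions from state $s_{k-1}$ to state $s_k$ that correspond to an information bit $u_k$ of +1 and -1 respectively.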

[Figure: trellis section between states Sk and Sk+1 with branches for uk = -1 and uk = +1; forward direction for computing α, backward direction for computing β]

Fig. 2.6 Trellis diagram of turbo code in 3GPP2 standard

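Assume the encoded data is transmitted through a discrete memoryless channel (DMC). The term $P(s_{k-1}, s_k, \mathbf{r})$ can then be decomposed into three terms:

$$P(s_{k-1}, s_k, \mathbf{r}) = P(s_{k-1}, \mathbf{r}_{j<k}) \, P(s_k, r_k \mid s_{k-1}) \, P(\mathbf{r}_{j>k} \mid s_k) = \alpha_{k-1}(s_{k-1}) \, \gamma_k(s_{k-1}, s_k) \, \beta_k(s_k)$$

where $\mathbf{r}_{j<k}$ is the sequence of received symbols $r_j$ from the beginning of the block up to time index k-1, and $\mathbf{r}_{j>k}$ is the sequence of received symbols $r_j$ from the end of the block back to time index k. By shifting the value of k, it can be seen that α is the forward recursion of the MAP algorithm, and can be formulated as

$$\alpha_k(s_k) = \sum_{s_{k-1}} \alpha_{k-1}(s_{k-1}) \, \gamma_k(s_{k-1}, s_k)$$

In the same way, the backward recursion β can be formulated as

$$\beta_{k-1}(s_{k-1}) = \sum_{s_k} \gamma_k(s_{k-1}, s_k) \, \beta_k(s_k)$$

Note that since the trellis of the turbo code diverges from state zero and converges to state zero, the initial conditions of the forward and backward recursions should be set as

$$\alpha_0(s) = \begin{cases} 1, & s = 0 \\ 0, & s \neq 0 \end{cases} \qquad \beta_N(s) = \begin{cases} 1, & s = 0 \\ 0, & s \neq 0 \end{cases}$$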

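The transition probability can be factored as

$$\gamma_k(s_{k-1}, s_k) = P(r_k \mid u_k) \, P(u_k)$$

Here, the term $P(u_k)$ is known as the a priori probability. According to the definition of the LLR, which is

$$L(u_k) = \log \frac{P(u_k = +1)}{P(u_k = -1)}$$

the a priori probability can be written as

$$P(u_k) = A_k \exp\!\left( \frac{u_k L(u_k)}{2} \right)$$

where the term $A_k$ is equal for all transitions at the same time index, and thus cancels out in (2.3). On the other hand, the value of $P(r_k \mid u_k)$ depends on the channel characteristics. For an additive white Gaussian noise (AWGN) channel, the probability of $r_k$ conditioned on $u_k$ can be expressed as

$$P(r_k \mid u_k) = B_k \exp\!\left( \frac{L_c}{2} \sum_{v=1}^{n} r_{k,v} \, x_{k,v} \right) \qquad (2.13)$$

where $L_c = 4E_s/N_0$ is called the channel reliability and $B_k$, like $A_k$, is common to all transitions at time index k. Here, $x_{k,v}$ is the v-th transmitted symbol while encoding $u_k$. For systematic codes, $x_{k,1}$ is equal to $u_k$. Now we can obtain the value of $L(u_k \mid \mathbf{r})$:

$$L(u_k \mid \mathbf{r}) = L_c \, r_{k,1} + L(u_k) + L_{ex}(u_k)$$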

The term $L_{ex}(u_k)$ is called the extrinsic information since it is a function of the redundant information introduced by the encoder. It is what remains after the information about the systematic input and the a priori information is removed from $L(u_k \mid \mathbf{r})$. Therefore, this term is useful as an estimate of the a priori probability for the next component decoder, and a great performance improvement can be achieved in iterative MAP decoding.

2.2.2 The Max-Log-MAP algorithm

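As we can see, the MAP algorithm involves many exponentiations and multiplications, which are costly to realize in hardware. Thus, an approximation of the MAP algorithm termed the Max-Log-MAP algorithm [11] was derived for simple implementation of MAP decoders. Instead of calculating $e^{\gamma_k}$, $e^{\alpha_k}$, and $e^{\beta_k}$ directly, all computations are done in the logarithm domain. Here we redefine $\gamma_k$, $\alpha_k$, and $\beta_k$ as the transition metric, forward path metric, and backward path metric respectively, that is, as the logarithms of the corresponding probabilities. They can be formulated as

$$\gamma_k(s_{k-1}, s_k) = \log P(s_k, r_k \mid s_{k-1}) \qquad (2.17)$$

$$\alpha_{k-1}(s_{k-1}) = \log P(s_{k-1}, \mathbf{r}_{j<k}) \qquad (2.18)$$

$$\beta_k(s_k) = \log P(\mathbf{r}_{j>k} \mid s_k) \qquad (2.19)$$

respectively. After substituting (2.17), (2.18), and (2.19), $L(u_k \mid \mathbf{r})$ in (2.15) can be rewritten as

$$L(u_k \mid \mathbf{r}) = \log \sum_{u_k = +1} e^{\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) + \beta_k(s_k)} - \log \sum_{u_k = -1} e^{\alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) + \beta_k(s_k)}$$

By utilizing the approximation

$$\log\left( e^{\delta_1} + e^{\delta_2} + \cdots + e^{\delta_n} \right) \approx \max_{i} \delta_i \qquad (2.21)$$

the LLR becomes

$$L(u_k \mid \mathbf{r}) \approx \max_{u_k = +1} \left[ \alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) + \beta_k(s_k) \right] - \max_{u_k = -1} \left[ \alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) + \beta_k(s_k) \right]$$

This computation consists of forward and backward recursions that repeatedly compute $\alpha_k$ and $\beta_k$, and can be expressed by

$$\alpha_k(s_k) = \max_{s_{k-1}} \left[ \alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) \right]$$

$$\beta_{k-1}(s_{k-1}) = \max_{s_k} \left[ \beta_k(s_k) + \gamma_k(s_{k-1}, s_k) \right]$$

Both equations are add-compare-select (ACS) operations, which are similar to the path metric updating of the Viterbi algorithm.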
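The forward and backward ACS recursions can be tried out on a small example. The sketch below is not the 3GPP2 decoder: it assumes a toy 4-state RSC code (feedback 1 + D + D^2, feedforward 1 + D^2), an unterminated block so that every end state is allowed, and bits mapped as 0 -> -1 and 1 -> +1. All function names are hypothetical.

```python
import numpy as np

NEG = -1e9  # stands in for log(0)


def build_trellis():
    """Transitions (state, next_state, u, parity) of the assumed toy RSC code."""
    trans = []
    for s in range(4):
        s1, s2 = s >> 1, s & 1
        for bit in (0, 1):
            a = bit ^ s1 ^ s2                # recursive feedback 1 + D + D^2
            p = a ^ s2                       # parity from feedforward 1 + D^2
            trans.append((s, (a << 1) | s1, 2 * bit - 1, 2 * p - 1))
    return trans


TRELLIS = build_trellis()


def encode(bits):
    """Return noise-free systematic and parity symbols in {-1, +1}."""
    state, sys, par = 0, [], []
    for b in bits:
        _, state, u, p = TRELLIS[2 * state + b]
        sys.append(u)
        par.append(p)
    return np.array(sys, dtype=float), np.array(par, dtype=float)


def max_log_map(r_sys, r_par, l_a, l_c=1.0):
    """Return L(u_k | r) for every k using max-only (ACS) recursions."""
    n = len(r_sys)
    # Transition metric: a priori term plus channel term, constants dropped.
    gamma = np.array([[0.5 * u * l_a[k] + 0.5 * l_c * (r_sys[k] * u + r_par[k] * p)
                       for (_, _, u, p) in TRELLIS] for k in range(n)])
    alpha = np.full((n + 1, 4), NEG)
    alpha[0, 0] = 0.0                        # the trellis starts in state 0
    beta = np.full((n + 1, 4), NEG)
    beta[n, :] = 0.0                         # unterminated toy block
    for k in range(n):                       # forward ACS
        for j, (s, nxt, _, _) in enumerate(TRELLIS):
            alpha[k + 1, nxt] = max(alpha[k + 1, nxt], alpha[k, s] + gamma[k, j])
    for k in range(n - 1, -1, -1):           # backward ACS
        for j, (s, nxt, _, _) in enumerate(TRELLIS):
            beta[k, s] = max(beta[k, s], gamma[k, j] + beta[k + 1, nxt])
    llr = np.empty(n)
    for k in range(n):
        best = {+1: NEG, -1: NEG}
        for j, (s, nxt, u, _) in enumerate(TRELLIS):
            best[u] = max(best[u], alpha[k, s] + gamma[k, j] + beta[k + 1, nxt])
        llr[k] = best[+1] - best[-1]
    return llr
```

The extrinsic information passed to the other decoder would then be `llr - l_c * r_sys - l_a`, following the decomposition of L(uk | r) in Section 2.2.1.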

2.2.3 The Log-MAP algorithm

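It is easy to see that the Max-Log-MAP algorithm is a sub-optimal solution for turbo decoding since the approximation (2.21) is used to reduce the complexity of the MAP algorithm. This problem can be solved by the Log-MAP algorithm [11]. It employs the Jacobian logarithm

$$\log\left( e^{\delta_1} + e^{\delta_2} \right) = \max(\delta_1, \delta_2) + \log\left( 1 + e^{-|\delta_1 - \delta_2|} \right) = \max(\delta_1, \delta_2) + f_c(|\delta_1 - \delta_2|) \qquad (2.25)$$

where $f_c(\cdot)$ is a correction function. It has been proved that (2.21) can be computed exactly by a recursive operation of (2.25) [9]:

$$\log\left( e^{\delta_1} + \cdots + e^{\delta_n} \right) = \max(\Delta, \delta_n) + f_c(|\Delta - \delta_n|), \qquad \Delta = \log\left( e^{\delta_1} + \cdots + e^{\delta_{n-1}} \right)$$

Substituting (2.18) and (2.19) into (2.25), the forward and backward recursions can be represented as

$$\alpha_k(s_k) = \max^{*}_{s_{k-1}} \left[ \alpha_{k-1}(s_{k-1}) + \gamma_k(s_{k-1}, s_k) \right]$$

$$\beta_{k-1}(s_{k-1}) = \max^{*}_{s_k} \left[ \beta_k(s_k) + \gamma_k(s_{k-1}, s_k) \right]$$

where the max*(.) operation is defined as

$$\max^{*}(\delta_1, \delta_2) = \max(\delta_1, \delta_2) + f_c(|\delta_1 - \delta_2|)$$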

The performance of the Log-MAP algorithm is identical to that of the MAP algorithm. However, its complexity is higher than that of the Max-Log-MAP algorithm since computing fc(.) still involves exponential and logarithm operations. Thus, the values of fc(.) are usually stored in a pre-computed table and the Log-MAP algorithm is implemented by table look-up. It has been found that excellent performance can be obtained with 8 stored values and |δ1-δ2| ranging between 0 and 5, and that no improvement is achieved with a finer representation [9].
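The max*(.) operation and its table look-up version can be compared numerically. The sketch below is an assumption-laden illustration: the text only specifies 8 stored values over the range 0 to 5, so the uniform spacing, the mid-point sampling of each bin, and the zero correction beyond the table are choices made here, and all names are hypothetical.

```python
import math


def max_star(d1, d2):
    """Exact Jacobian logarithm: log(exp(d1) + exp(d2))."""
    return max(d1, d2) + math.log1p(math.exp(-abs(d1 - d2)))


# Eight pre-computed correction values for |d1 - d2| in [0, 5).
STEP = 5.0 / 8
F_C_TABLE = [math.log1p(math.exp(-(i + 0.5) * STEP)) for i in range(8)]


def max_star_lut(d1, d2):
    """Table look-up version; the correction is taken as 0 beyond the table."""
    idx = int(abs(d1 - d2) / STEP)
    corr = F_C_TABLE[idx] if idx < len(F_C_TABLE) else 0.0
    return max(d1, d2) + corr


def log_sum_exp(values, op=max_star):
    """Apply the two-input operation recursively over many terms."""
    acc = values[0]
    for v in values[1:]:
        acc = op(acc, v)
    return acc


terms = [0.3, -1.2, 2.0, 1.1]
exact = math.log(sum(math.exp(t) for t in terms))
print(exact, log_sum_exp(terms), log_sum_exp(terms, max_star_lut), max(terms))
```

The printed values allow the exact sum, the recursive max* result, the table-based result, and the plain Max-Log-MAP maximum to be compared side by side.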

2.2.4 SNR sensitivity of Max-Log-MAP and Log-MAP algorithm

Referring to (2.13) and the deductions that follow it, it is evident that both the MAP and Log-MAP algorithms require SNR estimation to obtain the value of the channel reliability Lc. Unfortunately, accurate estimation is not easy to achieve. Several papers have discussed the effect of SNR mismatch in turbo decoding. In [12], the simulations show that an SNR estimation offset of about -3 dB to +6 dB is tolerable before significant performance degradation occurs. The Max-Log-MAP algorithm, however, provides an SNR-independent scheme if the a priori information is initialized with a reasonable value, such as all zeros [13]. Because the max(.) operation commutes with scaling by a positive constant, that is, max(c·a, c·b) = c·max(a, b) for c > 0, the term Lc only scales all metrics and cancels out of the decisions while computing L(uk). The comparison of the Max-Log-MAP and Log-MAP algorithms under different SNR estimation offsets was made in [13].

Although the Log-MAP algorithm performs better than the Max-Log-MAP algorithm, it is exposed to the risk of a serious SNR estimation mismatch. Thus, channel characteristics play an important role in practical implementation. It was concluded in [13] that if the channel characteristics change over time, the Max-Log-MAP decoder is the more suitable constituent decoder for turbo decoding; otherwise, the Log-MAP decoder is preferable in terms of coding gain.
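The different sensitivity of the two algorithms to a wrong channel reliability can be illustrated with a toy calculation. The branch metrics below are made-up numbers, and the factor `scale` models a mis-estimated Lc that multiplies every metric; the function names are hypothetical.

```python
import math
from functools import reduce


def max_star(a, b):
    """Log-MAP combining: exact log(exp(a) + exp(b))."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def soft_output(plus, minus, scale, combine):
    """Combine the metrics of the u=+1 and u=-1 branches after scaling them."""
    p = reduce(combine, [scale * m for m in plus])
    q = reduce(combine, [scale * m for m in minus])
    return p - q


PLUS = [1.2, -0.4, 0.9]    # made-up metrics of branches with u_k = +1
MINUS = [0.7, 1.0, -2.0]   # made-up metrics of branches with u_k = -1

for c in (0.5, 1.0, 2.0):
    print(c, soft_output(PLUS, MINUS, c, max), soft_output(PLUS, MINUS, c, max_star))
```

With `max` as the combining operation the soft output is simply multiplied by the scale factor, so its sign and the relative ordering of all bits are unchanged. With `max_star` the output is not a scaled copy, which is why the Log-MAP decoder needs a reasonable estimate of Lc.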

2.3 Sliding Windowed Approach

As described in the previous section, the MAP family of algorithms (the MAP, Max-Log-MAP, and Log-MAP algorithms) requires the entire block to be received before decoding can start, since the backward path metric computation needs information from the end of the trellis. This restriction enlarges the memory requirement of a hardware turbo decoder. For example, the maximum block length in the 3GPP2 standard is 20730, which means that the path metrics of 20730 trellis steps must be stored. In addition, a long output latency is introduced, which is a disadvantage for turbo codes in real-time applications.

A simple way to solve these problems is to divide the data stream into many sub-blocks. However, the last bits in these sub-blocks suffer from lower error tolerance because the backward recursion lacks initial metrics. A sliding-window approach was therefore proposed in [14] and later in [15] to overcome this drawback. It relies on the fact that the backward path metrics become highly reliable, even without knowledge of the initial state, if the backward recursion runs long enough. The windowed processing schedule used in our design is illustrated in Fig. 2.7, and the detailed operating flow is described as follows.

[Figure: windowed schedule over sub-blocks i, i+1, i+2, i+3]

Fig. 2.7 The windowed MAP algorithm

Initially, the received data block is divided into many sub-blocks, each with a sub-block length of L. L is called the convergence length and is typically about five times the constraint length of the encoder. In the 3GPP2 standard, the constraint length is 4, which gives an L of about 20. For each sub-block i, the forward recursion computes the forward path metrics α and stores these values in memory. In parallel, an additional backward recursion β1 is performed on the next sub-block i+1. Once the β1 operation on sub-block i+1 is finished, the last backward path metric obtained for each state is regarded as a reliable initial β for sub-block i to start its backward recursion, which is labeled β2 in Fig. 2.7. Finally, L(uk) can be computed from α, β2, and γ. Fig. 2.8 shows the influence of different sub-block lengths on the performance of the turbo code.

[Figure: BER versus SNR of the turbo decoder for sub-block lengths N, 24, 20, 16, and 12 (N=20730, 16-QAM, code rate 1/5, 6 iterations); SNR axis from -1 to 4, BER axis from 10^-7 to 10^0]

Fig. 2.8: Performance comparison among different sub-block lengths in 3GPP2 standard
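The memory saving of the windowed approach can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a description of the implemented chip: it assumes 8 trellis states (constraint length 4), counts only the forward metrics of one sub-block, and ignores any extra buffering a real design may need. The function name is hypothetical.

```python
import math


def sliding_window_plan(block_len, constraint_len=4, factor=5):
    """Sub-block length, sub-block count and alpha storage for windowed MAP."""
    num_states = 2 ** (constraint_len - 1)       # 8 states for constraint length 4
    sub_len = factor * constraint_len            # convergence length L
    return {
        "sub_block_length": sub_len,
        "num_sub_blocks": math.ceil(block_len / sub_len),  # last one may be shorter
        "alpha_values_full_block": block_len * num_states,
        "alpha_values_windowed": sub_len * num_states,
    }


plan = sliding_window_plan(20730)
for name, value in plan.items():
    print(name, value)
```

Under these assumptions the forward-metric storage depends only on L and no longer on the block length.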

Chapter 3

Principle of Convolutional codec

3.1 Convolutional Code

The encoder of a convolutional code is constructed from several memory elements and modulo-2 adders. It is usually described as an (n, k, v) convolutional encoder, where n, k, and v are the number of outputs, the number of inputs, and the number of memory elements respectively. The 3GPP2 standard specifies rate 1/2, 1/3, 1/4, and 1/6 convolutional codes, all with a constraint length of 9. An example of the rate 1/2 convolutional encoder with the generator matrix [753, 561] (octal) is illustrated in Fig. 3.1. For each input information bit, it generates two code symbols (c0 and c1) according to the generator matrix. The generator matrices for the convolutional codes of the other code rates are listed in Table 1.2. The convolutional encoder should be initialized to the all-zero state.

[Figure: shift-register encoder with input information, generator taps g0 and g1, and outputs c0 and c1]

Fig. 3.1 Rate 1/2 convolutional encoder with the constraint length of 9
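A shift-register encoder of this kind can be sketched as follows. The tap ordering is an assumption made for illustration: the most significant bit of each octal generator is taken to act on the current input bit. The function names are hypothetical.

```python
def parity(x):
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


def conv_encode(bits, gens=(0o753, 0o561), constraint_len=9):
    """Rate 1/len(gens) convolutional encoder starting from the all-zero state."""
    reg = 0
    out = []
    for b in bits:
        # The newest bit enters at the most significant position.
        reg = (reg >> 1) | (b << (constraint_len - 1))
        out.append(tuple(parity(reg & g) for g in gens))
    return out


# With this tap ordering, a single 1 followed by zeros reads out the generator taps.
impulse = conv_encode([1] + [0] * 8)
c0 = [c[0] for c in impulse]
c1 = [c[1] for c in impulse]
print(c0)
print(c1)
```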

3.2 Viterbi Decoding

The Viterbi algorithm [16] is the optimal solution for decoding convolutional codes. It performs maximum likelihood decoding by searching for the shortest path through a weighted graph. In fact, the Viterbi algorithm has become a standard due to its reasonable decoding complexity. Before explaining the Viterbi algorithm, the system platform shown in Fig. 3.2 is described first.

[Figure: m → Convolutional Encoder → c → Modulator → x → Channel → r → Viterbi Decoder → m̂]

Fig. 3.2 A system platform of the convolutional codec

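Initially, the message sequence m is encoded into the codeword sequence c. After modulation, the modulated sequence x is transmitted through the channel, and the sequence r is received at the receiver. The major concept of the Viterbi algorithm is to find the maximum likelihood sequence according to r. Theoretically, this is equivalent to maximizing the probability P(m | r). Using Bayes' rule,

$$\hat{m} = \arg\max_{m} P(m \mid r) = \arg\max_{m} \frac{P(r \mid m) \, P(m)}{P(r)}$$

where P(r) is independent of m. Thus, assuming all message sequences are equally likely, what the decoder does is to maximize the probability P(r | m). Assume the length of the received sequence is τ; then, for a memoryless channel, P(r | m) can be expressed as

$$P(r \mid m) = \prod_{t=1}^{\tau} P(r_t \mid x_t)$$

where $x_t$ is the modulated symbol that the message m produces at time index t. Similarly, these operations can be transformed to the logarithm domain to reduce computing complexity. The probability P(r | m) in the logarithm domain is given by

$$\log P(r \mid m) = \sum_{t=1}^{\tau} \log P(r_t \mid x_t) \qquad (3.3)$$

For an AWGN channel, (3.3) can be further rewritten as

$$\log P(r \mid m) = -\frac{1}{N_0} \sum_{t=1}^{\tau} \lVert r_t - x_t \rVert^2 + C$$

where C is a constant that does not depend on m. In other words, to maximize the probability P(r | m) is to minimize the Euclidean distance. In order to compute the Euclidean distance, the Viterbi algorithm defines the branch metric (BM, also called transition metric, or simply TM) $\lambda_t(s_{t-1}, s_t)$ and the path metric $\Gamma_{t,s}$ as follows.

$$\lambda_t(s_{t-1}, s_t) = \lVert r_t - x_t \rVert^2$$

$$\Gamma_{t,s} = \min_{\text{paths ending in state } s \text{ at time } t} \; \sum_{i=1}^{t} \lambda_i(s_{i-1}, s_i)$$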

It is clear that the path metric $\Gamma_{t,s}$ is the minimum Euclidean distance for state s at time index t. At each time index, the decoder computes and compares the metrics of all branches entering each state. The branch with the minimum metric and its corresponding decision bit are preserved, and the others are eliminated. The history of the decision bits is called the survivor. According to the minimum path metric at each time index, the maximum likelihood sequence can be estimated.

Finally, the steps of Viterbi algorithm can be summarized as follows.

1. Initialize all path metric storages and survivor memory.

2. According to the received sequence r, compute the branch metric $\lambda_t(s_{t-1}, s_t)$ for each state transition.

3. Add the branch metric to the path metric of each branch that converges toward the same state.

4. Update the path metric storage for each state according to the following principle.

$$\Gamma_{t,s_t} = \min_{s_{t-1}} \left( \Gamma_{t-1,s_{t-1}} + \lambda_t(s_{t-1}, s_t) \right)$$

The decision bit of each state is also stored into survivor memory at the same time.

5. Decode the message sequence according to the minimum path metric and the survivor.

6. Repeat this process until all messages are decoded.
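The steps above can be sketched in Python for a small code. The example assumes the (2, 1, 2) code with generators [111, 101] (binary) that Section 3.3 uses, a state number whose most significant bit is the newest input bit, code bits mapped as 0 -> -1 and 1 -> +1, and trace-back only after the whole block has been received. The function names are hypothetical.

```python
def parity(x):
    """Modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


GENS = (0b111, 0b101)
NUM_STATES = 4


def branch_output(prev_state, bit):
    """Code bits produced when `bit` enters the encoder in `prev_state`."""
    reg = (bit << 2) | prev_state
    return [parity(reg & g) for g in GENS]


def encode(bits):
    state, out = 0, []
    for b in bits:
        out.append(branch_output(state, b))
        state = ((b << 2) | state) >> 1
    return out


def viterbi_decode(received):
    """received: list of (r0, r1) soft values for the two code symbols."""
    inf = float("inf")
    metric = [0.0] + [inf] * (NUM_STATES - 1)       # step 1: start in state 0
    survivor = []                                   # decision bits per time index
    for r in received:
        new_metric = [inf] * NUM_STATES
        decisions = [0] * NUM_STATES
        for state in range(NUM_STATES):
            bit = state >> 1                        # input bit that leads to `state`
            for d in (0, 1):                        # d selects the predecessor
                prev = ((state << 1) & 0b11) | d
                x = [2 * c - 1 for c in branch_output(prev, bit)]
                bm = sum((ri - xi) ** 2 for ri, xi in zip(r, x))    # step 2
                cand = metric[prev] + bm                            # step 3
                if cand < new_metric[state]:                        # step 4
                    new_metric[state], decisions[state] = cand, d
        metric = new_metric
        survivor.append(decisions)
    # Step 5: trace back from the state with the minimum path metric.
    state = metric.index(min(metric))
    bits = []
    for decisions in reversed(survivor):
        bits.append(state >> 1)
        state = ((state << 1) & 0b11) | decisions[state]
    return bits[::-1]
```

The trace-back line shifts the state number left and inserts the stored decision bit on the right, which yields the previous state on the surviving path.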

3.3 Trace-back Method

The trace-back method is a technique for tracing the maximum likelihood sequence through the survivor memory. Here we use a (2, 1, 2) convolutional code with the generator matrix [111, 101] (binary) as the example. Its trellis diagram and the corresponding contents of the survivor memory are shown in Fig. 3.3. In this figure, all dotted lines represent eliminated paths. If the upper path entering a state is chosen, the decision bit is set to zero; otherwise, it is set to one.

Fig. 3.3 Trellis diagram of the (2, 1, 2) convolutional code and its survivor memory

After all symbols are received, the maximum likelihood sequence can be decided by the trace-back method. The procedure starts from the state with the minimum path metric, which is S00 in this example. By recursively shifting the state number left and inserting the decision bit stored in the survivor memory at the right-hand side, the previous state on the surviving path is obtained, and the decoding procedure can be completed. The overall trace-back operation for the example in Fig. 3.3 is illustrated in Fig. 3.4.

Fig. 3.4 Trace-back procedure of the convolutional code

In fact, the length of received symbols may be quite long. If we don’t start the trace-back
