
Chapter 1 Introduction

1.3 Organization of the Thesis

This thesis is organized as follows. Chapter 2 describes the fundamentals of convolutional codes and the Viterbi algorithm. Chapter 3 introduces the general architectures of the Viterbi decoder. Chapter 4 presents several low-power schemes for the Viterbi decoder. In Chapter 5, we propose a low-power Viterbi decoder with reduced state transitions and efficient memory access, and present the implementation results together with some comparisons. Finally, conclusions and future work are given in Chapter 6.

Chapter 2

Convolutional Code and Viterbi Algorithm

2.1 Convolutional Code

Convolutional codes are widely used error control codes in modern communication systems such as DVB-T, IEEE 802.11, IEEE 802.16, and MB-OFDM UWB systems.

To describe a convolutional code, one needs to characterize the encoding process.

Several methods, such as the matrix and polynomial representations, are used to represent the encoding process of a convolutional code. In addition, the trellis diagram is a common way to illustrate the codeword sequence with timing information. All of them are introduced in this section.

2.1.1 Encoding of Convolutional Code

A convolutional encoder generates a coded output data stream from an input data stream. As mentioned in the previous chapter, a convolutional code is specified in (n, k, m) format, where n, k, and m denote the number of outputs, the number of inputs, and the number of memory elements, respectively. The coding rate is k/n, which means that k input bits produce n output bits. Each coded bit depends not only on the current input bit but also on the m previous input bits. A convolutional encoder is composed of several shift registers and modulo-2 adders (XOR operations). Figure 2.1 shows a (2, 1, 2) convolutional encoder with two shift registers and three modulo-2 adders. It produces a 2-bit codeword symbol for each 1-bit input.


Figure 2.1 The (2, 1, 2) convolutional encoder

The input of this encoder is a binary sequence $u = (u_0, u_1, u_2, \ldots)$. The output $c = (c_0^{(1)}, c_0^{(2)}, c_1^{(1)}, c_1^{(2)}, \ldots)$ is an interleaved sequence of the two binary sequences $c^{(1)}$ and $c^{(2)}$. For each input bit $u_i$, the coded symbols $c_i^{(1)}$ and $c_i^{(2)}$ are generated by the following functions

$$c_i^{(1)} = u_i \oplus u_{i-1} \oplus u_{i-2} \qquad (2.1)$$

$$c_i^{(2)} = u_i \oplus u_{i-2} \qquad (2.2)$$

where $\oplus$ denotes the XOR operation. Next, the input bit is shifted into the leftmost register and the bits in the registers are shifted one position to the right. Therefore, the codeword sequence c depends not only on the current input bit $u_i$ but also on the two previous input bits $u_{i-1}$ and $u_{i-2}$. Obviously, the interconnection of the encoder influences the codeword sequence. In general, the interconnections of an (n, k, m) convolutional encoder can be formalized as the generator sequences

$$g^{(i)} = (g_0^{(i)}, g_1^{(i)}, \ldots, g_m^{(i)}) \qquad (2.3)$$

where $(g_0^{(i)}, g_1^{(i)}, \ldots, g_m^{(i)})$ represents the interconnections for coded symbol $c^{(i)}$ from left to right.
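The shift-register encoding described above can be sketched in Python. This is a minimal illustration for rate-1/n codes, not the thesis implementation; the function name `conv_encode` and the tuple layout of `generators` are assumptions introduced here, with the default taps taken from the (2, 1, 2) encoder of Figure 2.1.

```python
def conv_encode(bits, generators=((1, 1, 1), (1, 0, 1))):
    """Encode a bit list with a rate-1/n convolutional code.

    generators[j] lists the taps (g_0, ..., g_m) for output j; the default
    taps are assumed from the (2, 1, 2) encoder example in the text.
    """
    m = len(generators[0]) - 1
    registers = [0] * m              # (u_{i-1}, ..., u_{i-m}), all zero at start
    symbols = []
    for u in bits:
        window = [u] + registers     # current bit followed by the stored bits
        symbols.append(tuple(
            sum(g & b for g, b in zip(taps, window)) % 2 for taps in generators
        ))
        registers = window[:-1]      # newest bit enters the leftmost register
    return symbols
```

For the input 1011100, `conv_encode([1, 0, 1, 1, 1, 0, 0])` returns the symbols 11, 10, 00, 01, 10, 01, 11, which is the codeword used as the running example in this chapter.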

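For an information sequence u, the encoding process can be represented in matrix form as

$$c = uG \qquad (2.4)$$

where G is called the generator matrix. For an (n, k, m) convolutional code (written here for k = 1), the generator matrix has the form

$$G = \begin{bmatrix}
g_0^{(1)} g_0^{(2)} \cdots g_0^{(n)} & g_1^{(1)} g_1^{(2)} \cdots g_1^{(n)} & \cdots & g_m^{(1)} g_m^{(2)} \cdots g_m^{(n)} & & \\
 & g_0^{(1)} g_0^{(2)} \cdots g_0^{(n)} & \cdots & g_{m-1}^{(1)} g_{m-1}^{(2)} \cdots g_{m-1}^{(n)} & g_m^{(1)} g_m^{(2)} \cdots g_m^{(n)} & \\
 & & \ddots & & & \ddots
\end{bmatrix} \qquad (2.5)$$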

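where each row of the matrix is obtained by interleaving the generator sequences, and each row is shifted n positions to the right relative to the row above it. For example, the (2, 1, 2) convolutional encoder in Figure 2.1 can be described by

$$G = \begin{bmatrix}
11 & 10 & 11 & & & \\
 & 11 & 10 & 11 & & \\
 & & 11 & 10 & 11 & \\
 & & & \ddots & & \ddots
\end{bmatrix} \qquad (2.6)$$

Assume the input information sequence is

$$u = (1\,0\,1\,1\,1\,0\,0\,\ldots) \qquad (2.7)$$

Then the coded sequence can be computed as

$$c = uG = (1\,0\,1\,1\,1\,0\,0\,\ldots)
\begin{bmatrix}
11 & 10 & 11 & & & \\
 & 11 & 10 & 11 & & \\
 & & 11 & 10 & 11 & \\
 & & & \ddots & & \ddots
\end{bmatrix} \qquad (2.8)$$

Finally, the interleaved codeword sequence is obtained as

$$c = (11, 10, 00, 01, 10, 01, 11, \ldots) \qquad (2.9)$$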
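The matrix form can be checked numerically with a short sketch. It builds a finite truncation of the semi-infinite generator matrix for a rate-1/n code and multiplies over GF(2); the helper names `generator_matrix` and `encode_with_matrix`, and the choice to truncate the codeword to as many symbols as there are input bits, are assumptions made for illustration.

```python
def generator_matrix(generators, length):
    """Build a length x (n*length) truncation of the generator matrix G (k = 1)."""
    n = len(generators)
    m = len(generators[0]) - 1
    # One row block: g_0^(1) ... g_0^(n), g_1^(1) ... g_1^(n), ..., g_m^(n)
    block = [generators[j][d] for d in range(m + 1) for j in range(n)]
    width = n * length
    rows = []
    for i in range(length):
        row = [0] * width
        for offset, g in enumerate(block):
            col = n * i + offset     # each row is shifted right by n positions
            if col < width:
                row[col] = g
        rows.append(row)
    return rows


def encode_with_matrix(u, G):
    """Compute c = uG over GF(2)."""
    return [sum(ui & gij for ui, gij in zip(u, column)) % 2 for column in zip(*G)]


u = [1, 0, 1, 1, 1, 0, 0]
G = generator_matrix(((1, 1, 1), (1, 0, 1)), len(u))
c = encode_with_matrix(u, G)   # flat list of interleaved code bits
```

Reading `c` two bits at a time gives 11, 10, 00, 01, 10, 01, 11 for this input.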

In addition to the matrix representation, the encoding process can be described in polynomial form. An (n, k, m) convolutional encoder is often characterized by its generator polynomials. The degree of each generator polynomial is less than or equal to m.

The coefficient of each term is either 1 or 0, depending on whether a connection exists between the shift register and the modulo-2 adder. For example, the generator polynomial of the (2, 1, 2) convolutional encoder in Figure 2.1 can be written as

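$$g^{(1)}(D) = 1 + D + D^2 \qquad (2.10)$$

$$g^{(2)}(D) = 1 + D^2 \qquad (2.11)$$

where the factor D denotes the unit delay operation. For an information polynomial u(D), the encoded polynomials are expressed by

$$c^{(1)}(D) = u(D)\, g^{(1)}(D) \qquad (2.12)$$

$$c^{(2)}(D) = u(D)\, g^{(2)}(D) \qquad (2.13)$$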

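Assuming the information sequence is the same as in the previous example, the input polynomial can be represented as

$$u(D) = 1 + D^2 + D^3 + D^4 \qquad (2.14)$$

Then the encoded polynomials become

$$c^{(1)}(D) = (1 + D^2 + D^3 + D^4)(1 + D + D^2) = 1 + D + D^4 + D^6 \qquad (2.15)$$

$$c^{(2)}(D) = (1 + D^2 + D^3 + D^4)(1 + D^2) = 1 + D^3 + D^5 + D^6 \qquad (2.16)$$

Thus the interleaved codeword sequence is

$$c = (11, 10, 00, 01, 10, 01, 11, \ldots) \qquad (2.17)$$

which agrees with the result of the previous example.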
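The polynomial products can be verified with a small GF(2) multiplication routine. The sketch below stores a polynomial as a coefficient list in which index d holds the coefficient of D^d; the function name `poly_mul_gf2` is an assumption introduced here.

```python
def poly_mul_gf2(a, b):
    """Multiply two polynomials over GF(2); index d holds the coefficient of D^d."""
    product = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            product[i + j] ^= ai & bj    # addition in GF(2) is XOR
    return product


u = [1, 0, 1, 1, 1]            # u(D)  = 1 + D^2 + D^3 + D^4
g1 = [1, 1, 1]                 # g1(D) = 1 + D + D^2
g2 = [1, 0, 1]                 # g2(D) = 1 + D^2
c1 = poly_mul_gf2(u, g1)       # coefficients of the first output stream
c2 = poly_mul_gf2(u, g2)       # coefficients of the second output stream
codeword = list(zip(c1, c2))   # interleave the two streams symbol by symbol
```

Here `c1` corresponds to 1 + D + D^4 + D^6 and `c2` to 1 + D^3 + D^5 + D^6, so `codeword` reads 11, 10, 00, 01, 10, 01, 11.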

2.1.2 Trellis Diagram of Convolutional Code

One can regard a convolutional encoder as a finite state machine, where the output is a function of the current input and the current state. Thus, the operation of a convolutional encoder can be specified by a state diagram. Figure 2.2 shows the state diagram of the convolutional encoder in Figure 2.1. As there are two shift registers in the encoder circuit, their contents take one of four states, represented as 00, 01, 10, and 11. A state transition corresponding to an information bit “0” is represented by a dotted line. Similarly, a state transition corresponding to an information bit “1” is represented by a solid line. The label on each line gives the information input and the corresponding codeword symbol generated by the state transition.

Figure 2.2 State diagram of the convolutional encoder in Figure 2.1

With the state diagram, it is easy to determine the codeword sequence in the encoding process. For example, assume the information sequence is (1011100…). The transition starts at state 00 and follows a solid line if the information bit is “1” and a dotted line if it is “0”. Following this track, the codeword sequence is (11, 10, 00, 01, 10, 01, 11, …), which is the same as the result described in Section 2.1.1.
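The state diagram can also be written down as a lookup table. The following sketch assumes that a state is the register content (u_{i-1}, u_{i-2}) written as a two-character string, which is consistent with the new bit entering the leftmost register; the names `build_state_table` and `walk` are hypothetical.

```python
def build_state_table():
    """Return {state: {input_bit: (next_state, output)}} for the (2, 1, 2) encoder.

    A state is the register content (u_{i-1}, u_{i-2}) as a 2-character string.
    """
    table = {}
    for a in (0, 1):
        for b in (0, 1):
            transitions = {}
            for u in (0, 1):
                output = f"{u ^ a ^ b}{u ^ b}"        # (c1, c2) from the two XOR taps
                transitions[u] = (f"{u}{a}", output)  # new bit shifts in from the left
            table[f"{a}{b}"] = transitions
    return table


def walk(table, bits, state="00"):
    """Follow the state diagram and collect the emitted codeword symbols."""
    outputs = []
    for u in bits:
        state, out = table[state][u]
        outputs.append(out)
    return outputs, state
```

Calling `walk(build_state_table(), [1, 0, 1, 1, 1, 0, 0])` visits the states 10, 01, 10, 11, 11, 01, 00 and emits 11, 10, 00, 01, 10, 01, 11.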

When the information sequence is long, it is difficult to trace the codeword sequence in the state diagram. Therefore, a representation called the trellis diagram is obtained by extending the state diagram along the dimension of time.

Figure 2.3 shows the encoding process for the information sequence (1011100…) in the trellis diagram. With the trellis diagram, it is easy to illustrate the encoding process as well as the decoding process described in the next section.

Figure 2.3 The trellis diagram of the convolutional encoder in Figure 2.1

2.2 Viterbi Algorithm

The Viterbi algorithm [1], proposed by A. J. Viterbi in 1967, is used to decode convolutional codes. Forney [2] later proved that the Viterbi algorithm is a maximum likelihood (ML) decoding algorithm. In fact, optimally decoding a convolutional code is equivalent to finding the maximum likelihood path in the trellis diagram. The Viterbi algorithm remains the optimal solution for decoding convolutional codes and has become an important algorithm in communication systems. Maximum likelihood decoding and the Viterbi algorithm are introduced in this section.

2.2.1 Maximum Likelihood Decoding

Figure 2.4 shows a simplified communication system that focuses on channel coding. The encoder transforms the information sequence u into the codeword sequence c by adding structured redundancy. The codeword sequence c is then transmitted across the noisy channel. The decoder uses the redundancy to correct the errors in the received sequence r and produces an estimate $\hat{u}$, which is the most likely information sequence.

Figure 2.4 System blocks focusing on channel coding (u → encoder → c → channel → r → decoder → $\hat{u}$)

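The maximum likelihood decoder finds the sequence $\hat{c}$ that maximizes the probability $P(r \mid c)$. Considering a rate k/n convolutional code, assume the information sequence u is composed of L k-bit blocks.

The codeword sequence c generated by the convolutional encoder consists of L n-bit blocks.

$$c = (c_0^{(0)}, c_0^{(1)}, \ldots, c_0^{(n-1)}, c_1^{(0)}, c_1^{(1)}, \ldots, c_1^{(n-1)}, \ldots, c_{L-1}^{(n-1)})$$

The decoder receives the sequence r and generates the maximum likelihood sequence $\hat{c}$. They have the same form as c.

$$r = (r_0^{(0)}, r_0^{(1)}, \ldots, r_0^{(n-1)}, r_1^{(0)}, r_1^{(1)}, \ldots, r_1^{(n-1)}, \ldots, r_{L-1}^{(n-1)})$$

$$\hat{c} = (\hat{c}_0^{(0)}, \hat{c}_0^{(1)}, \ldots, \hat{c}_0^{(n-1)}, \hat{c}_1^{(0)}, \hat{c}_1^{(1)}, \ldots, \hat{c}_1^{(n-1)}, \ldots, \hat{c}_{L-1}^{(n-1)})$$

For a memoryless channel, the probability $P(r \mid c)$ can be expressed as

$$P(r \mid c) = \prod_{i=0}^{L-1} \prod_{j=0}^{n-1} P\bigl(r_i^{(j)} \mid c_i^{(j)}\bigr) \qquad (2.18)$$

From equation (2.18), the maximum likelihood estimate $\hat{c}$ is

$$\hat{c} = \arg\max_{c} \prod_{i=0}^{L-1} \prod_{j=0}^{n-1} P\bigl(r_i^{(j)} \mid c_i^{(j)}\bigr) \qquad (2.19)$$

Taking the logarithm of equation (2.19) turns the product terms into summation terms. Thus, the estimate $\hat{c}$ becomes

$$\hat{c} = \arg\max_{c} \sum_{i=0}^{L-1} \sum_{j=0}^{n-1} \log P\bigl(r_i^{(j)} \mid c_i^{(j)}\bigr) \qquad (2.20)$$

Equation (2.20) shows that maximizing $\log P(r \mid c)$ is equivalent to minimizing the Euclidean distance between r and c when the channel noise is additive white Gaussian. This rule is also the basis of the Viterbi algorithm, which is described in the next subsection.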
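The step from the log-likelihood to the Euclidean distance can be made explicit. The derivation below is a sketch under the assumption of an additive white Gaussian noise channel with noise variance $\sigma^2$ per received sample; the symbol $\sigma^2$ is introduced here for illustration and does not appear in the original text.

```latex
\begin{align*}
P\bigl(r_i^{(j)} \mid c_i^{(j)}\bigr)
  &= \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{\bigl(r_i^{(j)} - c_i^{(j)}\bigr)^2}{2\sigma^2}\right) \\
\log P(r \mid c)
  &= -\frac{Ln}{2}\log\bigl(2\pi\sigma^2\bigr)
     - \frac{1}{2\sigma^2}\sum_{i=0}^{L-1}\sum_{j=0}^{n-1}
       \bigl(r_i^{(j)} - c_i^{(j)}\bigr)^2 \\
\hat{c}
  &= \arg\max_{c}\,\log P(r \mid c)
   = \arg\min_{c}\sum_{i=0}^{L-1}\sum_{j=0}^{n-1}
       \bigl(r_i^{(j)} - c_i^{(j)}\bigr)^2
\end{align*}
```

The first term of the log-likelihood does not depend on c, and the factor $1/(2\sigma^2)$ is positive, so only the sum of squared differences decides which codeword is selected.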

2.2.2 Viterbi Decoding Algorithm

The goal of the Viterbi algorithm is to find the codeword $\hat{c}$ that maximizes the probability $P(r \mid c)$. According to the maximum likelihood decoding rule, Viterbi proposed an algorithm that computes the minimum Euclidean distance as time goes on. Two basic measures are defined in the Viterbi algorithm: the branch metric $BM^{t}_{s_x, s_y}$ and the path metric $PM^{t}_{s_y}$. At each time t, the branch metric and the path metric are computed as

$$BM^{t}_{s_x, s_y} = \sum_{j=0}^{n-1} \bigl(r_t^{(j)} - c_t^{(j)}\bigr)^2$$

$$PM^{t}_{s_y} = \min_{s_x} \bigl( PM^{t-1}_{s_x} + BM^{t}_{s_x, s_y} \bigr)$$

where $r_t^{(j)}$ is the j-th component of the symbol received at time t and $c_t^{(j)}$ is the j-th code bit labeling the branch from $s_x$ to $s_y$.

The branch metric is the (squared) Euclidean distance between a received symbol and the corresponding trellis codeword symbol. $BM^{t}_{s_x, s_y}$ represents the branch metric associated with the transition between state $s_x$ at time t-1 and state $s_y$ at time t.

The path metric is the minimum Euclidean distance between a received sequence and the corresponding trellis codeword sequence. $PM^{t}_{s_y}$ represents the path metric of state $s_y$ at time t. In other words, the path metric is the accumulation of the branch metrics along the corresponding path. Therefore, the Viterbi algorithm can find the minimum path metric at each time instant. The maximum likelihood sequence can then be estimated in the trellis diagram along the path with the minimum path metric.

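For an (n, k, m) convolutional code, the steps of the Viterbi algorithm can be described as follows.

• Step 1.

Initially, set t = 0 and set the path metrics as

$$PM^{0}_{s_0} = 0, \qquad PM^{0}_{s_y} = \infty \quad (y \neq 0)$$

• Step 2.

Increase t by 1. For each state $s_y$, compute the branch metrics of the branches entering $s_y$, update the path metric

$$PM^{t}_{s_y} = \min_{s_x} \bigl( PM^{t-1}_{s_x} + BM^{t}_{s_x, s_y} \bigr)$$

and store the survivor at time t. The survivor is the decision bit corresponding to the branch chosen from all branches merging into $s_y$.

• Step 3.

If t < L (the length of the information sequence), go to step 2. Otherwise, stop.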
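The steps above can be sketched in Python for a rate-1/n code, followed by a trace-back from the best final state. The sketch assumes hard-decision input and uses the squared Euclidean distance as the branch metric, which equals the Hamming distance for binary symbols. The function name `viterbi_decode`, the state numbering (most recent bit as the most significant bit), and the tie-breaking rule (the first candidate wins) are assumptions of this sketch, not details taken from the thesis.

```python
def viterbi_decode(received, generators=((1, 1, 1), (1, 0, 1))):
    """Hard-decision Viterbi decoding; returns (decoded_bits, final_path_metrics)."""
    m = len(generators[0]) - 1
    n_states = 1 << m
    inf = float("inf")

    def branch(state, u):
        # state encodes (u_{t-1}, ..., u_{t-m}) with the most recent bit as MSB
        window = [u] + [(state >> (m - 1 - k)) & 1 for k in range(m)]
        out = tuple(sum(g & b for g, b in zip(taps, window)) % 2 for taps in generators)
        return (u << (m - 1)) | (state >> 1), out

    # Step 1: only the all-zero state is a valid starting point.
    pm = [0] + [inf] * (n_states - 1)
    survivors = []

    # Steps 2 and 3: add, compare, and select for every received symbol.
    for r in received:
        new_pm = [inf] * n_states
        choice = [None] * n_states
        for s in range(n_states):
            if pm[s] == inf:
                continue                      # state not reachable yet
            for u in (0, 1):
                nxt, out = branch(s, u)
                cand = pm[s] + sum((x - y) ** 2 for x, y in zip(r, out))
                if cand < new_pm[nxt]:
                    new_pm[nxt], choice[nxt] = cand, (s, u)
        survivors.append(choice)
        pm = new_pm

    # Trace back from the state with the minimum path metric.
    state = min(range(n_states), key=lambda s: pm[s])
    decoded = []
    for choice in reversed(survivors):
        state, u = choice[state]
        decoded.append(u)
    return decoded[::-1], pm
```

Feeding the error-free symbols 11, 10, 00, 01, 10, 01, 11 as a list of bit pairs returns the information bits 1011100 with a final path metric of 0 for state 00.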

Figure 2.5 illustrates the Viterbi decoding process over an ideal channel using the trellis diagram. Assuming the information sequence is the same as in the example of Figure 2.3, the codeword sequence (11, 10, 00, 01, 10, 01, 11, …) is transmitted through the channel. Under the assumption of an ideal channel, the received sequence is the same as the codeword sequence. The path metric is labeled above each state. As previously mentioned, the path metric of state 00 at time t=0 is initialized to 0. At each time instant, the path metric is updated and only one of the branches merging into the current state is preserved. The preserved branches, called the survivors, are represented by solid lines, while the discarded branches are represented by dotted lines. When the computation of the survivors and path metrics is done, the next step is to decode the information sequence $\hat{u}$. In Figure 2.5, the best state at time t=7 is 00. By performing a trace-back process from the best state, one can estimate the source information sequence. In this example, the decoded information sequence is (1011100…), and the corresponding survivors are highlighted in Figure 2.5.

Figure 2.5 Viterbi decoding over an ideal channel

When the codeword sequence is transmitted through a noisy channel, the received sequence may not match the original codeword sequence because of the channel noise.

Figure 2.6 shows the Viterbi decoding process over a noisy channel. The same codeword sequence as in the previous example is transmitted through the channel, and the received sequence, which contains two bit errors, is assumed to be (11, 11, 00, 01, 10, 00, 11, …). In Figure 2.6, the errors are shown in boldface. By the process described above, one obtains the decoded information sequence (1011100…), which is identical to the source information bits.

Figure 2.6 Viterbi decoding over a noisy channel

2.2.3 Path Merging Property

Figure 2.7 shows the survivors in Figure 2.6 and the four survivor paths corresponding to each state. Figure 2.7 also shows the path merging property of the Viterbi algorithm. In this example, when traced back in time, all survivor paths merge into the survivor path with the minimum path metric at the merging point. In other words, the decoded data before the merging point is the same whether or not the trace-back operation starts from the best state.

Figure 2.7 Path merging phenomenon in Figure 2.6

The path merging property of the Viterbi algorithm is an important characteristic for hardware implementation. In practical applications, the information sequence may be very long. To reduce the storage requirement and the decoding latency, the survivor path should be truncated to a finite length, called the truncation length. Figure 2.8 shows the truncated survivor paths when the length of the information sequence is N. The boldface line denotes the survivor path with the minimum path metric.

All survivor paths merge with high probability if the truncation length L is long enough. By selecting a proper truncation length, the decoded data can be determined from L stages of survivor information only. Moreover, it is then unnecessary to search for the best state.
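Path merging can be observed by tracing back from every final state and comparing the resulting state sequences. The sketch below hard-codes the 4-state (2, 1, 2) code with hard-decision input; the helper names `survivor_table`, `trace_back`, and `merge_depth`, the state numbering (00, 01, 10, 11 as 0 to 3), and the tie-breaking rule are assumptions made for illustration.

```python
def survivor_table(received):
    """Forward pass for the 4-state (2, 1, 2) code; returns per-step predecessor lists."""
    inf = float("inf")
    pm = [0, inf, inf, inf]
    table = []
    for r in received:
        new_pm, prev = [inf] * 4, [None] * 4
        for s in range(4):
            a, b = s >> 1, s & 1                  # register contents u_{t-1}, u_{t-2}
            for u in (0, 1):
                out = (u ^ a ^ b, u ^ b)
                nxt = (u << 1) | a
                cand = pm[s] + sum(x != y for x, y in zip(r, out))
                if cand < new_pm[nxt]:
                    new_pm[nxt], prev[nxt] = cand, s
        table.append(prev)
        pm = new_pm
    return table


def trace_back(table, start_state):
    """State sequence (time 0 first) of the survivor path ending in start_state."""
    path = [start_state]
    for prev in reversed(table):
        path.append(prev[path[-1]])
    return path[::-1]


def merge_depth(paths):
    """Number of leading time steps on which all survivor paths agree."""
    depth = 0
    for states in zip(*paths):
        if len(set(states)) != 1:
            break
        depth += 1
    return depth


noisy = [(1, 1), (1, 1), (0, 0), (0, 1), (1, 0), (0, 0), (1, 1)]
table = survivor_table(noisy)
paths = [trace_back(table, s) for s in range(4)]
```

In a truncated decoder, only the portion of the paths older than the truncation length would be output, on the expectation that `merge_depth` covers it. With only seven symbols, as here, the four paths share just their earliest states, which is why a sufficiently long truncation length matters.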

Figure 2.8 Truncated survivor paths

Chapter 3

Architecture of Viterbi Decoder

In this chapter, we introduce the hardware implementation of the Viterbi algorithm. Figure 3.1 shows the main blocks of a Viterbi decoder. A Viterbi decoder is usually composed of four basic units, which are summarized as follows.

• Branch Metric Unit (BM Unit):

According to the received sequence, compute the branch metrics for the different branches in the trellis diagram.

• Add-Compare-Select Unit (ACS Unit):

Accumulate the branch metrics recursively and perform the comparison operation to generate the path metric for each state. Decide the survivor corresponding to each state according to the comparison result.

• Path Metric Unit (PM Unit):

Store the path metrics at each time instant.

• Survivor Memory:

Store the survivors from the ACS unit. Then use the register-exchange approach or the trace-back approach to decode the maximum likelihood information sequence.

Figure 3.1 Main blocks of the Viterbi decoder

3.1 Branch Metric Unit

This unit generates all branch metrics from the received symbol. If the receiver adopts 1-bit quantization, the scheme is called hard-decision decoding. Soft-decision decoding, on the other hand, adopts q-bit quantization of the received symbols. Figure 3.2 illustrates the quantization of the received symbol. In other words, hard-decision decoding uses one bit to represent a received bit, while soft-decision decoding uses q bits. Although soft-decision decoding performs better than hard-decision decoding, it increases the complexity of the branch metric unit and the ACS unit. In general, 3-bit soft-decision decoding is a good choice considering the trade-off between performance and complexity.

(a) Hard-decision

(b) 3-bit soft-decision

Figure 3.2 Quantization of the received symbol
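The difference between 1-bit and q-bit quantization can be illustrated with a uniform quantizer. The sketch assumes that the received amplitude nominally lies in the range [-1, 1) and that the levels are numbered from 0 to 2^q - 1; the actual thresholds of Figure 3.2 are not recoverable from the text, so the range, the level numbering, and the function name `quantize` are assumptions.

```python
import math


def quantize(x, q, low=-1.0, high=1.0):
    """Uniformly quantize amplitude x to an integer level in [0, 2**q - 1]."""
    levels = 2 ** q
    step = (high - low) / levels
    level = math.floor((x - low) / step)
    return max(0, min(levels - 1, level))   # clip amplitudes outside [low, high)
```

With `q=1` the function reduces to a sign decision (hard decision), while `q=3` keeps eight levels that indicate how reliable the received bit is.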

Taking the (2, 1, 2) convolutional code described before as an example, the received symbol with q-bit quantization can be represented by $(r_1\, r_2)$. The codeword symbol corresponding to each trellis branch may be 00, 01, 10, or 11. The branch metrics are defined as

(3.1)

Equation (3.1) can be rewritten in a simpler form:

(3.2)

According to equation (3.2), the branch metric unit can be implemented easily, and all resulting branch metrics are delivered to the ACS unit. Figure 3.3 shows the architectures of the branch metric unit for hard-decision decoding and for 3-bit soft-decision decoding.

(a) Branch metric unit for hard-decision decoding

(b) Branch metric unit for 3-bit soft-decision decoding

Figure 3.3 The architectures of the branch metric unit
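A branch metric unit for the (2, 1, 2) code produces one metric for each of the four possible codeword symbols. The sketch below uses one common formulation and is not necessarily the thesis's equations (3.1) and (3.2): each quantized level is compared with the ideal level 0 (code bit 0) or 2^q - 1 (code bit 1) by absolute difference. The function name `branch_metrics` and this distance measure are assumptions.

```python
from itertools import product


def branch_metrics(r1, r2, q=1):
    """Return {(c1, c2): metric} for the four possible codeword symbols.

    Assumption: a received level r in [0, 2**q - 1] is compared with the ideal
    level 0 (for code bit 0) or 2**q - 1 (for code bit 1) by absolute difference.
    """
    top = 2 ** q - 1
    return {
        (c1, c2): abs(r1 - c1 * top) + abs(r2 - c2 * top)
        for c1, c2 in product((0, 1), repeat=2)
    }
```

For `q=1` the metric is the Hamming distance between the received pair and the branch label; for `q=3` the same code yields metrics between 0 and 14.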

3.2 Add-Compare-Select Unit

The trellis diagram of a convolutional code can be decomposed into basic sub-trellises. Each sub-trellis can be implemented as an add-compare-select (ACS) module. The ACS module is the key component of the Viterbi decoder: it calculates the minimum path metric and determines the survivor.

There are many issues in designing an ACS structure. For low-complexity applications, a bit-serial ACS unit is used to save area and even to reduce power consumption. For high-speed applications, a bit-parallel structure is used by duplicating $2^m$ ACS units for an (n, k, m) convolutional code. As modern communication systems are required to transmit information at high data rates, this section focuses on the fully parallel architecture. Several ACS structures for different applications are discussed in this section.

3.2.1 Radix-2 ACS Structure

As previously mentioned, the ACS unit calculates the minimum path metric and determines the survivor. Each ACS unit adds the previous path metric of each predecessor state to the corresponding branch metric. It then compares the resulting partial path metrics to find the minimum one. The comparison results of all ACS units, which represent the estimated information, are saved in the survivor memory. Moreover, the minimum partial path metric is selected as the new path metric.

Figure 3.4 shows the 4-state radix-2 trellis and the fundamental radix-2 ACS unit for state S0. As the trellis diagram illustrates, state S0 has two predecessor states, S0 and S1. First, the corresponding path metrics and branch metrics are added.

Then, the two sums are compared to decide which branch is the survivor and which path metric is kept. The new path metric becomes the predecessor path metric at the next time instant. Because of this feedback characteristic, the speed of the Viterbi decoder is limited mainly by the ACS unit.

Figure 3.4 The 4-state radix-2 trellis and the radix-2 ACS unit for state S0
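The add, compare, and select operations of a radix-2 ACS unit can be written out directly. This is a behavioural sketch rather than a hardware description; the function name `acs_radix2`, the decision encoding, and the tie-breaking rule are assumptions.

```python
def acs_radix2(pm_a, bm_a, pm_b, bm_b):
    """Add-compare-select for one state with two predecessors.

    Returns (new_path_metric, decision); decision 0 keeps the first branch and
    decision 1 keeps the second. Ties favour the first branch (an assumption).
    """
    sum_a = pm_a + bm_a                               # add
    sum_b = pm_b + bm_b
    decision = int(sum_b < sum_a)                     # compare
    return (sum_b if decision else sum_a), decision   # select
```

The returned path metric must be available before the next symbol can be processed, which is the feedback loop that limits the clock rate.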

3.2.2 High-radix ACS and Two-dimension ACS

The ACS unit is the speed bottleneck of the Viterbi decoder because of the feedback characteristic described in the previous subsection. For high-speed applications, the most intuitive idea is to reduce the time spent in the ACS loop per decoded bit. High-radix ACS structures such as radix-4, radix-8, and radix-16 ACS follow this strategy.

The high-radix structures unroll the ACS loop in order to perform multiple steps of the trellis within a single clock period. These lookahead methods replace the fundamental radix-2 trellis with a radix-4 trellis, a radix-8 trellis, and so on. For example, a 4-state radix-2 trellis and a 4-state radix-4 trellis are shown in Figure 3.5. Note that the radix-4 trellis in Figure 3.5(b) is formed by combining two stages of the radix-2 trellis in Figure 3.5(a). For the same clock period, the data rate of the radix-4 ACS unit is twice that of the radix-2 unit. In a similar manner, one can obtain a higher-radix trellis diagram.

(a) 4-state radix-2 trellis diagram

(b) 4-state radix-4 trellis diagram

Figure 3.5 The 4-state radix-2 and radix-4 trellis diagrams
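The construction of the radix-4 trellis from two radix-2 stages can be sketched by chaining two single-step transitions. The code assumes the 4-state (2, 1, 2) code used earlier, with states numbered 0 to 3 for 00, 01, 10, 11; the function names are hypothetical.

```python
def radix2_step(state, u):
    """One radix-2 transition of the 4-state (2, 1, 2) code: (next_state, output)."""
    a, b = state >> 1, state & 1
    return (u << 1) | a, (u ^ a ^ b, u ^ b)


def radix4_branches():
    """Map (start_state, end_state) -> (two input bits, four output bits)."""
    branches = {}
    for s in range(4):
        for u1 in (0, 1):
            mid, out1 = radix2_step(s, u1)
            for u2 in (0, 1):
                end, out2 = radix2_step(mid, u2)
                branches[(s, end)] = ((u1, u2), out1 + out2)
    return branches
```

The result contains 16 branches, so every state has four predecessors and a radix-4 ACS unit has to compare four candidate path metrics instead of two.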

A higher-radix trellis is realized at a much larger area cost. Figure 3.6 shows a radix-4 ACS unit for state S0. This unit computes four sums in parallel, followed by a four-way comparison. The comparison illustrated in Figure 3.7 is realized with six parallel subtractions to minimize the critical path. The select signals (D0(1) and D0(0)) for the 4-to-1 multiplexer can be generated by simple logic gates.

Afterward, the minimum partial path metric is selected as the new path metric.

Although the critical path increases, the radix-4 architecture achieves two operation steps per clock period. Consequently, the effective throughput is improved.


Figure 3.6 A radix-4 ACS unit


Figure 3.7 The 4-way comparator in Figure 3.6
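The four-way comparison with six parallel pairwise comparisons can be modelled as follows. This is a behavioural sketch, not the gate-level circuit of Figure 3.7; the function name `acs_radix4`, the mapping of the winning input to the select bits, and the tie-breaking rule (the lowest index wins) are assumptions.

```python
def acs_radix4(path_metrics, branch_metrics):
    """Radix-4 add-compare-select using six pairwise comparisons.

    Returns (new_path_metric, (d1, d0)), where (d1, d0) is the 2-bit select signal.
    """
    s = [pm + bm for pm, bm in zip(path_metrics, branch_metrics)]   # four adds
    # Six comparisons, all independent, so hardware can evaluate them in parallel.
    le = {(i, j): s[i] <= s[j] for i in range(4) for j in range(i + 1, 4)}
    wins = [
        le[0, 1] and le[0, 2] and le[0, 3],
        not le[0, 1] and le[1, 2] and le[1, 3],
        not le[0, 2] and not le[1, 2] and le[2, 3],
        not le[0, 3] and not le[1, 3] and not le[2, 3],
    ]
    d1 = int(wins[2] or wins[3])      # select bits derived by simple logic
    d0 = int(wins[1] or wins[3])
    return s[2 * d1 + d0], (d1, d0)
```

Because each comparison depends only on the four sums, the delay of the compare stage is roughly one subtraction plus a few gates, instead of the two cascaded comparisons of a tree-style comparator.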

For the same clock period, the radix-$2^\tau$ ACS unit achieves a speed-up of τ times compared with the radix-2 ACS unit. Nevertheless, the number of trellis branches is $2^{\tau-1}$ times that of the radix-2 trellis, which leads to exponentially increasing complexity. The comparison of different radix-$2^\tau$ ACS structures is shown in Table 3.1. The high-radix approach that accelerates the Viterbi algorithm can also lengthen the critical path because of the exponentially increasing number of branches. Among the different radix-$2^\tau$ ACS structures, radix-4 ACS is a popular choice because it offers a better compromise between cost and throughput.

Table 3.1 Comparison of different radix-$2^\tau$ ACS structures (values relative to radix-2)

Radix    Throughput    Complexity
2        1             1
4        2             2
8        3             4
16       4             8
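The entries of Table 3.1 follow from the two relations stated in the text: the throughput grows linearly with τ, while the complexity grows as 2^(τ-1). A short sketch that regenerates the rows is given below; the function name `radix_tradeoff` is an assumption.

```python
def radix_tradeoff(max_tau=4):
    """Rows of (radix, relative throughput, relative complexity) for tau = 1..max_tau."""
    return [(2 ** tau, tau, 2 ** (tau - 1)) for tau in range(1, max_tau + 1)]
```

Dividing throughput by complexity gives 1, 1, 0.75, and 0.5 for radix 2, 4, 8, and 16, which illustrates why radix-4 is the last point at which the speed-up keeps pace with the added hardware.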

Although a high-radix ACS unit performs multiple steps of the trellis within a single clock period, the exponentially increasing complexity makes VLSI implementation difficult. The number of branch metrics generated by the BM unit also increases exponentially. Therefore, a radix-$2^p \times 2^q$ structure is introduced to achieve a throughput equivalent to that of the radix-$2^\tau$ approach, where τ = p + q. The radix-$2^p \times 2^q$ ACS unit, referred to as the two-dimension structure, is similar to the radix-$2^\tau$ ACS unit, except that only a smaller radix-$2^p$ ACS unit and a smaller radix-$2^q$ ACS unit are required. Because the hardware cost of a high-radix ACS unit grows exponentially, the complexity of a Viterbi decoder based on the radix-$2^p \times 2^q$ architecture is much smaller than that of one based on the radix-$2^\tau$ architecture. However, the critical path of the two-dimension ACS unit is longer than that of the radix-$2^\tau$ ACS unit. Figure 3.8 shows a 4-state radix-2 trellis and a 4-state radix-2×2 trellis. The structure of the radix-2×2 ACS unit for state S0 is shown in Figure 3.9.

(a) 4-state radix-2 trellis diagram

(b) 4-state radix-2×2 trellis diagram

Figure 3.8 The 4-state radix-2 and radix-2×2 trellis diagrams
