Viterbi decoder architecture


3.3 Viterbi decoder architecture

The Viterbi decoder consists of four main units [122]: branch metric unit (BMU), add-compare-select unit (ACSU), path metric (PM) memory, and survivor memory unit (SMU).

As illustrated in Fig. 3.14, the BMU calculates the branch metrics from the input data based on either (3.79) or (3.82). The ACSU recursively accumulates the branch metrics (BMs) into path metrics, which are stored in the PM memory, and makes decisions to select the most likely state sequence. The add-compare-select (ACS) operation is formulated in (3.75). Finally, the SMU traces the decisions in the survivor memory, which keeps all survivors terminating at each state, to extract this sequence.

Figure 3.14: Block diagram of Viterbi decoder (blocks: BMU, ACSU, PM memory, SMU; ports: input, output)

The nonlinear and recursive nature of the ACSU limits the maximum achievable throughput. Furthermore, as the overall constraint length ν rises, a large number (2^ν) of ACS operations is required to determine the 2^ν survivors. The hardware complexity increases exponentially, and so does the power consumption, which has motivated much research on optimizing the ACSU.

On the other hand, the SMU is also an area- and power-consuming block in Viterbi decoders. There are two main solutions for the SMU: the register-exchange and the memory traceback architectures [123–126]. Compared with the register-exchange approach, the traceback-based SMU has inherently limited memory bandwidth, which limits the decoding speed. However, the memory-based traceback approach is more area efficient for large constraint lengths; it is also considerably more power efficient because no data are moved within the memory.

Each component in the Viterbi decoder of Fig. 3.14 will be addressed in the following sections.
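To make the data flow among the four units concrete, the following is a minimal sketch of a hard-decision decoder for a toy rate-1/2 code. The generators (7, 5) in octal, the memory of 2, and all function names are illustrative assumptions; this is neither the (2,1,6) code nor the hardware architecture discussed in this chapter, and the survivor memory is read with a simple traceback.

```python
# Hypothetical toy code: rate 1/2, memory 2, generators (7, 5) in octal.
G = (0b111, 0b101)
MEMORY = 2
NUM_STATES = 1 << MEMORY


def encode_step(state, bit):
    """Return (next_state, output_bits) for one input bit."""
    reg = (bit << MEMORY) | state  # newest bit sits in the MSB
    out = tuple(bin(reg & g).count("1") & 1 for g in G)
    return reg >> 1, out


def encode(bits):
    """Encode a list of bits, starting from the all-zero state."""
    state, coded = 0, []
    for b in bits:
        state, out = encode_step(state, b)
        coded.extend(out)
    return coded


def viterbi_decode(received):
    """Hard-decision decoding of received bits (two bits per trellis step)."""
    inf = float("inf")
    pm = [0] + [inf] * (NUM_STATES - 1)  # PM memory; encoder starts in state 0
    survivors = []                        # SMU: one decision table per step
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_pm = [inf] * NUM_STATES
        prev = [None] * NUM_STATES
        for s in range(NUM_STATES):
            for b in (0, 1):
                nxt, out = encode_step(s, b)
                bm = sum(x != y for x, y in zip(r, out))  # BMU: Hamming distance
                cand = pm[s] + bm                         # ACSU: add
                if cand < new_pm[nxt]:                    # ACSU: compare, select
                    new_pm[nxt] = cand
                    prev[nxt] = (s, b)
        pm = new_pm
        survivors.append(prev)
    # SMU: trace back from the state with the best final path metric.
    state = min(range(NUM_STATES), key=lambda s: pm[s])
    decoded = []
    for prev in reversed(survivors):
        state, bit = prev[state]
        decoded.append(bit)
    return decoded[::-1]
```

For example, `viterbi_decode(encode([1, 0, 1, 1, 0, 0]))` recovers the original message. The sketch keeps every decision until the end of the block, whereas a practical SMU would bound the storage with a truncation length.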

3.3.1 Branch metric unit

The BMU evaluates the metric, or distance, between the received samples and the codewords on the branches. In the hard decision decoding scheme, the channel is assumed to be a binary symmetric channel (BSC), in which each received signal has already been decided as a symbol of the transmission alphabet. In the binary case, the received symbols are either one or zero. Therefore, the branch metric (BM) is the Hamming distance between the received data and the codewords, as expressed in (3.78).
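As a small illustration of the hard-decision case, the Hamming-distance branch metric can be sketched as follows. The function names and the most-significant-bit-first ordering of the codewords are assumptions made for this example only.

```python
def hamming_branch_metric(received_bits, codeword_bits):
    """Hard-decision branch metric: number of positions where the bits differ."""
    if len(received_bits) != len(codeword_bits):
        raise ValueError("received symbol and codeword must have the same length")
    return sum(r != c for r, c in zip(received_bits, codeword_bits))


def branch_metrics(received_bits):
    """Metrics for every possible n-bit branch codeword, indexed by codeword value."""
    n = len(received_bits)
    metrics = []
    for value in range(2 ** n):
        # Most significant bit first, e.g. value 2 -> (1, 0) for n = 2.
        codeword = tuple((value >> (n - 1 - k)) & 1 for k in range(n))
        metrics.append(hamming_branch_metric(received_bits, codeword))
    return metrics
```

For a rate-1/2 code only four distinct branch metrics exist per trellis step, so a BMU can compute them once and share them among all ACS units.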

Alternatively, the soft decision decoding scheme can be applied to improve the decoding performance [94, 127]. The BM evaluation is shown in (3.82). Because practical implementations have finite precision, the channel symbols must be quantized, which introduces additional quantization noise. Such noise may increase the SNR (Es/N0) required to achieve a specific bit error rate (BER). The hardware complexity increases linearly with the number of quantization bits in the demodulated symbols; therefore, the objective is to find a sufficient number of quantization levels that minimizes the effect of quantization loss on the decoding performance.

For a ρ-bit quantizer, we consider uniform quantization because nonuniform quantizers achieve only a slight improvement when ρ ≥ 3 [128]. The step size ∆ is defined as the spacing between any two adjacent quantized values. For the received BPSK signal r_t⁽ⁱ⁾, Fig. 3.15 illustrates a ρ = 3 example, and r_t,q⁽ⁱ⁾ is the quantization result, which lies between −2^(ρ−1) and 2^(ρ−1) − 1.

Figure 3.15: Block diagram of Viterbi decoder

We can write the quantization function Φ(r_t⁽ⁱ⁾) as

r_t,q⁽ⁱ⁾ = Φ(r_t⁽ⁱ⁾) =
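A ρ-bit uniform quantizer with saturation at −2^(ρ−1) and 2^(ρ−1) − 1 can be sketched as follows. The floor-based rounding, the parameter names, and the function name `quantize` are assumptions for illustration and are not necessarily the exact Φ used here.

```python
import math


def quantize(r, rho, delta):
    """Map a real sample r to an integer level in [-2**(rho-1), 2**(rho-1) - 1].

    Assumes a floor-based uniform quantizer with step size delta and
    saturation at both ends of the range.
    """
    if rho < 1 or delta <= 0:
        raise ValueError("rho must be >= 1 and delta must be positive")
    lowest = -(2 ** (rho - 1))
    highest = 2 ** (rho - 1) - 1
    level = math.floor(r / delta)
    return max(lowest, min(highest, level))
```

With ρ = 3 the output takes one of the eight values −4, …, 3; samples whose magnitude exceeds the covered range are clipped, which is why a step size that is too small degrades the performance.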

The number of quantization bits ρ and the step size ∆ vary with the modulation type and the channel conditions. Moreover, for each ρ, there is an optimal ∆ that minimizes the BER. We consider the (2,1,6) convolutional encoder

G(D) = [ 1 + D² + D³ + D⁵ + D⁶    1 + D + D² + D³ + D⁶ ] (3.121)

in the IEEE 802.11a wireless LAN (WLAN) system [129]. With BPSK modulation, the performance in an AWGN channel is shown in Fig. 3.16, where the input codewords are quantized with different numbers of bits and step sizes. In Fig. 3.16(b) with the fixed SNR=0.5dB, the step size ∆ significantly affects the bit error rate, and the performance degrades rapidly for smaller ∆. The optimal step size that minimizes the BER decreases as ρ increases. However, it may be better to apply a ∆ larger than the optimal value to avoid serious performance degradation [128]. If ρ is chosen to be four, Fig. 3.16 also shows that the optimal ∆ is almost independent of the channel SNR; this property avoids the need to adjust ∆ dynamically according to the channel conditions, which are quite difficult to estimate in real applications.

Figure 3.16: Different quantization schemes in BPSK modulation and their BER performance

Figure 3.17: Different quantization schemes in 64-QAM and their BER performance. (a) SNR=13.5dB, BER versus ∆ for ρ=3, ρ=4, ρ=5; (b) ρ = 4, BER versus ∆ for SNR=12.5dB, SNR=13dB, SNR=13.5dB

Additionally, the results for 64 quadrature amplitude modulation (64-QAM) are presented in Fig. 3.17. The quantization is slightly different from Fig. 3.15 because of the amplitude modulation. Referring to the 64-QAM constellation in [129], we use the following quantizer scheme to demap the first three bits:

b0 = Φ(I) (3.122)

b1 = Φ(4 − |I|) (3.123)

b2 = Φ(2 − |4 − |I||) (3.124)

where I is the in-phase (carrier) component of the received signal. The other three bits can be obtained from (3.122)–(3.124) with I replaced by the quadrature component Q. The performance in Fig. 3.17(a) indicates a significant improvement from ρ = 3 to ρ = 4, meaning that more resolution is required than for BPSK modulation.
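The demapping rule in (3.122)–(3.124) can be sketched for one axis as follows. The quantizer `quantize` is an assumed floor-based uniform quantizer with saturation, and the constants 4 and 2 imply unnormalized amplitude levels ±1, ±3, ±5, ±7 on each axis; both points are assumptions of this example.

```python
import math


def quantize(x, rho, delta):
    """Assumed floor-based uniform quantizer with saturation (stand-in for Φ)."""
    lowest = -(2 ** (rho - 1))
    highest = 2 ** (rho - 1) - 1
    return max(lowest, min(highest, math.floor(x / delta)))


def demap_axis(x, rho, delta):
    """Soft values of the three bits carried by one axis (I or Q) of 64-QAM."""
    b0 = quantize(x, rho, delta)                        # (3.122)
    b1 = quantize(4 - abs(x), rho, delta)               # (3.123)
    b2 = quantize(2 - abs(4 - abs(x)), rho, delta)      # (3.124)
    return b0, b1, b2


def demap_symbol(i_component, q_component, rho, delta):
    """Six soft bit values per 64-QAM symbol: three from I, three from Q."""
    return demap_axis(i_component, rho, delta) + demap_axis(q_component, rho, delta)
```

The inner arguments of b1 and b2 have a smaller magnitude than I itself, which is consistent with the observation that 64-QAM needs more resolution than BPSK.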

Figure 3.18: Rate effect on the step size in BPSK modulation and 64-QAM

Applying different puncturing matrices [129] to the encoder (3.121), we obtain the performance figures in Fig. 3.18. Notice that the optimal step size decreases with increasing code rate, especially in 64-QAM. In Fig. 3.18(b), the lower-rate convolutional code, which carries more redundant information, appears much more tolerant over a wide range of step sizes, whereas the punctured codes with rates R = 3/4 and R = 2/3 become sensitive to step size variation.
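Puncturing and the matching erasure insertion at the decoder input can be sketched as follows. The flattened pattern order (A0, B0, A1, B1, ...), the two example patterns, which follow the commonly cited IEEE 802.11a rate-3/4 and rate-2/3 patterns, and the use of 0 as the neutral soft value are assumptions of this example and should be checked against [129].

```python
# Assumed puncturing patterns in flattened order A0 B0 A1 B1 ...
# (1 = transmitted, 0 = punctured).
PATTERN_R34 = (1, 1, 1, 0, 0, 1)
PATTERN_R23 = (1, 1, 1, 0)


def puncture(coded, pattern):
    """Drop the coded values at positions where the repeating pattern is 0."""
    return [v for k, v in enumerate(coded) if pattern[k % len(pattern)]]


def depuncture(received, pattern, erasure=0):
    """Re-insert a neutral value at every punctured position."""
    out = []
    k = 0  # index into the received (punctured) stream
    p = 0  # position in the mother-code stream
    # Continue until all received values are placed and the pattern period is complete.
    while k < len(received) or p % len(pattern):
        if pattern[p % len(pattern)] and k < len(received):
            out.append(received[k])
            k += 1
        else:
            out.append(erasure)
        p += 1
    return out
```

Because a punctured position contributes nothing to the branch metric, the remaining quantized symbols carry more weight in each decision, which is one plausible reason for the higher sensitivity of the punctured codes to the step size.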

3.3.2 Add-compare-select unit

The ACSU is the major arithmetic unit in the Viterbi decoder, and it also dominates the computational complexity. The most common way to build a high-throughput Viterbi decoder is the fully parallel approach, in which one ACS unit is assigned to each state. Nevertheless, the throughput is still limited by the recursive operation. Fig. 3.19(a) shows a subset of the trellis diagram in Fig. 3.8, where S_i^(t) denotes state S_i at time t, and β_i^(t) represents the branch connecting S_i^(t) and S₀^(t+1). We can construct the ACS unit for state S₀^(t+1) in Fig. 3.19(b) according to the following operation:

Γ(S₀^(t+1)) = min_{i=0,1} [Γ(S_i^(t)) + γ(β_i^(t))]. (3.125)

The branch metric γ(β_i^(t)) can be based on either (3.83) for soft decision decoding or (3.79) for hard decision decoding. The two-way comparator (cmp) finds the best path metric associated with the survivor.
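The operation in (3.125) can be sketched for a single state with two incoming branches as follows. The function name, the returned decision bit, and the tie-breaking rule in favor of branch 0 are assumptions of this example.

```python
def acs(path_metrics, branch_metrics):
    """Add-compare-select for one state with two incoming branches.

    path_metrics   -- (Γ(S_0^(t)), Γ(S_1^(t))), metrics of the two predecessor states
    branch_metrics -- (γ(β_0^(t)), γ(β_1^(t))), metrics of the two incoming branches
    Returns the new path metric and the index i of the selected branch.
    """
    candidates = [pm + bm for pm, bm in zip(path_metrics, branch_metrics)]  # add
    decision = 0 if candidates[0] <= candidates[1] else 1                   # compare
    return candidates[decision], decision                                   # select
```

For example, `acs((3, 1), (0, 1))` compares 3 + 0 with 1 + 1 and selects branch 1 with a new path metric of 2. The decision bit is what the SMU stores, while the new path metric is written back to the PM memory.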

Figure 3.19: ACS unit structure and the corresponding trellis diagram

Moreover, the critical path delay and the cost of the ACSU are also determined by the word length of the fixed-point path metric Γ. With a finite word length, metric normalization is necessary to rescale Γ, because the accumulation in (3.125) may cause overflow. Note that different normalization schemes lead to different word lengths. Among the various normalization approaches, modulo normalization can simplify the circuit implementation [130, 131] since it exploits the nature of two's complement arithmetic and dispenses with extra control circuits. From the discussion of path truncation, we know that all survivors at time t + T would very likely originate from the same state at time t for a sufficiently large truncation length T; otherwise, there would be a significant truncation error. After input symbol quantization,
