Viterbi algorithm - Convolutional codes - 通道解碼器之設計與實作 (Design and Implementation of Channel Decoders)

Convolutional codes

3.2 Viterbi algorithm

The Viterbi algorithm, proposed in 1967 [15], is a maximum likelihood (ML) decoding technique for convolutional codes [104]. Suppose that the codeword $v$ is transmitted through a discrete memoryless channel and that the received sequence $r$ is observed at the channel output.

The ML decoder finds a codeword $\hat{v}$, the estimate of $v$, for which the a posteriori probability $P(\hat{v} \mid r)$ is maximum. With Bayes' rule, this probability can be written as

$$P(\hat{v} \mid r) = \frac{P(r \mid \hat{v})\,P(\hat{v})}{P(r)}, \tag{3.67}$$

and, since $P(r)$ does not depend on $\hat{v}$, maximizing $P(\hat{v} \mid r)$ is equivalent to maximizing $P(r \mid \hat{v})P(\hat{v})$. Note that each $\hat{v}$ corresponds to a distinct state sequence of length $N$, denoted by $(x_0, x_1, \ldots, x_N)$, where the initial and final states $x_0$ and $x_N$ are assumed known. This sequence, which is also a path in the trellis diagram, is a finite-state discrete-time Markov process [104]. Therefore, the probability of being in state $x_{t+1}$ at time $t+1$, given all states up to time $t$, depends only on the state $x_t$ at time $t$; that is,

$$P(x_{t+1} \mid x_0, x_1, \ldots, x_t) = P(x_{t+1} \mid x_t), \tag{3.68}$$

and

$$P(\hat{v}) = P(x_0, x_1, \ldots, x_N) = \prod_{t=0}^{N-1} P(x_{t+1} \mid x_t, \ldots, x_0) = \prod_{t=0}^{N-1} P(x_{t+1} \mid x_t). \tag{3.69}$$

For a discrete memoryless channel, we can also write

$$P(r \mid \hat{v}) = \prod_{t=0}^{N-1} P(r_t \mid \hat{v}_t), \tag{3.70}$$

and the decoder will maximize the probability

$$P(r, \hat{v}) = P(r \mid \hat{v})\,P(\hat{v}) = \prod_{t=0}^{N-1} P(r_t \mid \hat{v}_t)\,P(x_{t+1} \mid x_t). \tag{3.71}$$

For convenience, we assign a metric

$$\Gamma = -\log P(r, \hat{v}) = \sum_{t=0}^{N-1} \left[ -\log P(r_t \mid \hat{v}_t) - \log P(x_{t+1} \mid x_t) \right] \tag{3.72}$$

to the path; since $-\log(\cdot)$ is monotonically decreasing, the decoder needs to find the path that minimizes $\Gamma$. Note that $P(x_{t+1} \mid x_t)$ depends on the $t$-th encoder input $u_t$ and is zero if there is no branch between $x_t$ and $x_{t+1}$. If the data sequence $u$ comes from an equiprobable source, $P(x_{t+1} \mid x_t)$ is a constant equal to $1/2^k$ for a binary $(n, k, m)$ convolutional code, because each state has $2^k$ outgoing branches. Consequently, we can reduce the metric to

$$\Gamma = \sum_{t=0}^{N-1} -\log P(r_t \mid \hat{v}_t). \tag{3.73}$$

We first consider the path $(x_0, x_1, \ldots, x_t)$ terminating in $x_t$ at time $t$ and its path metric

$$\Gamma(x_t) = \sum_{i=0}^{t-1} -\log P(r_i \mid \hat{v}_i). \tag{3.74}$$

Although many paths may terminate in $x_{t+1}$, only the one with the smallest path metric is of interest; it is denoted by $\hat{x}(x_{t+1})$ and called the survivor of state $x_{t+1}$. Let $S$ denote the set of all physical states. The Viterbi decoding algorithm proceeds as follows:

• Initialization:

$t = 0$, $\hat{x}(x_0) = x_0$, $\Gamma(x_0) = 0$;

$\hat{x}(\chi)$ is arbitrary and $\Gamma(\chi) = \infty$ for $\chi \in S$ and $\chi \neq x_0$.

• Iterations until $t = N$:

For each $x_{t+1} \in S$, we compute

$$\Gamma(x_{t+1}) = \min_{x_t} \left( \Gamma(x_t) + \gamma(x_{t+1}, x_t) \right), \tag{3.75}$$

where the minimum is taken over all states $x_t$ with a branch into $x_{t+1}$, and

$$\gamma(x_{t+1}, x_t) = -\log P\big(r_t \mid \hat{v}(x_{t+1}, x_t)\big). \tag{3.76}$$

Notice that $\hat{v}(x_{t+1}, x_t)$ is the codeword corresponding to the branch between $x_t$ and $x_{t+1}$. Among the paths entering $x_{t+1}$, only the one with the minimum metric is stored as $\hat{x}(x_{t+1})$, and the others are discarded; the path metric $\Gamma(x_{t+1})$ is also saved for the next iteration. If $t = N$, the operation is completed; otherwise, $t$ is increased by one and the next iteration begins.
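The initialization and iteration steps above can be sketched in Python. This is a minimal sketch, not the author's implementation: it assumes a hypothetical binary (2, 1, 2) code with octal generators (7, 5), hard decisions with the Hamming branch metric of (3.79), and a zero-terminated trellis so that the final state $x_N = 0$ is known. The helper names `branch`, `encode`, and `viterbi_decode` are illustrative only.

```python
import math

# Hypothetical example code (not from the text): binary (2, 1, 2) code with
# generators (7, 5) in octal, i.e. g1 = 1 + D + D^2 and g2 = 1 + D^2.
G = (0b111, 0b101)
M = 2                      # encoder memory
N_STATES = 1 << M          # 2^M trellis states


def branch(state, u):
    """Return (next_state, n-tuple codeword) for input bit u leaving `state`.

    `state` holds the last M inputs, the most recent one in the highest bit."""
    reg = (u << M) | state
    v = tuple(bin(reg & g).count("1") & 1 for g in G)
    return reg >> 1, v


def encode(bits):
    """Encode and append M zero tail bits so that the final state x_N is 0."""
    state, out = 0, []
    for u in list(bits) + [0] * M:
        state, v = branch(state, u)
        out.extend(v)
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi decoding with the Hamming branch metric (3.79)."""
    n = len(G)
    # Initialization: Gamma(x_0 = 0) = 0 and Gamma(chi) = inf for other states.
    metric = [0.0] + [math.inf] * (N_STATES - 1)
    survivor = [[] for _ in range(N_STATES)]   # input bits along each survivor
    for t in range(len(received) // n):
        r_t = received[n * t : n * (t + 1)]
        new_metric = [math.inf] * N_STATES
        new_survivor = [None] * N_STATES
        for x_t in range(N_STATES):
            if metric[x_t] == math.inf:        # state not reachable yet
                continue
            for u in (0, 1):
                x_next, v = branch(x_t, u)
                gamma = sum(a != b for a, b in zip(r_t, v))
                # Add-compare-select, (3.75): keep only the best path into x_next.
                if metric[x_t] + gamma < new_metric[x_next]:
                    new_metric[x_next] = metric[x_t] + gamma
                    new_survivor[x_next] = survivor[x_t] + [u]
        metric, survivor = new_metric, new_survivor
    # x_N = 0 is known (tail bits): read the survivor of state 0, drop the tail.
    return survivor[0][: len(survivor[0]) - M]


msg = [1, 0, 1, 1, 0, 0, 1]
rx = encode(msg)
rx[0] ^= 1                 # two channel bit errors
rx[9] ^= 1
print(viterbi_decode(rx) == msg)   # prints True
```

Both errors are corrected because the free distance of this example code is 5. Each survivor is kept here as a full list of input bits, which is exactly the storage problem that path truncation in Section 3.2.1 addresses.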

Finally, we obtain the survivor $\hat{x}(x_N)$ as well as the estimated data sequence $\hat{u}$ along the survivor. Some decoding examples can be found in [104], [94], and [105]. The term $\gamma(x_{t+1}, x_t)$ in (3.76) is called the branch metric. If the codeword on the branch is the $n$-tuple

$$\hat{v}_t = \hat{v}(x_{t+1}, x_t) = \left(\hat{v}_t^{(1)}, \hat{v}_t^{(2)}, \ldots, \hat{v}_t^{(n)}\right),$$

and the received sequence is also an $n$-tuple

$$r_t = \left(r_t^{(1)}, r_t^{(2)}, \ldots, r_t^{(n)}\right),$$

we can rewrite the branch metric as

$$\gamma(x_{t+1}, x_t) = -\sum_{i=1}^{n} \log P\big(r_t^{(i)} \mid \hat{v}_t^{(i)}\big). \tag{3.77}$$

For a code over GF(2) and a binary symmetric channel (BSC) with transition probability $p < 0.5$, the branch metric will be

$$\gamma(x_{t+1}, x_t) = d(r_t, \hat{v}_t) \log \frac{1-p}{p} + n \log \frac{1}{1-p}, \tag{3.78}$$

where $d(r_t, \hat{v}_t)$ is the Hamming distance between $r_t$ and $\hat{v}_t$. Additionally, since $n \log \frac{1}{1-p}$ is a constant and $\log \frac{1-p}{p} > 0$ for $p < 0.5$, the branch metric in (3.78) can be reduced to

$$\gamma(x_{t+1}, x_t) = d(r_t, \hat{v}_t) \tag{3.79}$$

without any effect on finding the least-metric path in (3.75). On the other hand, if the code is transmitted over an AWGN channel with BPSK signals, the probability is expressed as

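$$P\big(r_t^{(i)} \mid \hat{v}_t^{(i)}\big) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\big(r_t^{(i)} - \hat{v}_t^{(i)}\big)^2}{2\sigma^2}\right). \tag{3.80}$$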

Notice that $\hat{v}_t^{(i)}$ has been mapped with $0 \mapsto -1$ and $1 \mapsto +1$, and $2\sigma^2 = N_0/E_s$, in which $N_0$ is the one-sided power spectral density of the noise and $E_s$ is the energy per signal (channel symbol). Moreover, $E_s/N_0$ is often termed the signal-to-noise ratio (SNR). As a result, the branch metric becomes

$$\gamma(x_{t+1}, x_t) = \frac{n}{2} \ln(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{i=1}^{n} \big(r_t^{(i)} - \hat{v}_t^{(i)}\big)^2, \tag{3.81}$$

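which can also be simplified to

$$\gamma(x_{t+1}, x_t) = \sum_{i=1}^{n} \big(r_t^{(i)} - \hat{v}_t^{(i)}\big)^2 = \sum_{i=1}^{n} \left[\big(r_t^{(i)}\big)^2 - 2 r_t^{(i)} \hat{v}_t^{(i)} + \big(\hat{v}_t^{(i)}\big)^2\right] \tag{3.82}$$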

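because $n$ and $\sigma^2$ are constants. Notice that $\sum_{i=1}^{n} \big(r_t^{(i)}\big)^2$ is the same for every candidate branch at time $t$, and $\big(\hat{v}_t^{(i)}\big)^2 = 1$ is constant in BPSK modulation. Therefore, after also dropping the positive factor 2, the metric is further reduced to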

$$\gamma(x_{t+1}, x_t) = -\sum_{i=1}^{n} r_t^{(i)} \hat{v}_t^{(i)}, \tag{3.83}$$

which is the negative inner product between the received $n$-tuple $r_t$ and the codeword $\hat{v}_t$.
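The reductions from (3.77) to (3.79) and from (3.82) to (3.83) can be checked numerically. The sketch below assumes $n = 3$, $p = 0.1$, and an arbitrary soft received tuple chosen only for illustration. It verifies that the closed form (3.78) equals the direct sum of $-\log P$ over a BSC, and that the squared Euclidean metric and the correlation metric differ only by terms common to all candidates, so both pick the same codeword.

```python
import math
from itertools import product


def neg_log_bsc(r, v, p):
    """Direct form of (3.77) on a BSC: -sum_i log P(r_i | v_i)."""
    return -sum(math.log(1 - p) if a == b else math.log(p) for a, b in zip(r, v))


def bsc_metric(r, v, p):
    """Closed form (3.78): d(r, v) log((1-p)/p) + n log(1/(1-p))."""
    d = sum(a != b for a, b in zip(r, v))   # Hamming distance
    return d * math.log((1 - p) / p) + len(r) * math.log(1 / (1 - p))


def euclid_metric(r, v):
    """(3.82): squared Euclidean distance between soft r and BPSK v."""
    return sum((a - b) ** 2 for a, b in zip(r, v))


def corr_metric(r, v):
    """(3.83): negative inner product of r and v."""
    return -sum(a * b for a, b in zip(r, v))


n, p = 3, 0.1
# All BPSK candidates with the mapping 0 -> -1, 1 -> +1.
bpsk = [tuple(2 * b - 1 for b in bits) for bits in product((0, 1), repeat=n)]
r = (0.3, -1.2, 0.8)   # arbitrary soft received n-tuple (illustrative)
best_euclid = min(bpsk, key=lambda v: euclid_metric(r, v))
best_corr = min(bpsk, key=lambda v: corr_metric(r, v))
print(best_euclid, best_corr)
```

Since $(\hat{v}_t^{(i)})^2 = 1$, the Euclidean metric equals $\sum_i (r_t^{(i)})^2 + n + 2\gamma$ with $\gamma$ from (3.83), which is why the two rankings coincide.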

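Based on the Viterbi decoding algorithm, the decoding error probability can be evaluated [94, 100, 106]. We first assume that an all-zero data sequence $u$ over GF(2) is encoded ($v = 0$) and transmitted through a binary symmetric channel, so any 1s in the decoded sequence $\hat{u}$ are decoding errors. In the trellis diagram of Fig. 3.8, for instance, the correct state sequence consists entirely of $S_0$. If errors occur, the decoder may select a path that diverges from the correct one. We consider the first error event, in which an incorrect path first diverges from the correct path at time $t$ and remerges with it some time instants later. Assuming that the incorrect path has a codeword of weight $d$, it is selected when more than $d/2$ of the $d$ positions in which it differs from the correct path are received in error (with ties resolved by a fair coin flip when $d$ is even), so the first error event probability is

$$P_d = \begin{cases} \displaystyle\sum_{e=(d+1)/2}^{d} \binom{d}{e} p^e (1-p)^{d-e}, & d \text{ odd}, \\[2ex] \displaystyle\frac{1}{2}\binom{d}{d/2} p^{d/2}(1-p)^{d/2} + \sum_{e=d/2+1}^{d} \binom{d}{e} p^e (1-p)^{d-e}, & d \text{ even}. \end{cases} \tag{3.84}$$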

Based on (3.65), the first error event probability caused by all incorrect paths at time $t$ is overbounded by

$$P_f(E) < \sum_{d=d_{\mathrm{free}}}^{\infty} A_d P_d. \tag{3.85}$$

Because the bound in (3.85) does not depend on $t$, the first error event probability at any time instant must satisfy (3.85). There may be many error events after the first one. As shown in Fig. 3.10, the first error event causes the decoded path to be $v_1$ instead of the correct $v$ at time $t_1$. Then the decoder eliminates $v_1$ at time $t_2$ due to the second error event, and the survivor becomes $v_2$. As a result, we have the following path metrics for $v$, $v_1$, and $v_2$ at time $t_2$:

$$\Gamma(v) \geq \Gamma(v_1) \geq \Gamma(v_2) \tag{3.86}$$

Figure 3.10: Illustration of error events in the trellis diagram (correct path $v$, error paths $v_1$ and $v_2$, time instants $t_1$ and $t_2$)

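We can see that if the path selection at time $t_2$ were between $v$ and $v_2$, the survivor would also be $v_2$, since $\Gamma(v) \geq \Gamma(v_2)$. Hence the probability of the second error event is still bounded by (3.85), and we can conclude that the error event probability at any time instant is bounded by

$$P(E) < \sum_{d=d_{\mathrm{free}}}^{\infty} A_d P_d. \tag{3.87}$$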

The probability $P_d$ in (3.84) can be upper bounded by

$$P_d < \left[2\sqrt{p(1-p)}\right]^d. \tag{3.88}$$

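Consequently, we can upper bound $P(E)$ in (3.87) with the WEF in (3.65); that is,

$$P(E) < \sum_{d=d_{\mathrm{free}}}^{\infty} A_d \left[2\sqrt{p(1-p)}\right]^d = A(D)\Big|_{D = 2\sqrt{p(1-p)}}. \tag{3.89}$$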

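If $p$ is small, the lowest-degree term ($d = d_{\mathrm{free}}$) dominates the bound, and we can approximate (3.89) as

$$P(E) \approx A_{d_{\mathrm{free}}} \left[2\sqrt{p(1-p)}\right]^{d_{\mathrm{free}}} \approx A_{d_{\mathrm{free}}} \left(2\sqrt{p}\right)^{d_{\mathrm{free}}}. \tag{3.90}$$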
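As a numerical illustration of (3.84) to (3.90), the sketch below evaluates the exact $P_d$, its bound (3.88), and the resulting union bound. It assumes the well-known rate-1/2 (7, 5) code, whose WEF is $A(D) = D^5/(1-2D)$ (so $A_d = 2^{d-5}$ and $d_{\mathrm{free}} = 5$), a crossover probability $p = 0.01$, and a fair coin flip for ties when $d$ is even; the infinite sums are truncated at an arbitrary `d_max`.

```python
from math import comb, sqrt


def pd_exact(d, p):
    """(3.84): probability that a weight-d path beats the all-zero path on a BSC."""
    total = sum(comb(d, e) * p**e * (1 - p) ** (d - e) for e in range(d // 2 + 1, d + 1))
    if d % 2 == 0:   # tie: assume it is resolved by a fair coin flip
        total += 0.5 * comb(d, d // 2) * (p * (1 - p)) ** (d // 2)
    return total


def pd_bound(d, p):
    """(3.88): P_d < [2 sqrt(p(1-p))]^d."""
    return (2 * sqrt(p * (1 - p))) ** d


# Assumed example: (7, 5) code with A(D) = D^5 / (1 - 2D), i.e. A_d = 2^(d-5).
D_FREE = 5


def a_d(d):
    return 2 ** (d - D_FREE) if d >= D_FREE else 0


def union_bound(p, pd, d_max=80):
    """(3.87) truncated at d_max; terms decay geometrically when 4 sqrt(p(1-p)) < 1."""
    return sum(a_d(d) * pd(d, p) for d in range(D_FREE, d_max + 1))


p = 0.01
D = 2 * sqrt(p * (1 - p))
closed_form = D**5 / (1 - 2 * D)      # (3.89): A(D) at D = 2 sqrt(p(1-p))
approx = a_d(D_FREE) * D**D_FREE      # (3.90): leading term only
print(union_bound(p, pd_exact), closed_form, approx)
```

The truncated sum with (3.88) reproduces the closed form $A(D)$, and at $p = 0.01$ the single term (3.90) is within a factor $1/(1-2D) \approx 1.7$ of it.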

Furthermore, the bit error probability $P_b(E)$ for the source sequence $u$ can be upper bounded by

$$P_b(E) < \frac{1}{k} \sum_{d=d_{\mathrm{free}}}^{\infty} \left(\sum_{w} w A_{w,d}\right) P_d, \tag{3.91}$$

where $w$ and $A_{w,d}$ are defined in the IOWEF (3.55), and $k$ is the number of information bits per branch; thus, $\sum_w w A_{w,d}$ is the total number of nonzero information bits on all weight-$d$ paths.

Similarly, based on (3.88) and (3.55), we can further bound $P_b(E)$ as

$$P_b(E) < \frac{1}{k} \left.\frac{\partial A(W, D)}{\partial W}\right|_{D = 2\sqrt{p(1-p)},\, W = 1}. \tag{3.92}$$
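The derivative in (3.92) can be evaluated for the same kind of example, assuming the well-known rate-1/2 (7, 5) code with IOWEF $A(W, D) = W D^5/(1 - 2WD)$ and $k = 1$, so that $\partial A/\partial W$ at $W = 1$ equals $D^5/(1-2D)^2$. The sketch computes the derivative numerically and also sums the series form of (3.91) with $P_d$ replaced by (3.88); the step size `h` and truncation `d_max` are arbitrary choices.

```python
from math import sqrt


def iowef(W, D):
    """Assumed IOWEF of the (7, 5) example code: A(W, D) = W D^5 / (1 - 2 W D)."""
    return W * D**5 / (1 - 2 * W * D)


def dA_dW_at_1(D, h=1e-6):
    """Central-difference estimate of dA(W, D)/dW at W = 1."""
    return (iowef(1 + h, D) - iowef(1 - h, D)) / (2 * h)


def pb_bound(p, k=1):
    """(3.92) with D = 2 sqrt(p(1-p)) and W = 1."""
    D = 2 * sqrt(p * (1 - p))
    return dA_dW_at_1(D) / k


def pb_series(p, d_max=80):
    """(3.91) with P_d from (3.88); here sum_w w A_{w,d} = (j+1) 2^j for d = 5 + j."""
    D = 2 * sqrt(p * (1 - p))
    return sum((j + 1) * 2**j * D ** (5 + j) for j in range(d_max - 4))


print(pb_bound(0.01), pb_series(0.01))
```

For this code, expanding $A(W, D) = \sum_j 2^j W^{j+1} D^{5+j}$ shows that every weight-$(5+j)$ path carries $j+1$ information ones, which is where the coefficient $(j+1)2^j$ comes from.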

In the AWGN channel with binary inputs and continuous outputs, the error probability can be derived in a similar way [100, 106]. The all-zero sequence is assumed to be transmitted with BPSK modulation, where 1 is mapped to $+1$ and 0 to $-1$, so the correct path $v$ is a codeword of all $-1$s. As shown in Fig. 3.10, if an error event $v_1$ containing $d$ codeword symbols equal to $+1$ merges with $v$ at time $t_1$ and is selected, the path metric of $v_1$ must be no larger than that of $v$. Only the $d$ positions where the two paths differ contribute to this comparison, and therefore

$$\sum_{e=1}^{d} \big(r^{(e)} - (-1)\big)^2 \geq \sum_{e=1}^{d} \big(r^{(e)} - (+1)\big)^2, \tag{3.93}$$

where $r^{(e)}$, $e = 1, \ldots, d$, denote the received symbols at the positions where $v_1$ has $+1$ codeword symbols.

Moreover, we can write

$$\sum_{e=1}^{d} \left[\big(r^{(e)} - (-1)\big)^2 - \big(r^{(e)} - (+1)\big)^2\right] = 4 \sum_{e=1}^{d} r^{(e)} \geq 0, \tag{3.94}$$

and the event error probability becomes

$$P_d = \Pr\left\{\xi = \sum_{e=1}^{d} r^{(e)} \geq 0\right\}. \tag{3.95}$$

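We further note that the $r^{(e)}$ are independent Gaussian random variables with mean $-1$ and variance $\sigma^2 = N_0/(2E_s)$; as a result, $\xi$ is also Gaussian with mean $-d$ and variance $d\sigma^2$. Hence, the probability in (3.95) will be

$$P_d = \int_{0}^{\infty} \frac{1}{\sqrt{2\pi d\sigma^2}}\, e^{-\frac{(\xi - (-d))^2}{2d\sigma^2}}\, d\xi = \int_{d/\sqrt{d\sigma^2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\, dx = Q\left(\sqrt{\frac{d}{\sigma^2}}\right) = Q\left(\sqrt{\frac{2dE_s}{N_0}}\right). \tag{3.96}$$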
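The Gaussian calculation behind (3.95) and (3.96) can be checked by simulation. This sketch assumes unit-amplitude BPSK as in the text, so $\sigma^2 = N_0/(2E_s)$, and evaluates $Q(x)$ through (3.97) with Python's `math.erfc`; the Monte Carlo estimator, its trial count, and its seed are illustrative choices.

```python
import math
import random


def Q(x):
    """Gaussian tail integral (3.97), via the standard-library erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def pd_awgn(d, es_n0):
    """(3.96): P_d = Q(sqrt(2 d Es/N0)); es_n0 is Es/N0 as a linear ratio."""
    return Q(math.sqrt(2 * d * es_n0))


def pd_awgn_monte_carlo(d, es_n0, trials=100_000, seed=1):
    """Estimate (3.95) directly: r^(e) ~ N(-1, sigma^2) with sigma^2 = N0/(2Es)."""
    rng = random.Random(seed)
    sigma = math.sqrt(1 / (2 * es_n0))
    hits = sum(
        sum(rng.gauss(-1.0, sigma) for _ in range(d)) >= 0   # xi >= 0: an error
        for _ in range(trials)
    )
    return hits / trials


print(pd_awgn(3, 0.25), pd_awgn_monte_carlo(3, 0.25))
```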

Here $Q(x)$ is the Gaussian error integral

$$Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-\frac{t^2}{2}}\, dt = \frac{1}{2}\,\mathrm{erfc}\left(\frac{x}{\sqrt{2}}\right), \tag{3.97}$$

and the complementary error function (erfc) is defined in [107]. According to (3.87), the event error probability for the AWGN channel can be bounded by

$$P(E) < \sum_{d=d_{\mathrm{free}}}^{\infty} A_d\, Q\left(\sqrt{\frac{2dE_s}{N_0}}\right). \tag{3.98}$$

With the following bound [108, 109], valid for $x \geq 0$,

$$Q(x) \leq \frac{1}{2} e^{-\frac{x^2}{2}} < e^{-\frac{x^2}{2}}, \tag{3.99}$$

we will have

$$P(E) < A(D)\Big|_{D = e^{-E_s/N_0}} \tag{3.100}$$

as well as the bit error probability

$$P_b(E) < \frac{1}{k} \left.\frac{\partial A(W, D)}{\partial W}\right|_{D = e^{-E_s/N_0},\, W = 1}. \tag{3.101}$$

The upper bounds in (3.100) and (3.101) are derived from the weaker bound in (3.99); tighter versions can be found in [106] and [100]. More accurate approximations of $Q(x)$ are discussed in [109], [110], and [111].
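To see how much the weaker bound (3.99) loosens the result, the sketch below compares the union bound (3.98) with (3.100) and (3.101), again assuming the well-known rate-1/2 (7, 5) code with $A(W, D) = W D^5/(1 - 2WD)$ and $k = 1$. The closed forms only converge when $2D < 1$, that is, $E_s/N_0 > \ln 2$; `es_n0` is a linear ratio rather than dB, and the truncation `d_max` is an arbitrary choice.

```python
import math

D_FREE = 5  # assumed (7, 5) example code: A(W, D) = W D^5 / (1 - 2 W D)


def a_d(d):
    """Number of weight-d paths, A_d = 2^(d-5), for the assumed example code."""
    return 2 ** (d - D_FREE) if d >= D_FREE else 0


def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2))


def pe_union_q(es_n0, d_max=200):
    """(3.98), truncated at d_max."""
    return sum(a_d(d) * Q(math.sqrt(2 * d * es_n0)) for d in range(D_FREE, d_max + 1))


def _chernoff_d(es_n0):
    D = math.exp(-es_n0)
    if 2 * D >= 1:
        raise ValueError("closed forms need Es/N0 > ln 2")
    return D


def pe_chernoff(es_n0):
    """(3.100): A(D) = D^5 / (1 - 2D) at D = exp(-Es/N0)."""
    D = _chernoff_d(es_n0)
    return D**5 / (1 - 2 * D)


def pb_chernoff(es_n0, k=1):
    """(3.101): dA/dW at W = 1 is D^5 / (1 - 2D)^2."""
    D = _chernoff_d(es_n0)
    return D**5 / (1 - 2 * D) ** 2 / k


for db in (2.0, 4.0, 6.0):
    es_n0 = 10 ** (db / 10)
    print(db, pe_union_q(es_n0), pe_chernoff(es_n0), pb_chernoff(es_n0))
```

Because each term satisfies $Q(\sqrt{2dE_s/N_0}) \leq \frac{1}{2}e^{-dE_s/N_0}$, the union bound (3.98) is always at most half of (3.100) for this code.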

The same ensemble-average error bound for time-varying convolutional codes is shown in Theorem 3.1, assuming maximum likelihood decoding. Time-varying convolutional codes are the counterparts of fixed, or time-invariant, convolutional codes, in which the generator polynomials are invariant over different time instants. Consequently, in time-varying convolutional codes, the generator matrix (see (3.40)) may have different submatrices $G_x$ at distinct rows, leading to the following encoder:

$$\bar{G} = \cdots$$

The channel coding theorem for binary codes is described as follows [100, 106].

Theorem 3.1 (Viterbi [100]). For any discrete input memoryless channel with capacity C, there exists a time-varying convolutional code of constraint length K, rate k/n bits per channel symbol, and arbitrary block length, whose bit error probability Pb, resulting from maximum likelihood decoding, is bounded by

$$P_b < (2^k - 1)\, 2^{-K k E_c(R)/R}$$

The Gallager function [112, 113] is defined as follows:

For the set $X$ of all possible channel input symbols, an arbitrary input distribution $p = \{p(x) \mid x \in X\}$ satisfies $p(x) \geq 0$ for all $x \in X$ and $\sum_{x \in X} p(x) = 1$. The transition probability $p(y \mid x)$ for $y \in Y$ and $x \in X$ characterizes a discrete memoryless channel, where $Y$ denotes the channel output alphabet. The code rate $R$ in Theorem 3.1 is measured in nats per channel symbol; that is, $R = k \ln 2 / n$.
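The defining formula of the Gallager function does not survive in this excerpt. Assuming the standard form $E_0(\rho, p) = -\ln \sum_{y \in Y} \left[\sum_{x \in X} p(x)\, p(y \mid x)^{1/(1+\rho)}\right]^{1+\rho}$, measured in nats to match $R$, a direct evaluation for a discrete memoryless channel might look as follows. The `bsc` helper, the uniform input distribution, and the choice $k = 1$, $n = 2$ are assumptions for illustration; this sketch does not claim to reproduce $E_c(R)$ in Theorem 3.1.

```python
import math


def gallager_e0(rho, px, pyx):
    """Assumed standard Gallager function, in nats:
    E_0(rho, p) = -ln sum_y [ sum_x p(x) p(y|x)^(1/(1+rho)) ]^(1+rho).

    px[x] is the input distribution p(x); pyx[x][y] is the channel p(y|x).
    """
    if any(q < 0 for q in px) or abs(sum(px) - 1) > 1e-12:
        raise ValueError("px must be a probability distribution")
    total = 0.0
    for y in range(len(pyx[0])):
        inner = sum(px[x] * pyx[x][y] ** (1 / (1 + rho)) for x in range(len(px)))
        total += inner ** (1 + rho)
    return -math.log(total)


def bsc(p):
    """Transition matrix of a binary symmetric channel (illustrative helper)."""
    return [[1 - p, p], [p, 1 - p]]


k, n = 1, 2
R = k * math.log(2) / n          # code rate in nats per channel symbol
print(R, gallager_e0(1.0, [0.5, 0.5], bsc(0.05)))
```

For the BSC with uniform inputs, $E_0(0, p) = 0$, the slope at $\rho = 0$ equals the mutual information $\ln 2 - H(p)$ in nats, and $E_0(1, p) = \ln 2 - 2\ln(\sqrt{p} + \sqrt{1-p})$.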

3.2.1 Path truncation

As indicated in the Viterbi decoding algorithm, the paths, or survivors, terminating at each state must be stored up to the last received codeword, which means that the entire received sequence is analyzed before any decoded output is produced. In real applications, the information sequence length $N$ may be so large that storing the survivors requires a massive amount of memory. Due to this practical storage constraint, the survivor of each state is truncated to a finite length, as shown in Fig. 3.11. The trellis diagram with $M = 2^{\nu}$ states is truncated to $T$ time instants, and there are $M$ paths terminating at time $t + T$. With truncation length $T$, the decoder outputs the data on the branch at depth $t$ according to the path metrics at time $t + T$ [100, 114]. If all surviving paths share a common node at time $t$, that unique branch is chosen; otherwise, the branch on the path with the best metric at time $t + T$ is selected. This truncation technique can cause an additional error if an incorrect path diverges from the correct path at depth $t$ and has not remerged with it by time $t + T$. Therefore, $T$ must be large enough that the truncation error is comparable to or less than the error probability of maximum-likelihood decoding [115].

Figure 3.11: Trellis diagram truncated to $T$ instants
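The fixed-depth decision rule described above can be sketched by keeping each survivor at most $T$ branches long and releasing the bit at depth $t$ once time $t + T$ has been processed. The sketch assumes a hypothetical rate-1/2 (7, 5) code with hard decisions and the Hamming metric (3.79), and always reads the bit from the best-metric survivor; when all survivors share a common node at depth $t$, this gives the same bit as the common-node rule. Path metrics are left unnormalized for simplicity, whereas a hardware decoder would normally rescale them.

```python
import math

# Hypothetical example code: rate-1/2 (7, 5) code with memory M = 2.
G = (0b111, 0b101)
M = 2
N_STATES = 1 << M


def branch(state, u):
    """Next state and output n-tuple for input bit u; state = last M inputs."""
    reg = (u << M) | state
    return reg >> 1, tuple(bin(reg & g).count("1") & 1 for g in G)


def encode(bits):
    """Unterminated encoding of a long stream (no tail bits)."""
    state, out = 0, []
    for u in bits:
        state, v = branch(state, u)
        out.extend(v)
    return out


def truncated_viterbi(received, T):
    """Hard-decision Viterbi decoding with survivors truncated to T branches."""
    n = len(G)
    metric = [0.0] + [math.inf] * (N_STATES - 1)
    paths = [[] for _ in range(N_STATES)]
    decoded = []
    for t in range(len(received) // n):
        r_t = received[n * t : n * (t + 1)]
        new_metric, new_paths = [math.inf] * N_STATES, [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == math.inf:
                continue
            for u in (0, 1):
                nxt, v = branch(s, u)
                cand = metric[s] + sum(a != b for a, b in zip(r_t, v))
                if cand < new_metric[nxt]:
                    new_metric[nxt], new_paths[nxt] = cand, paths[s] + [u]
        metric, paths = new_metric, new_paths
        best = min(range(N_STATES), key=lambda s: metric[s])
        if len(paths[best]) > T:
            decoded.append(paths[best][0])     # decision on the branch at depth t - T
            paths = [q[1:] if q is not None else None for q in paths]
    best = min(range(N_STATES), key=lambda s: metric[s])
    return decoded + paths[best]               # flush the last T branches


msg = [((i * i) // 3) % 2 for i in range(200)]
noisy = encode(msg)
for pos in range(10, 331, 40):                 # sparse channel bit errors
    noisy[pos] ^= 1
print(truncated_viterbi(noisy, T=15) == msg)
```

Here `T = 15` is an illustrative choice of about five times the constraint length $m + 1 = 3$, a commonly used rule of thumb; each survivor never holds more than $T + 1$ bits, regardless of the stream length.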

We again assume that an all-zero information sequence over GF(2) is encoded and transmitted through a memoryless channel. In the truncated trellis diagram of length $T$, the truncation error is caused by the incorrect paths that diverge from the correct path before time $t$ and reach a state $S_i \neq S_0$ at time $t + T$ without passing through $S_0$. Therefore, the WEF for

[Figure: truncated trellis diagram with time instants 1, 2, 3, ..., t-1, t, t+1, ..., t+T and states S_0, S_1, S_2, S_3]

From the document 通道解碼器之設計與實作 (Design and Implementation of Channel Decoders), pages 81-92.