Convolutional codes
3.2 Viterbi algorithm
The Viterbi algorithm, proposed in 1967 [15], is a maximum likelihood (ML) decoding technique for convolutional codes [104]. Assuming that the codeword $v$ is transmitted through a discrete memoryless channel, the received sequence $r$ is observed at the channel output.
The ML decoder finds a codeword $\hat{v}$, the estimate of $v$, that maximizes the a posteriori probability $P(\hat{v}\mid r)$. With Bayes' rule, this probability can be written as
$$P(\hat{v}\mid r) = \frac{P(r\mid\hat{v})\,P(\hat{v})}{P(r)}, \tag{3.67}$$
and, since $P(r)$ does not depend on $\hat{v}$, maximizing $P(\hat{v}\mid r)$ is equivalent to maximizing $P(r\mid\hat{v})P(\hat{v})$. Note that each $\hat{v}$ corresponds to a distinct state sequence $(x_0, x_1, \ldots, x_N)$ of $N$ branches, assuming
$x_0$ and $x_N$ are known. This sequence, which is also a path in the trellis diagram, is a finite-state discrete-time Markov process [104]. Therefore, the probability of being in state $x_{t+1}$ at time $t + 1$, given all states up to time $t$, depends only on the state $x_t$ at time $t$. That is,
$$P(x_{t+1}\mid x_0, x_1, \ldots, x_t) = P(x_{t+1}\mid x_t), \tag{3.68}$$
and
$$P(\hat{v}) = P(x_0, x_1, x_2, \ldots, x_N) = \prod_{t=0}^{N-1} P(x_{t+1}\mid x_t, \ldots, x_0) = \prod_{t=0}^{N-1} P(x_{t+1}\mid x_t). \tag{3.69}$$
For a discrete memoryless channel, we can also write
$$P(r\mid\hat{v}) = \prod_{t=0}^{N-1} P(r_t\mid\hat{v}_t), \tag{3.70}$$
and the decoder maximizes the joint probability
$$P(r, \hat{v}) = P(r\mid\hat{v})\,P(\hat{v}) = \prod_{t=0}^{N-1} P(r_t\mid\hat{v}_t)\,P(x_{t+1}\mid x_t). \tag{3.71}$$
For convenience, we assign the metric
$$\Gamma = -\log P(r, \hat{v}) = \sum_{t=0}^{N-1}\bigl[-\log P(r_t\mid\hat{v}_t) - \log P(x_{t+1}\mid x_t)\bigr] \tag{3.72}$$
to the path; thus, the decoder needs to find the path that minimizes $\Gamma$. Note that $P(x_{t+1}\mid x_t)$ depends on the $t$-th encoder input $u_t$ and is zero if there is no branch between $x_t$ and $x_{t+1}$. If the data symbols of $u$ are equally probable, $P(x_{t+1}\mid x_t)$ is a constant
equal to $1/2^k$ for every branch of a binary $(n, k, m)$ convolutional code. Consequently, we can reduce the metric to
$$\Gamma = \sum_{t=0}^{N-1}\bigl[-\log P(r_t\mid\hat{v}_t)\bigr]. \tag{3.73}$$
We first consider the path $(x_0, x_1, \ldots, x_t)$ terminating in $x_t$ at time $t$ and its path metric
$$\Gamma(x_t) = \sum_{i=0}^{t-1}\bigl[-\log P(r_i\mid\hat{v}_i)\bigr]. \tag{3.74}$$
Although many paths may terminate in $x_{t+1}$, only the one with the smallest path metric is of interest; it is denoted by $\hat{x}(x_{t+1})$ and called the survivor of state $x_{t+1}$. Let $S$ denote the set of all states. The Viterbi decoding algorithm proceeds as follows:
• Initialization:
  $t = 0$; $\hat{x}(x_0) = x_0$ and $\Gamma(x_0) = 0$; for every $\chi \in S$ with $\chi \neq x_0$, $\hat{x}(\chi)$ is arbitrary and $\Gamma(\chi) = \infty$.
• Iterations until $t = N$:
  For each $x_{t+1}$ in $S$, we compute
  $$\Gamma(x_{t+1}) = \min_{x_t}\bigl(\Gamma(x_t) + \gamma(x_{t+1}, x_t)\bigr), \tag{3.75}$$
  where
  $$\gamma(x_{t+1}, x_t) = -\log P\bigl(r_t\mid\hat{v}(x_{t+1}, x_t)\bigr). \tag{3.76}$$
  Here $\hat{v}(x_{t+1}, x_t)$ is the codeword segment that labels the branch between $x_t$ and $x_{t+1}$. Among the paths entering $x_{t+1}$, only the one with the minimum metric is stored as $\hat{x}(x_{t+1})$, and the others are discarded; the path metric $\Gamma(x_{t+1})$ is saved for the next iteration. Then $t$ is increased by one; if $t = N$, the operation is complete, otherwise the next iteration begins.
Finally, the survivor $\hat{x}(x_N)$ gives the decoded path, from which the estimated data sequence $\hat{u}$ is obtained. Decoding examples can be found in [104], [94], and [105].

The term $\gamma(x_{t+1}, x_t)$ in (3.76) is called the branch metric. If the $n$-tuple codeword is
$$\hat{v}_t = \hat{v}(x_{t+1}, x_t) = \bigl(\hat{v}_t^{(1)}, \hat{v}_t^{(2)}, \ldots, \hat{v}_t^{(n)}\bigr),$$
and the received sequence is also an $n$-tuple
$$r_t = \bigl(r_t^{(1)}, r_t^{(2)}, \ldots, r_t^{(n)}\bigr),$$
we can rewrite the branch metric as
$$\gamma(x_{t+1}, x_t) = -\sum_{i=1}^{n}\log P\bigl(r_t^{(i)}\mid\hat{v}_t^{(i)}\bigr). \tag{3.77}$$
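The recursion (3.75) with the branch metric (3.76)-(3.77) can be sketched in Python as below. This is a minimal illustration under assumptions the text does not make: a rate-1/n (k = 1) feedforward binary encoder whose generators are given as octal tap masks, a known starting state $x_0 = 0$, and, when `terminated=True`, a known final state $x_N = 0$ reached with m zero tail bits. The helpers `make_trellis`, `encode`, `viterbi`, and `bsc_branch_metric` are hypothetical names, and the memory-2 (7, 5) code is only an example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Branch = Tuple[int, ...]

def make_trellis(gens: Sequence[int], m: int):
    """Next-state and branch-label tables for a rate-1/n feedforward encoder.

    The state holds the last m input bits (newest in the most significant bit);
    gens are generator tap masks over the (m+1)-bit register, e.g. (0o7, 0o5).
    """
    n_states = 1 << m
    next_state = [[0, 0] for _ in range(n_states)]
    output: List[List[Branch]] = [[(), ()] for _ in range(n_states)]
    for s in range(n_states):
        for u in (0, 1):
            reg = (u << m) | s                    # current input, then past inputs
            output[s][u] = tuple(bin(reg & g).count("1") % 2 for g in gens)
            next_state[s][u] = reg >> 1           # drop the oldest input
    return next_state, output

def encode(bits, next_state, output):
    s, code = 0, []
    for u in bits:
        code.append(output[s][u])
        s = next_state[s][u]
    return code

def viterbi(received, next_state, output,
            branch_metric: Callable[[Sequence, Branch], float],
            terminated: bool = True):
    """Add-compare-select recursion (3.75) with a generic branch metric (3.76)."""
    n_states = len(next_state)
    gamma = [0.0] + [math.inf] * (n_states - 1)   # Γ(x0) = 0, Γ(χ) = ∞ otherwise
    paths: List[List[int]] = [[] for _ in range(n_states)]
    for r_t in received:
        new_gamma = [math.inf] * n_states
        new_paths: List[List[int]] = [[] for _ in range(n_states)]
        for s in range(n_states):
            if gamma[s] == math.inf:               # state not reachable yet
                continue
            for u in (0, 1):
                ns = next_state[s][u]
                metric = gamma[s] + branch_metric(r_t, output[s][u])
                if metric < new_gamma[ns]:         # keep only the survivor
                    new_gamma[ns] = metric
                    new_paths[ns] = paths[s] + [u]
        gamma, paths = new_gamma, new_paths
    best = 0 if terminated else min(range(n_states), key=gamma.__getitem__)
    return paths[best], gamma[best]

def bsc_branch_metric(p: float):
    """-log P(r_t | v_t) summed over the n symbols, as in (3.77), for a BSC."""
    def metric(r_t, v_t):
        return sum(-math.log(p if ri != vi else 1 - p) for ri, vi in zip(r_t, v_t))
    return metric

nxt, out = make_trellis((0o7, 0o5), m=2)
msg = [1, 0, 1, 1, 0, 0]                  # the last m = 2 zeros terminate the trellis
code = encode(msg, nxt, out)
rx = [list(c) for c in code]
rx[2][0] ^= 1                             # one channel error
decoded, gamma_n = viterbi(rx, nxt, out, bsc_branch_metric(0.1))
print(code, decoded, round(gamma_n, 4))
```

Storing complete input sequences per state keeps the sketch short; a practical decoder would store per-state decision bits and trace back through them.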
For a code over GF(2) and a binary symmetric channel (BSC) with transition probability $p < 0.5$, the branch metric is
$$\gamma(x_{t+1}, x_t) = d(r_t, \hat{v}_t)\log\frac{1-p}{p} + n\log\frac{1}{1-p}, \tag{3.78}$$
where $d(r_t, \hat{v}_t)$ is the Hamming distance between $r_t$ and $\hat{v}_t$. Since $n\log\frac{1}{1-p}$ is the same for every branch and $\log\frac{1-p}{p} > 0$, the branch metric in (3.78) can be reduced to
$$\gamma(x_{t+1}, x_t) = d(r_t, \hat{v}_t) \tag{3.79}$$
without any effect on finding the least-metric path in (3.75). On the other hand, if the code
is transmitted over an AWGN channel with BPSK signals, the probability is expressed as
$$P\bigl(r_t^{(i)}\mid\hat{v}_t^{(i)}\bigr) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(r_t^{(i)} - \hat{v}_t^{(i)})^2}{2\sigma^2}}. \tag{3.80}$$
Here $\hat{v}_t^{(i)}$ has been mapped by $0 \mapsto -1$ and $1 \mapsto +1$, and $2\sigma^2 = N_0/E_s$, where $N_0$
is the one-sided power spectral density of the noise and $E_s$ is the energy per signal. The ratio $E_s/N_0$ is often termed the signal-to-noise ratio (SNR). As a result, the branch metric becomes
$$\gamma(x_{t+1}, x_t) = \frac{n}{2}\ln(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(r_t^{(i)} - \hat{v}_t^{(i)}\bigr)^2, \tag{3.81}$$
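which, since $n$ and $\sigma^2$ are constants, can be simplified to
$$\gamma(x_{t+1}, x_t) = \sum_{i=1}^{n}\bigl(r_t^{(i)} - \hat{v}_t^{(i)}\bigr)^2 = \sum_{i=1}^{n}\Bigl[\bigl(r_t^{(i)}\bigr)^2 - 2r_t^{(i)}\hat{v}_t^{(i)} + \bigl(\hat{v}_t^{(i)}\bigr)^2\Bigr]. \tag{3.82}$$
Notice that $\sum_{i}\bigl(r_t^{(i)}\bigr)^2$ is the same for every branch at time $t$, and $\bigl(\hat{v}_t^{(i)}\bigr)^2 = 1$ is constant in BPSK modulation. Therefore, after dropping these terms and the factor 2, the metric is further reduced to
$$\gamma(x_{t+1}, x_t) = -\sum_{i=1}^{n} r_t^{(i)}\hat{v}_t^{(i)}, \tag{3.83}$$
which is the negative inner product between the received vector $r_t$ and the codeword $\hat{v}_t$.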
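To check concretely that dropping the constant terms does not change which branch wins in (3.75), the sketch below compares the full and reduced metrics over every possible n-tuple label of a single branch. The values n = 3, p = 0.1, σ² = 0.5 and the random seed are arbitrary illustration choices, not taken from the text, and the helper names are hypothetical.

```python
import itertools
import math
import random

def hamming(r, v):
    return sum(ri != vi for ri, vi in zip(r, v))

def bsc_full_metric(r, v, p):
    """Branch metric (3.78)."""
    return hamming(r, v) * math.log((1 - p) / p) + len(r) * math.log(1 / (1 - p))

def bpsk(v):
    """Map code bits 0 -> -1 and 1 -> +1, as in the text."""
    return [2 * b - 1 for b in v]

def awgn_full_metric(r, v, sigma2):
    """Branch metric (3.81), natural logarithms."""
    sq = sum((ri - si) ** 2 for ri, si in zip(r, bpsk(v)))
    return len(r) / 2 * math.log(2 * math.pi * sigma2) + sq / (2 * sigma2)

def correlation_metric(r, v):
    """Reduced metric (3.83): negative inner product with the BPSK branch label."""
    return -sum(ri * si for ri, si in zip(r, bpsk(v)))

n, p, sigma2 = 3, 0.1, 0.5
rng = random.Random(1)
candidates = list(itertools.product((0, 1), repeat=n))   # all 2^n branch labels
r_hard = tuple(rng.randint(0, 1) for _ in range(n))      # BSC output
r_soft = [rng.gauss(0.0, 1.0) for _ in range(n)]         # AWGN output

# Hard decision: (3.78) is a positive affine function of the Hamming distance.
order_full = sorted(candidates, key=lambda v: bsc_full_metric(r_hard, v, p))
order_hamming = sorted(candidates, key=lambda v: hamming(r_hard, v))

# Soft decision: (3.81) equals a constant plus (3.83) / sigma^2 for every label.
offsets = [awgn_full_metric(r_soft, v, sigma2) - correlation_metric(r_soft, v) / sigma2
           for v in candidates]
print(order_full == order_hamming, max(offsets) - min(offsets))
```

Because each reduced metric is a positive scaling of the full one plus a branch-independent constant, both orderings agree, which is all the comparison in (3.75) needs.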
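Based on the Viterbi decoding algorithm, the decoding error probability can be evaluated [94, 100, 106]. We first assume that an all-zero data sequence $u$ over GF(2) is encoded ($v = 0$) and transmitted through a binary symmetric channel. Any 1s in the decoded sequence $\hat{u}$ are decoding errors. In the trellis diagram of Fig. 3.8, for instance, the correct state sequence stays in $S_0$. If channel errors occur, the decoder may select a path that diverges from the correct one. We consider the first error event, in which an incorrect path first diverges from the correct path at time $t$ and remerges with it some time later. Assuming the incorrect path has codeword weight $d$, the decoder prefers it to the correct path when more than $d/2$ of its $d$ nonzero positions are received in error (a tie, possible only for even $d$, is resolved at random), so the first error event probability is
$$P_d = \begin{cases} \displaystyle\sum_{e=(d+1)/2}^{d}\binom{d}{e}p^e(1-p)^{d-e}, & d \text{ odd},\\[2ex] \displaystyle\frac{1}{2}\binom{d}{d/2}p^{d/2}(1-p)^{d/2} + \sum_{e=d/2+1}^{d}\binom{d}{e}p^e(1-p)^{d-e}, & d \text{ even}. \end{cases} \tag{3.84}$$
Based on (3.65), the first error event probability caused by all incorrect paths at time $t$ is overbounded by
$$P_f(E) < \sum_{d=d_{free}}^{\infty} A_d P_d. \tag{3.85}$$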
Since this argument does not depend on $t$, the bound (3.85) applies to an error event starting at any time instant. There may be many error events after the first one. As shown in Fig. 3.10, the first error event causes the decoded path to be $v_1$ instead of the correct $v$ at time $t_1$. Then, at time $t_2$, a second error event causes the decoder to eliminate $v_1$, and the survivor becomes $v_2$. As a result, the path metrics of $v$, $v_1$, and $v_2$ at time $t_2$ satisfy
$$\Gamma(v) \ge \Gamma(v_1) \ge \Gamma(v_2). \tag{3.86}$$

Figure 3.10: Illustration of error events in the trellis diagram (paths $v$, $v_1$, $v_2$; time instants $t_1$, $t_2$)
Note that if the comparison at time $t_2$ were made directly between $v$ and $v_2$, the survivor would also be $v_2$. Hence the error event probability is still bounded by (3.85), and we conclude
that the error event probability at any time instant satisfies
$$P(E) < \sum_{d=d_{free}}^{\infty} A_d P_d. \tag{3.87}$$
The probability $P_d$ in (3.84) can be upper bounded by
$$P_d < \bigl[2\sqrt{p(1-p)}\bigr]^d. \tag{3.88}$$
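The exact pairwise error probability (3.84) and the bound (3.88) are easy to compare numerically. The sketch below assumes the standard BSC form of (3.84), in which the weight-$d$ incorrect path wins when more than $d/2$ of its $d$ nonzero positions are received in error and a tie (even $d$) is broken by a fair coin; the function names are illustrative only.

```python
from math import comb, sqrt

def pd_bsc_exact(d: int, p: float) -> float:
    """P_d of (3.84): more than d/2 of the d positions flipped; a tie counts 1/2."""
    total = sum(comb(d, e) * p**e * (1 - p) ** (d - e) for e in range(d // 2 + 1, d + 1))
    if d % 2 == 0:
        total += 0.5 * comb(d, d // 2) * (p * (1 - p)) ** (d // 2)
    return total

def pd_bsc_bound(d: int, p: float) -> float:
    """Bound (3.88)."""
    return (2 * sqrt(p * (1 - p))) ** d

for p in (0.01, 0.05):
    for d in (5, 6, 7, 8):
        print(p, d, pd_bsc_exact(d, p), pd_bsc_bound(d, p))
```

The bound follows from $p^e(1-p)^{d-e} \le (p(1-p))^{d/2}$ for $e \ge d/2$ and $\sum_e \binom{d}{e} = 2^d$, so it is loose by roughly the factor $2^d$ divided by the number of dominant error patterns.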
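Consequently, we can upper bound $P(E)$ in (3.87) with the WEF in (3.65); that is,
$$P(E) < \sum_{d=d_{free}}^{\infty} A_d\bigl[2\sqrt{p(1-p)}\bigr]^d = A(D)\Big|_{D=2\sqrt{p(1-p)}}. \tag{3.89}$$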
If $p$ is small, the lowest-degree term dominates the bound, and we can approximate (3.89) as
$$P(E) \approx A_{d_{free}}\bigl[2\sqrt{p(1-p)}\bigr]^{d_{free}} \approx A_{d_{free}}\bigl(2\sqrt{p}\bigr)^{d_{free}}. \tag{3.90}$$
Furthermore, the bit error probability $P_b(E)$ for the source sequence $u$ can be upper bounded by
$$P_b(E) < \frac{1}{k}\sum_{d=d_{free}}^{\infty}\Bigl(\sum_{w} w A_{w,d}\Bigr)P_d, \tag{3.91}$$
where $w$ and $A_{w,d}$ are defined in the IOWEF (3.55), and $k$ is the number of information bits per branch; thus, $\sum_w w A_{w,d}$ is the total number of nonzero information bits on all weight-$d$ paths.
Similarly, based on (3.88) and (3.55), we can further bound $P_b(E)$ as
$$P_b(E) < \frac{1}{k}\left.\frac{\partial A(W, D)}{\partial W}\right|_{D=2\sqrt{p(1-p)},\,W=1}. \tag{3.92}$$
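As a numerical illustration of (3.89)-(3.92), the sketch below assumes a code not discussed in this section: the rate-1/2, memory-2 (7, 5) code with $d_{free} = 5$, whose IOWEF is $A(W, D) = WD^5/(1 - 2WD)$, so that $A_d = 2^{d-5}$ and $\sum_w w A_{w,d} = (d-4)\,2^{d-5}$ for $d \ge 5$. Both series converge only for $D < 1/2$, which here means roughly $p < 0.067$. The function names and the truncation `d_max` are hypothetical choices.

```python
from math import sqrt

D_FREE = 5   # free distance of the assumed (7, 5) example code

def spectrum_75(d: int):
    """(A_d, B_d) with B_d = sum_w w A_{w,d}, from A(W, D) = W D^5 / (1 - 2 W D)."""
    if d < D_FREE:
        return 0, 0
    j = d - D_FREE
    return 2**j, (j + 1) * 2**j

def bsc_union_bounds(p: float, k: int = 1, d_max: int = 200):
    D = 2 * sqrt(p * (1 - p))                    # evaluation point in (3.89), (3.92)
    if 2 * D >= 1:
        raise ValueError("A(D) diverges for D >= 1/2")
    ds = range(D_FREE, d_max + 1)
    pe_series = sum(spectrum_75(d)[0] * D**d for d in ds)      # (3.89), truncated sum
    pb_series = sum(spectrum_75(d)[1] * D**d for d in ds) / k  # (3.91) with (3.88)
    pe_closed = D**5 / (1 - 2 * D)                             # A(D) in closed form
    pb_closed = D**5 / (1 - 2 * D) ** 2 / k                    # dA/dW at W = 1
    pe_approx = spectrum_75(D_FREE)[0] * (2 * sqrt(p)) ** D_FREE  # (3.90)
    return pe_series, pe_closed, pb_series, pb_closed, pe_approx

print(bsc_union_bounds(0.01))
```

The truncated series and the closed forms agree to floating-point precision, which is a quick consistency check on a transfer function derived by hand.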
In the AWGN channel with binary inputs and continuous outputs, the error probability can be derived with the same approach [100, 106]. The all-zero sequence is assumed to be transmitted with BPSK modulation, where 1 is mapped to +1 and 0 to −1, so the correct path $v$ is a codeword of all −1s. As shown in Fig. 3.10, if an error event $v_1$ containing $d$ codeword symbols equal to +1 remerges with $v$ at time $t_1$ and is selected, the path metric of $v_1$ must be no larger than that of $v$. Only the $d$ positions where the two paths differ contribute to this comparison, and therefore
$$\sum_{e=1}^{d}\bigl(r^{(e)} - (-1)\bigr)^2 \ge \sum_{e=1}^{d}\bigl(r^{(e)} - (+1)\bigr)^2, \tag{3.93}$$
where $r^{(e)}$, $e = 1, \ldots, d$, denote the received symbols in the positions where $v_1$ has +1 codeword symbols.
Moreover, we can write
$$\sum_{e=1}^{d}\Bigl[\bigl(r^{(e)} - (-1)\bigr)^2 - \bigl(r^{(e)} - (+1)\bigr)^2\Bigr] = 4\sum_{e=1}^{d} r^{(e)} \ge 0, \tag{3.94}$$
and the event error probability becomes
$$P_d = \Pr\Bigl\{\xi = \sum_{e=1}^{d} r^{(e)} \ge 0\Bigr\}. \tag{3.95}$$
We further note that the $r^{(e)}$ are independent Gaussian random variables with mean $-1$ and variance $\sigma^2 = N_0/(2E_s)$; as a result, $\xi$ is also Gaussian, with mean $-d$ and variance $d\sigma^2$.
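Hence, the probability in (3.95) is
$$P_d = \int_{0}^{\infty}\frac{1}{\sqrt{2\pi d\sigma^2}}\,e^{-\frac{(\xi + d)^2}{2d\sigma^2}}\,d\xi = \int_{d/\sqrt{d\sigma^2}}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = Q\Bigl(\sqrt{\tfrac{d}{\sigma^2}}\Bigr) = Q\Bigl(\sqrt{\tfrac{2dE_s}{N_0}}\Bigr). \tag{3.96}$$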
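Here $Q(x)$ is the Gaussian error integral
$$Q(x) \triangleq \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty} e^{-\frac{y^2}{2}}\,dy = \frac{1}{2}\,\mathrm{erfc}\Bigl(\frac{x}{\sqrt{2}}\Bigr), \tag{3.97}$$
where the complementary error function (erfc) is defined in [107]. According to (3.87), the event error probability for the AWGN channel can be bounded by
$$P(E) < \sum_{d=d_{free}}^{\infty} A_d\,Q\Bigl(\sqrt{\tfrac{2dE_s}{N_0}}\Bigr). \tag{3.98}$$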
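The closed form (3.96) can be checked against a direct simulation of (3.95). The sketch below assumes $E_s/N_0$ is given as a linear ratio and uses an arbitrary example ($d = 3$, $E_s/N_0 = 0.5$); the helper names and the trial count are hypothetical choices, and `math.erfc` supplies the complementary error function in (3.97).

```python
import math
import random

def q_func(x: float) -> float:
    """Gaussian tail function (3.97)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pd_awgn(d: int, es_n0: float) -> float:
    """Closed form (3.96); es_n0 is the linear ratio E_s/N_0."""
    return q_func(math.sqrt(2 * d * es_n0))

def pd_awgn_monte_carlo(d: int, es_n0: float, trials: int = 100_000, seed: int = 7) -> float:
    """Estimate Pr{xi >= 0} in (3.95) with r^(e) ~ N(-1, sigma^2), sigma^2 = N_0 / (2 E_s)."""
    rng = random.Random(seed)
    sigma = math.sqrt(1 / (2 * es_n0))
    hits = sum(
        sum(rng.gauss(-1.0, sigma) for _ in range(d)) >= 0
        for _ in range(trials)
    )
    return hits / trials

print(pd_awgn(3, 0.5), pd_awgn_monte_carlo(3, 0.5))
```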
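With the following bound [108, 109], valid for $x \ge 0$,
$$Q(x) \le \frac{1}{2}e^{-\frac{x^2}{2}} < e^{-\frac{x^2}{2}}, \tag{3.99}$$
each term in (3.98) satisfies $Q\bigl(\sqrt{2dE_s/N_0}\bigr) < e^{-dE_s/N_0} = D^d$ with $D = e^{-E_s/N_0}$, so we have
$$P(E) < A(D)\Big|_{D=e^{-E_s/N_0}} \tag{3.100}$$
as well as the bit error probability bound
$$P_b(E) < \frac{1}{k}\left.\frac{\partial A(W, D)}{\partial W}\right|_{D=e^{-E_s/N_0},\,W=1}. \tag{3.101}$$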
The upper bounds (3.100) and (3.101) are derived from the weaker bound in (3.99); tighter versions can be found in [106] and [100]. More accurate approximations for $Q(x)$ are discussed in [109], [110], and [111].
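To gauge how much is lost by using the weaker bound (3.99), the sketch below evaluates the union bound (3.98) and the closed forms (3.100)-(3.101) for an assumed example that this section does not discuss: the rate-1/2 (7, 5) code with $A(W, D) = WD^5/(1 - 2WD)$, so $A_d = 2^{d-5}$ for $d \ge 5$. The closed forms require $D = e^{-E_s/N_0} < 1/2$, that is, $E_s/N_0 > \ln 2$; the function names and `d_max` are illustrative.

```python
import math

D_FREE = 5   # assumed example: rate-1/2 (7, 5) code, A(W, D) = W D^5 / (1 - 2 W D)

def q_func(x: float) -> float:
    return 0.5 * math.erfc(x / math.sqrt(2))

def awgn_bounds(es_n0: float, k: int = 1, d_max: int = 200):
    """Union bound (3.98) versus the weaker closed forms (3.100) and (3.101)."""
    D = math.exp(-es_n0)
    if 2 * D >= 1:
        raise ValueError("A(D) diverges for D >= 1/2, i.e. E_s/N_0 <= ln 2")
    union = sum(2 ** (d - D_FREE) * q_func(math.sqrt(2 * d * es_n0))
                for d in range(D_FREE, d_max + 1))
    pe_bound = D**5 / (1 - 2 * D)           # A(D) at D = exp(-E_s/N_0)
    pb_bound = D**5 / (1 - 2 * D) ** 2 / k  # dA/dW at W = 1
    return union, pe_bound, pb_bound

for es_n0_db in (2.0, 4.0, 6.0):
    es_n0 = 10 ** (es_n0_db / 10)           # dB to linear ratio
    print(es_n0_db, awgn_bounds(es_n0))
```

Term by term, $Q(x) < e^{-x^2/2}$ guarantees that the union bound (3.98) stays below (3.100), and the gap shows how loose the exponential bound is at a given SNR.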
The ensemble-average error bound for time-varying convolutional codes under maximum likelihood decoding is given in Theorem 3.1. Time-varying convolutional codes are the counterparts of fixed, or time-invariant, convolutional codes, whose generator polynomials are the same at every time instant. In time-varying convolutional codes, by contrast, the generator matrix (see (3.40)) may have different submatrices $G_x$
in distinct rows, leading to the following encoder:

$$\bar{G} = \ldots$$

… channel coding theorem for binary codes is described as follows [100, 106].
Theorem 3.1 (Viterbi [100]). For any discrete input memoryless channel with capacity C, there exists a time-varying convolutional code of constraint length K, rate k/n bits per channel symbol, and arbitrary block length, whose bit error probability Pb, resulting from maximum likelihood decoding, is bounded by
$$P_b < (2^k - 1)\,2^{-KkE_c(R)/R} \ldots$$
The Gallager function [112, 113] is defined as follows:
For the set $X$ of all possible channel input symbols, $p = \{p(x) \mid x \in X\}$ is an arbitrary input distribution satisfying
$p(x) \ge 0$ for all $x \in X$ and $\sum_{x \in X} p(x) = 1$.
The transition probability $p(y\mid x)$, for $y \in Y$ and $x \in X$, describes a discrete memoryless channel, where $Y$ denotes the channel output alphabet. The code rate $R$ in Theorem 3.1 is measured in nats per channel symbol; that is, $R = (k \ln 2)/n$.
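Assuming the standard form of the Gallager function [112, 113], $E_0(\rho, p) = -\ln\sum_{y \in Y}\bigl[\sum_{x \in X} p(x)\,p(y\mid x)^{1/(1+\rho)}\bigr]^{1+\rho}$, measured in nats like $R$, the sketch below evaluates it for a BSC with uniform inputs. Two known properties serve as sanity checks: at $\rho = 1$ it equals the cutoff rate $\ln 2 - 2\ln(\sqrt{p} + \sqrt{1-p})$, and its slope at $\rho = 0$ equals the mutual information. The dictionary-based channel description and the function names are illustrative choices.

```python
import math

def gallager_e0(rho: float, p_x: dict, p_y_given_x: dict) -> float:
    """E_0(rho, p) = -ln sum_y [sum_x p(x) p(y|x)^(1/(1+rho))]^(1+rho), in nats.

    p_x maps each input x to p(x); p_y_given_x maps (x, y) to p(y|x).
    """
    outputs = {y for (_, y) in p_y_given_x}
    total = 0.0
    for y in outputs:
        inner = sum(px * p_y_given_x[(x, y)] ** (1 / (1 + rho)) for x, px in p_x.items())
        total += inner ** (1 + rho)
    return -math.log(total)

def bsc(eps: float) -> dict:
    """Transition probabilities of a BSC with crossover probability eps."""
    return {(0, 0): 1 - eps, (0, 1): eps, (1, 0): eps, (1, 1): 1 - eps}

uniform = {0: 0.5, 1: 0.5}
for rho in (0.0, 0.5, 1.0):
    print(rho, gallager_e0(rho, uniform, bsc(0.1)))
```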
3.2.1 Path truncation
As indicated in the Viterbi decoding algorithm, the paths, or survivors, terminating at each state must be stored up to the last received codeword, so the entire received sequence is processed before any decoded output is produced. In real applications, the information sequence length $N$ may be so large that the storage requirement becomes prohibitive. Because of this practical storage constraint, the survivor for each state is truncated to a finite length, as shown in Fig. 3.11. The trellis diagram with $M = 2^{\nu}$ states is truncated to $T$ time instants, and $M$ paths terminate at time $t + T$. With truncation length $T$, the decoder outputs the data on the branch at depth $t$ according to the path metrics at time $t + T$ [100, 114]. If all surviving paths share a common node at time $t$, that unique branch is chosen; otherwise, the branch on the path with the best metric at time $t + T$ is selected.

Figure 3.11: Trellis diagram truncated to T instants

This truncation technique can cause an additional error if an incorrect path diverges from the correct path at depth $t$ and does not remerge with it before time $t + T$. Therefore, $T$ must be large enough that the truncation error is comparable to or smaller than the error probability of maximum-likelihood decoding [115].
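The truncation rule can be sketched as a fixed-delay decoder. The sketch below rests on assumptions not stated in the text: a rate-1/n feedforward binary code (the memory-2 (7, 5) code in the example), the hard-decision metric (3.79), a zero starting state, and a zero-tail-terminated block so the last $T$ bits can be flushed from state 0. All function names are hypothetical. Releasing the bit at depth $t$ from the best-metric survivor at time $t + T$ covers both cases of the rule, because when all survivors share the branch at depth $t$, the best one does too.

```python
import math

def make_trellis(gens, m):
    """Rate-1/n feedforward encoder tables; state = last m inputs, newest in the MSB."""
    n_states = 1 << m
    nxt = [[0, 0] for _ in range(n_states)]
    out = [[(), ()] for _ in range(n_states)]
    for s in range(n_states):
        for u in (0, 1):
            reg = (u << m) | s
            out[s][u] = tuple(bin(reg & g).count("1") % 2 for g in gens)
            nxt[s][u] = reg >> 1
    return nxt, out

def encode(bits, nxt, out):
    s, code = 0, []
    for u in bits:
        code.append(out[s][u])
        s = nxt[s][u]
    return code

def viterbi_truncated(received, nxt, out, T, terminated=True):
    """Hard-decision Viterbi decoder that releases the bit at depth t at time t + T.

    Each survivor keeps at most T + 1 input bits; the released bit is taken from
    the survivor with the best metric at the current time.
    """
    n_states = len(nxt)
    gamma = [0.0] + [math.inf] * (n_states - 1)
    surv = [[] for _ in range(n_states)]
    decoded = []
    for r_t in received:
        new_gamma = [math.inf] * n_states
        new_surv = [[] for _ in range(n_states)]
        for s in range(n_states):
            if gamma[s] == math.inf:
                continue
            for u in (0, 1):
                ns = nxt[s][u]
                metric = gamma[s] + sum(a != b for a, b in zip(r_t, out[s][u]))  # (3.79)
                if metric < new_gamma[ns]:
                    new_gamma[ns], new_surv[ns] = metric, surv[s] + [u]
        gamma, surv = new_gamma, new_surv
        best = min(range(n_states), key=gamma.__getitem__)
        if len(surv[best]) > T:                  # depth t = current time - T reached
            decoded.append(surv[best][0])
            surv = [path[1:] for path in surv]   # bounded storage: drop the released bit
    final = 0 if terminated else min(range(n_states), key=gamma.__getitem__)
    return decoded + surv[final]

nxt, out = make_trellis((0o7, 0o5), 2)
msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0]  # ends in m zeros
rx = [list(c) for c in encode(msg, nxt, out)]
rx[7][0] ^= 1                                    # one channel error
print(viterbi_truncated(rx, nxt, out, T=15) == msg)
```

Memory per state is bounded by $T + 1$ bits regardless of $N$, at the cost of a fixed decoding delay of $T$ branches.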
We again assume that an all-zero information sequence over GF(2) is encoded and transmitted through a memoryless channel. In the truncated trellis diagram of length $T$, the truncation error is caused by the incorrect paths that diverge from the correct path before time $t$ and reach a state $S_i \neq S_0$ at time $t + T$ without passing through $S_0$. Therefore, the WEF for