Near Maximum-Likelihood Sequential-Search Decoding Algorithms for Binary Convolutional Codes

National Chiao Tung University

Department of Communication Engineering

Doctoral Dissertation


Student: Shin-Lin Shieh

Advisor: Dr. Po-Ning Chen


A Dissertation

Submitted to Institute of Communication Engineering

College of Electrical and Computer Engineering

National Chiao Tung University

in Partial Fulfillment of the Requirements

for the Degree of Doctor of Philosophy

in

Communication Engineering

Hsinchu, Taiwan


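Abstract (in Chinese)

In this dissertation, we revisit the maximum-likelihood sequential-search decoding algorithm of [17]. By replacing the conventional Fano metric with a metric derived from the Wagner rule, the sequential-search decoding algorithm in [17] is guaranteed to achieve maximum-likelihood performance, and was therefore named the maximum-likelihood sequential-search decoding algorithm. Simulation results show that this algorithm has a markedly lower software decoding complexity than the Viterbi algorithm. A problem common to sequential-search decoding algorithms is that when the signal-to-noise ratio falls below the one corresponding to the cut-off rate, the average decoding complexity and the required stack size grow rapidly with the message length. This problem limits the use of sequential-search decoding for convolutional codes with long messages. To mitigate it, we propose that, during the search, the top path in the stack be discarded directly if its level lags the farthest level expanded so far by more than Δ. We call this operation early elimination. We then analyze the minimum Δ required for the performance degradation to be negligible. Our theoretical analysis shows that, for rate-1/2 convolutional codes, the required Δ is about 3 times and 2.2 times the constraint length over the additive white Gaussian noise channel and the binary symmetric channel, respectively. For rate-1/3 convolutional codes, the required Δ drops further to 2 times and 1.2 times the constraint length, respectively. These theoretical results almost coincide with the simulation results. Because only a small Δ is needed to reach near maximum-likelihood performance, the modified sequential-search decoding algorithm eliminates a large amount of computation and memory. This makes it suitable for applications that require low software complexity yet near maximum-likelihood decoding performance. We further use the Berry-Esseen inequality to analyze the complexity of the algorithm before and after the modification. Both the derived upper bounds on the average complexity and the simulated average complexity show that the modification greatly reduces the decoding complexity, and makes the decoding complexity per unit message length of the modified algorithm almost independent of the memory order of the convolutional code at medium to high signal-to-noise ratios.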


NEAR MAXIMUM-LIKELIHOOD

SEQUENTIAL-SEARCH DECODING ALGORITHM

FOR BINARY CONVOLUTIONAL CODES

Student: Shin-Lin Shieh

Advisor: Dr. Po-Ning Chen

Department of Communication Engineering

National Chiao Tung University

Abstract

In this work, the maximum-likelihood sequential-search decoding algorithm proposed in [17] is revisited. By replacing the conventional Fano metric with one derived from the Wagner rule, the sequential-search decoding in [17] guarantees maximum-likelihood (ML) performance, and was therefore named the maximum-likelihood sequential decoding algorithm (MLSDA). Simulations then showed that when the MLSDA operates on the convolutional code trellis, its software computational complexity is in general considerably smaller than that of the Viterbi algorithm.

A common problem of sequential-type decoding is that at signal-to-noise ratios (SNRs) below the one corresponding to the cutoff rate, the average decoding complexity and the required stack size grow rapidly with the information length [25]. This problem, to some extent, prevents the practical use of sequential-type decoding for codes with long information sequences. In order to alleviate the problem in the MLSDA, we propose to directly eliminate the top path whose end node is Δ trellis levels prior to the farthest node among all nodes that have been expanded thus far by the sequential search, which we term early elimination. We then analyze the early-elimination window that results in negligible performance degradation for the MLSDA. Our asymptotic analysis indicates that the early-elimination window required for negligible performance degradation is around three times (resp. 2.2 times) the constraint length for rate one-half convolutional codes under the additive white Gaussian noise (resp. binary symmetric) channel. For rate one-third convolutional codes, the required early-elimination window reduces to two times (resp. 1.2 times) the constraint length for the same channels. The theoretical level thresholds almost coincide with the simulation results.

As a consequence of the small early-elimination window required for near maximum-likelihood performance, the MLSDA with early-elimination modification rules out considerable computational burden, as well as memory requirement, by directly eliminating a large number of the top paths. This makes the MLSDA with early elimination suitable for applications that dictate a low-complexity software implementation with near maximum-likelihood performance. Upper bounds on the decoding complexity of the MLSDA, both with and without early elimination, are subsequently derived by utilizing the Berry-Esseen inequality. Both the upper bound and the simulated complexity indicate that the average decoding complexity per output bit for the MLSDA with early elimination is almost independent of the memory order, as well as the message length, for medium to high SNRs.


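Acknowledgments

First, I would like to thank my advisors, Professor Po-Ning Chen and Professor Yunghsiang S. Han. Their patient guidance over the years made it possible for me to complete this dissertation. I thank them for steering my research direction at the right moments, for correcting my mistakes along the way, and for teaching me the rigor that outstanding research demands. Next, I thank all the teachers I met at Jinhu Kindergarten, Bocun Elementary School, Jinhu Elementary School, Jincheng Junior High School, Kinmen Senior High School, and the Department and Graduate Institute of Electrical Engineering at Tsing Hua University. Thank you for your teaching and care throughout my studies, which laid a solid foundation for my doctoral degree. I also thank my family for their devotion and care. Without your support and encouragement, I could not have persevered to complete my studies. May the whole family share this honor. Finally, I thank the Industrial Technology Research Institute, Sunplus Technology Co., Ltd., and Sunplus mMobile Inc. (凌陽電通科技股份有限公司) for supporting my research along the way.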


List of Figures

2.1 Encoder for the binary (2, 1, 2) convolutional code with generators $g_1 = 7$ (octal) and $g_2 = 5$ (octal), where $g_i$ is the generator polynomial characterizing the $i$th output.

2.2 Encoder for the binary (3, 2, 2) systematic convolutional code with generators $g_1^{(1)} = 4$ (octal), $g_1^{(2)} = 0$ (octal), $g_2^{(1)} = 0$ (octal), $g_2^{(2)} = 4$ (octal), $g_3^{(1)} = 2$ (octal) and $g_3^{(2)} = 3$ (octal), where $g_i^{(j)}$ is the generator polynomial characterizing the $i$th output according to the $j$th input. The dashed box is redundant and can actually be removed from this encoder; its presence here is only to help demonstrate the derivation of the generator polynomials. Thus, as far as the number of stages of the $j$th shift register is concerned, $K_1 = 1$ and $K_2 = 2$.

2.3 Code tree for the binary (2, 1, 2) convolutional code in Fig. 2.1 with a single input sequence of length 5. Each branch is labeled by its respective "input bit/output code bits". The code path indicated by the thick line is labeled in sequence by code bits 11, 01, 10, 01, 00, 10 and 11, and its corresponding codeword is v = (11 01 10 01 00 10 11).

2.4 Trellis for a (3, 1, 2) binary convolutional code with information length L = 5. In this case, the code rate R = 1/3 and the codeword length N = 3(5 + 2) = 21. The code path indicated by the thick line is labeled by 111, 010, 001, 110, 100, 101 and 011, thus its corresponding codeword is v = (111010001110100101011).

3.1 Bit error rates of the MLSDA for (2, 1, 6) and (2, 1, 10) convolutional codes with L = 100.

3.2 Average decoding complexities of the MLSDA for (2, 1, 6) and (2, 1, 10) convolutional codes with L = 100.

3.3 Average decoding complexity versus information length for the MLSDA applied to the (2, 1, 10) convolutional code.

3.4 Early elimination window Δ in the trellis-based MLSDA.

4.1 Single-input n-output encoder model considered in [36]. All elements are in GF(q), where q is either a prime or a power of a prime.

4.2 Exponent lower bound $E_r(R)$ of the additional error due to path truncation and exponent $E_c(R)$ of the maximum-likelihood decoding error for time-varying convolutional codes (without path truncation) under the BSC with crossover probability 0.4.

4.3 Exponent lower bound $E_{el}(R)$ of the additional error due to early elimination and exponent $E_c(R)$ of the maximum-likelihood decoding error for time-varying convolutional codes (without early elimination) under the BSC with crossover probability 0.045.

4.4 Exponent lower bound $E_{el}(R)$ of the additional error due to early elimination and exponent $E_c(R)$ of the maximum-likelihood decoding error for time-varying convolutional codes (without early elimination) under the BSC with crossover probability …

4.5 Performance for (2, 1, 12) convolutional codes for maximum-likelihood (ML) decoder, stack algorithm with Fano metric, and MLSDA with early elimination window Δ = 30 under BSC. The generator polynomial of the code is [42554 77304] in octal. The message length L = 500.

4.6 Performance for (3, 1, 8) convolutional codes for maximum-likelihood (ML) decoder, stack algorithm with Fano metric, and MLSDA with early elimination window Δ = 11 under BSC. The generator polynomial of the code is [557 663 711] in octal. The message length L = 500.

5.1 Block error rate upper bound (BLER UB) given by (5.2) and simulated BLER for (2, 1, 6) convolutional code under AWGN channels.

5.2 Block error rate upper bound (BLER UB) given by (5.2) and simulated BLER for (2, 1, 10) convolutional code under AWGN channels.

5.3 Block error rate upper bounds for (2, 1, 6) convolutional codes with L = 200.

5.4 Block error rate upper bound for (2, 1, 8) convolutional codes with L = 200.

5.5 Block error rate upper bounds for (2, 1, 10) convolutional codes with L = 200.

5.6 Block error rate upper bounds for (2, 1, 12) convolutional codes with L = 200.

5.7 Block error rate upper bounds for (3, 1, 8) convolutional codes with L = 200.

5.8 Simulated block error rates for (2, 1, 6) convolutional codes with L = 200.

5.9 Simulated block error rates for (2, 1, 8) convolutional codes with L = 200.

5.10 Simulated block error rates for (2, 1, 10) convolutional codes for L = 200.

6.1 $\tilde{A}_{n-d}(\lambda)$ for fixed d/n = 0.2 with respect to different γ. Notation "1(0)" represents that the y-tic is either 1 (for the curve below) or 0 (for the curve above).

6.2 $\tilde{A}_{n-d}(\lambda)$ for fixed γ = −3 dB with respect to different d/n ratios. Notation "1(0)" represents that the y-tic is either 1 (for the curve below) or 0 (for the curve above).

6.3 Exemplified trellis diagram for the MLSDA with early elimination.

6.4 Example that the first extended path has a larger metric, when it is compared with the all-zero path for the MLSDA with early elimination.

6.5 Decoding complexity upper bounds and simulations for (2, 1, 10) convolutional codes. The message length L = 100.

6.6 Upper bounds and simulation results of the average decoding complexity per information bit versus message length L for (2, 1, 10) convolutional codes at SNR = 3.5 dB.

6.7 Simulation results of the average decoding complexity per information bit versus the memory order m. The message length L = 100. The chosen Δ = 10, 15, 20, 25, 30 …

Chapter 1 Introduction

The convolutional code, invented by Elias [5] in 1955, is perhaps the most famous error-correcting code in the history of the communication industry. Soon after its invention, Wozencraft and Reiffen [39] proposed a sequential decoding algorithm to effectively decode convolutional codes with large constraint lengths. Thereafter, Fano [6] developed a highly efficient sequential decoding algorithm. These works further inspired Zigangirov [40] and, independently, Jelinek [21] to invent and develop the stack algorithm.

Unfortunately, the sequential decoding algorithm has received little attention in the past 30 years due to its suboptimal performance and the lack of an efficient and cost-effective hardware implementation. It is, however, especially suitable for decoding convolutional codes with large memory order because its decoding complexity is independent of the code constraint length. For this reason, the sequential decoding algorithm has recently been proposed for decoding the so-called "super-code" that considers the joint effect of multi-path channels and convolutional codes [19].

Another commonly used decoding algorithm for convolutional codes is the Viterbi algorithm. It operates on a convolutional code trellis, and has been shown to be a maximum-likelihood decoder [25]. Since its decoding complexity grows exponentially with the code constraint length, the Viterbi algorithm is usually applied only to convolutional codes with short constraint lengths.

In 2002, a variant of the sequential decoding algorithm was established. The new variant uses a novel metric derived from the Wagner rule, and was proved to achieve maximum-likelihood performance [17]. The new sequential-type decoding algorithm was therefore termed the maximum-likelihood sequential decoding algorithm (MLSDA). By simulations, the authors of [17] observed that, from a pure software implementation standpoint, the average decoding complexity of the MLSDA is in general considerably smaller than that of the Viterbi algorithm when the signal-to-noise ratio (SNR) of the additive white Gaussian noise (AWGN) channel is larger than 2 dB.

When the information sequence is long, path truncation was suggested for a practical implementation of the Viterbi decoder [25]. Instead of keeping all trellis branches of the survivor paths in the decoder memory, only a certain number of the most recent trellis branches are retained, and a decision is forced on the oldest trellis branch whenever new data arrive at the decoder. In the literature, three strategies have been proposed for the forced decision: (1) the majority-vote strategy, which traces back from all states and outputs the decision that occurs most often; (2) the best state strategy, which traces back only from the state with the best metric and outputs the information bits corresponding to the path being traced; (3) the random state strategy, which traces back from one randomly chosen state and outputs the information bits corresponding to the path being traced. Although none of the three strategies guarantees maximum-likelihood performance, their performance degradation can be made negligible as long as the traceback window, or truncation window, is sufficiently large.

In [9], Forney proved by a random coding argument that a truncation window of 5.8 times the code constraint length suffices to provide negligible performance degradation for the best state strategy. Hemmati and Costello [20] later derived an upper performance bound as a function of the truncation window for a specific convolutional encoder, and obtained a similar conclusion for the best state strategy. McEliece and Onyszchuk [28] studied the tradeoff between the length of the truncation window and the performance loss for the random state strategy, and concluded that the truncation window for the random state strategy should be about twice as large as that for the best state strategy.

Similar to the Viterbi algorithm, the decoding burden of the sequential decoding algorithm, both in memory consumption and in computational complexity, grows as the length of the information sequence increases. Yet, in order to compensate for the SNR loss due to the additional zeros at the end of the information sequence, a long information sequence is often preferred in practice. One way to reduce the decoding burden caused by a practically long information sequence is to introduce the path truncation concept of the Viterbi algorithm into the sequential decoding algorithm. As an example, Zigangirov considered the situation in which the decoder traces back the top path in the stack to force decisions on the symbols at those levels prior to a backsearch limit, and derived an error probability upper bound for sequential decoding with a backsearch limit [41]. When the channel critical rate is smaller than $(\kappa - 1)/\kappa$ of the computational cutoff rate, where $\kappa$ is the ratio of the backsearch limit to the code constraint length, Zigangirov's bound was shown to reduce to the Yudkin-Viterbi bound [11] for infinite backsearch limit at low to medium rates, and to coincide with the random coding bound at high rates [41].

In this dissertation, an alternative approach to lowering the decoding complexity of the new variant of the sequential decoding algorithm, i.e., the MLSDA, is examined. Instead of tracing back the top path in the stack to force decisions on the symbols beyond the backsearch limit, we propose to directly eliminate the top path whose end node is $\Delta$ levels prior to the farthest node among all that have been expanded thus far by the sequential search. We term this operation early elimination.
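The early-elimination rule can be illustrated with a toy stack decoder. The sketch below is only an illustration under stated assumptions, not the MLSDA of [17]: it searches the code tree (no trellis merging) of the (2, 1, 2) code with generators 7 and 5 (octal), uses the cumulative Hamming distance to hard-decision received bits as a stand-in for the Wagner-rule metric, and discards a popped top path when it lags the deepest expanded level by at least `delta` levels. The names `branch`, `stack_decode` and `delta`, and the exact comparison used for the window, are hypothetical choices.

```python
import heapq

GENERATORS = (0b111, 0b101)  # 7 and 5 (octal); leftmost bit taps the current input
M = 2                        # memory order


def branch(state, bit):
    """Return (next_state, output_bits) for one input bit."""
    reg = (bit << M) | state
    out = tuple(bin(reg & g).count("1") % 2 for g in GENERATORS)
    return reg >> 1, out


def stack_decode(received, L, delta=None):
    """Best-first tree search; returns (decoded information bits, path metric).

    Assumption: the metric is the cumulative Hamming distance (hard decisions).
    With delta=None no path is eliminated early.
    """
    n = len(GENERATORS)
    # Heap entries: (metric, -level, state, input bits so far).
    heap = [(0, 0, 0, ())]
    deepest = 0  # farthest level reached by any expansion so far
    while heap:
        metric, neg_level, state, bits = heapq.heappop(heap)
        level = -neg_level
        if level == L + M:
            return list(bits[:L]), metric
        if delta is not None and deepest - level >= delta:
            continue  # early elimination: drop the lagging top path
        for b in ((0, 1) if level < L else (0,)):  # tail levels accept only zeros
            nxt, out = branch(state, b)
            rx = received[n * level:n * (level + 1)]
            dist = sum(x != y for x, y in zip(out, rx))
            heapq.heappush(heap, (metric + dist, -(level + 1), nxt, bits + (b,)))
        deepest = max(deepest, level + 1)
    return None


# Codeword of u = (11101) from Fig. 2.1, then the same word with one bit flipped.
codeword = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
noisy = codeword.copy()
noisy[3] ^= 1
print(stack_decode(noisy, L=5))
print(stack_decode(codeword, L=5, delta=2))
```

Because the stand-in metric never decreases along a path, the search without elimination returns a minimum-distance codeword. A finite `delta` trades that guarantee for a smaller stack, which is the tradeoff analyzed in the later chapters.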


In the analysis of how large $\Delta$ must be for the performance degradation to be negligible, two approaches based on different techniques are taken. The first follows the random coding argument used by Forney [9], while the second exploits the generator polynomials of the specific convolutional code adopted. The random coding argument indicates that under binary symmetric channels (BSCs), the early elimination window required for negligible performance degradation is just 2.2 times the constraint length for rate one-half convolutional codes, and that for rate one-third convolutional codes it even reduces to 1.2 times the constraint length. With knowledge of the code generator polynomials, the additional error rate due to early elimination can be formulated under additive white Gaussian noise (AWGN) channels, and is accordingly used to determine a sufficiently large early elimination window for near-optimal performance. Simulations are then performed, and confirm the accuracy of these analytical results. Because only a small early elimination window is required for near maximum-likelihood performance, the MLSDA with early elimination removes considerable computational burden, as well as memory requirement, by directly eliminating a large number of the top paths. It can also be implemented together with the backsearch scheme to provide timely decisions with fixed delay and further reduce the decoding complexity. This suggests the potential and suitability of the MLSDA with early elimination for applications that dictate a low-complexity software implementation with near maximum-likelihood performance.

In the analysis of the decoding complexity of the MLSDA, as well as the complexity reduction due to early elimination, upper bounds that utilize the Berry-Esseen inequality [7, Sec. XVI.5] are established. Both the analytical and the simulation results substantiate that the early elimination modification can significantly reduce the decoding computational complexity. These results also show that the average decoding complexity per information bit for the MLSDA with early elimination does not grow with the message length, which makes it especially suitable for the timely decoding of codes with long message lengths.

The rest of the dissertation is organized as follows. The channel model, convolutional coding, and the conventional sequential decoding algorithm as well as the Fano metric are briefly reviewed in Chapter 2. The MLSDA and its variation with early elimination are presented in Chapter 3. The analyses of the sufficient early elimination window for near maximum-likelihood performance under BSC and AWGN channels are given in Chapters 4 and 5, respectively. Complexity upper bounds for the MLSDA, both with and without the early elimination scheme, are presented in Chapter 6. The concluding remarks and future work are summarized in Chapter 7.

Chapter 2 Convolutional Codes, Channel Models and Sequential Decoding Algorithms

In this chapter, the convolutional codes and channel models considered are introduced in Sections 2.1 and 2.2, respectively. Then, the conventional sequential decoding algorithm as well as the Fano metric is briefly reviewed in Section 2.3.

2.1 Convolutional code and its graphical representation

A binary convolutional encoder is conveniently structured as a mechanism of shift registers and modulo-2 adders, where the encoder output bits are modulo-2 sums of selected shift-register contents and the present input bits. Let $\mathcal{C}$ denote a binary $(n, k, m)$ convolutional code, in which the encoder outputs a block of $n$ bits whenever a block of $k$ information bits is input. The value $m$ designates the maximum number of previous $k$-bit blocks that have to be memorized in the encoder (i.e., if the number of stages of the $j$th shift register is $K_j$, then $m = \max_{1 \le j \le k} K_j$). The initial values of the shift registers are all zeros, and the current $n$ output bits are linear combinations of the present $k$ input bits and the previous $m \times k$ input bits. In this work, we assume that the input sequence contains $k \times L$ bits that come from $k$ input sequences, each of length $L$ bits. In addition, $m$ zeros are appended to the end of each input sequence in order to reset the encoder shift registers. Consequently, these $k(L + m)$ input bits jointly induce $n(L + m)$ output bits.
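The shift-register description above can be made concrete with a short sketch. It assumes a single input sequence (k = 1) and a feedforward encoder. The function name `conv_encode` and the integer encoding of the generators (leftmost bit taps the current input bit) are illustrative choices, not notation from this dissertation.

```python
def conv_encode(info_bits, generators=(0b111, 0b101), m=2):
    """Encode one input sequence with a rate-1/n feedforward encoder.

    Each generator is an (m + 1)-bit integer whose leftmost bit taps the
    current input bit and whose rightmost bit taps the oldest stored bit.
    The defaults correspond to generators 7 and 5 (octal).
    """
    state = 0                                  # shift registers start at all zeros
    out = []
    for bit in list(info_bits) + [0] * m:      # m tail zeros reset the encoder
        reg = (bit << m) | state
        # Each output bit is the modulo-2 sum of the tapped register contents.
        out.extend(bin(reg & g).count("1") % 2 for g in generators)
        state = reg >> 1                       # shift: drop the oldest stored bit
    return out


codeword = conv_encode([1, 1, 1, 0, 1])
print(codeword, len(codeword))
```

With L = 5 and m = 2, the k(L + m) = 7 input bits produce n(L + m) = 14 output bits.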

[Encoder diagram: input u = (11101); outputs v1 = (1010011) and v2 = (1101001); codeword v = (11 01 10 01 00 10 11).]

Figure 2.1: Encoder for the binary (2, 1, 2) convolutional code with generators $g_1 = 7$ (octal) and $g_2 = 5$ (octal), where $g_i$ is the generator polynomial characterizing the $i$th output.

Figures 2.1 and 2.2 exemplify the encoders of binary (2, 1, 2) and (3, 2, 2) convolutional codes, respectively. As illustrated in Fig. 2.1, the encoder of the (2, 1, 2) convolutional code emits two output sequences,

$$v_1 = (v_{1,0}, v_{1,1}, v_{1,2}, \ldots, v_{1,6}) = (1010011)$$

and

$$v_2 = (v_{2,0}, v_{2,1}, v_{2,2}, \ldots, v_{2,6}) = (1101001)$$

due to the single input sequence $u = (u_0, u_1, u_2, u_3, u_4) = (11101)$ of length $L = 5$, where $u_0$ is fed into the encoder first. The encoder then interleaves $v_1$ and $v_2$ to yield the codeword

$$v = (v_{1,0}, v_{2,0}, v_{1,1}, v_{2,1}, \ldots, v_{1,6}, v_{2,6}) = (11\ 01\ 10\ 01\ 00\ 10\ 11).$$

[Encoder diagram: inputs u1 = (10) and u2 = (11), combined as u = (11 01); outputs v1 = (1000), v2 = (1100) and v3 = (0001); codeword v = (110 010 000 001).]

Figure 2.2: Encoder for the binary (3, 2, 2) systematic convolutional code with generators $g_1^{(1)} = 4$ (octal), $g_1^{(2)} = 0$ (octal), $g_2^{(1)} = 0$ (octal), $g_2^{(2)} = 4$ (octal), $g_3^{(1)} = 2$ (octal) and $g_3^{(2)} = 3$ (octal), where $g_i^{(j)}$ is the generator polynomial characterizing the $i$th output according to the $j$th input. The dashed box is redundant and can actually be removed from this encoder; its presence here is only to help demonstrate the derivation of the generator polynomials. Thus, as far as the number of stages of the $j$th shift register is concerned, $K_1 = 1$ and $K_2 = 2$.

Similarly, the encoder of the (3, 2, 2) convolutional code in Fig. 2.2 generates the output sequences of

$$v_1 = (v_{1,0}, v_{1,1}, v_{1,2}, v_{1,3}) = (1000),$$

$$v_2 = (v_{2,0}, v_{2,1}, v_{2,2}, v_{2,3}) = (1100)$$

and

$$v_3 = (v_{3,0}, v_{3,1}, v_{3,2}, v_{3,3}) = (0001)$$

due to the two input sequences $u_1 = (u_{1,0}, u_{1,1}) = (10)$ and $u_2 = (u_{2,0}, u_{2,1}) = (11)$ of length $L = 2$, which in turn generates the interleaved output sequence

$$v = (v_{1,0}, v_{2,0}, v_{3,0}, v_{1,1}, v_{2,1}, v_{3,1}, v_{1,2}, v_{2,2}, v_{3,2}, v_{1,3}, v_{2,3}, v_{3,3}) = (110\ 010\ 000\ 001)$$

of length $3(2 + 2) = 12$. In terminology, the interleaved output $v$ is called the convolutional codeword corresponding to the combined input sequence $u$.

One representation that characterizes the relation between the encoder inputs and encoder outputs is the generator polynomials. For example, $g_1(x) = 1 + x + x^2$ and $g_2(x) = 1 + x^2$ can be used to identify $v_1$ and $v_2$ induced by $u$ in Fig. 2.1, where the appearance of $x^i$ indicates that a physical connection is applied in Fig. 2.1 at the $(i + 1)$th dot position, counted from the left. To be specific, putting $u$ and $v_i$ in polynomial form as $u(x) = u_0 + u_1 x + u_2 x^2 + \cdots$ and $v_i(x) = v_{i,0} + v_{i,1} x + v_{i,2} x^2 + \cdots$ yields $v_i(x) = u(x) g_i(x)$ for $i = 1, 2$, where addition of coefficients is based on modulo-2 operation.

Similarly, the relation between the inputs and the outputs can also be characterized by matrix operation. For example, the relation in Fig. 2.2 can be formulated as

$$\begin{bmatrix} v_1(x) & v_2(x) & v_3(x) \end{bmatrix} = \begin{bmatrix} u_1(x) & u_2(x) \end{bmatrix} \begin{bmatrix} g_1^{(1)}(x) & g_2^{(1)}(x) & g_3^{(1)}(x) \\ g_1^{(2)}(x) & g_2^{(2)}(x) & g_3^{(2)}(x) \end{bmatrix},$$

where $v_i(x) = v_{i,0} + v_{i,1} x + v_{i,2} x^2 + \cdots$ and $u_j(x) = u_{j,0} + u_{j,1} x + u_{j,2} x^2 + \cdots$ define the $i$th output sequence and the $j$th input sequence, respectively, and the generator polynomial $g_i^{(j)}(x)$ characterizes the relation between the $i$th output and the $j$th input sequences. For simplicity, generator polynomials are sometimes abbreviated by their coefficients in octal number format. Continuing the example in Fig. 2.1, the generator polynomials in octal format are $g_1 = 7$ (octal) and $g_2 = 5$ (octal).
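The matrix relation can be checked numerically for the (3, 2, 2) example of Fig. 2.2. The sketch below represents each polynomial as a coefficient list with the lowest degree first, reading each octal generator from left to right as the coefficients of $1, x, x^2$ (so 4 is $1$, 2 is $x$, and 3 is $x + x^2$). The helper names `poly_mul_gf2`, `poly_add_gf2` and `encode_matrix` are hypothetical.

```python
def poly_mul_gf2(a, b):
    """Multiply two GF(2) polynomials given as coefficient lists (lowest degree first)."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] ^= ai & bj
    return prod


def poly_add_gf2(a, b):
    """Add two GF(2) polynomials, padding the shorter one with zeros."""
    size = max(len(a), len(b))
    a = a + [0] * (size - len(a))
    b = b + [0] * (size - len(b))
    return [x ^ y for x, y in zip(a, b)]


def encode_matrix(inputs, G):
    """inputs[j] is u_j(x); G[j][i] is g_i^{(j)}(x). Returns the list of v_i(x)."""
    outputs = []
    for i in range(len(G[0])):
        acc = [0]
        for j, u in enumerate(inputs):
            acc = poly_add_gf2(acc, poly_mul_gf2(u, G[j][i]))
        outputs.append(acc)
    return outputs


# Generator matrix of Fig. 2.2: row j holds g_1^{(j)}, g_2^{(j)}, g_3^{(j)}.
G = [[[1, 0, 0], [0, 0, 0], [0, 1, 0]],
     [[0, 0, 0], [1, 0, 0], [0, 1, 1]]]
u1, u2 = [1, 0], [1, 1]
v1, v2, v3 = encode_matrix([u1, u2], G)
# Interleave the three output sequences block by block.
codeword = [bit for block in zip(v1, v2, v3) for bit in block]
print(v1, v2, v3, codeword)
```

Multiplying a length-L input by a degree-m generator yields L + m coefficients, so the m tail zeros are accounted for automatically.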

A finite-length $(n, k, m)$ convolutional code can be transformed to an equivalent linear block code with effective code rate $R_{\text{effective}} = kL/[n(L + m)]$, where $L$ is the length of the information input sequences. Usually, the code rate of the $(n, k, m)$ convolutional code is referred to as $R = k/n$, which can be viewed as the effective code rate as $L$ approaches infinity.
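As a worked illustration of the rate loss caused by the tail zeros, take the (2, 1, 2) code of Fig. 2.1 with $L = 5$, and then with an assumed longer length $L = 100$ (the latter value is chosen here only for comparison):

```latex
R_{\text{effective}}\Big|_{L=5} = \frac{kL}{n(L+m)} = \frac{1 \cdot 5}{2(5+2)} = \frac{5}{14} \approx 0.357,
\qquad
R_{\text{effective}}\Big|_{L=100} = \frac{1 \cdot 100}{2(100+2)} = \frac{100}{204} \approx 0.490 .
```

Both values lie below $R = k/n = 1/2$, and the gap shrinks as $L$ grows.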

The constraint length of an $(n, k, m)$ convolutional code has two different definitions in the literature: $n_A = m + 1$ [38] and $n_A = n(m + 1)$ [25]. In this dissertation, the former definition is adopted, because it is more extensively used in industrial publications.

Let $v_{(a,b)} = (v_a, v_{a+1}, \ldots, v_b)$ denote a portion of codeword $v$, and abbreviate $v_{(0,b)}$ by $v_{(b)}$. Define the Hamming distance between the first $rn$ bits of codewords $v$ and $z$ by

$$d_H\left(v_{(rn-1)}, z_{(rn-1)}\right) = \sum_{i=0}^{rn-1} v_i \oplus z_i,$$

where "$\oplus$" denotes modulo-2 addition. The Hamming weight of the first $rn$ bits of codeword $v$ can thus be represented by $d_H(v_{(rn-1)}, 0_{(rn-1)})$, where $0$ represents the all-zero codeword.

Furthermore, define the column distance function (CDF) $d_c(r)$ of a binary $(n, k, m)$ convolutional code as the minimum Hamming distance between the first $rn$ bits of any two codewords whose first $n$ bits are distinct, i.e.,

$$d_c(r) = \min\left\{ d_H\left(v_{(rn-1)}, z_{(rn-1)}\right) : v_{(n-1)} \ne z_{(n-1)} \text{ for } v, z \in \mathcal{C} \right\},$$

where $\mathcal{C}$ is the set of all codewords. Clearly, $d_c(r)$ is nondecreasing in $r$. Two cases of CDFs are of specific interest: $r = m + 1$ and $r = \infty$. In the $r = \infty$ case, the Hamming distance should be calculated with infinite-length input sequences. However, $d_c(r)$ for an $(n, k, m)$ convolutional code reaches its largest value $d_c(\infty)$ when $r$ is a little beyond $5 \times m$ in most cases. This property facilitates the determination of $d_c(\infty)$. The value $d_c(\infty)$, or $d_{\text{free}}$ in general, is called the free distance, whereas $d_c(m + 1)$ is called the minimum distance of the convolutional code.

The operational meanings of the minimum distance, the free distance and the CDF of a convolutional code are as follows. When a maximum-likelihood decoder is applied to a received codeword of sufficiently large length, the error correcting performance is mainly characterized by $d_{\text{free}}$ [36]. On the other hand, if a decoder estimates the transmitted bits based only on the first $n(m + 1)$ received bits (as in, for example, majority-logic decoding [26]), $d_c(m + 1)$ can be used instead to characterize the error correcting capability. Finally, the column distance function characterizes the decoding computational complexity, defined as the number of metric computations performed, of the sequential decoding algorithm. Usually, the sequential decoding algorithm requires a rapid initial growth of the CDF in order to have a small decoding complexity.
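For a small code, the column distance function can be computed by exhaustive search. The sketch below assumes a rate-1/n feedforward encoder and uses linearity: the minimum distance between two codewords with distinct first blocks equals the minimum weight over truncated codewords whose first input bit is 1. The function names are hypothetical, and the defaults are the generators 7 and 5 (octal) of Fig. 2.1.

```python
from itertools import product


def first_blocks(info_bits, generators, m):
    """Output bits of an unterminated rate-1/n encoder for the given input bits."""
    state, out = 0, []
    for bit in info_bits:
        reg = (bit << m) | state
        out.extend(bin(reg & g).count("1") % 2 for g in generators)
        state = reg >> 1
    return out


def column_distance(r, generators=(0b111, 0b101), m=2):
    """d_c(r) by exhaustive search over the 2^(r-1) input prefixes with u_0 = 1."""
    return min(sum(first_blocks((1,) + tail, generators, m))
               for tail in product((0, 1), repeat=r - 1))


cdf = [column_distance(r) for r in range(1, 11)]
print(cdf)
```

The search cost doubles with each additional level, so this approach is only practical for small r.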

Next, we introduce two graphical representations, code tree and trellis, of convolutional codewords. A code tree of a binary $(n, k, m)$ convolutional code presents every codeword as a path on a tree. For input sequences of length $L$ bits, the code tree consists of $(L + m + 1)$ levels. The single leftmost node at level 0 is called the origin node. At the first $L$ levels, there are exactly $2^k$ branches leaving each node. For those nodes located at levels $L$ through $(L + m)$, only one branch remains. The $2^{kL}$ rightmost nodes at level $(L + m)$ are called the terminal nodes. As expected, a path from the single origin node to a terminal node represents a codeword; therefore, it is named the code path corresponding to the codeword. Figure 2.3 illustrates the code tree for the encoder in Fig. 2.1 with a single input sequence of length 5.

In contrast to a code tree, a code trellis as termed by Forney [8] is a structure obtained from a code tree by merging those nodes in the same state. The state associated with a node is determined by the associated shift-register contents. For a binary $(n, k, m)$ convolutional code, the number of states at levels $m$ through $L$ is $2^K$, where $K = \sum_{j=1}^{k} K_j$ and $K_j$ is the length of the $j$th shift register in the encoder; hence, there are $2^K$ nodes on these levels. Due to node merging, only one terminal node remains in a trellis. Analogous to a code tree, a path from the single origin node to the single terminal node in a trellis also mirrors a codeword. Figure 2.4 exemplifies the trellis of the (3, 1, 2) convolutional code.
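The effect of node merging can be seen by counting nodes level by level. The sketch below assumes a single input sequence (k = 1, so K = m) and tracks the set of reachable shift-register states. The function name `nodes_per_level` is hypothetical.

```python
def nodes_per_level(L, m):
    """Count code-tree nodes and trellis states per level for a binary (n, 1, m) code."""
    tree, trellis = [1], [1]
    states = {0}                                 # encoder starts in the all-zero state
    for level in range(L + m):
        inputs = (0, 1) if level < L else (0,)   # tail levels accept only zeros
        tree.append(tree[-1] * len(inputs))
        # Shift the new bit into the register and drop the oldest stored bit.
        states = {((b << m) | s) >> 1 for s in states for b in inputs}
        trellis.append(len(states))
    return tree, trellis


tree, trellis = nodes_per_level(L=5, m=2)
print(tree)
print(trellis)
```

For L = 5 and m = 2, the tree ends with 2^5 = 32 terminal nodes, while the trellis never holds more than 2^2 = 4 nodes per level and ends in a single terminal node.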

[Code tree diagram with levels 0 through 7; each branch is labeled "input bit/output code bits".]

Figure 2.3: Code tree for the binary (2, 1, 2) convolutional code in Fig. 2.1 with a single input sequence of length 5. Each branch is labeled by its respective "input bit/output code bits". The code path indicated by the thick line is labeled in sequence by code bits 11, 01, 10, 01, 00, 10 and 11, and its corresponding codeword is v = (11 01 10 01 00 10 11).

[Trellis diagram with states s0 through s3 and levels 0 through 7, from the origin node to the terminal node; each branch is labeled by its three output code bits.]

Figure 2.4: Trellis for a (3, 1, 2) binary convolutional code with information length L = 5. In this case, the code rate R = 1/3 and the codeword length N = 3(5 + 2) = 21. The code path indicated by the thick line is labeled by 111, 010, 001, 110, 100, 101 and 011, thus its corresponding codeword is v = (111010001110100101011).

2.2 Channel models for hard-decision and soft-decision decoders

When the $n(L + m)$ convolutional code bits, encoded from $kL$ input bits, are modulated into their respective waveforms for transmission over a physical medium, the received waveform is corrupted by attenuation, distortion, interference, noise, etc. The demodulator then transforms the received waveform into discrete signals for use by the decoder to determine the original transmitted sequences. If the discrete signals take two values, usually denoted by {0, 1}, then the demodulator is termed a hard-decision demodulator. If the demodulator passes discrete-in-time but continuous-in-value analog outputs to the decoder, then it is classified as a soft-decision demodulator. In terminology, if a soft-decision demodulator is employed, then the subsequent decoder is also classified as a soft-decision decoder. When the decoder receives its inputs from a hard-decision demodulator, the decoder is called a hard-decision decoder. In general, the soft-decision decoder provides better error correcting performance than the hard-decision decoder.

The decoder should determine the original information sequences based on the $n(L + m)$ demodulator decision outputs according to some criterion. The criterion most frequently applied is the maximum-likelihood decoding (MLD) rule. It is well known that the MLD minimizes the codeword error probability under the premise that the transmitted codewords are equiprobable.

In this dissertation, we focus on two typical channel types: the binary symmetric channel (BSC) and the additive white Gaussian noise (AWGN) channel. The former is a typical channel model for the performance evaluation of hard-decision decoders, while the latter is widely used in examining the error rate of soft-decision decoders. It should be mentioned that for a coding system, a channel is simply a signal passage that aggregates all the intermediate effects on the signal, including modulation, upconversion, signal distortion, downconversion, demodulation, thermal noise and others. The demodulator in concept incorporates these effects into a widely adopted additive channel model as

r = s + n,

where r is the demodulator output, s is the transmitted signal, and n represents the aggregated signal distortion, simply termed noise.

The aggregated signal distortions for the transmitted and received bits are further assumed to be independent and identically distributed with a common marginal distribution; such a channel is termed memoryless. The extension to multiple independent channel usages is given by

$$r_j = s_j + n_j,$$

for $0 \le j \le N-1$, where all $\{n_j\}_{j=0}^{N-1}$ share the same probability distribution. When the power spectrum of the noise samples is constant, which can be interpreted as the noise contributing equal power at all frequencies, the noise is dubbed white. Therefore, the AWGN channel for a time-discrete coding system specifically indicates a memoryless noise sequence with a Gaussian-distributed marginal.

As it turns out, the decoder inputs $r_0, r_1, \ldots, r_{N-1}$ are independent and Gaussian distributed with means $s_0, s_1, \ldots, s_{N-1}$, respectively, and equal variance $N_0/2$, where $N_0/2$ is the double-sided noise power per hertz. Assuming an antipodal transmission and equal prior on $c_j \in \{0, 1\}$ gives

$$s_j = s_j(c_j) = (-1)^{c_j}\sqrt{E},$$

where $c_j \in \{0, 1\}$ is the $j$th code bit, and

$$E = \mathrm{E}[s_j^2] = \frac{1}{2}\left(\sqrt{E}\right)^2 + \frac{1}{2}\left(-\sqrt{E}\right)^2$$

is the average signal energy per code bit.

An index that guides the error performance for AWGN channels is the signal-to-noise ratio (SNR). For the time-discrete system considered, it is defined as the average signal energy $E$ divided by $N_0$. Notably, the SNR is invariant with respect to scaling of the demodulator output. In other words, the SNR remains unchanged when $r_j$ is scaled by a multiplicative factor $\lambda$, since

$$\lambda \cdot r_j = \lambda \cdot (-1)^{c_j}\sqrt{E} + \lambda \cdot n_j$$

scales the signal energy and the noise variance by the same factor $\lambda^2$.

Accordingly, the performance of a soft-decision decoding algorithm over AWGN channels is often illustrated by plotting the error rate against the SNR. To account for the code redundancy of different code rates, the code bit energy $E$ is further transformed to $E_b$, the equivalent average transmission energy per information bit. Their relation is $E_b = E/R_{\text{effective}} = E \times [n(L + m)/(kL)]$, since the energy of the $n(L + m)$ code bits is equally distributed over the $kL$ information bits. Thus, a new index, denoted by $E_b/N_0$, is used instead of $\text{SNR} = E/N_0$ in plotting the performance curves.

The channel model can be further simplified to binary channel input and binary channel output, for which the noise sample $n$ and the transmitted signal $s$ are both elements of $\{0, 1\}$. Their modulo-2 addition yields the hard-decision demodulation output $r$. The binary channel statistics can be defined using two crossover probabilities: $p_1 = \Pr(r = 1 | s = 0)$ and $p_2 = \Pr(r = 0 | s = 1)$. In this dissertation, we focus on the case in which the two crossover probabilities are equal, $p_1 = p_2 = p$. The binary channel is then symmetric, and is called the binary symmetric channel. The binary symmetric channel can be treated as a quantized simplification of the AWGN channel. Hence, the crossover probability $p$ can be derived from $r_j = (-1)^{c_j}\sqrt{E} + n_j$ as

$$p = \frac{1}{2}\,\mathrm{erfc}\left(\sqrt{\frac{E}{N_0}}\right),$$

where

$$\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\,dt$$

is the complementary error function. This convention is adopted here in presenting the performance figures for BSCs.
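The mapping from $E_b/N_0$ to the BSC crossover probability can be sketched in a few lines. The function name `bsc_crossover` and its argument order are illustrative assumptions, not part of the dissertation; the sketch simply chains $E = E_b \cdot R_{\text{effective}}$ with $p = \frac{1}{2}\mathrm{erfc}(\sqrt{E/N_0})$.

```python
import math

def bsc_crossover(ebn0_db, n, k, L, m):
    """Crossover probability of the BSC obtained by hard-quantizing the AWGN channel.

    ebn0_db is E_b/N_0 in dB; (n, k, m) are the code parameters and L is the
    number of k-bit input blocks, so that R_effective = kL / (n(L + m)).
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)          # dB -> linear ratio
    r_effective = (k * L) / (n * (L + m))    # includes the rate loss of the m tail blocks
    e_over_n0 = ebn0 * r_effective           # E/N_0 = (E_b/N_0) * R_effective
    return 0.5 * math.erfc(math.sqrt(e_over_n0))

if __name__ == "__main__":
    # Hypothetical operating points for a (2, 1, 6) code with L = 100.
    for snr_db in (1.0, 3.0, 5.0):
        print(snr_db, bsc_crossover(snr_db, n=2, k=1, L=100, m=6))
```

As expected from the formula, the crossover probability stays below 1/2 and decreases as $E_b/N_0$ grows.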

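Throughout the dissertation, as there exists a one-to-one correspondence between the transmitted signals $\boldsymbol{s} = (s_0, s_1, \ldots, s_{N-1})$ and the code words $\boldsymbol{c} = (c_0, c_1, \ldots, c_{N-1})$,

$$\Pr(\boldsymbol{r}|\boldsymbol{c}) = \prod_{j=0}^{N-1} \Pr(r_j|c_j) \quad \text{and} \quad \Pr(\boldsymbol{r}|\boldsymbol{s}) = \prod_{j=0}^{N-1} \Pr(r_j|s_j)$$

will be used interchangeably to represent the channel statistics of receiving $\boldsymbol{r}$ given that $\boldsymbol{s}$ (equivalently, $\boldsymbol{c}$) is transmitted.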

2.3 Sequential decoding of convolutional codes

Since its discovery in 1963 [6], the Fano metric has become the most popular path metric in sequential decoding. The Fano metric was originally discovered through massive simulations, and was first used by Fano in his sequential decoding algorithm on code trees [6]. For any path $\boldsymbol{v}_{(\ell n-1)}$ that ends at level $\ell$ on a code tree, the Fano metric is defined as:

$$M\left(\boldsymbol{v}_{(\ell n-1)} | \boldsymbol{r}_{(\ell n-1)}\right) = \sum_{j=0}^{\ell n-1} M(v_j|r_j),$$

where $\boldsymbol{r} = (r_0, r_1, \ldots, r_{N-1})$ is the received vector, and the bit metric is defined as

$$M(v_j|r_j) = \log_2 \frac{\Pr(r_j|v_j)}{\Pr(r_j)} - R.$$

In the above bit metric formula, $R = k/n$ is the convolutional code rate, and the calculation of $\Pr(r_j)$ follows the convention that the code bits are transmitted with equal probability, i.e.,

$$\Pr(r_j) = \sum_{v_j \in \{0,1\}} \Pr(v_j)\Pr(r_j|v_j) = \frac{1}{2}\Pr(r_j|v_j = 0) + \frac{1}{2}\Pr(r_j|v_j = 1).$$

For example, for BSCs with crossover probability $p$, where $0 < p < 1/2$, the Fano metric for path $\boldsymbol{v}_{(\ell n-1)}$ is given by:

$$M\left(\boldsymbol{v}_{(\ell n-1)} | \boldsymbol{r}_{(\ell n-1)}\right) = \sum_{j=0}^{\ell n-1} \log_2 \Pr(r_j|v_j) + \ell n(1 - R), \qquad (2.1)$$

where

$$\log_2 \Pr(r_j|v_j) = \begin{cases} \log_2(1-p), & \text{for } r_j = v_j; \\ \log_2(p), & \text{for } r_j \ne v_j. \end{cases}$$

In terms of the Hamming distance, (2.1) can be re-written as:

$$M\left(\boldsymbol{v}_{(\ell n-1)} | \boldsymbol{r}_{(\ell n-1)}\right) = -\alpha \cdot d_H\left(\boldsymbol{r}_{(\ell n-1)}, \boldsymbol{v}_{(\ell n-1)}\right) + \beta \cdot \ell, \qquad (2.2)$$

where $\alpha = -\log_2[p/(1-p)] > 0$, and $\beta = n[1 - R + \log_2(1-p)]$. It can be easily observed from (2.2) that a larger Hamming distance between the path labels and the respective portion of the received vector results in a smaller path metric. This property guarantees that when the received vector $\boldsymbol{r}$ is exactly the transmitted codeword, and $R < 1 + \log_2(1-p)$ (equivalently, $\beta > 0$), the path metric increases along the correct code path, and the path metric along any incorrect path is smaller than that of the equally long correct path (footnote 3). Such a property is essential for a metric to work properly with sequential decoding.
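The agreement between the bit-wise definition of the Fano metric and the Hamming-distance form (2.2) can be checked numerically. The following is a minimal sketch under the BSC assumption with equiprobable code bits; the function names are hypothetical.

```python
import math

def fano_metric_bsc(v, r, p, n, k):
    """Fano metric via (2.2): -alpha * d_H + beta * level."""
    assert len(v) == len(r) and len(v) % n == 0
    rate = k / n
    level = len(v) // n
    d = sum(vb != rb for vb, rb in zip(v, r))      # Hamming distance
    alpha = -math.log2(p / (1 - p))
    beta = n * (1 - rate + math.log2(1 - p))
    return -alpha * d + beta * level

def fano_metric_bitwise(v, r, p, n, k):
    """Fano metric as the sum of bit metrics log2[Pr(r|v)/Pr(r)] - R, with Pr(r) = 1/2."""
    rate = k / n
    total = 0.0
    for vb, rb in zip(v, r):
        likelihood = (1 - p) if vb == rb else p
        total += math.log2(likelihood / 0.5) - rate
    return total

if __name__ == "__main__":
    v = [1, 1, 0, 1]          # path labels at level 2 of a rate-1/2 code
    r = [1, 0, 0, 1]          # received bits with one disagreement
    print(fano_metric_bsc(v, r, 0.1, 2, 1), fano_metric_bitwise(v, r, 0.1, 2, 1))
```

Both functions return the same value, and the metric of an error-free path is positive whenever $R < 1 + \log_2(1-p)$.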

The ZJ algorithm was discovered by Zigangirov [40] and later independently by Jelinek [21] to search over a code tree for the optimal codeword based on the Fano metric. The algorithm is also called the stack algorithm because a stack is required in its implementation. For completeness, the stack algorithm [25] is quoted below.

3 Without the assumption of error-free reception, the code rate margin below which the Fano-metric-based sequential decoding performs well is the channel capacity. For BSCs with crossover probability $p$, the channel capacity is equal to $C = 1 + p\log_2(p) + (1-p)\log_2(1-p)$. The condition that $R < 1 + \log_2(1-p) = C + p\log_2[(1-p)/p]$, derived from $\beta > 0$, can only justify the subsequent argument under the special case of error-free reception. The role of channel capacity as the code rate margin for well-performing sequential decoding is beyond the scope of this dissertation. Interested readers can refer to [4].

<The Stack (ZJ) Algorithm>

Step 1. Load the stack with the origin node in the tree, whose metric is taken to be zero.

Step 2. Compute the metrics of the successors of the top path in the stack.

Step 3. Delete the top path from the stack.

Step 4. Insert the new paths in the stack and rearrange the paths in the stack in order of decreasing metric values.

Step 5. If the top path in the stack ends at a terminal node in the tree, the algorithm stops. Otherwise, return to Step 2.
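The five steps above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the dissertation: the (2, 1, 2) code with octal generators (7, 5), the function names, and the use of Python's `heapq` in place of an explicitly re-sorted stack are all choices made here for brevity. The metric is the BSC Fano metric of Section 2.3.

```python
import heapq
import math

G = (0b111, 0b101)  # generator taps of an illustrative (2, 1, 2) code (octal 7, 5)
M = 2               # memory order m
N_OUT = len(G)      # n output bits per input bit (k = 1)

def branch(state, bit):
    """One tree branch: return (n output bits, next state)."""
    reg = (bit << M) | state                      # newest input bit in the top position
    out = tuple(bin(reg & g).count("1") & 1 for g in G)
    return out, reg >> 1

def encode(info):
    """Encode the information bits followed by m zero tail bits."""
    state, code = 0, []
    for b in list(info) + [0] * M:
        out, state = branch(state, b)
        code.extend(out)
    return code

def stack_decode(r, L, p):
    """Stack (ZJ) algorithm with the Fano metric on a BSC with crossover probability p."""
    rate = 1 / N_OUT
    hit = math.log2(2 * (1 - p)) - rate           # bit metric when r_j == v_j
    miss = math.log2(2 * p) - rate                # bit metric when r_j != v_j
    tie = 0
    # Step 1: origin node with metric zero. heapq is a min-heap, so store -metric.
    stack = [(0.0, tie, 0, 0, ())]
    while stack:
        neg_metric, _, level, state, bits = heapq.heappop(stack)   # top path (Step 3 deletes it)
        if level == L + M:                        # Step 5: top path ends at a terminal node
            return list(bits[:L])
        for b in ((0, 1) if level < L else (0,)): # tail branches carry zeros only
            out, nxt = branch(state, b)
            seg = r[level * N_OUT:(level + 1) * N_OUT]
            inc = sum(hit if o == y else miss for o, y in zip(out, seg))  # Step 2
            tie += 1
            heapq.heappush(stack, (neg_metric - inc, tie, level + 1, nxt, bits + (b,)))  # Step 4
    raise RuntimeError("stack emptied before reaching a terminal node")

if __name__ == "__main__":
    info = [1, 0, 1, 1, 0, 0, 1, 0]
    received = encode(info)
    received[0] ^= 1                              # a single channel error
    print(stack_decode(received, len(info), 0.05) == info)
```

A heap keeps the top path accessible without fully re-sorting the stack after each insertion, which is one way to address the maintenance cost discussed next.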

A major issue in the implementation of the stack algorithm is the efficient maintenance of the stack. For example, the efficiency in the rearrangement of paths in the stack in Step 4 will greatly affect the time consumed in the sequential search.

Another issue that a practical implementation of the stack algorithm may encounter is that the stack size is finite in practice and may therefore be insufficient to accommodate the possibly large number of paths examined during the search process. This situation is usually referred to as stack overflow. A straightforward way to deal with the stack overflow problem is to discard the paths with smaller metric values [25], since they are less likely to be the optimal code path. The remaining technical issue is the determination of a practical stack size such that the performance degradation due to path discarding stays within an acceptable range.


Chapter 3 MLSDA and the Proposed Early Elimination Scheme

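Assume that a binary codeword of an $(N, K)$ linear block code $\tilde{\mathcal{C}}$ is transmitted over a binary-input time-discrete channel with channel output $\boldsymbol{r} \triangleq (r_0, r_1, \ldots, r_{N-1})$. Define the hard-decision sequence $\boldsymbol{y} \triangleq (y_0, y_1, \ldots, y_{N-1})$ corresponding to $\boldsymbol{r}$ as:

$$y_j \triangleq \begin{cases} 1, & \text{if } \phi_j < 0; \\ 0, & \text{otherwise}, \end{cases}$$

where $\phi_j \triangleq \log[\Pr(r_j|v_j = 0)/\Pr(r_j|v_j = 1)]$, and $\Pr(r_j|v_j)$ is the channel transition probability of $r_j$ given $v_j$. According to the Wagner rule, the maximum-likelihood decoding output $\hat{\boldsymbol{v}}$ for received vector $\boldsymbol{r}$ is given by

$$\hat{\boldsymbol{v}} = \boldsymbol{y} \oplus \boldsymbol{e}^*, \qquad (3.1)$$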

where “⊕” is the bit-wise exclusive-or operation, $\boldsymbol{e}^*$ is the one with the smallest $\sum_{j=0}^{N-1} e_j|\phi_j|$ among all error patterns $\boldsymbol{e} \in \{0, 1\}^N$ satisfying $\boldsymbol{e}\mathbf{H}^T = \boldsymbol{y}\mathbf{H}^T$, and $\mathbf{H}$ is the parity check matrix of $\tilde{\mathcal{C}}$. Here, superscript “T” denotes the matrix transpose operation. Recall that a binary $(n, k, m)$ convolutional code with input sequence of length $L$ can be treated as an $(N, K)$ linear block code with $N = n(L + m)$ and $K = kL$. Based on the observation in (3.1), a new sequential-type decoder can be established by replacing the Fano metric in the conventional sequential decoding algorithm by a metric defined as:

$$\mu\left(\boldsymbol{x}_{(\ell n-1)}\right) \triangleq \sum_{j=0}^{\ell n-1} \mu(x_j), \qquad (3.2)$$

where $\boldsymbol{x}_{(\ell n-1)} = (x_0, x_1, \ldots, x_{\ell n-1}) \in \{0, 1\}^{\ell n}$ represents the label of a path ending at level $\ell$ in the $(n, k, m)$ convolutional code tree, and $\mu(x_j) \triangleq (y_j \oplus x_j)|\phi_j|$. Since the new decoding metric is nondecreasing along the code path, and since finding $\boldsymbol{e}^*$ is equivalent to finding the code path with the smallest metric in the code tree, it was proved in [17] that the new sequential-type decoder can always locate the maximum-likelihood codeword through the priority-first sequential codeword search. For this reason, the new sequential-type decoder is named the maximum-likelihood sequential decoding algorithm (MLSDA) [17].
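That minimizing the metric in (3.2) over codewords yields the maximum-likelihood codeword can be illustrated by brute force on a tiny block code. In the sketch below, the (6, 3) generator matrix `GEN` is a hypothetical example, and the AWGN log-likelihood ratio $\phi_j = 4\sqrt{E}\,r_j/N_0$ follows from the antipodal model of Section 2.2; neither the code nor the function names come from the dissertation.

```python
import itertools
import math

# Hypothetical (6, 3) binary linear block code used only for illustration.
GEN = [(1, 0, 0, 1, 1, 0),
       (0, 1, 0, 1, 0, 1),
       (0, 0, 1, 0, 1, 1)]

def codewords():
    """Enumerate all codewords generated by GEN."""
    for u in itertools.product((0, 1), repeat=len(GEN)):
        yield tuple(sum(ub * g[j] for ub, g in zip(u, GEN)) % 2
                    for j in range(len(GEN[0])))

def mlsda_metric(x, phi):
    """Metric (3.2): sum of (y_j XOR x_j) * |phi_j| with hard decisions y_j."""
    y = [1 if f < 0 else 0 for f in phi]
    return sum((yj ^ xj) * abs(f) for yj, xj, f in zip(y, x, phi))

def wagner_decode(phi):
    """Codeword with the smallest metric, i.e. y XOR e* in (3.1)."""
    return min(codewords(), key=lambda c: mlsda_metric(c, phi))

def ml_decode_awgn(r, energy=1.0):
    """Direct ML decoding on the AWGN channel: minimum Euclidean distance."""
    amp = math.sqrt(energy)
    return min(codewords(),
               key=lambda c: sum((rj - (-1) ** cj * amp) ** 2 for rj, cj in zip(r, c)))

if __name__ == "__main__":
    r = [0.9, 0.1, 1.1, -0.8, 0.3, -1.2]      # soft demodulator outputs
    n0 = 1.0
    phi = [4.0 * rj / n0 for rj in r]         # log-likelihood ratios for E = 1
    print(wagner_decode(phi), ml_decode_awgn(r))
```

Both decoders select the same codeword because the squared Euclidean distance differs from $N_0 \sum_j (y_j \oplus c_j)|\phi_j|$ only by a term that does not depend on the codeword.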

By adding a second stack, the MLSDA can be made to operate on a code trellis instead of a code tree [17]. The two stacks used in the trellis-based MLSDA are referred to as the Open Stack and the Closed Stack. The Open Stack contains all paths that end at the frontier of the part of the trellis explored thus far (cf. Fig. 3.4), and plays the same role as the single stack in the conventional sequential decoding algorithm. The Closed Stack stores the ending states and ending levels of the paths that have been the top paths of the Open Stack. The Closed Stack is used to determine whether two paths intersect in the code trellis during the sequential search. The trellis-based MLSDA [17] is quoted below for completeness.

<Trellis-Based MLSDA>

Step 1. Load the Open Stack with the origin node whose metric is zero.

Step 2. Put into the Closed Stack both the state and the level of the end node of the top path in the Open Stack. Compute the path metric for each of the successor paths of the top path in the Open Stack by adding the branch metric of the extended branch to the path metric of the top path. Delete the top path from the Open Stack.

Step 3. Discard the successor paths in Step 2 that end at a node having the same state and level as any entry in the Closed Stack. If any successor path ends at the same node as a path already in the Open Stack, eliminate the path with the higher path metric (footnote 1).

Step 4. Insert the remaining successor paths into the Open Stack in order of ascending path metrics. If two paths in the Open Stack have equal metric, sort them in order of descending levels. If, in addition, they happen to end at the same level, sort them randomly.

Step 5. If the top path in the Open Stack reaches the end of the convolutional code trellis, the algorithm stops; otherwise go to Step 2.
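The trellis-based MLSDA can be sketched as below. Several things are assumptions made for illustration: the (2, 1, 2) code with octal generators (7, 5), the function names, the BSC log-likelihood ratios, and the implementation of the Open Stack as a heap with lazy deletion (a dictionary records the best metric per open node, and outdated heap entries are skipped when popped). Ties between paths of equal metric and level are broken by insertion order rather than randomly.

```python
import heapq
import itertools
import math

G = (0b111, 0b101)  # illustrative (2, 1, 2) code, octal generators (7, 5)
M = 2               # memory order m
N_OUT = len(G)      # n output bits per branch (k = 1)

def branch(state, bit):
    """One trellis branch: return (n output bits, next state)."""
    reg = (bit << M) | state
    return tuple(bin(reg & g).count("1") & 1 for g in G), reg >> 1

def encode(info):
    """Encode the information bits followed by m zero tail bits."""
    state, code = 0, []
    for b in list(info) + [0] * M:
        out, state = branch(state, b)
        code.extend(out)
    return code

def bsc_llrs(r, eps):
    """phi_j = log[Pr(r_j|0)/Pr(r_j|1)] for a BSC with crossover probability eps."""
    w = math.log((1 - eps) / eps)
    return [w if bit == 0 else -w for bit in r]

def mlsda_decode(phi, L):
    """Trellis-based MLSDA; returns (decoded information bits, final path metric)."""
    y = [1 if f < 0 else 0 for f in phi]                  # hard decisions
    tie = itertools.count()
    # Heap order: ascending metric, then descending level, then insertion order.
    open_stack = [(0.0, 0, next(tie), 0, ())]             # Step 1
    open_best = {(0, 0): 0.0}                             # best metric per open node
    closed = set()                                        # Closed Stack: (level, state)
    while open_stack:
        metric, neg_level, _, state, bits = heapq.heappop(open_stack)
        level = -neg_level
        node = (level, state)
        if node in closed or open_best.get(node) != metric:
            continue                                      # entry already eliminated in Step 3
        if level == L + M:                                # Step 5
            return list(bits[:L]), metric
        closed.add(node)                                  # Step 2
        for b in ((0, 1) if level < L else (0,)):
            out, nxt = branch(state, b)
            j0 = level * N_OUT
            inc = sum((y[j0 + i] ^ o) * abs(phi[j0 + i]) for i, o in enumerate(out))
            succ, m_new = (level + 1, nxt), metric + inc
            # Step 3: drop paths merging into closed nodes or into better open paths.
            if succ in closed or open_best.get(succ, math.inf) <= m_new:
                continue
            open_best[succ] = m_new
            heapq.heappush(open_stack, (m_new, -(level + 1), next(tie), nxt, bits + (b,)))  # Step 4
    raise RuntimeError("Open Stack emptied before reaching the end of the trellis")

if __name__ == "__main__":
    info = [1, 0, 1, 1, 0, 1]
    r = encode(info)
    r[3] ^= 1
    r[10] ^= 1                                            # two channel errors
    print(mlsda_decode(bsc_llrs(r, 0.1), len(info)))
```

Because the metric is nondecreasing along every path, the search behaves like a shortest-path search on the trellis, so the first path that reaches the final level has the smallest metric.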

[Figure 3.1 plot: BER versus E_b/N_0 for the (2, 1, 6) and (2, 1, 10) codes.]

Figure 3.1: Bit error rates of the MLSDA for (2, 1, 6) and (2, 1, 10) convolutional codes with L = 100.

1 For discrete channels, it may occur that the successor path not only ends at the same node as some path already in the Open Stack but also has a path metric equal to it. In such a case, just randomly eliminate one of them.

[Figure 3.2 plot: average decoding complexity per information bit versus E_b/N_0 for the (2, 1, 6) and (2, 1, 10) codes.]

Figure 3.2: Average decoding complexities of the MLSDA for (2, 1, 6) and (2, 1, 10) convolutional codes with L = 100.

We next show the simulation results of performance and average decoding complexity for the MLSDA. The bit error rates of the MLSDA for (2, 1, 6) and (2, 1, 10) convolutional codes are summarized in Fig. 3.1, while the decoding complexity, measured by the number of metric computations, is depicted in Fig. 3.2. Notably, the computational effort of sequential-search decoding algorithms, including the MLSDA, is in fact determined not only by the number of metrics computed but also by the cost of searching and inserting stack elements. The latter cost, however, can be made of comparable order to the former by adopting the double-ended heap (DEAP) [3] data structure in the stack implementation (footnote 2). This justifies the common use of the number of metric computations as the key determinant of the algorithmic complexity of sequential-search decoding algorithms.

2 In practical decoder design, only stacks of finite size are available. When the stack is full, one common strategy is to remove the bottom path, that is, the path with the worst path metric. A double-ended heap is thus useful in this regard because it can access the top path as well as the bottom path in case of stack overflow. Throughout this dissertation, we assume an infinite stack size and hence no stack overflow strategy is required. However, we propose to use DEAP for future practical decoder implementation.

[Figure 3.3 plot: average decoding complexity per information bit versus message length; (2, 1, 10) code, AWGN, E_b/N_0 = 3.5 dB.]

Figure 3.3: Average decoding complexity versus information length for the MLSDA applied to the (2, 1, 10) convolutional code.

It can be observed from Fig. 3.2 that the average decoding complexity of the MLSDA is high at low SNRs. An even more serious problem is that the average decoding complexity per information bit grows as the information length increases, as shown in Fig. 3.3. This phenomenon restricts the usage of the MLSDA for long convolutional codes.

We therefore introduce the early elimination modification to alleviate the problem of growing complexity with respect to the information length. The modification is based on the following observation. Suppose that the path ending at node C in Fig. 3.4 is a portion of the final code path to be located at the end of the sequential search, and suppose that the path ending at node D happens to be the current top path. Then, expanding node D until all of its descendants finally have decoding metrics exceeding those of the successors of the path ending at node C may consume a considerable but unnecessary amount of computational effort. This observation hints that by setting a proper level threshold $\Delta$ and directly eliminating the top path whose level is no larger than $(\ell_{\max} - \Delta)$, where $\ell_{\max}$ is the largest level of all nodes that have been expanded thus far by the sequential search, the computational complexity of the MLSDA may be reduced without sacrificing much of the performance.

Figure 3.4: Early elimination window Δ in the trellis-based MLSDA.

It should be mentioned that since the decoding metric is monotonically nondecreasing along the path portion to be searched, the path that updates the current $\ell_{\max}$ is always the one with the smallest path metric among all paths ending at the same level [17]. In fact, this is the key to ensuring that, for the sequential search using the maximum-likelihood metric in (3.2), the first top path that reaches the last level of the code tree or code trellis is exactly the maximum-likelihood code path.

Based on the above observation, we propose to set a level threshold $\Delta$ in the trellis-based MLSDA, and directly eliminate the top path whose level is no larger than $(\ell_{\max} - \Delta)$. For this modification, we only need to modify Step 2 in the trellis-based MLSDA as follows.

<Trellis-Based MLSDA with Early Elimination Modification>

Initialization. Set a level threshold $\Delta$. Assign $\ell_{\max} = 0$.

Step 2. Perform the following check before executing the original Step 2 in the trellis-based MLSDA.

• If the top path in the Open Stack ends at a node whose level is no larger than $(\ell_{\max} - \Delta)$, directly eliminate the top path, and go to Step 5; otherwise, update $\ell_{\max}$ if it is smaller than the ending level of the current top path.
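The modified Step 2 amounts to one extra check per top path. The sketch below is self-contained and rests on the same illustrative assumptions as before: a (2, 1, 2) code with octal generators (7, 5), BSC log-likelihood ratios, hypothetical function names, and a heap with lazy deletion standing in for the Open Stack. The parameter `delta` plays the role of $\Delta$; passing `None` disables early elimination.

```python
import heapq
import itertools
import math

G = (0b111, 0b101)  # illustrative (2, 1, 2) code, octal generators (7, 5)
M = 2
N_OUT = len(G)

def branch(state, bit):
    reg = (bit << M) | state
    return tuple(bin(reg & g).count("1") & 1 for g in G), reg >> 1

def encode(info):
    state, code = 0, []
    for b in list(info) + [0] * M:
        out, state = branch(state, b)
        code.extend(out)
    return code

def bsc_llrs(r, eps):
    w = math.log((1 - eps) / eps)
    return [w if bit == 0 else -w for bit in r]

def mlsda_ee_decode(phi, L, delta=None):
    """Trellis-based MLSDA with optional early elimination.

    Returns (decoded information bits, number of branch metric computations).
    """
    y = [1 if f < 0 else 0 for f in phi]
    tie = itertools.count()
    open_stack = [(0.0, 0, next(tie), 0, ())]
    open_best = {(0, 0): 0.0}
    closed = set()
    l_max = 0                                     # Initialization: l_max = 0
    computations = 0
    while open_stack:
        metric, neg_level, _, state, bits = heapq.heappop(open_stack)
        level = -neg_level
        node = (level, state)
        if node in closed or open_best.get(node) != metric:
            continue                              # outdated heap entry
        if level == L + M:                        # Step 5
            return list(bits[:L]), computations
        if delta is not None and level <= l_max - delta:
            del open_best[node]                   # early elimination of the top path
            continue
        l_max = max(l_max, level)                 # update l_max if it is smaller
        closed.add(node)                          # original Step 2 starts here
        for b in ((0, 1) if level < L else (0,)):
            out, nxt = branch(state, b)
            j0 = level * N_OUT
            inc = sum((y[j0 + i] ^ o) * abs(phi[j0 + i]) for i, o in enumerate(out))
            computations += 1
            succ, m_new = (level + 1, nxt), metric + inc
            if succ in closed or open_best.get(succ, math.inf) <= m_new:
                continue                          # Step 3
            open_best[succ] = m_new
            heapq.heappush(open_stack, (m_new, -(level + 1), next(tie), nxt, bits + (b,)))
    raise RuntimeError("Open Stack emptied before reaching the end of the trellis")

if __name__ == "__main__":
    info = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
    r = encode(info)
    r[5] ^= 1
    phi = bsc_llrs(r, 0.1)
    for d in (None, 6, 3):
        print(d, mlsda_ee_decode(phi, len(info), delta=d))
```

When `delta` exceeds the trellis depth, no path can satisfy the elimination condition, so the decoder coincides with the unmodified trellis-based MLSDA.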

The choice of Δ is a tradeoff between complexity and bit error probability. Intuitively, the smaller the Δ, the higher the possibility that the maximum-likelihood path is eliminated early. From simulation results, we found that the performance degradation is almost negligible even for a small Δ. This encourages us to analyze the least value of Δ that produces near maximum-likelihood performance, as well as the complexity reduction due to this early elimination modification.


Chapter 4 Analysis of the Window Size for Negligible Performance Degradation over BSC Channels

This chapter provides a detailed derivation of the early elimination window that yields negligible performance degradation over binary symmetric channels (BSCs). As random coding analysis is the main technique used to analyze the window size for the MLSDA, we first review, in Section 4.1, the random coding technique used in the analysis of the truncation window size in Viterbi decoders. The early elimination window for the MLSDA that makes the performance degradation negligible is then derived in Section 4.2. Numerical and simulation results are given in Section 4.3.

4.1 Random Coding Analysis of the Path Truncation Window in Viterbi Decoder

In [10], Gallager considered the discrete memoryless channel with input alphabet size $I$, output alphabet size $J$ and channel transition probability $P_{ji}$, and presented the random coding bound for the maximum-likelihood decoding error $P_e$ of the $(N, K)$ block code as:

$$P_e \le \exp\{-N[-\rho R + E_0(\rho, \boldsymbol{p})]\}$$

for all $0 \le \rho \le 1$, where $R = \log(I^K)/N = (K/N)\log(I)$ is the code rate measured in nats per symbol, $\boldsymbol{p} = (p_1, p_2, \ldots, p_I)$ is the input distribution adopted for the random selection of codewords, and

$$E_0(\rho, \boldsymbol{p}) \triangleq -\log \sum_{j=1}^{J} \left( \sum_{i=1}^{I} p_i P_{ji}^{1/(1+\rho)} \right)^{1+\rho}. \qquad (4.1)$$

Gallager’s result leads to the well-known random coding exponent:

$$E_r(R) \triangleq \max_{0 \le \rho \le 1} \max_{\boldsymbol{p}} [-\rho R + E_0(\rho, \boldsymbol{p})] = \max_{0 \le \rho \le 1} [-\rho R + E_0(\rho)],$$

where $E_0(\rho) \triangleq \max_{\boldsymbol{p}} E_0(\rho, \boldsymbol{p})$ is the Gallager function [42]. Notably, the random coding exponent is a lower bound of the channel reliability function $E(R) \triangleq \lim_{N \to \infty} -(1/N)\log(P_e)$ (provided the limit exists), and is tight for code rates above the cutoff rate.

Figure 4.1: Single-input n-output encoder model considered in [36]. All elements are in GF(q), where q is either a prime or a power of a prime.

In [36], Viterbi applied a similar random coding argument to the derivation of the decoding error for time-varying convolutional codes. Specifically, he considered a single-input n-output convolutional encoder with one (m + 1)-stage shift register as shown in Fig. 4.1. The n inner product computers may change with each new input symbol, and hence a time-varying code trellis results. As all elements are assumed to be in GF(q), each input symbol induces q branches on the code trellis, and each branch is labelled by n channel symbols.


As a result of the attached m zeros at the end, the encoder will produce n(L + m) output channel symbols in response to the input sequence of L symbols. Under the above system setting, Viterbi showed that the maximum-likelihood decoding error P_e,c for time-varying convolutional codes can be upper-bounded by:

$$P_{e,c} \le \frac{q-1}{1 - q^{-\lambda/R}} \exp[-n(m+1)E_0(\rho)] \qquad (4.2)$$

for all $0 \le \rho \le 1$, where $R \triangleq \log(q)/n$ is the code rate in units of nats per symbol, and $\lambda \triangleq E_0(\rho) - \rho R$ is a constant. Since $\lambda$ is required to be positive, it can be concluded that:

$$\liminf_{n \to \infty} -\frac{1}{n} \log P_{e,c} \ge (m+1)E_c(R),$$

where $E_c(R) \triangleq \max_{\{\rho \in [0,1] \,:\, E_0(\rho) > \rho R\}} E_0(\rho)$. For symmetric channels, $E_0(\rho)$ is an increasing and concave function in $\rho$ with $E_0(0) = 0$; therefore, $E_c(R)$ can be reduced to:

$$E_c(R) = \begin{cases} R_0, & \text{if } 0 \le R < R_0; \\ E_0(\rho^*), & \text{if } R_0 \le R < C; \\ 0, & \text{if } R \ge C, \end{cases} \qquad (4.3)$$

where $R_0 = E_0(1)$ is the cutoff rate, $C = E_0'(0)$ is the channel capacity, and $\rho^* = \rho^*(R)$ is the unique positive solution of $E_0(\rho) = \rho R$. It is also shown in the same work that $E_c(R)$ is a tight exponent for $R \ge R_0$.

In order to derive the path truncation window with near-optimal performance, Forney [9] treated the truncated convolutional code as a block code, and upper-bounded the additional decoding error P_e,T due to path truncation in the Viterbi decoder by means of Gallager’s technique as:

$$P_{e,T} \le \exp[-n\tau E_r(R)], \qquad (4.4)$$

where $\tau$ is the truncation window size. Forney then noticed that as long as

$$\liminf_{n \to \infty} -\frac{1}{n} \log P_{e,T} > \limsup_{n \to \infty} -\frac{1}{n} \log P_{e,c}, \qquad (4.5)$$

the additional error P_e,T due to path truncation becomes exponentially negligible with respect to P_e,c. For R ≥ R₀, condition (4.5) reduces to

τ E_r(R) > (m + 1)E_c(R)

by inequality (4.4) and the tightness of E_c(R). A specific case is given in Fig. 4.2 in which the binary symmetric channel (BSC) with crossover probability 0.4 gives that the path truncation window at the cutoff rate R₀ = 0.0146 bit/symbol must be larger than E_c(R₀)/E_r(R₀) ≈ 0.0146/0.0025 = 5.84-fold of the code constraint length. This number parallels the one obtained under the very noisy channels, where 5.8-fold of the code constraint length is suggested for the path truncation window at the cutoff rate [37].
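The numbers quoted for the BSC with crossover probability 0.4 can be reproduced approximately with a short numerical sketch. It assumes equiprobable inputs (which maximize $E_0$ for the BSC), works in bits rather than nats to match the units of Fig. 4.2, and uses a plain grid search and bisection; the function names are hypothetical.

```python
import math

def e0_bsc(rho, eps):
    """Gallager function E0(rho) in bits for a BSC with equiprobable inputs."""
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(eps ** s + (1.0 - eps) ** s)

def capacity_bsc(eps):
    """Channel capacity C = 1 + eps*log2(eps) + (1-eps)*log2(1-eps)."""
    return 1.0 + eps * math.log2(eps) + (1.0 - eps) * math.log2(1.0 - eps)

def er_bsc(rate, eps, grid=2000):
    """Random coding exponent: max over rho in [0, 1] of E0(rho) - rho*R (grid search)."""
    return max(e0_bsc(i / grid, eps) - (i / grid) * rate for i in range(grid + 1))

def ec_bsc(rate, eps):
    """Convolutional coding exponent E_c(R) following (4.3)."""
    r0 = e0_bsc(1.0, eps)
    if rate >= capacity_bsc(eps):
        return 0.0
    if rate <= r0:
        return r0
    lo, hi = 0.0, 1.0                 # E0(rho) > rho*R just above lo, E0(hi) <= hi*R
    for _ in range(100):              # bisection for the positive root of E0(rho) = rho*R
        mid = 0.5 * (lo + hi)
        if e0_bsc(mid, eps) > mid * rate:
            lo = mid
        else:
            hi = mid
    return e0_bsc(lo, eps)

if __name__ == "__main__":
    eps = 0.4
    r0 = e0_bsc(1.0, eps)
    print("capacity    :", capacity_bsc(eps))
    print("cutoff rate :", r0)
    print("E_r(R0)     :", er_bsc(r0, eps))
    print("window ratio:", ec_bsc(r0, eps) / er_bsc(r0, eps))
```

The printed ratio $E_c(R_0)/E_r(R_0)$ is the lower limit on the truncation window, in multiples of the code constraint length, implied by $\tau E_r(R) > (m+1)E_c(R)$ at the cutoff rate.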

[Figure 4.2 plot: error exponents E_c(R) and E_r(R) versus rate (bit/symbol) for crossover probability 0.4, capacity = 0.0290 and cutoff rate 0.0146 bits/symbol; the marked values are E_c(R_0) = 0.0146 and E_r(R_0) = 0.0025.]

Figure 4.2: Exponent lower bound E_r(R) of the additional error due to path truncation and exponent E_c(R) of the maximum-likelihood decoding error for time-varying convolutional codes

4.2 Sufficiently Large Window Size for the MLSDA

For simplicity, the analysis in this section is restricted to the simple BSC with crossover probability $\varepsilon$. The analysis can be extended likewise to other discrete channels.

Refer to Fig. 3.4 in our analysis below. Suppose that the path ending at node B at level $\ell$ is the current top path of the Open Stack, and let the current $\ell_{\max}$ be updated due to the expansion of node C. According to the merging operation at Step 3 of the trellis-based MLSDA, any two paths that survive in the Open Stack can be traced back to a common node before which they share common traces. Hence, we may assume that the path that ends at node B and the path that updates the current $\ell_{\max}$ have common traces before node A, whose level, without loss of generality, can be assumed zero in the analysis below.

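Observe that the current top path ending at node B is early-eliminated if, and only if, node C is expanded earlier than node B, provided $\ell \le \ell_{\max} - \Delta$. Since the decoding metric of the MLSDA is nondecreasing along the path portion to be searched, the event that node C is expanded prior to node B is equivalent to

$$\mu\left(\boldsymbol{x}_{(\ell n-1)}\right) \ge \mu\left(\tilde{\boldsymbol{x}}_{(\ell_{\max} n-1)}\right), \qquad (4.6)$$

which in turn is equivalent to

$$(1-\varepsilon)^{n(\ell_{\max}-\ell)} \cdot \Pr\left(\boldsymbol{r}_{(\ell n-1)} | \boldsymbol{x}_{(\ell n-1)}\right) \le \Pr\left(\boldsymbol{r}_{(\ell_{\max} n-1)} | \tilde{\boldsymbol{x}}_{(\ell_{\max} n-1)}\right). \qquad (4.7)$$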

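The above statement can be proved as follows. For the BSC with crossover probability $0 < \varepsilon < 1/2$,

$$\phi_j \triangleq \log \frac{\Pr(r_j|v_j = 0)}{\Pr(r_j|v_j = 1)} = \begin{cases} \log[(1-\varepsilon)/\varepsilon], & \text{if } r_j = 0; \\ \log[\varepsilon/(1-\varepsilon)], & \text{if } r_j = 1. \end{cases}$$

Hence,

$$r_j = y_j = \begin{cases} 1, & \text{if } \phi_j < 0; \\ 0, & \text{otherwise}, \end{cases}$$

and $\mu(x_j) = (y_j \oplus x_j)|\phi_j| = (r_j \oplus x_j)\log[(1-\varepsilon)/\varepsilon]$. As a result, (4.6) is equivalent to

$$\mu\left(\boldsymbol{x}_{(\ell n-1)}\right) \ge \mu\left(\tilde{\boldsymbol{x}}_{(\ell_{\max} n-1)}\right)$$
$$\Leftrightarrow \sum_{j=0}^{\ell n-1} \mu(x_j) \ge \sum_{j=0}^{\ell_{\max} n-1} \mu(\tilde{x}_j)$$
$$\Leftrightarrow \sum_{j=0}^{\ell n-1} (r_j \oplus x_j) \ge \sum_{j=0}^{\ell_{\max} n-1} (r_j \oplus \tilde{x}_j)$$
$$\Leftrightarrow \sum_{j=0}^{\ell n-1} [(r_j \oplus \tilde{x}_j) - (r_j \oplus x_j)] + \sum_{j=\ell n}^{\ell_{\max} n-1} (r_j \oplus \tilde{x}_j) \le 0.$$

Similarly, (4.7) is equivalent to

$$(1-\varepsilon)^{n(\ell_{\max}-\ell)} \Pr\left(\boldsymbol{r}_{(\ell n-1)} | \boldsymbol{x}_{(\ell n-1)}\right) \le \Pr\left(\boldsymbol{r}_{(\ell_{\max} n-1)} | \tilde{\boldsymbol{x}}_{(\ell_{\max} n-1)}\right)$$
$$\Leftrightarrow \sum_{j=0}^{\ell n-1} \log \Pr(r_j|x_j) + n(\ell_{\max}-\ell)\log(1-\varepsilon) \le \sum_{j=0}^{\ell_{\max} n-1} \log \Pr(r_j|\tilde{x}_j)$$
$$\Leftrightarrow \sum_{j=0}^{\ell n-1} [(1 - r_j \oplus x_j)\log(1-\varepsilon) + (r_j \oplus x_j)\log(\varepsilon)] + n(\ell_{\max}-\ell)\log(1-\varepsilon) \le \sum_{j=0}^{\ell_{\max} n-1} [(1 - r_j \oplus \tilde{x}_j)\log(1-\varepsilon) + (r_j \oplus \tilde{x}_j)\log(\varepsilon)]$$
$$\Leftrightarrow \log\frac{1-\varepsilon}{\varepsilon} \left\{ \sum_{j=0}^{\ell n-1} [(r_j \oplus \tilde{x}_j) - (r_j \oplus x_j)] + \sum_{j=\ell n}^{\ell_{\max} n-1} (r_j \oplus \tilde{x}_j) \right\} \le 0$$
$$\Leftrightarrow \sum_{j=0}^{\ell n-1} [(r_j \oplus \tilde{x}_j) - (r_j \oplus x_j)] + \sum_{j=\ell n}^{\ell_{\max} n-1} (r_j \oplus \tilde{x}_j) \le 0.$$

Therefore, the desired equivalence of (4.6) and (4.7) is validated.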
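The equivalence of (4.6) and (4.7) can also be spot-checked numerically. The sketch below uses exact rational arithmetic so that boundary cases with equal metrics are compared without rounding error; the function names and the randomly drawn labels are illustrative assumptions.

```python
from fractions import Fraction
import random

def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(p ^ q for p, q in zip(a, b))

def likelihood(r, x, eps):
    """Pr(r | x) on a BSC with crossover probability eps."""
    d = hamming(r, x)
    return eps ** d * (1 - eps) ** (len(r) - d)

def cond_46(r, x, x_tilde, n, level, level_max):
    """Condition (4.6); the common factor log[(1-eps)/eps] > 0 is dropped."""
    d_x = hamming(r[:level * n], x[:level * n])
    d_xt = hamming(r[:level_max * n], x_tilde[:level_max * n])
    return d_x >= d_xt

def cond_47(r, x, x_tilde, n, level, level_max, eps):
    """Condition (4.7), evaluated exactly."""
    lhs = (1 - eps) ** (n * (level_max - level)) * likelihood(r[:level * n], x[:level * n], eps)
    rhs = likelihood(r[:level_max * n], x_tilde[:level_max * n], eps)
    return lhs <= rhs

if __name__ == "__main__":
    rng = random.Random(0)
    eps = Fraction(1, 10)
    n = 2
    agree = True
    for _ in range(1000):
        level_max = rng.randint(1, 6)
        level = rng.randint(0, level_max)
        r = [rng.randint(0, 1) for _ in range(n * level_max)]
        x_tilde = [rng.randint(0, 1) for _ in range(n * level_max)]
        x = [rng.randint(0, 1) for _ in range(n * level)]
        agree &= cond_46(r, x, x_tilde, n, level, level_max) == cond_47(r, x, x_tilde, n, level, level_max, eps)
    print(agree)
```

Both conditions reduce to a comparison of Hamming distances, which is why they agree for every draw.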

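By noting that for the MLSDA, the path that updates the current $\ell_{\max}$ is exactly the one with the smallest path metric among all paths ending at the same level [17], condition (4.7) can be equivalently re-written as:

$$(1-\varepsilon)^{n(\ell_{\max}-\ell)} \cdot \Pr\left(\boldsymbol{r}_{(\ell n-1)} | \boldsymbol{x}_{(\ell n-1)}\right) \le \max_{\tilde{\boldsymbol{x}}_{(\ell_{\max} n-1)} \in \tilde{\mathcal{C}}_{\ell_{\max}}} \Pr\left(\boldsymbol{r}_{(\ell_{\max} n-1)} | \tilde{\boldsymbol{x}}_{(\ell_{\max} n-1)}\right), \qquad (4.8)$$

where $\tilde{\mathcal{C}}_{\ell_{\max}}$ is the set of all labels of length $\ell_{\max} n$, whose corresponding paths consist of […]

[…] be introduced by early elimination if (4.8) is valid for some $\ell$ and $\ell_{\max}$ with $\ell \le \ell_{\max} - \Delta$, when $\boldsymbol{x}$ is the transmitted codeword (footnote 1).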

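1 Since early-elimination of the path with label $\boldsymbol{x}$ is always performed whenever (4.8) is valid, it is clear that additional error is introduced only when the transmitted label $\boldsymbol{x}$ corresponds to the maximum-likelihood code path. In other words, when $\boldsymbol{x}$ does not label the maximum-likelihood code path, the validity of (4.8) or early-elimination of the path with label $\boldsymbol{x}$ will not add a new error to maximum-likelihood decoding. As what concerns us is an upper probability bound for the additional error due to early-elimination, it suffices to analyze the probability bound on the occurrence of (4.8).

Continue the derivation by replacing $\ell_{\max}$ by $\beta$ for notational convenience. The probability $\xi(\ell, \beta)$ that (4.8) occurs is given by:

$$\xi(\ell, \beta) = \sum_{\boldsymbol{r}_{(\beta n-1)} \in \{0,1\}^{\beta n}} \Phi_0\left(\boldsymbol{r}_{(\beta n-1)}\right) \Pr\left(\boldsymbol{r}_{(\beta n-1)} | \boldsymbol{x}_{(\beta n-1)}\right), \qquad (4.9)$$

where $\Phi_0\left(\boldsymbol{r}_{(\beta n-1)}\right) = 1$ if (4.8) is valid, and 0, otherwise. From

$$\Phi_0\left(\boldsymbol{r}_{(\beta n-1)}\right) \le \left[ \frac{\sum_{\tilde{\boldsymbol{x}}_{(\beta n-1)} \in \tilde{\mathcal{C}}_\beta} \Pr\left(\boldsymbol{r}_{(\beta n-1)} | \tilde{\boldsymbol{x}}_{(\beta n-1)}\right)^{1/(1+\rho)}}{(1-\varepsilon)^{n(\beta-\ell)/(1+\rho)} \Pr\left(\boldsymbol{r}_{(\ell n-1)} | \boldsymbol{x}_{(\ell n-1)}\right)^{1/(1+\rho)}} \right]^{\rho}$$

for $\rho \ge 0$, we obtain:

$$\xi(\ell, \beta) \le \sum_{\boldsymbol{r}_{(\beta n-1)} \in \{0,1\}^{\beta n}} \left[ \frac{\sum_{\tilde{\boldsymbol{x}}_{(\beta n-1)} \in \tilde{\mathcal{C}}_\beta} \Pr\left(\boldsymbol{r}_{(\beta n-1)} | \tilde{\boldsymbol{x}}_{(\beta n-1)}\right)^{1/(1+\rho)}}{(1-\varepsilon)^{n(\beta-\ell)/(1+\rho)} \Pr\left(\boldsymbol{r}_{(\ell n-1)} | \boldsymbol{x}_{(\ell n-1)}\right)^{1/(1+\rho)}} \right]^{\rho} \Pr\left(\boldsymbol{r}_{(\beta n-1)} | \boldsymbol{x}_{(\beta n-1)}\right).$$

Taking expectation of $\xi(\ell, \beta)$ with respect to random selection of codewords of length $\beta n$ according to code bit selection distribution $\boldsymbol{p} = (p_0, p_1)$, where $p_0$ and $p_1$ are the probabilities respectively for bits 0 and 1, and denoting expectation over this random selection by an overline, yields that:

$$\overline{\xi(\ell, \beta)} \le (1-\varepsilon)^{-n(\beta-\ell)\rho/(1+\rho)} \sum_{\boldsymbol{r}_{(\beta n-1)} \in \{0,1\}^{\beta n}} \overline{\left[ \sum_{\tilde{\boldsymbol{x}}_{(\beta n-1)} \in \tilde{\mathcal{C}}_\beta} \Pr\left(\boldsymbol{r}_{(\beta n-1)} | \tilde{\boldsymbol{x}}_{(\beta n-1)}\right)^{1/(1+\rho)} \right]^{\rho}} \times \overline{\Pr\left(\boldsymbol{r}_{(\ell n-1)} | \boldsymbol{x}_{(\ell n-1)}\right)^{1/(1+\rho)} \Pr\left(\boldsymbol{r}_{(\ell n, \beta n-1)} | \boldsymbol{x}_{(\ell n, \beta n-1)}\right)} \qquad (4.10)$$

$$\le (1-\varepsilon)^{-n(\beta-\ell)\rho/(1+\rho)} \sum_{\boldsymbol{r}_{(\beta n-1)} \in \{0,1\}^{\beta n}} \left[ \overline{\sum_{\tilde{\boldsymbol{x}}_{(\beta n-1)} \in \tilde{\mathcal{C}}_\beta} \Pr\left(\boldsymbol{r}_{(\beta n-1)} | \tilde{\boldsymbol{x}}_{(\beta n-1)}\right)^{1/(1+\rho)}} \right]^{\rho} \times \overline{\Pr\left(\boldsymbol{r}_{(\ell n-1)} | \boldsymbol{x}_{(\ell n-1)}\right)^{1/(1+\rho)} \Pr\left(\boldsymbol{r}_{(\ell n, \beta n-1)} | \boldsymbol{x}_{(\ell n, \beta n-1)}\right)} \qquad (4.11)$$

$$= |\tilde{\mathcal{C}}_\beta|^{\rho} \times (1-\varepsilon)^{-n(\beta-\ell)\rho/(1+\rho)} \sum_{\boldsymbol{r}_{(\beta n-1)} \in \{0,1\}^{\beta n}} \left[ \overline{\Pr\left(\boldsymbol{r}_{(\beta n-1)} | \tilde{\boldsymbol{x}}_{(\beta n-1)}\right)^{1/(1+\rho)}} \right]^{\rho} \overline{\Pr\left(\boldsymbol{r}_{(\ell n-1)} | \boldsymbol{x}_{(\ell n-1)}\right)^{1/(1+\rho)} \Pr\left(\boldsymbol{r}_{(\ell n, \beta n-1)} | \boldsymbol{x}_{(\ell n, \beta n-1)}\right)},$$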

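where (4.10) holds since labels $\boldsymbol{x}_{(\ell n-1)}$ and any labels in $\tilde{\mathcal{C}}_\beta$ are selected independently, and (4.11) is valid due to Jensen’s inequality with $\rho \le 1$. Finally, by noting that $|\tilde{\mathcal{C}}_\beta| \le 2^{k\beta} = 2^{n\beta R}$, we obtain:

$$\overline{\xi(\ell, \beta)} \le 2^{-\ell n[-\rho R + E_0(\rho, \boldsymbol{p})]} \cdot 2^{-(\beta-\ell)n[-\rho R + \rho\log_2(1-\varepsilon)/(1+\rho) + E_1(\rho, \boldsymbol{p})]}, \qquad (4.12)$$

where

$$E_0(\rho, \boldsymbol{p}) \triangleq -\log_2 \sum_{j=0}^{1} \left( \sum_{i=0}^{1} p_i \Pr(r = j|v = i)^{1/(1+\rho)} \right)^{1+\rho}$$

and

$$E_1(\rho, \boldsymbol{p}) \triangleq -\log_2 \sum_{j=0}^{1} \left( \sum_{i=0}^{1} p_i \Pr(r = j|v = i) \right) \left( \sum_{i=0}^{1} p_i \Pr(r = j|v = i)^{1/(1+\rho)} \right)^{\rho}.$$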

Inequality (4.12) provides an upper probability bound for a top path ending at level $\ell$ being early-eliminated. Based on (4.12), we can proceed to derive the bound for the probability $P_{e,E}$ that an incorrect codeword is claimed at the end of the sequential-type search because the correct path is early-eliminated during the decoding process.
