Algorithmic Design of Variable-Length Error-Correcting Prefix Code

National Chiao Tung University

Institute of Communications Engineering

Doctoral Dissertation

Student: Ting-Yi Wu

Advisor: Professor Po-Ning Chen

A Thesis

Submitted to the Institute of Communications Engineering, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Communications Engineering

July 2013

Hsinchu, Taiwan, Republic of China

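Chinese Abstract

In this dissertation, we design joint source-channel variable-length error-correcting prefix codes (VLECPCs) for discrete memoryless sources. Our results are as follows. First, given a minimum allowable value of the free distance, we prove that applying a priority-first search algorithm to our newly designed search tree structure is guaranteed to find a VLECPC with the minimum average codeword length. Second, to further reduce the decoding error rate, we propose that, among all VLECPCs achieving the minimum average codeword length, the dominant term of the union bound on the error rate be used as the selection criterion; choosing the code with the smallest dominant term yields better error resilience. Third, for larger minimum allowable free distances or larger source alphabets, the above search algorithm becomes too time-consuming to be practical; we therefore propose a simplified fast search algorithm at the cost of a slight increase in average codeword length. Fourth, at the decoder, we design a low-complexity maximum a posteriori (MAP) decoding algorithm for the case in which the receiver additionally knows the number of transmitted variable-length codewords. Simulation results show that the proposed code construction algorithms outperform existing methods in the literature in both average codeword length and error performance. Moreover, compared with traditional separate source-channel coding, our joint source-channel coding system achieves a lower transmission error rate at comparable decoding complexity.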


Algorithmic Design of Variable-Length Error-Correcting Prefix Code

Student: Ting-Yi Wu

Advisor: Po-Ning Chen

Institute of Communications Engineering

National Chiao Tung University

ABSTRACT

A joint source-channel coding problem that combines the efficient compression of discrete memoryless sources with their reliable communication over memoryless channels via binary variable-length error-correcting prefix codes (VLECPCs) is considered. Under a fixed free distance constraint, a priority-first search algorithm is devised for finding an optimal VLECPC with minimal average codeword length. Two variations of the priority-first-search-based code construction algorithm are also provided. The first one improves the resilience of the developed codes against channel noise by additionally considering a performance parameter without sacrificing optimality in average codeword length. In the second variation, to accommodate a large free distance constraint as well as a large source alphabet such as the 26-symbol English data source, the VLECPC construction algorithm is modified with the objective of significantly reducing its search complexity while still yielding near-optimal codes. A low-complexity sequence maximum a posteriori (MAP) decoder for all VLECPCs (including our constructed optimal code) is then proposed under the premise that the receiver knows the number of codewords being transmitted. Simulations show that the realized optimal and suboptimal VLECPCs compare favorably with existing codes in the literature in terms of coding efficiency, search complexity and error rate performance.


Acknowledgements

First and foremost, I would like to express my deepest gratitude to my supervisor, Professor Po-Ning Chen, who has the attitude and the substance of a genius. He has given me invaluable comments and suggestions on my research. Without his guidance and persistent help, this dissertation would not have been possible. He has also been very kind and friendly to me, and the pleasant environment he fostered made me feel comfortable working with him.

I would also like to express my greatest appreciation to Professor Fady Alajaji and Professor Yung-hsiang Han. Discussions with them have been illuminating. Professor Alajaji is a great person and scholar, not to mention his kind support and hospitality when we visited Queen’s University. His precise comments and insights have always been a great help in my research. Professor Han introduced me to this topic and supported me along the way. His keen and brilliant sense of research has always helped take my research to the next level.

My sincere thanks also go to the members of my thesis committee for their encouragement, insightful comments, and hard questions. In addition, I would like to give special thanks to one of the committee members, Professor Stefan M. Moser, for his detailed comments and useful suggestions on LaTeX.

My sincere gratitude is extended to my labmates in NTL Lab, especially Dr. Shih-Wei Wang and Mr. Chin-Fu Liu, for the stimulating discussions and for all the fun we have had in the last four years. Dr. Wang has always been thoughtful and caring to everyone in NTL Lab. Living with him in Kingston was one of the happiest times of my life. Mr. Liu is such a wonderful human being. He is always willing to share his knowledge with others. I have truly learned a lot from him, and it is really an honor for me to sit next to him.

I am indebted to my mother and both of my brothers for everything they have done for me. Last but not least, I would like to express my deepest gratitude to my beloved girlfriend, Miss Pi-Ching Lin, for her unconditional support and love for more than a decade.


List of Figures

2.1 Trellis representations of a VLECPC. The red-color (solid), blue-color (dash-dot) and green-color (dotted) arrows correspond respectively to the transition of transmitting codewords $c_1$, $c_2$ and $c_3$.

3.1 Relation between a parent node and its children in a search tree.

6.1 Error performances of using different decoders to decode the same VLECPC, which is encoded by the optimal VLECPC listed in Table 6.2. The number of 3-bit source symbols per transmission block is 10, which is equivalent to 30 source information bits.

6.2 Average numbers of decoder branch metric computations of using different decoders to decode the same VLECPC for different L at SNRs = 3 dB. The VLECPC is obtained from Table 6.2.

6.3 Error performances of optimal VLECPCs for different $p_0$. The VLECPCs are obtained from the optimal VLECPCs with $d^*_{\text{free}} = 7$ in Table 6.1. The number of 3-bit source symbols per transmission block is 10, which is equivalent to 30 source information bits.

6.4 Error performances of the optimal VLECPC for different L. The optimal VLECPC is obtained from Table 6.2.

6.5 Error performances of different (3rd order) VLECPCs for a binary non-uniform source with $p_0 = 0.8$. The number of 3-bit source symbols per transmission block is 10, which is equivalent to 30 source information bits.

6.6 Error performances of the VLECPCs of Table 6.7 with $d_{\text{free}} = 11$ for the 26-symbol English alphabet (with Distribution 1). The number of source symbols per transmission block is L = 10.

6.7 Error performances of the SSCC (specifically, first order Huffman + TBCC) and the VLECPC of Table 6.7 with $d_{\text{free}} = 10$ for the 26-symbol English alphabet (with Distribution 1). The number of source symbols per transmission block is L = 10.


List of Tables

6.1 Average codeword length per grouped symbol of an 8-ary alphabet generated from binary non-uniform memoryless sources with different $p_0$.

6.2 The optimal VLECPC with $d^*_{\text{free}} = 7$ and $p_0 = 0.8$ (the one with an average codeword length of 7.240) of Table 6.1.

6.3 Average (AVG) and maximum (MAX) numbers of decoder branch metric computations for the codes of Figure 6.1.

6.4 Average (AVG) and maximum (MAX) numbers of decoder branch metric computations for the codes of Figure 6.5.

6.5 The VLECPCs for the English alphabet with Distribution 1 obtained by the suboptimal code construction algorithm for different values of free distance.

6.6 The VLECPCs for the English alphabet with Distribution 2 obtained by the suboptimal code construction algorithm for different values of free distance.

6.7 List of the VLECPCs obtained by three existing code construction schemes and the VLECPCs obtained by our suboptimal code construction algorithm for the 26-symbol English alphabet with Distribution 1 given in Table 6.5: (a) Average codeword lengths (ALs) of the found codes and execution time for each code construction algorithm; (b) Parameters used in each algorithm. The suboptimal algorithm is initialized with $U_b$ set to equal the smallest of the average codeword lengths of the VLECPCs by Buttigieg, Lamy and Wang.

6.8 List of the VLECPCs obtained by three existing code construction schemes and the VLECPCs obtained by our suboptimal code construction algorithm for the 26-symbol English alphabet with Distribution 2 given in Table 6.6: (a) Average codeword lengths (ALs) of the found codes and execution time for each code construction algorithm; (b) Parameters used in each algorithm. The suboptimal algorithm is initialized with $U_b$ set to equal the smallest of the average codeword lengths of the VLECPCs by Buttigieg, Lamy and Wang.

6.9 The complexities and performances of some different suboptimal code constructions for $d_{\text{free}} = 4$ for the 26-symbol English alphabet (Distribution 2 given in Table 6.6).

6.10 The complexities and performances of some other suboptimal code constructions for $d_{\text{free}} = 7$ for the 26-symbol English alphabet (Distribution 2 given in Table 6.6).

6.11 Average (AVG) and maximum (MAX) numbers of decoder branch metric computations for the codes of Figure 6.6.

6.12 Average (AVG) and maximum (MAX) numbers of decoder branch metric computations for the codes of Figure 6.7. The parameter λ used in PFSA is indicated inside the parentheses.


Chapter 1 Introduction

1.1 Overview

One of Shannon’s key contributions in information theory is the separation principle for source-channel coding [27], which states that the source and channel coding operations can be separately designed and performed in tandem without affecting the system’s optimality for reliably transmitting a data source over a noisy channel. However, this result hinges on the assumption that unlimited complexity and coding delay can be afforded by the system, which is unrealistic in today’s resource-constrained communication systems. It is indeed well known via both analytical and empirical studies (e.g., see [1, 2, 14, 33] and the references therein) that joint source-channel coding (JSCC) can significantly outperform separate source-channel coding (SSCC), particularly when the system has stringent delay and complexity restrictions. JSCC, which may use codes of fixed or variable length, is typically realized in two ways: by coordinating the source and channel coding functions in tandem or by combining them within a single step (examples of various JSCC schemes can be found in [33]). In this dissertation, we focus on variable-length single-step JSCC with the objective of designing optimal or close-to-optimal variable-length error-correcting prefix codes (VLECPCs) with low complexity for the efficient compression and communication of data sources in the presence of channel noise. Here optimality is interpreted as achieving minimal average codeword length among all VLECPC designs subject to a fixed free-distance constraint. The successful development of such VLECPCs, which play the dual role of good data compression and error-correcting codes, provides an interesting alternative to the classical SSCC scheme, particularly when the system’s complexity can be significantly reduced without degrading its error performance.

First introduced in [17, 5, 6], VLECPCs were thoroughly investigated by Buttigieg in [7, 9] and were shown to exhibit properties akin to those of convolutional codes: they have a memory structure, which can naturally be represented via a trellis, and they are best suited for being decoded via a sequence maximum-likelihood (ML) or maximum a posteriori (MAP) Viterbi-like decoder (as opposed to decoding their codewords instantaneously). Furthermore, Buttigieg showed how the VLECPCs’ distance spectrum and the union bound can be used to predict their error performance under hard-decision ML decoding for the binary symmetric channel (BSC) and identified the codes’ free distance $d_{\text{free}}$ as a key parameter which, when maximized, can improve the codes’ performance. In related works, the error exponent of VLECPCs is analyzed [3] and conditions for the existence of VLECPCs are studied [31, 24].

In [7], Buttigieg originally proposed two techniques to construct VLECPCs with a given $d_{\text{free}}$ value. They are respectively based on a greedy algorithm (GA) and a majority vote algorithm (MVA). Specifically, he employs either the GA or MVA procedure to select as many codewords as possible of the same length, where the selected codewords must satisfy certain minimum distance conditions in order to reach the required $d_{\text{free}}$. Later, Lamy and Paccaut [23] replaced Buttigieg’s GA and MVA schemes with a new algorithm designed to obtain a good trade-off between system complexity and coding efficiency. In [30], Wang et al. improved the coding efficiency of VLECPCs by iteratively replacing longer codewords with shorter ones. In [26], Savari and Kliewer focused on minimizing the average codeword length of VLECPCs. In their design, each codeword is required to have Hamming weight $w$, where $w$ is a multiple of an integer greater than or equal to 2, resulting in a class of VLECPCs with $d_{\text{free}} \ge 2$. In [11, 13, 18], Diallo et al. proposed several algorithms for obtaining VLECPCs with maximal $d_{\text{free}}$ under the premise that all codeword lengths are known in advance. A similar approach was used in [12] for developing good error-correcting arithmetic codes.


VLECPCs and modified the Viterbi algorithm (VA) to realize a sequence MAP decoder, which is optimal in terms of minimizing the VLECPCs’ sequence error probability. Later in 2008, Huang et al. [19] proposed a trellis-based MAP priority-first search decoding algorithm for VLECPCs based on a suitable soft-decision MAP decoding criterion and empirically showed a significant complexity improvement over Buttigieg’s MAP decoder. MAP decoding techniques using an extended trellis under the assumption that the receiver knows both the number of transmitted bits and the number of transmitted codewords were developed in [4, 21]. Other decoding methods for variable-length codes (VLC) that use other trellis VLC representations include the sequence MAP decoder of [3] and iterative (Turbo-like) decoders of [4, 22].

In this dissertation, we present a novel priority-first search algorithm that can construct VLECPCs with minimal average codeword length and free distance no less than a pre-given $d^*_{\text{free}}$. We next investigate how to select, among all obtained optimal¹ VLECPCs, the one with the best error correction capability. We observe that the codes’ Levenshtein coefficient $B_{d_{\text{free}}}$ plays an important role in their error performance: choosing the optimal code with the smallest $B_{d_{\text{free}}}$ yields the best system error rate. Furthermore, we modify our construction algorithm to reduce its search complexity in order to accommodate large values of $d_{\text{free}}$ and large source alphabets such as the 26-symbol English data source. We also propose a low-complexity two-phase sequence MAP decoder that can be applied to all VLECPCs (including our constructed optimal and suboptimal codes) under the assumption that the receiver knows both the number of transmitted bits and the number of transmitted codewords. We show by simulations that the resulting suboptimal VLECPCs outperform most existing VLECPCs in the literature in terms of compression efficiency, search complexity and error rate. We also compare our JSCC codes with traditional SSCCs.

The rest of this dissertation is organized as follows. In Chapter 2, we formulate our problem and present some background material about VLECPCs. In Chapter 3, we describe our code construction which guarantees the development of optimal VLECPCs with a given free distance constraint. In Chapter 4, two VLECPC construction modifications are proposed respectively for the design of optimal codes with enhanced error correction capability and for the design of suboptimal VLECPCs for large $d_{\text{free}}$ and large source alphabet sizes. In Chapter 5, a low-complexity two-phase sequence MAP decoder is introduced. Simulation results illustrating the performance of the constructed optimal and suboptimal VLECPCs are given in Chapter 6. Finally, conclusions are stated in Chapter 7.

¹ We emphasize that, throughout the dissertation, an “optimal VLECPC” is defined as a VLECPC with minimal average codeword length. In other words, an optimal VLECPC is not guaranteed to yield the best error rate performance.

1.2 Contributions

The main contributions of this thesis are summarized as follows.

• The first algorithm that guarantees the construction of an optimal VLECPC (in the sense of minimizing the average codeword length) subject to a free distance constraint is proposed.

• The error correction capability of the constructed optimal VLECPC is enhanced by choosing the optimal VLECPC with minimum $B_{d_{\text{free}}}$.

• A simplified suboptimal construction algorithm is proposed. Its search complexity is superior to that of the state-of-the-art code construction algorithms in the literature, and it can accommodate large source alphabets such as the 26-symbol English text source.

• An efficient low-complexity sequence MAP decoder for a receiver knowing the number of transmitted codewords is also proposed.


Chapter 2 Problem Formulation and Preliminaries

We consider the JSCC problem of efficient compression of a discrete memoryless (independent and identically distributed) source and its reliable communication over a noisy channel via a single binary VLECPC. We assume a binary phase-shift keying (BPSK) modulated additive white Gaussian noise (AWGN) channel (although other channel models can also be considered) and employ optimal sequence MAP decoding in the sense of minimizing the code’s sequence error¹ probability. The VLECPC’s free distance $d_{\text{free}}$ has already been identified as a key error performance parameter, playing a similar role as for convolutional codes: the larger $d_{\text{free}}$ is, the better is the code’s error resilience, particularly at high signal-to-noise ratios (SNRs) [7, 9]. Our objectives are four-fold:

• Designing an algorithm that guarantees the construction of an optimal (i.e., with minimal average codeword length) binary VLECPC for a given free distance bound $d^*_{\text{free}}$.

• Enhancing the error correction capability of the constructed optimal VLECPCs by optimizing an important performance parameter $B_{d_{\text{free}}}$.

• Ensuring that the construction algorithms have a search complexity superior to the state-of-the-art code construction algorithms in the literature so that they can accommodate large source alphabets such as the 26-symbol English data source.

• Designing an efficient low-complexity sequence MAP decoder under the premise that the receiver knows the total number of transmitted VLECPC codewords (in addition to the total number of transmitted code bits).

¹ A sequence error occurs when a decoded sequence of VLECPCs is not exactly the same as the transmitted one.

The successful achievement of these objectives has interesting applications for the effective compression and error-resilient transmission of text documents over noisy channels.

In what follows, we present some preliminary background about VLECPCs. Consider a $K$-ary discrete memoryless source with alphabet $S \triangleq \{\alpha_1, \alpha_2, \ldots, \alpha_K\}$ and respective symbol probabilities $p_1, p_2, \ldots, p_K$ (such that $\sum_{i=1}^{K} p_i = 1$). A (first-order) VLECPC encoder maps each symbol $\alpha_i \in S$ to a binary variable-length codeword $c_i$, where $i = 1, 2, \ldots, K$. The set of codewords is denoted by $C = \{c_1, c_2, \ldots, c_K\}$ and the average codeword length for code $C$ is given by

$$\bar{C} \triangleq \sum_{i=1}^{K} p_i |c_i|, \tag{2.1}$$

where $|c_i|$ is the length of codeword $c_i$.
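As a quick numerical illustration of (2.1), the following minimal sketch evaluates the average codeword length for the three-codeword example code {00, 010, 0110} used later in this chapter. The probabilities (0.5, 0.3, 0.2) and the function name `average_codeword_length` are assumptions made here purely for illustration; they do not come from the dissertation.

```python
def average_codeword_length(code, probs):
    """Evaluate (2.1): the sum over i of p_i * |c_i|."""
    if len(code) != len(probs):
        raise ValueError("code and probs must have the same length")
    return sum(p * len(c) for c, p in zip(code, probs))


# Example code from Section 2.2; the probabilities are hypothetical.
code = ["00", "010", "0110"]
probs = [0.5, 0.3, 0.2]
avg = average_codeword_length(code, probs)  # 0.5*2 + 0.3*3 + 0.2*4
```

With these assumed probabilities the sum is 1.0 + 0.9 + 0.8, i.e. about 2.7 code bits per source symbol.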

2.1 Sequence MAP Decoding Criterion

Let

$$X_{L,N} \triangleq \left\{ x_1 x_2 x_3 \cdots x_L : \forall x_i \in C \text{ and } \sum_{i=1}^{L} |x_i| = N \right\} \tag{2.2}$$

be a set of bitstreams consisting of $L$ (concatenated) codewords with overall length $N$. Define

$$X_N \triangleq \bigcup_{i \ge 1} X_{i,N} \tag{2.3}$$

as a set of bitstreams consisting of some (concatenated) codewords with overall length $N$. Assume that a sequence of VLECPC codewords of overall length $N$ is transmitted over the binary-input AWGN channel and that $r \triangleq (r_1, r_2, \ldots, r_N)$ is received at the channel output. The sequence MAP (soft-decision) decoder then outputs $\hat{v} \triangleq (\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_N)$ if $\hat{v}$ satisfies [19]

$$\sum_{i=1}^{N} (y_i \oplus \hat{v}_i)\,\|\phi_i\|_1 - \ln \Pr(\hat{v}) \le \sum_{i=1}^{N} (y_i \oplus v_i)\,\|\phi_i\|_1 - \ln \Pr(v) \tag{2.4}$$

for all

$$v \in \begin{cases} X_N & \text{if the receiver only knows } N, \\ X_{L,N} & \text{if the receiver knows both } L \text{ and } N, \end{cases}$$

where $\oplus$ is modulo-2 addition, $\Pr(\cdot)$ denotes probability, $\|\cdot\|_1$ denotes absolute value, $\phi_i$ is a log-likelihood ratio given by

$$\phi_i \triangleq \ln \frac{\Pr(r_i \mid 0)}{\Pr(r_i \mid 1)} \tag{2.5}$$

and $y_i$ is the hard decision of $r_i$ given by

$$y_i \triangleq \begin{cases} 1 & \text{if } \phi_i < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{2.6}$$
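To make the criterion (2.4)–(2.6) concrete, the following sketch evaluates the MAP metric by brute force over $X_{L,N}$ for a very small code. It is only an illustration, not the decoder proposed in this dissertation: it assumes BPSK maps bit 0 to +1 and bit 1 to −1 with noise variance `sigma2` (so that the log-likelihood ratio reduces to $2 r_i / \sigma^2$), and the function names and the probabilities in the usage note are hypothetical.

```python
import math
from itertools import product


def llr_bpsk_awgn(r, sigma2):
    """LLR (2.5), assuming bit 0 -> +1, bit 1 -> -1, noise variance sigma2."""
    return [2.0 * ri / sigma2 for ri in r]


def map_metric(bits, phi, log_prior):
    """One side of (2.4): sum of (y_i XOR v_i)*|phi_i| minus ln Pr(v)."""
    penalty = 0.0
    for v, p in zip(bits, phi):
        y = 1 if p < 0 else 0          # hard decision (2.6)
        if y != v:
            penalty += abs(p)
    return penalty - log_prior


def brute_force_map(r, code, probs, L, sigma2):
    """Exhaustively minimise the metric over X_{L,N}, where N = len(r)."""
    phi = llr_bpsk_awgn(r, sigma2)
    best = None
    for idx in product(range(len(code)), repeat=L):
        stream = "".join(code[i] for i in idx)
        if len(stream) != len(r):      # not a member of X_{L,N}
            continue
        # Memoryless source: Pr(v) is the product of symbol probabilities.
        log_prior = sum(math.log(probs[i]) for i in idx)
        m = map_metric([int(b) for b in stream], phi, log_prior)
        if best is None or m < best[0]:
            best = (m, idx)
    return best                        # (metric, codeword indices) or None
```

For instance, with the code {00, 010, 0110}, assumed probabilities (0.5, 0.3, 0.2), L = 2 and the noiseless received vector (+1, +1, +1, −1, +1), the only members of $X_{2,5}$ are 00·010 and 010·00, and the first one attains the smaller metric. The exhaustive enumeration grows exponentially in L, which is why trellis-based decoders are used in practice.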

2.2 VLECPC Trellis Diagrams

In [7, 9], Buttigieg employed a VLECPC decoding trellis $T_N$ as exemplified in Figure 2.1(a) for $C = \{00, 010, 0110\}$, in which state $S_j$ denotes that the number of bits decoded thus far is $j$.

We can construct an extended trellis $T_{L,N}$ as defined in [4, 21] under the assumption that the receiver knows both $L$ and $N$. An example of such an extended trellis for $C = \{00, 010, 0110\}$ is shown in Figure 2.1(b), where $S_{i,j}$ denotes that the number of decoded symbols and the number of decoded bits thus far are $i$ and $j$, respectively.
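The state structure of the extended trellis can be sketched with a small dynamic program that counts how many codeword sequences reach state $S_{L,N}$ from $S_{0,0}$, which equals the size of $X_{L,N}$ when distinct sequences give distinct bitstreams (as they do for a prefix code). The function name `count_paths` is hypothetical, and the sketch is an illustration of the state indexing rather than a decoder.

```python
def count_paths(code, L, N):
    """Number of paths from S_{0,0} to S_{L,N} in the extended trellis.

    paths[i][j] is the number of ways to reach state S_{i,j}, i.e. to
    concatenate i codewords into exactly j bits.
    """
    paths = [[0] * (N + 1) for _ in range(L + 1)]
    paths[0][0] = 1
    for i in range(L):
        for j in range(N + 1):
            if paths[i][j] == 0:
                continue
            for c in code:             # one branch per codeword
                nxt = j + len(c)
                if nxt <= N:
                    paths[i + 1][nxt] += paths[i][j]
    return paths[L][N]
```

For $C = \{00, 010, 0110\}$, `count_paths(C, 2, 5)` counts the two sequences with lengths (2, 3) and (3, 2).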

2.3 Free Distance

In [7], in order to analyze the error performance of a trellis-based VLECPC decoder, Buttigieg defined the free distance as the minimal Hamming distance between any two distinct paths that converge at the same node in the trellis. Thus, the free distance $d_{\text{free}}$ of $C$ as defined in [7] depends on the structure of its decoding trellis diagram. For the computation of $d_{\text{free}}$, we will assume throughout the dissertation that the receiver knows both $L$ and $N$. Therefore, $d_{\text{free}}$ is defined based on $X_{L,N}$ and is given by

$$d_{\text{free}}(C) \triangleq \min\{ d(a, b) : a, b \in X_{L,N} \text{ for some } L, N \text{ and } a \ne b \}, \tag{2.7}$$

where $d(a, b)$ denotes the Hamming distance between bitstreams $a$ and $b$.

Figure 2.1: Trellis representations of a VLECPC: (a) trellis $T_N$; (b) trellis $T_{L,N}$. The red-color (solid), blue-color (dash-dot) and green-color (dotted) arrows correspond respectively to the transition of transmitting codewords $c_1$, $c_2$ and $c_3$.

The following lower bound on $d_{\text{free}}(C)$ has been shown in [7, 9]:

$$d_{\text{free}}(C) \ge \min\{ d_b(C),\ d_c(C) + d_d(C) \}, \tag{2.8}$$

where $d_b(C)$ is the “overall minimum block distance” defined as

$$d_b(C) \triangleq \min\{ d(c_i, c_j) : c_i, c_j \in C,\ c_i \ne c_j \text{ and } |c_i| = |c_j| \}, \tag{2.9}$$

$d_c(C)$ is the “minimum converge distance” given by

$$d_c(C) \triangleq \min\{ d(c_i, c'_j) : c_i, c_j \in C,\ |c_i| < |c_j|,\ c'_j \text{ is the suffix of } c_j \text{ and } |c'_j| = |c_i| \}, \tag{2.10}$$

and $d_d(C)$ is the “minimum diverge distance” defined as

$$d_d(C) \triangleq \min\{ d(c_i, c'_j) : c_i, c_j \in C,\ |c_i| < |c_j|,\ c'_j \text{ is the prefix of } c_j \text{ and } |c'_j| = |c_i| \}. \tag{2.11}$$
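The three distances and the resulting lower bound (2.8) are straightforward to compute directly from their definitions. The sketch below does so; the function names are hypothetical, and treating a minimum over an empty set as infinity (for example, $d_b$ when no two codewords share a length) is an assumption of this sketch rather than a convention stated in the text.

```python
INF = float("inf")


def hamming(a, b):
    """Hamming distance between two equal-length bitstrings."""
    return sum(x != y for x, y in zip(a, b))


def distance_parameters(code):
    """Return (d_b, d_c, d_d, lower bound of (2.8)) for a list of codewords."""
    db = dc = dd = INF                 # min over an empty set taken as infinity
    for ci in code:
        for cj in code:
            if ci == cj:
                continue
            if len(ci) == len(cj):
                db = min(db, hamming(ci, cj))              # (2.9)
            elif len(ci) < len(cj):
                n = len(ci)
                dc = min(dc, hamming(ci, cj[-n:]))         # suffix, (2.10)
                dd = min(dd, hamming(ci, cj[:n]))          # prefix, (2.11)
    return db, dc, dd, min(db, dc + dd)
```

For the example code $C = \{00, 010, 0110\}$, no two codewords have equal length, $d_c = d_d = 1$, and the bound evaluates to 2.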


Chapter 3 Optimal VLECPC Construction

We herein present a new search algorithm for constructing an optimal VLECPC with a given free-distance bound $d^*_{\text{free}}$. The search algorithm always outputs an optimal VLECPC with its $d_{\text{free}} \ge d^*_{\text{free}}$. This algorithm, which is a modification and extension of the algorithm introduced in [20] for finding optimal lossless data compression codes with reversible VLC structure, uses a new search tree and a priority-first search method.

To construct an optimal VLECPC with $K$ codewords and $d_{\text{free}} \ge d^*_{\text{free}}$, we use a search tree in which each node $X$ contains three components given by the triplet $\{C_X, A_X, f(X)\}$. Here, $C_X = \{c^X_1, c^X_2, \ldots, c^X_t\}$ denotes the set of $t$ codewords that have been selected for the desired VLECPC, and $A_X = \{a^X_1, a^X_2, \ldots\}$ is the set of all bitstreams that can be future candidate codewords; hence it does not contain any bitstream that has a codeword currently in $C_X$ as a prefix. These bitstreams are listed in order of nondecreasing lengths: $|a^X_1| \le |a^X_2| \le \cdots$.¹ Finally, $f(X)$ denotes the metric employed for finding an optimal VLECPC and is given by

$$f(X) \triangleq \sum_{i=1}^{t} p_i \cdot |c^X_i| + \sum_{i=t+1}^{K} p_i \cdot |a^X_{i-t}|. \tag{3.1}$$

The search tree is binary (i.e., each of its nodes except a leaf or terminal node has two children); the relation between a parent node and its children is illustrated in Figure 3.1. Specifically, for a parent node $P$, its left child $L$ is obtained by adding the next candidate codeword $a^P_1$ into $C_L$. Since $a^P_1$ is now a codeword in $C_L$, the set $A_L$ needs to be updated by removing all bitstreams in $A_P$ whose prefix is $a^P_1$. Hence, the triplet of the left child $L$ becomes

$$C_L = C_P \cup \{a^P_1\} \tag{3.2}$$

$$A_L = \{a^L_1, a^L_2, \ldots\} = \{a : a \in A_P \text{ and } a^P_1 \text{ is not a prefix of } a\} \tag{3.3}$$

$$f(L) = \sum_{i=1}^{t} p_i \cdot |c^P_i| + p_{t+1} \cdot |a^P_1| + \sum_{i=t+2}^{K} p_i \cdot |a^L_{i-t-1}|. \tag{3.4}$$

On the other hand, the right child $R$ is obtained by rejecting the next candidate codeword $a^P_1$ from its parent node. So, the triplet of the right child $R$ becomes

$$C_R = C_P \tag{3.5}$$

$$A_R = \{a^P_2, a^P_3, \ldots\} = A_P \setminus \{a^P_1\} \tag{3.6}$$

$$f(R) = \sum_{i=1}^{t} p_i \cdot |c^P_i| + \sum_{i=t+1}^{K} p_i \cdot |a^P_{i-t+1}|. \tag{3.7}$$

Finally, since the root node has not yet selected any codeword, all bitstreams are its candidates; thus its components are given by

$$C_{\text{root}} = \emptyset \tag{3.8}$$

$$A_{\text{root}} = \{a^{\text{root}}_1, a^{\text{root}}_2, \ldots\} = \{0, 1, 00, 01, 10, 11, 000, 001, \ldots\} \tag{3.9}$$

$$f(\text{root}) = \sum_{i=1}^{K} p_i \cdot |a^{\text{root}}_i|. \tag{3.10}$$

Figure 3.1: Relation between a parent node and its children in a search tree.

¹ Recall that candidate codewords of equal length can be listed in any order without affecting the …

Since every possible VLECPC can be obtained by traversing the search tree from the root node to its corresponding leaf node, a priority-first search algorithm can be applied on the tree to find a VLECPC whose average codeword length is smallest among all VLECPCs with free distances no less than $d^*_{\text{free}}$. To reduce the search space, the average codeword length of any known VLECPC with free distance no less than $d^*_{\text{free}}$ is denoted by $U_b$ and used as an upper bound to eliminate uncompetitive nodes during the search process. The search algorithm for finding an optimal VLECPC is described as follows.

Step 1: Push the root node into the Encoding Stack.² Set upper bound $U_b$ as the average codeword length of an existing VLECPC with free distance no less than $d^*_{\text{free}}$.

Step 2: If the top node of the Encoding Stack has selected $K$ codewords (i.e., $|C_{\text{top}}| = K$) and $d_{\text{free}}(C_{\text{top}}) \ge d^*_{\text{free}}$, then output $C_{\text{top}}$ as the optimal VLECPC and stop the algorithm.

Step 3: Generate the two children of the top node as in Figure 3.1 and then delete the top node from the Encoding Stack. If the left child has selected $K$ codewords with its free distance $\ge d^*_{\text{free}}$ and its associated metric $f$ is smaller than $U_b$, then update $U_b = f$.

Step 4: Discard a child node which satisfies any of the following conditions:

1. It has selected more than $K$ codewords for its $C_{\text{child}}$;

2. There is no more candidate in $A_{\text{child}}$ and the size of $C_{\text{child}}$ is less than $K$ (i.e., $A_{\text{child}} = \emptyset$ and $|C_{\text{child}}| < K$);

3. The metric $f(\text{child})$ is larger than $U_b$;

4. Its associated free distance $d_{\text{free}}(C_{\text{child}})$ is less than $d^*_{\text{free}}$.³

Step 5: Insert the remaining children (those children which are not discarded in Step 4) into the Encoding Stack, and reorder the Encoding Stack in order of ascending metrics. Go to Step 2.

² The Encoding Stack can be implemented via the data structure named HEAP [10]. One important property of the HEAP structure is that it can access the node with the minimal metric (i.e., the top node in the Encoding Stack) within $O(\log n)$ complexity, where $n$ denotes the number of nodes in the HEAP.

³ In order to check this condition efficiently, the lower bound on the free distance given in (2.8) is first computed; if it is less than $d^*_{\text{free}}$, then Dijkstra’s algorithm [12] is adopted to determine the exact free distance. This is realized by transforming the finite-state VLECPC encoder into a pairwise distance graph and applying Dijkstra’s algorithm to find the graph’s shortest path, where the resulting shortest path yields the VLECPC’s free distance. To our knowledge, Dijkstra’s algorithm is the most efficient method to evaluate $d_{\text{free}}$.
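The five steps above can be sketched in a few dozen lines. The following is an illustrative simplification, not the algorithm as implemented in this dissertation: it replaces the exact free distance test with the lower bound (2.8) (so a code whose true free distance meets the target but whose bound does not is rejected), it assumes the probabilities are sorted in nonincreasing order, it represents the infinite candidate set $A_X$ lazily by a pointer into the enumeration 0, 1, 00, 01, …, and all function names are hypothetical.

```python
import heapq
from itertools import count

INF = float("inf")


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def lower_bound_dfree(code):
    """min{d_b, d_c + d_d} as in (2.8); empty minima are taken as infinity."""
    db = dc = dd = INF
    for ci in code:
        for cj in code:
            if ci == cj:
                continue
            if len(ci) == len(cj):
                db = min(db, hamming(ci, cj))
            elif len(ci) < len(cj):
                n = len(ci)
                dc = min(dc, hamming(ci, cj[-n:]))
                dd = min(dd, hamming(ci, cj[:n]))
    return min(db, dc + dd)


def bitstring(k):
    """k-th bitstream in the order 0, 1, 00, 01, 10, 11, 000, ..."""
    return bin(k + 2)[3:]


def is_complete(code):
    """True if the prefix-free set meets Kraft's sum with equality (A empty)."""
    if not code:
        return False
    m = max(len(c) for c in code)
    return sum(2 ** (m - len(c)) for c in code) == 2 ** m


def candidates(code, ptr, m):
    """First m (index, bitstream) pairs of A_X for the node (code, ptr)."""
    out, k = [], ptr
    while len(out) < m:
        s = bitstring(k)
        if not any(s.startswith(c) for c in code):
            out.append((k, s))
        k += 1
    return out


def metric(code, ptr, probs):
    """Metric f(X) of (3.1)."""
    cand = candidates(code, ptr, len(probs) - len(code))
    lengths = [len(c) for c in code] + [len(s) for _, s in cand]
    return sum(p * n for p, n in zip(probs, lengths))


def search(probs, d_star, upper_bound=INF):
    """Priority-first search; returns (code, average length) or None."""
    K, tie = len(probs), count()
    heap = [(metric((), 0, probs), next(tie), (), 0)]      # Step 1
    while heap:
        f, _, code, ptr = heapq.heappop(heap)
        if len(code) == K:                                 # Step 2
            return list(code), f
        k, a1 = candidates(code, ptr, 1)[0]
        for child in (code + (a1,), code):                 # Step 3: left, right
            if len(child) < K and is_complete(child):      # Step 4, condition 2
                continue
            if lower_bound_dfree(child) < d_star:          # condition 4 (bound)
                continue
            fc = metric(child, k + 1, probs)
            if fc > upper_bound:                           # condition 3
                continue
            if len(child) == K:
                upper_bound = min(upper_bound, fc)
            heapq.heappush(heap, (fc, next(tie), child, k + 1))  # Step 5
    return None
```

For example, `search([0.5, 0.3, 0.2], 1)` returns a code with average length 1.5, matching the Huffman length for these assumed probabilities, while `search([0.5, 0.3, 0.2], 2)` returns a code with codeword lengths 1, 2 and 3. Python's `heapq` plays the role of the HEAP mentioned in the footnote; the counter `tie` only breaks ties between equal metrics.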


It should be emphasized that the above construction algorithm focuses only on prefix-free VLECPCs as most previous works did [7, 9, 11, 12, 23, 26, 30]. Although non-prefix-free but uniquely decodable VLECPCs can also be constructed, they are not herein considered due to the added complexity in testing their unique decodability. The proof of the optimality of the above algorithm is provided in Appendix A.


Chapter 4 Modified VLECPC Constructions

In this chapter, two modifications of the optimal VLECPC construction algorithm introduced in Chapter 3 are proposed. The first modification further enhances the error-correcting capability of the found optimal VLECPC by examining the union bound coefficient $B_{d_{\text{free}}}$ of all equivalent optimal VLECPCs satisfying the free distance constraint and then outputting the one with the smallest $B_{d_{\text{free}}}$, where $B_{d_{\text{free}}}$ is a Levenshtein parameter defined in Section 4.1 below. By targeting a suboptimal VLECPC instead of an optimal one, the second modification considerably reduces the search complexity of the optimal construction algorithm in order to make feasible the construction of VLECPCs for larger alphabet sizes (such as the 26-symbol English data source) along with a large $d^*_{\text{free}}$ (such as $d^*_{\text{free}} = 10$).

4.1 Finding an optimal VLECPC with the smallest $B_{d_{\text{free}}}$

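In [7, 9], Buttigieg found that under hard-decision ML decoding, the symbol error probability $P_e(C)$ of a VLECPC $C$ transmitted over the BSC with crossover probability $\epsilon$ can be upper-bounded by

$$P_e(C) \le \sum_{h=d_{\text{free}}(C)}^{\infty} \tilde{B}_h P_h, \tag{4.1}$$

where

$$\tilde{B}_h \triangleq \sum_{N=1}^{\infty} \sum_{a \in X_N} \Pr(a) \cdot \left( \sum_{b \,:\, b \in X_N \text{ and } d(a,b)=h} L(a, b) \right) \tag{4.2}$$

and

$$P_h \triangleq \begin{cases} \displaystyle\sum_{e=(h+1)/2}^{h} \binom{h}{e} \epsilon^{e} (1-\epsilon)^{h-e} & \text{if } h \text{ is odd,} \\[2ex] \displaystyle\frac{1}{2} \binom{h}{h/2} \epsilon^{h/2} (1-\epsilon)^{h/2} + \sum_{e=\frac{h}{2}+1}^{h} \binom{h}{e} \epsilon^{e} (1-\epsilon)^{h-e} & \text{if } h \text{ is even.} \end{cases} \tag{4.3}$$

Note that in Buttigieg’s derivation, the symbol errors are counted using the Levenshtein distance² $L(\cdot, \cdot)$ between the transmitted sequence and the decoded sequence, and the receiver decodes based on trellis $T_N$ with $N$ extending to infinity.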
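The pairwise error term $P_h$ in (4.3) is simple to evaluate numerically. The following sketch computes it for a given Hamming distance $h$ and crossover probability; the function name `pairwise_error_prob` is hypothetical.

```python
from math import comb


def pairwise_error_prob(h, eps):
    """P_h of (4.3) for Hamming distance h and BSC crossover probability eps."""
    def term(e):
        return comb(h, e) * eps ** e * (1.0 - eps) ** (h - e)

    if h % 2 == 1:
        # Odd h: an error occurs when more than half of the h bits flip.
        return sum(term(e) for e in range((h + 1) // 2, h + 1))
    # Even h: exactly h/2 flips is a tie, counted with weight 1/2.
    half = h // 2
    return 0.5 * term(half) + sum(term(e) for e in range(half + 1, h + 1))
```

For example, with a crossover probability of 0.1, both $P_1$ and $P_2$ evaluate to 0.1, and $P_3 = 3(0.1)^2(0.9) + (0.1)^3 = 0.028$, which illustrates why odd values of the free distance are the ones that reduce the dominant term.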

With a slight modification, a similar bound can be derived under the additional assumption that the receiver also knows the number of transmitted codewords $L$. In particular, (4.1) remains of the same form with $\tilde{B}_h$ replaced with $B_h$, where

$$B_h \triangleq \sum_{L=1}^{\infty} \sum_{N=1}^{\infty} \sum_{a \in X_{L,N}} \Pr(a) \cdot \left( \sum_{b \,:\, b \in X_{L,N} \text{ and } d(a,b)=h} L(a, b) \right). \tag{4.4}$$

The coefficient $B_h$, as expressed in (4.4), can be regarded as the average Levenshtein distance between all converging path pairs that are at a Hamming distance $h$ from each other in the extended trellis $T_{L,N}$. Thus, it is evident that $B_h$ plays a key role in the union bound (4.1), particularly the first term $B_{d_{\text{free}}} \triangleq B_{h_{\min}}$, where $h_{\min}$ is the smallest integer $h$ no less than $d_{\text{free}}(C)$ such that $B_h$ is positive. Accordingly, given a set of optimal VLECPCs, the one with the smallest $B_{d_{\text{free}}}$ is expected to have a better error performance.

It should be mentioned that in this dissertation we use a soft-decision MAP decoder with respect to the AWGN channel, which is not the setting of the BSC union bound in (4.1)–(4.4). This bound, however, provides a much simplified view of the system performance, and hence the parameters $d_{\text{free}}(C)$ and $B_{d_{\text{free}}}$ obtained from (4.1) are adopted in our code design.
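The Levenshtein distance $L(\cdot, \cdot)$ that weights each path pair in (4.2) and (4.4) can be computed with the standard dynamic program over insertions, deletions and substitutions. The sketch below is a generic textbook implementation, offered for illustration; it works on any pair of sequences (for instance, lists of decoded source symbols), and the function name is hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))     # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                     # deleting the first i symbols of a
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,           # delete x
                curr[j - 1] + 1,       # insert y
                prev[j - 1] + (x != y) # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]
```

For example, the symbol sequences (α1, α2, α3) and (α1, α3) are at Levenshtein distance 1, since a single deletion suffices, even though the corresponding bitstreams may differ in several positions.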

² The Levenshtein distance, also called edit distance, between two sequences is the minimum number of character edits (including insertion, deletion and substitution) required to change one sequence into the other.

We now modify the algorithm in Chapter 3 to find the optimal VLECPC with the smallest $B_{d_{\text{free}}}$ among all optimal VLECPCs that have the minimum average codeword length. This can be achieved by continuing the algorithm, even if the top node of the Encoding Stack reaches the leaf node in Figure 3.1 (see Step 2 in Chapter 3), until the average codeword length of the new top node is greater than that of the optimal VLECPC. This continuation then guarantees that all optimal VLECPCs (of equal average codeword length) are examined and the one with the smallest $B_{d_{\text{free}}}$ can be selected. As a result, only the first two steps need to be modified:

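Step 1′: Push the root node into the Encoding Stack. Set upper bound $U_b$ as the average codeword length of an existing VLECPC with free distance $d^*_{\text{free}}$, and initialize $B^*_{d_{\text{free}}} = \infty$.

Step 2′: If the metric f (namely, the average codeword length) of the top node is strictly greater than $U_b$, then output $\mathcal{C}^*$ and stop the algorithm; else if the top node of the Encoding Stack has selected K codewords (i.e., $|\mathcal{C}_{\text{top}}| = K$), and $d_{\text{free}}(\mathcal{C}_{\text{top}}) \ge d^*_{\text{free}}$, and $B_{d_{\text{free}}}(\mathcal{C}_{\text{top}}) < B^*_{d_{\text{free}}}$, then retain $\mathcal{C}^* = \mathcal{C}_{\text{top}}$ and $B^*_{d_{\text{free}}} = B_{d_{\text{free}}}(\mathcal{C}_{\text{top}})$. Delete the top node and reorder the Encoding Stack in order of ascending metrics.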
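The selection rule of Step 2′ can be illustrated with a small sketch. It is not the full search: the expansion of incomplete nodes (the remaining steps of Chapter 3) is omitted, every node is a hypothetical tuple `(f, size, dfree, B, label)`, and the line that tightens `Ub` when a complete code is popped is an assumed stand-in for the upper-bound update of Chapter 3, which is not shown in this section. The two $B_{d_{\text{free}}}$ values are those quoted in the legend of Figure 6.5; the other entries are made up.

```python
import heapq


def select_smallest_B(nodes, K, d_target, Ub):
    """Keep popping nodes in ascending metric order until the metric exceeds Ub,
    retaining the complete code with the smallest B among equal-length optima."""
    stack = list(nodes)
    heapq.heapify(stack)                  # top node = smallest metric f
    best, best_B = None, float("inf")
    while stack:
        f, size, dfree, B, label = heapq.heappop(stack)
        if f > Ub:                        # Step 2': metric strictly greater than Ub
            break
        if size == K and dfree >= d_target:
            Ub = min(Ub, f)               # assumed stand-in for Chapter 3's Ub update
            if B < best_B:                # retain the code with the smaller B
                best, best_B = label, B
        # A full implementation would expand incomplete nodes here.
    return best, best_B


# Hypothetical stack contents: (metric f, |C|, dfree, B, label).
nodes = [
    (7.24, 8, 7, 1.8268, "optimal code A"),
    (7.24, 8, 7, 0.0164, "optimal code B"),
    (7.30, 8, 7, 0.0010, "longer code"),
    (7.10, 5, 7, 0.0, "incomplete node"),
]
print(select_smallest_B(nodes, K=8, d_target=7, Ub=7.864))
```

The longer code is never examined even though its B is smaller, because the search stops as soon as the top metric exceeds the optimal average codeword length.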

4.2 Suboptimal code construction with parameters (∆, Γ, D, I)

The complexity and memory demand of the optimal code construction algorithm in Chapter 3 grow significantly when searching for VLECPCs corresponding to a large source alphabet size K and a large free distance requirement $d^*_{\text{free}}$. We herein alleviate the algorithm’s complexity and memory demand by constructing a suboptimal VLECPC, which can accommodate higher free distance targets and larger source alphabet sizes. This is done based on four complexity reduction procedures.

First, we reduce the computational complexity incurred in examining the exact free distance of the top node by using its lower bound in (2.8) instead. Furthermore, Buttigieg recently observed [8] that good codes usually have converging and diverging distances that are equal (for even values of $d_{\text{free}}$) or differ by one (for odd values of $d_{\text{free}}$). Thus, we only focus on VLECPCs with the above property. In other words, the new suboptimal code construction only searches for the VLECPC $\mathcal{C}$ that satisfies the following conditions:

\[
\begin{cases}
\min\{d_b(\mathcal{C}),\; d_c(\mathcal{C}) + d_d(\mathcal{C})\} \ge d^*_{\text{free}}, \text{ and} \\
|d_c(\mathcal{C}) - d_d(\mathcal{C})| \le 1.
\end{cases}
\tag{4.5}
\]

With this modification, the actual free distance of the output VLECPC may be strictly larger than the required $d^*_{\text{free}}$; yet, this saves considerable computational effort in calculating the exact free distance for each node visited during the code search process.

Second, we adopt the early-elimination concept from [28], in which an efficient near-optimal sequential decoding algorithm for convolutional codes was proposed. In short, the authors in [28] propose to directly remove those nodes that are far behind the farthest node explored so far during the search process. Since the metric used in our code construction algorithm is also nondecreasing along every path in the trellis as in [28], these “far-behind” nodes are highly unlikely to result in a K-codeword offspring node whose average codeword length is small, and hence can be early-eliminated.
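Condition (4.5) can be checked directly from the codeword set, which is what makes it cheaper than computing the exact free distance. The sketch below assumes the usual definitions from Chapter 2, which are not restated in this section: $d_b$ is the minimum Hamming distance between distinct codewords of equal length, and $d_d$ ($d_c$) is the minimum Hamming distance between a shorter codeword and the equally long prefix (suffix) of a longer one. The function names and the three-codeword toy code are illustrative only.

```python
from itertools import combinations


def hamming(u, v):
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(u, v))


def distances(code):
    """Return (d_b, d_d, d_c) under the definitions assumed in the lead-in."""
    db = dd = dc = float("inf")
    for u, v in combinations(code, 2):
        if len(u) == len(v):
            db = min(db, hamming(u, v))                      # block distance
        else:
            short, long_ = sorted((u, v), key=len)
            dd = min(dd, hamming(short, long_[:len(short)]))   # diverging: prefix
            dc = min(dc, hamming(short, long_[-len(short):]))  # converging: suffix
    return db, dd, dc


def satisfies_4_5(code, d_target):
    """Check both parts of condition (4.5) for a candidate code."""
    db, dd, dc = distances(code)
    balanced = dd == dc or abs(dc - dd) <= 1   # equality test covers inf == inf
    return min(db, dc + dd) >= d_target and balanced


toy_code = ["000", "11011", "0110100"]   # hypothetical prefix code
print(distances(toy_code))               # (inf, 2, 1)
print(satisfies_4_5(toy_code, 3), satisfies_4_5(toy_code, 4))  # True False
```

For the toy code, $d_c + d_d = 3$ and $|d_c - d_d| = 1$, so the condition holds for a target of 3 and fails for a target of 4.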

The third modification, also borrowed from [28], is to set a proper Encoding Stack size limitation in order to fix the memory demand and indirectly to reduce the search complexity.

In the last modification, we attempt to compensate for potential losses in coding efficiency (average codeword length) caused by the previous three modifications. Recall that the average codeword length of any existing VLECPC can be used as the upper bound $U_b$ in our search algorithm. Hence, when our suboptimal approach results in a VLECPC whose average codeword length is smaller than the given $U_b$, we can update the value of $U_b$ with this average codeword length and launch a new execution of our algorithm. This step can then be repeated in a number of iterations until no improvements in coding efficiency are realized or a prescribed maximal number of iterations is reached.

Four parameters (∆, Γ, D, I) are accordingly added corresponding to the last three modifications.

1: Early-elimination window ∆: The top node is directly eliminated if its number of codewords $|\mathcal{C}_{\text{top}}|$ is less than $l_{\max} - ∆$, where $l_{\max}$ is the largest $|\mathcal{C}|$ among all expanded nodes.

2: Encoding Stack size Γ: When the number of nodes in the Encoding Stack is larger than Γ, nodes are recursively deleted from the Encoding Stack according to one of the two criteria described below.

1. Deletion criterion D = $D_l$: Delete the node with the smallest code size $|\mathcal{C}|$.

2. Deletion criterion D = $D_m$: Delete the node with the largest metric f.

3: The maximal number of iterations I.
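The early-elimination test and the two deletion criteria can be sketched as follows. This is an illustration only: each node is reduced to a hypothetical pair `(f, size)` holding its metric and its number of selected codewords, and the criterion labels `"Dl"` and `"Dm"` are plain strings chosen for this example.

```python
def early_eliminate(top_size, l_max, delta):
    """True if the top node lags the farthest expanded node by more than delta."""
    return top_size < l_max - delta


def trim_stack(stack, gamma, criterion):
    """Delete nodes one at a time until at most gamma remain, then reorder."""
    stack = list(stack)
    while len(stack) > gamma:
        if criterion == "Dl":      # delete the node with the smallest code size
            victim = min(stack, key=lambda node: node[1])
        elif criterion == "Dm":    # delete the node with the largest metric f
            victim = max(stack, key=lambda node: node[0])
        else:
            raise ValueError("criterion must be 'Dl' or 'Dm'")
        stack.remove(victim)
    return sorted(stack)           # ascending metrics, as in Step 5''


# Hypothetical stack of (metric f, number of selected codewords).
stack = [(3.1, 2), (3.4, 5), (2.9, 1), (3.8, 4)]
print(trim_stack(stack, gamma=2, criterion="Dl"))  # keeps the two largest codes
print(trim_stack(stack, gamma=2, criterion="Dm"))  # keeps the two smallest metrics
print(early_eliminate(top_size=3, l_max=10, delta=5))
```

The two criteria trade off differently: $D_l$ favors nodes that are already close to a complete K-codeword code, whereas $D_m$ favors nodes that currently have a short average codeword length.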

The suboptimal algorithm, characterized by four parameters (∆, Γ, D, I), can thus be obtained by modifying the optimal algorithm in Chapter 3 and adding a new Step 6 as follows:

Step 1′′: Push the root node into the Encoding Stack. Set upper bound $U_b$ as the average codeword length of an existing VLECPC with free distance $d^*_{\text{free}}$. Alternatively, for a follow-up iteration, set upper bound $U_b$ as the average codeword length of the output VLECPC obtained from the previous iteration. Initialize the target VLECPC $\mathcal{C}^*$ as the empty set and $l_{\max} = 0$.

Step 2′′: If the Encoding Stack is empty and $\mathcal{C}^* \ne \emptyset$, then output $\mathcal{C}^*$ as the optimal VLECPC and stop the algorithm; else if both the Encoding Stack and $\mathcal{C}^*$ are empty, then report a code search failure and stop the algorithm.4

If $|\mathcal{C}_{\text{top}}| < l_{\max} - ∆$, then directly delete the top node from the Encoding Stack and redo Step 2′′; else if $l_{\max} < |\mathcal{C}_{\text{top}}|$, update $l_{\max} = |\mathcal{C}_{\text{top}}|$.

If the top node of the Encoding Stack has selected K codewords (i.e., $|\mathcal{C}_{\text{top}}| = K$) and $\mathcal{C}_{\text{top}}$ satisfies condition (4.5), then output $\mathcal{C}_{\text{top}}$ as the optimal VLECPC and stop the algorithm.

4 Even if $U_b$ is the average codeword length of an existing VLECPC, the search space could be forced to become empty due to extra node exclusions of the first three complexity reduction modifications, i.e., requiring the free distance lower bound to be no less than $d^*_{\text{free}}$, early eliminations, and node deletions for a fully filled Encoding Stack. Note that when a node is excluded, all of its offspring nodes can no longer be visited; hence, it is possible that all the valid nodes (i.e., all the valid VLECPCs) are removed after several recursions of Steps 2′′–5′′. Since, in the two earlier optimal code construction algorithms, the nodes corresponding to optimal VLECPCs will never be excluded, the Encoding Stack can never be empty prior to finding the optimal VLECPC.

Step 3′′: Generate the two children of the top node as in Figure 3.1 and then delete the top node from the Encoding Stack. Then update $U_b$ as the metric f of the left child and set $\mathcal{C}^*$ to the left child if the left child satisfies all of the following conditions:

1. The left child has selected K codewords in its $\mathcal{C}_{\text{left}}$;

2. $\mathcal{C}_{\text{left}}$ satisfies condition (4.5);

3. Its associated metric f is smaller than $U_b$.

Step 4′′: Discard any child node that satisfies any of the following conditions:

1. It has selected more than K codewords for its $\mathcal{C}_{\text{child}}$;

2. There is no more candidate in $\mathcal{A}_{\text{child}}$ and the size of $\mathcal{C}_{\text{child}}$ is less than K (i.e., $\mathcal{A}_{\text{child}} = \emptyset$ and $|\mathcal{C}_{\text{child}}| < K$);

3. The metric f(child) is larger than $U_b$;

4. It violates condition (4.5).

Step 5′′: After inserting the remaining children into the Encoding Stack, recursively delete nodes from the Encoding Stack based on the chosen deletion criterion D until the Encoding Stack size is no greater than Γ. Reorder the Encoding Stack in order of ascending metrics. Go to Step 2′′.

Step 6: Repeat Steps 1′′–5′′ until either the maximum number of iterations I is reached or the upper bound $U_b$ remains the same as in the previous iteration.
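The outer loop of Step 6 can be sketched independently of the tree search itself. In the sketch below, `search_once` is a hypothetical callable standing in for one pass of Steps 1′′–5′′: it takes the current upper bound and returns either `None` (search failure) or a pair `(code, average_length)`. The stub used for the demonstration simply replays made-up average lengths and is not part of the thesis.

```python
def iterate_search(search_once, initial_ub, max_iters):
    """Re-run the search with a tightened upper bound (Step 6)."""
    ub, best = initial_ub, None
    for _ in range(max_iters):
        result = search_once(ub)      # one pass of Steps 1''-5'' (assumed interface)
        if result is None:            # search failure: Ub cannot improve further
            break
        code, avg_len = result
        if avg_len >= ub:             # Ub unchanged from the previous iteration
            break
        ub, best = avg_len, code      # tighten the bound and try again
    return best, ub


def make_stub(lengths):
    """Hypothetical search that yields successively shorter codes, then fails."""
    remaining = iter(lengths)

    def search_once(ub):
        for avg_len in remaining:
            if avg_len < ub:
                return ("code with AL %.2f" % avg_len, avg_len)
        return None

    return search_once


print(iterate_search(make_stub([6.25, 6.20]), initial_ub=6.2666, max_iters=5))
print(iterate_search(make_stub([6.25, 6.20]), initial_ub=6.2666, max_iters=1))
```

A tighter $U_b$ lets Step 4′′ discard more children early, which is why a second pass can reach codes that the first pass lost to early elimination or stack trimming.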

We end this chapter with a remark about the free distances of the VLECPCs found by the three code construction algorithms introduced in this dissertation.

Recall that the two optimal code construction algorithms, respectively introduced in Chapter 3 and Section 4.1, are guaranteed to output the VLECPC whose average codeword length is smallest among all VLECPCs with free distance no smaller than the target free distance. In all cases we have examined, however, the free distance of the resulting optimal VLECPCs is always equal to the target free distance; although we conjecture that this observation holds in general, we could not confirm it with a formal proof.

As expected, the suboptimal code construction algorithm may produce a (suboptimal) VLECPC with free distance strictly larger than $d^*_{\text{free}}$. However, in the particular case of the 26-symbol English alphabet (as will be presented in Chapter 6), the suboptimal code construction algorithm also consistently delivers a (suboptimal) VLECPC with free distance equal to $d^*_{\text{free}}$, which indicates that the free distance lower bound in (2.8) is indeed tight for the found suboptimal VLECPC. It should be mentioned that the tightness of (2.8) depends on the distribution of the source. In [13] and [18], it is shown that the bound (2.8) may be loose when the source distribution is highly unbalanced. Details will be given in Chapter 6.


Chapter 5 Two-Phase Sequence MAP (TP-SMAP) Decoding

In [19], an efficient sequence MAP decoder with the assumption that the receiver knows only the number of transmitted bits N was proposed. This decoder therefore can only operate on the traditional trellis $T_N$ shown in Figure 2.1(a). With the additional information about the number of transmitted symbols L, we herein propose a new two-phase sequence MAP (TP-SMAP) decoder, which can now operate on the extended trellis $T_{L,N}$ (cf. Figure 2.1(b)), and whose average decoding complexity is only slightly greater than that for running the Viterbi algorithm (VA) on $T_N$ (even though $T_{L,N}$ has significantly more nodes and more transitions than $T_N$). We next describe the TP-SMAP decoding scheme.

In trellis $T_{L,N}$, as defined in Section 2.2 and illustrated in Figure 2.1(b), a path traversing from $S_{0,0}$ to $S_{i,j}$ can be labeled as $x_{(0,0)}^{(i,j)} \triangleq x_1 x_2 \cdots x_i \in \mathcal{X}_{i,j}$, where each $x_i \in \mathcal{C}$. Then, by following the MAP decoding criterion described in Section 2.1, the path metric of $x_{(0,0)}^{(i,j)}$ is defined as

\[
g\bigl(x_{(0,0)}^{(i,j)}\bigr) = \sum_{\ell=1}^{j} (y_\ell \oplus b_\ell)\,\|\phi_\ell\|_1 - \ln \Pr\bigl(x_{(0,0)}^{(i,j)}\bigr), \tag{5.1}
\]

where $b_1 b_2 \cdots b_j$ denotes the binary representation of path $x_{(0,0)}^{(i,j)}$. Based on this new notation, the objective of the MAP decoder that knows both L and N is to find a path whose metric is the smallest among all valid paths $x_{(0,0)}^{(L,N)}$ from $S_{0,0}$ to $S_{L,N}$.

In short, the TP-SMAP scheme first performs backward VA on $T_N$, whose size is significantly smaller than that of $T_{L,N}$, and preserves the metric of each backward survivor path.

Step 1: Associate a zero path metric to node $S_N$ in $T_N$, i.e., $h(S_N) = 0$.

Step 2: Apply the backward VA with path metric given by (5.1) starting from $S_N$ in $T_N$, and record the metric and survivor path for each state as $h(S_i)$ and $p(S_i)$, respectively.

Step 3: If the number of codewords corresponding to survivor path $p(S_0)$ is equal to L, then output path $p(S_0)$ as the MAP decision and stop the algorithm; otherwise, go to phase 2.

In the second phase, the TP-SMAP applies a priority-first search algorithm [15] on $T_{L,N}$ with the decoding metric of path $x_{(0,0)}^{(i,j)}$ being re-defined as

\[
m\bigl(x_{(0,0)}^{(i,j)}\bigr) = g\bigl(x_{(0,0)}^{(i,j)}\bigr) + h(S_j). \tag{5.2}
\]

The second phase of the decoder is next described.

Step 1: Initialize the path metric of $x_{(0,0)}^{(0,0)}$ as $m\bigl(x_{(0,0)}^{(0,0)}\bigr) = h(S_0)$, and load it into the Decoding Stack.1

Step 2: If the top node of the Decoding Stack reaches the final state $S_{L,N}$ in $T_{L,N}$, then output its associated path as the MAP decision and stop the algorithm.

Step 3: Mark the state of the top node as visited. Then extend the top node to all its successors and compute their metrics according to (5.2). Delete the top node from the Decoding Stack.

Step 4: Discard the successors if they had been marked as visited. Also, discard the successors for which the number of decoded symbols exceeds L or the number of decoded bits exceeds N.

Step 5: Insert the remaining successors (those successors which are not discarded in Step 4) into the Decoding Stack and reorder the Decoding Stack in order of ascending m-metrics defined in (5.2). Go to Step 2.

1

The role of the Decoding Stack is similar to that of the Encoding Stack, except that the Decoding Stack stores the nodes of TL,N as its elements. It is also implemented via the data structure named


It can be noted that the second phase of the decoder follows similar procedures as the code construction algorithm introduced in Chapter 3, except that the priority-first algorithm is now applied on the trellis $T_{L,N}$ instead of applying it on a search tree for code construction. Since some paths of the trellis $T_{L,N}$ run across the same node, the priority-first algorithm must avoid expanding the same node on the trellis $T_{L,N}$ more than once. We therefore need to mark the expanded node (top node) as visited in Step 3, and discard the successors which have already been marked as visited in Step 4. The proof of optimality for the above decoding algorithm is provided in Appendix B.
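The two phases can be put together in a compact sketch. Several points are assumptions made for illustration rather than details taken from the thesis: `llr[l]` plays the role of $\phi_\ell$ with a positive value favoring bit 0, the hard decision $y_\ell$ is taken from its sign, the reliability weight is `abs(llr[l])`, and a popped node whose state is already marked as visited is simply skipped. The three-codeword code and the LLR values are toy data.

```python
import heapq
import math


def branch(bits, prob, llr, start):
    """Increment of metric (5.1) for one codeword placed at bit offset start."""
    cost = -math.log(prob)
    for k, bit in enumerate(bits):
        phi = llr[start + k]
        hard = 0 if phi >= 0 else 1          # assumed hard decision y
        if hard != int(bit):
            cost += abs(phi)                 # penalty only where y and b differ
    return cost


def tp_smap(code, llr, L):
    """code: list of (bit string, probability); returns L codeword indices or None."""
    N = len(llr)
    # Phase 1: backward Viterbi on T_N, keeping h(S_j) and the survivor choice.
    h = [math.inf] * (N + 1)
    choice = [None] * (N + 1)
    h[N] = 0.0
    for j in range(N - 1, -1, -1):
        for idx, (bits, p) in enumerate(code):
            end = j + len(bits)
            if end <= N and h[end] < math.inf:
                m = branch(bits, p, llr, j) + h[end]
                if m < h[j]:
                    h[j], choice[j] = m, idx
    if h[0] == math.inf:
        return None                          # no codeword sequence has N bits
    survivor, j = [], 0
    while j < N:
        survivor.append(choice[j])
        j += len(code[choice[j]][0])
    if len(survivor) == L:
        return survivor                      # phase 1 already gives the decision
    # Phase 2: priority-first search on T_{L,N} with metric m = g + h(S_j).
    stack = [(h[0], 0.0, 0, 0, ())]          # entries: (m, g, i, j, path)
    visited = set()
    while stack:
        m, g, i, j, path = heapq.heappop(stack)
        if (i, j) == (L, N):
            return list(path)
        if (i, j) in visited:
            continue
        visited.add((i, j))
        for idx, (bits, p) in enumerate(code):
            ni, nj = i + 1, j + len(bits)
            if ni > L or nj > N or (ni, nj) in visited or h[nj] == math.inf:
                continue
            ng = g + branch(bits, p, llr, j)
            heapq.heappush(stack, (ng + h[nj], ng, ni, nj, path + (idx,)))
    return None


toy_code = [("0", 0.6), ("10", 0.3), ("11", 0.1)]   # hypothetical prefix code
toy_llr = [-2.0, 0.3, 1.5]
print(tp_smap(toy_code, toy_llr, L=2))   # [1, 0]: settled in phase 1
print(tp_smap(toy_code, toy_llr, L=3))   # [0, 0, 0]: needs phase 2
```

Because $h(S_j)$ is the exact best completion cost on $T_N$, it never overestimates the completion cost on $T_{L,N}$, so the first path popped at $S_{L,N}$ has the smallest metric; this is the same reasoning that lets phase 1 stop early when its survivor already contains L codewords.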


Chapter 6 Simulation Results

In this chapter, we assess via simulations the error performances of the found VLECPCs in terms of reconstructed source symbol error rate (SER).1 In all simulations, the source is assumed memoryless and the channel is the BPSK-modulated AWGN channel. The decoding complexity of the proposed two-phase sequence MAP (TP-SMAP) decoder is also examined. Furthermore, comparisons with other systems in the literature, including three known VLECPC schemes and a traditional SSCC system, are provided. For measuring the time to search for the optimal and suboptimal VLECPCs, the experiments were carried out using the C programming language under a 64-bit Linux operating system (Ubuntu 10.04 LTS) on a desktop computer with an Intel Core 2 Duo E6600 2.4 GHz CPU and 4 GB of memory. It should be noted that the decoders of VLECPCs in the following simulations are assumed to be TP-SMAP unless otherwise specified.

1 As a convention, the SER here is the Levenshtein distance between the transmitted sequence and the

As usual, the system signal-to-noise ratio (SNR) is given by $\text{SNR} \triangleq E/N_0$, where E is the signal energy per channel use and $N_0/2$ is the variance of the zero-mean additive channel noise sample. To account for the coding redundancy of systems with different code rates, the SNR per source symbol is used in presenting the simulation results, which is given by

\[
\text{SNR}_s = \frac{E_s}{N_0} = \frac{E}{N_0} \cdot \frac{1}{R}, \tag{6.1}
\]

where $E_s$ is the energy per source symbol, and R is the overall (average) system rate defined as the number of transmitted source symbols per channel use. For an SSCC system, the overall rate R satisfies $R = R_c/R_s$, where $R_s$ is the source coding rate (in coded bits/source symbol) and $R_c$ is the channel coding rate (in coded bits/channel use). Hence, for an SSCC system employing a kth-order Huffman VLC2 followed by a tail-biting convolutional code, $R_s$ is the average codeword length of the Huffman code divided by k, and $R_c$ is the rate of the tail-biting convolutional code. Note that a VLECPC (or a single-step JSCC) can be regarded as having $R_c = 1$ with $R_s$ being its averaged source coding rate, since no explicit channel coding is performed.
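The rate bookkeeping behind (6.1) is easy to mishandle, so a short numerical sketch may help. The function names are illustrative; the average codeword lengths 7.240 and 7.936 are the $d^*_{\text{free}} = 7$, $p_0 = 0.8$ entries of Table 6.1 for the optimal VLECPC and for Lamy’s code, with k = 3 and $R_c = 1$ as stated above for VLECPCs.

```python
import math


def overall_rate(avg_codeword_length, k, channel_rate=1.0):
    """R = Rc / Rs, with Rs = average codeword length / k (coded bits per source symbol)."""
    source_rate = avg_codeword_length / k
    return channel_rate / source_rate


def snr_per_symbol_db(snr_db, rate):
    """(6.1) in decibels: SNR_s = SNR / R, i.e. SNR_dB - 10 log10(R)."""
    return snr_db - 10.0 * math.log10(rate)


r_opt = overall_rate(7.240, k=3)    # optimal VLECPC
r_lamy = overall_rate(7.936, k=3)   # Lamy's VLECPC
print(round(r_opt, 3), round(r_lamy, 3))     # 0.414 0.378
print(snr_per_symbol_db(0.0, r_opt))         # dB offset between SNR and SNR_s
```

The two rates agree with the values R = 0.414 and R = 0.378 quoted in the legend of Figure 6.5; a code with a longer average codeword length pays a larger offset between SNR and $\text{SNR}_s$.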

Table 6.1: Average codeword length per grouped symbol of an 8-ary alphabet generated from binary non-uniform memoryless sources with different $p_0$.

                          Buttigieg’s      Lamy’s           Wang’s           Opt. VLECPC
$p_0$                     0.7     0.8      0.7     0.8      0.7     0.8      0.7     0.8
$d^*_{\text{free}} = 3$   4.500   4.000    4.500   4.000    4.500   4.000    4.473   3.992
$d^*_{\text{free}} = 5$   6.443   5.912    6.443   5.912    6.443   5.912    6.340   5.592
$d^*_{\text{free}} = 7$   8.326   7.864    8.473   7.936    8.326   7.864    8.016   7.240

In Table 6.1, we compare the VLECPCs found by the proposed method in Chapter 3 with Buttigieg’s codes [7], Lamy’s codes [23] and the codes by Wang et al. [30]. Here, we group three information bits, generated from a binary non-uniform memoryless source with bit probability $p_0 \triangleq \Pr(0) \in \{0.7, 0.8\}$, as one source symbol; hence, the VLECPCs are 3rd order VLCs (i.e., k = 3), and the size of the source alphabet is $K = 2^3 = 8$. Since our proposed algorithm is guaranteed to find VLECPCs with minimal average codeword length under a fixed $d^*_{\text{free}}$, the resulting VLECPCs have an average codeword length no greater than that of any other code with identical free distance; in Table 6.1 they are strictly shorter in every case.
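The 7.240 entry of Table 6.1 can be reproduced from the codeword list of Table 6.2. The sketch below copies the probabilities and codewords of that table into a dictionary (the variable and function names are illustrative) and checks the average codeword length and the prefix property.

```python
# Grouped symbol -> (probability, codeword), copied from Table 6.2.
table_6_2 = {
    "000": (0.512, "00100"),
    "001": (0.128, "01011010"),
    "010": (0.128, "100111001"),
    "100": (0.128, "1111111111"),
    "011": (0.032, "11010110011"),
    "101": (0.032, "000110010011"),
    "110": (0.032, "110011101011"),
    "111": (0.008, "1111110001011"),
}


def is_prefix_free(words):
    """True if no codeword is a proper prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


avg_len = sum(p * len(word) for p, word in table_6_2.values())
codewords = [word for _, word in table_6_2.values()]
print(round(avg_len, 3))          # 7.24
print(is_prefix_free(codewords))  # True
```

Dividing the result by k = 3 gives the source coding rate of about 2.413 coded bits per source bit that appears in the legend of Figure 6.5.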

We then investigate the improvement in both error performance and decoding complexity of the proposed TP-SMAP decoder. In Figure 6.1, 30 information bits (i.e., 10 grouped symbols) are encoded by the optimal VLECPC with $d^*_{\text{free}} = 7$ and $p_0 = 0.8$ of Table 6.1, which is listed in Table 6.2. The dotted line shows the performance of the MAP decoder under the assumption that the receiver only knows the number of transmitted bits, N. The solid line portrays the MAP decoder’s performance under the assumption that the receiver knows both the number of symbols, L, and the number of transmitted bits, N. As shown in Figure 6.1, about 0.3 dB in coding gain is realized by knowing L (in addition to N). Table 6.3 summarizes the decoding complexities of different decoders in terms of the branch metric computations. From the table, we remark that the TP-SMAP decoder has a similar decoding complexity as the Viterbi algorithm on $T_N$ while achieving about 0.3 dB coding gain in error performance. For identical error performance, the TP-SMAP decoding algorithm requires almost 4 times fewer branch metric computations than the Viterbi algorithm on $T_{L,N}$.

2 Recall that a kth order VLC maps a block of k source symbols onto a variable-length codeword. So its average source coding rate is given by the average codeword length divided by k.

Table 6.2: The optimal VLECPC with $d^*_{\text{free}} = 7$ and $p_0 = 0.8$ (the one with an average codeword length of 7.240) of Table 6.1.

Grouped symbol   Probability   Optimal VLECPC with $d_{\text{free}} = 7$ and $p_0 = 0.8$
000              0.512         00100
001              0.128         01011010
010              0.128         100111001
100              0.128         1111111111
011              0.032         11010110011
101              0.032         000110010011
110              0.032         110011101011
111              0.008         1111110001011

[Figure 6.1: plot of symbol error rate versus $\text{SNR}_s$ (dB); curves: MAP on $T_N$, MAP on $T_{L,N}$.]

Figure 6.1: Error performances of different decoders applied to the optimal VLECPC listed in Table 6.2. The number of 3-bit source symbols per transmission block is 10, which is equivalent to 30 source information bits.

Table 6.3: Average (AVG) and maximum (MAX) numbers of decoder branch metric computations for the codes of Figure 6.1.

$E_b/N_0$               1 dB           2 dB           3 dB           4 dB
Decoder                 AVG    MAX     AVG    MAX     AVG    MAX     AVG    MAX
Viterbi on $T_N$        459    768     459    768     459    768     459    768
Viterbi on $T_{L,N}$    1651   2600    1651   2600    1651   2600    1651   2600
TP-SMAP on $T_{L,N}$    461    2970    460    1619    459    863     459    768

[Figure 6.2: plot of the average number of branch metric computations versus L; curves: Viterbi algorithm on $T_N$, Viterbi algorithm on $T_{L,N}$, TP-SMAP on $T_{L,N}$.]

Figure 6.2: Average numbers of decoder branch metric computations of different decoders applied to the same VLECPC for different L at $\text{SNR}_s = 3$ dB. The VLECPC is the optimal VLECPC listed in Table 6.2.

We further test the decoding complexities of different decoders for different L. In Figure 6.2, the optimal VLECPC of Table 6.2 is transmitted at $\text{SNR}_s = 3.0$ dB. This figure indicates that the decoding complexities of TP-SMAP are similar to those of the Viterbi algorithm on $T_N$. The result also shows that the decoding complexity of the TP-SMAP decoder is proportional to the transmission block size L. It should be emphasized that the decoding complexities of TP-SMAP are one order of magnitude lower than those of the Viterbi algorithm on $T_{L,N}$, while both have the same error performance.

[Figure 6.3: plot of symbol error rate versus $\text{SNR}_s$; curves: optimal VLECPC with $p_0 = 0.7$, optimal VLECPC with $p_0 = 0.8$.]

Figure 6.3: Error performances of optimal VLECPCs for different $p_0$. The VLECPCs are obtained from the optimal VLECPCs with $d^*_{\text{free}} = 7$ in Table 6.1. The number of 3-bit source symbols per transmission block is 10, which is equivalent to 30 source information bits.

We next investigate the error performances of optimal VLECPCs for different values of $p_0$ and L. Figure 6.3 shows that the optimal VLECPC for $p_0 = 0.8$ performs about 0.8 dB better than the optimal VLECPC for $p_0 = 0.7$ at an SER of $10^{-3}$. Figure 6.4 shows that the optimal VLECPC performs better when the transmission block size L is smaller. These two figures indicate that the optimal VLECPCs are better when the source distribution is more unbalanced and the transmission block is shorter.

[Figure 6.4: plot of symbol error rate versus $\text{SNR}_s$ (dB); curves: L = 50, L = 40, L = 30, L = 20, L = 10.]

Figure 6.4: Error performances of the optimal VLECPC for different L. The optimal VLECPC is obtained from Table 6.2.

Table 6.4: Average (AVG) and maximum (MAX) numbers of decoder branch metric computations for the codes of Figure 6.5.

$\text{SNR}_s$                                          1 dB          2 dB          3 dB          4 dB
                                                        AVG   MAX     AVG   MAX     AVG   MAX     AVG   MAX
Lamy’s VLECPC                                           511   3631    510   1858    510   970     510   731
Buttigieg’s and Wang’s VLECPCs                          500   3439    499   1303    499   720     499   670
Optimal VLECPC                                          461   2970    460   1119    459   719     459   668
Optimal VLECPC with smallest $B_{d_{\text{free}}}$      462   3040    460   1144    459   712     459   668

[Figure 6.5: plot of symbol error rate versus $\text{SNR}_s$ (dB); curves: Lamy’s VLECPC, $R = R_c/R_s = 1/2.645 = 0.378$; Buttigieg’s and Wang’s VLECPCs, $R = 1/2.621 = 0.381$; optimal VLECPC with $B_{d_{\text{free}}} = 1.8268$, $R = 1/2.413 = 0.414$; optimal VLECPC with smallest $B_{d_{\text{free}}} = 0.0164$, $R = 1/2.413 = 0.414$.]

Figure 6.5: Error performances of different (3rd order) VLECPCs for a binary non-uniform source with $p_0 = 0.8$. The number of 3-bit source symbols per transmission block is 10, which is equivalent to 30 source information bits. The free distance $d_{\text{free}}$ for all VLECPCs is 7.

We next examine in Figure 6.5 the improvement in error performance between the optimal code construction in Chapter 3 and the modified optimal one (that is guaranteed to output the optimal VLECPC with the smallest $B_{d_{\text{free}}}$) in Section 4.1. Here, we group three information bits, generated from a binary non-uniform memoryless source with bit probability $p_0 \triangleq \Pr(0) = 0.8$, as one source symbol; hence, the VLECPCs are 3rd order VLCs (i.e., k = 3), and the size of the source alphabet is $K = 2^3 = 8$. Also shown in the same figure are the error performances of three VLECPCs respectively obtained by Buttigieg’s [7], Lamy’s [23] and Wang’s [30] code construction algorithms, which have the same free distance $d_{\text{free}} = 7$ as the optimal VLECPCs we constructed, where Buttigieg’s and Wang’s algorithms coincidentally yield an identical code in this case. In each simulation, 10 source symbols (equivalently, 30 source information bits) are encoded and transmitted as a block. All codes are decoded using the TP-SMAP decoder of Chapter 5. Figure 6.5 shows that our optimal VLECPC constructed by the algorithm proposed in Chapter 3 has around 0.8 dB coding gain over the three existing VLECPCs; it also indicates that minimizing $B_{d_{\text{free}}}$ can further pick up another 0.1 dB in performance gain.

Table 6.4 summarizes the decoding complexity of the TP-SMAP for the VLECPCs of Figure 6.5. We notice that a VLECPC with higher average codeword length requires a higher decoding complexity. This is somewhat anticipated since the decoding trellis is larger for a VLECPC with higher average codeword length. In line with this observation, the optimal VLECPC and the optimal VLECPC with the smallest $B_{d_{\text{free}}}$ have, as expected, similar decoding complexity because they have identical average codeword length. In addition, with a smaller (actually, the minimum) average codeword length, our optimal VLECPC decodes faster via the TP-SMAP than the other three VLECPCs.

We next test the performance of the suboptimal code construction algorithm of Section 4.2 for the 26-symbol English data source. Since there are two different distributions for the English alphabet that are generally used in the literature for constructing VLECPCs (e.g., compare [25, 30, 20, 26] with [7, 9, 13, 18]), we provide simulation results for both distributions; we will refer to them as Distributions 1 and 2, respectively. The VLECPCs we obtain via our suboptimal code construction algorithm are presented in Tables 6.5 and 6.6 for Distributions 1 and 2, respectively.

In Table 6.7(a), we list, for different values of $d_{\text{free}}$, the average codeword lengths (ALs) of the resulting VLECPCs under Distribution 1 as well as the execution time needed for their construction via our suboptimal algorithm and the three algorithms referred to above. For the sake of completeness, the parameters used in each algorithm are reported in Table 6.7(b).3 These parameters are chosen through a number of trials in targeting a VLECPC with smaller average codeword length. The results indicate that by manipulating the parameters, the VLECPCs obtained by our suboptimal code construction algorithm can outperform the other three VLECPCs in average codeword length. Table 6.7(a) also shows that our suboptimal code construction algorithm is worse than Lamy’s or Wang’s algorithms in terms of execution time for $d_{\text{free}} \le 9$; however, we can prevent the construction complexity of our algorithm from growing too quickly for $d_{\text{free}} \ge 10$ by properly adjusting its parameters under the premise that our algorithm can still yield a better code than the other three algorithms. Similar conclusions can be drawn about the performance of the above algorithms under Distribution 2; the results are presented in Table 6.8.

Analogously to other schemes, many combinations of parameters need to be tested in our suboptimal algorithm to arrive at a good VLECPC construction. The main parameters that control the algorithm’s complexity are the early-elimination window ∆ and the Encoding Stack size Γ. Usually, complexity increases when either ∆ or Γ increases, albeit with the benefit of improving the VLECPC average codeword length. In general, it is not straightforward to decide on the right choice of values for these parameters before testing them. Despite this inconvenience, the proposed suboptimal approach is efficient enough to test many combinations of parameters in reasonable time. For example, to get the suboptimal VLECPC with $d_{\text{free}} = 3$ in Table 6.5, we simulated all combinations of the following parameters: ∆ ∈ {1, 3, 5, 7, 9, 11, 13}, Γ ∈ {20, 40, 60, 80, 100, 200, 300, 400, 500, 1000}

3 Buttigieg’s algorithm (specifically, MVA in [7]) and Wang’s algorithm [30] are characterized by two parameters, $L_1$ and $L_{\max}$. An additional parameter $L_s$ is needed for Lamy’s algorithm (specifically,

Table 6.5: The VLECPCs for the English alphabet with Distribution 1 obtained by the suboptimal code construction algorithm for different values of free distance. Each row lists the symbol, its probability, and the codewords for $d_{\text{free}}$ = 3, 5, 7, 9, 10 and 11, in that order.

Symbol  Probability  $d_{\text{free}}=3$  $d_{\text{free}}=5$  $d_{\text{free}}=7$  $d_{\text{free}}=9$  $d_{\text{free}}=10$  $d_{\text{free}}=11$
E  0.14878610  0111  00001  00000000  00101101  000100000  0000000000
T  0.09354149  00101  011110  11111111  111111100  0000011110  00001011111
A  0.08833733  11011  0101011  000011111  1111000111  00101100111  000111101001
O  0.07245769  000110  1010000  111100001  11001000100  11011011000  0011010101111
R  0.06872164  010011  00110100  0011010100  110001111011  010111101100  00111100111001
N  0.06498532  101111  10010011  1100110011  0101010010100  101010010011  11101011100110
H  0.05831331  111010  11101111  01011010010  1001001100011  0110111000010  010101110010101
I  0.05644515  0001011  011001011  10101010101  00010000010001  1111100111101  111011101111010
S  0.05537763  1000100  101111100  11000101001  10100010101010  10110011110101  0111110110110011
D  0.04376834  1011001  110000100  001111001100  001100101001000  11001100001011  1100011111011100
L  0.04123298  1110010  1011110111  010101100010  100000110110011  011010110110011  01101101110011010
U  0.02762209  00000011  1101000010  101010010001  0001101010110111  100111011101111  11010010111110110
P  0.02575393  00000100  11000100111  110000111100  0100011011001010  111101101010100  101001001101110100
F  0.02455297  10001111  11110101000  0101001110110  1000000001110000  0110001101111001  1101011110110111100
M  0.02361889  10010101  110001010011  0110011000011  01000010011110111  1011110110000101  1110100001100110110
C  0.02081665  10100001  110111001100  0110100111001  01011011101011001  01001001110110110  01110010111111001010
W  0.01868161  10100110  1100010101000  1001011011001  10000110110001010  10100010110010001  011110001011111010011
G  0.01521216  11000000  1101110000010  1001110000110  000101101101110010  11010101111101001  0110011110111111000011
Y  0.01521216  010000011  11011101010111  1010001101100  010000110101010101  010010011101010011  1101010001011110010110
B  0.01267680  010000100  11011101101000  00110011110010  100110111010001000  110000101110001100  10100110101111111000101
V  0.01160928  100100000  110111000010111  01011001100111  0001101011011010001  111111100111111010  11101100000001111010010
K  0.00867360  110001101  110111010111001  01100101111100  0100001011101101010  1000110101001001101  010001101100111111000011
X  0.00146784  1000001001  110111011001100  01101100001011  0100001100000010000  1100101011110000111  101110010000111111010010
J  0.00080064  1100001111  1101110001111001  10011001011001  00010110111110110001  1110010001011110110  111011000001000110110110
Q  0.00080064  1100011100  1101110101100100  10100110010101  00011010110010011110  11100010011011000111  0101010101000111111000011
Z  0.00053376  10000010100  1101110110111111  001101101111001  01000011010011001000  11100101110110101011  1010110010001001110110010

Table 6.6: The VLECPCs for the English alphabet with Distribution 2 obtained by the suboptimal code construction algorithm for different values of free distance. Each row lists the symbol, its probability, and the codewords for $d_{\text{free}}$ = 3, 5, 7, 9, 10 and 11, in that order.

Symbol  Probability  $d_{\text{free}}=3$  $d_{\text{free}}=5$  $d_{\text{free}}=7$  $d_{\text{free}}=9$  $d_{\text{free}}=10$  $d_{\text{free}}=11$
E  0.1270  0111  000000  0011111  00000101  000100000  0000000001
T  0.0906  00011  111111  01000110  001110011  0000011110  00001111101
A  0.0817  11101  0001110  000010000  0101101000  00101100111  011100011010
O  0.0751  001010  1111000  111101101  01110111001  11011011000  0111011101010
I  0.0697  010011  00101001  0001001001  001111100110  010111101100  10110110111000
N  0.0674  101111  11010110  1110111000  110010011000  101010010011  11011101000111
S  0.0633  110110  010110100  00000101100  0100111011111  0110111000010  101100101001110
H  0.0609  0010010  101100110  10001110011  01110110110110  1111100111101  111011010110000
R  0.0599  0100000  110010011  11110000001  10001011011100  10110011110101  1011100111010011
D  0.0425  1000110  111001101  010110100010  100011100011010  11001100001011  1110011000101100
L  0.0403  1011001  0100010101  100111010001  110100010101110  011010110110011  11011010110010100
C  0.0278  1101011  0101001011  101001111010  0000101011001010  100111011101111  110110100010011110
U  0.0276  10001011  1000110010  111000100101  1010111100111110  111101101010100  111011101101100010
M  0.0241  10010100  1010011001  0001001110101  1101011111010000  0110001101111001  1011101101111000000
W  0.0236  10100001  1011100101  1000011101010  01010000110010010  1011110110000101  1101101111000101111
F  0.0223  10100110  01001010101  1100100100001  10011111111001011  01001001110110110  1110010010011010010
G  0.0202  11000010  01011001011  00100010100011  11111011010111110  10100010110010001  10111011000010101110
Y  0.0197  11000101  10001100011  10110011110100  010111111101011110  11010101111101001  11101010111101110100
P  0.0193  000000100  10110100101  11001111110011  100111001010001010  010010011101010011  111001001001011110110
B  0.0149  100000001  011001000111  11011000101010  111000010011001110  011101110110101011  111010110000101101010
V  0.0098  100001111  100001010011  001001000100011  0010111010011001110  111000001010001101  111110100111111011101
K  0.0077  100100010  111010100011  011011001111100  1001000011010101010  1000110101001001101  1011101100000110000110
J  0.0015  0000001111  0011111100011  100111011100000  1110011011110010010  1011011101101111010  1110101101111111101101
X  0.0014  0000011010  00100111100011  0010001011100000  01101000011111001011  1110001001011110101  10111011011011111101100
Q  0.0010  00000111010  11000010100011  1101000011110100  10100000011010111110  11100111011011000111  11101000100101100010111
Z  0.0007  000001110010  011101111100011  1110001100100011  11011101111011001110  11111100100110111001  11111010001100010111010

Table 6.7: List of the VLECPCs obtained by three existing code construction schemes and the VLECPCs obtained by our suboptimal code construction algorithm for the 26-symbol English alphabet with Distribution 1 given in Table 6.5: (a) Average codeword lengths (ALs) of the found codes and execution time for each code construction algorithm; (b) Parameters used in each algorithm. The suboptimal algorithm is initialized with U_b set to equal the smallest of the average codeword lengths of the VLECPCs by Buttigieg, Lamy and Wang.

(a)

Algorithm     Buttigieg’s          Lamy’s               Wang’s               Suboptimal
              AL         Time      AL         Time      AL         Time      AL         Time
d_free = 3    6.272617   2m2s      6.309980   4s        6.266612   <1s       6.189350   18s
d_free = 5    8.378035   6m42s     8.400986   44s       8.378035   12s       8.333866   2m27s
d_free = 7    10.559646  4h31m     10.599945  5m43s     10.488923  27s       10.302508  8m41s
d_free = 9    12.737255  6h27m     12.806644  9m52s     12.737255  2m30s     12.532291  5m29s
d_free = 10   12.757672  11h45m    12.867893  17m54s    12.757672  47m46s    12.593140  9m35s
d_free = 11   14.876166  19h14m    15.354549  21m43s    15.024952  2h15m     14.580329  14m53s

(b)

Algorithm     Buttigieg’s   Lamy’s          Wang’s       Suboptimal
Parameters    (L1, Lmax)    (L1, Lmax, Ls)  (L1, Lmax)   (∆, Γ, D, I)
d_free = 3    (4, 13)       (4, 13, 10)     (4, 13)      (5, 300, Dm, 2)
d_free = 5    (6, 15)       (6, 15, 12)     (6, 15)      (3, 500, Dl, 1)
d_free = 7    (7, 16)       (7, 16, 13)     (7, 16)      (5, 2000, Dm, 1)
d_free = 9    (9, 18)       (9, 18, 15)     (9, 18)      (1, 60, Dm, 1)
d_free = 10   (10, 19)      (10, 19, 15)    (10, 19)     (1, 40, Dl, 2)
d_free = 11   (12, 21)      (12, 21, 17)    (12, 21)     (1, 4, Dl, 1)
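The caption of Table 6.7 states that the suboptimal algorithm starts from an upper bound equal to the smallest average codeword length among the three existing schemes. The short Python sketch below applies that rule to the AL values of Table 6.7(a) and reports how far the suboptimal code falls below this starting bound. The names `TABLE_6_7_AL`, `initial_upper_bound`, and `summarize` are hypothetical and used only for illustration; the sketch reproduces the bookkeeping, not the search algorithm itself.

```python
# AL values transcribed from Table 6.7(a), keyed by the free distance constraint.
# Tuple order: (Buttigieg, Lamy, Wang, Suboptimal).
TABLE_6_7_AL = {
    3: (6.272617, 6.309980, 6.266612, 6.189350),
    5: (8.378035, 8.400986, 8.378035, 8.333866),
    7: (10.559646, 10.599945, 10.488923, 10.302508),
    9: (12.737255, 12.806644, 12.737255, 12.532291),
    10: (12.757672, 12.867893, 12.757672, 12.593140),
    11: (14.876166, 15.354549, 15.024952, 14.580329),
}


def initial_upper_bound(buttigieg, lamy, wang):
    """Smallest AL among the three existing schemes (the caption's rule)."""
    return min(buttigieg, lamy, wang)


def summarize(table):
    """Map each d_free to (initial upper bound, reduction achieved by the suboptimal code)."""
    summary = {}
    for d_free, (buttigieg, lamy, wang, suboptimal) in table.items():
        bound = initial_upper_bound(buttigieg, lamy, wang)
        summary[d_free] = (bound, bound - suboptimal)
    return summary


if __name__ == "__main__":
    for d_free, (bound, gain) in summarize(TABLE_6_7_AL).items():
        print(f"d_free = {d_free:2d}: initial bound = {bound:.6f}, reduction = {gain:.6f}")
```

For every row of Table 6.7(a) the reduction is positive, that is, the suboptimal code is shorter on average than the best of the three existing codes used to initialize the bound.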


Table 6.8: List of the VLECPCs obtained by three existing code construction schemes and the VLECPCs obtained by our suboptimal code construction algorithm for the 26-symbol English alphabet with Distribution 2 given in Table 6.6: (a) Average codeword lengths (ALs) of the found codes and execution time for each code construction algorithm; (b) Parameters used in each algorithm. The suboptimal algorithm is initialized with U_b set to equal the smallest of the average codeword lengths of the VLECPCs by Buttigieg, Lamy and Wang.

(a)

Algorithm     Buttigieg’s        Lamy’s             Wang’s             Suboptimal
              AL       Time      AL       Time      AL       Time      AL       Time
d_free = 3    6.4038   20s       6.4047   14s       6.3574   <1s       6.2560   7s
d_free = 5    8.4740   5m16s     8.5049   47s       8.4740   9s        8.3223   1m13s
d_free = 7    10.5388  1h55m     10.5110  12m01s    10.5388  47s       10.3615  12m13s
d_free = 9    12.8898  3h14m     12.9644  13m04s    12.8898  4m22s     12.6647  6m03s
d_free = 10   12.8959  9h10m     13.0095  58m29s    12.8959  19m41s    12.7507  8m49s
d_free = 11   15.0345  17h37m    15.0846  38m53s    15.0345  1h20m     14.6521  16m12s

(b)

Algorithm     Buttigieg’s   Lamy’s          Wang’s       Suboptimal
Parameters    (L1, Lmax)    (L1, Lmax, Ls)  (L1, Lmax)   (∆, Γ, D, I)
d_free = 3    (4, 13)       (4, 13, 13)     (4, 13)      (6, 200, Dm, 1)
d_free = 5    (6, 15)       (6, 15, 13)     (6, 15)      (2, 250, Dm, 1)
d_free = 7    (7, 18)       (7, 18, 15)     (7, 18)      (1, 3000, Dm, 1)
d_free = 9    (9, 18)       (9, 18, 16)     (9, 18)      (1, 20, Dl, 1)
d_free = 10   (10, 20)      (10, 20, 17)    (10, 20)     (3, 40, Dl, 1)
d_free = 11   (11, 21)      (11, 21, 18)    (11, 21)     (1, 12, Dl, 1)


