
Maximum-Likelihood Priority-First Search Decodable Codes for Combined Channel Estimation and Error Correction

Chia-Lung Wu, Student Member, IEEE, Po-Ning Chen, Senior Member, IEEE, Yunghsiang S. Han, Senior Member, IEEE, and Ming-Hsin Kuo

Abstract—The coding technique that combines channel estimation and error correction has received attention recently, and has been regarded as a promising approach to counter the effects of multipath fading. It has been shown by simulation that a proper code design that jointly considers channel estimation can improve the system performance subject to a fixed code rate as compared to a conventional system which performs channel estimation and error correction separately. Nevertheless, the major obstacle that prevents the practice of such a coding technique is that the existing codes are mostly searched by computers, and subsequently exhibit no apparent structure for efficient decoding. Hence, the operation-intensive exhaustive search becomes the only decoding option, and the decoding complexity increases dramatically with codeword length. In this paper, a systematic construction is derived for a class of structured codes that support joint channel estimation and error correction. It is confirmed by simulation that these codes have performance comparable to the best simulated-annealing-based computer-searched codes. Moreover, the systematically constructed codes can now be maximum-likelihoodly decoded with respect to the unknown-channel criterion in terms of a newly derived recursive metric for use by the priority-first search decoding algorithm. Thus, the decoding complexity is significantly reduced as compared with that of an exhaustive decoder.

Index Terms—Channel coding, fading channels, multipath channels, frequency-selective fading, maximum likelihood decoding, sequential decoding.

I. INTRODUCTION

CURRENTLY, a typical receiver in a wireless communication system performs channel estimation and data estimation separately. The former task estimates channel characteristics based on a known training sequence or pilot, while the latter uses these characteristics to estimate the transmitted coded data.

Manuscript received December 12, 2007; revised December 13, 2008. Current version published August 19, 2009. This work was supported by the NSC of Taiwan, R.O.C., under Grant NSC 95-2221-E-009-054-MY3. The material in this paper was presented in part at the International Symposium on Information Theory and Its Applications, Auckland, New Zealand, December 2008.

C.-L. Wu and P.-N. Chen are with the Department of Communication Engineering, National Chiao-Tung University, Taiwan, R.O.C. (e-mail: clwu@banyan.cm.nctu.edu.tw; qponing@mail.nctu.edu.tw).

Y. S. Han is with the Graduate Institute of Communication Engineering, National Taipei University, 237 Taiwan, R.O.C. He is also with the Department of Computer Science and Information Engineering, National Chi Nan University, Taiwan, R.O.C. (e-mail: yshan@mail.ntpu.edu.tw).

M.-H. Kuo is with Pegatron Corp., Taipei, Taiwan, R.O.C. (e-mail: shin30@gmail.com).

Communicated by H.-A. Loeliger, Associate Editor for Coding Techniques. Color versions of Figures 1–6 in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIT.2009.2025548

Recent research results [3], [6], [13], [14] have confirmed that better system performance can be obtained by jointly performing channel and data estimation, as compared to a typical system that performs these tasks separately. In 1994, Seshadri [13] proposed a blind maximum-likelihood sequence estimator (MLSE) that performs the two tasks simultaneously. Skoglund et al. [14] later provided milestone evidence that a code design that jointly considers channel estimation and error correction is able to counter multipath block fading more efficiently than the approach with a separate error-correcting code and channel estimation scheme. They also applied the same idea to a multiple-input multiple-output (MIMO) system as described in a subsequent publication [6]. In short, Skoglund et al., by computer search, identified nonlinear codes that support joint channel estimation and error correction in a multipath block fading channel. Through simulations, they found that a communication system using these nonlinear codes can outperform a typical communication system with perfect channel estimation by 2 dB. Their results hint that a single, perhaps nonlinear, code may improve the transmission rate in a highly mobile environment in which traditional channel estimation becomes technically infeasible. A similar idea was also proposed in [3], where the authors actually named such codes training codes.

One of the drawbacks of these joint estimation codes found by computer search is that they lack a systematic structure, and can therefore be decoded only by an operation-intensive exhaustive search. This naturally leads to the research query of how to construct an efficiently decodable code that supports joint channel estimation and error correction.

In this paper, this query is resolved first by discovering that regardless of the fading statistics, the codeword that maximizes the system signal-to-noise ratio (SNR) must be orthogonal to the delayed versions of itself. We term this property self-orthogonality. Second, we find that a code that consists of properly chosen self-orthogonal codewords has performance comparable to that of the simulated-annealing-based computer-searched code. Because the maximum-likelihood metrics for self-orthogonal codewords can be equivalently transformed into a recursively formulated metric, it is finally shown that these structured codes can be maximum-likelihoodly decoded by the priority-first search algorithm [2], [7], [9], [12], resulting in a decoding complexity significantly smaller than that required by exhaustive decoding.

The paper is organized as follows. Section II describes the system model, followed by the technical background required for this work. Section III establishes the self-orthogonal codeword-selection condition that optimizes the system SNR regardless of the fading statistics, and then uses it to construct codes for joint channel and data estimation. The recursive maximum-likelihood decoding metrics for the constructed codes are derived in Section IV. Simulations are summarized and discussed in Section V. Section VI concludes the paper.

In this work, the superscripts "H" and "T" are specifically reserved for the matrix operations of Hermitian transpose and transpose, respectively [8].

II. BACKGROUND

A. System Model and Maximum-Likelihood Decoding Criterion

Suppose a codeword $\boldsymbol{b} = (b_1, b_2, \ldots, b_N)$ of an $(N, K)$ code is transmitted over a block fading (specifically, quasi-static fading) channel of memory order $P - 1$, where each $b_j \in \{\pm 1\}$. Denote the channel coefficients by $\boldsymbol{h} = (h_0, h_1, \ldots, h_{P-1})^T$, and assume that they are constant within a coding block of length $N$. By letting the codeword matrix be

$$\mathbf{B} = \begin{bmatrix}
b_1 & 0 & \cdots & 0 \\
b_2 & b_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
b_N & b_{N-1} & \cdots & b_{N-P+1} \\
0 & b_N & \cdots & b_{N-P+2} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & b_N
\end{bmatrix}$$

the complex-valued received vector is given by

$$\boldsymbol{r} = \mathbf{B}\boldsymbol{h} + \boldsymbol{n} \tag{1}$$

where $\boldsymbol{n}$ is zero-mean complex-Gaussian distributed with $E[\boldsymbol{n}\boldsymbol{n}^H] = \sigma^2 \mathbf{I}$, and $\mathbf{I}$ is the identity matrix. We then make the following assumptions: both transmitter and receiver know nothing about the channel coefficients $\boldsymbol{h}$, but have knowledge of the multipath parameter $P$. Also, there are adequate guard periods between consecutive encoding blocks such that zero interblock interference is guaranteed. Based on the system model in (1) and the above two assumptions, the least square estimate of the channel coefficients for a given $\mathbf{B}$ (alternatively, $\boldsymbol{b}$) equals $\hat{\boldsymbol{h}} = (\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\boldsymbol{r}$, and the joint maximum-likelihood (ML) decision for the transmitted codeword becomes [1]

$$\hat{\boldsymbol{b}} = \arg\max_{\boldsymbol{b}}\; \boldsymbol{r}^H \mathbf{B}\left(\mathbf{B}^T\mathbf{B}\right)^{-1}\mathbf{B}^T \boldsymbol{r}. \tag{2}$$

Note that the mapping from a codeword $\boldsymbol{b}$ to the corresponding transformed codeword is not one-to-one unless one code bit is fixed. For convenience, we will always fix the first code bit for the codes we construct.¹

¹Under this setting, it is obvious that the largest code rate attainable by our code design is $(N-1)/N$.
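To make the decision rule in (2) concrete, the following minimal Python/NumPy sketch builds the Toeplitz codeword matrix of a ±1 codeword, forms the least-squares channel estimate, and exhaustively evaluates the projection-energy criterion over a candidate codebook. The function and variable names are illustrative only, and the exhaustive loop is exactly the brute-force decoder whose complexity the priority-first search of Section IV is designed to avoid.

    import numpy as np

    def codeword_matrix(b, P):
        """(N+P-1) x P Toeplitz matrix whose columns are delayed copies of b."""
        N = len(b)
        B = np.zeros((N + P - 1, P))
        for p in range(P):
            B[p:p + N, p] = b
        return B

    def joint_ml_decode(r, codebook, P):
        """Exhaustive evaluation of the joint ML rule (2): pick the codeword
        maximizing r^H B (B^T B)^{-1} B^T r, i.e., the energy of the projection
        of r onto the column space of its codeword matrix."""
        best_b, best_metric, best_h = None, -np.inf, None
        for b in codebook:
            B = codeword_matrix(b, P)
            h_ls = np.linalg.solve(B.T @ B, B.T @ r)   # least-squares channel estimate
            metric = np.real(np.vdot(r, B @ h_ls))     # r^H B (B^T B)^{-1} B^T r
            if metric > best_metric:
                best_b, best_metric, best_h = b, metric, h_ls
        return best_b, best_h

A decoder of this form returns both the ML codeword decision and the accompanying least-squares channel estimate, which is why a single code can simultaneously serve channel estimation and error correction.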

B. Code Designs for Joint Channel and Data Estimation

In the literature, no systematic code constructions have been proposed for joint channel and data estimation in quasi-static fading channels. Efforts have mostly been invested in computer searches for codes that counter channel fading [3], [6], [10], [11], [14], [15], [17]. The decoding of such structureless computer-searched codes thus becomes an engineering challenge.

In 2002, Skoglund et al. [14] relied on simulated annealing to search for nonlinear binary block codes suitable for joint channel and data estimation in quasi-static fading channels. As the optimization criterion, they used the sum of all pairwise error probabilities (PEP) under equal prior probabilities. Although the operating signal-to-noise ratio (SNR) for the code search was set at 10 dB, their simulation results demonstrated that their codes perform well over a wide range of SNRs. In addition, a mismatch in the relative powers of different channel coefficients, as well as in the channel Rice factors [16], had no significant effect on the resulting performance either. Their results indicate that the nonlinear estimation codes can outperform a typical linear error-correcting code operated with a perfect channel estimator.

Later, in 2005, Coskun and Chugg [3] replaced the PEP sum by a properly defined pairwise distance measure between two codewords, and proposed a suboptimal greedy algorithm to speed up the code search process. In 2007, Giese and Skoglund [6] reapplied their original idea to single- and multiple-antenna systems, and used the asymptotic PEP and the generic gradient-search algorithm, respectively, in place of the PEP and the simulated-annealing algorithm in [14] to reduce system complexity.

In [14], the authors point out that “an important topic for further research is to study how the decoding complexity of the proposed scheme can be decreased.” Moreover, they state that “one main issue is to investigate what kind of structure should be enforced on the code to allow for simplified decoding.” Motivated by these remarks, we take here a different approach to code design. Specifically, we establish a systematic code design constraint for joint channel and data estimation in quasi-static fading channels, and show that the codes constructed based on this constraint maximize the system SNR regardless of the fading statistics. As it so happens that the computer-searched codes in [14] also satisfy this constraint, their insensitivity to SNR and channel mismatch now finds theoretical support.

Although a recursive metric had been derived in [1] from the joint maximum-likelihood decoding metric, there was no efficient decoding algorithm that could exploit it, owing to the structureless code design. Taking advantage of the systematic structure of our codes, we can then derive a recursive maximum-likelihood decoding metric that can be used in the priority-first search decoding algorithm. The decoding complexity is therefore significantly decreased in contrast to that of the exhaustive decoder required by the structureless computer-searched codes.

It is worth mentioning that although the codes selected by computer search in [6] and [14] target unknown channels, for which the channel coefficients are assumed constant within a given coding block, the evaluation of the PEP criterion does presume knowledge of the channel statistics. Even if the dependence of the code design on channel statistics is relaxed in [3], the pairwise distance criterion proposed therein is still intended for computer search, and no systematic code design results. The code constructed based on the algorithm we propose, however, is guaranteed to achieve an acceptable system SNR regardless of the statistics of the channel. This suggests that our systematically constructed codes are also suitable in cases where channel blindness becomes a stringent system restriction.

C. The Maximum-Likelihood Priority-First Search Decoding Algorithm

A code tree of an $(N, K)$ binary code represents every codeword as a path on a binary tree. Each branch on the code tree is labeled with the appropriate code bit $b_j$. We can then denote the path ending at a node at level $\ell$ by the sequence $\boldsymbol{b}_\ell = (b_1, b_2, \ldots, b_\ell)$ of branch labels it traverses. For convenience, we will drop the subscript when $\ell = N$. The successor paths of a path $\boldsymbol{b}_\ell$ are those whose first $\ell$ labels are exactly the same as $\boldsymbol{b}_\ell$.

The priority-first search algorithm (also known as the best-first search algorithm) is a common graph search algorithm that explores a graph by expanding the most promising path selected according to some criterion. Examples are Algorithm A* [12], Dijkstra's algorithm [2], and the stack algorithm [9]. In implementation, the most promising path is usually drawn from a list of candidates kept in a stack or a priority queue. One of the main distinctions among the family of priority-first search algorithms is the metric associated with paths on the search graph.² By adopting different metrics, some algorithms guarantee optimal search results, while others can only yield suboptimal ones. A typical priority-first search algorithm is exemplified by the following sequence of operations.

Step 1. Load the stack with the path that ends at the original node.

Step 2. Evaluate the metric values of the successor paths of the current top path in the stack. Then delete this top path from the stack.

Step 3. Insert the successor paths obtained in Step 2 into the stack such that the paths in the stack are ordered according to their ascending metric values.

Step 4. If the top path in the stack ends at a terminal node in the code tree, output the labels corresponding to the top path, and the algorithm stops; otherwise, go to Step 2.
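As an illustration of Steps 1-4, here is a minimal Python sketch of the generic priority-first search loop, with the stack realized as a priority queue ordered by ascending metric value. The callable names (successors, metric, is_terminal) are placeholders for the code-tree expansion rule and the metric functions developed in Section IV.

    import heapq
    import itertools

    def priority_first_search(origin, successors, metric, is_terminal):
        """Generic priority-first (best-first) search following Steps 1-4."""
        tie = itertools.count()                           # breaks metric ties without comparing paths
        stack = [(metric(origin), next(tie), origin)]     # Step 1: load the origin path
        while stack:
            _, _, top = heapq.heappop(stack)              # globally smallest-metric path
            if is_terminal(top):                          # Step 4: first terminal top path wins
                return top
            for nxt in successors(top):                   # Step 2: expand the top path ...
                heapq.heappush(stack, (metric(nxt), next(tie), nxt))   # Step 3: ... and reinsert
        return None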

Next, we give a sufficient condition under which the above priority-first search algorithm is guaranteed to locate the path with the smallest metric among all paths.

Lemma 1: If the metric $f$ is nondecreasing along every path in the code tree, i.e.,

$$f(\boldsymbol{b}_\ell) \le f(\boldsymbol{b}_{\ell+1}) \quad \text{for every path } \boldsymbol{b}_{\ell+1} \text{ and every } 0 \le \ell < N \tag{3}$$

then the priority-first search algorithm always yields the code path with the smallest metric value among all code paths of the code.

Proof: Let $\boldsymbol{b}$ be the first top path that reaches a terminal node (and hence, is the output code path of the priority-first search algorithm). Then, Step 3 of the algorithm ensures that $f(\boldsymbol{b})$ is no larger than the metric value of any path currently in the stack. Since condition (3) guarantees that the metric value of any other code path, which must be the offspring of some path currently existing in the stack, is no less than the metric value of that path, we have

$$f(\boldsymbol{b}) \le f(\tilde{\boldsymbol{b}}) \quad \text{for every code path } \tilde{\boldsymbol{b}}.$$

Consequently, the lemma follows.

²In the optimization literature, this metric is sometimes called an evaluation function. Since we apply the algorithm in decoding, we adopt the term metric in this work.

When defining a metric $f$, it is convenient to represent it as the sum of two components

$$f(\boldsymbol{b}_\ell) = g(\boldsymbol{b}_\ell) + \varphi(\boldsymbol{b}_\ell).$$

The first component $g$ is directly defined based on the maximum-likelihood metric such that minimizing $g$ over all code paths yields the maximum-likelihood decision. After $g$ is defined, the second component $\varphi$ is designed to validate (3) with $\varphi(\boldsymbol{b}) = 0$ for any length-$N$ code path $\boldsymbol{b}$. Then, from $f(\boldsymbol{b}) = g(\boldsymbol{b})$ for all code paths $\boldsymbol{b}$, the desired maximum-likelihood priority-first search decoding algorithm is established. A typical interpretation of the so-called heuristic function $\varphi$ is that it helps predict a future route from the end node of the current path to a terminal node [7]. Notably, the design of the heuristic function that validates (3) is not unique. Different designs may result in variations in computational complexity.

III. CODE CONSTRUCTION

A. Code Constraint That Maximizes the Average SNR Regardless of Channel Statistics

From the system model in (1), it can be derived that the average SNR conditional on the input $\boldsymbol{b}$ satisfies (4). Since both transmitter and receiver know nothing about the channel coefficients $\boldsymbol{h}$, the average SNR can be as bad as its worst case over all channel coefficients of a certain (possibly unknown) power level. We then find that this worst-case SNR can be upper-bounded by a constant; the inequality holds since an upper bound results from substituting any particular choice of channel coefficients of the given power level, and here we take the coefficients to be zero-mean i.i.d. It is thus straightforward from (4) that this constant SNR bound can be achieved, even if the system is totally blind to the channel coefficients (as well as to their power level), when the codeword is designed to be self-orthogonal in the sense that

$$\mathbf{B}^T\mathbf{B} = N\,\mathbf{I}_P \tag{5}$$

where $\mathbf{I}_P$ is the $P \times P$ identity matrix. Condition (5) actually has an operational meaning. It ensures that every codeword is orthogonal to the shifted versions of itself, and hence temporal diversity can be implicitly realized even with no knowledge of the channel statistics at all. We henceforth say that codewords constrained by (5) maximize the average SNR attainable regardless of the statistics of $\boldsymbol{h}$ [5].
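Since $\mathbf{B}^T\mathbf{B}$ automatically has $N$ on its diagonal for a ±1 codeword, condition (5) amounts to requiring that the aperiodic autocorrelation of the codeword vanish at lags $1, \ldots, P-1$. The short Python check below makes this concrete; the function name and the tolerance argument are illustrative, and setting the tolerance to 1 tests the relaxed condition introduced next.

    import numpy as np

    def is_self_orthogonal(b, P, tol=0):
        """Check condition (5): the +/-1 codeword b must be orthogonal to its
        shifts by 1, ..., P-1 positions, i.e., the off-diagonal entries of
        B^T B (aperiodic autocorrelations) must not exceed tol in magnitude."""
        b = np.asarray(b, dtype=float)
        N = len(b)
        for lag in range(1, P):
            if abs(np.dot(b[:N - lag], b[lag:])) > tol:
                return False
        return True

    # A length-5 (odd N) example with zero lag-one autocorrelation, for P = 2.
    print(is_self_orthogonal([+1, +1, -1, +1, +1], P=2))   # True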

Unfortunately, a codeword sequence satisfying (5) is only guaranteed to exist for $P = 2$ with $N$ odd (and trivially, for $P = 1$). In the other cases, one can only design codes to approximately satisfy (5); for $P = 2$, for example, the smallest achievable magnitude of the off-diagonal entry of $\mathbf{B}^T\mathbf{B}$ is $1$ for $N$ even and $0$ for $N$ odd. We therefore relax (5) and allow some off-diagonal entries of $\mathbf{B}^T\mathbf{B}$ to be either $-1$ or $1$ whenever it is impossible to strictly satisfy (5). Codewords satisfying only this relaxed condition will be referred to as relaxed self-orthogonal codewords.

After the establishment of (5), we find that this particular structure of $\mathbf{B}^T\mathbf{B}$ can really be observed in the simulated-annealing-based computer-searched codes. Specifically, for $P = 2$ and $N$ even, the best computer-searched half-rate codes that minimize the sum of PEPs, under complex zero-mean Gaussian distributed channel coefficients of equal power, satisfy the relation

$$\left|\sum_{i=1}^{N-1} b_i\, b_{i+1}\right| = 1. \tag{6}$$

We have also obtained and examined the computer-searched code used in [14] for $N = 22$, and found, as anticipated, that every codeword satisfies (6).

We close this subsection by stating some existing results in the literature that correspond to condition (5). The authors in [4] suggest that for optimal channel estimation, the training sequences can be chosen such that $\mathbf{B}^T\mathbf{B}$ is proportional to the identity matrix. Their observation agrees with what we obtained in (5). Moreover, condition (5) has also been identified in [6], where the authors remark ([6, p. 1591]) that a code sequence with a certain aperiodic autocorrelation property could possibly be exploited in future code design approaches. This is indeed one of the main research goals of this paper.

B. Equivalent System Model for Joint Channel and Data Estimation

By noting that $\mathbf{B}(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T$ is idempotent and symmetric, and letting $\mathrm{vec}(\cdot)$ denote the operation that transforms a matrix into a vector,³ the joint ML decision in (2) can be reformulated as in (7). This implies that the ML decision can be obtained by finding the codeword whose equivalent codeword in the vectorized ($\boldsymbol{\psi}$-) domain has the smallest Euclidean distance to the correspondingly transformed received observation. We can then bound the ML error probability by the PEP sum in (8), where $\boldsymbol{b}^{(i)}$ is the $i$th codeword of an $(N, K)$ block code, and $\boldsymbol{\psi}^{(i)}$ denotes the equivalent $i$th codeword in the $\boldsymbol{\psi}$-domain. By the self-orthogonal property, $\mathbf{B}^T\mathbf{B} = N\,\mathbf{I}_P$. The PEP-based upper bound in (8) then suggests that a good self-orthogonal code design should have an adequately large pairwise Euclidean distance

$$\left\|\boldsymbol{\psi}^{(i)} - \boldsymbol{\psi}^{(j)}\right\| \tag{9}$$

between all codeword pairs $i$ and $j$. Based on this observation, we may infer, under equal prior probabilities, that a uniform draw of codewords from the sequences satisfying the self-orthogonality condition may asymptotically result in a good code. This is conceptually equivalent to a uniform pick of codewords in a set of self-orthogonal binary sequences.

We recall that our initial research query is how to construct an efficiently decodable code that supports joint channel estimation and error correction. In order to achieve this goal for the priority-first search decoding algorithm, we need an efficient and systematic way to generate the successor paths of the top path. In particular, we would like to have a code tree that can be spanned in an on-the-fly or bit-by-bit fashion. The uniform pick principle then suggests that, considering only the self-orthogonal sequences with the same prefix $\boldsymbol{b}_\ell$, the ratio of the number of self-orthogonal codewords whose next code bit is $-1$ to the number of all self-orthogonal sequences having the same prefix and next bit must be made equal to the similar ratio for self-orthogonal codewords whose next code bit is $+1$, whenever possible. Mathematically, this can be expressed as

$$\frac{|\mathcal{C}(\boldsymbol{b}_\ell, -1)|}{|\mathcal{A}(\boldsymbol{b}_\ell, -1)|} = \frac{|\mathcal{C}(\boldsymbol{b}_\ell, +1)|}{|\mathcal{A}(\boldsymbol{b}_\ell, +1)|} \tag{10}$$

³For an $M \times N$ matrix $\mathbf{X}$, $\mathrm{vec}(\mathbf{X})$ is defined as the $MN \times 1$ vector obtained by stacking the columns of $\mathbf{X}$ on top of one another.


where $\mathcal{C}(\boldsymbol{b}_{\ell+1})$ is the set of all codewords whose first $\ell + 1$ bits equal $\boldsymbol{b}_{\ell+1}$, and $\mathcal{A}(\boldsymbol{b}_{\ell+1})$ is the set of all binary sequences of length $N$ whose first $\ell + 1$ bits equal $\boldsymbol{b}_{\ell+1}$ and whose codeword-matrix representation satisfies the (possibly relaxed) self-orthogonality condition; here $(\boldsymbol{b}_\ell, \pm 1)$ denotes the prefix $\boldsymbol{b}_\ell$ extended by the code bit $\pm 1$. Accordingly, given the index $i$ of the codeword, where $0 \le i \le 2^K - 1$, and given the previous bits $\boldsymbol{b}_\ell$, whether the next code bit is $-1$ or $+1$ can be determined conceptually by checking whether the target list position of codeword $i$ falls below or above the number of admissible sequences that extend $\boldsymbol{b}_\ell$ with $-1$. A specific code design algorithm will be given in the next subsection.

C. Exemplified Code Design Algorithm for Channels of Memory Order One

In this subsection, we provide an exemplified code design algorithm based on the uniform pick principle for channels of memory order 1, namely, $P = 2$. The code design algorithm for channels with higher memory order can be built similarly.

For $P = 2$, we define the lag-one correlation of a length-$N$ sequence $\boldsymbol{b}$ as

$$C(\boldsymbol{b}) \triangleq \sum_{i=1}^{N-1} b_i\, b_{i+1}.$$

Note that since $C(\boldsymbol{b}) = 0$ cannot be satisfied, as aforementioned, for $N$ even, $C(\boldsymbol{b}) = 1$ and $C(\boldsymbol{b}) = -1$ will be used instead to define the relaxed self-orthogonal codewords. In such a case, the uniform pick principle again suggests that half of the codewords should be uniformly drawn from the binary sequences satisfying $C(\boldsymbol{b}) = 1$, and the other half of the codewords are selected according to $C(\boldsymbol{b}) = -1$. The proposed codeword selection process is simply to list all the sequences satisfying the desired self-orthogonal property in binary-alphabetical order, starting from zero, and to uniformly pick the codewords from the ordered list, one from each of the intervals defined in (11), for $i = 0, 1, \ldots, M - 1$, where $M = 2^K$ for $N$ odd and $M = 2^{K-1}$ for $N$ even. As a result, the selected codewords are those sequences whose indices in the ordered list are closest to the centers of these intervals. The codeword mapping algorithm is summarized by the following list.

Step 1. Input the index $i$ of the requested codeword in the $(N, K)$ block code, where $0 \le i \le 2^K - 1$.

Step 2. Set $M = 2^K$ for $N$ odd, and $M = 2^{K-1}$ for $N$ even. Also, set the desired (possibly relaxed) self-orthogonality condition for the list from which codeword $i$ is to be drawn. Compute the target position in the ordered list according to (11). Initialize the codeword prefix with the fixed first code bit, and let the minimum sequence index be zero.

Step 3. Advance to the next bit position, and compute the number of admissible sequences that extend the current prefix with $-1$. If the target position falls within this count (offset by the minimum sequence index), then choose the next code bit to be $-1$; otherwise choose the next code bit to be $+1$, and readjust the minimum sequence index accordingly.

Step 4. If all $N$ code bits have been determined, output the corresponding codeword $\boldsymbol{b}$, and the algorithm stops; otherwise, go to Step 3.

In implementing the above algorithm, it is perhaps more convenient to calculate the required sequence counts recursively⁴ such that the codeword mapping can be performed in an on-the-fly or bit-by-bit systematic fashion with respect to the given codeword index $i$. This recursive nature also facilitates the priority-first decoding search at the receiver, since branches of the code tree will only be spanned when necessary.
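A minimal Python sketch of the Step 1-Step 4 mapping for $P = 2$ follows. It counts the admissible completions with a memoized recursion instead of the closed-form expressions of footnote 4, and it assumes, for illustration only, that the fixed first code bit is $-1$ and that $-1$ precedes $+1$ in the binary-alphabetical ordering; all function names are hypothetical.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def completions(last_bit, remaining, needed):
        """Number of ways to append `remaining` +/-1 bits after a prefix ending in
        `last_bit` so that the lag-one correlation contributed by the appended
        junctions equals `needed` (a brute-force counterpart of footnote 4)."""
        if remaining == 0:
            return 1 if needed == 0 else 0
        return sum(completions(nxt, remaining - 1, needed - last_bit * nxt)
                   for nxt in (-1, +1))

    def index_to_codeword(i, N, K, target=0, first_bit=-1):
        """Steps 1-4: map index i in {0,...,2^K-1} to the self-orthogonal
        (+/-1) sequence picked uniformly from the ordered candidate list."""
        M = 2 ** K                                      # Step 2
        total = completions(first_bit, N - 1, target)   # size of the ordered list
        pos = (i + 0.5) * total / M                     # center of the i-th interval, cf. (11)
        b, corr, low = [first_bit], 0, 0                # prefix, its correlation, minimum index
        while len(b) < N:                               # Step 3, repeated until Step 4 triggers
            n_minus = completions(-1, N - len(b) - 1, target - corr - b[-1] * (-1))
            if pos < low + n_minus:                     # target lies among the "-1" extensions
                corr += b[-1] * (-1)
                b.append(-1)
            else:                                       # skip them and append "+1" instead
                low += n_minus
                corr += b[-1] * (+1)
                b.append(+1)
        return b

For $N$ even, the same routine can be applied once per relaxed condition (target=+1 and target=-1), with half of the codewords drawn from each ordered list, which corresponds to the double-code-tree construction examined in Section V.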

IV. MAXIMUM-LIKELIHOOD METRICS FOR PRIORITY-FIRST SEARCH DECODING

In this section, we will establish two different metric functions to be used by the priority-first search algorithm. The first metric is

$$f_1(\boldsymbol{b}_\ell) = g(\boldsymbol{b}_\ell) + \varphi_1(\boldsymbol{b}_\ell) \tag{12}$$

where $g$ is derived in Section IV-A, and $\varphi_1$ is the all-zero function (cf. Section IV-B). The second metric is

$$f_2(\boldsymbol{b}_\ell) = g(\boldsymbol{b}_\ell) + \varphi_2(\boldsymbol{b}_\ell) \tag{13}$$

with the same $g$ as in $f_1$, and with $\varphi_2$ defined in Section IV-C. Both metrics lead to an ML decoding. The difference is that $f_1$ can be computed on the fly, and will therefore cause much less delay in the decoding. For the evaluation of $f_2$, however, one needs to know all received symbols, but the computational complexity of $f_2$ is one order of magnitude less than that of $f_1$.

A. Recursive Maximum-Likelihood Metric

Let subcode $\mathcal{C}_m$ be the set of codewords that satisfy $C(\boldsymbol{b}) = m$, where $m$ takes value in the admitted set of (relaxed) self-orthogonality values. Hence, the code is the union of its subcodes, and two distinct subcodes are disjoint. Since a transmitted codeword belongs to only one of the subcodes, maintaining individual stacks for priority-first codeword searching over each subcode would introduce considerable unnecessary decoding burden, especially for the subcodes that the transmitted codeword does not belong to. Hence, only one stack is maintained during the entire priority-first search, and the metric function values for different subcodes are compared and sorted in the same stack. The path to be expanded next is therefore the one whose metric function value is the smallest globally.

⁴Initializing $m_0$ and the count $|\mathcal{A}(b_1)|$ appropriately, and setting $m_\ell = m_{\ell-1} - b_{\ell-1} b_\ell$ for $0 \le \ell < N$, one obtains for $P = 2$ a closed-form recursion for the sequence counts, with separate expressions for the cases $|m_\ell + b_\ell| \le N - \ell$ with $(b_\ell, b_{\ell+1}) = (-1, 1)$, $|m_\ell + b_\ell| \le N - \ell$ with $(b_\ell, b_{\ell+1}) \ne (-1, 1)$, and $|m_\ell + b_\ell| > N - \ell$, where $1\{\cdot\}$ denotes the set indicator function.

By introducing suitable shorthand for the received statistics and the corresponding matrix entries, we can continue the derivation from (7), where for convenience the out-of-range terms are set to zero. After adjusting indices, the derivation can be resumed as (14). As the maximum-likelihood decision remains unchanged by adding a constant that is independent of the codeword $\boldsymbol{b}$, we add such a constant to make the decision criterion nonnegative.⁵ It remains to prove that the resulting metric $g$ can be computed recursively.

⁵Here, a nonnegative maximum-likelihood criterion makes it possible for the later definition of the path metric $g(\boldsymbol{b}_\ell)$ to be nondecreasing along any path in the code tree. It can then be anticipated (cf. Section IV-B) that letting the heuristic function be zero for all paths in the code tree suffices to result in a metric function satisfying condition (3) in Lemma 1.

Note that the additive constant that makes the metric function nondecreasing along any path in the code tree can also be obtained by first defining $g$ based on (14), and then determining its respective heuristic function $\varphi$ according to (3). Such an approach, however, complicates the determination of the heuristic function $\varphi$ when we additionally require the metric function to be recursively computable. The alternative approach that directly defines a recursively computable $g$ based on a nonnegative maximum-likelihood criterion is accordingly adopted in this work.

To that aim, we define, for every path over the code tree, auxiliary quantities that accumulate the contributions of the received symbols along the path. Proceeding bit by bit, the path metric and these auxiliary quantities of a path can then be updated from those of its immediate predecessor through the branch quantity in (15), with the obvious initial condition at the origin node.

A final remark in this discussion is that although the computational burden of the branch quantity in (15) increases linearly with the path length, such a linearly increasing burden can be moderately compensated for by the fact that it is only necessary to compute it once for each level, because it can be shared by all paths ending at that level over the code tree.

B. Heuristic Function $\varphi_1$

We next derive the first heuristic function that validates (3). Taking the maximum-likelihood metric into the sufficient condition in (3) yields a constraint on the heuristic function.


Fig. 1. The maximum-likelihood word error rates (WERs) of the computer-searched half-rate code by simulated annealing in [14] (SA-22), the constructed half-rate code with double code trees (Double-22), and the constructed half-rate code with a single code tree (Single-22). The codeword length is N = 22.

Hence, in addition to vanishing on the full-length code paths, the heuristic function should satisfy (16). It is apparent that the all-zero function is the largest function that satisfies this inequality subject to no dependence on the future route and future receptions. Hence, we choose $\varphi_1 \equiv 0$.

Note that $g$ is trivially on-the-fly computable, and hence so is $f_1$. In comparison with exhaustive-search decoding, decoding based on the recursive priority-first search shows a significant decrease in computational complexity, especially at medium-to-high SNRs.

C. Heuristic Function $\varphi_2$

If we drop the requirement that the metric must be independent of future receptions, we can further reduce the computational complexity. Upon reception of all received symbols, the heuristic function that satisfies (16) regardless of the future route can be increased to the function given in (17), which is defined through recursively computed quantities with suitable initial conditions. Simulations show that, compared to the zero heuristic function $\varphi_1$, the heuristic function $\varphi_2$ in (17) further reduces the number of path expansions during the decoding process by up to one order of magnitude (cf. Table I).

Fig. 2. The maximum-likelihood word error rates (WERs) of the computer-searched code by simulated annealing (SA-N) and the constructed half-rate code with double code trees (Double-N).

V. SIMULATION RESULTS

In this section, we examine the performance of the codes proposed in Section III. We also illustrate the decoding complexity of the maximum-likelihood priority-first search decoding algorithm presented in the previous section. For ease of comparison, the channel parameters used in our simulations follow those in [14], where the channel coefficient vector $\boldsymbol{h}$ is zero-mean complex-Gaussian distributed with uncorrelated components of equal power and $P = 2$. The average system SNR is thus given by (18), since $\boldsymbol{b}^T\boldsymbol{b} = N$ for all simulated codewords.⁶

⁶The authors in [14] directly define the channel SNR as $1/\sigma^2$. It is apparent that their definition is exactly the limit of (18) as $N$ approaches infinity. Since it is assumed that an adequate guard period between two encoding blocks exists (so that there is no interference between two consecutive decoding blocks), the computation of the system SNR for finite $N$ should be adjusted to account for this muting (but still part-of-the-decoding-block) guard period. For example, in a comparison of the $(6, 3)$ and $(20, 10)$ codes over channels with memory order 1 (i.e., $P = 2$), one can easily observe that the former can only transmit 18 code bits in the time interval of 21 code bits, while the latter pushes out up to 20 code bits in a period of the same duration. Thus, under fixed code-bit transmission power and fixed component noise power, it is reasonable for the $(20, 10)$ code to result in a higher SNR than the $(6, 3)$ code.

Fig. 1 illustrates the simulation results of three codes: the computer-searched half-rate code obtained by the simulated annealing algorithm in [14] (SA-22), the constructed double-tree code with half of the codewords satisfying $C(\boldsymbol{b}) = 1$ and the remaining half satisfying $C(\boldsymbol{b}) = -1$ (Double-22), and the constructed single-tree code whose codewords are all selected from the candidate sequences satisfying a single relaxed condition (Single-22). We observe from Fig. 1 that the Double-22 code performs almost the same as the SA-22 code. Actually, the simulations illustrated in Fig. 2 provide evidence that the performance of the constructed double-tree half-rate codes is as good as that of the computer-searched half-rate codes for all simulated codeword lengths. However, for small $N$, the Double-$N$ code performs slightly worse than the SA-$N$ code. This is because, for small $N$, the approximation in (10) can no longer be well maintained due to the restriction that the codeword indices drawn from the ordered list must be integers.

In addition to the Double-22 code, Fig. 1 also depicts simulation results of the Single-22 code. Since the pairwise codeword distance in the sense of (9) for the Single-22 code is in general smaller than that of the Double-22 code, its performance suffers a 0.2 dB degradation compared with that of the Double-22 code. However, we will see in Fig. 3 that the Single-22 code actually has the smallest decoding complexity among the three codes. This suggests that selecting codewords uniformly from a single code tree should not be ruled out as a candidate design, especially when the decoding complexity becomes the main system concern.


Fig. 3. The average numbers of node expansions per information bit for the simulated-annealing-based computer-searched code in [14] by exhaustive decoding (EXH-SA-22), and the constructed single-tree (SEQ-Single-22) and double-tree (SEQ-Double-22) codes using the priority-first search decoding guided by either metric function $f_1$ or metric function $f_2$.


Fig. 5. WERs for the codes of Single-22, Double-22, Single-26, Double-26, Single-30, and Double-30.

In Fig. 3, the average numbers of node expansions per information bit are illustrated for the codes examined in Fig. 1. Since the number of node expansions is exactly equal to the number of tree branch metrics (i.e., one recursion of $g$-function values) computed, the equivalent complexity of exhaustive decoding is correspondingly plotted. It can then be observed that, in comparison with the exhaustive decoder, a significant reduction in computational burden is achieved at moderate-to-high SNRs by adopting the Double-22 code and the priority-first search decoder with the on-the-fly computable metric $f_1$ [see (12)]. Further reduction can be achieved if the Double-22 code is replaced with the Single-22 code. This is because performing the sequential search over multiple code trees introduces extra node expansions for those code trees that the transmitted codeword does not belong to. An additional order-of-magnitude reduction in node expansions can be achieved when the metric $f_2$ [see (13)] is used instead.

The authors in [3] and [14] only focus on the word error rate (WER). No bit error rate (BER) performances that involve the mapping design between the information bit patterns and the codewords are presented. Yet, in certain applications, such as voice transmission and digital radio broadcasting, the BER is generally considered a more critical performance index. In addition, the adoption of the BER performance index, as well as the signal-to-noise ratio per information bit, facilitates the comparison of codes of different code rates.

Fig. 4 depicts the BER performance of the same codes whose WER performances were depicted in Fig. 1. The corresponding signal-to-noise ratio per information bit is computed from the system SNR in (18) and the code rate $K/N$. The mapping between the bit patterns and the codewords of the given computer-searched code is obtained through simulated annealing by minimizing an upper bound on the BER in which each pairwise error probability of (8) is weighted by the Hamming distance $d_H(\boldsymbol{u}^{(i)}, \boldsymbol{u}^{(j)})$ between the information sequences $\boldsymbol{u}^{(i)}$ and $\boldsymbol{u}^{(j)}$ corresponding to the $i$th and $j$th codewords. For the constructed codes of Section III-C, the binary representation of the index of the requested codeword in Step 1 is directly taken as the information bit pattern corresponding to the requested codeword. The result illustrated in Fig. 4 then indicates that the BER performances of the three curves are almost the same. Hence we conclude that taking the binary representation of the requested codeword index as the information bit pattern for the constructed code not only makes its implementation easy, but also yields a BER performance similar to that of the best simulated-annealing-based computer-searched codes.

Fig. 4. BERs for the simulation of codes illustrated in Fig. 1.

Last, we demonstrate the WER and BER performances, respectively, of the Single-26, Double-26, Single-30, and Double-30 codes, together with those of the Single-22 and Double-22 codes, over the quasi-static fading channels in Figs. 5 and 6. Both figures show that the Double-30 code has the best maximum-likelihood performance not only in WER but also in BER. This result concurs with the intuition that a longer code will perform better provided that the channel coefficients remain unchanged in a coding block. The decoding complexities of the codes are listed in Table I, from which we observe that the saving in decoding complexity of metric $f_2$ with respect to metric $f_1$ increases as the codeword length increases. It is worth mentioning that at very high SNR, the priority-first search decoding over the AWGN channel will directly go all the way down to the terminal nodes, and result in a decoding complexity of approximately two node expansions per information bit. However, for fading channels, the decoding complexity cannot reach the ideal two node expansions per information bit even with zero additive noise, as shown in the last column of Table I. In this regard, metric $f_2$ still reaches a better ultimate decoding complexity than metric $f_1$.

Fig. 6. BERs for the codes of Single-22, Double-22, Single-26, Double-26, Single-30, and Double-30.

TABLE I. Average number of node expansions per information bit for the priority-first search decoding of the constructed half-rate codes of length 22, 26, and 30.

We close this section by commenting on the attained diversity level of the simulated codes. The diversity level $d$ characterizes the approximate behavior of the word error probability at high SNR, i.e., $\mathrm{WER} \approx c \cdot \mathrm{SNR}^{-d}$ for some constant $c$. From Table II, we observe that the attained diversities of the codes of lengths 22, 26, and 30 are close to the anticipated value of $P = 2$. The table also suggests that the diversities degrade at small $N$, and that the computer-searched codes have somewhat higher diversities within the considered SNR range. We conclude that, under the constraint of the self-orthogonal structure, the simulated codes can turn the second delayed channel path into another source of diversity. This results in a blind detection performance of diversity level close to $P = 2$.

TABLE II. The attained diversity levels of codes, least-square-approximated based on WER performance curves within 8–15 dB.
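The diversity levels in Table II are obtained by a least-squares fit of the high-SNR slope of each WER curve. A small Python sketch of this estimation step is given below under the stated model WER ≈ c · SNR^(−d); the function name and the 8–15 dB fitting window default are illustrative.

    import numpy as np

    def estimated_diversity(snr_db, wer, lo=8.0, hi=15.0):
        """Least-squares estimate of the diversity level d from a WER curve,
        assuming WER ~ c * SNR^(-d) within the [lo, hi] dB window."""
        snr_db = np.asarray(snr_db, dtype=float)
        wer = np.asarray(wer, dtype=float)
        mask = (snr_db >= lo) & (snr_db <= hi)
        # log10(WER) = log10(c) - (d / 10) * SNR_dB, so the fitted slope is -d/10.
        slope, _ = np.polyfit(snr_db[mask], np.log10(wer[mask]), 1)
        return -10.0 * slope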

VI. CONCLUSION

In this paper, we introduce an algorithm to construct codes that allow joint channel estimation and error correction at the receiver side of a block fading channel. In contrast with previously published codes, our codes are designed systematically and allow for an ML decoding with a much smaller computational complexity than the operation-intensive exhaustive decoding that was previously used in [3], [6], [14] to decode the structureless computer-searched codes. The given algorithm is based on the optimal signal-to-noise ratio framework and requires every codeword to satisfy a self-orthogonal property that helps to counter the effects of multipath fading.

The improved decoding algorithm is a tree-based priority-first search decoding algorithm that uses a recursive maximum-likelihood metric. Simulations demonstrate that the constructed codes have performance almost identical to that of the best computer-searched codes, but with a much smaller decoding complexity.

Moreover, we propose two different maximum-likelihood decoding metrics. The first one can be used in an on-the-fly fashion, while the second one, which results in a much lower decoding complexity, requires knowledge of all channel outputs. We hence have a tradeoff between decoding complexity and decoding delay.

Note that so far we have ignored an implicit problem of codes that absorb the training sequence into the error-correcting codewords: in traditional packet-switched systems, frame synchronization is often achieved by the same training sequence. Without synchronizing the codeword boundaries, decoding may become technically infeasible. Nevertheless, recent standards have started to consider partly separating the tasks of frame synchronization and channel estimation. For example, in IEEE 802.16e, initial frame synchronization is performed by means of a preamble, which is later shared by all users. Pilots are then added amid user data for individual channel estimation during data transmission [18]. It is then fair to say that, at this stage, the joint channel estimation and error correction codes may only fit well in an initial-sync, circuit-switched, or TDD-based system environment. It will be an interesting, but quite challenging, future task to further enhance the proposed codes to possess self-synchronization capability.

ACKNOWLEDGMENT

The authors would like to thank Prof. M. Skoglund, Dr. J. Giese and Prof. S. Parkvall of the Royal Institute of Technology (KTH), Stockholm, Sweden, for kindly providing us their codes for further study in the preparation of this paper.

REFERENCES

[1] K. M. Chugg and A. Polydoros, “MLSE for an unknown channel—Part I: Optimality considerations,” IEEE Trans. Commun., vol. 44, no. 7, pp. 836–846, Jul. 1996.

[2] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. Cambridge, MA: MIT Press, 2001.

[3] O. Coskun and K. M. Chugg, “Combined coding and training for unknown ISI channels,” IEEE Trans. Commun., vol. 53, no. 8, pp. 1310–1322, Aug. 2005.

[4] S. N. Crozier, D. D. Falconer, and S. A. Mahmoud, “Least sum of squared errors (LSSE) channel estimation,” Proc. Inst. Elect. Eng., vol. 138, pt. F, pp. 371–378, Aug. 1991.

[5] G. Ganesan and P. Stoica, “Space-time block codes: A maximum SNR approach,” IEEE Trans. Inf. Theory, vol. 47, no. 4, pp. 1650–1659, May 2001.

[6] J. Giese and M. Skoglund, “Single- and multiple-antenna constellations for communication over unknown frequency-selective fading channels,” IEEE Trans. Inf. Theory, vol. 53, no. 4, pp. 1584–1594, Apr. 2007.

[7] Y. S. Han and P.-N. Chen, “Sequential decoding of convolutional codes,” in The Wiley Encyclopedia of Telecommunications, J. Proakis, Ed. New York: Wiley, 2002.

[8] D. Harville, Matrix Algebra From a Statistician’s Perspective, 1st ed. New York: Springer, 2000.

[9] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2004.

[10] J. C. L. Ng, K. B. Letaief, and R. D. Murch, “Complex optimal sequences with constant magnitude for fast channel estimation initialization,” IEEE Trans. Commun., vol. 46, no. 3, pp. 305–308, Mar. 1998.

[11] S. I. Park, S. R. Park, I. Song, and N. Suehiro, “Multiple-access interference reduction for QS-CDMA systems with a novel class of polyphase sequence,” IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1448–1458, Jul. 2000.

[12] J. Pearl, Heuristics: Intelligent Search Strategies for Computer Problem Solving. Reading, MA: Addison-Wesley, 1984.

[13] N. Seshadri, “Joint data and channel estimation using blind trellis search techniques,” IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 1000–1011, Feb./Mar./Apr. 1994.

[14] M. Skoglund, J. Giese, and S. Parkvall, “Code design for combined channel estimation and error protection,” IEEE Trans. Inf. Theory, vol. 48, no. 5, pp. 1162–1171, May 2002.

[15] C. Tellambura, Y. J. Guo, and S. K. Barton, “Channel estimation using aperiodic binary sequences,” IEEE Commun. Lett., vol. 2, no. 5, pp. 140–142, May 1998.

[16] C. Tepedelenlioğlu, A. Abdi, and G. B. Giannakis, “The Ricean K factor: Estimation and performance analysis,” IEEE Trans. Wireless Commun., vol. 2, no. 4, pp. 799–810, Jul. 2003.

[17] S.-A. Yang and J. Wu, “Optimal binary training sequence design for multiple-antenna systems over dispersive fading channels,” IEEE Trans. Veh. Technol., vol. 51, no. 5, pp. 1271–1276, Sep. 2002.

[18] Draft Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Broadband Wireless Access Systems, IEEE P802.16Rev2/D6.


Chia-Lung Wu (S’08) received the B.Eng. degree in information and computer engineering from the Chung-Yuan Christian University, Chung-Li, Taiwan, in 1994, and the M.S. degree in communications engineering from the National Chiao-Tung University, Hsin Chu, Taiwan, in 2000.

He is currently working toward the Ph.D. degree under the supervision of Prof. P.-N. Chen. He held visiting positions with Queen’s University, Kingston, Canada, in 2005, and the Royal Institute of Technology (KTH), Stockholm, Sweden, in 2008. His research interests lie in noncoherent communications and channel coding.

Po-Ning Chen (S’93–M’95–SM’01) was born in Taipei, R.O.C., in 1963. He received the B.S. and M.S. degrees in electrical engineering from the National Tsing-Hua University, Taiwan, in 1985 and 1987, respectively, and the Ph.D. degree in electrical engineering from University of Maryland, College Park, in 1994.

From 1985 to 1987, he was with the Image Processing Laboratory, National Tsing-Hua University, where he worked on the recognition of Chinese characters. During 1989, he was with Star Tech. Inc., where he focused on the development of fingerprint recognition systems. After he received the Ph.D. degree in 1994, he joined Wan Ta Technology Inc. as a Vice General Manager, conducting several projects on Point-of-Sale systems. In 1995, he became a member of the Research Staff with the Advanced Technology Center, Computer and Communication Laboratory, Industrial Technology Research Institute, Taiwan, where he led a project on Java-based network management. Since 1996, he has been an Associate Professor with the Department of Communications Engineering, National Chiao-Tung University, Taiwan, and was promoted to full Professor in 2001. His research interests generally lie in information and coding theory, large deviation theory, distributed detection, and sensor networks.

Dr. Chen received the annual Research Award from the National Science Council, Taiwan, R.O.C., five years in a row starting in 1996. He then received the 2000 Young Scholar Paper Award from Academia Sinica, Taiwan. His Experimental Handouts for the course Communication Networks Laboratory were awarded as the Annual Best Teaching Materials for Communications Education by the Ministry of Education, Taiwan, R.O.C., in 1998. He was elected Chair of the IEEE Communications Society Taipei Chapter in 2006 and 2007, during which the IEEE ComSoc Taipei Chapter won the 2007 IEEE ComSoc Chapter Achievement Award (CAA) and the 2007 IEEE ComSoc Chapter of the Year (CoY) award. He served as Chairman of the Department of Communications Engineering, National Chiao-Tung University, during 2007–2009. He was selected as an Outstanding Tutor Teacher of National Chiao-Tung University in 2002. He was also the recipient of the Distinguished Teaching Award from the College of Electrical and Computer Engineering, National Chiao-Tung University, in 2003.

Yunghsiang S. Han (S’90–M’93–SM’08) was born in Taipei, Taiwan, on April 24, 1962. He received the B.Sc. and M.Sc. degrees in electrical engineering from the National Tsing Hua University, Hsinchu, Taiwan, in 1984 and 1986, respectively, and the Ph.D. degree from the School of Computer and Information Science, Syracuse University, Syracuse, NY, in 1993.

He was a Lecturer with Ming-Hsin Engineering College, Hsinchu, from 1986 to 1988. He was a Teaching Assistant from 1989 to 1992, and a Research Associate with the School of Computer and Information Science, Syracuse University, from 1992 to 1993. He was an Associate Professor with the Department of Electronic Engineering, Hua Fan College of Humanities and Technology, Taipei Hsien, Taiwan, from 1993 to 1997. He was with the Department of Computer Science and Information Engineering, National Chi Nan University, Nantou, Taiwan, from 1997 to 2004, where he was promoted to Professor in 1998. He was a Visiting Scholar with the Department of Electrical Engineering, University of Hawaii at Manoa, from June to October 2001, and the SUPRIA Visiting Research Scholar with the Department of Electrical Engineering and Computer Science and the CASE Center at Syracuse University, from September 2004 to January 2005. He is now with the Graduate Institute of Communication Engineering, National Taipei University, Taipei. His research interests are in error-control coding, wireless networks, and security.

Dr. Han was a winner of the 1994 Syracuse University Doctoral Prize.

Ming-Hsin Kuo was born in Tainan City, Taiwan, R.O.C., in 1984. He received the B.S. and M.S. degrees in communications engineering from the National Chiao-Tung University, Taiwan, in 2006 and 2008, respectively.

He is currently with Pegatron Corporation, Taiwan. His research interests are in channel coding, wireless communication, and networking.
