
Stochastic List Decoding of High-Density Parity-Check Codes


Academic year: 2021


National Chiao Tung University

Department of Communication Engineering

Doctoral Dissertation

Stochastic List Decoding of High-Density Parity-Check Codes

Student: Chang-Ming Lee (李昌明)

Advisor: Yu T. Su (蘇育德)


A Dissertation Submitted to the Institute of Communication Engineering, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Communication Engineering

Hsinchu, Taiwan


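Stochastic List Decoding of High-Density Parity-Check Codes

Student: Chang-Ming Lee    Advisor: Dr. Yu T. Su

Institute of Communication Engineering, National Chiao Tung University

Chinese Abstract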

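In this thesis, we study several stochastic decoding methods for linear block codes with high-density parity-check matrices. Our approach can be regarded as a randomized sphere decoding with a movable center, which selects candidate codewords near the center vector according to a spherically symmetric probability distribution. The center vector of this distribution is updated by a Monte Carlo technique called the Cross-Entropy method. In each iteration of the Cross-Entropy method, a meaningful set of random samples is generated and transformed into valid codewords. Based on the Euclidean distances between these randomly generated codewords and the received vector, we select the E best candidates to form an elite set, which is used to modify the probability distribution and thereby influence the random samples generated in subsequent iterations. To ensure that the newly generated random samples concentrate increasingly around the correct transmitted codeword, not only does the center vector move to the transmitted codeword, but the underlying probability distribution also eventually degenerates into a singular function that is nonzero only at the transmitted codeword. Moreover, the distribution parameters updated in each iteration should make the Kullback-Leibler distance between the new distribution and the optimal distribution progressively smaller.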

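We propose three classes of stochastic decoding algorithms in this thesis. The first two are designed specifically for (n, k) Reed-Solomon codes. In the first decoder, the generated random samples represent a set of random error locator vectors, each of which indicates n − k positions in the received word that should be erased. After erasing the indicated positions of the received word, we recover a candidate codeword with an erasures-only decoder. In the second method, n-dimensional real random vectors are generated to represent reliability vectors of the received word, and their n − k least reliable coordinates are assumed to be erasures. For each random sample, we make hard decisions on its k most reliable coordinates and recover a valid codeword with the erasures-only decoder. The third algorithm uses a sequential bit-flipping algorithm to convert the random sample vectors into valid codewords. It is worth noting that the first two algorithms apply only to maximum-distance separable (MDS) codes, whereas the third algorithm has no such restriction. Compared with the belief propagation (BP) algorithm and some existing algebraic algorithms, our algorithms offer improvements in both performance and complexity, especially for block codes with high rates and high-density parity-check matrices.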


Stochastic List Decoding of High-Density Parity-Check Codes

Student: Chang-Ming Lee    Advisor: Yu T. Su

Department of Communications Engineering

National Chiao Tung University

Abstract

In this thesis, we present several novel stochastic decoding algorithms for linear high-density parity-check (HDPC) codes. Our approach can be regarded as a randomized sphere decoding with a moving center that selects candidate codewords around a center vector according to a sphere-symmetric probability distribution. The center (median) vector of the distribution is updated by a Monte Carlo based approach called the Cross-Entropy (CE) method. In every iteration, the CE method produces a set of random samples, each of which can be transformed into a valid codeword. Based on the Euclidean distances between the received word and these random codewords, we select the best E candidates to form an elite set, which is then used to modify the probability distribution that governs the generation of the random samples in the ensuing iteration. To ensure that the newly generated samples concentrate more and more on a small neighborhood of the correct codeword, so that either the median vector moves to the transmitted codeword or the underlying distribution eventually degenerates to a singularity at it, the parameters of the updated distribution should be chosen such that the new distribution is closest to the optimal distribution in the sense of the Kullback-Leibler distance (i.e., CE).


We propose three classes of stochastic decoding algorithms in this thesis. The first two are specifically designed for decoding (n, k) Reed-Solomon (RS) codes. For the first decoder, the random samples represent a set of random error locator vectors, each indicating n − k possible erasure positions within the received word. We associate each error locator vector with a candidate codeword obtained by erasures-only (EO) decoding of the received word, with the erasure locations taken to be those indicated by the error locator vector. In the second algorithm, the n-dimensional real random vectors represent reliability vectors whose n − k least reliable coordinates are assumed to be erasures. For each sample, we make component-wise hard decisions on the k most reliable coordinates and EO-decode the resulting binary vector. The third algorithm uses a sequential bit-flipping procedure to convert each random sample into a legitimate codeword. The first two algorithms are valid for MDS codes only, while the third can be used to decode any linear block code. Our algorithms offer both complexity and performance advantages over BP and some existing algebraic decoding algorithms, especially for high-rate linear HDPC codes of short or medium lengths.


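Acknowledgements

Having spent some thirty years in this world without great faults but also without real merit, I am fortunate to leave this humble work behind so that my time does not pass entirely without a trace. This dissertation could be completed only thanks to the many years of attentive guidance of my advisor, Professor Yu T. Su. His example extends beyond the professional field to the way he treats people and handles affairs, from which I have benefited greatly.

I thank my labmates; our interactions added much color to my years as a doctoral student. I thank my wife 芳先 and my parents for their unconditional support, which allowed me to complete my studies.

I dedicate this dissertation to my maternal grandmother, who tirelessly raised me from infancy and who recently passed away, released from suffering. May your journey be peaceful.

Note: I am grateful to the Ericsson Scholarship for its generous support during my studies, which relieved me of financial worries and allowed me to concentrate on my studies.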


Contents

Chinese Abstract

English Abstract

Acknowledgements

Contents

List of Tables

List of Figures

1 Introduction

2 The Cross-Entropy Method

2.1 Introduction

2.2 The CE Method for Rare-Event Simulation

2.3 The CE-Method for Optimization Problem

2.4 Updating Rules of Some Useful Densities

3 Stochastic Erasure-Only List Decoding of RS codes

3.1 Preliminary

3.2 Stochastic List Decoding Algorithm

3.2.1 Algebraic Erasures-Only (EO) Decoding

3.2.2 A Stochastic List Decoding Idea

3.2.3 Convergence and Complexity

3.3 List Decoding via Erasure Location Estimation

3.3.1 Importance Density and Sample Format

3.3.2 Update Parameters

3.4 List Decoding via Virtual Received Words

3.4.1 Importance Density and Sample Format

3.4.2 Update Parameters

3.5 Simulation Results and Discussion

4 Stochastic List Decoding of Linear Block Codes

4.1 Preliminary

4.2 Sequential Bit-Flipping Algorithm

4.3 Predicament of Decoding via SBF algorithm

4.4 SBF Algorithm with Cyclic Shifts

4.5.1 Importance Density and Sample Format

4.5.2 Update Parameters

4.5.3 Stochastic Sequential Bit Flipping Algorithm

4.6 Simulation Results and Discussions

5 Conclusions and Future Works

A The Proof of Lemma 4.1

B The Proof of Theorem 4.1


List of Tables


List of Figures

1.1 An example of correct decoding by a bounded distance decoder.

1.2 An example of erroneous decoding for a bounded distance decoder.

1.3 Decoding failure by a bounded distance decoder.

1.4 Decoding beyond FEC bound by enlarging the decoding sphere.

1.5 Belief propagation - successful decoding.

1.6 Belief propagation - trapped in a pseudo codeword.

1.7 A set of random samples is generated; the random samples in the small dashed circle point in the better directions we want.

1.8 After the parameters of the random mechanism are updated, the new set of generated random samples points in the correct direction more often.

3.1 Idea of the algebraic erasures-only decoding.

3.2 Flow chart of a stochastic decoder for RS codes.

3.3 Virtual received words are generated around the received LLR vector Γ̄ by hard-limiting the sample vectors generated by an importance probability density whose parameter values evolve according to the CE principle.

3.4 Codeword error probability performance of the (15,11) Reed-Solomon code; 10 iterations.

3.5 Codeword error probability performance of the (31,25) Reed-Solomon code; 10 iterations.

4.2 Error rate performance of the (15,11) Hamming code; Ns = 10, Es = 1.

4.3 Error rate performance of the (7,5) RS code; Ns = 10, Es = 1.

4.4 Error rate performance of the (22,16) single-error-correcting code; Ns = 10, Es = 1.

4.5 Error rate performance of the (39,32) single-error-correcting code; Ns = 10, Es = 1.

4.6 Error rate performance of the (72,64) single-error-correcting code; Ns = 10, Es = 1.

4.7 Error rate performance of the (31,26) BCH code.

4.8 Error rate performance of the (15,11) RS code.


Chapter 1

Introduction

Linear block codes are popular forward error-correcting (FEC) codes due to their simple structures and satisfactory FEC performance. For instance, Reed-Solomon (RS) codes [1] are used in a wide variety of commercial applications, most prominently in CDs, DVDs, and Blu-ray discs, in data transmission technologies such as DSL and WiMAX, in broadcast systems such as DVB and ATSC, and in computer applications such as RAID 6 systems. Low-density parity-check (LDPC) codes [2] form another class of linear block codes, which offer FEC capability close to the theoretical maximum, the Shannon limit [3]. In recent years, LDPC codes have been adopted by several digital broadcast and communication standards such as DVB-S2 [4], IEEE 802.3an (10GBASE-T) [5], IEEE 802.16e (WiMAX) [6], and IEEE 802.11n (WiFi) [7]. Although many decoding algorithms for block codes are available, more efficient decoding algorithms that enhance performance and reduce complexity are still in high demand.

Most hard-decision decoding algorithms are bounded-distance decoders (BDDs). A BDD selects the codeword c, if it exists, whose Hamming distance (HD) d(z, c) to the hard-limited received word z is less than or equal to $t_{\min} = \lfloor (d_{\min} - 1)/2 \rfloor$, where $d_{\min}$ is the minimum distance of the code C. As shown in Fig. 1.1, if z lies within the decoding sphere centered at the transmitted codeword cT, then the BDD can correctly output

Figure 1.1: An example of correct decoding by a bounded distance decoder.

[10] and the Euclidean algorithm [11] for RS codes all belong to the class of BDDs. When z falls into another decoding sphere, e.g., a sphere centered at another legitimate codeword c1 as shown in Fig. 1.2, a BDD makes an incorrect decision and a decoding error occurs. A decoding failure is declared if z does not belong to any decoding sphere of radius $t_{\min}$ (see Fig. 1.3).
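As a concrete illustration of bounded-distance decoding, the following Python sketch decodes by brute force over a small codebook. It is a toy under stated assumptions: it uses a (7,4) Hamming code ($d_{\min} = 3$, so $t_{\min} = 1$) built from one common systematic generator matrix, and the helper names such as `bounded_distance_decode` are hypothetical, not taken from this thesis. Because the Hamming code is perfect, every word lies in some decoding sphere, so two channel errors produce a decoding error (Fig. 1.2) rather than a decoding failure.

```python
from itertools import product

def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def hamming74_codebook():
    """All 16 codewords of a (7,4) Hamming code (d_min = 3).

    The systematic generator matrix below is one common choice, assumed here
    only for illustration.
    """
    G = [
        (1, 0, 0, 0, 1, 1, 0),
        (0, 1, 0, 0, 1, 0, 1),
        (0, 0, 1, 0, 0, 1, 1),
        (0, 0, 0, 1, 1, 1, 1),
    ]
    book = []
    for msg in product((0, 1), repeat=4):
        word = [0] * 7
        for bit, row in zip(msg, G):
            if bit:
                word = [w ^ r for w, r in zip(word, row)]
        book.append(tuple(word))
    return book

def bounded_distance_decode(z, codebook, d_min):
    """Return the codeword within t_min = floor((d_min - 1) / 2) of z, else None.

    Decoding spheres of radius t_min are disjoint, so at most one codeword can
    qualify; returning None means a decoding failure is declared.
    """
    t_min = (d_min - 1) // 2
    for c in codebook:
        if hamming_distance(z, c) <= t_min:
            return c
    return None

book = hamming74_codebook()
sent = book[5]
one_err = tuple(b ^ (j == 2) for j, b in enumerate(sent))        # flip bit 2
two_err = tuple(b ^ (j in (2, 6)) for j, b in enumerate(sent))   # flip bits 2 and 6
print(bounded_distance_decode(one_err, book, 3) == sent)   # True: error corrected
print(bounded_distance_decode(two_err, book, 3) == sent)   # False: decoding error
```

Practical BDDs such as the Euclidean algorithm [11] avoid this exhaustive search; the sketch only makes the decision rule explicit.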

In general, a BDD can correct only up to $t_{\min} = \lfloor (d_{\min} - 1)/2 \rfloor$ errors, while a maximum-likelihood soft-decision decoding algorithm can correct many error patterns of weight beyond $t_{\min}$, at the expense of much higher complexity. There are two general approaches to improving the performance without incurring too much additional complexity. The first is to enlarge the decoding sphere (see Fig. 1.4) in order to correct errors beyond $t_{\min}$. For RS codes, errors-and-erasures decoding [9], Forney's generalized minimum distance (GMD) decoding [12], the algebraic list decoding algorithm invented by Guruswami and Sudan (GS) [13], and the algebraic soft-decision decoding (SDD) algorithm proposed by Koetter and Vardy (KV) [14] belong to this category. Note that the latter three algorithms are also members of the so-called list decoding algorithms, because the enlarged decoding sphere may include more than one codeword.

Figure 1.2: An example of erroneous decoding for a bounded distance decoder.

Figure 1.3: Decoding failure by a bounded distance decoder.

Figure 1.4: Decoding beyond FEC bound by enlarging the decoding sphere.

Another idea for performance enhancement is to sequentially modify z and move it from its original position so that the new location becomes closer and closer to cT. Decoding methods based on this idea include the Chase II algorithm [15] and the combined Chase II-GMD algorithm [16]. Belief propagation (BP) based algorithms, such as the sum-product algorithm (SPA), its less complex approximation the min-sum algorithm (MSA) [17], and their variations, are also members of this category. A successful BP decoding gradually updates the estimated soft output and moves the modified received vector toward the true transmitted codeword cT; see Fig. 1.5. Unfortunately, the BP process may be trapped in a local minimum, where the modified received vector coincides with a pseudo codeword cp, as shown in Fig. 1.6.

Figure 1.5: Belief propagation - successful decoding.

algorithms such as the annealed BP algorithm [18]. Another possible solution combines the BP algorithm with a BDD, as in the algorithms proposed in [8] and [19]: if the pseudo codeword cp lies within the decoding sphere of cT, decoding succeeds even though the BP algorithm makes z coincide with cp.

Figure 1.6: Belief propagation - trapped in a pseudo codeword.

In this thesis, we investigate a novel iterative decoding idea: a randomized sphere decoding with a moving center. If statistical information about the possible locations of the transmitted codeword cT around the received word z were available, the search should proceed along the most probable direction. Usually, however, such information is not available, and hence we search according to a probability distribution that is learned by random sampling. Each sample is transformed into a valid codeword, and we choose the samples whose corresponding codewords have smaller Euclidean distance (ED) to z to modify the distribution and update (move) z. As the iterations proceed, the newly generated samples concentrate more and more on a small neighborhood of the correct codeword, and the modified distribution becomes closer, in the Cross-Entropy (CE) sense, to the optimal (Dirac) distribution centered at the true transmitted codeword. The center thus moves closer to cT accordingly. This concept is implemented by the CE method [22], which has the following two phases:

1. Explore possible directions pointing along the shortest path to the transmitted codeword cT via a set of random samples generated by a specific random mechanism.

2. Choose the better directions to update the parameters of the random mechanism in order to find better directions in the next iteration.

Fig. 1.7 and Fig. 1.8 illustrate the basic principle of the above idea.

The rest of this thesis is organized as follows. Chapter 2 introduces the CE method, an elegant and practical principle for efficiently simulating rare events that can also be converted into an optimization solver. Chapter 3 presents a stochastic erasure-only list decoding (SEOLD) algorithm, which applies the CE method for optimization by treating the optimal event as a rare event. In Chapter 4, we investigate another stochastic list decoding algorithm based on a novel sequential bit-flipping procedure. Finally, we summarize our major contributions and suggest some future work in Chapter 5.

Figure 1.7: A set of random samples is generated; the random samples in the small dashed circle point in the better directions we want.

Figure 1.8: After the parameters of the random mechanism are updated, the new set of generated random samples points in the correct direction more often.


Chapter 2

The Cross-Entropy Method

The cross-entropy (CE) method was originally developed as an adaptive algorithm for rare-event simulation based on variance minimization [20]. It was soon modified into a randomized optimization technique [21], in which the original variance minimization program was replaced by an associated CE minimization problem. In this chapter we summarize the basic concepts of this simple, efficient, and general method; more detailed treatments can be found in [22].

2.1 Introduction

In the field of rare-event simulation, the CE method is used in conjunction with importance sampling (IS), a well-known variance reduction technique in which the system is simulated under a different set of parameters, called the reference parameters (or, equivalently, under a different probability distribution), so as to make the occurrence of the rare event more likely. A major drawback of the conventional IS technique is that the optimal reference parameters are usually very difficult to obtain. Traditional techniques for estimating the optimal reference parameters [23] typically involve time-consuming variance minimization programs. The advantage of the CE method is that it provides a simple and fast adaptive procedure for estimating the optimal reference parameters in IS.


can be readily applied by first translating the underlying optimization problem into an associated estimation problem, named the associated stochastic problem (ASP), which typically involves rare-event estimation. Estimating the rare-event probability and the associated optimal reference parameter for the ASP via the CE method effectively translates back into solving the original optimization problem.

In general, the CE algorithm is an iterative procedure, each iteration of which consists of the following two phases.

• Generate samples from the importance density specified by the parameters obtained in the previous iteration.

• Update the parameters for the next iteration according to the ordering of the score values of the drawn samples and the minimum-CE criterion.

The significance of the CE concept is that it defines a precise mathematical framework for deriving fast and good updating/learning rules.

2.2 The CE Method for Rare-Event Simulation

In this section, the basic idea behind the CE algorithm for rare-event simulation is illustrated. Let x be a random vector taking values in some space $\mathcal{X}$. Let $\{f(\cdot; v)\}$ be a family of probability density functions (pdfs) on $\mathcal{X}$ with respect to some base measure µ, where v is a real-valued parameter (vector). Therefore,

$$\mathbb{E}_v[H(x)] = \int_{\mathcal{X}} H(x)\, f(x; v)\, \mu(dx) \tag{2.1}$$

for any function H. For simplicity, for the rest of this section we write µ(dx) = dx, since in most cases µ is either the counting measure or the Lebesgue measure.

Let S be some real-valued function on $\mathcal{X}$. Suppose we are interested in the probability that S(x) is greater than or equal to some real number γ under f(x; u). This probability can be expressed as

$$\ell = P_u\big(S(x) \ge \gamma\big) = \mathbb{E}_u\, I_{\{S(x) \ge \gamma\}} \tag{2.2}$$

If this probability is very small, say smaller than $10^{-5}$, we call $\{S(x) \ge \gamma\}$ a rare event. A straightforward way to estimate ℓ is to use crude Monte Carlo simulation: draw a random sample $x_1, \ldots, x_N$ from f(x; u); then

$$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} \tag{2.3}$$

is an unbiased estimator of ℓ. However, this poses serious problems when $\{S(x) \ge \gamma\}$ is a rare event, since a large simulation effort is required to estimate ℓ accurately, that is, with a small relative error or a narrow confidence interval.

An alternative is based on importance sampling: take a random sample $x_1, \ldots, x_N$ from an importance sampling density g on $\mathcal{X}$, and estimate ℓ using the likelihood ratio (LR) estimator

$$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} \frac{f(x_i; u)}{g(x_i)} \tag{2.4}$$

The best way to estimate ℓ is to use the change of measure with density

$$g^{*}(x) = \frac{I_{\{S(x) \ge \gamma\}}\, f(x; u)}{\ell} \tag{2.5}$$

Using this change of measure in (2.4), we have

$$I_{\{S(x_i) \ge \gamma\}} \frac{f(x_i; u)}{g^{*}(x_i)} = \ell \tag{2.6}$$

for all i. Since ℓ is a constant, the estimator (2.4) has zero variance, and we need to produce only N = 1 sample.
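To make the contrast between (2.3) and (2.4) concrete, here is a small Python sketch for the toy case S(x) = x with x ~ N(0, 1), so that ℓ = P(x ≥ γ) is a Gaussian tail probability. The importance density g = N(shift, 1) is an assumed, hand-picked choice, not the zero-variance g* of (2.5), which would require knowing ℓ; the function names are illustrative only.

```python
import math
import random

def crude_mc(gamma, n, rng):
    """Crude Monte Carlo estimator (2.3) with S(x) = x and x ~ f(x; u) = N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) >= gamma for _ in range(n)) / n

def importance_sampling(gamma, n, rng, shift):
    """LR estimator (2.4) with the assumed importance density g = N(shift, 1).

    For unit-variance Gaussians the likelihood ratio is
    f(x; 0) / g(x) = exp(-shift * x + shift**2 / 2).
    """
    total = 0.0
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        if x >= gamma:
            total += math.exp(-shift * x + 0.5 * shift ** 2)
    return total / n

gamma = 4.5
exact = 0.5 * math.erfc(gamma / math.sqrt(2.0))   # P(x >= gamma) for N(0, 1)
print(exact)
print(crude_mc(gamma, 10_000, random.Random(1)))
print(importance_sampling(gamma, 10_000, random.Random(2), shift=gamma))
```

With γ = 4.5 and N = 10^4, the crude estimator usually returns 0 because the event has probability of only about 3.4 × 10^{-6}, while the mean-shifted IS estimator typically lands within a few percent of the exact value.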

The obvious difficulty is that g* depends on the unknown parameter ℓ. Moreover, it is often convenient to choose g in the family of densities $\{f(\cdot; v)\}$. The idea now is to choose the reference parameter v such that the distance between the density g* above and f(x; v) is minimal. A particularly convenient measure of distance between two densities g and h is the Kullback-Leibler (KL) distance, defined as

$$D(g, h) = \mathbb{E}_g\left[\ln \frac{g(x)}{h(x)}\right] = \int g(x) \ln g(x)\, dx - \int g(x) \ln h(x)\, dx \tag{2.7}$$

which is also termed the cross-entropy (CE) between g and h.
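The KL distance (2.7) is easy to evaluate for discrete densities, where the integrals become sums. The following Python sketch, with the illustrative names `kl_distance` and `cross_term`, computes it and also checks numerically that, within an assumed Bernoulli family $h_v = (1 - v, v)$, the minimizer of $D(g, h_v)$ coincides with the maximizer of $\sum_x g(x) \ln h_v(x)$, because the first term of (2.7) does not depend on v.

```python
import math

def kl_distance(g, h):
    """Discrete version of (2.7): D(g, h) = sum_x g(x) ln(g(x) / h(x))."""
    d = 0.0
    for gx, hx in zip(g, h):
        if gx == 0.0:
            continue                 # 0 * ln(0 / h) is taken as 0
        if hx == 0.0:
            return math.inf          # g puts mass where h has none
        d += gx * math.log(gx / hx)
    return d

def cross_term(g, h):
    """The v-dependent part of (2.7): sum_x g(x) ln h(x)."""
    return sum(gx * math.log(hx) for gx, hx in zip(g, h) if gx > 0.0)

g = [0.3, 0.7]                              # target density on {0, 1}
grid = [i / 100 for i in range(1, 100)]     # candidate Bernoulli parameters v
v_kl = min(grid, key=lambda v: kl_distance(g, [1 - v, v]))
v_ce = max(grid, key=lambda v: cross_term(g, [1 - v, v]))
print(v_kl, v_ce)   # both pick v = 0.7
```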

Since the first term in (2.7) does not depend on v, minimizing the Kullback-Leibler distance between g* in (2.5) and f(x; v) is equivalent to solving the maximization problem

$$\max_{v} \int g^{*}(x) \ln f(x; v)\, dx \tag{2.8}$$

Substituting g* from (2.5) into (2.8), we obtain the maximization program

$$\max_{v} \int \frac{I_{\{S(x) \ge \gamma\}}\, f(x; u)}{\ell} \ln f(x; v)\, dx \tag{2.9}$$

which is equivalent to the program

$$\max_{v} D(v) = \max_{v} \mathbb{E}_u\left[I_{\{S(x) \ge \gamma\}} \ln f(x; v)\right] \tag{2.10}$$

where D is implicitly defined above. Again using importance sampling, with a change of measure f(x; w), we can rewrite (2.10) as

$$\max_{v} D(v) = \max_{v} \mathbb{E}_w\left[I_{\{S(x) \ge \gamma\}} W(x; u, w) \ln f(x; v)\right] \tag{2.11}$$

for any reference parameter w, where

$$W(x; u, w) = \frac{f(x; u)}{f(x; w)} \tag{2.12}$$

is the likelihood ratio between f(x; u) and f(x; w). The optimal solution of (2.11) can be written as

$$v^{*} = \arg\max_{v} \mathbb{E}_w\left[I_{\{S(x) \ge \gamma\}} W(x; u, w) \ln f(x; v)\right] \tag{2.13}$$

We may estimate v* by solving the following stochastic program

$$\max_{v} \hat{D}(v) = \max_{v} \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} W(x_i; u, w) \ln f(x_i; v) \tag{2.14}$$

where $x_1, \ldots, x_N$ is a random sample from f(x; w). In typical applications the function $\hat{D}$ in (2.14) is concave and differentiable with respect to v, in which case the solution of (2.14) may be readily obtained by solving the following system of equations:

$$\frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} W(x_i; u, w)\, \nabla \ln f(x_i; v) = 0 \tag{2.15}$$


The advantage of this approach is that the solution of (2.15) can often be calculated analytically. In particular, this happens if the distributions of the random variables belong to a natural exponential family (NEF).

Note that the CE programs (2.14) and (2.15) are useful only if the probability of the target event $\{S(x) \ge \gamma\}$ is not too small under w, say greater than $10^{-5}$. For rare-event probabilities, because the events $\{S(x_i) \ge \gamma\}$ are rare, most of the indicator random variables $I_{\{S(x_i) \ge \gamma\}}$, i = 1, ..., N, will be zero for moderate N, which makes programs (2.14) and (2.15) difficult to carry out. A multilevel algorithm can be used to overcome this difficulty. The basic idea is to construct a sequence of reference parameters $\{v_t, t \ge 0\}$ and a sequence of levels $\{\gamma_t, t \ge 1\}$, and to iterate in both $v_t$ and $\gamma_t$.

We initialize by choosing a not very small ρ, say ρ = $10^{-2}$, and by defining $v_0 = u$. Next, we let $\gamma_1$ ($\gamma_1 < \gamma$) be such that, under the original density f(x; u), the probability $\ell_1 = \mathbb{E}_u I_{\{S(x) \ge \gamma_1\}}$ is at least ρ. We then let $v_1$ be the optimal CE reference parameter for estimating $\ell_1$, and repeat the last two steps iteratively with the goal of estimating the pair $\{\ell, v^{*}\}$. In other words, each iteration of the algorithm consists of two main phases: in the first phase $\gamma_t$ is updated, and in the second $v_t$ is updated. Specifically, starting with $v_0 = u$, we obtain the subsequent $\gamma_t$ and $v_t$ as follows:

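1. Adaptive updating of $\gamma_t$. For a fixed $v_{t-1}$, let $\gamma_t$ be a (1 − ρ)-quantile of S(x) under $v_{t-1}$. That is, $\gamma_t$ satisfies

$$P_{v_{t-1}}\big(S(x) \ge \gamma_t\big) \ge \rho \tag{2.16}$$

$$P_{v_{t-1}}\big(S(x) \le \gamma_t\big) \ge 1 - \rho \tag{2.17}$$

where $x \sim f(x; v_{t-1})$.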

A simple estimator $\hat{\gamma}_t$ of $\gamma_t$ can be obtained by drawing a random sample $x_1, \ldots, x_N$ from $f(x; v_{t-1})$, calculating the performances $S(x_i)$ for all i, ordering them from smallest to largest, $S_{(1)} \le \cdots \le S_{(N)}$, and finally evaluating the sample (1 − ρ)-quantile as

$$\hat{\gamma}_t = S_{(\lceil (1-\rho) N \rceil)} \tag{2.18}$$

Note that $S_{(j)}$ is called the j-th order statistic of the sequence $S(x_1), \ldots, S(x_N)$. Note also that $\hat{\gamma}_t$ is chosen such that the event $\{S(x) \ge \hat{\gamma}_t\}$ is not too rare (it has a probability of around ρ), and therefore updating the reference parameter via a CE program such as (2.20) is not void of meaning.

2. Adaptive updating of $v_t$. For fixed $\gamma_t$ and $v_{t-1}$, derive $v_t$ from the solution of the following CE program:

$$\max_{v} D(v) = \max_{v} \mathbb{E}_{v_{t-1}}\left[I_{\{S(x) \ge \gamma_t\}} W(x; u, v_{t-1}) \ln f(x; v)\right] \tag{2.19}$$

The stochastic counterpart of the above program is as follows: for fixed $\hat{\gamma}_t$ and $\hat{v}_{t-1}$, derive $\hat{v}_t$ from the solution of the following program:

$$\max_{v} \hat{D}(v) = \max_{v} \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \hat{\gamma}_t\}} W(x_i; u, \hat{v}_{t-1}) \ln f(x_i; v) \tag{2.20}$$

Thus, at the first iteration, starting with $\hat{v}_0 = u$, to get a good estimate for $\hat{v}_1$, the target event is artificially made less rare by (temporarily) using a level $\hat{\gamma}_1$ chosen smaller than γ. The value of $\hat{v}_1$ obtained in this way will (hopefully) make the event $\{S(x) \ge \gamma\}$ less rare in the next iteration, so that in the next iteration a value $\hat{\gamma}_2$ closer to γ itself can be used. The algorithm terminates when, at some iteration t, a level is reached that is at least γ, so that the original value of γ can be used without getting too few samples.

The above rationale results in the following algorithm:

1. Define $\hat{v}_0 = u$. Set t = 1.

2. Generate a sample $x_1, \ldots, x_N$ from the density $f(x; \hat{v}_{t-1})$ and compute the sample (1 − ρ)-quantile $\hat{\gamma}_t$ of the performances according to (2.18), provided $\hat{\gamma}_t$ is less than γ; otherwise set $\hat{\gamma}_t = \gamma$.

3. Use the same sample $x_1, \ldots, x_N$ to solve the stochastic program (2.20). Denote the solution by $\hat{v}_t$.

4. If $\hat{\gamma}_t < \gamma$, set t = t + 1 and reiterate from Step 2. Otherwise, proceed with Step 5.

5. Estimate the rare-event probability ℓ using the LR estimate

$$\hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} W(x_i; u, \hat{v}_T) \tag{2.21}$$

where T denotes the final number of iterations.
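The five steps above can be sketched in Python for the textbook case x ~ N(u, 1), S(x) = x, with the IS family assumed to be f(x; v) = N(v, 1), v being the mean. For this family, since ln f(x; v) = −(x − v)²/2 + const, program (2.20) is maximized in closed form by the likelihood-ratio-weighted mean of the samples with $S(x_i) \ge \hat{\gamma}_t$. The Gaussian family, the choice ρ = 0.1 (larger than the $10^{-2}$ suggested above, to keep the toy run short), the fresh sample drawn in Step 5, and the function name are all assumptions for illustration.

```python
import math
import random

def ce_rare_event(gamma, u=0.0, n=2000, rho=0.1, seed=0, max_iter=100):
    """Multilevel CE estimate of l = P_u(S(x) >= gamma) for x ~ N(u, 1), S(x) = x.

    Assumed IS family: f(x; v) = N(v, 1). Returns (estimate, final v_T).
    """
    rng = random.Random(seed)

    def lr(x, v):
        # W(x; u, v) = f(x; u) / f(x; v) for unit-variance Gaussians
        return math.exp(-0.5 * (x - u) ** 2 + 0.5 * (x - v) ** 2)

    v = u                                            # Step 1: v_0 = u
    for _ in range(max_iter):
        xs = [rng.gauss(v, 1.0) for _ in range(n)]   # Step 2: sample f(x; v_{t-1})
        k = min(n, math.ceil((1 - rho) * n))         # order-statistic index in (2.18)
        gamma_t = min(sorted(xs)[k - 1], gamma)      # cap the level at gamma
        # Step 3: for this family, (2.20) is solved by the LR-weighted mean
        # of the samples with S(x_i) >= gamma_t.
        num = den = 0.0
        for x in xs:
            if x >= gamma_t:
                w = lr(x, v)
                num += w * x
                den += w
        v = num / den
        if gamma_t >= gamma:                         # Step 4: level gamma reached
            break
    # Step 5: LR estimate (2.21); here a fresh sample from f(x; v_T) is drawn.
    xs = [rng.gauss(v, 1.0) for _ in range(n)]
    est = sum(lr(x, v) for x in xs if x >= gamma) / n
    return est, v

est, v_T = ce_rare_event(4.0)
print(est, 0.5 * math.erfc(4.0 / math.sqrt(2.0)), v_T)
```

For γ = 4 the estimate can be checked against the exact tail probability 0.5·erfc(4/√2) ≈ 3.17 × 10^{-5}; with these settings the level sequence reaches γ within a few iterations.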

2.3 The CE-Method for Optimization Problem

Consider the following general maximization problem. Let $\mathcal{X}$ be a finite set of states, and let S be a real-valued performance function on $\mathcal{X}$. We wish to find the maximum of S over $\mathcal{X}$ and the corresponding state at which this maximum is attained. Let us denote the maximum by γ*. Thus,

$$S(x^{*}) = \gamma^{*} = \max_{x \in \mathcal{X}} S(x) \tag{2.22}$$

The starting point in the methodology of the CE method is to associate a meaningful estimation problem with the optimization problem (2.22). To this end we define a collection of indicator functions $\{I_{\{S(x) \ge \gamma\}}\}$ on $\mathcal{X}$ for various levels $\gamma \in \mathbb{R}$. Next, let $\{f(\cdot; v), v \in \mathcal{V}\}$ be a family of (discrete) probability densities on $\mathcal{X}$, parameterized by a real-valued parameter (vector) v. For a certain $u \in \mathcal{V}$ we associate with (2.22) the problem of estimating the number

$$\ell(\gamma) = P_u\big(S(x) \ge \gamma\big) = \sum_{x} I_{\{S(x) \ge \gamma\}} f(x; u) = \mathbb{E}_u I_{\{S(x) \ge \gamma\}} \tag{2.23}$$

where $P_u$ is the probability measure under which the random state x has probability density function (pdf) f(x; u), and $\mathbb{E}_u$ denotes the corresponding expectation operator.

We will call the estimation problem (2.23) the associated stochastic problem (ASP). To indicate how (2.23) is associated with (2.22), suppose for example that γ is equal to γ* and that f(x; u) is the uniform density on $\mathcal{X}$. Note that, typically, $\ell(\gamma^{*}) = f(x^{*}; u) = 1/|\mathcal{X}|$, where $|\mathcal{X}|$ denotes the number of elements in $\mathcal{X}$, is a very small number. Thus, for γ = γ* a natural way to estimate ℓ(γ) would be to use the LR estimator (2.21) with reference parameter v* given by

$$v^{*} = \arg\max_{v} \mathbb{E}_u\left[I_{\{S(x) \ge \gamma\}} \ln f(x; v)\right] \tag{2.24}$$

This parameter could be estimated by

$$\hat{v}^{*} = \arg\max_{v} \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} \ln f(x_i; v) \tag{2.25}$$

where the $x_i$ are generated from the pdf f(x; u). It is plausible that, if γ is close to γ*, then f(x; v*) assigns most of its probability mass close to x*, and thus can be used to generate an approximate solution to (2.22). However, it is important to note that the estimator (2.25) is of practical use only when $I_{\{S(x) \ge \gamma\}} = 1$ for enough samples. This means, for example, that when γ is close to γ*, u needs to be such that $P_u(S(x) \ge \gamma)$ is not too small. Thus, the choices of u and γ are closely related. On the one hand, we would like to choose γ as close as possible to γ* and find (an estimate of) v* via the procedure above, which assigns almost all mass to state(s) close to the optimal state. On the other hand, we would like to keep ℓ(γ) relatively large in order to obtain an accurate estimator for v*.

The situation is very similar to the rare-event simulation case. The idea is to adopt a two-phase multilevel approach in which we simultaneously construct a sequence of levels $\hat{\gamma}_1, \hat{\gamma}_2, \ldots, \hat{\gamma}_T$ and parameter (vectors) $\hat{v}_0, \hat{v}_1, \ldots, \hat{v}_T$ such that $\hat{\gamma}_T$ is close to the optimal γ* and $\hat{v}_T$ is such that the corresponding density assigns high probability mass to the collection of states that give a high performance. This strategy is embodied in the following procedure:

1. Define $\hat{v}_0 = u$. Set t = 1.

2. Generate a sample $x_1, \ldots, x_N$ from the density $f(x; \hat{v}_{t-1})$ and compute the sample (1 − ρ)-quantile $\hat{\gamma}_t$ of the performances according to (2.18).

3. Use the same sample $x_1, \ldots, x_N$ and solve the stochastic program (2.20) with W = 1. Denote the solution by $\hat{v}_t$.

5. If for some t ≥ d, say d = 5,

$$\hat{\gamma}_t = \hat{\gamma}_{t-1} = \cdots = \hat{\gamma}_{t-d} \tag{2.26}$$

then stop (let T denote the final iteration); otherwise set t = t + 1 and reiterate from Step 2.

Note that the initial vector $\hat{v}_0$, the sample size N, the stopping parameter d, and the number ρ have to be specified in advance.
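The optimization procedure above can be sketched in Python for a binary state space $\mathcal{X} = \{0,1\}^n$ with an independent Bernoulli family, whose CE update with W = 1 is the elite-sample frequency (2.30) derived in Section 2.4. The toy score (number of positions agreeing with a hidden target vector), the default values of N, ρ, and d, and all function names are assumptions for illustration; no smoothing of p is applied.

```python
import math
import random

def ce_maximize_bernoulli(score, n_bits, n_samples=200, rho=0.1, d=5,
                          seed=0, max_iter=500):
    """CE maximization of score over {0,1}^n_bits with an independent Bernoulli
    family (W = 1). Returns the best state seen and its score."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                     # Step 1: v_0 = u (uniform on X)
    levels = []
    best_x, best_s = None, -math.inf
    for _ in range(max_iter):
        # Step 2: sample and compute the (1 - rho) sample quantile (2.18)
        xs = [[1 if rng.random() < pj else 0 for pj in p] for _ in range(n_samples)]
        scores = [score(x) for x in xs]
        k = min(n_samples, math.ceil((1 - rho) * n_samples))
        gamma_t = sorted(scores)[k - 1]
        # Step 3: CE update with W = 1; for Ber(p) this is rule (2.30)
        elite = [x for x, s in zip(xs, scores) if s >= gamma_t]
        p = [sum(x[j] for x in elite) / len(elite) for j in range(n_bits)]
        for x, s in zip(xs, scores):
            if s > best_s:
                best_x, best_s = x, s
        # Stopping rule (2.26): the level has not changed for d iterations
        levels.append(gamma_t)
        if len(levels) > d and len(set(levels[-(d + 1):])) == 1:
            break
    return best_x, best_s

target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # hypothetical hidden state

def agree(x):
    """Toy performance S(x): number of positions where x matches target."""
    return sum(a == b for a, b in zip(x, target))

x_best, s_best = ce_maximize_bernoulli(agree, len(target))
print(s_best, x_best == target)
```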

The above procedure can, in principle, be applied to any discrete or continuous optimization problem. For each individual problem, two essential ingredients need to be supplied:

1. We need to specify how the samples are generated. In other words, we need to specify the family of densities {f(·; v)}.

2. We need to calculate the updating rules for the parameters, based on cross-entropy minimization.

In general there are many ways to generate samples from X , and it is not always immediately clear which way of generating the sample will yield better results or easier updating formulas.

2.4 Updating Rules of Some Useful Densities

In this section we will derive the updating rules for two pdfs which are commonly used for the CE method. The first one is the Bernoulli distribution and the second is the Gaussian distribution.


Suppose the random vector $x_i = (x_{i1}, \ldots, x_{in}) \sim \mathrm{Ber}(p)$, where Ber(p) is the Bernoulli distribution with parameter $p = (p_1, \ldots, p_n)$. Consequently, the pdf is

$$f(x_i; p) = \prod_{j=1}^{n} p_j^{x_{ij}} (1 - p_j)^{1 - x_{ij}} \tag{2.27}$$

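and since each $x_{ij}$ can only be 0 or 1,

$$\frac{\partial}{\partial p_j} \ln f(x_i; p) = \frac{x_{ij}}{p_j} - \frac{1 - x_{ij}}{1 - p_j} = \frac{x_{ij} - p_j}{(1 - p_j)\, p_j} \tag{2.28}$$

Now we can find the maximum in (2.20) (with W = 1) by setting the first derivatives with respect to $p_j$ equal to zero, for j = 1, ..., n:

$$\frac{\partial}{\partial p_j} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} \ln f(x_i; p) = \frac{1}{(1 - p_j)\, p_j} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} (x_{ij} - p_j) = 0 \tag{2.29}$$

Thus, we get the updating rule

$$p_j = \frac{\sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}}\, x_{ij}}{\sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}}} \tag{2.30}$$

Next, consider the Gaussian density

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R} \tag{2.31}$$

The optimal solution of (2.20) (with W = 1) follows from the minimization of

$$\frac{1}{\sigma^2} \sum_{i=1}^{N} I_i (x_i - \mu)^2 + \ln(\sigma^2) \sum_{i=1}^{N} I_i \tag{2.32}$$

where $I_i = I_{\{S(x_i) \ge \gamma\}}$. It is easily seen that this minimum is obtained at $(\hat{\mu}, \hat{\sigma}^2)$ given by

$$\hat{\mu} = \frac{\sum_{i=1}^{N} I_i x_i}{\sum_{i=1}^{N} I_i} \tag{2.33}$$

and

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} I_i (x_i - \hat{\mu})^2}{\sum_{i=1}^{N} I_i} \tag{2.34}$$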
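For a continuous state space, the Gaussian updates (2.33) and (2.34) give an equally short CE optimizer. The Python sketch below maximizes a one-dimensional score; the score S(x) = −(x − 2)², the initial (µ, σ), the tolerance-based stopping rule (used here in place of (2.26)), and the function names are assumptions for illustration.

```python
import math
import random

def ce_maximize_gaussian(score, mu=0.0, sigma=10.0, n_samples=100, rho=0.1,
                         tol=1e-6, seed=0, max_iter=1000):
    """CE maximization of score over the real line with a N(mu, sigma^2) family.

    Elite samples (S(x_i) >= gamma_t) drive the updates (2.33) and (2.34); the
    run stops once sigma falls below tol (an assumed stopping rule).
    """
    rng = random.Random(seed)
    for _ in range(max_iter):
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        scores = [score(x) for x in xs]
        k = min(n_samples, math.ceil((1 - rho) * n_samples))
        gamma_t = sorted(scores)[k - 1]                            # (2.18)
        elite = [x for x, s in zip(xs, scores) if s >= gamma_t]    # I_i = 1
        mu = sum(elite) / len(elite)                               # (2.33)
        var = sum((x - mu) ** 2 for x in elite) / len(elite)       # (2.34)
        sigma = math.sqrt(var)
        if sigma < tol:
            break
    return mu

def quadratic(x):
    """Hypothetical score with maximizer x* = 2."""
    return -(x - 2.0) ** 2

print(ce_maximize_gaussian(quadratic))
```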


Chapter 3

Stochastic Erasure-Only List Decoding of RS codes

In this chapter, we apply the Cross-Entropy (CE) method [22] to develop a Monte Carlo based iterative SDD algorithm that yields improved algebraic SDD performance. The CE method is an elegant practical principle for simulating rare events, which approximates the probability of the rare event by means of a family of parameterized probabilistic models. Our stochastic erasure-only list decoding (SEOLD) algorithm uses the CE method extended to optimization problems, treating the optimal event as a rare event.

3.1 Preliminary

Let C be an (n, k) RS code over GF($2^m$) with minimum Hamming distance $d_{\min} = n - k + 1$. Let $c = (c_0, \ldots, c_{n-1})$ be a codeword in C. For binary transmission, every code symbol must be expanded into binary form with symbols from GF(2) = {0, 1}. Let α be a primitive element of GF($2^m$); then the ith symbol $c_i$ can be uniquely represented by the binary m-tuple $c_i^{(b)} = (c_{i,0}, \ldots, c_{i,m-1})$, where $c_i = c_{i,0}\alpha^0 + \cdots + c_{i,m-1}\alpha^{m-1}$ and $c_{i,j} \in GF(2)$. Therefore, the codeword c can be uniquely mapped into the binary expansion vector

$$\bar{c} = (c_0^{(b)}, c_1^{(b)}, \ldots, c_{n-1}^{(b)}) = (\bar{c}_0, \bar{c}_1, \ldots, \bar{c}_{nm-1}).$$


codeword $\bar{c}$ into the bipolar vector

$$\Psi(\bar{c}) = \bar{x} = (\bar{x}_0, \ldots, \bar{x}_{nm-1}), \quad \bar{x}_j = \Psi(\bar{c}_j) = (-1)^{\bar{c}_j} \tag{3.1}$$

and sends it over an additive white Gaussian noise (AWGN) channel with zero mean and power spectral density $N_0/2$. The received sequence at the output of the matched filter is $\bar{y} = (\bar{y}_0, \ldots, \bar{y}_{nm-1})$, where $\bar{y}_j = \bar{x}_j + \bar{w}_j$ and the $\bar{w}_j$ are statistically independent Gaussian random variables with zero mean and variance $N_0/2$.

Let ¯z = (¯z0,· · · , ¯znm−1) be the hard decision binary vector of the received bit sequence

¯ y, i.e., ¯ zj = ( 0, ¯yj > 0 1, otherwise (3.2) and z = (z0,· · · , zn−1) be the corresponding symbol vector. Denoted by ¯Γ = (¯γ1,· · · , ¯γnm−1)

the reliability vector of ¯y in which ¯γj is the magnitude of the log-likelihood ratio (LLR)

associated with the corresponding hard-limited bit ¯zj

L (¯cj) = log

P ( ¯cj = 0| ¯y)

P ( ¯cj = 1| ¯y)

, (3.3)

and define the symbol reliability vector $\Gamma = (\gamma_0, \dots, \gamma_{n-1})$ of $z$ by

$$\gamma_i = \min_{j \in \{im, \dots, (i+1)m-1\}} \bar\gamma_j. \tag{3.4}$$
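As a minimal sketch of (3.2)-(3.4), assume the BPSK mapping 0 → +1 of (3.1), equiprobable bits and noise variance $N_0/2$, so that the LLR (3.3) reduces to $4\bar y_j/N_0$. The function names below are illustrative only.

```python
def bit_llrs(y, n0):
    """Bit LLRs (3.3): log P(c=0|y)/P(c=1|y) = 4*y/N0 for BPSK over AWGN (variance N0/2)."""
    return [4.0 * v / n0 for v in y]


def hard_decisions(y):
    """Hard decisions (3.2): bit 0 if y_j > 0, bit 1 otherwise."""
    return [0 if v > 0 else 1 for v in y]


def symbol_reliabilities(bit_rel, m):
    """Symbol reliabilities (3.4): smallest bit reliability among the m bits of each symbol."""
    n = len(bit_rel) // m
    return [min(bit_rel[i * m:(i + 1) * m]) for i in range(n)]


# Example: two GF(4) symbols (m = 2) observed with N0 = 2.
y = [0.9, -0.2, 1.1, 0.4]
gamma_bar = [abs(v) for v in bit_llrs(y, 2.0)]   # approximately [1.8, 0.4, 2.2, 0.8]
print(hard_decisions(y))                         # [0, 1, 0, 0]
print(symbol_reliabilities(gamma_bar, 2))        # approximately [0.4, 0.8]
```

The symbol reliability is only as good as its weakest bit, which is why the minimum is used.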

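Assume that each symbol $c_i$ of $c$ is uniformly distributed over GF($2^m$) and that the $n$ symbols are independent. Then $P(c_i = \beta \mid \bar y)$, the probability that $c_i = \beta$ was transmitted given the observation $\bar y$, can be easily evaluated [14]:

$$P(c_i = \beta \mid \bar y) = P(c_i = \beta \mid \bar y_i) = \frac{P(\bar y_i \mid c_i = \beta)\, P(c_i = \beta)}{\sum_{\omega \in \mathrm{GF}(2^m)} P(\bar y_i \mid c_i = \omega)\, P(c_i = \omega)} = \frac{P(\bar y_i \mid c_i = \beta)}{\sum_{\omega \in \mathrm{GF}(2^m)} P(\bar y_i \mid c_i = \omega)} \tag{3.5}$$

where

$$\bar y_i = (\bar y_{im}, \bar y_{im+1}, \dots, \bar y_{im+m-1}), \qquad P(\bar y_i \mid c_i = \beta) = \prod_{j=0}^{m-1} P(\bar y_{im+j} \mid \bar c_{im+j} = \beta_j), \qquad \beta = \beta_0\alpha^0 + \dots + \beta_{m-1}\alpha^{m-1}.$$

The $q \times n$ matrix $R = [R_{\beta,i}]$ with $R_{\beta,i} = P(c_i = \beta \mid \bar y)$, $q = 2^m$, will be referred to as the reliability matrix of the received vector $\bar y$.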
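A sketch of how the reliability matrix $R$ of (3.5) could be filled in, assuming the same BPSK/AWGN model, equiprobable symbols, and that the integer index `beta` stores the coefficient $\beta_j$ of $\alpha^j$ in its bit $j$ (an illustrative convention). The Gaussian normalizing constants are identical for every $\beta$ and cancel in (3.5), so only the exponents are kept.

```python
import math


def reliability_matrix(y, m, n0):
    """q x n matrix with R[beta][i] = P(c_i = beta | y), following (3.5)."""
    q, n = 2 ** m, len(y) // m
    R = [[0.0] * n for _ in range(q)]
    for i in range(n):
        block = y[i * m:(i + 1) * m]              # the m observations of symbol i
        likelihoods = []
        for beta in range(q):
            p = 1.0
            for j, v in enumerate(block):         # product over the m bits of beta
                x = -1.0 if (beta >> j) & 1 else 1.0
                p *= math.exp(-(v - x) ** 2 / n0) # Gaussian kernel, variance N0/2
            likelihoods.append(p)
        total = sum(likelihoods)                  # denominator of (3.5)
        for beta in range(q):
            R[beta][i] = likelihoods[beta] / total
    return R


R = reliability_matrix([1.0, -1.0], m=2, n0=1.0)
print(max(range(4), key=lambda b: R[b][0]))       # 2, i.e. bits (beta_0, beta_1) = (0, 1)
```

Each column of $R$ sums to one, which the distrust function of Section 3.3 relies on implicitly.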

3.2 Stochastic List Decoding Algorithm

3.2.1 Algebraic Erasures-Only (EO) Decoding

It is well known that RS codes are maximum-distance separable (MDS), which implies that any $k$ coordinates (symbols) of an RS codeword determine the remaining $n-k$ symbols. Hence it suffices to identify either $k$ correct (message) coordinates or, equivalently, $n-k$ incorrect (error) coordinates of a codeword. Let $E_L$ be the collection of all combinations of $n-k$ error coordinates,

$$E_L = \left\{ s = (s_0, \dots, s_{n-1}) \;\middle|\; s_i \in \{0, 1\},\ \sum_i s_i = n-k \right\} \tag{3.6}$$

where $s_i = 1$ if the $i$th coordinate is in error. Then a straightforward decoding schedule is as follows:

(a). For all $s \in E_L$, erase the corresponding $n-k$ error coordinates of the received word $z$ and decode by the erasures-only (EO) decoder. The resulting codeword set is denoted by $C_z$.

(b). Choose the codeword in $C_z$ with the best score, e.g., the one with the smallest Euclidean distance from the received word, as the decoder output.
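The EO decoder used in step (a) is not specified here. As one possible sketch, assume an evaluation-form RS code over GF($2^4$) built with the primitive polynomial $x^4 + x + 1$, whose codewords are $(f(\alpha^0), \dots, f(\alpha^{n-1}))$ with $\deg f < k$. By the MDS property, any $k$ unerased symbols fix $f$, so the erased symbols can be rebuilt by Lagrange interpolation. All names are hypothetical and chosen for illustration.

```python
# GF(2^4) log/antilog tables, built with the primitive polynomial x^4 + x + 1 (assumed).
M_BITS, PRIM = 4, 0b10011
Q = 1 << M_BITS                      # field size q = 16
EXP, LOG = [0] * (2 * Q), [0] * Q
elem = 1
for e in range(Q - 1):
    EXP[e], LOG[elem] = elem, e
    elem <<= 1
    if elem & Q:                     # reduce modulo the primitive polynomial
        elem ^= PRIM
for e in range(Q - 1, 2 * Q):        # duplicate so exponent sums need no modulo
    EXP[e] = EXP[e - (Q - 1)]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a, b):                    # b must be nonzero
    return 0 if a == 0 else EXP[LOG[a] - LOG[b] + (Q - 1)]


def rs_encode(msg, n):
    """Evaluation-form RS codeword (f(alpha^0), ..., f(alpha^(n-1))); msg = coefficients of f."""
    def f(x):
        acc = 0
        for c in reversed(msg):      # Horner's rule
            acc = gf_mul(acc, x) ^ c
        return acc
    return [f(EXP[i]) for i in range(n)]


def eo_decode(word, erased):
    """Erasures-only decoding: rebuild erased symbols by Lagrange interpolation through the
    unerased positions. With exactly n - k erasures, k points fix a unique f of degree < k."""
    erased_set = set(erased)
    known = [i for i in range(len(word)) if i not in erased_set]
    out = list(word)
    for j in erased:
        total = 0
        for i in known:
            num = den = 1
            for l in known:
                if l != i:
                    num = gf_mul(num, EXP[j] ^ EXP[l])   # subtraction is XOR in GF(2^m)
                    den = gf_mul(den, EXP[i] ^ EXP[l])
            total ^= gf_mul(word[i], gf_div(num, den))
        out[j] = total
    return out


msg = [3, 7, 0, 12, 5, 1, 9, 2, 14, 6, 11]          # k = 11 message symbols
c = rs_encode(msg, 15)                             # (15, 11) RS codeword
received = list(c)
for j in (0, 4, 9, 13):
    received[j] ^= 5                               # corrupt four symbols
print(eo_decode(received, [0, 4, 9, 13]) == c)     # True: erasing exactly the errors recovers c
```

If the erased set misses some errors, the output is still a valid codeword, just not the transmitted one; step (b) then has to pick the best member of $C_z$.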

The basic idea of the above procedure is shown in Fig. 3.1. It can easily be verified that $d_H(c, z) \le n-k$ for any $c \in C_z$, where $d_H(c, z)$ is the Hamming distance between […] symbols is less than $d_{\min}$. Furthermore, step (b) is equivalent to the minimization problem

$$\arg\min_{c}\ d(\Psi(\bar c), \bar y) \quad \text{subject to } c \in C_z \tag{3.7}$$

where $\Psi(\cdot)$ is defined by (3.1) and $d(\bar a, \bar b)$ is the Euclidean distance (ED) between the $nm$-dimensional real vectors $\bar a$ and $\bar b$.

[Figure 3.1 (only the legend survives): the received word $z$; vectors at Hamming distance $n-k$ from $z$; codewords belonging to $C_z$; the transmitted codeword; the re-encoding process.]

3.2.2 A Stochastic List Decoding Idea

Each error locator vector (ELV) $s \in E_L$ represents a particular set of $n-k$ possible error coordinates and has a corresponding codeword $c_s$ in $C_z$; we denote this relationship by $s \mapsto c_s$. Although more than one ELV may be associated with the same codeword, the complexity of searching for the optimal solution $c^*$ in the error-location domain $E_L$ is still extremely high, because the cardinality of $E_L$ is $\binom{n}{k}$ while only a few elements (possibly only one) of $E_L$, depending on the number and locations of the received errors, can be used to reconstruct $c^*$.
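For the two codes simulated in Section 3.5, the size of this search space follows directly from the binomial coefficient:

$$|E_L| = \binom{n}{n-k} = \binom{n}{k}, \qquad \binom{15}{4} = 1365 \ \text{for the (15,11) code}, \qquad \binom{31}{6} = 736\,281 \ \text{for the (31,25) code}.$$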

Suppose we model the selection of the ELV $s$ from $E_L$ as a stochastic (vector-valued) experiment governed by a family of parameterized distributions $\{f(s; u)\}$, where $u \in \nu$ is a real-valued parameter vector. Usually $f(s; u)$ is assumed to be uniform owing to the lack of a priori information, in which case the search in (3.7) is exhaustive unless some algebraic properties of the code are used. One way to solve (3.7) efficiently is to find a parameter $v^*$ such that $f(s; v^*) = \delta(s - s^*)$, where $s^* \mapsto c^*$; drawing a single sample from $f(s; v^*)$ then suffices to obtain the optimal solution $c^*$. To get around the difficulty that $c^*$ is not known, note that the optimization problem (3.7) is related to the estimation of the probability $P(d(\Psi(\bar c_s), \bar y) \le \eta \mid s \mapsto c_s)$, which is a rare event when $\eta = \eta^* = d(\Psi(\bar c^*), \bar y)$. The connection comes from the fact that a rare-event probability can be estimated efficiently by importance sampling, and in this case the optimal importance density is $f(s; v^*)$. Without knowledge of the threshold $\eta^*$, we start with a suitable importance density $f(s; \hat v)$ to generate samples of $s$ and compute an initial estimate $\hat\eta$ of $\eta$. Ideally, we would use the drawn samples that satisfy $d(\Psi(\bar c_s), \bar y) \le \hat\eta$ to obtain a new parameter value $\hat v'$ such that $f(s; \hat v')$ is closest to $f(s; v^*)$ in the Kullback-Leibler (KL) sense, i.e., the CE between $f(s; \hat v')$ and $f(s; v^*)$ is minimized. Since $v^*$ is unknown, we instead choose $\hat v'$ such that $f(s; \hat v')$ is closest to the empirical distribution of those samples that are generated by $f(s; \hat v)$ and satisfy $d(\Psi(\bar c_s), \bar y) \le \hat\eta$, because this empirical distribution is likely to be a good approximation of $f(s; v^*)$. New samples of $s$ are then produced by the updated importance density $f(s; \hat v')$ to compute a new estimate $\hat\eta'$. This iterative procedure continues until $|\hat\eta - \hat\eta'|$ is less than a predetermined threshold.

The above method is known as the CE method [22] which is an iterative procedure that consists of the following two phases in each iteration.

• Generate samples from the specified importance density given by the parameters from the previous iteration.

• Update the parameters for the next iteration according to the ranking of the scores of the drawn samples and the minimum-CE criterion.
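To make the two phases concrete, here is a minimal, self-contained sketch of the basic CE method on a toy problem: maximizing a score over binary vectors with an independent-Bernoulli family $f(s; p)$, a common textbook setting rather than the decoder itself. For this family the minimum-CE (maximum-likelihood) refit is simply the elite frequency of ones. The sample size, elite size, smoothing factor and iteration count are arbitrary illustrative choices.

```python
import random


def ce_binary_maximize(score, n, num_samples=200, num_elite=20, rho=0.7, iters=40, seed=0):
    """Basic CE method on {0,1}^n with the independent-Bernoulli family f(s; p)."""
    rng = random.Random(seed)
    p = [0.5] * n                                   # uninformative initial density
    best, best_score = None, float("-inf")
    for _ in range(iters):
        # Phase 1: generate samples from the current importance density.
        samples = [[1 if rng.random() < pj else 0 for pj in p] for _ in range(num_samples)]
        # Phase 2: rank by score, keep the elites, refit p (minimum-CE fit), then smooth.
        samples.sort(key=score, reverse=True)
        elite = samples[:num_elite]
        top = score(elite[0])
        if top > best_score:                        # retain the best sample seen so far
            best, best_score = elite[0], top
        freq = [sum(s[j] for s in elite) / num_elite for j in range(n)]
        p = [rho * f + (1 - rho) * pj for f, pj in zip(freq, p)]
    return best, best_score


target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
matches = lambda s: sum(a == b for a, b in zip(s, target))
print(ce_binary_maximize(matches, len(target)))
```

The smoothing step has the same form $v \leftarrow \rho v_{\text{new}} + (1-\rho) v_{\text{old}}$ used in Table 3.1; it keeps the density from collapsing onto a poor solution after a single iteration.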

Based on the above discussion, we propose a generic Monte Carlo based SDD algorithm, shown in Fig. 3.2 and Table 3.1, with details given in Sections 3.3 and 3.4.

3.2.3 Convergence and Complexity

Different convergence conditions and results have been discussed for the deterministic CE method and its extensions in [24], where it is also proved that convergence in distribution or in $\eta$ can be guaranteed, provided the parameters of the algorithm, such as the number of samples $N$, the number of elites $E$ and the smoothing factor $\rho$, are properly tuned. Convergence to the global minimum is ensured only if a large sample size $N$ is used. On the other hand, the computational complexity is related to $N$ and is given by $O(N(n-k)^2)$. Clearly, the decoding performance can be improved by using a larger $N$. Since we retain the best sample at the end of each iteration, the decoding performance also improves as the iteration number $T$ increases. Because stopping early at any iteration still produces a decoded codeword, we say the algorithm converges if the sequence of decoded codewords converges. With a modest $N$, we found that in most cases the decoded codewords converge to the same codeword within 10 iterations. Our algorithm yields good performance, although uniform convergence in distribution or in $\eta$ within a limited number of iterations is not guaranteed.

[Figure 3.2 block diagram: Sample Generator, Erasure-Only Decoder, Sample Evaluator, Parameter Updater, and a "Store & Choose the Best" block; signals $S^{(t)}$, $S_E^{(t)}$, $v^{(0)}$, $v^{(t+1)}$, $D^{(t)}$, $d^{*(t)}$, $\hat c$ and the family $\{f(\cdot; v)\}$.]

Figure 3.2: Flow chart of a stochastic decoder for RS codes.

Table 3.1

1. Define a family of probability densities $\{f(\cdot; v), v \in \nu\}$ on the search space $\mathbb{R}^{nm}$. Initialize $v^{(0)}$. Set $t = 0$.

2. Generate a sample set $S^{(t)}$ of $N$ random vectors drawn from $f(\cdot; v^{(t)})$. Regard their magnitudes as bit LLRs and convert them into symbol reliabilities. Erase the $n-k$ least reliable symbols and decode the resulting word by EO decoding.

3. Evaluate the Euclidean distances between the decoded codewords and the received word. Select the $E$ vector samples with the best metrics as the new elite set $S_E^{(t)} \subset S^{(t)}$, and store the best decoded codeword $d^{*(t)}$ in $D^{(t)}$.

4. Evaluate the new parameter $v^{(t+1)}$ by solving
$$v^{(t+1)} = \arg\max_{v} \frac{1}{|S_E^{(t)}|} \sum_{s_\ell^{(t)} \in S_E^{(t)}} \ln f(s_\ell^{(t)}; v).$$
Then smooth it via $v^{(t+1)} \leftarrow \rho\, v^{(t+1)} + (1-\rho)\, v^{(t)}$, where $0 < \rho < 1$.

5. Terminate decoding if the stopping criterion is met, and choose the best codeword from the list $\{d^{*(t)}\}_t$, say $\hat c$, as the decoder output. Otherwise, increase $t$ by 1 and return to Step 2.

3.3 List Decoding via Erasure Location Estimation

In this section, we propose a novel algorithm that solves the discrete optimization problem (3.7) using the stochastic list decoding idea. The algorithm, referred to as the first kind of stochastic erasures-only list decoding (SEOLD-I) algorithm, efficiently estimates the $d_{\min} - 1$ most probable erasure locations in the received word $z$.

3.3.1 Importance Density and Sample Format

Suppose the reliability matrix $R$ is known at the receiver. Define the distrust function $f_d: \mathrm{GF}(2^m) \mapsto (0, \infty)$ of the $i$th coordinate $z_i$ of $z$ as

$$f_d(z_i) = \frac{\sum_{\beta} R_{\beta,i}}{R_{z_i,i}}, \qquad \beta \in \mathrm{GF}(2^m) \setminus \{z_i\}. \tag{3.8}$$

The larger the value of $f_d(z_i)$, the higher the probability that $z_i$ should be erased at the decoder.

Let $s = (s_0, \dots, s_{n-1})$ be a random vector whose components $s_0, \dots, s_{n-1}$ are independent Bernoulli random variables with success probabilities $p_0, \dots, p_{n-1}$, i.e., $P(s_i = 1) = 1 - P(s_i = 0) = p_i$. We write $s \sim \mathrm{Ber}(p)$, where $p = (p_0, \dots, p_{n-1})$. In our case, $s_i = 1$ indicates that the $i$th symbol $z_i$ of $z$ should be erased; otherwise ($s_i = 0$), $z_i$ is retained.

The initial parameters $p^{(0)} = (p_0^{(0)}, \dots, p_{n-1}^{(0)})$ are defined as

$$p_i^{(0)} = \begin{cases} 1 - \epsilon, & f_d(z_i) > 1 - \epsilon \\ f_d(z_i), & \text{otherwise} \end{cases} \tag{3.9}$$

where $\epsilon$ is an arbitrary real value in $(0, 1)$. At the $t$th iteration, let $s_1^{(t)}, s_2^{(t)}, \dots, s_N^{(t)}$ be $N$ trials drawn from $\mathrm{Ber}(p^{(t)})$ that satisfy

$$\sum_{j=0}^{n-1} s_{\ell,j}^{(t)} = d_{\min} - 1, \qquad \forall \ell \in \{1, \dots, N\}, \tag{3.10}$$

where $s_\ell^{(t)} = (s_{\ell,0}^{(t)}, \dots, s_{\ell,n-1}^{(t)})$. We then collect these $N$ trials to form the sample set $S^{(t)} = \{s_1^{(t)}, \dots, s_N^{(t)}\}$.
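A sketch of (3.8)-(3.10), assuming $R$ is stored as a list of $q$ rows indexed by the integer representation of $\beta$. Because each column of $R$ sums to one, (3.8) equals $(1 - R_{z_i,i})/R_{z_i,i}$. The text does not say how trials satisfying (3.10) are obtained; one exact option, assumed here, is to sample $\mathrm{Ber}(p)$ conditioned on its weight using a small dynamic program over tail probabilities. The helper names are hypothetical.

```python
import random


def distrust(column, zi):
    """(3.8): total posterior of the competing symbols divided by that of z_i."""
    return (sum(column) - column[zi]) / column[zi]


def initial_probs(R, z, eps):
    """(3.9): p_i = f_d(z_i), clipped at 1 - eps. R[beta][i] = P(c_i = beta | y)."""
    p = []
    for i, zi in enumerate(z):
        fd = distrust([row[i] for row in R], zi)
        p.append(1 - eps if fd > 1 - eps else fd)
    return p


def sample_fixed_weight(p, w, rng):
    """Draw s ~ Ber(p) conditioned on sum(s) == w, cf. (3.10).

    Assumes 0 < p_i < 1 for all i and w <= len(p).
    """
    n = len(p)
    # tail[i][r] = P(s_i + ... + s_{n-1} == r) under independent Ber(p)
    tail = [[0.0] * (w + 1) for _ in range(n + 1)]
    tail[n][0] = 1.0
    for i in range(n - 1, -1, -1):
        for r in range(w + 1):
            tail[i][r] = (1 - p[i]) * tail[i + 1][r] + (p[i] * tail[i + 1][r - 1] if r else 0.0)
    s, r = [], w
    for i in range(n):
        # probability that s_i = 1 given that exactly r ones remain to be placed
        prob_one = p[i] * tail[i + 1][r - 1] / tail[i][r] if r else 0.0
        bit = 1 if rng.random() < prob_one else 0
        s.append(bit)
        r -= bit
    return s


rng = random.Random(0)
s = sample_fixed_weight([0.9, 0.1, 0.5, 0.3, 0.7, 0.2], 2, rng)
print(sum(s))   # 2: exactly w erasures, whatever the random draw
```

Rejection sampling (redrawing until the weight is right) would give the same distribution but can be slow when $\sum_i p_i$ is far from $d_{\min} - 1$.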

3.3.2 Update Parameters

Let $d_1^{(t)}, \dots, d_N^{(t)}$ be the output codewords of the EO decoder. We compute the ED between each candidate codeword and the received word $\bar y$ and sort the corresponding random vectors in ascending order of their associated EDs. The elite set $S_E^{(t)}$ at the $t$th iteration is defined as the $E$ vectors with the smallest EDs to $\bar y$, i.e., those whose corresponding codewords are more likely to have been transmitted. We always store the best codeword found up to the current iteration for the final decision when the maximum number of iterations is reached.

Suppose the parameters used to generate samples at the $t$th iteration are $p^{(t)} = (p_0^{(t)}, \dots, p_{n-1}^{(t)})$. The parameters for the next iteration are updated using the information provided by both $p^{(t)}$ and $S_E^{(t)}$. More precisely, the $i$th parameter $p_i^{(t+1)}$ is obtained by [22]

$$p_i^{(t+1)} = (1-\rho)\, p_i^{(t)} + \rho \cdot \frac{\sum_{s_\ell^{(t)} \in S_E^{(t)}} s_{\ell,i}^{(t)}}{E}. \tag{3.11}$$
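The elite selection and the smoothed update (3.11) take only a few lines. In this sketch the helper names are hypothetical, and `distances[l]` is assumed to hold the ED between the codeword decoded from sample `l` and $\bar y$.

```python
def select_elites(samples, distances, num_elite):
    """Keep the num_elite samples whose decoded codewords are closest (in ED) to y."""
    order = sorted(range(len(samples)), key=lambda l: distances[l])   # ascending ED
    return [samples[l] for l in order[:num_elite]]


def update_bernoulli(p, elites, rho):
    """(3.11): p_i <- (1 - rho) * p_i + rho * (fraction of elites with s_i = 1)."""
    E = len(elites)
    return [(1 - rho) * pi + rho * sum(s[i] for s in elites) / E for i, pi in enumerate(p)]


samples = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
elites = select_elites(samples, distances=[2.0, 5.0, 1.0], num_elite=2)
print(elites)                                        # [[1, 1, 0], [1, 0, 1]]
print(update_bernoulli([0.5, 0.5, 0.5], elites, 0.5))  # [0.75, 0.5, 0.5]
```

Coordinate 0 is erased in every elite sample, so its erasure probability rises; the other two coordinates are split evenly and stay put.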


3.4 List Decoding via Virtual Received Words

In Step 2 of Table 3.1 we try to find the most likely message/error coordinates such that the associated EO-decoded codeword is closest to the received vector. Note that in Section 3.3 the random samples are used only to determine the erasure locations, so the search sphere of the algebraic list decoding described in Section 3.2.1 is always centered at the hard-limited received word $z$ with radius $n-k$. To enlarge the search range and improve decoding performance, we include in our expanded search some extra codewords that lie statistically in a small neighborhood of the received word, so that some of them may in fact be closer in ED to the true transmitted codeword $c$; see Fig. 3.3. More specifically, the expansion is accomplished by removing the requirement that the search be centered at $z$. Instead, we randomize the search center by EO-decoding the hard-decision versions of the drawn sample vectors, which we refer to as virtual received words. If the importance density does converge to the desired density $\delta(s - s^*)$, such an expanded search will eventually contract and converge to the true transmitted codeword.

3.4.1 Importance Density and Sample Format

Let $\bar s = (\bar s_0, \dots, \bar s_{nm-1})$ be a random vector whose components are independent Gaussian random variables with means $\mu_0, \dots, \mu_{nm-1}$ and variances $\sigma_0^2, \dots, \sigma_{nm-1}^2$. We write $\bar s \sim \mathcal N(\vec\mu, \vec\sigma)$, where $\vec\mu = (\mu_0, \dots, \mu_{nm-1})$ and $\vec\sigma = (\sigma_0, \dots, \sigma_{nm-1})$ are initialized by

$$\mu_j^{(0)} = \bar\gamma_j \tag{3.12}$$

$$\sigma_j^{(0)} = \sqrt{|\bar\gamma_j|} \tag{3.13}$$

At the $t$th iteration, $N$ random samples $\bar s_1^{(t)}, \bar s_2^{(t)}, \dots, \bar s_N^{(t)}$ are drawn from $\mathcal N(\vec\mu^{(t)}, \vec\sigma^{(t)})$ to form the sample set $\bar S^{(t)}$. Each sample vector represents the bit reliabilities of an associated virtual received word. After (3.4) is used to convert the bit reliabilities into symbol reliabilities, the $n-k$ coordinates with the smallest symbol reliabilities are erased; the remaining bit positions are hard-limited and mapped into symbol decisions, and the resulting virtual received word is then decoded by an EO decoder.

[Figure 3.3 sketch: the transmitted codeword $c$, the received word $z$, virtual received words $z_1, z_2, \dots, z_N$, and the radius $n-k$.]

Figure 3.3: Virtual received words are generated around the received LLR vector $\bar\Gamma$ by hard-limiting the sample vectors generated by an importance probability density whose parameter values evolve according to the CE principle.
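A sketch of how Gaussian samples turn into virtual received words, assuming the sample entries are signed bit LLRs (positive means bit 0, as in (3.2)-(3.3)) whose magnitudes serve as reliabilities, and that bit $j$ of each symbol is the coefficient of $\alpha^j$. The returned symbols and erasure positions would then be passed to an EO decoder. The function names and example numbers are illustrative.

```python
import random


def draw_samples(mu, sigma, num_samples, rng):
    """N samples from the product Gaussian N(mu, sigma); sigma holds standard deviations."""
    return [[rng.gauss(m_j, s_j) for m_j, s_j in zip(mu, sigma)] for _ in range(num_samples)]


def virtual_received_word(sample, n, m, k):
    """Hard-limit one sample and pick the n - k least reliable symbols as erasures."""
    bits = [0 if v > 0 else 1 for v in sample]                                 # as in (3.2)
    rel = [min(abs(v) for v in sample[i * m:(i + 1) * m]) for i in range(n)]   # (3.4)
    erased = sorted(sorted(range(n), key=lambda i: rel[i])[:n - k])
    # Symbol value with bit j as the coefficient of alpha^j (integer representation).
    symbols = [sum(bits[i * m + j] << j for j in range(m)) for i in range(n)]
    return symbols, erased


rng = random.Random(0)
samples = draw_samples([2.0, -1.0, 0.1, 3.0, -0.5, -0.4], [1.0] * 6, 4, rng)
print(len(samples), len(samples[0]))   # 4 6
print(virtual_received_word([2.0, -1.0, 0.1, 3.0, -0.5, -0.4], n=3, m=2, k=1))
# ([2, 0, 3], [1, 2])
```

Because each sample is a perturbed copy of the LLR vector, both the hard decisions and the erasure pattern change from sample to sample, which is what moves the center of the search.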

3.4.2 Update Parameters

Let $d_1^{(t)}, \dots, d_N^{(t)}$ be the output codewords of the EO decoder. We compute the ED between each candidate codeword and the received word $\bar y$ and sort the corresponding random vectors in ascending order of their associated EDs. Define an elite set $\bar S_E^{(t)}$ that contains the $E$ vectors with the smallest EDs to $\bar y$, i.e., those whose corresponding codewords are more likely to have been transmitted. We always store the best codeword found up to the current iteration for the final decision when the maximum number of iterations is reached.

The two sets of parameters $\vec\mu^{(t+1)}$ and $\vec\sigma^{(t+1)}$ are then updated by [22]

$$\mu_j^{(t+1)} = (1-\lambda)\, \mu_j^{(t)} + \lambda \cdot \frac{\sum_{\bar s_\ell^{(t)} \in \bar S_E^{(t)}} \bar s_{\ell,j}^{(t)}}{E} \tag{3.14}$$

and

$$\sigma_j^{(t+1)} = (1-\eta)\, \sigma_j^{(t)} + \eta \cdot \frac{\sum_{\bar s_\ell^{(t)} \in \bar S_E^{(t)}} \left(\bar s_{\ell,j}^{(t)} - \mu_j^{(t+1)}\right)^2}{E} \tag{3.15}$$

where $\lambda$ and $\eta$ are real values in $(0, 1)$ used to smooth the variation of these parameters. The algorithm described in this section is called the second kind of stochastic erasures-only list decoding (SEOLD-II) algorithm.
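A sketch of the smoothed Gaussian updates (3.14) and (3.15), computed coordinate by coordinate from the elite samples; the function name is hypothetical. It follows (3.15) as printed, blending $\sigma_j$ with the elite second moment about $\mu_j^{(t+1)}$; many CE implementations instead blend the elite standard deviation, i.e. the square root of that quantity.

```python
def update_gaussian(mu, sigma, elites, lam, eta):
    """Smoothed updates (3.14) and (3.15) from the elite samples, per coordinate."""
    E = len(elites)
    new_mu, new_sigma = [], []
    for j, (mj, sj) in enumerate(zip(mu, sigma)):
        mean = sum(s[j] for s in elites) / E
        mu_j = (1 - lam) * mj + lam * mean                      # (3.14)
        spread = sum((s[j] - mu_j) ** 2 for s in elites) / E    # uses mu^(t+1), as in (3.15)
        new_mu.append(mu_j)
        new_sigma.append((1 - eta) * sj + eta * spread)         # (3.15)
    return new_mu, new_sigma


print(update_gaussian([0.0], [1.0], [[1.0], [3.0]], lam=0.5, eta=0.5))   # ([1.0], [1.5])
```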

3.5 Simulation Results and Discussion

In this section, we present simulated performance of the two SEOLD algorithms (SEOLD-I and SEOLD-II) and compare it with that of other well-known RS decoding algorithms. A standard binary-input AWGN channel is assumed, over which BPSK-modulated codewords are transmitted. We model the receive matched-filter output as the sum of a ±1-valued sequence and a Gaussian sequence with zero-mean i.i.d. components. The average performance bounds on the ML error probability of RS codes over an AWGN channel developed in [25] are used as performance lower limits.

Because of complexity and decoding-delay considerations, we do not run the SEOLD algorithms until convergence is assured. Instead, we limit the decoding procedure to $T$ iterations in all simulations.

Fig. 3.4 shows the codeword error rate (CER) performance of the (15,11) RS code over an AWGN channel. HDD-BM refers to the performance of a hard-decision bounded minimum distance decoder such as the BM algorithm. GMD and KV refer to the performance of Forney's GMD algorithm and of the algebraic soft-decision decoding algorithm proposed by Koetter and Vardy, respectively. Note that the KV results shown correspond to an infinite interpolation cost, i.e., infinite complexity. For both SEOLD-I and SEOLD-II, the size of the sample set $N$ and the size of the elite set $E$ at every iteration are set to 20 and 6, respectively. After 10 iterations, SEOLD-I achieves coding gains of about 0.5 dB and 0.3 dB over GMD and KV, respectively, at a CER of $10^{-5}$. SEOLD-II, on the other hand, outperforms all the other algorithms, with gains of about 1.2 dB and 1.0 dB over GMD and KV at a CER of $10^{-5}$.

[Plot: codeword error rate ($10^{-6}$ to $10^{0}$) versus $E_b/N_0$ (1 to 8 dB); curves HDD-BM, GMD, KV, SEOLD-I ($N = 20$), SEOLD-II ($N = 20$), ML.]

Figure 3.4: Codeword error probability performance of the (15,11) Reed-Solomon code; 10 iterations.

In Fig. 3.5, for the (31,25) RS code, SEOLD-I performs close to KV with $N = 100$ and $E = 10$ after 10 iterations. Under the same conditions, SEOLD-II still outperforms the other algorithms with reasonable complexity: it achieves coding gains of about 0.6 dB and 1.0 dB over KV with $N = 100$ and $N = 500$, respectively. In conclusion, the proposed decoding algorithms offer good performance with modest complexity for short, high-rate RS codes. Their performance can be further improved by increasing the sample size $N$ and/or the maximum iteration number $T$, at the cost of increased decoding complexity.

[Plot: codeword error rate ($10^{-6}$ to $10^{0}$) versus $E_b/N_0$ (1 to 8 dB); curves HDD-BM, GMD, KV, SEOLD-I ($N = 100$), SEOLD-II ($N = 100$), SEOLD-II ($N = 500$), ML.]

Figure 3.5: Codeword error probability performance of the (31,25) Reed-Solomon code; 10 iterations.

Chapter 4

Stochastic List Decoding of Linear Block Codes

In the previous chapter, we focused on decoding RS codes of short to medium length, exploiting their MDS property to reduce the complexity of locating nearby codewords. Such an approach cannot be applied to general linear codes. We therefore present a new decoding method that is valid for arbitrary linear codes but is more effective for codes with small girth.

4.1 Preliminary

Let $C$ be a binary $(N, K)$ linear block code with minimum distance $d_{\min}$ and $M \times N$ parity-check matrix $H$. Since the rows of $H$ may be linearly dependent, we have $M \ge N - K$. Let $I = \{1, \dots, N\}$ and $J = \{1, \dots, M\}$ be the sets of column and row indices of $H$, respectively. We denote the set of bits $n$ that participate in check $m$ by $N(m) = \{n : H_{mn} = 1\}$. Similarly, we define the set of checks in which bit $n$ participates as $M(n) = \{m : H_{mn} = 1\}$. We denote the set $N(m)$ with bit $n$ excluded by $N(m)\setminus n$, and the set $M(n)$ with parity check $m$ excluded by $M(n)\setminus m$. The cardinalities of $N(m)$ and $M(n)$ are denoted by $|N(m)|$ and $|M(n)|$, respectively. Let $e_n$ be the $1 \times N$ elementary vector with 1 at position $n$ and 0 elsewhere.

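A binary vector $c$ is a codeword of $C$ if and only if $cH^T = 0$, where $H^T$ is the transpose of $H$ and $0$ is the $1 \times M$ zero vector. For each row $h_m$ of $H$, $m \in J$, let

$$C_m = \{c \in \{0,1\}^N : c\,h_m^T = 0 \bmod 2\}; \tag{4.1}$$

then

$$C = \bigcap_{m=1}^{M} C_m. \tag{4.2}$$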

Using binary phase-shift keying (BPSK), the transmitter maps a codeword $c$ into the bipolar vector

$$\Psi(c) = x = (x_1, \dots, x_N), \qquad x_n = \Psi(c_n) = (-1)^{c_n} \tag{4.3}$$

and sends it over an additive white Gaussian noise (AWGN) channel with zero mean and power spectral density $N_0/2$ W/Hz. The received sequence at the output of the matched filter is $y = (y_1, \dots, y_N)$, where $y_n = x_n + w_n$ and the $w_n$'s are statistically independent Gaussian random variables with zero mean and variance $N_0/2$.

Let $z = (z_1, \dots, z_N)$ be the hard-decision version of the received sequence $y$, i.e.,

$$z_n = \begin{cases} 0, & y_n > 0 \\ 1, & \text{otherwise} \end{cases} \tag{4.4}$$

For $m \in J$, we define $\sigma_m$ as the value of check sum $m$ computed from the hard-decision vector $z$:

$$\sigma_m = \Big(\sum_{n \in N(m)} z_n H_{mn}\Big) \bmod 2, \tag{4.5}$$

and define $\Sigma = (\sigma_1, \dots, \sigma_M)$ as the syndrome vector.
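As a small check of (4.1)-(4.5), the following sketch computes the syndrome of the hard-decision vector and tests codeword membership. The parity-check matrix and received word are those of Example 4.2; the function names are illustrative.

```python
def hard_decision(y):
    """(4.4): z_n = 0 if y_n > 0, else 1."""
    return [0 if v > 0 else 1 for v in y]


def syndrome(H, z):
    """(4.5): sigma_m = (sum of z_n over N(m)) mod 2 for every row m of H."""
    return [sum(h * b for h, b in zip(row, z)) % 2 for row in H]


def is_codeword(H, c):
    """(4.1)-(4.2): c is in C iff it lies in every subcode C_m, i.e. its syndrome is zero."""
    return not any(syndrome(H, c))


H = [[1, 0, 0, 0, 1, 1, 1, 0],   # parity-check matrix (4.19) of Example 4.2
     [0, 0, 1, 0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1, 1, 0, 1],
     [0, 0, 0, 1, 0, 1, 1, 1]]
y = [1.83, 2.07, 2.36, -0.21, 1.05, 1.91, -0.09, 1.63]
z = hard_decision(y)
print(z)                # [0, 0, 0, 1, 0, 0, 1, 0]
print(syndrome(H, z))   # [1, 1, 0, 0]: checks 1 and 2 fail
```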

Denote by $\Gamma = (\gamma_1, \dots, \gamma_N)$ the reliability vector of $y$, in which $\gamma_n$ is the magnitude of the log-likelihood ratio (LLR) associated with the corresponding hard-limited bit $z_n$,

$$L_n = \log \frac{P(c_n = 0 \mid y)}{P(c_n = 1 \mid y)}. \tag{4.6}$$

Let $\lambda_m$ be the reliability of check sum $m$, defined as

$$\lambda_m = \min_{n \in N(m)} \gamma_n. \tag{4.7}$$

We then sort $\{\lambda_m : m \in J\}$ and let $m_1, m_2, \dots, m_M$ denote the check sums in descending order of $\lambda_m$, i.e., check sum $m_1$ is the most reliable and $m_M$ is the least reliable.

Define $q_n = P(z_n \neq c_n \mid y)$ as the a posteriori probability that bit $n$ is in error given $y$. Then we have the following lemmas.

Lemma 4.1 For the AWGN channel model considered, the probability $q_n$ can be expressed as

$$q_n = \frac{1}{1 + e^{\gamma_n}}. \tag{4.8}$$

Proof: See Appendix A.

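Lemma 4.2 For a check sum $m \in M(n)$, the probability that the modulo-2 sum of the hard decisions $z_{n'}$, $n' \in N(m)\setminus n$, differs from that of the transmitted bits $c_{n'}$, denoted $r_{mn}$, is

$$r_{mn} = \frac{1}{2}\Big(1 - \prod_{n' \in N(m)\setminus n} (1 - 2q_{n'})\Big). \tag{4.9}$$

Proof: See [26].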

Note that $r_{mn}$ is the probability of an odd number of errors in $N(m)\setminus n$. Define $\tilde q_n$ as the a posteriori probability that bit $n$ is in error based on $y$ and the results of the check sums in $M(n)$.

Theorem 4.1 Given the received word $y$ and the syndrome set $\Sigma_n \equiv \{\sigma_m : m \in M(n)\}$, the logarithm of the bit-correctness probability ratio for bit $n$, denoted $\xi_n$, is

$$\xi_n = \log \frac{1 - \tilde q_n}{\tilde q_n} = \log \frac{P(z_n = c_n \mid y, \Sigma_n)}{P(z_n \neq c_n \mid y, \Sigma_n)} \cong \gamma_n + \sum_{m \in M(n)} \Big[(1 - 2\sigma_m) \min_{n' \in N(m)\setminus n} \gamma_{n'}\Big]. \tag{4.10}$$

Proof: See Appendix B.
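A sketch of (4.8)-(4.10) with 0-based indices, where `checks[m]` lists $N(m)$ and `checks_of_bit[n]` lists $M(n)$ (illustrative names). With the LLR magnitudes of Fig. 4.1 and the initial syndrome of Example 4.2, it gives about 5.72 for bit 3 (index 2), matching the value of $\xi_3$ in the figure.

```python
import math


def bit_error_prob(gamma_n):
    """Lemma 4.1, (4.8): q_n = 1 / (1 + exp(gamma_n))."""
    return 1.0 / (1.0 + math.exp(gamma_n))


def check_mismatch_prob(q_others):
    """Lemma 4.2, (4.9): probability of an odd number of errors among the other bits of a check."""
    prod = 1.0
    for q in q_others:
        prod *= 1.0 - 2.0 * q
    return 0.5 * (1.0 - prod)


def xi(n, gamma, sigma, checks, checks_of_bit):
    """Theorem 4.1, (4.10): gamma_n plus one signed min-reliability term per check in M(n)."""
    total = gamma[n]
    for m in checks_of_bit[n]:
        weakest = min(gamma[k] for k in checks[m] if k != n)
        total += (1 - 2 * sigma[m]) * weakest   # satisfied check adds, failed check subtracts
    return total


gamma = [4.60, 5.22, 5.95, 0.52, 2.63, 4.81, 0.23, 4.11]            # |LLR| from Fig. 4.1
checks = [[0, 4, 5, 6], [2, 4, 6, 7], [1, 4, 5, 7], [3, 5, 6, 7]]   # N(m), 0-based
checks_of_bit = [[m for m in range(4) if n in checks[m]] for n in range(8)]
sigma = [1, 1, 0, 0]                                                 # syndrome of Example 4.2
print(round(xi(2, gamma, sigma, checks, checks_of_bit), 2))         # 5.72
```

For a single other bit, (4.9) reduces to that bit's own error probability, as expected for a parity check of two bits.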

4.2 Sequential Bit-Flipping Algorithm

In this section, we introduce a single-run sequential bit-flipping (SBF) algorithm that transforms $z$ into a valid codeword. The procedure requires the parity-check matrix $H$ to be in systematic form. First, suppose the rows of $H$ are linearly independent, i.e., $M = N - K$. Using appropriate row operations, $H$ can be transformed into a systematic form, say $\tilde H = [I_M\ P]$, where $I_M$ is the $M \times M$ identity matrix and $P$ is an $M \times (N - M)$ binary matrix. Since $H$ and $\tilde H$ have the same row space, $C$ is the null space of both, and hence we can decode the received word using $\tilde H$ instead of $H$.

However, this transformation cannot be applied directly when the rows of $H$ are linearly dependent, i.e., $M > N - K$. Fortunately, we can remove $M - N + K$ rows of $H$ that are linear combinations of the remaining rows to obtain an $(N - K) \times N$ submatrix $H'$ whose systematic form is $\tilde H'$.

Example 4.1 Consider the following parity-check matrix:

$$H = \begin{pmatrix} 1&1&1&1&0&0&0&0&0&0 \\ 1&0&0&0&1&1&1&0&0&0 \\ 0&1&0&0&1&0&0&1&1&0 \\ 0&0&1&0&0&1&0&1&0&1 \\ 0&0&0&1&0&0&1&0&1&1 \end{pmatrix} = \begin{pmatrix} h_1 \\ \vdots \\ h_5 \end{pmatrix} \tag{4.11}$$

Since $h_1 = h_2 + \dots + h_5$, we can remove $h_1$ from $H$ and obtain the submatrix

$$H' = \begin{pmatrix} 1&0&0&0&1&1&1&0&0&0 \\ 0&1&0&0&1&0&0&1&1&0 \\ 0&0&1&0&0&1&0&1&0&1 \\ 0&0&0&1&0&0&1&0&1&1 \end{pmatrix} \tag{4.12}$$

which is itself in systematic form.

It is easy to confirm that $C$ is still the null space of $\tilde H'$. Therefore, without loss of generality, we assume throughout this chapter that $H$ is in systematic form.
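Finding $\tilde H'$ amounts to Gaussian elimination over GF(2). The sketch below assumes, as in Example 4.1, that the pivots fall in the leading columns; otherwise a column permutation, which the text does not discuss, would be needed, and the function returns `None`. Dependent rows such as $h_1$ reduce to zero and are dropped. The function name is hypothetical.

```python
def systematic_form(H):
    """Row-reduce H over GF(2), drop dependent (all-zero) rows, and return [I | P] or None."""
    rows = [list(r) for r in H]
    M, N = len(rows), len(rows[0])
    rank = 0
    for col in range(N):
        if rank == M:
            break
        pivot = next((r for r in range(rank, M) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(M):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]   # add rows mod 2
        rank += 1
    reduced = rows[:rank]                     # rows beyond the rank are all zero
    # Systematic only if the pivots landed in the leading `rank` columns.
    identity = all(reduced[i][j] == (i == j) for i in range(rank) for j in range(rank))
    return reduced if identity else None


H = [[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],         # (4.11)
     [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
     [0, 1, 0, 0, 1, 0, 0, 1, 1, 0],
     [0, 0, 1, 0, 0, 1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]]
print(systematic_form(H) == H[1:])           # True: the result is H' of (4.12)
```

The reduced row echelon form of a row space is unique, so any valid elimination order yields the same $[I\ P]$ here.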

Recall that $C = \bigcap_{m \in J} C_m$, i.e., a codeword $c$ also belongs to the subcode $C_m$ for every $m \in J$. The idea of the SBF algorithm is to modify $z$ sequentially so that the final result is a valid codeword. Specifically, the SBF algorithm separates the original problem into $M$ subproblems and solves them sequentially in an arbitrary order of $\{1, \dots, M\}$, denoted $o = (o(1), \dots, o(M))$. The procedure must ensure that the solution of the $m$th subproblem also satisfies the constraints of the previous $m - 1$ subproblems. Along the way, a sequence of vectors $d_1, d_2, \dots, d_M$ is produced, where

$$d_t \in \bigcap_{m=1}^{t} C_{o(m)}, \qquad 1 \le t \le M, \tag{4.13}$$

so $d_M$ is a valid codeword. The SBF algorithm takes as inputs a predetermined order $o$ and the LLR vector $L$. At the end of the procedure, it outputs a valid codeword $d = (d_1, \dots, d_N)$ and an associated new LLR vector $\hat L = (\hat L_1, \dots, \hat L_N)$, which differs from $L$ as follows:

$$\hat L_n = \begin{cases} L_n, & d_n = z_n \\ -L_n, & d_n \neq z_n \end{cases} \tag{4.14}$$

$\hat L$ is used by the stochastic decoding algorithm described in Section 4.5.

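Next, we formulate the detailed procedure of the SBF algorithm as follows:

1. Let $d_0$ be the hard-limited vector of $L$, $\hat L = L$, and $I_0 = \emptyset$. Set $t = 0$.

2. Set $t \leftarrow t + 1$ and let $I_t = I_{t-1} \cup N(o(t))$. If $d_{t-1} \in C_{o(t)}$, let $d_t = d_{t-1}$. Otherwise, find

$$n^* = \arg\min_{n \in I_t \setminus I_{t-1}} \xi_n, \tag{4.15}$$

where $\xi_n$ is evaluated by (4.10), and let

$$d_t \leftarrow d_{t-1} + e_{n^*} \pmod 2, \tag{4.16}$$

$$\hat L_{n^*} \leftarrow -\hat L_{n^*}, \tag{4.17}$$

$$\sigma_m \leftarrow \sigma_m + 1 \pmod 2, \quad \forall m \in M(n^*). \tag{4.18}$$

3. If $t = M$, stop the procedure and output both $d = d_M$ and $\hat L$. Otherwise, go to Step 2.

Because $n^*$ is always chosen from $I_t \setminus I_{t-1}$, i.e., among bits that do not participate in the checks $o(1), \dots, o(t-1)$, each flip leaves the earlier checks satisfied, which guarantees (4.13).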

We denote the relationship between the inputs and outputs of the above procedure as (d, ˆL) = Ω(o, L) for simplicity.

Remark 4.1 If, at some step $t$, the set $I_t \setminus I_{t-1}$ contains two or more erroneous bits, the output codeword $d_M$ cannot be the transmitted codeword, since at most one bit of that set is flipped and these bits are never revisited in later steps.

Example 4.2 Consider an (8,4) linear block code with parity-check matrix

$$H = \begin{pmatrix} 1&0&0&0&1&1&1&0 \\ 0&0&1&0&1&0&1&1 \\ 0&1&0&0&1&1&0&1 \\ 0&0&0&1&0&1&1&1 \end{pmatrix} \tag{4.19}$$

Suppose the all-zero codeword is transmitted and the received word is

$y = (1.83, 2.07, 2.36, -0.21, 1.05, 1.91, -0.09, 1.63)$.

Then the hard-limited vector is

$z = (0, 0, 0, 1, 0, 0, 1, 0)$,

and an example of the SBF algorithm with check-sum order $3 \to 2 \to 1 \to 4$ is shown in Fig. 4.1.

[Figure 4.1: SBF decoding of Example 4.2 with check order 3 → 2 → 1 → 4. Bit LLRs: (4.60, 5.22, 5.95, −0.52, 2.63, 4.81, −0.23, 4.11); check reliabilities (0.23, 0.23, 2.63, 0.23) and initial check sums (1, 1, 0, 0) for checks 1 to 4. Check #3 is satisfied. At check #2, ξ7 = −2.11 < ξ3 = 5.72, so bit c7 is flipped and σ1 → 0, σ2 → 0, σ4 → 1. Check #1 is satisfied. At check #4, bit c4 is flipped and σ4 → 0. Successive vectors: d0 = d1 = (00010010), d2 = d3 = (00010000), d4 = (00000000).]

4.3 Predicament of Decoding via SBF algorithm

In the previous section, we introduced the SBF algorithm, a simple unified framework for transforming the hard-limited vector $z$ into a codeword in $C$. Note that different orders may produce different codewords. For instance, the output codewords in Examples 4.2 and 4.3 differ because of their different orders, even though the received word $y$ is the same.

Example 4.3 Consider the same case as in Example 4.2. If we change the order from $3 \to 2 \to 1 \to 4$ to $4 \to 3 \to 2 \to 1$, the output codeword $d$ becomes $(1, 0, 1, 1, 0, 0, 1, 0)$.
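The SBF procedure of Section 4.2 can be sketched in Python as follows, using 0-based indices, so the text's check order $3 \to 2 \to 1 \to 4$ becomes `[2, 1, 0, 3]`. This is an illustrative reading of Steps 1-3 and (4.14)-(4.18), assuming that every check has at least one bit of its own (as in a systematic $H$), so that $I_t \setminus I_{t-1}$ is never empty. Run on the LLRs shown in Fig. 4.1, it returns the all-zero codeword for the order of Example 4.2 and $(1, 0, 1, 1, 0, 0, 1, 0)$ for the order of Example 4.3.

```python
def sbf(H, L, order):
    """Sequential bit flipping (Section 4.2): returns (d, L_hat) = Omega(order, L).

    H is a list of 0/1 rows, L the LLR vector, order a permutation of row indices.
    """
    M, N = len(H), len(H[0])
    checks = [[n for n in range(N) if H[m][n]] for m in range(M)]          # N(m)
    checks_of_bit = [[m for m in range(M) if H[m][n]] for n in range(N)]   # M(n)
    gamma = [abs(v) for v in L]
    d = [0 if v > 0 else 1 for v in L]            # Step 1: d_0 = hard decisions of L
    L_hat = list(L)
    sigma = [sum(d[n] for n in checks[m]) % 2 for m in range(M)]           # syndrome (4.5)

    def xi(n):                                    # min-sum approximation (4.10)
        return gamma[n] + sum((1 - 2 * sigma[m]) * min(gamma[k] for k in checks[m] if k != n)
                              for m in checks_of_bit[n])

    covered = set()                               # I_{t-1}
    for m in order:                               # Step 2 for t = 1, ..., M
        new_bits = sorted(set(checks[m]) - covered)   # I_t \ I_{t-1}
        covered.update(checks[m])
        if sigma[m] == 0:                         # d_{t-1} already satisfies check m
            continue
        n_star = min(new_bits, key=xi)            # (4.15)
        d[n_star] ^= 1                            # (4.16)
        L_hat[n_star] = -L_hat[n_star]            # (4.17)
        for mm in checks_of_bit[n_star]:          # (4.18)
            sigma[mm] ^= 1
    return d, L_hat


H = [[1, 0, 0, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1, 1, 0, 1],
     [0, 0, 0, 1, 0, 1, 1, 1]]
L = [4.60, 5.22, 5.95, -0.52, 2.63, 4.81, -0.23, 4.11]   # LLRs shown in Fig. 4.1
print(sbf(H, L, [2, 1, 0, 3])[0])   # Example 4.2: [0, 0, 0, 0, 0, 0, 0, 0]
print(sbf(H, L, [3, 2, 1, 0])[0])   # Example 4.3: [1, 0, 1, 1, 0, 0, 1, 0]
```

In the second run the first check processed already contains both erroneous bits (indices 3 and 6), which is exactly the failure mode of Remark 4.1.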

We observe that the output codeword $d$ in Example 4.2 equals the transmitted codeword, whereas the one in Example 4.3 does not. This is because the order used in Example 4.3 runs into the situation described in Remark 4.1: the first check processed, check 4, already involves both erroneous bits 4 and 7. This is a serious obstacle to decoding with the SBF algorithm. We therefore attempt to overcome it with the following two ideas:

1. Find an appropriate order that avoids the situation described in Remark 4.1.

2. Correct some erroneous bits in advance so that more orders lead to the correct codeword.

Note that the first idea is impractical because the complexity of finding an appropriate order grows quickly as $M$ increases; moreover, a hardware implementation is inefficient if the order changes frequently. Consequently, we propose two modified decoding methods based on the SBF algorithm with a fixed order. The first, designed for cyclic codes, applies the SBF algorithm to all cyclic shifts of the received word, transforming each of them into a valid codeword. Cyclically shifting the received word has an effect similar to decoding with a different order, even though the order itself is unchanged. The other method implements the second idea based on the concept of randomized sphere decoding with a moving center, which corrects errors iteratively.

4.4 SBF Algorithm with Cyclic Shifts

Assume that a codeword $c_T$ of a cyclic code $C$ is transmitted, and let $L = (L_1, \dots, L_N)$ be the LLR vector of the received word. Let $L^\nu$ denote $L$ cyclically shifted by $\nu$ positions. Then we can obtain a set of candidates by the following algorithm:

1. Determine an order $o$ for the SBF algorithm.

2. For all $\nu \in I$, apply the SBF algorithm to $L^\nu$ and $o$ to obtain a set of candidates $D = \{d^1, \dots, d^N\}$, where $(d^\nu, \hat L^\nu) = \Omega(o, L^\nu)$.

The transmitted codeword is then estimated by

$$\hat c_T = \arg\min_{c \in D} d(\Psi(c), y), \tag{4.20}$$

where $d(a, b)$ is the Euclidean distance between $a$ and $b$.
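A sketch of (4.20), written around a hypothetical callable `omega(order, L)` that stands for $\Omega$ (for instance, an SBF implementation). Two details are assumptions not spelled out in the text: $L^\nu$ is taken to be a left cyclic shift, and each $d^\nu$ is shifted back by $\nu$ positions before its distance to $y$ is measured, so that all candidates are expressed in the coordinates of $y$. The usage example plugs in a trivial majority-vote stand-in for a (3,1) repetition code only to exercise the wrapper; it is not the SBF algorithm.

```python
import math


def cyclic_shift(v, nu):
    """Left cyclic shift by nu positions: (v[nu], ..., v[N-1], v[0], ..., v[nu-1])."""
    return v[nu:] + v[:nu]


def decode_with_cyclic_shifts(L, y, order, omega):
    """Candidate set D of Section 4.4 and the minimum-ED choice (4.20)."""
    N = len(L)
    best, best_dist = None, math.inf
    for nu in range(N):
        d_shifted, _ = omega(order, cyclic_shift(L, nu))
        d = cyclic_shift(d_shifted, (N - nu) % N)                     # undo the shift
        dist = math.sqrt(sum((yn - (-1) ** cn) ** 2 for yn, cn in zip(y, d)))  # ED to Psi(d)
        if dist < best_dist:
            best, best_dist = d, dist
    return best, best_dist


def majority_stub(order, L):
    """Stand-in decoder for the (3,1) repetition code, used only to exercise the wrapper."""
    z = [0 if v > 0 else 1 for v in L]
    b = 1 if 2 * sum(z) > len(z) else 0
    return [b] * len(z), L


print(decode_with_cyclic_shifts([2.0, -0.5, 1.0], [1.0, -0.2, 0.5], None, majority_stub))
# ([0, 0, 0], 1.3) up to floating-point rounding of the distance
```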

4.5 Stochastic Sequential Bit Flipping Algorithm

Ideally, we could transform $z$ into the transmitted codeword with the SBF algorithm if an appropriate order were found. In practice, such an order is hard to find, especially when the LLRs are unreliable. Therefore, rather than always decoding from the original LLR vector $L$, we gradually modify $L$ so that its hard-limited vector moves closer and closer to the transmitted codeword. To implement this idea, we use a method similar to that of Section 3.4, i.e., an iterative procedure with the following two phases:

1. Generate $N_s$ virtual LLR vectors around the original one according to a specific random mechanism.

2. Update the parameters of the random mechanism based on the $E_s$ best candidates in order to generate better virtual LLR vectors in the next iteration.

This is the basic idea of our randomized sphere decoding with a moving center. The random mechanism is described further in the next two subsections.

4.5.1 Importance Density and Sample Format

Let $s = (s_1, \dots, s_N)$ be a random vector whose components $s_1, \dots, s_N$ are independent Gaussian random variables with means $\mu_1, \dots, \mu_N$ and variances $\rho_1^2, \dots, \rho_N^2$. We write $s \sim \mathcal N(\vec\mu, \vec\rho)$, where $\vec\mu = (\mu_1, \dots, \mu_N)$ and $\vec\rho = (\rho_1, \dots, \rho_N)$ are initialized by

$$\mu_n^{(0)} = L_n \tag{4.21}$$

$$\rho_n^{(0)} = \frac{4}{N_0} \tag{4.22}$$

At the $t$th iteration, $N_s$ random samples $s_1^{(t)}, s_2^{(t)}, \dots, s_{N_s}^{(t)}$ are drawn from $\mathcal N(\vec\mu^{(t)}, \vec\rho^{(t)})$ to form the sample set $S^{(t)}$. Each sample vector represents the LLRs of an associated virtual received word. We decode them with the SBF algorithm using a predesigned order and obtain the candidates $d_\ell^{(t)}$ and the associated LLR vectors $\hat s_\ell^{(t)} = (\hat s_{\ell,1}, \dots, \hat s_{\ell,N})$ for $1 \le \ell \le N_s$.

4.5.2 Update Parameters

Let $d_1^{(t)}, \dots, d_{N_s}^{(t)}$ be the output codewords of the SBF algorithm. We compute the ED between each candidate codeword and the received word $y$ and sort the corresponding random vectors in ascending order of their associated EDs. Define the elite set $E^{(t)}$ as the $E_s$ vectors with the smallest EDs to $y$, i.e., those whose corresponding codewords are more likely to have been transmitted. We always store the best codeword in $E^{(t)}$ up to the current iteration for the final decision when the maximum number of
