National Chiao Tung University
Department of Communications Engineering
Doctoral Dissertation
Stochastic List Decoding of
High-Density Parity-Check Codes
Student: Chang-Ming Lee
Advisor: Yu T. Su
A Dissertation
Submitted to Institute of Communication Engineering
College of Electrical and Computer Engineering
National Chiao Tung University
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
in
Communication Engineering
Hsinchu, Taiwan
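Stochastic List Decoding of High-Density Parity-Check Codes
Student: Chang-Ming Lee Advisor: Dr. Yu T. Su
Institute of Communications Engineering, National Chiao Tung University
Chinese Abstract
In this thesis, we study several stochastic decoding methods for linear block codes with high-density parity-check matrices. Our approach can be viewed as a randomized sphere decoding with a movable center: it selects candidate codewords near a center vector according to a spherically symmetric probability distribution. The center vector of this distribution is updated by a Monte Carlo technique known as the Cross-Entropy (CE) method. In each iteration of the CE method, a set of meaningful random samples is generated and converted into valid codewords. Based on the Euclidean distances between these randomly generated codewords and the received vector, we select the E best candidates to form an elite set, which is used to revise the probability distribution and thereby influence the random samples generated in subsequent iterations. To ensure that newly generated samples concentrate more and more around the correct transmitted codeword, not only does the center vector move toward the transmitted codeword, but the underlying distribution also eventually degenerates into a singular function that is nonzero only at the transmitted codeword. Moreover, the distribution parameters updated in each iteration should bring the new distribution ever closer to the optimal distribution in the sense of the Kullback-Leibler distance.
We propose three classes of stochastic decoding methods in this thesis. The first two are designed specifically for (n, k) Reed-Solomon codes. In the first method, the generated random samples represent a set of random error locator vectors, each of which indicates n-k positions of the received word that should be erased. After erasing the indicated positions of the received word, an erasures-only decoder recovers a candidate codeword. In the second method, n-dimensional real random vectors are generated to represent reliability vectors of the received word, and the n-k least reliable coordinates are assumed to be erasures. For each random sample, we make hard decisions on its k most reliable coordinates and use the erasures-only decoder to recover a valid codeword. The third algorithm uses a sequential bit-flipping algorithm to convert random sample vectors into valid codewords. It is worth noting that the first two algorithms apply only to maximum-distance separable (MDS) codes, whereas the third has no such restriction. Compared with the belief propagation (BP) algorithm and some existing algebraic algorithms, our algorithms offer improvements in both performance and complexity, especially for high-rate block codes with high-density parity-check matrices.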
Stochastic List Decoding of High-Density
Parity-Check Codes
Student: Chang-Ming Lee Advisor: Yu T. Su Department of Communications Engineering
National Chiao Tung University
Abstract
In this thesis, we present several novel stochastic decoding algorithms for linear high-density parity-check (HDPC) codes. Our approach can be regarded as a randomized sphere decoding with a moving center that selects candidate codewords around a center vector according to a sphere-symmetric probability distribution. The center (median) vector of the distribution is updated according to a Monte Carlo based approach called the Cross-Entropy (CE) method. In every iteration, the CE method produces a set of random samples which can be transformed into valid codewords. Based on the Euclidean distances between the received word and the random codewords, we select the best E candidates to form the elite set, which is then used to modify the probability distribution that governs the generation of the random samples in the ensuing iteration. To ensure that the newly generated samples concentrate more and more on a small neighborhood of the correct codeword, so that either the median vector moves to the transmitted codeword or the underlying distribution eventually degenerates to a singularity at the transmitted codeword, the parameters of the updated distribution should be such that the new distribution is closest to the optimal distribution in the sense of the Kullback-Leibler distance (i.e., the CE).
We propose three classes of stochastic decoding algorithms in this thesis. The first two are specifically designed for decoding (n, k) Reed-Solomon (RS) codes. For the first decoder, the random samples represent a set of random error locator vectors, each of which indicates n-k possible erasure positions within the received word. We associate each error locator vector with a candidate codeword by erasures-only (EO) decoding of the received word, assuming that the erasure locations are those indicated by the error locator vector. The n-dimensional real random vectors in the second algorithm represent reliability vectors whose least reliable n-k coordinates are assumed to be erasures. For each sample, we make component-wise hard decisions on the most reliable k coordinates and EO-decode the resulting binary vector. The third algorithm uses a sequential bit-flipping procedure to convert each random sample into a legitimate codeword. The first two algorithms are valid for MDS codes only, while the third algorithm can be used for decoding any linear block code. Our algorithms offer both complexity and performance advantages over BP and some existing algebraic decoding algorithms, especially for high-rate linear HDPC codes of short or medium length.
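Acknowledgements
After some thirty years of tumbling through this world, with no great faults but no real merits either, I am fortunate to be able to leave this modest work behind so as not to pass entirely without a trace. That this dissertation could be completed is owed to the many years of patient guidance of my advisor, Professor Yu T. Su. His example is not confined to his professional field; it is equally evident in the way he treats people and handles affairs, and I have benefited greatly from it.
I thank my lab mates, whose companionship brightened my doctoral years. I thank my wife, 芳先, and my parents for their unconditional support, which allowed me to complete my studies.
I dedicate this dissertation to my grandmother, who toiled tirelessly to raise me from infancy and who has recently passed from suffering into peace. May your journey be a smooth one.
Note: I am grateful for the generous support of the Ericsson scholarship during my studies, which relieved me of financial worries and let me concentrate on my work.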
Contents
Chinese Abstract
English Abstract
Acknowledgements
Contents
List of Tables
List of Figures
1 Introduction
2 The Cross-Entropy Method
2.1 Introduction
2.2 The CE Method for Rare-Event Simulation
2.3 The CE-Method for Optimization Problem
2.4 Updating Rules of Some Useful Densities
3 Stochastic Erasure-Only List Decoding of RS codes
3.1 Preliminary
3.2 Stochastic List Decoding Algorithm
3.2.1 Algebraic Erasures-Only (EO) Decoding
3.2.2 A Stochastic List Decoding Idea
3.2.3 Convergence and Complexity
3.3 List Decoding via Erasure Location Estimation
3.3.1 Importance Density and Sample Format
3.3.2 Update Parameters
3.4 List Decoding via Virtual Received Words
3.4.1 Importance Density and Sample Format
3.4.2 Update Parameters
3.5 Simulation Results and Discussion
4 Stochastic List Decoding of Linear Block Codes
4.1 Preliminary
4.2 Sequential Bit-Flipping Algorithm
4.3 Predicament of Decoding via SBF algorithm
4.4 SBF Algorithm with Cyclic Shifts
4.5.1 Importance Density and Sample Format
4.5.2 Update Parameters
4.5.3 Stochastic Sequential Bit Flipping Algorithm
4.6 Simulation Results and Discussions
5 Conclusions and Future Works
A The Proof of Lemma 4.1
B The Proof of Theorem 4.1
List of Tables
List of Figures
1.1 A correctly decoding example of a bounded distance decoder.
1.2 An example of erroneous decoding for a bounded distance decoder.
1.3 Decoding failure by a bounded distance decoder.
1.4 Decoding beyond FEC bound by enlarging the decoding sphere.
1.5 Belief propagation - successful decoding.
1.6 Belief propagation - trapped in a pseudo codeword.
1.7 A set of random samples are generated and the random samples in the small dash circle are better directions we want.
1.8 After updating the parameter of the random mechanism, the new set of generated random samples points the correct way more often.
3.1 Idea of the algebraic erasures-only decoding.
3.2 Flow chart of a stochastic decoder for RS codes.
3.3 Virtual received words are generated around the received LLR vector $\bar\Gamma$ by hard-limiting the sample vectors generated by an importance probability density whose parameter values evolved according to the CE principle.
3.4 Codeword error probability performance of the (15,11) Reed-Solomon code; 10 iterations.
3.5 Codeword error probability performance of the (31,25) Reed-Solomon code; 10 iterations.
4.2 Error rate performance of the (15,11) Hamming Code; Ns = 10, Es = 1
4.3 Error rate performance of the (7,5) RS Code; Ns = 10, Es = 1
4.4 Error rate performance of the (22,16) single error correction Code; Ns = 10, Es = 1
4.5 Error rate performance of the (39,32) single error correction Code; Ns = 10, Es = 1
4.6 Error rate performance of the (72,64) single error correction Code; Ns = 10, Es = 1
4.7 Error rate performance of the (31,26) BCH Code.
4.8 Error rate performance of the (15,11) RS Code.
Chapter 1
Introduction
Linear block codes are popular forward error-correcting (FEC) codes due to their simple structures and satisfactory FEC performance. For instance, Reed-Solomon (RS) codes [1] are used in a wide variety of commercial applications, most prominently in CDs, DVDs and Blu-ray discs, in data transmission technologies such as DSL and WiMAX, in broadcast systems such as DVB and ATSC, and in computer applications such as RAID 6 systems. Low-density parity-check (LDPC) codes [2] form another class of linear block codes which offer FEC capability close to the theoretical maximum, the Shannon limit [3]. In recent years, LDPC codes have been adopted by several digital broadcast and communication standards such as DVB-S2 [4], IEEE 802.3an (10GBASE-T) [5], IEEE 802.16e (WiMAX) [6], and IEEE 802.11n (WiFi) [7]. Although many decoding algorithms for block codes are available, more efficient decoding algorithms which can provide performance enhancement and complexity reduction are still in high demand.
Most hard-decision decoding algorithms are bounded-distance decoders (BDDs). They select the codeword c, if it exists, whose Hamming distance (HD) to the hard-limited received word z, say d(z, c), is less than or equal to $t_{\min} = \lfloor (d_{\min} - 1)/2 \rfloor$, where $d_{\min}$ is the minimum distance of the code C. As shown in Fig. 1.1, if z is within the decoding sphere centered at the transmitted codeword $c_T$, then the BDD correctly outputs $c_T$.
Figure 1.1: A correctly decoding example of a bounded distance decoder.
[10] and the Euclidean algorithm [11] for RS codes all belong to the class of BDDs. When z falls into another decoding sphere, e.g., a sphere centered at another legitimate codeword $c_1$ as shown in Fig. 1.2, a BDD makes an incorrect decision and a decoding error occurs. A decoding failure is declared if z does not belong to any decoding sphere of radius $t_{\min}$.
In general, a BDD can only correct up to $\lfloor (d_{\min} - 1)/2 \rfloor$ errors, while a maximum-likelihood soft-decision decoding algorithm can correct more than $t_{\min}$ errors at the expense of much higher complexity. There are two general approaches to improving the performance without incurring too much additional complexity. The first is to enlarge the decoding sphere (see Fig. 1.4) in order to correct errors beyond $t_{\min}$. For RS codes, errors-and-erasures decoding [9], Forney's generalized minimum distance (GMD) decoding [12], the algebraic list decoding algorithm invented by Guruswami and Sudan (GS) [13], and the algebraic soft-decision decoding (SDD) algorithm proposed by Koetter and Vardy (KV) [14] belong to this category. Note that the latter three algorithms are also members of the so-called list decoding algorithms because the enlarged decoding sphere may include more than one codeword.
Figure 1.2: An example of erroneous decoding for a bounded distance decoder.
Figure 1.3: Decoding failure by a bounded distance decoder.
Figure 1.4: Decoding beyond FEC bound by enlarging the decoding sphere.
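The three outcomes of bounded-distance decoding described above (correct decoding, decoding error, and decoding failure) can be illustrated with a minimal Python sketch. The four-word binary codebook and the function names `hamming`, `bdd_decode`, and `classify` are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
from itertools import combinations

# Hypothetical toy code (not from the thesis): four binary words of length 6.
CODEBOOK = ["000000", "111000", "000111", "111111"]

def hamming(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def bdd_decode(z, codebook):
    """Return the codeword within radius t_min of z, or None on decoding failure."""
    d_min = min(hamming(a, b) for a, b in combinations(codebook, 2))
    t_min = (d_min - 1) // 2
    # Spheres of radius t_min are disjoint, so at most one codeword qualifies.
    for c in codebook:
        if hamming(z, c) <= t_min:
            return c
    return None

def classify(z, sent, codebook=CODEBOOK):
    """Label the BDD outcome for received word z when `sent` was transmitted."""
    decoded = bdd_decode(z, codebook)
    if decoded is None:
        return "failure"
    return "correct" if decoded == sent else "error"

print(classify("100000", "000000"))  # one error, inside the sphere of the sent word
print(classify("110000", "000000"))  # two errors, inside the sphere of 111000
print(classify("100100", "000000"))  # two errors, outside every sphere
```

For this toy code $d_{\min} = 3$ and $t_{\min} = 1$, so the second and third received words lie beyond the guaranteed correction radius, which is the situation the enlarged-sphere and moving-center approaches try to address.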
Another idea for performance enhancement is to sequentially modify and move z from its original position so that the new location becomes closer and closer to $c_T$. Decoding methods based on this idea include the Chase II algorithm [15] and the combined Chase II-GMD algorithm [16]. The belief propagation (BP) based algorithms, such as the sum-product algorithm (SPA), its less complex approximation, the min-sum algorithm (MSA) [17], and their variations, are also members of this category. A successful BP decoding gradually updates the estimated soft output and moves the modified received vector toward the true transmitted codeword $c_T$; see Fig. 1.5. Unfortunately, the BP process may be trapped in some local minimum so that the modified received vector coincides with a pseudo codeword $c_p$, as shown in Fig. 1.6.
Figure 1.5: Belief propagation - successful decoding.
algorithms such as the annealed BP algorithm [18]. Another possible solution combines the BP algorithm with the BDD, such as the algorithms proposed in [8] and [19]. If the pseudo codeword $c_p$ belongs to the decoding sphere of $c_T$, successful decoding is achieved even though the BP algorithm makes z coincide with $c_p$.
Figure 1.6: Belief propagation - trapped in a pseudo codeword.
In this thesis, we investigate a novel iterative decoding idea: a randomized sphere decoding with a moving center. If statistical information about possible locations of the transmitted codeword $c_T$ around the received word z were given, the search should follow the most probable direction first. However, we usually do not have such information, and hence we search according to a probability distribution that is learned by random sampling. Each sample is transformed into a valid codeword, and we choose the samples whose corresponding codewords have smaller Euclidean distance (ED) to z to modify the distribution and update (move) z. As the iterations proceed, newly generated samples concentrate more and more on a small neighborhood of the correct codeword. The modified distribution becomes closer, in the Cross-Entropy (CE) sense, to the optimal (Dirac) distribution centered at the true transmitted codeword. The center thus moves closer to $c_T$ accordingly. This concept is implemented by the CE method [22], which has the following two phases:
1. Explore possible directions pointing the shortest way to the transmitted codeword $c_T$ via a set of random samples generated from a specific random mechanism.
2. Choose the better directions to update the parameters of the random mechanism in order to find better directions in the next iteration.
Fig. 1.7 and Fig. 1.8 illustrate the basic principle of the above idea.
The rest of this thesis is organized as follows. Chapter 2 introduces the CE method, an elegant and practical principle for efficiently simulating rare events that can be converted into an optimization solver. Chapter 3 presents a stochastic erasure-only list decoding (SEOLD) algorithm, which uses the CE method extended to optimization problems by treating an optimal event as a rare event. In Chapter 4, we investigate another stochastic list decoding algorithm based on a novel sequential bit-flipping procedure. Finally, we summarize our major contributions and suggest some future work in Chapter 5.
Figure 1.7: A set of random samples are generated and the random samples in the small dash circle are better directions we want.
Figure 1.8: After updating the parameter of the random mechanism, the new set of generated random samples points the correct way more often.
Chapter 2
The Cross-Entropy Method
The cross-entropy (CE) method was originally developed as an adaptive algorithm for rare-event simulation based on variance minimization [20]. It was soon modified into a randomized optimization technique [21], in which the original variance minimization program is replaced by an associated CE minimization problem. We summarize the basic concept of this simple, efficient, and general method in this chapter; more detailed investigations can be found in [22].
2.1 Introduction
In the field of rare-event simulation, the CE method is used in conjunction with importance sampling (IS), a well-known variance reduction technique in which the system is simulated under a different set of parameters, called the reference parameters (or a different probability distribution), so as to make the occurrence of the rare event more likely. A major drawback of the conventional IS technique is that the optimal reference parameters to be used in IS are usually very difficult to obtain. Traditional techniques for estimating the optimal reference parameters [23] typically involve time-consuming variance minimization programs. The advantage of the CE method is that it provides a simple and fast adaptive procedure for estimating the optimal reference parameters in IS.
For optimization problems, the CE method can be readily applied by first translating the underlying optimization problem into an associated estimation problem, named the associated stochastic problem (ASP), which typically involves rare-event estimation. Estimating the rare-event probability and the associated optimal reference parameter for the ASP via the CE method translates effectively back into solving the original optimization problem.
In general, the CE algorithm is an iterative procedure that consists of the following two phases in each iteration.
• Generate samples from the specified importance density given by the parameters from the previous iteration.
• Update the parameters for next iteration according to the order of the score values associated with the drawn samples and the minimizing CE criterion.
The significance of the CE concept is that it defines a precise mathematical framework for deriving fast and good updating/learning rules.
2.2 The CE Method for Rare-Event Simulation
In this section, the basic idea behind the CE algorithm for rare-event simulation is illustrated. Let x be a random vector taking values in some space $\mathcal{X}$. Let $\{f(\cdot; v)\}$ be a family of probability density functions (pdfs) on $\mathcal{X}$ with respect to some base measure $\mu$, where v is a real-valued parameter (vector). Therefore,
\[ \mathbb{E}_v[H(x)] = \int_{\mathcal{X}} H(x)\, f(x; v)\, \mu(dx) \tag{2.1} \]
for any function H. For simplicity, for the rest of this section we take $\mu(dx) = dx$, because $\mu$ is either a counting measure or the Lebesgue measure in most cases.
Let S be some real function on $\mathcal{X}$. Suppose we are interested in the probability that S(x) is greater than or equal to some real number $\gamma$ under f(x; u). This probability can be expressed as
\[ \ell = P_u(S(x) \ge \gamma) = \mathbb{E}_u\, I_{\{S(x) \ge \gamma\}}. \tag{2.2} \]
If this probability is very small, say smaller than $10^{-5}$, we call $\{S(x) \ge \gamma\}$ a rare event. A straightforward way to estimate $\ell$ is to use crude Monte Carlo simulation: draw a random sample $x_1, \dots, x_N$ from f(x; u); then
\[ \hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} \tag{2.3} \]
is an unbiased estimator of $\ell$. However, this poses serious problems when $\{S(x) \ge \gamma\}$ is a rare event, since a large simulation effort is required to estimate $\ell$ accurately, that is, with a small relative error or a narrow confidence interval.
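To make the difficulty with crude Monte Carlo concrete, the following Python sketch estimates $\ell = P(x \ge \gamma)$ for a scalar x with an exponential density of mean u, for which $\ell = e^{-\gamma/u}$ is known in closed form. The exponential model, the choice S(x) = x, and the names `crude_mc` and `relative_error` are assumptions made for this example only. The relative-error expression $\sqrt{(1-\ell)/(N\ell)}$ is the standard one for a binomial proportion and is not stated in the text.

```python
import math
import random

def crude_mc(gamma, u, n, rng):
    """Estimator (2.3) with S(x) = x and x ~ Exp(mean u); an assumed toy model."""
    hits = sum(1 for _ in range(n) if rng.expovariate(1.0 / u) >= gamma)
    return hits / n

def relative_error(ell, n):
    """Relative error of the crude estimator: sqrt((1 - ell) / (n * ell))."""
    return math.sqrt((1.0 - ell) / (n * ell))

rng = random.Random(1)
common = crude_mc(2.0, 1.0, 20000, rng)   # exact value is exp(-2)
rare = crude_mc(20.0, 1.0, 20000, rng)    # exact value is exp(-20)
print(common, rare, relative_error(math.exp(-20.0), 20000))
```

With N = 20000 the estimate for $\gamma = 2$ is reasonable, whereas for $\gamma = 20$ almost every run returns exactly zero because no sample reaches the level; the relative error is then far above one.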
An alternative is based on importance sampling: take a random sample $x_1, \dots, x_N$ from an importance sampling density g on $\mathcal{X}$, and estimate $\ell$ using the likelihood ratio (LR) estimator
\[ \hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} \frac{f(x_i; u)}{g(x_i)}. \tag{2.4} \]
The best way to estimate $\ell$ is to use the change of measure with density
\[ g^*(x) = \frac{I_{\{S(x) \ge \gamma\}}\, f(x; u)}{\ell}. \tag{2.5} \]
By using this change of measure we have in (2.4)
\[ I_{\{S(x_i) \ge \gamma\}} \frac{f(x_i; u)}{g^*(x_i)} = \ell \tag{2.6} \]
for all i. Since $\ell$ is a constant, the estimator (2.4) has zero variance, and we need to produce only N = 1 sample.
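As a hedged illustration of the LR estimator (2.4), the sketch below reuses an assumed toy model: x is exponential with mean u, S(x) = x, and the importance density is taken from the same family with mean v. The choice $v = \gamma + u$ and the names `likelihood_ratio` and `is_estimate` are assumptions of this example. The zero-variance density (2.5) itself cannot be sampled here because it depends on the unknown $\ell$.

```python
import math
import random

def likelihood_ratio(x, u, v):
    """W(x; u, v) = f(x; u) / f(x; v) for exponential densities with means u and v."""
    return (v / u) * math.exp(-x * (1.0 / u - 1.0 / v))

def is_estimate(gamma, u, v, n, rng):
    """LR estimator (2.4) with S(x) = x, nominal Exp(mean u), samples from Exp(mean v)."""
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0 / v)
        if x >= gamma:
            total += likelihood_ratio(x, u, v)
    return total / n

rng = random.Random(2)
gamma, u = 20.0, 1.0
estimate = is_estimate(gamma, u, gamma + u, 20000, rng)
print(estimate, math.exp(-gamma / u))
```

With the same sample size for which crude Monte Carlo typically sees no hit at all, the tilted density places a large fraction of the samples above the level and the weighted average lands close to $e^{-20}$.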
The obvious difficulty is that $g^*$ depends on the unknown parameter $\ell$. Moreover, it is often convenient to choose a g in the family of densities $\{f(\cdot; v)\}$. The idea now is to choose the reference parameter v such that the distance between the density $g^*$ above and f(x; v) is minimal. A particularly convenient measure of distance between two densities g and h is the Kullback-Leibler (KL) distance defined as
\[ D(g, h) = \mathbb{E}_g\!\left[ \ln \frac{g(x)}{h(x)} \right] = \int g(x) \ln g(x)\, dx - \int g(x) \ln h(x)\, dx, \tag{2.7} \]
which is also termed the cross-entropy (CE) between g and h.
Minimizing the Kullback-Leibler distance between $g^*$ in (2.5) and f(x; v) is equivalent to solving the maximization problem
\[ \max_v \int g^*(x) \ln f(x; v)\, dx. \tag{2.8} \]
Substituting $g^*$ from (2.5) into (2.8) we obtain the maximization program
\[ \max_v \int \frac{I_{\{S(x) \ge \gamma\}}\, f(x; u)}{\ell} \ln f(x; v)\, dx, \tag{2.9} \]
which is equivalent to the program
\[ \max_v D(v) = \max_v \mathbb{E}_u\!\left[ I_{\{S(x) \ge \gamma\}} \ln f(x; v) \right], \tag{2.10} \]
where D is implicitly defined above. Again using importance sampling, with a change of measure f(x; w), we can rewrite (2.10) as
\[ \max_v D(v) = \max_v \mathbb{E}_w\!\left[ I_{\{S(x) \ge \gamma\}}\, W(x; u, w) \ln f(x; v) \right] \tag{2.11} \]
for any reference parameter w, where
\[ W(x; u, w) = \frac{f(x; u)}{f(x; w)} \tag{2.12} \]
is the likelihood ratio between f(x; u) and f(x; w). The optimal solution of (2.11) can be written as
\[ v^* = \arg\max_v \mathbb{E}_w\!\left[ I_{\{S(x) \ge \gamma\}}\, W(x; u, w) \ln f(x; v) \right]. \tag{2.13} \]
We may estimate $v^*$ by solving the following stochastic program
\[ \max_v \hat{D}(v) = \max_v \frac{1}{N} \sum_{i=1}^{N} \left[ I_{\{S(x_i) \ge \gamma\}}\, W(x_i; u, w) \ln f(x_i; v) \right], \tag{2.14} \]
where $x_1, \dots, x_N$ is a random sample from f(x; w). In typical applications the function $\hat{D}$ in (2.14) is concave and differentiable with respect to v, in which case the solution of (2.14) may be readily obtained by solving the following system of equations:
\[ \frac{1}{N} \sum_{i=1}^{N} \left[ I_{\{S(x_i) \ge \gamma\}}\, W(x_i; u, w)\, \nabla \ln f(x_i; v) \right] = 0. \tag{2.15} \]
The advantage of this approach is that the solution of (2.15) can often be calculated analytically. In particular, this happens if the distributions of the random variables belong to a natural exponential family (NEF).
Note that the CE programs (2.14) and (2.15) are useful only if the probability of the target event $\{S(x) \ge \gamma\}$ is not too small under w, say greater than $10^{-5}$. For rare-event probabilities, due to the rareness of the events $\{S(x_i) \ge \gamma\}$, most of the indicator random variables $I_{\{S(x_i) \ge \gamma\}}$, $i = 1, \dots, N$, will be zero for moderate N, which makes the programs (2.14) and (2.15) difficult to carry out. A multilevel algorithm can be used to overcome this difficulty. The basic idea is to construct a sequence of reference parameters $\{v_t, t \ge 0\}$ and a sequence of levels $\{\gamma_t, t \ge 1\}$, and to iterate in both $v_t$ and $\gamma_t$.
We initialize by choosing a not very small $\rho$, say $\rho = 10^{-2}$, and by defining $v_0 = u$. Next, we let $\gamma_1$ ($\gamma_1 < \gamma$) be such that, under the original density f(x; u), the probability $\ell_1 = \mathbb{E}_u I_{\{S(x) \ge \gamma_1\}}$ is at least $\rho$. We then let $v_1$ be the optimal CE reference parameter for estimating $\ell_1$, and repeat the last two steps iteratively with the goal of estimating the pair $\{\ell, v^*\}$. In other words, each iteration of the algorithm consists of two main phases. In the first phase $\gamma_t$ is updated; in the second $v_t$ is updated. Specifically, starting with $v_0 = u$ we obtain the subsequent $\gamma_t$ and $v_t$ as follows:
1. Adaptive updating of $\gamma_t$. For a fixed $v_{t-1}$, let $\gamma_t$ be a $(1-\rho)$-quantile of S(x) under $v_{t-1}$. That is, $\gamma_t$ satisfies
\[ P_{v_{t-1}}(S(x) \ge \gamma_t) \ge \rho, \tag{2.16} \]
\[ P_{v_{t-1}}(S(x) \le \gamma_t) \ge 1 - \rho, \tag{2.17} \]
where $x \sim f(x; v_{t-1})$.
A simple estimator $\hat{\gamma}_t$ of $\gamma_t$ can be obtained by drawing a random sample $x_1, \dots, x_N$ from $f(x; v_{t-1})$, calculating the performances $S(x_i)$ for all i, ordering them from smallest to largest, $S_{(1)} \le \dots \le S_{(N)}$, and finally evaluating the sample $(1-\rho)$-quantile as
\[ \hat{\gamma}_t = S_{(\lceil (1-\rho) N \rceil)}. \tag{2.18} \]
Note that $S_{(j)}$ is called the j-th order statistic of the sequence $S(x_1), \dots, S(x_N)$. Note also that $\hat{\gamma}_t$ is chosen such that the event $\{S(x) \ge \hat{\gamma}_t\}$ is not too rare (it has a probability of around $\rho$), and therefore updating the reference parameter via a procedure such as (2.14) is meaningful.
2. Adaptive updating of $v_t$. For fixed $\gamma_t$ and $v_{t-1}$, derive $v_t$ from the solution of the following CE program
\[ \max_v D(v) = \max_v \mathbb{E}_{v_{t-1}}\!\left[ I_{\{S(x) \ge \gamma_t\}}\, W(x; u, v_{t-1}) \ln f(x; v) \right]. \tag{2.19} \]
The stochastic counterpart of the above equation is as follows: for fixed $\hat{\gamma}_t$ and $\hat{v}_{t-1}$, derive $\hat{v}_t$ from the solution of the following program
\[ \max_v \hat{D}(v) = \max_v \frac{1}{N} \sum_{i=1}^{N} \left[ I_{\{S(x_i) \ge \hat{\gamma}_t\}}\, W(x_i; u, \hat{v}_{t-1}) \ln f(x_i; v) \right]. \tag{2.20} \]
Thus, at the first iteration, starting with $\hat{v}_0 = u$, to get a good estimate for $\hat{v}_1$, the target event is artificially made less rare by (temporarily) using a level $\hat{\gamma}_1$ which is chosen smaller than $\gamma$. The value of $\hat{v}_1$ obtained in this way will (hopefully) make the event $\{S(x) \ge \gamma\}$ less rare in the next iteration, so in the next iteration a value $\hat{\gamma}_2$ can be used which is closer to $\gamma$ itself. The algorithm terminates when at some iteration t a level is reached which is at least $\gamma$, so that the original value of $\gamma$ can be used without getting too few samples.
The above rationale results in the following algorithm:
1. Define $\hat{v}_0 = u$. Set t = 1.
2. Generate a sample $x_1, \dots, x_N$ from the density $f(x; \hat{v}_{t-1})$ and compute the sample $(1-\rho)$-quantile $\hat{\gamma}_t$ of the performances according to (2.18), provided $\hat{\gamma}_t$ is less than $\gamma$; otherwise set $\hat{\gamma}_t = \gamma$.
3. Use the same sample $x_1, \dots, x_N$ to solve the stochastic program (2.20). Denote the solution by $\hat{v}_t$.
4. If $\hat{\gamma}_t < \gamma$, set t = t + 1 and reiterate from Step 2. Else proceed with Step 5.
5. Estimate the rare-event probability $\ell$ using the LR estimate
\[ \hat{\ell} = \frac{1}{N} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}}\, W(x_i; u, \hat{v}_T), \tag{2.21} \]
where T denotes the final number of iterations.
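The five steps above can be sketched in Python for an assumed toy problem: x is exponential with mean u, S(x) = x, and the reference family is exponential with mean v. For this family the system (2.15) has the closed-form solution $v = \sum_i I_i W_i x_i / \sum_i I_i W_i$, which the sketch uses in Step 3. The function names, the iteration cap `max_iter`, and the use of a fresh sample of size `n_final` in Step 5 are assumptions of this example, not details given in the text.

```python
import math
import random

def likelihood_ratio(x, u, v):
    """W(x; u, v) for exponential densities with means u and v."""
    return (v / u) * math.exp(-x * (1.0 / u - 1.0 / v))

def ce_rare_event(gamma, u, n, rho, rng, n_final=20000, max_iter=50):
    v = u                                                  # Step 1
    t = 0
    while t < max_iter:
        t += 1
        xs = [rng.expovariate(1.0 / v) for _ in range(n)]  # Step 2
        order = sorted(xs)
        # Sample (1 - rho)-quantile (2.18), capped at the target level gamma.
        gamma_t = min(gamma, order[math.ceil((1.0 - rho) * n) - 1])
        num = den = 0.0                                    # Step 3
        for x in xs:
            if x >= gamma_t:
                w = likelihood_ratio(x, u, v)
                num += w * x
                den += w
        v = num / den              # closed-form maximizer of (2.20) for this family
        if gamma_t >= gamma:                               # Step 4
            break
    # Step 5: LR estimate (2.21), here computed from a fresh sample (assumption).
    xs = [rng.expovariate(1.0 / v) for _ in range(n_final)]
    ell = sum(likelihood_ratio(x, u, v) for x in xs if x >= gamma) / n_final
    return ell, v, t

ell_hat, v_final, iterations = ce_rare_event(20.0, 1.0, 2000, 0.1, random.Random(3))
print(ell_hat, math.exp(-20.0), v_final, iterations)
```

In this toy setting the levels grow roughly geometrically, so only a handful of iterations are needed before the level reaches $\gamma$ and the final reference mean settles near $\gamma + u$.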
2.3 The CE-Method for Optimization Problem
Consider the following general maximization problem: let $\mathcal{X}$ be a finite set of states, and let S be a real-valued performance function on $\mathcal{X}$. We wish to find the maximum of S over $\mathcal{X}$ and the corresponding state at which this maximum is attained. Let us denote the maximum by $\gamma^*$. Thus,
\[ S(x^*) = \gamma^* = \max_{x \in \mathcal{X}} S(x). \tag{2.22} \]
The starting point in the methodology of the CE method is to associate with the optimization problem (2.22) a meaningful estimation problem. To this end we define a collection of indicator functions $\{I_{\{S(x) \ge \gamma\}}\}$ on $\mathcal{X}$ for various levels $\gamma \in \mathbb{R}$. Next, let $\{f(\cdot; v), v \in \mathcal{V}\}$ be a family of (discrete) probability densities on $\mathcal{X}$, parameterized by a real-valued parameter (vector) v. For a certain $u \in \mathcal{V}$ we associate with (2.22) the problem of estimating the number
\[ \ell(\gamma) = P_u(S(x) \ge \gamma) = \sum_{x} I_{\{S(x) \ge \gamma\}}\, f(x; u) = \mathbb{E}_u I_{\{S(x) \ge \gamma\}}, \tag{2.23} \]
where $P_u$ is the probability measure under which the random state x has probability density function (pdf) f(x; u), and $\mathbb{E}_u$ denotes the corresponding expectation operator.
We will call the estimation problem (2.23) the associated stochastic problem (ASP). To indicate how (2.23) is associated with (2.22), suppose for example that $\gamma$ is equal to $\gamma^*$ and that f(x; u) is the uniform density on $\mathcal{X}$. Note that, typically, $\ell(\gamma^*) = f(x^*; u) = 1/|\mathcal{X}|$, where $|\mathcal{X}|$ denotes the number of elements in $\mathcal{X}$, is a very small number. Thus, for $\gamma = \gamma^*$ a natural way to estimate $\ell(\gamma)$ would be to use the LR estimator (2.21) with reference parameter $v^*$ given by
\[ v^* = \arg\max_v \mathbb{E}_u\!\left[ I_{\{S(x) \ge \gamma\}} \ln f(x; v) \right]. \tag{2.24} \]
This parameter could be estimated by
\[ \hat{v}^* = \arg\max_v \frac{1}{N} \sum_{i=1}^{N} \left[ I_{\{S(x_i) \ge \gamma\}} \ln f(x_i; v) \right], \tag{2.25} \]
where the $x_i$ are generated from the pdf f(x; u). It is plausible that, if $\gamma$ is close to $\gamma^*$, $f(x; v^*)$ assigns most of its probability mass close to $x^*$ and thus can be used to generate an approximate solution to (2.22). However, it is important to note that the estimator (2.25) is only of practical use when $I_{\{S(x) \ge \gamma\}} = 1$ for enough samples. This means, for example, that when $\gamma$ is close to $\gamma^*$, u needs to be such that $P_u(S(x) \ge \gamma)$ is not too small. Thus, the choices of u and $\gamma$ in (2.23) are closely related. On the one hand, we would like to choose $\gamma$ as close as possible to $\gamma^*$ and find (an estimate of) $v^*$ via the procedure above, which assigns almost all mass to state(s) close to the optimal state. On the other hand, we would like to keep $\ell(\gamma)$ relatively large, which requires $\gamma$ not to be too close to $\gamma^*$, in order to obtain an accurate estimator for $v^*$.
The situation is very similar to the rare-event simulation case. The idea is to adopt a two-phase multilevel approach in which we simultaneously construct a sequence of levels $\hat{\gamma}_1, \hat{\gamma}_2, \dots, \hat{\gamma}_T$ and parameter (vectors) $\hat{v}_0, \hat{v}_1, \dots, \hat{v}_T$ such that $\hat{\gamma}_T$ is close to the optimal $\gamma^*$ and $\hat{v}_T$ is such that the corresponding density assigns high probability mass to the collection of states that give a high performance. This strategy is embodied in the following procedure:
1. Define $\hat{v}_0 = u$. Set t = 1.
2. Generate a sample $x_1, \dots, x_N$ from the density $f(x; \hat{v}_{t-1})$ and compute the sample $(1-\rho)$-quantile $\hat{\gamma}_t$ of the performances according to (2.18).
3. Use the same sample $x_1, \dots, x_N$ and solve the stochastic program (2.20) with W = 1. Denote the solution by $\hat{v}_t$.
5. If for some $t \ge d$, say d = 5,
\[ \hat{\gamma}_t = \hat{\gamma}_{t-1} = \dots = \hat{\gamma}_{t-d}, \tag{2.26} \]
then stop (let T denote the final iteration); otherwise set t = t + 1 and reiterate from Step 2.
Note that the initial vector $\hat{v}_0$, the sample size N, the stopping parameter d, and the number $\rho$ have to be specified in advance.
The above procedure can, in principle, be applied to any discrete and continuous optimization problem. For each individual problem two essential ingredients need to be supplied:
1. We need to specify how the samples are generated. In other words, we need to specify the family of densities {f(·; v)}.
2. We need to calculate the updating rules for the parameters, based on cross-entropy minimization.
In general there are many ways to generate samples from X , and it is not always immediately clear which way of generating the sample will yield better results or easier updating formulas.
2.4 Updating Rules of Some Useful Densities
In this section we will derive the updating rules for two pdfs which are commonly used for the CE method. The first one is the Bernoulli distribution and the second is the Gaussian distribution.
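Suppose the random vector $x_i = (x_{i1}, \dots, x_{in}) \sim \mathrm{Ber}(p)$, where Ber(p) is the Bernoulli distribution with parameter vector $p = (p_1, \dots, p_n)$. Consequently, the pdf is
\[ f(x_i; p) = \prod_{j=1}^{n} p_j^{x_{ij}} (1 - p_j)^{1 - x_{ij}}, \tag{2.27} \]
and since each $x_{ij}$ can only be 0 or 1,
\[ \frac{\partial}{\partial p_j} \ln f(x_i; p) = \frac{x_{ij}}{p_j} - \frac{1 - x_{ij}}{1 - p_j} = \frac{1}{(1 - p_j) p_j} (x_{ij} - p_j). \tag{2.28} \]
Now we can find the maximum in (2.20) (with W = 1) by setting the first derivatives with respect to $p_j$ equal to zero, for $j = 1, \dots, n$:
\[ \frac{\partial}{\partial p_j} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} \ln f(x_i; p) = \frac{1}{(1 - p_j) p_j} \sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}} (x_{ij} - p_j) = 0. \tag{2.29} \]
Thus, we get the updating rule
\[ p_j = \frac{\sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}}\, x_{ij}}{\sum_{i=1}^{N} I_{\{S(x_i) \ge \gamma\}}}. \tag{2.30} \]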
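The Bernoulli updating rule (2.30) can be combined with the quantile rule (2.18) and the stopping rule (2.26) in a short Python sketch. The toy objective (the number of positions in which a sample agrees with a hidden binary vector), the uniform initialization p = 0.5, and the names `ce_bernoulli`, `HIDDEN`, and `score` are assumptions made purely for illustration.

```python
import math
import random

def ce_bernoulli(score, n, n_samples, rho, d, rng, max_iter=100):
    """CE maximization of score(x) over binary n-vectors with a Bernoulli density."""
    p = [0.5] * n                       # assumed initial parameter vector
    history = []                        # sequence of levels gamma_t
    for _ in range(max_iter):
        xs = [[1 if rng.random() < pj else 0 for pj in p] for _ in range(n_samples)]
        scores = [score(x) for x in xs]
        # Sample (1 - rho)-quantile of the performances, as in (2.18).
        gamma_t = sorted(scores)[math.ceil((1.0 - rho) * n_samples) - 1]
        elite = [x for x, s in zip(xs, scores) if s >= gamma_t]
        # Updating rule (2.30): fraction of elite samples with a 1 in position j.
        p = [sum(x[j] for x in elite) / len(elite) for j in range(n)]
        history.append(gamma_t)
        # Stopping rule (2.26): the level has not changed over the last d iterations.
        if len(history) > d and len(set(history[-(d + 1):])) == 1:
            break
    return p, history

HIDDEN = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical target vector

def score(x):
    return sum(1 for a, b in zip(x, HIDDEN) if a == b)

p_final, levels = ce_bernoulli(score, len(HIDDEN), 200, 0.1, 5, random.Random(4))
print([round(pj) for pj in p_final], levels)
```

Because no smoothing is applied, the parameter vector degenerates to a 0/1 vector once all elite samples coincide, which mirrors the degeneration of the sampling density described for the decoder.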
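Next, consider the Gaussian density
\[ f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}}, \quad x \in \mathbb{R}. \tag{2.31} \]
The optimal solution of (2.20) (with W = 1) follows from minimization of
\[ \frac{1}{\sigma^2} \sum_{i=1}^{N} I_i (x_i - \mu)^2 + \ln(\sigma^2) \sum_{i=1}^{N} I_i, \tag{2.32} \]
where $I_i = I_{\{S(x_i) \ge \gamma\}}$. It is easily seen that this minimum is obtained at $(\hat{\mu}, \hat{\sigma}^2)$ given by
\[ \hat{\mu} = \frac{\sum_{i=1}^{N} I_i x_i}{\sum_{i=1}^{N} I_i} \tag{2.33} \]
and
\[ \hat{\sigma}^2 = \frac{\sum_{i=1}^{N} I_i (x_i - \hat{\mu})^2}{\sum_{i=1}^{N} I_i}. \tag{2.34} \]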
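A minimal Python sketch of the Gaussian updates (2.33) and (2.34), applied to a one-dimensional continuous maximization, is given below. The objective $S(x) = -(x-2)^2$, the initial values, the fixed number of iterations, and the name `ce_gaussian` are assumptions chosen only to show the elite mean and variance updates at work.

```python
import math
import random

def ce_gaussian(score, mu, sigma, n_samples, rho, n_iter, rng):
    """CE maximization of score(x) over the reals with a Gaussian sampling density."""
    for _ in range(n_iter):
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        scores = [score(x) for x in xs]
        # Sample (1 - rho)-quantile of the performances, as in (2.18).
        gamma_t = sorted(scores)[math.ceil((1.0 - rho) * n_samples) - 1]
        elite = [x for x, s in zip(xs, scores) if s >= gamma_t]
        mu = sum(elite) / len(elite)                                          # (2.33)
        sigma = math.sqrt(sum((x - mu) ** 2 for x in elite) / len(elite))     # (2.34)
    return mu, sigma

def objective(x):
    return -(x - 2.0) ** 2    # hypothetical objective with its maximum at x = 2

mu_hat, sigma_hat = ce_gaussian(objective, -5.0, 10.0, 100, 0.1, 20, random.Random(5))
print(mu_hat, sigma_hat)
```

The mean drifts toward the maximizer while the standard deviation shrinks, so the density approaches a point mass; this is the continuous counterpart of the moving center and degenerating distribution described in Chapter 1.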
Chapter 3
Stochastic Erasure-Only List Decoding of RS codes
In this chapter, we apply the Cross-Entropy (CE) method [22] to develop a Monte Carlo based iterative SDD algorithm that yields improved algebraic SDD performance. The CE method is an elegant and practical principle for simulating rare events which approximates the probability of the rare event by means of a family of parameterized probabilistic models. Our stochastic erasure-only list decoding (SEOLD) algorithm uses the CE method extended to optimization problems by treating an optimal event as a rare event.
3.1 Preliminary
Let C be an (n, k) RS code over GF($2^m$) with minimum Hamming distance $d_{\min} = n - k + 1$. Let $c = (c_0, \dots, c_{n-1})$ be a codeword in C. For binary transmission, every code symbol must be expanded into a binary representation with symbols from GF(2) = {0, 1}. Let $\alpha$ be a primitive element of GF($2^m$); then the ith symbol $c_i$ can be uniquely represented by the binary m-tuple $c_i^{(b)} = (c_{i,0}, \dots, c_{i,m-1})$, where $c_i = c_{i,0}\alpha^0 + \dots + c_{i,m-1}\alpha^{m-1}$ and $c_{i,j} \in$ GF(2) for all j. Therefore, the codeword c can be uniquely mapped into the binary expansion vector
\[ \bar{c} = (c_0^{(b)}, c_1^{(b)}, \dots, c_{n-1}^{(b)}) = (\bar{c}_0, \bar{c}_1, \dots, \bar{c}_{nm-1}). \]
The transmitter maps the binary codeword $\bar{c}$ into the bipolar vector
\[ \Psi(\bar{c}) = \bar{x} = (\bar{x}_0, \dots, \bar{x}_{nm-1}), \quad \bar{x}_j = \Psi(\bar{c}_j) = (-1)^{\bar{c}_j} \tag{3.1} \]
and sends it over an additive white Gaussian noise (AWGN) channel with zero mean and power spectral density $N_0/2$. The received sequence at the output of the matched filter is $\bar{y} = (\bar{y}_0, \dots, \bar{y}_{nm-1})$, where $\bar{y}_j = \bar{x}_j + \bar{w}_j$ and the $\bar{w}_j$ are statistically independent Gaussian random variables with zero mean and variance $N_0/2$.
Let $\bar{z} = (\bar{z}_0, \dots, \bar{z}_{nm-1})$ be the hard-decision binary vector of the received bit sequence $\bar{y}$, i.e.,
\[ \bar{z}_j = \begin{cases} 0, & \bar{y}_j > 0 \\ 1, & \text{otherwise} \end{cases} \tag{3.2} \]
and let $z = (z_0, \dots, z_{n-1})$ be the corresponding symbol vector. Denote by $\bar{\Gamma} = (\bar{\gamma}_0, \dots, \bar{\gamma}_{nm-1})$ the reliability vector of $\bar{y}$, in which $\bar{\gamma}_j$ is the magnitude of the log-likelihood ratio (LLR) associated with the corresponding hard-limited bit $\bar{z}_j$,
\[ L(\bar{c}_j) = \log \frac{P(\bar{c}_j = 0 \mid \bar{y})}{P(\bar{c}_j = 1 \mid \bar{y})}, \tag{3.3} \]
and define the symbol reliability vector $\Gamma = (\gamma_0, \dots, \gamma_{n-1})$ of z by
\[ \gamma_i = \min_{j} \bar{\gamma}_j, \quad j \in \{im, \dots, (i+1)m - 1\}. \tag{3.4} \]
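The quantities in (3.2) to (3.4) can be computed with a few lines of Python. The sketch assumes equiprobable bits, for which the LLR of a bipolar symbol in AWGN with noise variance $N_0/2$ reduces to $4\bar{y}_j/N_0$; this closed form is a standard result that is not stated in the text, and the function names and sample values are hypothetical.

```python
def hard_decisions(y):
    """Hard-decision bits (3.2): 0 if the matched-filter output is positive, else 1."""
    return [0 if yj > 0 else 1 for yj in y]

def bit_reliabilities(y, n0):
    """LLR magnitudes, assuming equiprobable bits so that L = 4 * y / N0."""
    return [abs(4.0 * yj / n0) for yj in y]

def symbol_reliabilities(bit_rel, m):
    """Symbol reliabilities (3.4): the minimum bit reliability within each m-bit symbol."""
    return [min(bit_rel[i * m:(i + 1) * m]) for i in range(len(bit_rel) // m)]

# Hypothetical received values for n = 2 symbols of m = 3 bits each, with N0 = 1.
y = [0.9, -1.2, 0.1, -0.3, 1.5, 0.8]
z_bits = hard_decisions(y)
gamma_bits = bit_reliabilities(y, 1.0)
gamma_symbols = symbol_reliabilities(gamma_bits, 3)
print(z_bits, gamma_bits, gamma_symbols)
```

Taking the minimum over the bits of a symbol means a single unreliable bit marks the whole symbol as unreliable, which matches the symbol-level erasure decisions used later in the chapter.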
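Assume that the $i$th symbol $c_i$ of $c$ is uniformly distributed over GF($2^m$) and the $n$ received symbols are independent and uniformly drawn from GF($2^m$). Then $P(c_i = \beta \mid \bar{y})$, the probability that $c_i = \beta$ was transmitted given the observation $\bar{y}$, can be easily evaluated [14]:
$$P(c_i = \beta \mid \bar{y}) = P(c_i = \beta \mid \bar{y}_i) = \frac{P(\bar{y}_i \mid c_i = \beta) P(c_i = \beta)}{\sum_{\omega \in \mathrm{GF}(2^m)} P(\bar{y}_i \mid c_i = \omega) P(c_i = \omega)} = \frac{P(\bar{y}_i \mid c_i = \beta)}{\sum_{\omega \in \mathrm{GF}(2^m)} P(\bar{y}_i \mid c_i = \omega)} \tag{3.5}$$
where
$$\bar{y}_i = (\bar{y}_{im}, \bar{y}_{im+1}, \cdots, \bar{y}_{im+m-1}),$$
$$P(\bar{y}_i \mid c_i = \beta) = \prod_{j=0}^{m-1} P(\bar{y}_{im+j} \mid \bar{c}_{im+j} = \beta_j), \quad \beta = \beta_0 \alpha^0 + \cdots + \beta_{m-1}\alpha^{m-1}.$$
The $q \times n$ matrix $R = [R_{\beta,i} = P(c_i = \beta \mid \bar{y})]$, $q = 2^m$, will be referred to as the reliability matrix of the received vector $\bar{y}$.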
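As an illustration of (3.5), the following sketch builds the reliability matrix from the matched-filter outputs. It assumes that each field element $\beta$ is labelled by the integer whose $j$th bit is $\beta_j$ (a labelling chosen here for convenience), and it uses the Gaussian likelihood $\exp(-(\bar{y} - \bar{x})^2/N_0)$, whose normalizing constant cancels in the ratio.

```python
import math

def reliability_matrix(y, m, n0):
    """Return R with R[beta][i] = P(c_i = beta | y), following (3.5)."""
    q = 1 << m                      # field size 2^m
    n = len(y) // m                 # number of symbols
    R = [[0.0] * n for _ in range(q)]
    for i in range(n):
        block = y[i * m:(i + 1) * m]
        likelihoods = []
        for beta in range(q):
            p = 1.0
            for j, yj in enumerate(block):
                bit = (beta >> j) & 1           # beta_j (assumed labelling)
                x = 1.0 if bit == 0 else -1.0   # bipolar image of the bit
                p *= math.exp(-(yj - x) ** 2 / n0)
            likelihoods.append(p)
        total = sum(likelihoods)
        for beta in range(q):
            R[beta][i] = likelihoods[beta] / total
    return R

R = reliability_matrix([0.9, -0.2, 1.1, 0.4], m=2, n0=1.0)
```

Each column of `R` sums to one, and its largest entry sits at the hard-decision symbol, so $z$ can be read off the matrix directly.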
3.2 Stochastic List Decoding Algorithm
3.2.1 Algebraic Erasures-Only (EO) Decoding
It is well known that RS codes are maximum-distance separable (MDS), which implies that any $k$ coordinates (symbols) of an RS codeword can be used to determine the remaining $n-k$ symbols. Hence it is sufficient to decide $k$ correct (message) or $n-k$ incorrect (error) coordinates of a codeword. Let $E_L$ be the collection of all combinations of $n-k$ error coordinates,
$$E_L = \left\{ s = (s_0, \cdots, s_{n-1}) \;\middle|\; s_i \in \{0, 1\},\ \sum_i s_i = n-k \right\} \tag{3.6}$$
where $s_i = 1$ if the $i$th coordinate is in error. Then a straightforward decoding schedule is given as follows:
(a). For all $s \in E_L$, erase the corresponding $n-k$ error coordinates of the received word $z$ and decode by the erasures-only (EO) decoder. The resulting codeword set is denoted by $C_z$.
(b). Choose the codeword from $C_z$ with the best score, e.g., the one whose Euclidean distance from the received word is the smallest, as the decoder output.
The basic idea of the above procedure is shown in Fig. 3.1. It can be easily confirmed that for any $c \in C_z$, $d_H(c, z) \le n-k$, where $d_H(c, z)$ is the Hamming distance between $c$ and $z$.
[Figure 3.1. Legend: vectors at Hamming distance $n-k$ from $z$; codewords belonging to $C_z$; transmitted codeword; re-encoding process.]
symbols is less than $d_{\min}$. Furthermore, (b) is equivalent to the following minimization problem
$$\arg\min_{c} d(\Psi(\bar{c}), \bar{y}) \quad \text{subject to } c \in C_z \tag{3.7}$$
where $\Psi(\cdot)$ is defined by (3.1) and $d(\bar{a}, \bar{b})$ is the Euclidean distance (ED) between the $nm$-dimensional real vectors $\bar{a}$ and $\bar{b}$.
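The exhaustive schedule (a)-(b) can be made concrete with a toy example. The sketch below is not the decoder used in this thesis: to stay short it works over the prime field GF(7) instead of GF($2^m$), uses a hypothetical (6, 2) RS code defined by polynomial evaluation, performs EO decoding by Lagrange interpolation through the kept symbols, and scores candidates by Hamming distance as a stand-in for the Euclidean distance in (3.7).

```python
from itertools import combinations

P = 7                            # toy prime field GF(7), assumed for illustration
N, K = 6, 2                      # (6, 2) RS code, d_min = N - K + 1 = 5
POINTS = list(range(1, N + 1))   # evaluation points 1..6

def encode(msg):
    """Evaluate the message polynomial of degree < K at every point."""
    return [sum(c * pow(x, j, P) for j, c in enumerate(msg)) % P for x in POINTS]

def eo_decode(z, erased):
    """Erasures-only decoding: interpolate through the K symbols that are kept."""
    kept = [i for i in range(N) if i not in erased]
    out = []
    for x in POINTS:
        val = 0
        for i in kept:
            num, den = 1, 1
            for j in kept:
                if j != i:
                    num = num * (x - POINTS[j]) % P
                    den = den * (POINTS[i] - POINTS[j]) % P
            val = (val + z[i] * num * pow(den, P - 2, P)) % P   # Fermat inverse
        out.append(val)
    return out

def exhaustive_eo_list_decode(z):
    """Steps (a)-(b): try every set of N-K erasures and keep the closest codeword."""
    best, best_dist = None, N + 1
    for erased in combinations(range(N), N - K):
        c = eo_decode(z, set(erased))
        dist = sum(a != b for a, b in zip(c, z))   # Hamming score (stand-in for ED)
        if dist < best_dist:
            best, best_dist = c, dist
    return best

c = encode([3, 5])
z = list(c)
z[0] = (z[0] + 1) % P            # two symbol errors
z[3] = (z[3] + 3) % P
decoded = exhaustive_eo_list_decode(z)
```

Even for this tiny code the loop visits all 15 erasure patterns, which illustrates why the size of $E_L$ makes the exhaustive schedule impractical for realistic code lengths.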
3.2.2 A Stochastic List Decoding Idea
Each error locator vector (ELV) $s \in E_L$ represents a particular set of $n-k$ possible error coordinates and has a corresponding codeword $c_s$ that belongs to $C_z$. We denote the latter relationship by $s \mapsto c_s$. Although more than one ELV may be associated with the same codeword, the complexity of searching for the optimal solution $c^*$ in the error location domain $E_L$ is still extremely high because the cardinality of $E_L$ is $\binom{n}{k}$ and only a few (or one) of the elements in $E_L$, depending on the number and locations of the received errors, can be used to reconstruct $c^*$.
Suppose we model the selection of the ELV $s$ from $E_L$ as a stochastic (vector-valued) experiment governed by a family of parameterized distributions $\{f(s; u)\}$ with $u \in \nu$ being a real-valued parameter vector. Usually $f(s; u)$ is assumed to be uniform owing to the lack of a priori information, whence the search in (3.7) is exhaustive unless some algebraic properties of the code are used. One way to solve (3.7) efficiently is to find a parameter $v^*$ such that $f(s; v^*) = \delta(s - s^*)$, where $s^* \mapsto c^*$. Then drawing one sample from $f(s; v^*)$ is sufficient to obtain the optimal solution $c^*$. To get around the difficulty that $c^*$ is not known, one notices that the optimization problem (3.7) is related to the estimation of the probability $P(d(\Psi(\bar{c}_s), \bar{y}) \le \eta \mid s \mapsto c_s)$, which is the probability of a rare event when $\eta = \eta^* = d(\Psi(\bar{c}^*), \bar{y})$. The connection comes from the fact that efficient estimation of a rare-event probability can be achieved by importance sampling, and in this case the optimal importance density is $f(s; v^*)$. Without knowledge of the threshold $\eta^*$, we start with a proper importance density $f(s; \hat{v})$ to generate samples of $s$ and compute an initial estimate $\hat{\eta}$ for $\eta$. Ideally, we can use those drawn samples which satisfy $d(\Psi(\bar{c}_s), \bar{y}) \le \hat{\eta}$ to obtain a new parameter value $\hat{v}'$ such that $f(s; \hat{v}')$ is closest to $f(s; v^*)$ in the Kullback-Leibler (KL) sense, i.e., the CE between $f(s; \hat{v}')$ and $f(s; v^*)$ is minimized. Since $v^*$ is unknown, we choose $\hat{v}'$ such that $f(s; \hat{v}')$ is closest to the empirical distribution of $s$ in those samples that are generated by $f(s; \hat{v})$ and satisfy $d(\Psi(\bar{c}_s), \bar{y}) \le \hat{\eta}$, for this empirical distribution is likely to be a good approximation of $f(s; v^*)$. New samples of $s$ are then produced by the updated importance density $f(s; \hat{v}')$ to compute a new estimate $\hat{\eta}'$. This iterative procedure continues until $|\hat{\eta} - \hat{\eta}'|$ is less than a predetermined threshold.
The above method is known as the CE method [22], an iterative procedure that consists of the following two phases in each iteration.
• Generate samples from the specified importance density given by the parameters from the previous iteration.
• Update the parameters for the next iteration according to the order of the score values associated with the drawn samples and the minimum-CE criterion.
Based on the above discussion, we propose a generic Monte Carlo based SDD algorithm as shown in Fig. 3.2 and in Table 3.1, with detailed descriptions given in Sections 3.3 and 3.4.
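The parameter-update phase can be made explicit with a short derivation. The notation $g_E$ for the empirical distribution of the $E$ elite samples $s_1, \cdots, s_E$ is introduced here only for illustration. Minimizing the KL distance from $g_E$ to $f(\cdot; v)$ amounts to maximizing the average log-likelihood of the elite samples, because the first term of the KL distance does not depend on $v$:

```latex
% g_E: empirical distribution of the E elite samples (illustrative notation)
\begin{align*}
D\bigl(g_E \,\|\, f(\cdot;v)\bigr)
  &= \sum_{s} g_E(s)\ln g_E(s) - \sum_{s} g_E(s)\ln f(s;v), \\
g_E(s) = \frac{1}{E}\sum_{\ell=1}^{E}\delta(s-s_\ell)
  &\;\Longrightarrow\;
  \sum_{s} g_E(s)\ln f(s;v) = \frac{1}{E}\sum_{\ell=1}^{E}\ln f(s_\ell;v), \\
\hat v' = \arg\min_{v} D\bigl(g_E \,\|\, f(\cdot;v)\bigr)
  &= \arg\max_{v}\frac{1}{E}\sum_{\ell=1}^{E}\ln f(s_\ell;v).
\end{align*}
```

For an unconstrained product of Bernoulli distributions the maximizer is simply the fraction of elite samples with $s_i = 1$ in each coordinate, and for independent Gaussians it is the elite sample mean and spread.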
3.2.3 Convergence and Complexity
Different convergence conditions and results have been discussed for the deterministic CE method and its extensions in [24], where it is also proved that convergence in distribution or in $\eta$ can be guaranteed but requires proper tuning of the algorithm parameters, such as the number of samples $N$, the number of elites $E$, and the smoothing factor $\rho$. Convergence to the global minimum is ensured only if a large sample size $N$ is used. On the other hand, the computational complexity is related to $N$ and is given by $O(N(n-k)^2)$. It is obvious that the decoding performance can be improved by using a larger $N$. As we retain the best sample at the end of each iteration, the decoding performance is also improved by increasing the number of iterations $T$. Since stopping early at any iteration still produces a decoded codeword, we say the algorithm converges if the sequence of decoded codewords converges. With a modest $N$, we found that the decoded codewords converge to the same codeword within 10 iterations in most cases. Our algorithm yields good performance although uniform convergence in distribution or in $\eta$ within a limited number of iterations is not guaranteed.
Figure 3.2: Flow chart of a stochastic decoder for RS codes. [Blocks: Sample Generator, Erasure-Only Decoder, Sample Evaluator, Parameter Updator, Store & Choose the Best.]
1. Define a family of probability densities $\{f(\cdot; v), v \in \nu\}$ on the search space $\mathbb{R}^{nm}$. Initialize $v^{(0)}$. Set $t = 0$.
2. Generate a sample set $S^{(t)}$ whose $N$ random vector samples are drawn from $f(\cdot; v^{(t)})$. Regard the magnitudes as bit LLRs and convert them into symbol reliabilities. Erase the $n-k$ least reliable symbols and decode the received word by EO decoding.
3. Evaluate the Euclidean distances between the decoded codewords and the received word. Select the $E$ vector samples with the best metrics as the new elite set $S_E^{(t)} \subset S^{(t)}$ and store the best decoded codeword $d^{*(t)}$ in $D^{(t)}$.
4. Evaluate the new parameter $v^{(t+1)}$ by solving
$$v^{(t+1)} = \arg\max_{v} \frac{1}{|S_E^{(t)}|} \sum_{s_\ell^{(t)} \in S_E^{(t)}} \ln f(s_\ell^{(t)}; v).$$
Smooth the update via $v^{(t+1)} \leftarrow \rho v^{(t+1)} + (1-\rho) v^{(t)}$, where $0 < \rho < 1$.
5. Terminate decoding if the stopping criterion is met, and choose the best codeword from the list $\{d^{*(t)}; \forall t\}$, say $\hat{c}^*$, as the decoder output. Otherwise increase $t$ by 1 and return to Step 2.
3.3 List Decoding via Erasure Location Estimation
In this section, we propose a novel algorithm that solves the discrete optimization problem (3.7) by using the stochastic list decoding idea. This algorithm, called the first-kind stochastic erasures-only list decoding (SEOLD-I) algorithm, efficiently estimates the $d_{\min} - 1$ most likely erasure locations in the received word $z$.
3.3.1 Importance Density and Sample Format
Suppose the reliability matrix $R$ is known at the receiver. Define the distrust function $f_d: \mathrm{GF}(2^m) \to (0, \infty)$ of the $i$th coordinate $z_i$ of $z$ as
$$f_d(z_i) = \frac{\sum_{\beta \in \mathrm{GF}(2^m) \setminus \{z_i\}} R_{\beta,i}}{R_{z_i,i}} \tag{3.8}$$
The larger the value of $f_d(z_i)$, the higher the probability that $z_i$ should be erased at the decoder.
Let $s = (s_0, \cdots, s_{n-1})$ be a random vector where $s_0, \cdots, s_{n-1}$ are independent Bernoulli random variables with success probabilities $p_0, \cdots, p_{n-1}$, i.e., $P(s_i = 1) = 1 - P(s_i = 0) = p_i$. We write $s \sim \mathrm{Ber}(p)$, where $p = (p_0, \cdots, p_{n-1})$. In our case, $s_i = 1$ indicates that the $i$th symbol $z_i$ of $z$ should be erased. On the other hand, $z_i$ is retained when $s_i = 0$.
The initial parameters $p^{(0)} = (p_0^{(0)}, \cdots, p_{n-1}^{(0)})$ are defined as
$$p_i^{(0)} = \begin{cases} 1 - \epsilon, & f_d(z_i) > 1 - \epsilon \\ f_d(z_i), & \text{otherwise} \end{cases} \tag{3.9}$$
where $\epsilon$ is an arbitrary real value in (0, 1). At the $t$th iteration, let $s_1^{(t)}, s_2^{(t)}, \cdots, s_N^{(t)}$ be $N$ trials drawn from $\mathrm{Ber}(p^{(t)})$ which satisfy
$$\sum_{j=0}^{n-1} s_{\ell,j}^{(t)} = d_{\min} - 1, \quad \forall \ell \in \{1, \cdots, N\} \tag{3.10}$$
where $s_\ell^{(t)} = (s_{\ell,0}^{(t)}, \cdots, s_{\ell,n-1}^{(t)})$. We then collect these $N$ trials to form the sample set $S^{(t)} = \{s_1^{(t)}, \cdots, s_N^{(t)}\}$.
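The initialization (3.8)-(3.9) and the constrained sampling in (3.10) might be sketched as follows. The text only requires that the drawn trials satisfy the weight constraint; the rejection loop used here is one simple way to enforce it and is an assumption of this sketch, as are the function names and the toy reliability matrix.

```python
import random

def distrust(R, z):
    """Distrust function (3.8): probability mass of the other symbols over that of z_i."""
    q, n = len(R), len(z)
    return [sum(R[b][i] for b in range(q) if b != z[i]) / R[z[i]][i] for i in range(n)]

def initial_probabilities(fd, eps):
    """Initial erasure probabilities (3.9): the distrust value clipped at 1 - eps."""
    return [min(v, 1.0 - eps) for v in fd]

def draw_erasure_vector(p, weight, rng, max_tries=100000):
    """Redraw s ~ Ber(p) until it has exactly `weight` ones, cf. (3.10) (rejection sampling)."""
    for _ in range(max_tries):
        s = [1 if rng.random() < p_i else 0 for p_i in p]
        if sum(s) == weight:
            return s
    raise RuntimeError("no sample with the requested weight was drawn")

# Toy reliability matrix for q = 2 and n = 3 (made-up numbers)
R = [[0.9, 0.4, 0.6],
     [0.1, 0.6, 0.4]]
z = [0, 1, 0]
p0 = initial_probabilities(distrust(R, z), eps=0.05)
s = draw_erasure_vector(p0, weight=2, rng=random.Random(1))
```

Rejection sampling becomes slow when the probabilities are far from the target weight, so a practical implementation may prefer to sample the erased positions directly.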
3.3.2 Update Parameters
Let $d_1^{(t)}, \cdots, d_N^{(t)}$ be the output codewords of the EO decoder. We compute the ED between each candidate codeword and the received word $y$ and sort the corresponding random vectors according to the descending order of their associated EDs. Define the elite set $S_E^{(t)}$ at the $t$th iteration to be the $E$ vectors with the smallest EDs to $y$, i.e., those whose corresponding codewords are more likely to have been transmitted. We always store the best one in $S_E^{(t)}$ up to the current iteration for the final decision when the maximum number of iterations is reached.
Suppose the parameters used to generate samples at the $t$th iteration are $p^{(t)} = (p_0^{(t)}, \cdots, p_{n-1}^{(t)})$. The parameters for the next iteration are updated by considering the information provided by both $p^{(t)}$ and $S_E^{(t)}$. More precisely, the $i$th parameter $p_i^{(t+1)}$ is obtained by [22]
$$p_i^{(t+1)} = (1-\rho) p_i^{(t)} + \rho \cdot \frac{\sum_{s_\ell^{(t)} \in S_E^{(t)}} s_{\ell,i}^{(t)}}{E} \tag{3.11}$$
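A minimal sketch of the elite selection and the smoothed update (3.11) is given below. The argument names are hypothetical, and the distances are assumed to have been computed beforehand from the EO-decoded codewords.

```python
def update_bernoulli(p, samples, distances, num_elite, rho):
    """Smoothed CE update (3.11) of the erasure probabilities.

    samples   : list of 0/1 vectors s_l drawn in the current iteration
    distances : ED between each EO-decoded codeword and the received word
    """
    # Indices of the samples ordered from smallest to largest distance
    order = sorted(range(len(samples)), key=lambda l: distances[l])
    elite = [samples[l] for l in order[:num_elite]]
    new_p = []
    for i, p_i in enumerate(p):
        # Fraction of elite samples that erase symbol i
        freq = sum(s[i] for s in elite) / num_elite
        new_p.append((1.0 - rho) * p_i + rho * freq)
    return new_p

p_next = update_bernoulli(
    p=[0.5, 0.5, 0.5],
    samples=[[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]],
    distances=[1.0, 3.0, 2.5, 0.5],
    num_elite=2,
    rho=0.5,
)
```

In this toy call both elite samples erase symbols 0 and 2, so the corresponding probabilities move halfway toward one while the probability of erasing symbol 1 moves halfway toward zero.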
3.4 List Decoding via Virtual Received Words
In Step 2 of Table 3.1 we try to find the most likely message/error coordinates such that the associated EO-decoded codeword is closest to the received vector. Note that the random samples are used to determine the erasure locations only, and the search sphere of the algebraic list decoding described in Section 3.2.1 is always centered at the hard-limited received word $z$ with radius equal to $n-k$. To increase the search range and improve the decoding performance, we include in the expanded search some extra codewords which lie statistically in a small neighborhood of the received word, so that some of them may in fact be closer in ED to the true transmitted codeword $c$; see Fig. 3.3. More specifically, the expansion is accomplished by eliminating the requirement that the search be centered at $z$. Instead, we randomize the search center by EO-decoding the hard-decision versions of the drawn sample vectors, which we refer to as virtual received words. If the importance density does converge to the desired density $\delta(s - s^*)$, such an expanded search will eventually contract and converge to the true transmitted codeword.
3.4.1 Importance Density and Sample Format
Let $\bar{s} = (\bar{s}_0, \cdots, \bar{s}_{nm-1})$ be a random vector where $\bar{s}_0, \cdots, \bar{s}_{nm-1}$ are independent Gaussian random variables with means $\mu_0, \cdots, \mu_{nm-1}$ and variances $\sigma_0^2, \cdots, \sigma_{nm-1}^2$. We write $\bar{s} \sim \mathcal{N}(\vec{\mu}, \vec{\sigma})$, where $\vec{\mu} = (\mu_0, \cdots, \mu_{nm-1})$ and $\vec{\sigma} = (\sigma_0, \cdots, \sigma_{nm-1})$ are initialized by
$$\mu_j^{(0)} = \bar{\gamma}_j \tag{3.12}$$
$$\sigma_j^{(0)} = \sqrt{|\bar{\gamma}_j|} \tag{3.13}$$
At the $t$th iteration, $N$ random samples $\bar{s}_1^{(t)}, \bar{s}_2^{(t)}, \cdots, \bar{s}_N^{(t)}$ are drawn from $\mathcal{N}(\vec{\mu}^{(t)}, \vec{\sigma}^{(t)})$ to form the sample set $\bar{S}^{(t)}$. Each sample vector represents the bit reliabilities of an associated virtual received word. By using (3.4) to convert the bit reliabilities into symbol reliabilities, the $n-k$ coordinates with the smallest symbol reliabilities are erased; the remaining bit positions are hard-limited and mapped into symbol decisions, and the resulting virtual received word is then decoded by an EO decoder.
Figure 3.3: Virtual received words are generated around the received LLR vector $\bar{\Gamma}$ by hard-limiting the sample vectors generated by an importance probability density whose parameter values evolve according to the CE principle.
3.4.2 Update Parameters
Let $d_1^{(t)}, \cdots, d_N^{(t)}$ be the output codewords of the EO decoder. We compute the ED between each candidate codeword and the received word $y$ and sort the corresponding random vectors according to the descending order of their associated EDs. Define an elite set $\bar{S}_E^{(t)}$ which includes the $E$ vectors with the smallest EDs to $y$, i.e., those whose corresponding codewords are more likely to have been transmitted. We always store the best one in $\bar{S}_E^{(t)}$ up to the current iteration for the final decision when the maximum number of iterations is reached.
Then the two sets of parameters $\vec{\mu}^{(t+1)}$ and $\vec{\sigma}^{(t+1)}$ are updated by [22]
$$\mu_j^{(t+1)} = (1-\lambda)\mu_j^{(t)} + \lambda \cdot \frac{\sum_{\bar{s}_\ell^{(t)} \in \bar{S}_E^{(t)}} \bar{s}_{\ell,j}^{(t)}}{E} \tag{3.14}$$
and
$$\sigma_j^{(t+1)} = (1-\eta)\sigma_j^{(t)} + \eta \cdot \frac{\sum_{\bar{s}_\ell^{(t)} \in \bar{S}_E^{(t)}} \left(\bar{s}_{\ell,j}^{(t)} - \mu_j^{(t+1)}\right)^2}{E} \tag{3.15}$$
where $\lambda$ and $\eta$ are real values in (0, 1) used to smooth the variation of these parameters. The algorithm described in this section is called the second-kind stochastic erasures-only list decoding (SEOLD-II) algorithm.
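The updates (3.14) and (3.15) can be sketched as below. The sketch follows the equations as printed, where the mean squared deviation of the elite samples is blended into $\sigma_j$. Whether a square root was intended cannot be recovered from the text, so the `use_sqrt` switch is a hypothetical addition that applies the usual sample standard deviation instead.

```python
import math

def update_gaussian(mu, sigma, elite, lam, eta, use_sqrt=False):
    """Smoothed CE update of the means (3.14) and spreads (3.15).

    elite    : the E elite sample vectors of the current iteration
    use_sqrt : hypothetical switch; True takes the square root of the
               mean squared deviation before smoothing
    """
    E = len(elite)
    new_mu, new_sigma = [], []
    for j in range(len(mu)):
        mean_j = sum(s[j] for s in elite) / E
        mu_j = (1.0 - lam) * mu[j] + lam * mean_j
        # Mean squared deviation of the elite samples around the updated mean
        spread = sum((s[j] - mu_j) ** 2 for s in elite) / E
        if use_sqrt:
            spread = math.sqrt(spread)
        new_mu.append(mu_j)
        new_sigma.append((1.0 - eta) * sigma[j] + eta * spread)
    return new_mu, new_sigma

mu1, sigma1 = update_gaussian([0.0], [1.0], [[1.0], [3.0]], lam=0.5, eta=0.5)
```

Note that the spread is measured around the already smoothed mean $\mu_j^{(t+1)}$, exactly as (3.15) is written, rather than around the raw elite mean.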
3.5 Simulation Results and Discussion
In this section, the simulated performance of the two SEOLD algorithms (SEOLD-I and SEOLD-II) is presented and compared with that of other well-known RS decoding algorithms. A standard binary-input AWGN channel is assumed, over which the BPSK-modulated codewords are transmitted. We model the receive matched-filter output as the sum of a ±1-valued sequence and a Gaussian sequence with zero-mean i.i.d. components. The average performance bounds on the ML error probability of RS codes over an AWGN channel developed in [25] are used as the performance lower limits.
Owing to complexity and decoding-delay considerations, we do not run the SEOLD algorithms until convergence is assured. Instead, we limit the decoding procedure to $T$ iterations in all simulations.
Fig. 3.4 shows the codeword error rate (CER) performance of the (15,11) RS code over an AWGN channel. HDD-BM refers to the performance of a hard-decision bounded minimum distance decoder such as the BM algorithm. GMD and KV refer to the performance of the GMD algorithm proposed by Forney and the algebraic soft-decision decoding algorithm proposed by Koetter and Vardy, respectively. Note that the KV algorithm considered here assumes infinite interpolation cost, i.e., its complexity is also infinite. For both SEOLD-I and SEOLD-II, the size of the sample set $N$ and the size of the elite set $E$ at every iteration are set to 20 and 6, respectively. After 10 iterations, SEOLD-I has about 0.5 dB and 0.3 dB coding gain over GMD and KV, respectively, at a CER of $10^{-5}$. On the other hand, SEOLD-II outperforms all the previous algorithms with a performance gain of about 1.2 dB and 1.0 dB over GMD and KV at a CER of $10^{-5}$.
Figure 3.4: Codeword error probability performance of the (15,11) Reed-Solomon code; 10 iterations. [Axes: codeword error rate versus $E_b/N_0$ (dB). Curves: HDD-BM, GMD, KV, SEOLD-I ($N$ = 20), SEOLD-II ($N$ = 20), ML.]
In Fig. 3.5, SEOLD-I performs close to KV when $N = 100$ and $E = 10$ after 10 iterations. Under the same conditions, SEOLD-II still outperforms the other algorithms with reasonable complexity. SEOLD-II has about 0.6 dB and 1.0 dB coding gain over KV when $N$ is equal to 100 and 500, respectively. In conclusion, the proposed decoding algorithms are capable of offering good performance with modest complexity for short high-rate RS codes. Their performance can be further improved by increasing the sample size $N$ and/or the maximum iteration number $T$ at the cost of increased decoding complexity.
Figure 3.5: Codeword error probability performance of the (31,25) Reed-Solomon code; 10 iterations. [Axes: codeword error rate versus $E_b/N_0$ (dB). Curves: HDD-BM, GMD, KV, SEOLD-I ($N$ = 100), SEOLD-II ($N$ = 100), SEOLD-II ($N$ = 500), ML.]
Chapter 4
Stochastic List Decoding of Linear Block Codes
In the previous chapter, we focused on decoding RS codes of short to medium length. The MDS property of RS codes was exploited to reduce the complexity of locating nearby codewords. Such an approach cannot be applied to general linear codes. We thus present a new decoding method which is valid for arbitrary linear codes but is more effective for codes with small girth.
4.1 Preliminary
Let $C$ be a binary $(N, K)$ linear block code with minimum distance $d_{\min}$ and $M \times N$ parity-check matrix $H$. As the rows of $H$ may be dependent, we have $M \ge N - K$. Let $I = \{1, \cdots, N\}$ and $J = \{1, \cdots, M\}$ be the sets of column indices and row indices of $H$, respectively. We denote the set of bits $n$ that participate in check $m$ by $\mathcal{N}(m) = \{n : H_{mn} = 1\}$. Similarly, we define the set of checks in which bit $n$ participates as $\mathcal{M}(n) = \{m : H_{mn} = 1\}$. We denote the set $\mathcal{N}(m)$ with bit $n$ excluded by $\mathcal{N}(m) \setminus n$, and the set $\mathcal{M}(n)$ with parity check $m$ excluded by $\mathcal{M}(n) \setminus m$. The cardinalities of $\mathcal{N}(m)$ and $\mathcal{M}(n)$ are denoted by $|\mathcal{N}(m)|$ and $|\mathcal{M}(n)|$, respectively. Let $e_n$ be a $1 \times N$ elementary vector with 1 at position $n$ and 0 elsewhere.
A codeword $c \in C$ satisfies $cH^T = 0$ (mod 2), where $H^T$ is the transpose of $H$ and $0$ is a $1 \times M$ zero vector. For each row $h_m$ of $H$, $m \in J$, let
$$C_m = \{c \in \{0, 1\}^N : c h_m^T = 0 \bmod 2\}, \tag{4.1}$$
then
$$C = \bigcap_{m=1}^{M} C_m. \tag{4.2}$$
Using the binary phase-shift-keying (BPSK) signal, the transmitter maps a codeword $c$ into the bipolar vector
$$\Psi(c) = x = (x_1, \cdots, x_N), \quad x_n = \Psi(c_n) = (-1)^{c_n} \tag{4.3}$$
and sends it over an additive white Gaussian noise (AWGN) channel with zero mean and power spectral density $N_0/2$ W/Hz. The received sequence at the output of the matched filter is given by $y = (y_1, \cdots, y_N)$, where $y_n = x_n + w_n$ and the $w_n$'s are statistically independent Gaussian random variables with zero mean and variance $N_0/2$.
Let $z = (z_1, \cdots, z_N)$ be the hard-decision version of the received sequence $y$, i.e.,
$$z_n = \begin{cases} 0, & y_n > 0 \\ 1, & \text{otherwise} \end{cases} \tag{4.4}$$
For $m \in J$, we define $\sigma_m$ as the result of check sum $m$ based on the hard-decision vector $z$:
$$\sigma_m = \sum_{n \in \mathcal{N}(m)} z_n H_{mn} \pmod 2 \tag{4.5}$$
and define $\Sigma = (\sigma_1, \cdots, \sigma_M)$ as the syndrome vector.
Denote by $\Gamma = (\gamma_1, \cdots, \gamma_N)$ the reliability vector of $y$, in which $\gamma_n$ is the magnitude of the log-likelihood ratio (LLR)
$$L_n = \log \frac{P(c_n = 0 \mid y)}{P(c_n = 1 \mid y)} \tag{4.6}$$
associated with the corresponding hard-limited bit $z_n$. Let $\lambda_m$ be the reliability of check sum $m$, which is defined as
$$\lambda_m = \min_{n \in \mathcal{N}(m)} \gamma_n \tag{4.7}$$
We then sort $\{\lambda_m : m \in J\}$ and let $m_1, m_2, \cdots, m_M$ denote the positions of the check sums in descending order of $\{\lambda_m : m \in J\}$, i.e., check sum $m_1$ is the most reliable and $m_M$ is the least reliable.
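The check sums (4.5), the check reliabilities (4.7) and the reliability ordering can be computed as in the sketch below. For concreteness it borrows the parity-check matrix of Example 4.2 and the LLR values that can be read from Fig. 4.1 later in this chapter. The function names are hypothetical and indices are 0-based in the code.

```python
def check_sets(H):
    """N(m): 0-based indices of the bits that participate in check m."""
    return [[n for n, h in enumerate(row) if h] for row in H]

def syndrome(H, z):
    """Check sums (4.5) computed from the hard-decision vector z."""
    return [sum(z[n] for n in Nm) % 2 for Nm in check_sets(H)]

def check_reliabilities(H, gamma):
    """Check reliability (4.7): the least reliable bit in each check."""
    return [min(gamma[n] for n in Nm) for Nm in check_sets(H)]

H = [[1, 0, 0, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1, 1, 0, 1],
     [0, 0, 0, 1, 0, 1, 1, 1]]
llr = [4.60, 5.22, 5.95, -0.52, 2.63, 4.81, -0.23, 4.11]
z = [0 if l > 0 else 1 for l in llr]          # hard decisions (4.4)
gamma = [abs(l) for l in llr]                 # bit reliabilities
sigma = syndrome(H, z)
lam = check_reliabilities(H, gamma)
# Checks sorted from most to least reliable (0-based check indices)
order = sorted(range(len(H)), key=lambda m: -lam[m])
```

With these values the third check is by far the most reliable one, since every other check contains one of the two weak bits.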
Define $q_n = P(z_n \ne c_n \mid y)$ as the a posteriori probability that bit $n$ is in error based on $y$. Then we have the following lemmas.
Lemma 4.1 For the AWGN channel model considered, the probability $q_n$ can be expressed as
$$q_n = \frac{1}{1 + e^{\gamma_n}} \tag{4.8}$$
Proof: See Appendix A.
Lemma 4.2 For check sum $m \in \mathcal{M}(n)$, the probability $r_{mn}$ that the modulo-2 sum of the hard decisions of all bits $n' \in \mathcal{N}(m) \setminus n$ mismatches that of the corresponding transmitted bits is
$$r_{mn} = \frac{1}{2}\left[1 - \prod_{n' \in \mathcal{N}(m) \setminus n} (1 - 2q_{n'})\right]. \tag{4.9}$$
Proof: See [26].
Note that $r_{mn}$ represents the probability of having an odd number of errors in $\mathcal{N}(m) \setminus n$.
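The two lemmas are easy to check numerically. The sketch below evaluates (4.8) and (4.9) and compares the closed form of Lemma 4.2 with a brute-force sum over all error patterns of odd weight, assuming independent bit errors. The reliabilities used are arbitrary illustrative values.

```python
import math
from itertools import product

def bit_error_prob(gamma_n):
    """Lemma 4.1: q_n = 1 / (1 + exp(gamma_n))."""
    return 1.0 / (1.0 + math.exp(gamma_n))

def odd_error_prob(qs):
    """Lemma 4.2: probability of an odd number of errors among independent bits."""
    prod = 1.0
    for q in qs:
        prod *= 1.0 - 2.0 * q
    return 0.5 * (1.0 - prod)

def odd_error_prob_bruteforce(qs):
    """Same probability, summed over every error pattern of odd weight."""
    total = 0.0
    for pattern in product([0, 1], repeat=len(qs)):
        if sum(pattern) % 2 == 1:
            p = 1.0
            for e, q in zip(pattern, qs):
                p *= q if e else 1.0 - q
            total += p
    return total

qs = [bit_error_prob(g) for g in (0.23, 2.63, 4.81)]
r_closed = odd_error_prob(qs)
r_brute = odd_error_prob_bruteforce(qs)
```

A bit with $\gamma_n = 0$ gives $q_n = 1/2$, which forces $r_{mn} = 1/2$ regardless of the other bits: a completely unreliable bit makes the check uninformative.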
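Define $\tilde{q}_n$ as the a posteriori probability that bit $n$ is in error based on $y$ and the results of the check sums in which bit $n$ participates.
Theorem 4.1 Given the received word $y$ and the syndrome set $\Sigma_n \equiv \{\sigma_m : m \in \mathcal{M}(n)\}$, the logarithm of the bit correctness probability ratio for bit $n$, say $\xi_n$, is
$$\xi_n = \log\left[\frac{1 - \tilde{q}_n}{\tilde{q}_n}\right] = \log\left[\frac{P(z_n = c_n \mid y, \Sigma_n)}{P(z_n \ne c_n \mid y, \Sigma_n)}\right] \cong \gamma_n + \sum_{m \in \mathcal{M}(n)} \left[(1 - 2\sigma_m) \min_{n' \in \mathcal{N}(m) \setminus n} \gamma_{n'}\right] \tag{4.10}$$
Proof: See Appendix B.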
4.2 Sequential Bit-Flipping Algorithm
In this section, we introduce a single-run sequential bit-flipping (SBF) algorithm for transforming $z$ into a valid codeword. This procedure requires the parity-check matrix $H$ to be in systematic form. First, suppose the rows of the parity-check matrix $H$ are linearly independent, i.e., $M = N - K$. Using appropriate row operations, $H$ can be transformed into a systematic form, say $\tilde{H} = [I_M \; P]$, where $I_M$ is an $M \times M$ identity matrix and $P$ is an $M \times (N - M)$ binary matrix. Note that $C$ is the null space of both $H$ and $\tilde{H}$; hence we can decode the received word by using $\tilde{H}$ instead of $H$.
However, this transformation is impossible when the rows of $H$ are linearly dependent, i.e., $M > N - K$. Fortunately, we can remove $M - N + K$ rows of $H$ that can be represented as linear combinations of the remaining rows to get an $(N - K) \times N$ sub-matrix $H'$, which has a systematic form $\tilde{H}'$.
Example 4.1 Consider the following parity-check matrix:
$$H = \begin{bmatrix} 1&1&1&1&0&0&0&0&0&0 \\ 1&0&0&0&1&1&1&0&0&0 \\ 0&1&0&0&1&0&0&1&1&0 \\ 0&0&1&0&0&1&0&1&0&1 \\ 0&0&0&1&0&0&1&0&1&1 \end{bmatrix} = \begin{bmatrix} h_1 \\ \vdots \\ h_5 \end{bmatrix} \tag{4.11}$$
Since $h_1 = h_2 + \cdots + h_5$, we can remove $h_1$ from $H$ and get the following sub-matrix
$$H' = \begin{bmatrix} 1&0&0&0&1&1&1&0&0&0 \\ 0&1&0&0&1&0&0&1&1&0 \\ 0&0&1&0&0&1&0&1&0&1 \\ 0&0&0&1&0&0&1&0&1&1 \end{bmatrix} \tag{4.12}$$
where $H'$ is itself in systematic form.
Note that it is easy to confirm that $C$ is still the null space of $\tilde{H}'$. Therefore, without loss of generality, we assume for simplicity that $H$ is always in systematic form in this chapter.
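The reduction to systematic form can be carried out by Gaussian elimination over GF(2). The sketch below is a generic row reduction (the function name is hypothetical) that discards the all-zero rows produced by dependent checks. Applied to the matrix of Example 4.1 it returns $H'$ of (4.12). For a general matrix the pivots need not fall in the first columns, in which case a column permutation would also be required; that case is not handled here.

```python
def systematic_form(H):
    """Row-reduce H over GF(2) and drop the dependent (all-zero) rows."""
    rows = [list(r) for r in H]
    n_cols = len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a 1 in this column
        src = next((r for r in range(pivot_row, len(rows)) if rows[r][col]), None)
        if src is None:
            continue
        rows[pivot_row], rows[src] = rows[src], rows[pivot_row]
        # Clear this column in every other row (addition modulo 2 is XOR)
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == len(rows):
            break
    return rows[:pivot_row]

H = [[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
     [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
     [0, 1, 0, 0, 1, 0, 0, 1, 1, 0],
     [0, 0, 1, 0, 0, 1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]]
H_sys = systematic_form(H)
```

The number of rows returned equals the rank of $H$ over GF(2), i.e., $N - K$.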
Recall that $C = \bigcap_{m \in J} C_m$, i.e., a codeword $c$ also belongs to the subcodes $C_m$ for all $m \in J$. The idea of the SBF algorithm is to modify $z$ sequentially such that the final result is a valid codeword. Specifically, the SBF algorithm separates the original problem into $M$ sub-problems and solves these sub-problems sequentially according to an arbitrary order of $\{1, \cdots, M\}$, denoted by $o = (o(1), \cdots, o(M))$. The procedure must ensure that the solution of the $m$th sub-problem also satisfies the constraints of the previous $m-1$ sub-problems. Along the procedure, a sequence of vectors $d_1, d_2, \cdots, d_M$ is produced, where
$$d_t \in \bigcap_{m=1}^{t} C_{o(m)}, \quad 1 \le t \le M, \tag{4.13}$$
and $d_M$ is obviously a valid codeword. In general, the SBF algorithm takes a predetermined order $o$ and the LLR vector $L$ as inputs. At the end of the procedure, a valid codeword $d = (d_1, \cdots, d_N)$ and an associated new LLR vector $\hat{L} = (\hat{L}_1, \cdots, \hat{L}_N)$ are the outputs. The difference between $\hat{L}$ and $L$ is given by
$$\hat{L}_n = \begin{cases} L_n, & d_n = z_n \\ -L_n, & d_n \ne z_n \end{cases} \tag{4.14}$$
where $\hat{L}$ is useful for the stochastic decoding algorithm described in Section 4.5.
Next, we formulate the detailed procedure of the SBF algorithm as follows:
1. Let $d_0$ be the hard-limited vector of $L$, $\hat{L} = L$, and $I_0 = \emptyset$. Set $t = 1$.
2. Let $I_t = I_{t-1} \cup \mathcal{N}(o(t))$. If $d_{t-1} \in C_{o(t)}$, let $d_t = d_{t-1}$. Otherwise, find the solution, say $n^*$, of
$$\arg\min_{n \in I_t \setminus I_{t-1}} \xi_n \tag{4.15}$$
where $\xi_n$ is evaluated by (4.10), and let
$$d_t \leftarrow d_{t-1} + e_{n^*} \pmod 2, \tag{4.16}$$
$$\hat{L}_{n^*} \leftarrow -\hat{L}_{n^*}, \tag{4.17}$$
$$\sigma_m \leftarrow \sigma_m + 1 \pmod 2 \quad \forall m \in \mathcal{M}(n^*). \tag{4.18}$$
3. If $t = M$, stop the procedure and output both $d = d_M$ and $\hat{L}$. Otherwise, let $t \leftarrow t + 1$ and go to Step 2.
We denote the relationship between the inputs and outputs of the above procedure by $(d, \hat{L}) = \Omega(o, L)$ for simplicity.
Remark 4.1 Note that once the number of erroneous bits in $I_t \setminus I_{t-1}$ is greater than or equal to 2 for some $t$, the output codeword $d_M$ will not be the transmitted codeword.
Example 4.2 Consider an (8,4) linear block code with parity-check matrix
$$H = \begin{bmatrix} 1&0&0&0&1&1&1&0 \\ 0&0&1&0&1&0&1&1 \\ 0&1&0&0&1&1&0&1 \\ 0&0&0&1&0&1&1&1 \end{bmatrix} \tag{4.19}$$
Suppose the all-zero codeword is transmitted and the received word $y$ is given by
$y = (1.83, 2.07, 2.36, -0.21, 1.05, 1.91, -0.09, 1.63)$. Then the hard-limited vector $z$ is
$z = (0, 0, 0, 1, 0, 0, 1, 0)$,
and an example of the SBF algorithm is shown in Fig. 4.1, where the order of the check sums is 3 → 2 → 1 → 4.
[Figure 4.1: SBF decoding of Example 4.2 with check order 3 → 2 → 1 → 4. Recoverable content: LLR = (4.60, 5.22, 5.95, −0.52, 2.63, 4.81, −0.23, 4.11); z = (00010010); check reliabilities of checks 1 to 4 = (0.23, 0.23, 2.63, 0.23); check sums of checks 1 to 4 = (1, 1, 0, 0); d0 = (00010010). Check #3: sum = 0, d1 = (00010010). Check #2: sum = 1; ξ7 = −2.11 < ξ3 = 5.72, so bit c7 is flipped, σ1 ← 0, σ2 ← 0, σ4 ← 1, d2 = (00010000). Check #1: sum = 0, d3 = (00010000). Check #4: sum = 1; bit c4 is flipped, σ4 ← 0, d4 = (00000000).]
4.3 Predicament of Decoding via SBF Algorithm
We introduced a sequential bit-flipping procedure, the SBF algorithm, in the previous section. It is a simple unified framework for transforming the hard-limited vector $z$ into a codeword in $C$. Note that different orders may produce different codewords. For instance, the output codewords in Examples 4.2 and 4.3 differ because of their different orders, although both start from the same received word $y$.
Example 4.3 Consider the same case described in Example 4.2. If we change the order from 3 → 2 → 1 → 4 to 4 → 3 → 2 → 1, the output codeword $d$ becomes (1, 0, 1, 1, 0, 0, 1, 0).
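Both examples can be reproduced with a short sketch of the SBF procedure. The code below is an illustrative reading of the steps of Section 4.2, not the author's implementation: it evaluates (4.10) literally with the current check sums, assumes that $H$ is in systematic form and that every check involves at least two bits, and takes the LLR values from Fig. 4.1. It reproduces the output codewords of Examples 4.2 and 4.3, but it is not guaranteed to reproduce every intermediate value printed in the figure.

```python
def sbf(H, order, llr):
    """Sequential bit flipping: returns (d, L_hat) = Omega(o, L).

    H     : parity-check matrix in systematic form (list of 0/1 rows)
    order : check processing order, 1-based as in the text
    llr   : LLR vector L
    """
    M, N = len(H), len(H[0])
    Nm = [[n for n in range(N) if H[m][n]] for m in range(M)]   # bits in check m
    Mn = [[m for m in range(M) if H[m][n]] for n in range(N)]   # checks on bit n
    gamma = [abs(l) for l in llr]
    d = [0 if l > 0 else 1 for l in llr]          # d_0: hard-limited L
    l_hat = list(llr)
    sigma = [sum(d[n] for n in Nm[m]) % 2 for m in range(M)]    # check sums of d
    seen = set()                                  # I_{t-1}

    def xi(n):
        """Approximate correctness log-ratio (4.10) with the current check sums."""
        return gamma[n] + sum(
            (1 - 2 * sigma[m]) * min(gamma[k] for k in Nm[m] if k != n)
            for m in Mn[n])

    for o in order:
        m = o - 1
        new_bits = [n for n in Nm[m] if n not in seen]   # I_t \ I_{t-1}
        seen.update(Nm[m])
        if sigma[m] == 1:                         # d_{t-1} violates check o(t)
            n_star = min(new_bits, key=xi)        # least trustworthy new bit
            d[n_star] ^= 1
            l_hat[n_star] = -l_hat[n_star]
            for c in Mn[n_star]:
                sigma[c] ^= 1
    return d, l_hat

H = [[1, 0, 0, 0, 1, 1, 1, 0],
     [0, 0, 1, 0, 1, 0, 1, 1],
     [0, 1, 0, 0, 1, 1, 0, 1],
     [0, 0, 0, 1, 0, 1, 1, 1]]
llr = [4.60, 5.22, 5.95, -0.52, 2.63, 4.81, -0.23, 4.11]
d_a, l_a = sbf(H, [3, 2, 1, 4], llr)   # order of Example 4.2
d_b, l_b = sbf(H, [4, 3, 2, 1], llr)   # order of Example 4.3
```

With the second order, each failing check offers only one new bit to flip (bits 3 and 1), so the procedure is forced to flip correct bits and ends at a valid but wrong codeword.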
We observe that the output codeword $d$ in Example 4.2 is equal to the transmitted codeword, whereas that in Example 4.3 is not. The reason is that the order used in Example 4.3 runs into the situation described in Remark 4.1. Obviously, this is a serious problem if we want to decode by the SBF algorithm. Therefore, we try to solve this problem by the following two ideas:
1. Find an appropriate order to avoid the situation described in Remark 4.1.
2. Correct some erroneous bits in advance such that the number of orders which lead to the correct codeword increases.
Note that the first idea is impractical because the complexity of finding an appropriate order grows quickly as $M$ increases. Besides, the hardware implementation is inefficient if the order changes frequently. Consequently, we propose two modified decoding methods based on the SBF algorithm with a fixed order. The first one is designed for cyclic codes: we apply the SBF algorithm to transform every cyclically shifted version of the received word into a valid codeword. Note that cyclically shifting the received word is similar to decoding in a different order even though the order itself is unchanged. The other method implements the second idea based on the concept of randomized sphere decoding with a moving center, which can correct errors iteratively.
4.4 SBF Algorithm with Cyclic Shifts
Assume that a codeword $c_T$ belonging to a cyclic code $C$ is transmitted and let $L = (L_1, \cdots, L_N)$ be the LLR vector of the received word. Define $L^{\nu} = (L_{\nu-1}, L_{\nu-2}, \cdots, L_{\nu})$ as the cyclically shifted version of $L$ by $\nu$ positions. Then we can obtain a set of candidates by the following procedure:
1. Determine an order $o$ for the SBF algorithm.
2. For all $\nu \in I$, apply the SBF algorithm to $L^{\nu}$ and $o$ to obtain a set of candidates $D = \{d^1, \cdots, d^N\}$, where $(d^{\nu}, \hat{L}^{\nu}) = \Omega(o, L^{\nu})$.
The transmitted codeword is then estimated by
$$\hat{c}_T = \arg\min_{c \in D} d(\Psi(c), y), \tag{4.20}$$
where $d(a, b)$ is the Euclidean distance between $a$ and $b$.
4.5 Stochastic Sequential Bit Flipping Algorithm
Ideally, we can transform $z$ into the transmitted codeword through the SBF algorithm if an appropriate order is found. In practice, such an order is hard to find, especially when the LLRs are unreliable. Therefore, we do not always decode based on the original LLR vector $L$ but instead change $L$ gradually so that its hard-limited vector moves closer and closer to the transmitted codeword. To implement this idea, we use a method similar to that of Section 3.4, which is an iterative procedure with the following two phases:
1. Generate $N_s$ virtual LLR vectors around the original one according to a specific random mechanism.
2. Update the parameters of the random mechanism based on the $E_s$ better candidates in order to generate better virtual LLR vectors in the next iteration.
Note that this is the basic idea of our randomized sphere decoding with a moving center. We describe the random mechanism in more detail in the next two subsections.
4.5.1 Importance Density and Sample Format
Let $s = (s_1, \cdots, s_N)$ be a random vector where $s_1, \cdots, s_N$ are independent Gaussian random variables with means $\mu_1, \cdots, \mu_N$ and variances $\rho_1^2, \cdots, \rho_N^2$. We write $s \sim \mathcal{N}(\vec{\mu}, \vec{\rho})$, where $\vec{\mu} = (\mu_1, \cdots, \mu_N)$ and $\vec{\rho} = (\rho_1, \cdots, \rho_N)$ are initialized by
$$\mu_n^{(0)} = L_n \tag{4.21}$$
$$\rho_n^{(0)} = \frac{4}{N_0} \tag{4.22}$$
At the $t$th iteration, $N_s$ random samples $s_1^{(t)}, s_2^{(t)}, \cdots, s_{N_s}^{(t)}$ are drawn from $\mathcal{N}(\vec{\mu}^{(t)}, \vec{\rho}^{(t)})$ to form the sample set $S^{(t)}$. Each sample vector represents the LLRs of an associated virtual received word. We decode them by the SBF algorithm with a predesigned order and obtain the candidates $d_\ell^{(t)}$ and the associated LLR vectors $\hat{s}_\ell^{(t)} = (\hat{s}_{\ell,1}, \cdots, \hat{s}_{\ell,N})$ for $1 \le \ell \le N_s$.
4.5.2 Update Parameters
Let $d_1^{(t)}, \cdots, d_{N_s}^{(t)}$ be the output codewords of the SBF algorithm. We compute the ED between each candidate codeword and the received word $y$ and sort the corresponding random vectors according to the descending order of their associated EDs. Define an elite set $E^{(t)}$ which includes the $E_s$ vectors with the smallest EDs to $y$, i.e., those whose corresponding codewords are more likely to have been transmitted. We always store the best one in $E^{(t)}$ up to the current iteration for the final decision when the maximum number of