
Linköping Studies in Science and Technology Dissertation No. 440

Codes and Decoding on General Graphs

Niclas Wiberg

Department of Electrical Engineering

Linköping University, S-581 83 Linköping, Sweden

Linköping 1996


Corrections applied as of errata October 30, 1996


ISSN 0345-7524


To Kristin,

Isak, and Benjamin


Abstract

Iterative decoding techniques have become a viable alternative for constructing high performance coding systems. In particular, the recent success of turbo codes indicates that performance close to the Shannon limit may be achieved. In this thesis, it is shown that many iterative decoding algorithms are special cases of two generic algorithms, the min-sum and sum-product algorithms, which also include non-iterative algorithms such as Viterbi decoding. The min-sum and sum-product algorithms are developed and presented as generalized trellis algorithms, where the time axis of the trellis is replaced by an arbitrary graph, the "Tanner graph". With cycle-free Tanner graphs, the resulting decoding algorithms (e.g., Viterbi decoding) are maximum-likelihood but suffer from an exponentially increasing complexity. Iterative decoding occurs when the Tanner graph has cycles (e.g., turbo codes); the resulting algorithms are in general suboptimal, but significant complexity reductions are possible compared to the cycle-free case. Several performance estimates for iterative decoding are developed, including a generalization of the union bound used with Viterbi decoding and a characterization of errors that are uncorrectable after infinitely many decoding iterations.


Acknowledgments

The best guarantee of successful work, undoubtedly, is to work among intelligent and helpful people. Luckily for me, I had the opportunity to work closely with Andi Loeliger and Ralf Kötter for almost four years. Andi, who has been my co-supervisor, has many times pointed me in directions that turned out to be very interesting (sometimes he had to push me to get me started). Furthermore, he was always (well, almost) available for fruitful and engaged discussions regarding my work. It is not an overstatement to say that this thesis would not have been written without him. Thank you, Andi.

The many discussions with Ralf Kötter have also been very valuable. Ralf's knowledge in algebra, graph theory, and other subjects important to me, combined with his great interest in iterative decoding, has often led to important insights for both of us.

While a deep penetration of a subject is perhaps the essence of a doctoral thesis, it is equally important to obtain a wider understanding and perspective of a problem area. I am deeply thankful to my professor and supervisor Ingemar Ingemarsson for providing an environment that always stimulated me to learn more about communication theory in general. By involving me and other graduate students in the planning of undergraduate courses, and by introducing unconventional teaching methods, he forced us all to review our own knowledge again and again.

All my friends and colleagues at the divisions of information theory, image coding, and data transmission have helped me a lot by providing an inspiring and enjoyable atmosphere.

It is a fact that the best way to learn something is to explain it to other people; many are the times when I, uninvited, have exploited the patience of my colleagues by describing my unsorted thoughts on the whiteboard in the coffee room and elsewhere. Thank you for listening—and please forgive me.

A special thanks goes to my friend Håkan Andersson, with whom I shared the office for some years. Apart from our many discussions, he has helped me a lot by reading my manuscripts and providing constructive criticism on all aspects of the thesis. Maurice Devenney also did a very good job in reading and correcting my writing.

Needless to say, my parents have been very important for me. Here, I would like to thank my father, who was the first to bring me into mathematics and science. With a programmable home computer at the age of 13, my career was practically settled.

I dedicate the thesis to my wife Kristin and my children Isak and Benjamin. Sharing with you the few hours that I did not work on the thesis made the long hours at work bearable.

Thank you for your support, understanding, and love!

Linköping, April 1996 Niclas Wiberg


Contents

1 Introduction
1.1 Decoding Complexity
1.2 Iterative Decoding Based on Tanner Graphs
1.3 Turbo Codes
1.4 Thesis Outline

2 Code Realizations Based on Graphs
2.1 Systems, Check Structures and Tanner Graphs
2.2 Turbo Codes

3 The Min-Sum and Sum-Product Algorithms
3.1 The Min-Sum Algorithm
3.2 The Sum-Product Algorithm
3.3 Updating Order and Computation Complexity
3.4 Optimized Binary Version of the Min-Sum Algorithm
3.5 Non-Decoding Applications
3.6 Further Unifications

4 Analysis of Iterative Decoding
4.1 The Computation Tree
4.2 The Deviation Set

5 Decoding Performance on Cycle-Free Subgraphs
5.1 Estimating the Error Probability with the Union Bound
5.2 Asymptotic Union Bound
5.3 Computing Statistical Moments of Final Costs
5.4 Gaussian Approximation of Log-Cost-Ratios

6 Decoding Performance with Cycles
6.1 Cycle Codes
6.2 Tailbiting Trellises
6.3 The General Case
6.4 Turbo Codes

7.1 Realization Complexity
7.2 Cycle-Free Realizations
7.3 Realizations with Cycles
7.4 Modeling Complicated Channels

8 Conclusions

A Proofs and Derivations
A.1 Proof of Theorems 3.1 and 3.2
A.2 Derivations for Section 3.3
A.3 Derivation for Section 3.4
A.4 Derivation used in Section 5.4
A.5 Proofs for Section 6.1
A.6 Proof of Theorem 7.2

References


Chapter 1 Introduction

This thesis deals with methods to achieve reliable communication over unreliable channels.

Such methods are used in a vast number of applications which affect many people's everyday life, for example mobile telephony, telephone modems for data communication, and storage media such as compact discs.

A basic scheme for communicating over unreliable channels is illustrated in Figure 1.1.

The message to be sent is encoded with a channel code before it is transmitted on the channel. At the receiving end, the output from the channel is decoded back to a message, hopefully the same as the original one. A fundamental property of such systems is Shannon's channel coding theorem, which states that reliable communication can be achieved as long as the information rate does not exceed the "capacity" of the channel, provided that the encoder and decoder are allowed to operate on long enough sequences of data (extensive treatments can be found in many textbooks, e.g. [1]).

We will deal with the decoding problem, i.e. finding a good message estimate given the channel output. The problem can be solved, in principle, by searching through all possible messages and comparing their corresponding codewords with the channel output, selecting the message which is most likely to result in the observed channel output. While such a method can be made optimal in the sense of minimizing the error probability, it is useless in practice because the number of messages is too large to search through them all. The interesting (and difficult!) aspect of the decoding problem is to find methods with reasonable complexity that still give sufficiently good performance.

Decoding methods can be divided, roughly, into two classes: algebraic and "probabilistic".

Algebraic methods are typically based on very powerful codes, but are only suited to relatively reliable channels. In such cases, however, they can provide virtually error-free communication. By "probabilistic" methods, we mean methods that are designed to use the

Figure 1.1 Shannon's model for reliable communication on an unreliable channel: the message is encoded into a codeword, transmitted over the channel, and the decoder forms a message estimate from the channel output.


channel output as efficiently as possible, approaching the performance of an optimal decoder. Such methods are better suited to highly unreliable channels than algebraic methods are. Unfortunately, probabilistic methods in general work only for relatively weak codes, and consequently, cannot completely avoid decoding errors. To achieve error-free communication on highly unreliable channels, one typically uses a combination of probabilistic and algebraic decoding methods.

We will consider probabilistic decoding methods in this thesis, with the goal of finding decoding methods that both use the channel output efficiently and are still capable of handling relatively powerful codes. It is assumed that the reader is familiar with the basics of communication theory.

1.1 Decoding Complexity

Until recently, the most important probabilistic decoding method has been the Viterbi algorithm [2]. A main reason for its success is that it is optimal, or maximum-likelihood, in the sense that it minimizes the probability of decoding error for a given code. The main drawback, on the other hand, is the computation complexity, which is very high for good codes (the number of operations grows exponentially with the minimum distance of the code, cf. [3]).

The reason behind this high complexity is to be found in the trellis code description, illustrated in Figure 1.2 for a small code, on which the Viterbi algorithm is based. In a trellis, the structure of the code is expressed by viewing the code as a dynamic system, introducing a time axis on which the codeword components are laid out. (Of course, having such a time axis is practical too, since the components must be transmitted in some order anyway. With a memoryless channel, however, the transmission order is irrelevant from a theoretical perspective; the point here is that the decoding algorithm assumes some time axis on which it operates.) With this dynamic-system view, all dependence between the past and the future (with respect to some instant on the time axis) is expressed in the present state. The problem is that good codes by necessity must have a high dependence between the codeword components, implying that the state space of the trellis must be excessively large (cf. [3]), which leads to a high decoding complexity.

Figure 1.2 A minimal trellis for a binary linear (6, 3, 3) code. The codewords are obtained by following paths from left to right and reading off the labels encountered.


On the other hand, it has been known for a long time that tailbiting trellises can in some cases be much smaller than any ordinary trellis for the same code [4]. In a tailbiting trellis there are multiple starting states and equally many ending states (an ordinary trellis has single starting and ending states), and a path is required to start and end in the same state.

Figure 1.3 illustrates a two-state tailbiting trellis for the same code as in Figure 1.2. In this case, the maximal number of states is reduced from four to two.

Loosely speaking, tailbiting trellises are based on a circular time axis. This is also the reason for the lower complexity: since the dependence between any two halves of the codewords may be expressed in two states, one in each direction on the time axis, the size of each of these state spaces may be smaller than if there were only one state space.

1.2 Iterative Decoding Based on Tanner Graphs

One of the major goals behind this thesis has been to search for code descriptions, or "realizations", with lower complexity than trellises. The other major goal was to investigate how such realizations could be exploited by decoding algorithms to achieve low decoding complexity. In particular, a promising direction seemed to be "iterative" decoding, i.e. decoding algorithms that operate on some internal state which is altered in small steps until a valid codeword is reached.

The main result is a framework for code realizations based on "Tanner graphs", which have the role of generalized time axes, and two generic iterative decoding algorithms that apply to any such realization. While our framework was developed as a combination and generalization of trellis coding and Gallager's low-density parity-check codes [5], the basic ideas were all present in Tanner's work "A recursive approach to low complexity codes" [6], including the two mentioned algorithms. Our main contributions to the framework are to explicitly include trellis-type realizations and to allow more general "metrics", or "cost functions"; the latter makes it possible to model, e.g., non-uniform a priori probability distributions, or channels with memory.

While this framework appeared interesting in itself, additional motivation for our research arose from an unexpected direction: turbo codes.

Figure 1.3 A tailbiting trellis for a binary linear (6, 3, 3) code. The marked path corresponds to the codeword 011011.


1.3 Turbo Codes

Undoubtedly, the invention of turbo codes [7] is a milestone in the development of communication theory. Compared to other coding systems, the improvement in performance obtained with turbo coding is so big that, for many applications, the gap between practical systems and Shannon's theoretical limit is essentially closed, a situation which was probably not predicted by anyone before the invention of turbo codes.

On the negative side, the turbo code construction is largely based on heuristics, in the sense that no theoretical analysis exists as yet that can predict their amazing performance. More precisely, it is the decoding algorithm that remains to be analyzed; a relatively successful analysis of the theoretical code performance is given in [8].

As it turned out, turbo codes and their decoding algorithm fit directly into our general framework for codes and decoding based on graphs. This relation provided us with an additional research goal: to understand turbo codes and their decoding performance, using our framework. Consequently, a lot of the material in this thesis is highly related to turbo codes and their decoding algorithms, and we have tried to make this connection explicit in many places.

1.4 Thesis Outline

The following two chapters present the graph-based framework. Chapter 2 provides a formal definition of code realizations based on graphs. While the basic ideas are due to Tanner [6], our definitions are more general and based on a different terminology. Chapter 3 presents the two decoding algorithms, the "min-sum" and the "sum-product" algorithms. We give a general formulation of these two algorithms, which are extended versions of Tanner's algorithms A and B, and we show their optimality when applied to realizations with cycle-free Tanner graphs (such as trellises). In both of these chapters we demonstrate explicitly how trellises and turbo codes fit into the framework. Another important example that appears throughout the thesis is Gallager's low-density parity-check codes [5].

The material in Chapter 2 and Chapter 3 is relatively mature and has been presented earlier, in [9], along with some parts of Chapter 6 and Chapter 7.

Chapters 4 through 6 are devoted to iterative decoding, i.e. the application of the min-sum and sum-product algorithms to realizations with cycles. In Chapter 4 we develop some fundamental results for performance analysis of iterative decoding; these are used in the following two chapters. Chapter 5 is focused on the decoding performance after the first few decoding iterations, before the cycles have affected the computation. In Chapter 6, we consider the performance obtained after this point, when the cycles do affect the computation. For a limited class of realizations, we analyze the asymptotic performance after infinitely many decoding iterations. The applicability of this result to turbo codes is also discussed.

In Chapter 7, we return to code realizations and complexity issues. A precise interpretation of the above complexity reasoning, regarding trellises and tailbiting trellises, will be given. We will also give a few more examples of code realizations, both with and without cycles. Some of these examples are closely related to turbo codes; in fact, they may point out a convenient way of constructing good turbo-like realizations with moderate block lengths. In addition, we will show how the framework can incorporate more complicated situations involving, e.g., channels with memory.

Chapter 8, finally, contains our conclusions.


Chapter 2

Code Realizations Based on Graphs

The central theme of this thesis is to describe codes by means of "equation systems", whose structure is the basis for decoding algorithms. By structure we mean the relation between the variables and the equations. More precisely, the equation system defines a bipartite graph with vertices both for the variables and for the equations; an edge indicates that a particular variable is present in a particular equation.

Example 2.1 Figure 2.1 illustrates a linear equation system of six variables as well as its structure in the form of the mentioned bipartite graph. Assuming binary variables, this particular equation system defines a binary linear (6, 3, 3) code, with the equations corresponding to the rows of a parity-check matrix. ✼

While Example 2.1 and the term "equation system" convey some of our ideas, the definitions that we will soon give include a lot more than just linear equations. For maximal generality, we allow an "equation" on a set of variables to be any subset of the possible value combinations; the "equation system" is then the intersection of these subsets. We hope that the reader will not be repelled by this rather abstract viewpoint. As an aid, we provide several examples that are familiar to the reader to show how our definitions unify many different code descriptions and decoding algorithms. In Chapter 7, we provide some examples of new code realizations too.

Figure 2.1 An equation system and the corresponding bipartite graph (sites $x_1, \ldots, x_6$). Each filled dot together with its neighbors corresponds to an equation with its variables. The equations are $x_1 + x_2 + x_3 = 0$, $x_3 + x_4 + x_5 = 0$, $x_5 + x_6 + x_1 = 0$, and $x_2 + x_4 + x_6 = 0$.
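As a quick, self-contained sanity check of Example 2.1, the sketch below (our own illustration, not part of the thesis; all identifiers are ours) enumerates the binary 6-tuples satisfying the four parity-check equations of Figure 2.1 and confirms that they form a (6, 3, 3) code: eight codewords and minimum Hamming weight 3.

```python
from itertools import product

# Check sets of Figure 2.1, written with 0-based site indices; each demands even parity.
checks = [(0, 1, 2), (2, 3, 4), (4, 5, 0), (1, 3, 5)]

codewords = [x for x in product((0, 1), repeat=6)
             if all(sum(x[i] for i in E) % 2 == 0 for E in checks)]

print(len(codewords))                            # 8 codewords -> dimension k = 3
print(min(sum(x) for x in codewords if any(x)))  # smallest nonzero weight -> d = 3
```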


2.1 Systems, Check Structures and Tanner Graphs

A configuration space is a direct product $W = \prod_{s \in N} A_s$, where $\{A_s\}_{s \in N}$ is a collection of alphabets (or state spaces). We will usually assume that the index set $N$, as well as all alphabets $A_s$, $s \in N$, are finite. (Neither of these assumptions is essential, though.) Elements of $W$ will be called configurations. Elements of $N$ will be referred to as sites.

In many of our examples (e.g. 2.1), the index set $N$ corresponds directly to the codeword components, i.e. $N = \{1, \ldots, n\}$, and the site alphabets are the binary field, i.e. $A_s = F_2$ for all $s \in N$. This means that the configuration space is the familiar space $F_2^n$ of binary $n$-tuples. However, more general configuration spaces are also useful, e.g., for trellis-based constructions, where some of the sites do not correspond to codeword components but rather to "cuts" of the time axis; the corresponding site alphabets are then state spaces of the trellis.

The components of a configuration $x \in W$ will be denoted by $x_s$, $s \in N$. More generally, the restriction (projection) of $x \in W$ to a subset of sites $R \subseteq N$ will be denoted by $x_R$. For a set of configurations $X \subseteq W$ and a site subset $R \subseteq N$ we will use the notation $X_R \triangleq \{x_R : x \in X\}$.

Definition 2.1 A system is a triple $(N, W, B)$, where $N$ is a set of sites, $W$ is a configuration space, and $B \subseteq W$ is the behavior. (This "behavioral" notion of a system is due to Willems [10], cf. also [11].) The members of $B$ will be called valid configurations. A system is linear if all alphabets $A_s$ are vector spaces (or scalars) over the same field, the configuration space $W$ is the direct product of the alphabets, and the behavior $B$ is a subspace of $W$.

Definition 2.2 A check structure for a system $(N, W, B)$ is a collection $Q$ of subsets of $N$ (check sets) such that any configuration $x \in W$ satisfying $x_E \in B_E$ for all check sets $E \in Q$ is valid (i.e., in $B$). The restriction $B_E$ of the behavior to a check set $E$ is called the local behavior at $E$. A configuration $x$ is locally valid on $E$ if $x_E \in B_E$. Note that a configuration is valid if and only if it is locally valid on all check sets.

The bipartite graph corresponding to a check structure $Q$ for a system $(N, W, B)$ is called a Tanner graph [6] for that system. Tanner graphs will be visualized as in Figure 2.1, with sites represented by circles and check sets by filled dots which are connected to those sites (circles) that they check.

The definitions 2.1 and 2.2 are "axiomatic" in the sense that they specify required properties for $Q$ to be a check structure. Actual constructions are usually built in the opposite way, by specifying a check structure $Q$ and the corresponding local behaviors ("checks") $B_E$, so that a desired behavior $B$ is obtained. This was illustrated in Example 2.1, which can be seen as a system $(N, W, B)$ with $N = \{1, \ldots, 6\}$ and $W$ the six-dimensional binary vector space. The check structure $Q = \{\{1, 2, 3\}, \{3, 4, 5\}, \{5, 6, 1\}, \{2, 4, 6\}\}$ and the local behaviors $B_E = \{000, 110, 101, 011\}$ (for all check sets $E \in Q$) together define the behavior B = {000000, 110001, 011100, 000111, 101101, 110110, 011011, 101010}.
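To make Definitions 2.1 and 2.2 concrete, here is a small sketch of ours (hypothetical names, not thesis code) that treats a check structure abstractly: each check set carries a local behavior given simply as a set of allowed local configurations, and a configuration is valid exactly when it is locally valid on every check set. Applied to Example 2.1 it recovers the behavior B listed above.

```python
from itertools import product

def is_valid(x, checks):
    """x: a configuration (tuple indexed by site); checks: list of (check_set,
    local_behavior) pairs, where local_behavior is the set of allowed local
    configurations over the check set's sites (Definition 2.2)."""
    return all(tuple(x[s] for s in E) in B_E for E, B_E in checks)

# Example 2.1: every check set allows exactly the even-weight patterns 000, 110, 101, 011.
B_even = {(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)}
checks = [((0, 1, 2), B_even), ((2, 3, 4), B_even),
          ((4, 5, 0), B_even), ((1, 3, 5), B_even)]

behavior = [x for x in product((0, 1), repeat=6) if is_valid(x, checks)]
print(len(behavior), behavior)   # the 8 valid configurations of Example 2.1
```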


Any binary block code $C$ of length $n$ may be viewed as a system $(N, W, B)$, where $N = \{1, 2, \ldots, n\}$, $W = F_2^n$, and $B = C$ is the set of codewords. For linear codes, a parity check matrix $H$ (i.e., a matrix $H$ such that $H x^T = 0$ if and only if $x \in C$) defines a check structure with one check set for each row of $H$, containing those sites that have a "one" in that row. The corresponding local behaviors are simple parity checks. Of special interest is the case when the check sets have a small, fixed size $k$, and the sites are contained in a small, fixed number $j$ of check sets. Such $(j, k)$ systems, which were introduced by Gallager in [5], are referred to as low-density parity-check codes. When $j = 2$, i.e., when sites belong to exactly two check sets, the codes are referred to as cycle codes [12, pp. 136–138], since these codes are generated by codewords whose support corresponds to cycles in the Tanner graph. (Example 2.1 is a cycle code.)

Example 2.2 Figure 2.2 illustrates the minimal trellis for the same binary linear (6,3,3) block code as in Example 2.1. The trellis is a system $(N, W, B)$ with two types of sites: visible sites (corresponding to codeword components) and hidden sites (corresponding to the "cuts" between the trellis sections). Hidden sites are illustrated by double circles. The visible site alphabets are all binary, but the hidden site alphabets (the state spaces) contain one, two or four states. A configuration is an assignment of states and output symbols, one from each site alphabet. Such a configuration is valid (i.e. a path) if and only if each local configuration of left state, output symbol, and right state is valid, i.e., a branch. ✼

Figure 2.2 A trellis (top) for a (6,3,3) code, and the corresponding Tanner graph. The values in the sites form a valid configuration, i.e., a path, which can be seen by checking locally in each trellis section.


(As mentioned in the example, hidden sites will be depicted by double circles.) In general, if $(N, W, B)$ is a system with hidden sites, and $V \subseteq N$ is the set of visible sites, then a codeword of the system is the restriction $x_V$ of a valid configuration $x \in B$ to $V$. The visible behavior or output code of the system is $B_V \triangleq \{x_V : x \in B\}$. We consider a system $(N, W, B)$ with a check structure $Q$ to be a description or realization of the corresponding output code $B_V$.
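The toy sketch below (ours, and deliberately much smaller than the (6,3,3) trellis of Figure 2.2) shows how hidden sites enter Definition 2.2: a two-section "trellis" for the repetition code {00, 11} has one hidden state site h between two visible bit sites, each section is a check set whose local behavior is its set of branches, and the output code is the restriction of the valid configurations to the visible sites.

```python
from itertools import product

# Sites: v1 (visible bit), h (hidden state), v2 (visible bit).  Each trellis
# section is a check set whose local behavior is its set of branches.
sections = [
    (("v1", "h"), {(0, 0), (1, 1)}),   # the state remembers the first bit
    (("h", "v2"), {(0, 0), (1, 1)}),   # the second bit must equal the state
]

def valid(cfg):                        # cfg maps site name -> value
    return all(tuple(cfg[s] for s in E) in branches for E, branches in sections)

configs = [dict(zip(("v1", "h", "v2"), v)) for v in product((0, 1), repeat=3)]
output_code = {(c["v1"], c["v2"]) for c in configs if valid(c)}
print(output_code)                     # {(0, 0), (1, 1)} -> the output code B_V is {00, 11}
```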

The motivation for introducing hidden sites, and indeed for studying systems with check structures at all, is to find code descriptions that are suitable for decoding. There are many different realizations for any given code, and the decoding algorithms that we will consider in Chapter 3 can be applied, in principle, to all of them. However, both the decoding complexity and the performance will differ between the realizations.

One important property of a realization is its structural complexity. In the decoding algorithms that we will consider, all members of the site alphabets $A_s$ and of the local behaviors $B_E$ are considered explicitly during the decoding process, in some cases even several times. For this reason, the site alphabets and the local behaviors should not be too large; in particular, trivial check structures such as the one with a single check set $Q = \{N\}$, and the one with a single hidden site whose alphabet has a distinct value for each valid configuration, are unsuitable for decoding (see Figure 2.3).

Another important property lies in the structure of the Tanner graph. As we will see in Chapter 3, the decoding algorithms are optimal when applied to realizations with cycle-free Tanner graphs. For realizations with cycles in the Tanner graphs, very little is actually known about the decoding performance; however, most indications are that it is beneficial to avoid short cycles. (This will be discussed later.)

So why do we consider realizations with cycles at all, when the decoding algorithms are optimal for cycle-free realizations? The advantage of introducing cycles is that the structural complexity may be much smaller in such realizations, allowing for a smaller decoding complexity. What happens, roughly, is that a single trellis state space of size $m$ is split into a number of hidden sites whose alphabet sizes $m_s$ are much smaller than $m$. This will be discussed in Chapter 7, where we will continue the discussion of graph-based realizations. Here we just give one further example of a realization with many cycles in the Tanner graph, an example that deserves special attention.

Figure 2.3 Two trivial realizations that apply to any code (of length six).


2.2 Turbo Codes

The turbo codes of Berrou et al. [7] are famous for their amazing performance, which beats anything else that has been presented so far. Unfortunately, the performance has only been demonstrated by simulations, and not by theoretical results. In particular, the decoding algorithm proposed in [7] remains to be analyzed (the theoretical code performance was analyzed to some extent in [8]).

The conventional way of presenting turbo codes is to describe the encoder. As illustrated in Figure 2.4, the information sequence is encoded twice by the same recursive systematic convolutional encoder, in one case with the original symbol ordering and in the other case after a random interleaving of the information symbols. (At the output, the redundancy sequences are often punctured in order to achieve the overall rate 1/2.) The convolutional encoders are usually rather simple; the typical encoder memory is 4 (i.e., there are 16 states in the trellis).

The corresponding Tanner graph (Figure 2.5) makes the structure somewhat more apparent, consisting of two trellises that share certain output symbols via an interleaver (i.e., the order of the common symbols in one trellis is a permutation of the order in the other trellis).

It is well known (cf. [8]) that the amazing performance of turbo codes is primarily due to the interleaver, i.e., due to the cycle structure of the Tanner graph.

Figure 2.4 The turbo codes as presented by Berrou et al. [7]: the information sequence enters one recursive convolutional encoder directly and another via a randomly chosen interleaver, producing redundancy sequences 1 and 2.


Figure 2.5 The Tanner graph of the turbo codes: two trellises (one per redundancy sequence) share the information sequence sites through an interleaver.


Chapter 3

The Min-Sum and Sum-Product Algorithms

We will describe two generic decoding algorithms for code realizations based on Tanner graphs, as described in the previous chapter. The structure of the algorithms matches the graphs directly. It will be convenient to think of these algorithms as parallel processing algorithms, where each site and each check is assigned its own processor and the communication between them reflects the Tanner graph. (In fact, this "distributed" viewpoint was one of the motivations for developing the framework. However, in many cases a sequential implementation is actually more natural.)

The algorithms come in two versions: the min-sum algorithm and the sum-product algorithm. The ideas behind them are not essentially new; rather, the algorithms are generalizations of well-known algorithms such as the Viterbi algorithm [2] and other trellis-based algorithms. Another important special case is Gallager's algorithm for decoding low-density parity-check codes [5]. A relatively general formulation of the algorithms was also given by Tanner [6] (the relation between Tanner's work and ours is discussed in Section 1.2 of the introduction).

There are other generic decoding algorithms that apply to our general framework for code descriptions. In a previous work [13] we discussed the application of Gibbs sampling, or simulated annealing, for decoding graph-based codes.

The overall structure of the algorithms, and the context in which they apply, is illustrated in Figure 3.1. As shown, the algorithms do not make decisions; instead, they compute a set of final cost functions upon which a final decision can be made. The channel output enters the algorithms as a set of local cost functions, and the goal of the algorithms is to concentrate, for each site, all information from the channel output that is relevant to that site.

Formally, there is one local cost function for each site $s \in N$, denoted by $\gamma_s : A_s \to \mathbb{R}$ (where $\mathbb{R}$ denotes the real numbers), and one for each check set $E \in Q$, denoted by $\gamma_E : W_E \to \mathbb{R}$. Similarly, there is one final cost function for each site $s \in N$, denoted by $\mu_s : A_s \to \mathbb{R}$, and one for each check set, denoted by $\mu_E : W_E \to \mathbb{R}$. (In our applications, the check cost functions $\gamma_E$ and $\mu_E$ are often not used; they are most interesting when the codewords are selected according to some non-uniform probability distribution, or, as discussed in Section 7.4, when dealing with channels with memory.)

During the computation, the algorithms maintain a set of intermediate cost functions: for each pair $(s, E)$ of adjacent site and check set (i.e., $s \in E$), there is one check-to-site cost function $\mu_{E,s} : A_s \to \mathbb{R}$ and one site-to-check cost function $\mu_{s,E} : A_s \to \mathbb{R}$. These cost functions are best thought of as having a direction on the Tanner graph. For instance, we will often call $\mu_{E,s}$ the "contribution" from the check set $E$ to the site $s$. (In the cycle-free case, this will be given a precise interpretation.) See Figure 3.2.

3.1 The Min-Sum Algorithm

The min-sum algorithm is a straightforward generalization of the Viterbi algorithm [2]. (The resulting algorithm is essentially Tanner's Algorithm B [6]; Tanner did not, however, observe the connection to Viterbi decoding.) Hagenauer's low-complexity turbo decoder [14] fits directly into this framework. A well-known decoding algorithm for generalized concatenated codes [15] is also related, as is threshold decoding [16]. Before going into the general description, we encourage the reader to go through the example in Figure 3.3, where the decoding of a (7,4,2) binary linear code using the min-sum algorithm is performed in detail.

Figure 3.1 Typical decoding application of the min-sum or sum-product algorithm. The channel output takes the form of local cost functions $\gamma_s$ ("channel metrics"), which are used by the min-sum or sum-product algorithm to compute final cost functions $\mu_s$, upon which the final decisions are based. During the computation process, the algorithms maintain a set of intermediate cost functions.

Figure 3.2 The intermediate cost functions $\mu_{s,E} : A_s \to \mathbb{R}$ and $\mu_{E,s} : A_s \to \mathbb{R}$.


a) The Tanner graph and a codeword. The circles (sites) correspond to the codeword components and the small dots (checks) to the parity-check equations, i.e., the three sites connected to any check are required to have even parity.

b) The channel output after transmitting a random codeword. The numbers in the sites are the local costs (log-likelihoods) for assigning "0" or "1" to that site: [4,1] for the middle site and [2,4], [1,5], [3,4], [2,6], [3,4], [5,2] for the outer sites. The decoding problem is to find a codeword with the smallest global cost, defined as the sum of the local costs in the sites.

c) The check-to-site cost function from the upper check to the middle site. For each possible value in the middle site, the check finds the smallest possible cost contribution from the two topmost sites (with local costs [2,4] and [1,5]): [min(1+2, 5+4), min(1+4, 5+2)] = [3,5]. E.g., for a "0" in the middle site, the patterns "00" and "11" are examined.

d) Final decision of the middle site. (The two lower checks have computed their contributions, [6,5] and [5,6], to the middle site in the same way as was done in c.) The global cost of "0" and "1" in the middle site is then just the sum of the local costs and the three incoming cost contributions: [4+3+6+5, 1+5+5+6] = [18,17], so the middle site is decoded to "1".

Figure 3.3 The min-sum algorithm applied to a binary linear (7,4,2) code, whose Tanner graph is shown in a). The decoding problem is to find the codeword with the smallest global cost (or "metric"), defined as the sum over the codeword components of the corresponding local costs, which are indicated in b). The local costs are typically channel log-likelihoods (such as Hamming distance or squared Euclidean distance to the received values).


e) The site-to-check cost from the middle site to the upper check is the smallest possible cost in the lower five sites that results from assigning "0" or "1" to the middle site; it is computed by adding the middle site's local costs [4,1] to the sum of the contributions [6,5] and [5,6] from the two lower checks: [4+6+5, 1+5+6] = [15,12].

f) The top-left site receives the smallest cost contributions from the rest of the graph that result from assigning "0" or "1" to that site: [min(15+2, 12+4), min(15+4, 12+2)] = [16,14].

g) The final cost function of the upper-left site is the sum of its local costs [1,5] and the cost contributions [16,14] from its only check: [1+16, 5+14] = [17,19], so this site is decoded to "0". The resulting costs are the smallest possible global costs that result from assigning a "0" or "1" to the upper-left site.

h) The rest of the Tanner graph is processed in the same way, resulting in final costs [18,17], [20,17], [18,17], [17,20], [17,18], [17,19], [18,17]. The resulting optimal codeword turns out to be the one shown in a).

Figure 3.3 (continued) Boxes c)–g) illustrate the computation of the intermediate and final cost functions for a few of the sites. In h), the final cost functions of all sites are shown.


As discussed in the example (Figure 3.3), the goal of the min-sum algorithm is to find a valid configuration $x \in B$ such that the sum of the local costs (over all sites and check sets) is as small as possible. When using the min-sum algorithm in a channel-decoding situation with a memoryless channel and a received vector $y$, the local check costs $\gamma_E(x_E)$ are typically omitted (set to zero) and the local site costs $\gamma_s(x_s)$ are the usual channel log-likelihoods $-\log p(y_s \mid x_s)$ (for visible sites; for hidden sites they are set to zero). On the binary symmetric channel, for example, the local site cost $\gamma_s(x_s)$ would be the Hamming distance between $x_s$ and the received symbol $y_s$.

The algorithm consists of the following three steps:

Initialization. The local cost functions $\gamma_s$ and $\gamma_E$ are initialized as appropriate (using, e.g., channel information). The intermediate cost functions $\mu_{E,s}$ and $\mu_{s,E}$ are set to zero.

Iteration. The intermediate cost functions $\mu_{s,E}$ and $\mu_{E,s}$ are alternatingly updated a suitable number of times as follows (cf. also Figure 3.4). The site-to-check cost $\mu_{s,E}(a)$ is computed as the sum of the site's local cost and all contributions coming into $s$ except the one from $E$:

$$\mu_{s,E}(a) := \gamma_s(a) + \sum_{E' \in Q:\, s \in E',\, E' \neq E} \mu_{E',s}(a) . \qquad (3.1)$$

The check-to-site cost $\mu_{E,s}(a)$ is obtained by examining all locally valid configurations on $E$ that match $a$ on the site $s$, for each summing the check's local cost and all contributions coming into $E$ except the one from $s$. The minimum over these sums is taken as the cost $\mu_{E,s}(a)$:

$$\mu_{E,s}(a) := \min_{x_E \in B_E:\, x_s = a} \Big[ \gamma_E(x_E) + \sum_{s' \in E:\, s' \neq s} \mu_{s',E}(x_{s'}) \Big] . \qquad (3.2)$$

Figure 3.4 The updating rules for the min-sum algorithm.


Termination. The final cost functions $\mu_s$ and $\mu_E$ are computed as follows. The final site cost $\mu_s(a)$ is computed as the sum of the site's local cost and all contributions coming into $s$, i.e.,

$$\mu_s(a) := \gamma_s(a) + \sum_{E \in Q:\, s \in E} \mu_{E,s}(a) , \qquad (3.3)$$

and the final check cost $\mu_E(a)$ for a local configuration $a \in B_E$ is computed as the sum of the check's local cost and all contributions coming into $E$, i.e.,

$$\mu_E(a) := \gamma_E(a) + \sum_{s \in E} \mu_{s,E}(a_s) . \qquad (3.4)$$

As we mentioned above, the goal of the min-sum algorithm is to find a configuration with the smallest possible cost sum. To formulate this precisely, we define the global cost of a valid configuration $x \in B$ as

$$G(x) \triangleq \sum_{E \in Q} \gamma_E(x_E) + \sum_{s \in N} \gamma_s(x_s) . \qquad (3.5)$$

In a typical channel-decoding situation where the local check costs $\gamma_E(x_E)$ are set to zero and the local site costs are $\gamma_s(x_s) \triangleq -\log p(y_s \mid x_s)$, the global cost becomes the log-likelihood $G(x) = -\log p(y \mid x)$ for the codeword $x$; then maximum-likelihood decoding corresponds to finding a valid configuration $x \in B$ that minimizes $G(x)$. As we will see, the min-sum algorithm does this minimization when the check structure is cycle-free. In addition, it is also possible to assign nonzero values to the check costs $\gamma_E(x_E)$ in order to include, e.g., an a priori distribution $p(x)$ over the codewords: if we define the local check costs $\gamma_E$ such that $-\log p(x) = \sum_{E \in Q} \gamma_E(x_E)$, then the global cost will be $G(x) = -\log p(x) - \log p(y \mid x) = -\log p(x \mid y) - \log p(y)$, and minimizing $G(x)$ will be equivalent to maximizing the a posteriori codeword probability $p(x \mid y)$. (See the next section for more about a priori probabilities.) The following theorem is the fundamental theoretical property of the min-sum algorithm:

Theorem 3.1 If the check structure is finite and cycle-free, then the cost functions converge after finitely many iterations, and the final cost functions become

$$\mu_s(a) = \min_{x \in B:\, x_s = a} G(x) \qquad (3.6)$$

and

$$\mu_E(a) = \min_{x \in B:\, x_E = a} G(x) . \qquad (3.7)$$


(The proof is given in Appendix A.1.) For Tanner graphs that contain cycles, there is no general result for the final cost functions, or for the decoding performance. This issue is discussed in the following chapters.

As we mentioned earlier, the min-sum algorithm only computes the final costs; no decision is made. Usually, we also want to find a configuration that minimizes the global cost. Such a configuration is obtained by taking, for each site $s$, a value $x_s \in A_s$ that minimizes the final cost $\mu_s(x_s)$. It may then happen that for some sites several values minimize the final cost; then it may be a nontrivial problem to find a valid configuration that minimizes $\mu_s$ at all sites. For a cycle-free check structure, however, there is a straightforward procedure for solving this problem: start in a leaf site $s$ (one that belongs to a single check set) and choose an optimal value for $x_s$; then extend the configuration successively to neighboring sites, always choosing site values that are both valid and minimize the final cost.

In a practical implementation it is important to handle numerical issues properly. Typically, the cost functions $\mu_{E,s}$ and $\mu_{s,E}$ grow out of range quickly. To overcome this, an arbitrary normalization term may be added to the updating formulas without affecting the finally chosen configuration. Since the algorithm only involves addition and minimization (i.e., no multiplication), fixed precision arithmetic can be used without losing information (the only place where precision is lost is in the initial quantization of the local costs).
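To make the updating rules (3.1)–(3.3) concrete, here is a small Python sketch of our own (not code from the thesis; all identifiers are hypothetical). It runs the min-sum algorithm with a fully parallel updating order on the even-parity Tanner graph of the (7,4,2) example in Figure 3.3, with the check costs $\gamma_E$ set to zero and integer local costs, and reproduces the final costs of the middle site, [18, 17].

```python
# Minimal min-sum decoder with a fully parallel updating order (our own sketch).
def min_sum(local_costs, checks, iterations):
    """local_costs[s] = [cost of '0', cost of '1'] for site s; checks = tuples of
    site indices, each enforcing even parity (so gamma_E = 0 for valid patterns)."""
    pairs = [(s, e) for e, E in enumerate(checks) for s in E]
    site_to_check = {p: [0, 0] for p in pairs}   # mu_{s,E}
    check_to_site = {p: [0, 0] for p in pairs}   # mu_{E,s}

    for _ in range(iterations):
        # (3.1): local cost plus contributions from all other checks containing s
        for (s, e) in pairs:
            for a in (0, 1):
                site_to_check[(s, e)][a] = local_costs[s][a] + sum(
                    check_to_site[(s, e2)][a]
                    for e2, E2 in enumerate(checks) if s in E2 and e2 != e)
        # (3.2): minimum over locally valid configurations matching a on site s
        for (s, e) in pairs:
            others = [t for t in checks[e] if t != s]
            for a in (0, 1):
                best = float("inf")
                for bits in range(2 ** len(others)):
                    assign = [(bits >> i) & 1 for i in range(len(others))]
                    if (a + sum(assign)) % 2 == 0:   # even parity -> locally valid
                        cost = sum(site_to_check[(t, e)][b]
                                   for t, b in zip(others, assign))
                        best = min(best, cost)
                check_to_site[(s, e)][a] = best

    # (3.3): final site costs
    return [[local_costs[s][a] + sum(check_to_site[(s, e)][a]
                                     for e, E in enumerate(checks) if s in E)
             for a in (0, 1)]
            for s in range(len(local_costs))]

# Figure 3.3: site 0 is the "middle" site and is contained in all three checks.
costs = [[4, 1], [2, 4], [1, 5], [3, 4], [2, 6], [3, 4], [5, 2]]
checks = [(0, 1, 2), (0, 3, 4), (0, 5, 6)]
print(min_sum(costs, checks, iterations=2)[0])   # [18, 17] -> decode site 0 to "1"
```

Because this Tanner graph is cycle-free and shallow, two update rounds already reach the fixed point, in line with Theorem 3.1; with cycles one would simply keep iterating.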

3.2 The Sum-Product Algorithm

The sum-product algorithm is a straightforward generalization of the forward-backward algorithm of Bahl et al. [17] for the computation of per-symbol a posteriori probabilities in a trellis. Two other special cases of the sum-product algorithm are the classic turbo decoder by Berrou et al. [7] and Gallager's decoding algorithms for low-density parity-check codes [5]. The general case was outlined by Tanner [6], who did not, however, consider a priori probabilities.

In the sum-product algorithm, the local cost functions $\gamma_s$ and $\gamma_E$ have a multiplicative interpretation: we define the global "cost" $G(x)$ for any configuration $x \in W$ as the product

$$G(x) \triangleq \prod_{E \in Q} \gamma_E(x_E) \prod_{s \in N} \gamma_s(x_s) . \qquad (3.8)$$

The term "cost" is somewhat misleading in the sum-product algorithm, since it will usually be subject to maximization (rather than minimization as in the min-sum algorithm); we have chosen this term to make the close relation between the two algorithms transparent. The algorithm does not maximize $G$ directly; it merely computes certain "projections" of $G$, which in turn are natural candidates for maximization.

When discussing the sum-product algorithm, it is natural not to consider the behavior $B$ explicitly, but to instead require that the check costs $\gamma_E(x_E)$ are zero for local configurations that are non-valid and positive otherwise. Hence, we require

$$\gamma_E(x_E) \geq 0, \text{ with equality if and only if } x_E \notin B_E . \qquad (3.9)$$


If we also require the local site costs $\gamma_s(x_s)$ to be strictly positive for all $x_s$, then it is easy to see that the global cost (3.8) of a configuration $x$ is strictly positive if $x$ is valid, and zero otherwise. In particular, a configuration that maximizes $G$ is always valid (provided that $B$ is nonempty).

In the typical channel-decoding situation, with a memoryless channel and a received vector $y$, the local site costs are set to the channel likelihoods $\gamma_s(x_s) = p(y_s \mid x_s)$ (for visible sites; for hidden sites $\gamma_s$ is set to one), and the local check costs are chosen according to the a priori distribution for the transmitted configuration $x$, which must be of the form $p(x) = \prod_{E \in Q} \gamma_E(x_E)$. This form includes Markov random fields [18], Markov chains, and, in particular, the uniform distribution over any set of valid configurations. (The latter is achieved by taking $\gamma_E$ as the indicator function for $B_E$, with an appropriate scaling factor.) With this setup, we get $G(x) = p(x)\, p(y \mid x) \propto p(x \mid y)$, i.e., $G(x)$ is proportional to the a posteriori probability of $x$. We will see later that if the check structure is cycle-free, the algorithm computes the a posteriori probability for individual site (symbol) values, $p(x_s \mid y) \propto \sum_{x' \in B:\, x'_s = x_s} G(x')$, which can be used to decode for minimal symbol error probability.

The sum-product algorithm consists of the following three steps:

Initialization. The local cost functions $\gamma_s$ and $\gamma_E$ are initialized as appropriate (using, e.g., channel information and/or some known a priori distribution). The intermediate cost functions $\mu_{E,s}$ and $\mu_{s,E}$ are set to one.

Iteration. The intermediate cost functions $\mu_{E,s}$ and $\mu_{s,E}$ are updated a suitable number of times as follows (cf. also Figure 3.5). The site-to-check cost $\mu_{s,E}(a)$ is computed as the product of the site's local cost and all contributions coming into $s$ except the one from $E$:

$$\mu_{s,E}(a) := \gamma_s(a) \prod_{E' \in Q:\, s \in E',\, E' \neq E} \mu_{E',s}(a) . \qquad (3.10)$$

Figure 3.5 The updating rules for the sum-product algorithm.


The check-to-site cost $\mu_{E,s}(a)$ is obtained by summing over all local configurations on $E$ that match $a$ on the site $s$, each term being the product of the check's local cost and all contributions coming into $E$ except the one from $s$:

$$\mu_{E,s}(a) := \sum_{x_E \in W_E:\, x_s = a} \gamma_E(x_E) \prod_{s' \in E:\, s' \neq s} \mu_{s',E}(x_{s'}) . \qquad (3.11)$$

Note that the sum in (3.11) actually only runs over $B_E$ (the locally valid configurations), since $\gamma_E(x_E)$ is assumed to be zero for $x_E \notin B_E$.

Termination. The final cost functions $\mu_s$ and $\mu_E$ are computed as follows. The final site cost $\mu_s(a)$ is computed as the product of the site's local cost and all contributions coming into $s$, i.e.,

$$\mu_s(a) := \gamma_s(a) \prod_{E' \in Q:\, s \in E'} \mu_{E',s}(a) , \qquad (3.12)$$

and the final check cost $\mu_E(a)$ is computed as the product of the check's local cost and all contributions coming into $E$, i.e.,

$$\mu_E(a) := \gamma_E(a) \prod_{s' \in E} \mu_{s',E}(a_{s'}) . \qquad (3.13)$$

The fundamental theoretical result for the sum-product algorithm is the following:

Theorem 3.2 If the check structure is finite and cycle-free, then the cost functions converge after finitely many iterations and the final cost functions become

$$\mu_s(a) = \sum_{x \in B:\, x_s = a} G(x) \qquad (3.14)$$

and

$$\mu_E(a) = \sum_{x \in B:\, x_E = a} G(x) . \qquad (3.15)$$

(The proof is essentially identical with that of Theorem 3.1, which is given in Appendix A.1.) An important special case of Theorem 3.2 is when the cost functions correspond to probability distributions:

Corollary 3.3 If the global cost function $G(x)$ is (proportional to) some probability distribution over the configuration space, then the final cost functions $\mu_s$ and $\mu_E$ are (proportional to) the corresponding marginal distributions for the site values and the local configurations, respectively.
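The sum-product updates (3.10)–(3.12) differ from the min-sum ones only in replacing minimization by summation and sums by products. The sketch below is our own illustration (not thesis code; the likelihood values are invented for the example): it runs the sum-product algorithm on the same even-parity Tanner graph as the min-sum example and normalizes the final site costs into per-symbol a posteriori probabilities, in the spirit of Corollary 3.3.

```python
from itertools import product

def sum_product(likelihoods, checks, iterations):
    """likelihoods[s] = [p(y_s|0), p(y_s|1)]; checks = tuples of site indices,
    each enforcing even parity (gamma_E is the parity indicator function)."""
    pairs = [(s, e) for e, E in enumerate(checks) for s in E]
    s2c = {p: [1.0, 1.0] for p in pairs}   # mu_{s,E}
    c2s = {p: [1.0, 1.0] for p in pairs}   # mu_{E,s}
    for _ in range(iterations):
        # (3.10): local likelihood times contributions from the other checks containing s
        for (s, e) in pairs:
            for a in (0, 1):
                msg = likelihoods[s][a]
                for e2, E2 in enumerate(checks):
                    if s in E2 and e2 != e:
                        msg *= c2s[(s, e2)][a]
                s2c[(s, e)][a] = msg
        # (3.11): sum over locally valid configurations (even parity) matching a on s
        for (s, e) in pairs:
            others = [t for t in checks[e] if t != s]
            for a in (0, 1):
                total = 0.0
                for assign in product((0, 1), repeat=len(others)):
                    if (a + sum(assign)) % 2 == 0:
                        term = 1.0
                        for t, b in zip(others, assign):
                            term *= s2c[(t, e)][b]
                        total += term
                c2s[(s, e)][a] = total
    # (3.12): final site costs, normalized to per-symbol a posteriori probabilities
    marginals = []
    for s in range(len(likelihoods)):
        mu = []
        for a in (0, 1):
            m = likelihoods[s][a]
            for e, E in enumerate(checks):
                if s in E:
                    m *= c2s[(s, e)][a]
            mu.append(m)
        z = mu[0] + mu[1]
        marginals.append([mu[0] / z, mu[1] / z])
    return marginals

# Same (7,4,2) graph as in Figure 3.3; the likelihoods p(y_s | x_s) are made up here.
checks = [(0, 1, 2), (0, 3, 4), (0, 5, 6)]
likelihoods = [[0.2, 0.8], [0.7, 0.3], [0.9, 0.1],
               [0.6, 0.4], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
print(sum_product(likelihoods, checks, iterations=2)[0])   # [P(x0=0 | y), P(x0=1 | y)]
```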
