### Linköping Studies in Science and Technology Dissertation No. 440

**Codes and Decoding** **on General Graphs**

### Niclas Wiberg

### Department of Electrical Engineering

### Linköping University, S-581 83 Linköping, Sweden

### Linköping 1996


*Corrections applied as of errata October 30, 1996*

ISSN 0345-7524

*To Kristin, Isak, and Benjamin*


## Abstract

Iterative decoding techniques have become a viable alternative for constructing high-performance coding systems. In particular, the recent success of turbo codes indicates that performance close to the Shannon limit may be achieved. In this thesis, it is shown that many iterative decoding algorithms are special cases of two generic algorithms, the min-sum and sum-product algorithms, which also include non-iterative algorithms such as Viterbi decoding. The min-sum and sum-product algorithms are developed and presented as generalized trellis algorithms, where the time axis of the trellis is replaced by an arbitrary graph, the “Tanner graph”. With cycle-free Tanner graphs, the resulting decoding algorithms (e.g., Viterbi decoding) are maximum-likelihood but suffer from an exponentially increasing complexity. Iterative decoding occurs when the Tanner graph has cycles (e.g., turbo codes); the resulting algorithms are in general suboptimal, but significant complexity reductions are possible compared to the cycle-free case. Several performance estimates for iterative decoding are developed, including a generalization of the union bound used with Viterbi decoding and a characterization of errors that are uncorrectable after infinitely many decoding iterations.


## Acknowledgments

The best guarantee of successful work, undoubtedly, is to work among intelligent and helpful people. Luckily for me, I had the opportunity to work closely with Andi Löliger and Ralf Kötter for almost four years. Andi, who has been my co-supervisor, has many times pointed me in directions that turned out to be very interesting (sometimes he had to push me to get me started). Furthermore, he was always (well, almost) available for fruitful and engaged discussions regarding my work. It is not an overstatement to say that this thesis would not have been written without him. Thank you, Andi.

The many discussions with Ralf Kötter have also been very valuable. Ralf’s knowledge in algebra, graph theory, and other subjects important to me, combined with his great interest in iterative decoding, has often led to important insights for both of us.

While a deep penetration of a subject is perhaps the essence of a doctoral thesis, it is equally important to obtain a wider understanding and perspective of a problem area. I am deeply thankful to my professor and supervisor Ingemar Ingemarsson for providing an environment that always stimulated me to learn more about communication theory in general. By involving me and other graduate students in the planning of undergraduate courses, and by introducing unconventional teaching methods, he made sure that we were all forced to review our own knowledge again and again.

All my friends and colleagues at the divisions of information theory, image coding, and data transmission have helped me a lot by providing an inspiring and enjoyable atmosphere.

It is a fact that the best way to learn something is to explain it to other people; many are the times when I, uninvited, have exploited the patience of my colleagues by describing my unsorted thoughts on the whiteboard in the coffee room and elsewhere. Thank you for listening—and please forgive me.

A special thanks goes to my friend Håkan Andersson, with whom I shared the office for some years. Apart from our many discussions, he has helped me a lot by reading my manuscripts and providing constructive criticism on all aspects of the thesis. Maurice Devenney also did a very good job in reading and correcting my writing.

Needless to say, my parents have been very important for me. Here, I would like to thank my father, who was the first to bring me into mathematics and science. Having a programmable home computer at the age of 13 practically settled my career.

I dedicate the thesis to my wife Kristin and my children Isak and Benjamin. Sharing with you the few hours that I did not work on the thesis made the long hours at work bearable.

Thank you for your support, understanding, and love!

Linköping, April 1996

Niclas Wiberg


## Contents

**1 Introduction**

*1.1 Decoding Complexity*
*1.2 Iterative Decoding Based on Tanner Graphs*
*1.3 Turbo Codes*
*1.4 Thesis Outline*

**2 Code Realizations Based on Graphs**

*2.1 Systems, Check Structures and Tanner Graphs*
*2.2 Turbo Codes*

**3 The Min-Sum and Sum-Product Algorithms**

*3.1 The Min-Sum Algorithm*
*3.2 The Sum-Product Algorithm*
*3.3 Updating Order and Computation Complexity*
*3.4 Optimized Binary Version of the Min-Sum Algorithm*
*3.5 Non-Decoding Applications*
*3.6 Further Unifications*

**4 Analysis of Iterative Decoding**

*4.1 The Computation Tree*
*4.2 The Deviation Set*

**5 Decoding Performance on Cycle-Free Subgraphs**

*5.1 Estimating the Error Probability with the Union Bound*
*5.2 Asymptotic Union Bound*
*5.3 Computing Statistical Moments of Final Costs*
*5.4 Gaussian Approximation of Log-Cost-Ratios*

**6 Decoding Performance with Cycles**

*6.1 Cycle Codes*
*6.2 Tailbiting Trellises*
*6.3 The General Case*
*6.4 Turbo Codes*

*7.1 Realization Complexity*
*7.2 Cycle-Free Realizations*
*7.3 Realizations with Cycles*
*7.4 Modeling Complicated Channels*

**8 Conclusions**

**A Proofs and Derivations**

*A.1 Proof of Theorems 3.1 and 3.2*
*A.2 Derivations for Section 3.3*
*A.3 Derivation for Section 3.4*
*A.4 Derivation used in Section 5.4*
*A.5 Proofs for Section 6.1*
*A.6 Proof of Theorem 7.2*

**References**


## Chapter 1

## Introduction

This thesis deals with methods to achieve reliable communication over unreliable channels. Such methods are used in a vast number of applications which affect many people’s everyday life, for example mobile telephony, telephone modems for data communication, and storage media such as compact discs.

A basic scheme for communicating over unreliable channels is illustrated in Figure 1.1. The message to be sent is encoded with a channel code before it is transmitted on the channel. At the receiving end, the output from the channel is decoded back to a message, hopefully the same as the original one. A fundamental property of such systems is Shannon’s channel coding theorem, which states that reliable communication can be achieved as long as the information rate does not exceed the “capacity” of the channel, provided that the encoder and decoder are allowed to operate on long enough sequences of data (extensive treatments can be found in many textbooks, e.g. [1]).

We will deal with the decoding problem, i.e. finding a good message estimate given the channel output. The problem can be solved, in principle, by searching through all possible messages and comparing their corresponding codewords with the channel output, selecting the message which is most likely to result in the observed channel output. While such a method can be made optimal in the sense of minimizing the error probability, it is useless in practice because the number of messages is too large to search through them all. The interesting (and difficult!) aspect of the decoding problem is to find methods with reasonable complexity that still give sufficiently good performance.

Decoding methods can be divided, roughly, into two classes: algebraic and “probabilistic”. Algebraic methods are typically based on very powerful codes, but are only suited to relatively reliable channels. In such cases, however, they can provide virtually error-free communication. By “probabilistic” methods, we mean methods that are designed to use the channel output as efficiently as possible, approaching the performance of an optimal decoder. Such methods are better suited to highly unreliable channels than algebraic methods are. Unfortunately, probabilistic methods in general work only for relatively weak codes, and consequently cannot completely avoid decoding errors. To achieve error-free communication on highly unreliable channels, one typically uses a combination of probabilistic and algebraic decoding methods.

**Figure 1.1 Shannon’s model for reliable communication on an unreliable channel.**

We will consider probabilistic decoding methods in this thesis, with the goal of finding decoding methods that both use the channel output efficiently and are still capable of handling relatively powerful codes. It is assumed that the reader is familiar with the basics of communication theory.

### 1.1 Decoding Complexity

Until recently, the most important probabilistic decoding method has been the Viterbi algorithm [2]. A main reason for its success is that it is optimal, or maximum-likelihood, in the sense that it minimizes the probability of decoding error for a given code. The main drawback, on the other hand, is the computation complexity, which is very high for good codes (the number of operations grows exponentially with the minimum distance of the code, cf. [3]).

The reason behind this high complexity is to be found in the trellis code description, illustrated in Figure 1.2 for a small code, on which the Viterbi algorithm is based. In a trellis, the structure of the code is expressed by viewing the code as a dynamic system, introducing a time axis on which the codeword components are laid out. (Of course, having such a time axis is practical too, since the components must be transmitted in some order anyway. With a memoryless channel, however, the transmission order is irrelevant from a theoretical perspective; the point here is that the decoding algorithm assumes some time axis on which it operates.) With this dynamic-system view, all dependence between the past and the future (with respect to some instant on the time axis) is expressed in the present state. The problem is that good codes by necessity must have a high dependence between the codeword components, implying that the state space of the trellis must be excessively large (cf. [3]), which leads to a high decoding complexity.
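The trellis viewpoint pays off algorithmically: maximum-likelihood decoding becomes a shortest-path search, section by section, which is the essence of the Viterbi algorithm. The following sketch is a generic illustration of that dynamic program; the two-state trellis, its branch sets, and the costs are our own toy example, not taken from the thesis.

```python
def viterbi(sections, costs):
    """Shortest path through a trellis starting in state 0. Each section
    is a list of branches (left_state, output_bit, right_state), and
    costs[t][b] is the cost of outputting bit b in section t."""
    survivors = {0: (0.0, [])}  # state -> (best cost so far, decoded bits)
    for branches, c in zip(sections, costs):
        nxt = {}
        for left, bit, right in branches:
            if left not in survivors:
                continue
            cost = survivors[left][0] + c[bit]
            # keep only the cheapest path into each right state
            if right not in nxt or cost < nxt[right][0]:
                nxt[right] = (cost, survivors[left][1] + [bit])
        survivors = nxt
    return min(survivors.values())[1]

# A toy two-state trellis: the same branch set in every section.
SECTION = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

With channel costs favoring the bits 1, 0, 1 (`costs = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]`), `viterbi([SECTION] * 3, costs)` follows the unique cheapest path and returns `[1, 0, 1]`. The number of survivors per section equals the number of states, which is exactly why large state spaces make Viterbi decoding expensive.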

**Figure 1.2** *A minimal trellis for a binary linear (6, 3, 3) code. The codewords are obtained by following paths from left to right and reading off the labels encountered.*


On the other hand, it has been known for a long time that tailbiting trellises can in some cases be much smaller than any ordinary trellis for the same code [4]. In a tailbiting trellis there are multiple starting states and equally many ending states (an ordinary trellis has single starting and ending states), and a path is required to start and end in the same state. Figure 1.3 illustrates a two-state tailbiting trellis for the same code as in Figure 1.2. In this case, the maximal number of states is reduced from four to two.

Loosely speaking, tailbiting trellises are based on a circular time axis. This is also the reason for the lower complexity: since the dependence between any two halves of the codewords may be expressed in two states, one in each direction on the time axis, the size of each of these state spaces may be smaller than if there were only one state space.

### 1.2 Iterative Decoding Based on Tanner Graphs

One of the major goals behind this thesis has been to search for code descriptions, or “realizations”, with lower complexity than trellises. The other major goal was to investigate how such realizations could be exploited by decoding algorithms to achieve low decoding complexity. In particular, a promising direction seemed to be “iterative” decoding, i.e. decoding algorithms that operate on some internal state which is altered in small steps until a valid codeword is reached.

The main result is a framework for code realizations based on “Tanner graphs”, which have the role of generalized time axes, and two generic iterative decoding algorithms that apply to any such realization. While our framework was developed as a combination and generalization of trellis coding and Gallager’s low-density parity-check codes [5], the basic ideas were all present in Tanner’s work “A recursive approach to low complexity codes” [6], including the two mentioned algorithms. Our main contributions to the framework are to explicitly include trellis-type realizations and to allow more general “metrics”, or “cost functions”; the latter makes it possible to model, e.g., non-uniform a priori probability distributions, or channels with memory.

While this framework appeared interesting in itself, additional motivation for our research arose from an unexpected direction: turbo codes.

**Figure 1.3** *A tailbiting trellis for a binary linear (6, 3, 3) code. The marked path corresponds to the codeword 011011.*

### 1.3 Turbo Codes

Undoubtedly, the invention of turbo codes [7] is a milestone in the development of communication theory. Compared to other coding systems, the improvement in performance obtained with turbo coding is so big that, for many applications, the gap between practical systems and Shannon’s theoretical limit is essentially closed, a situation which was probably not predicted by anyone before the invention of turbo codes.

On the negative side, the turbo code construction is largely based on heuristics, in the sense that no theoretical analysis exists as yet that can predict their amazing performance. More precisely, it is the decoding algorithm that remains to be analyzed; a relatively successful analysis of the theoretical code performance is given in [8].

As it turned out, turbo codes and their decoding algorithm fit directly into our general framework for codes and decoding based on graphs. This relation provided us with an additional research goal: to understand the turbo codes and their decoding performance, using our framework. Consequently, a lot of the material in this thesis is highly related to turbo codes and their decoding algorithms, and we have tried to make this connection explicit in many places.

### 1.4 Thesis Outline

The following two chapters present the graph-based framework. Chapter 2 provides a formal definition of code realizations based on graphs. While the basic ideas are due to Tanner [6], our definitions are more general and based on a different terminology. Chapter 3 presents the two decoding algorithms, the “min-sum” and the “sum-product” algorithms. We give a general formulation of these two algorithms, which are extended versions of Tanner’s algorithms A and B, and we show their optimality when applied to realizations with cycle-free Tanner graphs (such as trellises). In both of these chapters we demonstrate explicitly how trellises and turbo codes fit into the framework. Another important example that appears throughout the thesis is Gallager’s low-density parity-check codes [5].

The material in Chapter 2 and Chapter 3 is relatively mature and has been presented earlier, in [9], along with some parts of Chapter 6 and Chapter 7.

Chapters 4 through 6 are devoted to iterative decoding, i.e. the application of the min-sum and sum-product algorithms to realizations with cycles. In Chapter 4 we develop some fundamental results for performance analysis of iterative decoding; these are used in the following two chapters. Chapter 5 is focused on the decoding performance after the first few decoding iterations, before the cycles have affected the computation. In Chapter 6, we consider the performance obtained after this point, when the cycles do affect the computation. For a limited class of realizations, we analyze the asymptotic performance after infinitely many decoding iterations. The applicability of this result to turbo codes is also discussed.

In Chapter 7, we return to code realizations and complexity issues. A precise interpretation of the above complexity reasoning, regarding trellises and tailbiting trellises, will be given. We will also give a few more examples of code realizations, both with and without cycles. Some of these examples are closely related to turbo codes; in fact, they may point out a convenient way of constructing good turbo-like realizations with moderate block lengths. In addition, we will show how the framework can incorporate more complicated situations involving, e.g., channels with memory.

Chapter 8, finally, contains our conclusions.


## Chapter 2

## Code Realizations Based on Graphs

The central theme of this thesis is to describe codes by means of “equation systems”, whose structure is the basis for decoding algorithms. By *structure* we mean the relation between the variables and the equations. More precisely, the equation system defines a bipartite graph with vertices both for the variables and for the equations; an edge indicates that a particular variable is present in a particular equation.

*Example 2.1* Figure 2.1 illustrates a linear equation system of six variables as well as its structure in the form of the mentioned bipartite graph. Assuming binary variables, this particular equation system defines a binary linear (6, 3, 3) code, with the equations corresponding to the rows of a parity-check matrix. ✼

While Example 2.1 and the term “equation system” convey some of our ideas, the definitions that we will soon give include a lot more than just linear equations. For maximal generality, we allow an “equation” on a set of variables to be any subset of the possible value combinations; the “equation system” is then the intersection of these subsets. We hope that the reader will not be repelled by this rather abstract viewpoint. As an aid, we provide several examples that are familiar to the reader to show how our definitions unify many different code descriptions and decoding algorithms. In Chapter 7, we provide some examples of new code realizations too.

*x*_{1} + *x*_{2} + *x*_{3} = 0
*x*_{3} + *x*_{4} + *x*_{5} = 0
*x*_{5} + *x*_{6} + *x*_{1} = 0
*x*_{2} + *x*_{4} + *x*_{6} = 0

**Figure 2.1 An equation system and the corresponding bipartite graph. Each filled dot together with its neighbors corresponds to an equation with its variables.**
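The behavior defined by these four checks can be enumerated mechanically. The sketch below is our own illustration: it filters all 64 binary 6-tuples through the parity checks and confirms that exactly eight configurations survive, with minimum nonzero weight 3.

```python
import itertools

# The four parity checks of Figure 2.1, as index triples (sites 1..6).
CHECKS = [(1, 2, 3), (3, 4, 5), (5, 6, 1), (2, 4, 6)]

def behavior():
    """All binary 6-tuples satisfying every check (mod-2 sums equal 0)."""
    return [x for x in itertools.product((0, 1), repeat=6)
            if all((x[i - 1] + x[j - 1] + x[k - 1]) % 2 == 0
                   for i, j, k in CHECKS)]
```

`len(behavior())` is 8, as expected for a code of dimension 3: the fourth check is the mod-2 sum of the other three, so only three checks are independent.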


### 2.1 Systems, Check Structures and Tanner Graphs

A *configuration space* is a direct product *W* = ∏_{*s*∈*N*} *A*_{s}, where {*A*_{s}} is a collection of *alphabets* (or *state spaces*). We will usually assume that the index set *N*, as well as all alphabets *A*_{s}, *s* ∈ *N*, are finite. (Neither of these assumptions is essential, though.) Elements of *W* will be called *configurations*. Elements of *N* will be referred to as *sites*.

In many of our examples (e.g. 2.1), the index set *N* corresponds directly to the codeword components, i.e. *N* = {1, …, *n*}, and the site alphabets are the binary field, i.e. *A*_{s} = *F*_{2} for all *s* ∈ *N*. This means that the configuration space is the familiar space *F*_{2}^{n} of binary *n*-tuples. However, more general configuration spaces are also useful, e.g., for trellis-based constructions, where some of the sites do not correspond to codeword components but rather to “cuts” of the time axis; the corresponding site alphabets are then state spaces of the trellis.

The components of a configuration *x* ∈ *W* will be denoted by *x*_{s}, *s* ∈ *N*. More generally, the restriction (projection) of *x* to a subset *R* ⊆ *N* of sites will be denoted by *x*_{R}. For a set *X* ⊆ *W* of configurations and a site subset *R* ⊆ *N* we will use the notation *X*_{R} ≜ {*x*_{R} : *x* ∈ *X*}.

**Definition 2.1** A *system* is a triple (*N*, *W*, *B*), where *N* is a set of sites, *W* is a configuration space, and *B* ⊆ *W* is the *behavior*. (This “behavioral” notion of a system is due to Willems [10], cf. also [11].) The members of *B* will be called *valid configurations*. A system is *linear* if all alphabets *A*_{s} are vector spaces (or scalars) over the same field, the configuration space *W* is the direct product of the alphabets, and the behavior *B* is a subspace of *W*.

**Definition 2.2** A *check structure* for a system (*N*, *W*, *B*) is a collection *Q* of subsets of *N* (*check sets*) such that any configuration *x* ∈ *W* satisfying *x*_{E} ∈ *B*_{E} for all check sets *E* ∈ *Q* is valid (i.e., in *B*). The restriction *B*_{E} of the behavior to a check set *E* is called the *local behavior* at *E*. A configuration *x* is *locally valid* on *E* if *x*_{E} ∈ *B*_{E}. Note that a configuration is valid if and only if it is locally valid on all check sets.

The bipartite graph corresponding to a check structure *Q* for a system (*N*, *W*, *B*) is called a *Tanner graph* [6] for that system. Tanner graphs will be visualized as in Figure 2.1, with sites represented by circles and check sets by filled dots which are connected to those sites (circles) that they check.

The definitions 2.1 and 2.2 are “axiomatic” in the sense that they specify required properties for *Q* to be a check structure. Actual constructions are usually built in the opposite way, by specifying a check structure *Q* and the corresponding local behaviors *B*_{E} (“checks”), so that a desired behavior *B* is obtained. This was illustrated in Example 2.1, which can be seen as a system with *N* = {1, …, 6} and *W* the six-dimensional binary vector space. The check structure *Q* = {{1, 2, 3}, {3, 4, 5}, {5, 6, 1}, {2, 4, 6}} and the local behaviors *B*_{E} = {000, 110, 101, 011} (for all check sets *E* ∈ *Q*) together define the behavior *B* = {000000, 110001, 011100, 000111, 101101, 110110, 011011, 101010}.

Any binary block code *C* of length *n* may be viewed as a system (*N*, *W*, *B*), where *N* = {1, 2, …, *n*}, *W* = *F*_{2}^{n}, and *B* = *C* is the set of codewords. For linear codes, a parity check matrix *H* (i.e., a matrix *H* such that *Hx*^{T} = 0 if and only if *x* ∈ *C*) defines a check structure with one check set for each row of *H*, containing those sites that have a “one” in that row. The corresponding local behaviors are simple parity checks. Of special interest is the case when the check sets have a small, fixed size *k*, and the sites are contained in a small, fixed number *j* of check sets. Such systems, which were introduced by Gallager in [5], are referred to as (*j*, *k*) *low-density parity-check codes*. When *j* = 2, i.e., when sites belong to exactly two check sets, the codes are referred to as *cycle codes* [12, pp. 136–138], since these codes are generated by codewords whose support corresponds to cycles in the Tanner graph. (Example 2.1 is a cycle code.)

So far, we have only considered systems where all sites correspond to components of the codewords. However, it is often useful to allow *hidden sites*, which do not correspond to codeword components but only serve to give a suitable check structure. The most familiar example of such descriptions is the trellis, as illustrated by the following example.

*Example 2.2* Figure 2.2 illustrates the minimal trellis for the same binary linear (6, 3, 3) block code as in Example 2.1. The trellis is a system (*N*, *W*, *B*) with two types of sites: *visible* sites (corresponding to codeword components) and *hidden* sites (corresponding to the “cuts” between the trellis sections). Hidden sites are illustrated by double circles. The visible site alphabets are all binary, but the hidden site alphabets (the state spaces) contain one, two or four states. A configuration is an assignment of states and output symbols, one from each site alphabet. Such a configuration is valid (i.e. a path) if and only if each local configuration of left state, output symbol, and right state is valid, i.e., a branch. ✼

**Figure 2.2 A trellis (top) for a (6, 3, 3) code, and the corresponding Tanner graph. The values in the sites form a valid configuration, i.e., a path, which can be seen by checking locally in each trellis section.**
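The recipe “one check set per row of *H*” is easy to state in code. The sketch below is our own; the redundant four-row matrix is one hypothetical choice that reproduces the check structure of Example 2.1, and the second helper verifies the cycle-code property that every site lies in exactly two check sets (*j* = 2).

```python
# A (redundant) parity-check matrix whose four rows give the four check
# sets of Example 2.1. Any matrix with the same row space defines the
# same code, but possibly a different check structure.
H = [
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1],
]

def check_structure(H):
    """One check set per row: the sites (numbered from 1) with a one."""
    return [{s + 1 for s, h in enumerate(row) if h} for row in H]

def site_degrees(Q, n):
    """How many check sets each site belongs to (all 2 for cycle codes)."""
    return [sum(s in E for E in Q) for s in range(1, n + 1)]
```

`check_structure(H)` gives [{1, 2, 3}, {3, 4, 5}, {1, 5, 6}, {2, 4, 6}], and every entry of `site_degrees(check_structure(H), 6)` is 2, confirming that Example 2.1 is a cycle code.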


(As mentioned in the example, hidden sites will be depicted by double circles.) In general, if (*N*, *W*, *B*) is a system with hidden sites, and *V* ⊆ *N* is the set of visible sites, then a *codeword* of the system is the restriction *x*_{V} of a valid configuration *x* ∈ *B* to *V*. The *visible behavior* or *output code* of the system is *B*_{V} ≜ {*x*_{V} : *x* ∈ *B*}. We consider a system (*N*, *W*, *B*) with a check structure *Q* to be a *description* or *realization* of the corresponding output code *B*_{V}.

The motivation for introducing hidden sites, and indeed for studying systems with check structures at all, is to find code descriptions that are suitable for decoding. There are many different realizations for any given code, and the decoding algorithms that we will consider in Chapter 3 can be applied, in principle, to all of them. However, both the decoding complexity and the performance will differ between the realizations.

One important property of a realization is its structural complexity. In the decoding algorithms that we will consider, all members of the site alphabets *A*_{s} and of the local behaviors *B*_{E} are considered explicitly during the decoding process, in some cases even several times. For this reason, the site alphabets and the local behaviors should not be too large; in particular, trivial check structures such as the one with a single check set *Q* = {*N*}, and the one with a single hidden site whose alphabet has a distinct value for each valid configuration, are unsuitable for decoding (see Figure 2.3).

Another important property lies in the structure of the Tanner graph. As we will see in Chapter 3, the decoding algorithms are optimal when applied to realizations with cycle-free Tanner graphs. For realizations with cycles in the Tanner graphs, very little is actually known about the decoding performance; however, most indications are that it is beneficial to avoid short cycles. (This will be discussed later.)

So why do we consider realizations with cycles at all, when the decoding algorithms are optimal for cycle-free realizations? The advantage of introducing cycles is that the structural complexity may be much smaller in such realizations, allowing for a smaller decoding complexity. What happens, roughly, is that a single trellis state space of size *m* is split into a number of hidden sites of size *m*_{s} such that ∏_{s} *m*_{s} ≈ *m*. This will be discussed in Chapter 7, where we will continue the discussion of graph-based realizations. Here we just give one further example of a realization with many cycles in the Tanner graph, an example that deserves special attention.

**Figure 2.3 Two trivial realizations that apply to any code (of length six).**

^{m}### 2.2 Turbo Codes

The turbo codes of Berrou et al. [7] are famous for their amazing performance, which beats anything else that has been presented so far. Unfortunately, the performance has only been demonstrated by simulations, and not by theoretical results. In particular, the decoding algorithm proposed in [7] remains to be analyzed (the theoretical code performance was analyzed to some extent in [8]).

The conventional way of presenting turbo codes is to describe the encoder. As illustrated in Figure 2.4, the information sequence is encoded twice by the same recursive systematic convolutional encoder, in one case with the original symbol ordering and in the other case after a random interleaving of the information symbols. (At the output, the redundancy sequences are often punctured in order to achieve the overall rate 1/2.) The convolutional encoders are usually rather simple; the typical encoder memory is 4 (i.e., there are 16 states in the trellis).

The corresponding Tanner graph (Figure 2.5) makes the structure somewhat more apparent, consisting of two trellises that share certain output symbols via an interleaver (i.e., the order of the common symbols in one trellis is a permutation of the order in the other trellis).

It is well known (cf. [8]) that the amazing performance of turbo codes is primarily due to the interleaver, i.e., due to the cycle structure of the Tanner graph.
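The encoder structure just described can be sketched in a few lines. Everything below is a hedged toy model of our own: the memory-2 generators (feedback 1 + D + D², feedforward 1 + D²) and the fixed permutation are illustrative choices, not the parameters of [7] (which typically uses memory 4), and no puncturing is applied.

```python
def rsc_parity(u):
    """Parity output of a recursive systematic convolutional encoder,
    memory 2: feedback 1 + D + D^2, feedforward 1 + D^2 (toy choice)."""
    s1 = s2 = 0
    out = []
    for bit in u:
        a = bit ^ s1 ^ s2        # recursive feedback
        out.append(a ^ s2)       # feedforward tap at D^2
        s1, s2 = a, s1           # shift the register
    return out

def turbo_encode(u, perm):
    """Systematic bits plus two parity sequences, the second computed on
    an interleaved copy of the information sequence."""
    return u, rsc_parity(u), rsc_parity([u[i] for i in perm])
```

Puncturing alternate bits of the two parity sequences would bring the rate from 1/3 up to the 1/2 mentioned above; the interleaver `perm` is what ties the two trellises together in the Tanner graph of Figure 2.5.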

**Figure 2.4 The turbo codes as presented by Berrou et al. [7].**


**Figure 2.5 The Tanner graph of the turbo codes.**



## Chapter 3

## The Min-Sum and Sum-Product Algorithms

We will describe two generic decoding algorithms for code realizations based on Tanner graphs, as described in the previous chapter. The structure of the algorithms matches the graphs directly. It will be convenient to think of these algorithms as parallel processing algorithms, where each site and each check is assigned its own processor and the communication between them reflects the Tanner graph. (In fact, this “distributed” viewpoint was one of the motivations for developing the framework. However, in many cases a sequential implementation is actually more natural.)

The algorithms come in two versions: the min-sum algorithm and the sum-product algorithm. The ideas behind them are not essentially new; rather, the algorithms are generalizations of well-known algorithms such as the Viterbi algorithm [2] and other trellis-based algorithms. Another important special case is Gallager’s algorithm for decoding low-density parity-check codes [5]. A relatively general formulation of the algorithms was also given by Tanner [6] (the relation between Tanner’s work and ours is discussed in Section 1.2 of the introduction).

There are other generic decoding algorithms that apply to our general framework for code descriptions. In a previous work [13], we discussed the application of Gibbs sampling, or simulated annealing, for decoding graph-based codes.

The overall structure of the algorithms, and the context in which they apply, is illustrated
in Figure 3.1. As shown, the algorithms do not make decisions, instead they compute a set of
*final cost functions upon which a final decision can be made. The channel output enters the*
*algorithms as a set of local cost functions, and the goal of the algorithms is to concentrate,*
for each site, all information from the channel output that is relevant to that site.

Formally, there is one local cost function for each site $s \in N$, denoted by $\gamma_s : A_s \to \mathbb{R}$ (where $\mathbb{R}$ denotes the real numbers), and one for each check set $E \in Q$, denoted by $\gamma_E : W_E \to \mathbb{R}$. Similarly, there is one final cost function for each site $s \in N$, denoted by $\mu_s : A_s \to \mathbb{R}$, and one for each check set, denoted by $\mu_E : W_E \to \mathbb{R}$. (In our applications, the check cost functions $\gamma_E$ and $\mu_E$ are often not used; they are most interesting when the codewords are selected according to some non-uniform probability distribution, or, as discussed in Section 7.4, when dealing with channels with memory.)

During the computation, the algorithms maintain a set of intermediate cost functions: for each pair of adjacent site and check set (i.e., $(s, E)$ with $s \in E$), there is one check-to-site cost function $\mu_{E,s} : A_s \to \mathbb{R}$ and one site-to-check cost function $\mu_{s,E} : A_s \to \mathbb{R}$. These cost functions are best thought of as having a direction on the Tanner graph. For instance, we will often call $\mu_{E,s}$ the "contribution" from the check set $E$ to the site $s$. (In the cycle-free case, this will be given a precise interpretation.) See Figure 3.2.

### 3.1 The Min-Sum Algorithm

The min-sum algorithm is a straightforward generalization of the Viterbi algorithm [2]. (The resulting algorithm is essentially Tanner's Algorithm B [6]; Tanner did not, however, observe the connection to Viterbi decoding.) Hagenauer's low-complexity turbo decoder [14] fits directly into this framework. A well-known decoding algorithm for generalized concatenated codes [15] is also related, as is threshold decoding [16]. Before going into the general description, we encourage the reader to go through the example in Figure 3.3 on pages 14–15, where the decoding of a (7,4,2) binary linear code using the min-sum algorithm is performed in detail.

**Figure 3.1** *Typical decoding application of the min-sum or sum-product algorithm. A codeword $x \in B$ is sent over a noisy channel; the channel output takes the form of local cost functions $\gamma_s$ ("channel metrics") which are used by the min-sum or sum-product algorithm to compute final cost functions $\mu_s$, upon which the final decision $\hat{x}$ (the codeword estimate) is based. During the computation process, the algorithms maintain a set of intermediate cost functions $\mu_{E,s}$, $\mu_{s,E}$.*

**Figure 3.2** *The intermediate cost functions $\mu_{s,E} : A_s \to \mathbb{R}$ and $\mu_{E,s} : A_s \to \mathbb{R}$.*

**Figure 3.3** *The min-sum algorithm applied to a binary linear (7,4,2) code, whose Tanner graph is shown in a). The decoding problem is to find the codeword with the smallest global cost (or "metric"), defined as the sum over the codeword components of the corresponding local costs, which are indicated in b). The local costs are typically channel log-likelihoods (such as Hamming distance or squared Euclidean distance to the received values).*

*a) The Tanner graph and a codeword. The circles (sites) correspond to the codeword components and the small dots (checks) to the parity-check equations, i.e., the three sites connected to any check are required to have even parity.*

*b) The channel output after transmitting a random codeword. The numbers in the sites are the local costs (log-likelihoods) for assigning "0" or "1" to that site: [4,1] in the middle site, and [2,4], [1,5], [3,4], [2,6], [3,4], [5,2] in the outer sites. The decoding problem is to find a codeword with the smallest global cost, defined as the sum of the local costs in the sites.*

*c) The check-to-site cost function from the upper check to the middle site: [min(1+2, 5+4), min(1+4, 5+2)] = [3,5]. For each possible value in the middle site, the check finds the smallest possible cost contribution from the two topmost sites. E.g., for a "0" in the middle site, the patterns "00" and "11" are examined.*

*d) Final decision of the middle site: [4+3+6+5, 1+5+5+6] = [18,17], so decode to "1". (The two lower checks have computed their contributions to the middle site, [6,5] and [5,6], in the same way as was done in c.) The global cost of "0" and "1" in the middle site is then just the sum of the local costs [4,1] and the three incoming cost contributions.*

**Figure 3.3 (continued)** *Boxes c)–g) illustrate the computation of the intermediate and final cost functions for a few of the sites. In h), the final cost functions of all sites are shown.*

*e) The site-to-check cost from the middle site to the upper check: [4+6+5, 1+5+6] = [15,12]. This is the smallest possible cost in the lower five sites that results from assigning "0" or "1" to the middle site; it is computed by adding the middle site's local costs [4,1] to the sum of the contributions [6,5] and [5,6] from the two lower checks.*

*f) The top-left site receives the smallest cost contributions from the rest of the graph that result from assigning "0" or "1" to that site: [min(15+2, 12+4), min(15+4, 12+2)] = [16,14].*

*g) The final cost function of the upper-left site is the sum of its local costs and the cost contributions from its only check: [1+16, 5+14] = [17,19], so decode to "0". The resulting costs are the smallest possible global costs that result from assigning a "0" or "1" to the upper-left site.*

*h) The rest of the Tanner graph is processed in the same way, resulting in the final costs [17,19], [18,17], [20,17], [18,17], [17,20], [17,18], [18,17]. The resulting optimal codeword turns out to be the one shown in a).*

As discussed in the example (Figure 3.3), the goal of the min-sum algorithm is to find a valid configuration $x \in B$ such that the sum of the local costs (over all sites and check sets) is as small as possible. When using the min-sum algorithm in a channel-decoding situation with a memoryless channel and a received vector $y$, the local check costs $\gamma_E(x_E)$ are typically omitted (set to zero) and the local site costs are the usual channel log-likelihoods $\gamma_s(x_s) = -\log p(y_s \mid x_s)$ (for visible sites; for hidden sites they are set to zero). On the binary symmetric channel, for example, the local site cost $\gamma_s(x_s)$ would be the Hamming distance between $x_s$ and the received symbol $y_s$.

The algorithm consists of the following three steps:

• *Initialization.* The local cost functions $\gamma_s$ and $\gamma_E$ are initialized as appropriate (using, e.g., channel information). The intermediate cost functions $\mu_{E,s}$ and $\mu_{s,E}$ are set to zero.

• *Iteration.* The intermediate cost functions $\mu_{s,E}$ and $\mu_{E,s}$ are alternatingly updated a suitable number of times as follows (cf. also Figure 3.4). The site-to-check cost $\mu_{s,E}(a)$ is computed as the sum of the site's local cost and all contributions coming into $s$ except the one from $E$:

$$\mu_{s,E}(a) := \gamma_s(a) + \sum_{E' \in Q \,:\, s \in E',\, E' \neq E} \mu_{E',s}(a). \quad (3.1)$$

The check-to-site cost $\mu_{E,s}(a)$ is obtained by examining all locally valid configurations on $E$ that match $a$ on the site $s$, for each summing the check's local cost and all contributions coming into $E$ except the one from $s$. The minimum over these sums is taken as the cost $\mu_{E,s}(a)$:

$$\mu_{E,s}(a) := \min_{x_E \in B_E \,:\, x_s = a} \Big[ \gamma_E(x_E) + \sum_{s' \in E \,:\, s' \neq s} \mu_{s',E}(x_{s'}) \Big]. \quad (3.2)$$

**Figure 3.4** *The updating rules for the min-sum algorithm. For a site $s$ belonging to the check sets $E$, $E_1$, and $E_2$, the site-to-check update is $\mu_{s,E}(a) := \gamma_s(a) + \mu_{E_1,s}(a) + \mu_{E_2,s}(a)$; for a check set $E = \{s, s_1, s_2\}$, the check-to-site update is $\mu_{E,s}(a) := \min_{x_E \in B_E \,:\, x_s = a} \big[ \gamma_E(x_E) + \mu_{s_1,E}(x_{s_1}) + \mu_{s_2,E}(x_{s_2}) \big]$.*


• *Termination.* The final cost functions $\mu_s$ and $\mu_E$ are computed as follows. The final site cost $\mu_s(a)$ is computed as the sum of the site's local cost and all contributions coming into $s$, i.e.,

$$\mu_s(a) := \gamma_s(a) + \sum_{E' \in Q \,:\, s \in E'} \mu_{E',s}(a), \quad (3.3)$$

and the final check cost $\mu_E(a)$ for a local configuration $a \in B_E$ is computed as the sum of the check's local cost and all contributions coming into $E$, i.e.,

$$\mu_E(a) := \gamma_E(a) + \sum_{s' \in E} \mu_{s',E}(a_{s'}). \quad (3.4)$$

As we mentioned above, the goal of the min-sum algorithm is to find a configuration with the smallest possible cost sum. To formulate this precisely, we define the *global cost* of a valid configuration $x \in B$ as

$$G(x) \triangleq \sum_{E \in Q} \gamma_E(x_E) + \sum_{s \in N} \gamma_s(x_s). \quad (3.5)$$

In a typical channel-decoding situation where the local check costs $\gamma_E(x_E)$ are set to zero and the local site costs are $\gamma_s(x_s) \triangleq -\log p(y_s \mid x_s)$, the global cost becomes the log-likelihood $-\log p(y \mid x)$ for the codeword $x$; then maximum-likelihood decoding corresponds to finding a valid configuration $x \in B$ that minimizes $G(x)$. As we will see, the min-sum algorithm does this minimization when the check structure is cycle-free. In addition, it is also possible to assign nonzero values to the check costs $\gamma_E$ in order to include, e.g., an a priori distribution over the codewords: if we define the local check costs such that $-\log p(x) = \sum_{E \in Q} \gamma_E(x_E)$, then the global cost will be $G(x) \triangleq -\log p(x) - \log p(y \mid x) = -\log p(x \mid y) - \log p(y)$, and minimizing $G(x)$ will be equivalent to maximizing the a posteriori codeword probability $p(x \mid y)$. (See the next section for more about a priori probabilities.)

The following theorem is the fundamental theoretical property of the min-sum algorithm:

**Theorem 3.1** If the check structure is finite and cycle-free, then the cost functions converge after finitely many iterations, and the final cost functions become

$$\mu_s(a) = \min_{x \in B \,:\, x_s = a} G(x) \quad (3.6)$$

and

$$\mu_E(a) = \min_{x \in B \,:\, x_E = a} G(x). \quad (3.7)$$

(The proof is given in Appendix A.1.) For Tanner graphs that contain cycles, there is no general result for the final cost functions, or for the decoding performance. This issue is discussed in the following chapters.
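As a concrete sketch, the decoding steps of Figure 3.3 can be reproduced in a few lines of Python. The site labelling, the `min_sum` function name, and the representation of each cost function as a two-entry list are our own choices for this illustration; the local costs and the star-shaped Tanner graph are those of the figure, and the check costs $\gamma_E$ are set to zero as in the text.

```python
from itertools import product

# Min-sum decoding of the (7,4,2) example of Figure 3.3.  Site 0 is the
# middle site, shared by all three parity checks; each check also involves
# two leaf sites.  (Site labels are our own; cost values are the figure's.)
gamma = {
    0: [4, 1],             # middle site
    1: [2, 4], 2: [1, 5],  # leaves of the upper check
    3: [3, 4], 4: [2, 6],  # leaves of one lower check
    5: [3, 4], 6: [5, 2],  # leaves of the other lower check
}
checks = {"A": (0, 1, 2), "B": (0, 3, 4), "C": (0, 5, 6)}

def min_sum(gamma, checks, iterations=2):
    """Return the final site cost functions mu_s (equation (3.3))."""
    # Intermediate cost functions, initialized to zero.
    mu_sE = {(s, E): [0, 0] for E in checks for s in checks[E]}
    mu_Es = {(E, s): [0, 0] for E in checks for s in checks[E]}
    for _ in range(iterations):
        # (3.1): site-to-check update.
        for (s, E) in mu_sE:
            for a in (0, 1):
                mu_sE[(s, E)][a] = gamma[s][a] + sum(
                    mu_Es[(E2, s)][a]
                    for E2 in checks if s in checks[E2] and E2 != E)
        # (3.2): check-to-site update; gamma_E is omitted (set to zero),
        # and the minimum runs over even-parity local configurations.
        for (E, s) in mu_Es:
            others = [t for t in checks[E] if t != s]
            for a in (0, 1):
                mu_Es[(E, s)][a] = min(
                    sum(mu_sE[(t, E)][xt] for t, xt in zip(others, x))
                    for x in product((0, 1), repeat=len(others))
                    if (a + sum(x)) % 2 == 0)
    # (3.3): final site costs.
    return {s: [gamma[s][a] + sum(mu_Es[(E, s)][a]
                                  for E in checks if s in checks[E])
                for a in (0, 1)]
            for s in gamma}

final = min_sum(gamma, checks)
print(final[0])  # middle site: [18, 17] -> decode to "1"
print(final[2])  # upper-left site: [17, 19] -> decode to "0"
```

Since this Tanner graph is cycle-free, two update rounds suffice, and by Theorem 3.1 the final costs are the exact minimal global costs for each site value (smallest achievable global cost 17, matching Figure 3.3 h).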

As we mentioned earlier, the min-sum algorithm only computes the final costs; no decision is made. Usually, we also want to find a configuration that minimizes the global cost. Such a configuration is obtained by taking, for each site $s$, a value $x_s \in A_s$ that minimizes the final cost $\mu_s(x_s)$. It may then happen that for some sites several values minimize the final cost; then it may be a nontrivial problem to find a valid configuration that minimizes $\mu_s$ at all sites. For a cycle-free check structure, however, there is a straightforward procedure for solving this problem: start in a leaf site $s$ (one that belongs to a single check set) and choose an optimal value for $x_s$; then extend the configuration successively to neighboring sites, always choosing site values that are both valid and minimize the final cost.

In a practical implementation it is important to handle numerical issues properly. Typically, the cost functions $\mu_{E,s}$ and $\mu_{s,E}$ grow out of range quickly. To overcome this, an arbitrary normalization term may be added to the updating formulas without affecting the finally chosen configuration. Since the algorithm only involves addition and minimization (i.e., no multiplication), fixed-precision arithmetic can be used without losing information (the only place where precision is lost is in the initial quantization of the local costs).
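As a minimal sketch of such a normalization (assuming, as in the examples of this chapter, that a message is stored as a list of costs), subtracting the smallest entry from each message keeps the values bounded without changing which site value minimizes the cost:

```python
def normalize(message):
    """Subtract the smallest entry so the message stays in range.

    Adding a constant to every entry of a min-sum message shifts all
    global cost sums by the same amount, so the minimizing configuration
    is unchanged.
    """
    m = min(message)
    return [v - m for v in message]

print(normalize([18, 17]))  # -> [1, 0]
```

In a fixed-precision implementation this (or an equivalent offset built into the updating formulas) would be applied after each update round.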

### 3.2 The Sum-Product Algorithm

The sum-product algorithm is a straightforward generalization of the forward-backward algorithm of Bahl et al. [17] for the computation of per-symbol a posteriori probabilities in a trellis. Two other special cases of the sum-product algorithm are the classic turbo decoder by Berrou et al. [7] and Gallager's decoding algorithms for low-density parity-check codes [5]. The general case was outlined by Tanner [6], who did not, however, consider a priori probabilities.

In the sum-product algorithm, the local cost functions $\gamma_s$ and $\gamma_E$ have a multiplicative interpretation: we define the global "cost" $G(x)$ for any configuration $x \in W$ as the product

$$G(x) \triangleq \prod_{E \in Q} \gamma_E(x_E) \prod_{s \in N} \gamma_s(x_s). \quad (3.8)$$

The term "cost" is somewhat misleading in the sum-product algorithm, since it will usually be subject to maximization (rather than minimization as in the min-sum algorithm); we have chosen this term to make the close relation between the two algorithms transparent. The algorithm does not maximize $G$ directly; it merely computes certain "projections" of $G$, which in turn are natural candidates for maximization.

When discussing the sum-product algorithm, it is natural not to consider the behavior $B$ explicitly, but to instead require that the check costs $\gamma_E(x_E)$ are zero for local configurations that are non-valid and positive otherwise. Hence, we require

$$\gamma_E(x_E) \geq 0, \text{ with equality if and only if } x_E \notin B_E. \quad (3.9)$$

If we also require the local site costs $\gamma_s(x_s)$ to be strictly positive for all $x_s$, then it is easy to see that the global cost (3.8) of a configuration $x$ is strictly positive if $x$ is valid, and zero otherwise. In particular, a configuration that maximizes $G$ is always valid (provided that $B$ is nonempty).

In the typical channel-decoding situation, with a memoryless channel and a received vector $y$, the local site costs $\gamma_s(x_s)$ are set to the channel likelihoods $p(y_s \mid x_s)$ (for visible sites; for hidden sites $\gamma_s$ is set to one), and the local check costs are chosen according to the a priori distribution for the transmitted configuration $x$, which must be of the form $p(x) = \prod_{E \in Q} \gamma_E(x_E)$. This form includes Markov random fields [18], Markov chains, and, in particular, the uniform distribution over any set of valid configurations. (The latter is achieved by taking $\gamma_E$ as the indicator function for $B_E$, with an appropriate scaling factor.) With this setup, we get $G(x) = p(x)\, p(y \mid x) \propto p(x \mid y)$, i.e., $G(x)$ is proportional to the a posteriori probability of $x$. We will see later that if the check structure is cycle-free, the algorithm computes the a posteriori probability for individual site (symbol) values, $p(x_s \mid y) = \sum_{x' \in B \,:\, x'_s = x_s} G(x')$, which can be used to decode for minimal symbol error probability.

The sum-product algorithm consists of the following three steps:

• *Initialization.* The local cost functions $\gamma_s$ and $\gamma_E$ are initialized as appropriate (using, e.g., channel information and/or some known a priori distribution). The intermediate cost functions $\mu_{E,s}$ and $\mu_{s,E}$ are set to one.

• *Iteration.* The intermediate cost functions $\mu_{E,s}$ and $\mu_{s,E}$ are updated a suitable number of times as follows (cf. also Figure 3.5). The site-to-check cost $\mu_{s,E}(a)$ is computed as the product of the site's local cost and all contributions coming into $s$ except the one from $E$:

$$\mu_{s,E}(a) := \gamma_s(a) \prod_{E' \in Q \,:\, s \in E',\, E' \neq E} \mu_{E',s}(a). \quad (3.10)$$

**Figure 3.5** *The updating rules for the sum-product algorithm. For a site $s$ belonging to the check sets $E$, $E_1$, and $E_2$, the site-to-check update is $\mu_{s,E}(a) := \gamma_s(a)\, \mu_{E_1,s}(a)\, \mu_{E_2,s}(a)$; for a check set $E = \{s, s_1, s_2\}$, the check-to-site update is $\mu_{E,s}(a) := \sum_{x_E \in W_E \,:\, x_s = a} \gamma_E(x_E)\, \mu_{s_1,E}(x_{s_1})\, \mu_{s_2,E}(x_{s_2})$.*

The check-to-site cost $\mu_{E,s}(a)$ is obtained by summing over all local configurations on $E$ that match $a$ on the site $s$, each term being the product of the check's local cost and all contributions coming into $E$ except the one from $s$:

$$\mu_{E,s}(a) := \sum_{x_E \in W_E \,:\, x_s = a} \gamma_E(x_E) \prod_{s' \in E \,:\, s' \neq s} \mu_{s',E}(x_{s'}). \quad (3.11)$$

Note that the sum in (3.11) actually only runs over $B_E$ (the locally valid configurations) since $\gamma_E(x_E)$ is assumed to be zero for $x_E \notin B_E$.

• *Termination.* The final cost functions $\mu_s$ and $\mu_E$ are computed as follows. The final site cost $\mu_s(a)$ is computed as the product of the site's local cost and all contributions coming into $s$, i.e.,

$$\mu_s(a) := \gamma_s(a) \prod_{E' \in Q \,:\, s \in E'} \mu_{E',s}(a), \quad (3.12)$$

and the final check cost $\mu_E(a)$ is computed as the product of the check's local cost and all contributions coming into $E$, i.e.,

$$\mu_E(a) := \gamma_E(a) \prod_{s' \in E} \mu_{s',E}(a_{s'}). \quad (3.13)$$
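To make the three steps concrete, here is a small Python sketch of the sum-product algorithm on the same star-shaped Tanner graph as in the min-sum example. The site labelling and the positive local costs standing in for channel likelihoods are invented values for this illustration; the check costs are parity-indicator functions (cf. (3.9)), and the brute-force check at the end verifies the exact-marginal property on this cycle-free graph.

```python
from itertools import product

# Sum-product decoding on a star-shaped Tanner graph: site 0 is shared by
# three parity checks A, B, C, each with two extra leaf sites.  The local
# site costs are arbitrary positive stand-ins for likelihoods p(y_s|x_s).
gamma = {0: [0.6, 0.4], 1: [0.9, 0.1], 2: [0.2, 0.8],
         3: [0.5, 0.5], 4: [0.3, 0.7], 5: [0.8, 0.2], 6: [0.4, 0.6]}
checks = {"A": (0, 1, 2), "B": (0, 3, 4), "C": (0, 5, 6)}

def sum_product(gamma, checks, iterations=2):
    """Return the final site cost functions mu_s (equation (3.12))."""
    mu_sE = {(s, E): [1.0, 1.0] for E in checks for s in checks[E]}
    mu_Es = {(E, s): [1.0, 1.0] for E in checks for s in checks[E]}
    for _ in range(iterations):
        # (3.10): site-to-check update.
        for (s, E) in mu_sE:
            for a in (0, 1):
                v = gamma[s][a]
                for E2 in checks:
                    if s in checks[E2] and E2 != E:
                        v *= mu_Es[(E2, s)][a]
                mu_sE[(s, E)][a] = v
        # (3.11): check-to-site update; the sum runs over the even-parity
        # (locally valid) configurations only, since gamma_E vanishes elsewhere.
        for (E, s) in mu_Es:
            t1, t2 = [t for t in checks[E] if t != s]
            for a in (0, 1):
                mu_Es[(E, s)][a] = sum(
                    mu_sE[(t1, E)][x1] * mu_sE[(t2, E)][x2]
                    for x1 in (0, 1) for x2 in (0, 1)
                    if (a + x1 + x2) % 2 == 0)
    # (3.12): final site costs.
    final = {}
    for s in gamma:
        final[s] = []
        for a in (0, 1):
            v = gamma[s][a]
            for E in checks:
                if s in checks[E]:
                    v *= mu_Es[(E, s)][a]
            final[s].append(v)
    return final

final = sum_product(gamma, checks)

# Brute-force check of the exact-marginal property: mu_s(a) should equal
# the sum of G(x) over all valid configurations x with x_s = a.
def marginal(s, a):
    total = 0.0
    for x in product((0, 1), repeat=7):
        if x[s] == a and all(sum(x[t] for t in checks[E]) % 2 == 0
                             for E in checks):
            g = 1.0
            for t in range(7):
                g *= gamma[t][x[t]]
            total += g
    return total

assert all(abs(final[s][a] - marginal(s, a)) < 1e-12
           for s in gamma for a in (0, 1))
```

Because the graph is cycle-free, two update rounds already make every message exact, so the assertion holds; normalizing each `final[s]` to sum to one would yield the per-symbol a posteriori probabilities.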

The fundamental theoretical result for the sum-product algorithm is the following:

**Theorem 3.2** If the check structure is finite and cycle-free, then the cost functions converge after finitely many iterations and the final cost functions become

$$\mu_s(a) = \sum_{x \in B \,:\, x_s = a} G(x) \quad (3.14)$$

and

$$\mu_E(a) = \sum_{x \in B \,:\, x_E = a} G(x). \quad (3.15)$$

(The proof is essentially identical to that of Theorem 3.1, which is given in Appendix A.1.) An important special case of Theorem 3.2 is when the cost functions correspond to probability distributions:

**Corollary 3.3** If the global cost function $G(x)$ is (proportional to) some probability distribution over the configuration space, then the final cost functions are (proportional to) the corresponding marginal distributions for the site