
Definition 2.3 An (M, n) coding scheme for a DMC (X, Y, PY|X) consists of

• the message set M = {1, . . . , M} of M equally likely random messages M ;

• the (M, n) codebook (or simply code) consisting of M length-n channel input sequences, called codewords;

• an encoding function f : M → Xn that assigns to every message m ∈ M a codeword x = (x1, . . . , xn); and

• a decoding function g : Yn → ˆM that maps the received channel output n-sequence y to a guess ˆm ∈ ˆM. (Usually, we have ˆM = M.)

Note that an (M, n) code consists merely of an unordered list of M codewords of length n, whereas an (M, n) coding scheme additionally defines the encoding and decoding functions. Hence, the same code can be part of many different coding schemes.

Definition 2.4 A code is called linear if the sum of any two codewords is again a codeword.

Note that a linear code always contains the all-zero codeword.

The two main parameters of interest of a code are the number of possible messages M (the larger, the more information is transmitted) and the blocklength n (the shorter, the less time is needed to transmit the message):

• we have M equally likely messages, i.e., the entropy is H(M) = log2 M bits and we need log2 M bits to describe the message in binary form;

• we need n transmissions of a channel input symbol Xk over the channel in order to transmit the complete message.

Hence, it makes sense to give the following definition.

Definition 2.5 The rate² of an (M, n) code is defined as

R ≜ (log2 M)/n bits/transmission. (2.4)

It describes what amount of information (i.e., what part of the log2 M bits) is transmitted in each channel use.
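For example, a codebook with M = 16 codewords of blocklength n = 7 has rate R = log2(16)/7 = 4/7 bits/transmission.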

However, this definition of a rate only makes sense if the message actually arrives at the receiver, i.e., if the receiver does not make a decoding error!

² We define the rate here using a logarithm to base 2. However, we can use any logarithm as long as we adapt the units accordingly.

Definition 2.6 An (M, n) coding scheme for a DMC consists of a codebook C(M,n) with M codewords xm of length n (m = 1, . . . , M), an encoder that maps every message m into its corresponding codeword xm, and a decoder that makes a decoding decision g(y) ∈ {1, . . . , M} for every received binary n-vector y.

We will always assume that the M possible messages are equally likely.

Definition 2.7 Given that message m has been sent, let λm(C(M,n)) be the probability of a decoding error of an (M, n) coding scheme with blocklength n:

λm(C(M,n)) ≜ Pr[g(Y) ≠ m | X = xm] (2.5)

= Σ_y PY|X(y|xm) I{g(y) ≠ m}, (2.6)

where I{·} is the indicator function

I{statement} ≜ { 1 if the statement is true,
                 0 if the statement is false. (2.7)

The maximum error probability λ(C(M,n)) of an (M, n) coding scheme is defined as

λ(C(M,n)) ≜ max_{m∈M} λm(C(M,n)). (2.8)

The average error probability Pe(C(M,n)) of an (M, n) coding scheme is defined as

Pe(C(M,n)) ≜ (1/M) Σ_{m=1}^{M} λm(C(M,n)). (2.9)

Moreover, sometimes it will be more convenient to focus on the probability of not making any error, denoted the success probability ψm(C(M,n)):

ψm(C(M,n)) ≜ Pr[g(Y) = m | X = xm] (2.10)

= Σ_y PY|X(y|xm) I{g(y) = m}. (2.11)

The minimum success probability ψ(C(M,n)) and the average success probability³ Pc(C(M,n)) are defined accordingly.

Definition 2.8 For a given (M, n) coding scheme, we define the decoding region Dm(M,n) as the set of n-vectors y corresponding to the m-th codeword xm as follows:

Dm(M,n) ≜ {y : g(y) = m}. (2.12)

³ The subscript “c” stands for “correct.”

Note that we will always assume that the M possible messages are equally likely and that the decoder g is a maximum likelihood (ML) decoder:

g(y) ≜ arg max_{1≤m≤M} PY|X(y|xm), (2.13)

which minimizes the average error probability Pe(C(M,n)) among all possible decoders.
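To make these definitions concrete, the following Python sketch builds the ML decoder (2.13) for a toy codebook on a BSC and evaluates λm, λ, and Pe by exhaustively summing over all channel output vectors as in (2.5)–(2.9). The codebook, blocklength, and crossover probability below are arbitrary illustration choices, not taken from the text.

```python
import itertools

import numpy as np

# Toy example: a BSC with crossover probability eps and the (M, n) = (2, 3)
# repetition codebook (all values are arbitrary illustration choices).
eps = 0.1
code = np.array([[0, 0, 0],
                 [1, 1, 1]])
M, n = code.shape

def p_y_given_x(y, x):
    """Product channel law of the BSC: eps^dH(x,y) * (1 - eps)^(n - dH(x,y))."""
    d = int(np.sum(y != x))
    return eps**d * (1 - eps)**(n - d)

lam = np.zeros(M)                       # lambda_m, the error probabilities (2.5)
for y_tuple in itertools.product([0, 1], repeat=n):
    y = np.array(y_tuple)
    likelihoods = [p_y_given_x(y, code[m]) for m in range(M)]
    g = int(np.argmax(likelihoods))     # ML decision (2.13), ties broken arbitrarily
    for m in range(M):
        if g != m:                      # indicator I{g(y) != m} from (2.6)
            lam[m] += p_y_given_x(y, code[m])

print("lambda_m:", lam)                 # per-message error probabilities (2.5)
print("lambda  :", lam.max())           # maximum error probability (2.8)
print("P_e     :", lam.mean())          # average error probability (2.9)
```

For this repetition code, both per-message error probabilities equal 3·eps²·(1 − eps) + eps³ ≈ 0.028, so λ = Pe here.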

Hence, we are going to be lazy and directly concentrate on the set of codewords C(M,n), called the (M, n) codebook or usually simply the (M, n) code. Sometimes we follow the custom of traditional coding theory and use three parameters: an (M, n, d) code, where the third parameter d denotes the minimum Hamming distance, i.e., the minimum number of components in which any two codewords differ.

Moreover, we also make the following definitions.

Definition 2.9 By dα,β(xm, y) we denote the number of positions j where xm,j = α and yj = β. For m ≠ m′, the joint composition qα,β(m, m′) of two codewords xm and xm′ is defined as

qα,β(m, m′) ≜ dα,β(xm, xm′)/n. (2.14)

Note that dH(·, ·) ≜ d0,1(·, ·) + d1,0(·, ·) and wH(x) ≜ dH(x, 0) denote the commonly used Hamming distance and Hamming weight, respectively.
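A minimal Python sketch of these quantities (the two codewords below are arbitrary illustration choices):

```python
import numpy as np

def d_ab(x, y, a, b):
    """Number of positions j with x_j = a and y_j = b (Definition 2.9)."""
    x, y = np.asarray(x), np.asarray(y)
    return int(np.sum((x == a) & (y == b)))

def hamming_distance(x, y):
    """d_H(x, y) = d_{0,1}(x, y) + d_{1,0}(x, y)."""
    return d_ab(x, y, 0, 1) + d_ab(x, y, 1, 0)

# Two illustrative length-5 codewords.
x1 = [0, 0, 1, 1, 0]
x2 = [0, 1, 1, 0, 1]
n = len(x1)

# Joint composition q_{a,b}(1, 2) = d_{a,b}(x1, x2) / n, cf. (2.14).
q = {(a, b): d_ab(x1, x2, a, b) / n for a in (0, 1) for b in (0, 1)}
print(q)                          # fractions of (0,0), (0,1), (1,0), (1,1) positions
print(hamming_distance(x1, x2))   # here: 3
```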

The following remark deals with the way in which codebooks can be described. It is not standard, but it turns out to be very important and is actually the key to our derivations.

Remark 2.10 Usually, the codebook C(M,n) is written as an M × n codebook matrix with the M rows corresponding to the M codewords:

C(M,n) = [x1; … ; xM] = [c1 c2 · · · cn]. (2.15)

However, it turns out to be much more convenient to consider the codebook column-wise rather than row-wise! So, instead of specifying the codewords of a codebook, we actually specify its (length-M) column vectors cj.
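The row-wise versus column-wise description can be illustrated as follows (the codebook matrix is an arbitrary example):

```python
import numpy as np

# An arbitrary (M, n) = (4, 3) codebook matrix: rows are the codewords x_1, ..., x_M.
C = np.array([[0, 0, 0],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

codewords = list(C)     # row-wise description: M codewords of length n
columns = list(C.T)     # column-wise description: n column vectors of length M

print(codewords[0])     # x_1 = [0 0 0]
print(columns[0])       # c_1 = [0 0 1 1], the first symbol of every codeword
```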

Remark 2.11 Since we assume equally likely messages, any permutation of rows only changes the assignment of codewords to messages and has no impact on the performance.

We consider two codes with permuted rows as being equal, i.e., a code is actually a set of codewords, where the ordering of the codewords is irrelevant.

Furthermore, since we are only considering memoryless channels, any permutation of the columns of C(M,n) will lead to another codebook that is equivalent to the first in the sense that it has exactly the same error probability. We say that two such codes are equivalent. We would like to emphasize that two codebooks being equivalent is not the same as two codebooks being equal. However, as we are mainly interested in the performance of a codebook, we usually treat two equivalent codes as being the same. In particular, when we speak of a unique code design, we do not exclude the always possible permutations of columns.

In spite of this, for the sake of clarity of our derivations, we will usually define a certain fixed order of the codewords and codebook column vectors.

The most famous relation between code rate and error probability was derived by Shannon in his landmark paper from 1948 [1].

Theorem 2.12 (The Channel Coding Theorem for a DMC) Define

C ≜ max_{PX(·)} I(X; Y), (2.16)

where X and Y have to be understood as the input and output of a DMC and where the maximization is over all input distributions PX(·).

Then for every R < C there exists a sequence of (2^{nR}, n) coding schemes with maximum error probability λ(C(M,n)) → 0 as the blocklength n gets very large.

Conversely, any sequence of (2^{nR}, n) coding schemes with maximum error probability λ(C(M,n)) → 0 must have a rate R ≤ C.

So we see that C denotes the maximum rate at which reliable communication is possible.

Therefore C is called channel capacity.
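To illustrate (2.16), the following sketch numerically maximizes the mutual information of a binary-input DMC over all input distributions by a simple grid search. The channel matrix is an arbitrary BSC example, and grid search is used here only for illustration (the Blahut–Arimoto algorithm would be the standard tool).

```python
import numpy as np

def mutual_information(p0, W):
    """I(X;Y) in bits for input distribution (p0, 1 - p0) and channel matrix W,
    where W[x, y] = P_{Y|X}(y|x)."""
    px = np.array([p0, 1.0 - p0])
    py = px @ W                                    # output distribution
    I = 0.0
    for x in range(W.shape[0]):
        for y in range(W.shape[1]):
            if px[x] > 0 and W[x, y] > 0:
                I += px[x] * W[x, y] * np.log2(W[x, y] / py[y])
    return I

# Example channel (arbitrary choice): a BSC with crossover probability 0.1,
# whose capacity is known to be 1 - Hb(0.1), roughly 0.531 bits.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])
capacity = max(mutual_information(p0, W) for p0 in np.linspace(0.0, 1.0, 10001))
print(capacity)
```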

Note that this theorem considers only the situation of n tending to infinity and thereby the error probability going to zero. However, in a practical system, we cannot allow the blocklength n to be too large because of delay and complexity. On the other hand it is not necessary to have zero error probability either.

So the question arises as to what we can say about “capacity” for finite n: if we allow a certain maximal probability of error, what is the smallest blocklength n necessary to achieve it? Or, vice versa, fixing a certain short blocklength n, what is the best average error probability that can be achieved? And what is the optimal code structure for a given channel?

Channel Models

We consider a discrete memoryless channel (DMC) with binary input and output alphabets. The most general such binary DMC is the so-called binary asymmetric channel (BAC), which is specified by two parameters: ε0 denotes the probability that a 0 is flipped into a 1, and ε1 denotes the probability that a 1 is flipped into a 0; see Fig. 3.1.

Figure 3.1: The binary asymmetric channel (BAC).

For symmetry reasons and without loss of generality, we can restrict the values of these parameters as follows:

0 ≤ ε0 ≤ ε1 ≤ 1, (3.1)

ε0 ≤ 1 − ε0, (3.2)

ε0 ≤ 1 − ε1. (3.3)

Note that in the case when ε0 > ε1, we simply flip all zeros to ones and vice versa to get an equivalent channel with ε0 ≤ ε1. For the case when ε0 > 1 − ε0, we flip the output Y, i.e., change all output zeros to ones and ones to zeros, to get an equivalent channel with ε0 ≤ 1 − ε0. Note that (3.2) can be simplified to ε0 ≤ 1/2 and is actually implied by (3.1) and (3.3). And for the case when ε0 > 1 − ε1, we flip the input X to get an equivalent channel that satisfies ε0 ≤ 1 − ε1.

We have depicted the region of possible choices of the parameters ε0 and ε1 in Fig. 3.2. The region of interesting choices given by (3.1)–(3.3) is denoted by Ω.

Figure 3.2: Region of possible choices of the channel parameters ε0 and ε1 of a BAC. The shaded area corresponds to the interesting area according to (3.1)–(3.3); its boundaries are ε0 + ε1 = 1 (completely noisy channel), ε0 = ε1 (BSC), and ε0 = 0 (ZC).
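The relabeling arguments above can be collected into a small helper that maps arbitrary BAC parameters into the region Ω. This is only a sketch; the function name and the order of the two steps are our own choices.

```python
def normalize_bac(eps0, eps1):
    """Map arbitrary BAC parameters to an equivalent channel satisfying (3.1)-(3.3)."""
    # If eps0 > eps1, relabel 0 and 1 on both input and output: this swaps
    # the two crossover probabilities.
    if eps0 > eps1:
        eps0, eps1 = eps1, eps0
    # If eps0 > 1 - eps1, relabel the input X: this maps (eps0, eps1) to
    # (1 - eps1, 1 - eps0), which preserves eps0 <= eps1 and enforces
    # eps0 + eps1 <= 1 (so eps0 <= 1/2 follows automatically).
    if eps0 > 1 - eps1:
        eps0, eps1 = 1 - eps1, 1 - eps0
    return eps0, eps1

print(normalize_bac(0.9, 0.3))   # -> (0.1, 0.7), an equivalent channel in Omega
```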

Note that the boundaries of Ω correspond to three special cases: The binary symmetric channel (BSC) (see Fig. 3.3) has equal cross-over probabilities ε0 = ε1 = ε. According to (3.2), we can assume without loss of generality that ε ≤ 1/2.

Figure 3.3: The binary symmetric channel (BSC).

Figure 3.4: The Z-channel (ZC).

The Z-channel (ZC) (see Fig. 3.4) will never distort an input 0, i.e., ε0 = 0. An input 1 is flipped to 0 with probability ε1 < 1.

Finally, the case ε0 = 1 − ε1 corresponds to a completely noisy channel of zero capacity: given Y = y, the events X = 0 and X = 1 are equally likely, i.e., X ⊥⊥ Y.

A related channel with a binary input but a ternary output alphabet is the binary erasure channel (BEC); it is not a special case of the BAC. Each input symbol is erased, i.e., mapped to the erasure symbol 2, with probability δ, and is received unchanged otherwise. The output alphabet of the BEC is therefore {0, 1, 2}, which is not binary. The channel model is shown in Fig. 3.5.

Figure 3.5: The binary erasure channel (BEC).

Due to the symmetry of the BSC and BEC, we have an additional equivalence in the codebook design.

Lemma 3.1 Consider an arbitrary code C(M,n) to be used on the BSC or BEC and consider an arbitrary M-vector c. Now construct a new length-(n + 1) code C(M,n+1) by appending c to the codebook matrix of C(M,n), and another length-(n + 1) code C̄(M,n+1) by appending the flipped vector c̄ = c ⊕ 1 to the codebook matrix of C(M,n). Then the performance of these two new codes is identical:

Pe(n+1)(C(M,n+1)) = Pe(n+1)(C̄(M,n+1)). (3.4)

We remind the reader that our ultimate goal is to find the structure of an optimal code C(M,n)* that satisfies

Pe(n)(C(M,n)*) ≤ Pe(n)(C(M,n)) (3.5)

for any code C(M,n).

Preliminaries

4.1 Error Probability of the BAC

The conditional probability of the received vector y given the sent codeword xm of the BAC can be written as

PYn|X(y|xm) = (1 − ε0)^d0,0(xm,y) · ε0^d0,1(xm,y) · ε1^d1,0(xm,y) · (1 − ε1)^d1,1(xm,y), (4.1)

where we use PYn|X to denote the product distribution

PYn|X(y|x) = ∏_{j=1}^{n} PY|X(yj|xj). (4.2)

Considering that

n = d0,0(xm, y) + d0,1(xm, y) + d1,0(xm, y) + d1,1(xm, y), (4.3)

the average error probability of a coding scheme C(M,n) over a BAC can now be written as

Pe(C(M,n)) = (1/M) Σ_{m=1}^{M} Σ_{y: g(y)≠m} PYn|X(y|xm) (4.4)

= ((1 − ε0)^n / M) Σ_y Σ_{m=1, m≠g(y)}^{M} (ε0/(1 − ε0))^d0,1(xm,y) · (ε1/(1 − ε0))^d1,0(xm,y) · ((1 − ε1)/(1 − ε0))^d1,1(xm,y), (4.5)

where g(y) is the ML decision (2.13) for the observation y.
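A direct numerical evaluation of (4.4) for a toy codebook on a BAC might look as follows; the codebook and channel parameters are arbitrary illustration choices, and the ML decisions are found by exhaustive search.

```python
import itertools

import numpy as np

# Arbitrary illustration: BAC parameters inside the region Omega and a toy
# (M, n) = (2, 4) codebook; neither is taken from the text.
eps0, eps1 = 0.05, 0.2
code = np.array([[0, 0, 0, 0],
                 [1, 1, 1, 1]])
M, n = code.shape

def bac_prob(y, x):
    """P_{Y^n|X}(y|x) for the BAC, written via the counts d_{a,b} as in (4.1)."""
    d00 = int(np.sum((x == 0) & (y == 0)))
    d01 = int(np.sum((x == 0) & (y == 1)))
    d10 = int(np.sum((x == 1) & (y == 0)))
    d11 = int(np.sum((x == 1) & (y == 1)))
    return (1 - eps0)**d00 * eps0**d01 * eps1**d10 * (1 - eps1)**d11

# Average error probability (4.4) with the ML decoder (2.13).
Pe = 0.0
for y_tuple in itertools.product([0, 1], repeat=n):
    y = np.array(y_tuple)
    g = int(np.argmax([bac_prob(y, code[m]) for m in range(M)]))
    Pe += sum(bac_prob(y, code[m]) for m in range(M) if m != g)
Pe /= M
print("P_e over the BAC:", Pe)
```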

4.1.1 Capacity of the BAC

Without loss of generality, we need only consider BACs with 0 ≤ ε0 ≤ ε1 ≤ 1 and 0 ≤ ε0 + ε1 ≤ 1.

Proposition 4.1 The capacity of a BAC is given by

CBAC = (ε1/(1 − ε0 − ε1)) · Hb(ε0) − ((1 − ε0)/(1 − ε0 − ε1)) · Hb(ε1) + log2(1 + 2^{(Hb(ε1) − Hb(ε0))/(1 − ε0 − ε1)}) (4.6)

bits, where Hb(·) is the binary entropy function defined as

Hb(p) ≜ −p log2 p − (1 − p) log2(1 − p). (4.7)

The input distribution PX(·) that achieves this capacity is given by

PX(0) = 1 − PX(1) = (z − ε1(1 + z)) / ((1 + z)(1 − ε0 − ε1)), (4.8)

with

z ≜ 2^{(Hb(ε1) − Hb(ε0))/(1 − ε0 − ε1)}. (4.9)
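The closed-form expressions (4.6)–(4.9) are easy to evaluate numerically; the following sketch does so and checks the BSC special case. The parameter values are arbitrary illustration choices.

```python
import numpy as np

def Hb(p):
    """Binary entropy function (4.7) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bac_capacity(eps0, eps1):
    """Capacity (4.6) and capacity-achieving PX(0) (4.8) of a BAC with eps0 + eps1 < 1."""
    z = 2 ** ((Hb(eps1) - Hb(eps0)) / (1 - eps0 - eps1))           # (4.9)
    C = (eps1 / (1 - eps0 - eps1)) * Hb(eps0) \
        - ((1 - eps0) / (1 - eps0 - eps1)) * Hb(eps1) \
        + np.log2(1 + z)                                           # (4.6)
    px0 = (z - eps1 * (1 + z)) / ((1 + z) * (1 - eps0 - eps1))     # (4.8)
    return C, px0

print(bac_capacity(0.1, 0.1))   # BSC(0.1): C = 1 - Hb(0.1), about 0.531, PX(0) = 0.5
print(bac_capacity(0.0, 0.3))   # ZC with eps1 = 0.3: about 0.504 bits
```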
