Multiple blocks update algorithm for matching pursuit video coding


Jian-Liang Lin†∗, Wen-Liang Hwang†, and Soo-Chang Pei∗

∗Institute of Communication Engineering, National Taiwan University, Taiwan, R.O.C.

†Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C.

ABSTRACT

Matching pursuit (MP) video coding has been shown to achieve better coding performance than DCT-based video coding in terms of PSNR and perceptual quality at very low bit rates. However, because of its massive computational complexity, the MP algorithm is usually only approximated. By approximating a residual in a subspace, we propose a multiple blocks search and update algorithm for MP video coding that achieves a faster and better MP approximation. In this paper, we evaluate its performance and compare it with that of the traditional one-block search algorithms. The experimental results show that the proposed algorithm significantly improves both coding performance and encoding time.

1. INTRODUCTION

Matching pursuit, which is a frame-based algorithm, is a promising method for low bit-rate video coding [5]. An MP-based codec yields a better PSNR and perceptual quality than a transform-based codec, and its decoder is simpler. However, it cannot be used in applications that require real-time bidirectional communication because the encoder consumes a massive amount of computational time. An MP encoder does not obtain all the coefficients in one step, but iteratively finds the frame coefficient that has the largest absolute inner product value between a residual and all the bases. The inner product value and the base from which the value is obtained are called an atom. Many approaches have been proposed to simplify the complex encoding stage. One approach approximates the codewords of a dictionary with a linear combination of simpler codewords so that the computation becomes easier [7, 2, 6, 8].

The most popular approach for finding an atom was proposed by Neff and Zakhor [5]: a residual frame is divided into blocks, and at each iteration an atom is found within the block with the largest amount of energy. This approach is modified in [1], in which an energy weight is given to a block so that the greater the number of atoms chosen from the block, the smaller the energy weight of the block will be. Therefore, the block is less likely to be chosen in later iterations. The energy weight approach reduces the likelihood that most of the atoms will be selected from a few blocks, and improves the PSNR performance over that of Neff and Zakhor’s algorithm. Because the above algorithms find an atom from the block with the largest (weighted) energy, we call them one-block algorithms.

The one-block algorithm is simple and efficient, but it sacrifices coding performance. Although coding performance can be improved by finding an atom from more than one block, there is still the issue of the massive number of inner products between a residual and the bases in these blocks. To solve this problem, we approximate a residual in a subspace spanned by a small number of bases within a few blocks. The bases and the blocks are selected according to the content of the residual, while the coding performance and efficiency are determined by the number of bases and the number of blocks. Our simulations show that the PSNR and efficiency of our algorithm are better than those of the one-block algorithms at low bit rates for various sequences.

2. MP UPDATE ALGORITHM AND ATOM EXTRACTION

Matching pursuit is a frame-based algorithm that represents a signal by a succession of greedy steps [4]. At each iteration, the signal is projected onto the base that approximates the signal most efficiently. Let the over-complete image bases be $\{g_\gamma(x)\}$, where $\gamma$ is the index. The matching pursuit algorithm decomposes an image into a linear expansion of the bases as follows.

The image $f(x)$ is first decomposed into

$$f(x) = \langle f(x), g_{\gamma_0}(x) \rangle\, g_{\gamma_0}(x) + Rf(x),$$

where $g_{\gamma_0}(x) = \arg\max_{g_\gamma(x)} \{ |\langle f(x), g_\gamma(x) \rangle| \}$ and $Rf(x)$ is the residual image after approximating $f(x)$ in the direction of $g_{\gamma_0}(x)$. The base $g_{\gamma_0}(x)$ and the inner product value $\langle f(x), g_{\gamma_0}(x) \rangle$ are called an atom. The matching pursuit algorithm then decomposes the residual image $Rf(x)$ by projecting it onto the bases, as was done for $f(x)$. Instead of recalculating the inner products at each iteration, Mallat and Zhang [4] provide the MP update algorithm. At the $k$th iteration, let:

$$g_{\gamma_k} = \arg\max_{\gamma \in \Gamma} |\langle R^k f, g_\gamma \rangle|$$

be the base with the largest absolute inner product value. The new residual signal $R^{k+1} f$ is

$$R^{k+1} f = R^k f - \langle R^k f, g_{\gamma_k} \rangle\, g_{\gamma_k}.$$

The inner products between $R^{k+1} f$ and the bases $\{g_\gamma\}$ can be represented by

$$\langle R^{k+1} f, g_\gamma \rangle = \langle R^k f, g_\gamma \rangle - \langle R^k f, g_{\gamma_k} \rangle \langle g_{\gamma_k}, g_\gamma \rangle. \qquad (1)$$

Because $\langle R^k f, g_\gamma \rangle$ and $\langle R^k f, g_{\gamma_k} \rangle$ have been calculated in the previous iteration, and if $\langle g_{\gamma_k}, g_\gamma \rangle$ is pre-calculated, this update needs only one multiplication and one addition. Unfortunately, this update algorithm needs a huge amount of space to store all non-zero $\langle g_{\gamma_k}, g_\gamma \rangle$ in an image and is only useful in a one-dimensional signal decomposition.
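The update rule in Equation 1 can be illustrated with a minimal NumPy sketch on a small random dictionary. The function name, the random dictionary, and the signal size are assumptions made for illustration only; the sketch stores the full matrix of inner products between bases, which is exactly the storage cost that makes the update impractical for whole images.

```python
import numpy as np

def matching_pursuit_update(f, G, n_atoms):
    """Greedy MP on signal f with unit-norm bases in the rows of G.

    Inner products are computed once and then maintained with Equation 1:
        <R^{k+1} f, g> = <R^k f, g> - <R^k f, g_k> <g_k, g>.
    """
    gram = G @ G.T                      # pre-calculated <g_i, g_j> for all pairs
    ip = G @ f                          # <R^0 f, g> for every base
    residual = f.astype(float).copy()
    atoms = []
    for _ in range(n_atoms):
        k = int(np.argmax(np.abs(ip)))  # base with the largest |inner product|
        c = ip[k]
        atoms.append((k, c))            # an atom: base index and inner product
        residual -= c * G[k]            # R^{k+1} f = R^k f - c g_k
        ip = ip - c * gram[k]           # Equation 1: one multiply-add per base
    return atoms, residual, ip

# Hypothetical toy data: 32 random unit-norm bases in 8 dimensions.
rng = np.random.default_rng(0)
G = rng.standard_normal((32, 8))
G /= np.linalg.norm(G, axis=1, keepdims=True)
f = rng.standard_normal(8)
atoms, residual, ip = matching_pursuit_update(f, G, n_atoms=5)
```

After the loop, `ip` should agree with `G @ residual` up to rounding, which is a quick way to check that the update reproduces the directly computed inner products.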

For image decomposition by MP, the number of bases is huge, so the complexity of applying the MP to the entire image is too high. To overcome this problem, the approach proposed in [5, 1] divides a residual into blocks, and at each iteration, the MP is applied only to the block with the largest energy. This approach is both simple and efficient and has, therefore, been implemented in many MP-based video codecs.
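The block choice made by a one-block search can be sketched as follows. This is only an illustration under assumptions: the block size, the helper names, and the toy residual are hypothetical, and the energy weighting of [1] is not modeled.

```python
import numpy as np

def block_energies(residual, block):
    """Energy (sum of squares) of each non-overlapping block x block tile."""
    h, w = residual.shape
    cropped = residual[:h - h % block, :w - w % block]
    tiles = cropped.reshape(h // block, block, w // block, block)
    return (tiles ** 2).sum(axis=(1, 3))

def largest_energy_block(residual, block):
    """Row and column index of the block with the largest energy."""
    e = block_energies(residual, block)
    return np.unravel_index(np.argmax(e), e.shape)

# Hypothetical 8 x 8 residual with a single non-zero pixel.
residual = np.zeros((8, 8))
residual[5, 2] = 3.0
row, col = largest_energy_block(residual, block=4)   # block in row 1, column 0
```

A one-block encoder would then run the full inner-product search only inside the returned block.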

3. MULTIPLE BLOCKS APPROXIMATION

The approach in [5, 1] assumes that the probability of the current largest energy block containing the maximum atom is high. This assumption can be further developed with the correlation between a block containing the maximum atom and the energy of the block. Thus, the energy of a block can be used to determine whether a block should be included in the procedure to find atoms.

3.1. Blocks Selection

Let $\mathcal{B}$ be the set of blocks in which to search for atoms. Before we propose our multiple block selection algorithm, we present the optimal set of blocks for atom selection and show the difficulties in obtaining the optimal set in practice.

For a block $b$, let $P_0(b)$ be the probability that the maximum atom is not within $b$, and let $P_1(b)$ be the probability that the block $b$ contains the maximum atom. The miss probability $P_M$ is the probability that the block containing the maximum atom is excluded from the atom search:

$$P_M(\mathcal{B}) = \sum_{b \notin \mathcal{B}} P_1(b). \qquad (2)$$

The false alarm probability $P_F$ is the probability that an atom is found in a block that does not contain the maximum atom:

$$P_F(\mathcal{B}) = \sum_{b \in \mathcal{B}} P_0(b). \qquad (3)$$

We define the average performance loss of selecting a non-maximum atom incurred by $\mathcal{B}$ as

$$R(\mathcal{B}) = P_F(\mathcal{B}) C_F + P_M(\mathcal{B}) C_M,$$

where the non-negative numbers $C_F$ and $C_M$ are the respective average conditional performance losses when a false alarm or a miss occurs. From Equations 3 and 2, we can derive that

$$
\begin{aligned}
R(\mathcal{B}) &= \sum_{b \in \mathcal{B}} P_0(b) C_F + \sum_{b \notin \mathcal{B}} P_1(b) C_M \\
&= \sum_{b \in \mathcal{B}} P_0(b) C_F + \sum_{b \in \mathcal{B}} (1 - P_1(b)) C_M \\
&= \sum_{b \in \mathcal{B}} (P_0(b) C_F - P_1(b) C_M) + \sum_{b \in \mathcal{B}} C_M \\
&= \sum_{b \in \mathcal{B}} (P_0(b) C_F - P_1(b) C_M) + C_M |\mathcal{B}|. \qquad (4)
\end{aligned}
$$

Let the optimal set $\mathcal{B}^*$ be the block set that minimizes the above equation. Let $\tilde{\mathcal{B}}$ be the set of blocks satisfying

$$\tilde{\mathcal{B}} = \{ b \mid P_0(b) C_F - P_1(b) C_M \le 0 \}. \qquad (5)$$

Equation 5 can be rewritten as

$$\tilde{\mathcal{B}} = \{ b \mid P_1(b) \ge \tau P_0(b) \}, \qquad (6)$$

where $\tau = C_F / C_M$. The likelihood of block $b$ can be defined as

$$L(b) = \frac{P_1(b)}{P_0(b)},$$

and Equation 6 becomes

$$\tilde{\mathcal{B}} = \{ b \mid L(b) \ge \tau \}. \qquad (7)$$

Because any block in $\mathcal{B}^*$ must be in $\tilde{\mathcal{B}}$, we have

$$\mathcal{B}^* \subseteq \tilde{\mathcal{B}}. \qquad (8)$$

The optimal block set $\mathcal{B}^*$ is too difficult to determine. Thus, we propose the following ad-hoc procedure to construct the block set $\mathcal{B}$. This procedure is simple and has proven to be effective in our simulations.
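To make the threshold test of Equations 5 to 7 concrete, the following sketch computes the candidate set for given probabilities. The probability values and loss constants are hypothetical, since in practice they are unknown; the sketch uses $P_0(b) = 1 - P_1(b)$, which follows from the two definitions, and applies the form of Equation 6 so that no division by $P_0(b)$ is needed.

```python
import numpy as np

def candidate_blocks(p1, c_f, c_m):
    """Indices of blocks b with P1(b) >= tau * P0(b), where tau = C_F / C_M."""
    p1 = np.asarray(p1, dtype=float)
    p0 = 1.0 - p1                  # probability that the maximum atom is not in b
    tau = c_f / c_m
    return np.flatnonzero(p1 >= tau * p0)

# Hypothetical values: four blocks, and a miss costing four times a false alarm.
p1 = [0.5, 0.3, 0.15, 0.05]
selected = candidate_blocks(p1, c_f=1.0, c_m=4.0)

# The same set obtained directly from the condition in Equation 5.
p1_arr = np.asarray(p1)
direct = np.flatnonzero((1.0 - p1_arr) * 1.0 - p1_arr * 4.0 <= 0)
```

With these numbers $\tau = 0.25$, and only the first two blocks pass the test under either form.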

The correlation between the block containing the maximum atom and the blocks with larger amounts of energy is high. Therefore, at each iteration, we include the blocks with relatively large energy in $\mathcal{B}$. We normalize the energy of the blocks at each iteration so that the block with the largest energy becomes 1. A block $b$ is assigned to $\mathcal{B}$ according to its normalized energy $\|\tilde{b}\|^2$:

$$b \in \mathcal{B} \quad \text{if } \|\tilde{b}\|^2 \ge \eta,$$

where $0 \le \eta \le 1$ is a threshold. An atom is then chosen


Fig. 1. After the base corresponding to the black hexagon at the center of the gray area is selected, the inner products with the bases covered in the gray area are updated. The black dots are bases.
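The normalized-energy rule of Section 3.1 can be sketched in a few lines. The function name and the sample energies are assumptions for illustration; the guard for an all-zero residual is also an addition that the text does not discuss.

```python
import numpy as np

def select_blocks(energies, eta):
    """Indices of blocks whose energy, normalized so the maximum is 1, is >= eta."""
    energies = np.asarray(energies, dtype=float)
    peak = energies.max()
    if peak == 0:                       # empty residual: nothing to search
        return np.array([], dtype=int)
    return np.flatnonzero(energies / peak >= eta)

# Hypothetical block energies; normalized values are 1, 0.25, 0.75, 0.125.
energies = [4.0, 1.0, 3.0, 0.5]
chosen = select_blocks(energies, eta=0.7)
```

Setting `eta` to 1 keeps only the largest-energy block, which corresponds to the one-block search, while smaller thresholds admit more blocks.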

3.2. Block Content Approximation

Our approach is to approximate the content of a block in $\mathcal{B}$ in a subspace spanned by a few MP bases. Let a block contain $|s|^2$ pixels, and let $|D|$ be the size of the dictionary $D$. The number of bases in a block is $|s|^2 |D|$. For computational efficiency, we reduce the number of bases in a block to $L$. Figure 1 illustrates an example in which $L = 2$ and $|\mathcal{B}| = 10$.

Our algorithm, which combines multiple blocks with the MP update, is described below.

Multiple Blocks Update Algorithm

1. Initialization ($k = 0$): The residual $f$ is first divided into blocks. If the normalized energy $\|\tilde{b}\|^2$ of a block $b$ is at least a threshold $0 < \eta < 1$, i.e.,

$$\|\tilde{b}\|^2 \ge \eta, \qquad (9)$$

we assign the block to $\mathcal{B}$ and calculate the inner products between the residual and the bases of the block. We then record the $L$ bases $\{g^b_{\gamma_l},\ l = 1, \cdots, L\}$ giving the largest absolute inner products and assign them to $B_L$.

2. Atom Extraction and Update of the Inner Products (at the $k$-th iteration): Let $g_{\gamma_k^{\max}}$ be the basis in $B_L$ that gives the largest absolute inner product value. We then update the non-zero inner products according to $g_{\gamma_k^{\max}}$ and Equation 1.

3. Update of Block Set $\mathcal{B}$: When an atom is extracted from a block, the energy of some blocks may change. For a block that is not in $\mathcal{B}$, if its normalized energy is at least $\eta$, we include it in $\mathcal{B}$. We then calculate the inner products between $R^{k+1} f$ and the bases within this block, and record the $L$ best bases, as we do in Step 1.

4. Next iteration: $k = k + 1$. If $k < n$, then go to Step 2.
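The four steps above can be sketched on a one-dimensional toy signal. This is a simplified illustration under stated assumptions, not the codec implementation: the random dictionary, the `owner` array that assigns each base to a block, and all parameter values are hypothetical, and the inner products between bases are computed on the fly rather than looked up from a pre-calculated table.

```python
import numpy as np

def multiple_blocks_mp(f, G, owner, n_blocks, eta, L, n_atoms):
    """Toy 1-D sketch of the multiple blocks update algorithm.

    f: signal whose length is a multiple of n_blocks; G: unit-norm bases in rows;
    owner[j]: index of the block that base j is assigned to (hypothetical layout).
    """
    R = f.astype(float).copy()
    in_B = np.zeros(n_blocks, dtype=bool)       # membership in the block set
    cand = {}                                   # B_L: base index -> <R^k f, g>

    def normalized_energies():
        e = (R.reshape(n_blocks, -1) ** 2).sum(axis=1)
        return e / e.max() if e.max() > 0 else e

    def add_block(b):
        idx = np.flatnonzero(owner == b)
        ip = G[idx] @ R                         # full search, once per block
        keep = np.argsort(-np.abs(ip))[:L]      # L largest |inner products|
        cand.update({int(idx[i]): float(ip[i]) for i in keep})
        in_B[b] = True

    def refresh_block_set():                    # Steps 1 and 3
        for b in np.flatnonzero((normalized_energies() >= eta) & ~in_B):
            add_block(int(b))

    refresh_block_set()
    atoms = []
    for _ in range(n_atoms):                    # Steps 2 to 4
        if not cand:
            break
        k = max(cand, key=lambda j: abs(cand[j]))
        c = cand[k]
        atoms.append((k, c))
        R -= c * G[k]
        for j in cand:                          # Equation 1, candidates only
            cand[j] -= c * float(G[k] @ G[j])
        refresh_block_set()
    return atoms, R, cand

# Hypothetical setup: 64 samples, 8 blocks, 3 random shapes of width 12 per position.
rng = np.random.default_rng(1)
N, n_blocks, width = 64, 8, 12
starts = np.repeat(np.arange(N - width + 1), 3)
G = np.zeros((starts.size, N))
for j, p in enumerate(starts):
    G[j, p:p + width] = rng.standard_normal(width)
G /= np.linalg.norm(G, axis=1, keepdims=True)
owner = (starts + width // 2) // (N // n_blocks)   # block containing the base center
f = rng.standard_normal(N)
atoms, R, cand = multiple_blocks_mp(f, G, owner, n_blocks, eta=0.5, L=4, n_atoms=10)
```

Because the bases overlap block boundaries, extracting an atom changes the energy of neighboring blocks, which is what triggers Step 3 in this sketch. The stored candidate inner products should match `G[j] @ R` for every `j` in `cand`, up to rounding.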

In the first phase, for each block in $\mathcal{B}$ we compute the inner products of the block with all the bases, and from them we select the $L$ bases producing the $L$ largest absolute values. Let $B_L$ be the union of the selected bases of all blocks in $\mathcal{B}$. Because we assume that the approximating residual is in the subspace spanned by $B_L$, in the second phase we apply MP to a residual restricted to the bases in $B_L$. Each block in the first phase obtains $|s|^2 |D|$ inner product values. It takes a complexity of $|s|^2 |D|$ to obtain the $L$ largest absolute inner products from them.

The first phase takes $|s|^2 |D| |\mathcal{B}|$ inner products, and the second phase takes at most $|\mathcal{B}| L$ inner products to obtain an atom. If the non-zero inner products between bases in $B_L$ are at most $m \le |\mathcal{B}| L$ and if a residual takes $n$ iterations on average, we perform

$$|s|^2 |D| |\mathcal{B}| + (n - 1) m$$

inner products to approximate a residual. To obtain better efficiency than the one-block algorithm, we require that, after $n$ iterations,

$$|s|^2 |D| |\mathcal{B}| + (n - 1) m < |s|^2 |D|\, n. \qquad (10)$$

The term $n |s|^2 |D|$ is the total number of inner products needed to find $n$ atoms by using the one-block algorithm. Because the number of bases retained for a block is much smaller than $|s|^2 |D|$, we can use the MP update algorithm to update the inner products at each iteration.
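A small calculation shows how the two sides of Equation 10 compare. All numbers below are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def inner_product_counts(s2, D, B, m, n):
    """Inner-product counts needed to find n atoms.

    s2: pixels per block (|s|^2); D: dictionary size; B: number of selected blocks;
    m: bound on the non-zero inner products updated per iteration (m <= B * L).
    """
    multi = s2 * D * B + (n - 1) * m    # left-hand side of Equation 10
    one_block = s2 * D * n              # right-hand side of Equation 10
    return multi, one_block

# Hypothetical setting: 16 x 16 blocks, 400 codewords, 10 blocks, L = 400, 50 atoms.
multi, one_block = inner_product_counts(s2=256, D=400, B=10, m=10 * 400, n=50)
print(multi, one_block)   # 1220000 5120000
```

In this setting the first phase dominates the multiple-blocks count, so the saving grows with the number of atoms $n$ extracted per residual.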

4. PERFORMANCE EVALUATIONS AND COMPARISONS

We evaluated the performance of our algorithm and compared it with the popular algorithms in [5, 1]. An MP atom contains a base and an inner product value. The index of a codeword is encoded by an adaptive arithmetic code. The inner product value is encoded by a bit-plane based approach, and the position of a base is located by a quadtree and quadtree representation [3]. Other MP atom encoding methods can be used, but they change the average number of bits $r_a$ needed to encode an atom. The first frame of a video sequence is an intra-frame (I-frame), encoded by DCT, and all other frames are inter-frames (P-frames), encoded by MP.

Table 1 shows the computing time taken to encode the Akiyo sequence by various search algorithms. On our testing platform, the CPU clock speed is 2.4 GHz. Our algorithm has three components: computing the inner products between a residual and bases ($T_{ip}$); sorting to find the $L$ largest bases for each block ($T_{sort}$); and updating inner products for atom candidates ($T_{up}$). The computing time of each component is also shown in the table. Our algorithm computes inner products at the first iteration, and updates the inner products at the following iterations. Because updating inner products is faster than computing inner products, the overall time of our algorithm is dominated by $T_{ip}$.

Table 1. Elapsed time (sec) for encoding the Akiyo sequence by different methods. Our algorithm is in the third and fourth columns with L = 100 and L = 400, respectively.

| Algorithm | [5] | [1] | L = 100 | L = 400 |
|---|---|---|---|---|
| Total time | 321.20 | 287.44 | 183.89 | 205.33 |
| $T_{ip}$ | 185.94 | 160.65 | 56.67 | 67.34 |
| $T_{sort}$ | 9.23 | 7.76 | 2.85 | 3.51 |
| $T_{up}$ | 0 | 0 | 3.45 | 15.43 |

[Figure 2: two panels with axes encoding time and Y-PSNR. Panel (a), 24 Kbps, compares [5], [1], L = 100, and L = 400. Panel (b), 44 Kbps, compares [5], [1], L = 400, L = 800, L = 1200, and L = 1600.]

Fig. 2. The plot of average computing time versus PSNR of various sequences.
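As a quick reading aid for Table 1, the total times can be normalized to that of the algorithm in [5]. The sketch below only restates the table entries for the Akiyo sequence; the variable names are illustrative.

```python
# Total encoding times (sec) for the Akiyo sequence, copied from Table 1.
total_time = {"[5]": 321.20, "[1]": 287.44, "L=100": 183.89, "L=400": 205.33}

# Time of each method relative to the one-block algorithm of [5].
ratio = {name: t / total_time["[5]"] for name, t in total_time.items()}
for name, r in ratio.items():
    print(f"{name}: {r:.2f}")
```

For this single sequence the ratios are about 0.57 for L = 100 and 0.64 for L = 400.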

Figure 2 shows the average computing time versus the PSNR of various sequences at different bit rates. Our sequences include the following slow-motion sequences: Akiyo, Sean, Miss America, Container, Mother and Daughter, and Salesman; and the fast-motion sequences: Carphone and Foreman. The PSNR performance of our method increases as $L$ increases. This implies that using more bases to approximate a residual yields a better PSNR, but the overall computing time increases. Figure 2 shows that the overall computing time increases linearly as a function of $L$. The data in Figure 2(a) show that our approach with $L = 400$ gives the best average performance in terms of time and PSNR of all methods at 24 Kbps. The PSNR gain with this parameter ($L = 400$) over Neff and Zakhor’s one-block algorithm is 0.4 dB on average. We normalize the computing time of Neff and Zakhor’s algorithm to 1, so that the comparison is not affected by the speed of the CPU. The computing time of our algorithm with $L = 400$ is a factor of 0.7–0.8 of that of the one-block algorithms. Figure 2(b) shows that for $L$ between 800 and 1200, our method has a PSNR gain of 0.4–0.5 dB over Neff and Zakhor’s one-block algorithm at 44 Kbps.

5. CONCLUSION

In contrast to the traditional approach, in which an atom is chosen from the block with the largest (weighted) energy, we approximate a residual in a subspace spanned by a few MP bases. From this approximation, we obtain a new MP atom finding algorithm that uses multiple blocks for atom searching, and uses the MP update algorithm to update inner product values. The simulations show that our proposed algorithm outperforms one-block algorithms in terms of both PSNR and computing time. The performance of our method depends on two parameters: the number of blocks and the number of bases in each block. Adaptation of the parameters for different video sequences to achieve the best performance is an issue worthy of further study.

6. REFERENCES

[1] O. Al-Shaykh, E. Miloslavsky, T. Nomura, R. Neff, and A. Zakhor, “Video compression using matching pursuits”, IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 123–143, Feb. 1999.

[2] P. Czerepiński, C. Davies, N. Canagarajah, and D. Bull, “Matching pursuits video coding: dictionaries and fast implementation”, IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 7, pp. 1103–1115, Oct. 2000.

[3] J.L. Lin, W.L. Hwang, and S.C. Pei, “SNR scalability based on bitplane coding of matching pursuit atoms at low bit rates: fine-grained and two-layer”, to appear in IEEE Trans. Circuits Syst. Video Technol.

[4] S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries”, IEEE Trans. Signal Processing, vol. 41, pp. 3397–3415, Dec. 1993.

[5] R. Neff and A. Zakhor, “Very low bit-rate video coding based on matching pursuits”, IEEE Trans. Circuits Syst. Video Technol., vol. 7, pp. 158–171, Feb. 1997.

[6] R. Neff and A. Zakhor, “Matching pursuit video coding–part I: dictionary approximation”, IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 1, pp. 13–26, Jan. 2002.

[7] D.W. Redmill, D.R. Bull, and P. Czerepiński, “Video coding using a fast non-separable matching pursuits algorithm”, Proc. IEEE Int. Conf. Image Processing, pp. 769–773, 1998.

[8] C. De Vleeschouwer and B. Macq, “Subband dictionaries for low-cost matching pursuits of video residues”, IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 7, pp. 984–993, Oct. 1999.