A reduced-complexity image coding scheme using decision-directed wavelet-based contourlet transform

(1)

A reduced-complexity image coding scheme using decision-directed

wavelet-based contourlet transform

Chao-Hsiung Hung, Hsueh-Ming Hang

⇑

Department of Electronics Engineering, National Chiao-Tung University, Hsinchu, Taiwan

a r t i c l e

i n f o

Article history:

Received 1 November 2011 Accepted 18 June 2012 Available online 30 June 2012 Keywords:

Contourlet transform

Wavelet-based contourlet transform Bit-plane coding

Directional ﬁlter bank Directional transform Wavelet transform Image coding

Computational complexity reduction Adaptive Directional Transform

a b s t r a c t

Recently the wavelet-based contourlet transform (WBCT) is adopted for image coding because it matches better image textures of different orientations. However, its computational complexity is very high. In this paper, we propose three tools to enhance the WBCT coding scheme, in particular, on reducing its computational complexity. First, we propose short-length 2-D filters for directional transform. Second, the directional transform is applied to only a few selected subbands and the selection is done by a mean-shift-based decision procedure. Third, we fine-tune the context tables used by the arithmetic coder in WBCT coding to improve coding efficiency and to reduce computation. Simulations show that, at com-parable coded image quality, the proposed scheme saves over 92% computing time of the original WBCT scheme. Comparing to the conventional 2-D wavelet coding schemes, it produces clearly better subjective image quality.

1. Introduction

Wavelet-based image coding method becomes a popular image compression topic in recent years [1,2]. For example, it was adopted by JPEG2000[2]as an international image coding stan-dard. Typical wavelet-based image coding scheme consists of three stages: two-dimensional discrete wavelet transform (2-D DWT), quantization, and arithmetic coding [2]. A digital image is first transformed by 2-D DWT to produce a set of transform coefficients. After quantization, these coefficients are compressed to a binary stream by an entropy coding tool.

However, the 2-D DWT is inefﬁcient in representing the edge signals that are not aligned with the vertical or the horizontal axes

[3]. Many 2-D directional transforms have thus been developed to solve this problem[5–8],[13–19]. Among them, the wavelet-based contourlet transform (WBCT)[19] technique has the critical-sam-pling property, consumes comparatively less computational power, and requires no side information for decoding. Therefore, we focus on WBCT in this study.

The arithmetic coding methods [9,10,21,22,33–35] are com-monly adopted to compress the transformed/quantized coefﬁ-cients. Particularly, the embedded block coding with optimized truncation (EBCOT) [9] technique was adopted by JPEG2000. It was originally designed for intra-subband coding. In this study,

we adopt ESCOT (3-D embedded subband coding with optimized truncation)[33], which is an extension of EBCOT to inter-frame video coding, because our future work is targeting at video coding. Combining WBCT and ESCOT, a WBCT image coding scheme can achieve a better coding performance than a regular 2-D DWT im-age coding scheme. However, there are a few issues in the existing WBCT coding schemes. They need a large amount of computations because the existing WBCT directional filters have a large support. And, we found that for a specific picture, some WBCT frequency subbands do not need further directional transform. Furthermore, the context table in ESCOT needs adjustment to match the charac-teristics of quantized WBCT coefficients.

To solve these issues, we propose three tools in this paper to en-hance the WBCT image coding scheme. First, we suggest a set of short-length 2-D directional ﬁlters [38] and verify their perfor-mance. Second, we design a mean-shift-based decision scheme to dynamically select the proper subbands for directional transform

[39]. Third, we re-design the context tables of ESCOT to match the data directionality. With these tools, our proposed scheme re-duces 92% or higher the computational complexity of the original WBCT image coding scheme at similar visual quality[38].

The rest of this paper is organized as follows. Literature reviews and detailed problem statements are given in Section2 and Section 3describes the use of short-length 2-D ﬁlters. Section4presents a mean-shift-based decision algorithm for choosing the proper subbands for directional transform. Section5illustrates new entro-py-coding context tables that are optimized for compressing the

⇑Corresponding author. Fax: +886 3 5731791.

E-mail address:[email protected](H.-M. Hang).

Contents lists available atSciVerse ScienceDirect

J. Vis. Commun. Image R.

(2)

ﬁltered coefﬁcients. Experimental results are shown in Section6. Finally, Section7concludes this paper.

2. Literature reviews and problem statements

Wavelet-based image coding systems typically consist of trans-form, quantization, and entropy coding. In this section, we brieﬂy review the evolvement of directional transforms from 2-D DWT to WBCT. Then, we summarize the operations of ESCOT. When we put these two elements together, they form a conventional WBCT scheme.

2.1. Literature review: directional transform

2-D DWT is the tensor product of two one-dimensional discrete wavelet transforms (1-D DWT), and the Daubechies 9-7 wavelet filter is often in use. 1-D DWT can represent the piecewise smooth 1-D signals by a few coefficients[2]. But the outputs of 2-D DWT would contain many small coefficients for 2-D edges when these edges are not aligned with the vertical or the horizontal axes[3]. If we quan-tize these coefficients to zero, the coded image shows Gibbs artifacts along the edges [4]. To solve this problem, 2-D multiresolution transforms such as the spatial-domain multiresolution directional transform (SMDT) and the frequency-domain multiresolution direc-tional transform (FMDT)[20]have been developed to improve the directionality of 2-D DWT. SMDT uses the adaptive directional 1-D DWT to align with the image texture direction[5–8]. In this way, SMDT can pack the signal energy into a few subbands. But it needs a huge amount of computation to decide a suitable direction and it also requires extra side information for carrying the direction information. On the other hand, FMDT uses a set of pre-selected 2-D filters to perform multiresolution directional decomposition

[11–17]. Each filter corresponds to a basis function with specific spatial direction and resolution. FMDT can represent 2-D directional texture patterns by relatively few large coefficients. It needs less computational power and requires no side information for decod-ing. Therefore, we focus on the FMDT approach in this study.

2-D DWT and Laplacian pyramid (LP)[21]are two frequently used multiresolution transforms.Fig. 1(a) shows the filter bank structure of a 2-D DWT. After transform, it outputs four subband signals – HL (the horizontal high-pass and vertical low-pass sub-band signal), LH (the horizontal low-pass and vertical high-pass subband signal), HH (the horizontal pass and vertical high-pass subband signal), and LL (the horizontal low-high-pass and vertical low-pass subband signal). G1(z)G4(z) are the filters with specific pass bands and their output frequency partitions are given in

Fig. 1(b). D2represents the decimation matrix, and D2= 2I2, where I2is an identity matrix. 2-D DWT is a critical-sampling transform that keeps the same amount of data after one transform.

In contrast,Fig. 1(c) shows a typical LP wavelet system that does not satisfy the critical-sampling condition. It decomposes the input into one low-pass subband signal, LL, and one high-pass subband signal, H. F1(z) is the corresponding synthesis ﬁlter for the analysis ﬁlter G1(z).Fig. 1(d) shows the frequency partition of these two subbands. When the synthesized subband signal LL is sub-tracted from the original input, it produces the high-pass subband signal H. Without down-sampling, subband H is free from fre-quency scrambling. In this case, the LP system behaves as an over-sampling transform and it increases 25% data size after the transform.

The subband signal LL in LP (Fig. 1(c)) is identical to the subband signal LL in 2-D DWT when their G1(z) and D2are the same. That is, these two LL signals occupy the same frequency partition as in

Fig. 1(b) andFig. 1(d), respectively. In a multi-level 2-D DWT, the subband signal LL produced by the ﬁrst 2-D DWT is further pro-cessed by the sub-sequent 2-D DWT’s. Likewise, in a multi-level LP, the LL subband signal may be further processed by sub-sequent LP.

In the several variations of FMDT, contourlet transform (CT)

[3][11]and wavelet-based contourlet transform (WBCT)[17]adopt the directional ﬁlter banks (DFB) structure [22] as shown in

Fig. 2(a), where four 2-D filters and four decimation matrices are illustrated. These four 2-D filters decompose the input signal to four directional subbands. Each subband has a specific directional pass band. These 2-D filters, A1(z)A4(z), are fan filters and their corresponding output frequency partitions are drawn inFig. 2(b). The decimation matrices rotate and down-sample the signals along specific directions. The first stage of CT uses LP to produce the sub-band signals, LL and H. It further uses the DFB inFig. 2(a) to decom-pose the subband signal H into four directional subband signals;

Fig. 2(c) shows the frequency partition of a typical CT. Because LP increases the data size, it is less preferred in the compression scenario. Therefore, another structure, WBCT, was proposed. It uses 2-D DWT to ﬁrst generate four subbands, LL, HL, LH and HH. It fur-ther decomposes each of the three higher subband signals, HL, LH, and HH, by the DFB inFig. 2(a).Fig. 2(d) shows the frequency par-tition produced by WBCT, wherein the twelve directional subbands are labeled from 1 to 12. WBCT has the critical-sampling property and it maintains the same data size. Thus, it is more desirable for compression purpose. Therefore, in this study we focus on the WBCT structure.

2.2. Literature review: arithmetic coding

After transform and quantization, we use arithmetic coding to compress the produced coefﬁcients. Arithmetic coding algorithms

[9,10,18,19,26–28]provide rather good compression efﬁciency by considering three types of correlations among the coefﬁcients.

↓D2 ↑D2 G1(z) F1(z) LL (to next LP) H -+ G1(z) G2(z) G4(z) G3(z) ↓D2 ↓D2 ↓D2 ↓D2 LL (to next 2-D DWT) LH HL HH

(c)

(a)

π /2 π 0 -π /2 -π Fy -π -π /2 0 π /2 π Fx

(b)

LL LH HL HL LH HH HH HH HH π /2 π 0 -π /2 -π Fy -π -π /2 0 π /2 π Fx

(d)

LL H

(3)

First, the inter-subband coding methods, such as the set partition-ing in hierarchical tree (SPIHT) method[18]and the embedded zero-trees of wavelet transform (EZW) method[19], mitigate the inter-band correlations in a tree structure. Second, the intra-subinter-band coding methods partition the coefficients in one subband to several non-overlapped coding blocks and handle only the correlations among the neighbors in one coding block (the intra-subband corre-lations). Examples in this category are the embedded block coding with optimized truncation (EBCOT) method[9], the 3-D embedded subband coding with optimized truncation (ESCOT) method [26], and the tarp-filter-based system that classifies coefficients to achieve embedding (TCE) method[10]. Third, the mixed inter-subband and intra-subband coding methods cover both the inter-subband and intra-subband correlations. Examples are the embedded conditional entropy coding of wavelet coefficients (ECECOW) method[27] and the embedded coding using zeroblocks of wavelet coefficients and con-text modeling (EZBC) method[28]. To save computing power, for single image compression, we use the intra-subband coding meth-ods in this study.

Among the intra-subband coding methods, EBCOT is popular for image coding and is adopted by JPEG2000; ESCOT is popular for vi-deo coding and was adopted by the wavelet vivi-deo coding reference software in the MPEG scalable coding standard development. EB-COT’s context model includes eight neighbors (3 3 square) of a to-be-processed coefﬁcient. ESCOT extends this context model by considering all spatial-temporal neighbors in a 3 3 3 cube. We adopt ESCOT to ease our future development in video coding. 2.3. Problem statement

We explore three issues of the current WBCT image coding scheme in this study: filter replacement, subband skipping, and context modification. The first target is to reduce the computation of filtering. In a typical WBCT scheme, the input image is first pro-cessed by a 2-D DWT and four subbands are generated: LL1_{, HL}1_, LH1_{, and HH}1_{. Then, we filter the LL}1_{subband signal again by 2-D} DWT to obtain LL2, HL2, LH2, and HH2. Likewise, we recursively ap-ply 2-D DWT to the LLi_{subband, and produce LL}i+1_{, HL}i+1_{, LH}i+1_{, and} HHi+1_{, wherein ‘i’ represents the 2-D DWT iterations. Also, we can} use DFB inFig. 2(a) to decompose HLi_{, LH}i_{, and HH}i_{at the chosen} levels.

LL1_{and its split subband signals (LL}i_{, HL}i_{, LH}i_{, and HH}i_{, where} i > 1) contain the low and mid frequency components in the sensi-tive range of human visual system. When we apply the DFB in

Fig. 2(a) on these subbands and quantize their transform coefﬁ-cients, the ringing effects may appear on the smooth image re-gions. Thus, we tend to represent these coarse subband signals by 2-D DWT bases[16]. On the other hand, we apply the

direc-tional transform to HL1_{, LH}1_{, and HH}1_{to match their directional} textures. But some of these subbands may be inappropriate for directional transform.

InFig. 2(a), fan filters are constructed by using the hourglass fil-ters. Typical hourglass 2-D filters are in the form of 23 23 and 45 45 matrices [23]. When we convolute them to generate A1(z)A4(z) in Fig. 2(a), the sizes of the fan filters are 45 45, 67 67, or 89 89. Because the fan filter sizes are large, the filter-ing process costs a huge amount of computation and may produce many small coefficients. Therefore, it is desirable to use shorter filters.

2-D DWT and ESCOT match quite well. The transformed/quan-tized coefficients typically have stronger energy along the horizon-tal and the vertical directions because these are the major directions of filers. This horizontal and the vertical direction prop-erties are considered in the ESCOT design and thus the 2-D DWT+ESCOT scheme provides good compression performance. However, the major direction of transform coefficients produced by WBCT is decided by the directional filter of that subband. Apply-ing the original ESCOT context tables to code these coefficients is inappropriate. Therefore, we need to construct new ESCOT context tables for WBCT subbands.

3. Short-length 2-D ﬁlters

To reduce computational load of the current WBCT, we design new short-length 2-D filters (SLF). The design procedure is as fol-lows. We first choose an appropriate 1-D filter, up-sample it, and map it to a 2-D filter.

We begin our design from a 1-D type-II linear phase finite im-pulse response filter[23,25]. Eq.(1)is a 1-D prototype filter b(z), wherein the coefficients {

v

k} satisfy (2) so that b(ej0) = 1. When N1= 1 (short ﬁlter), b(z) has a wide transition band. To keep a good balance between the transition band width and the ﬁlter length, we select N1= 2, and thus,

v

1= 0.5916 and

v

2= 0.0982.Fig. 3(a) and (b) show the magnitude and the phase responses of b(z).

We up-sample b(z) by 2 and get b(z2).Fig. 3(c) and (d) show the magnitude and the phase responses of b(z2_{). In}_{Fig. 3}_{(d), b(z}2₎ con-tains a phase discontinuity of

p

at frequency 0.5

p

. Because of this phase discontinuity, the left-side and the right-side amplitudes in

Fig. 3(c) have different signs.

bðzÞ ¼X N1 k¼1

v

k ðzN1þkþ zN1kþ1Þ ð1Þ XN1 k¼1

v

k¼ 0:5 ð2Þ A1(z) A2(z) A4(z) A3(z) ↓D2 ↓D2 ↓D2 ↓D2

(a)

π /2 π 0 -π /2 -π Fy -π _{-π /2} ₀ _{π /2} _π Fx

(b)

DS1 DS2 DS3 DS4 DS1 DS1 DS2 DS2 DS3 DS3 DS4 DS4 π /2 π 0 -π /2 -π Fy -π /2 0 π /2 π Fx

(d)

LL π /2 π 0 -π /2 -π Fy -π _{-π /2} ₀ _{π /2} _π Fx

(c)

LL 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12

Fig. 2. (a) A four directional DFB structure. (b) Frequency partitions produced by the DFB in (a). (c) Frequency partition produced by CT. (d) Frequency partition produced by WBCT.

(4)

We then map b(z2) to a 2-D filter[24]. From b(z2), we derive the quadrant filters and rotate them by 45 degrees to construct the hourglass filters[12]. InFig. 4, the symbol zhdenotes the horizontal frequency, and zvdenotes the vertical one. InFig. 4(a), we shift b(z2_{) by 0.5}

_p

_{along the negative frequency axis and the shifted} b(z2_{) in horizontal direction is denoted by}

_a

_(z

h2). Similarly, the shifted b(z2_{) in vertical direction is denoted by}

_a

_(z

v2) inFig. 4(b). In Fig. 4 (c), we multiply

a

(zh2) and

a

(zv2) together to obtain a quadrant ﬁlter

a

(zh,zv). Accordingly, the four acquired quadrant fil-ters are defined by(3)–(6) [12]. We rotate these quadrant filters by

(7)to obtain the hourglass ﬁlters. In(7), an hourglass ﬁlter A’(

x

) is obtained from a quadrant ﬁlter A(

x

)[3], wherein Q0and Q1are the quincunx sampling matrices speciﬁed by(8).

H0ðzh;zvÞ ¼ ð1 þ

a

ðzh;zvÞÞ= ffiffiffi2 p ð3Þ H1ðzh;zvÞ ¼ z0ð ffiffiffi 2 p ðpffiffiffi2H0ðzh;zvÞ 1ÞH0ðzh;zvÞÞ ð4Þ F0ðzh;zvÞ ¼ zh1H1ðzh;zvÞ ð5Þ F1ðzh;zvÞ ¼ zh1H0ðzh;zvÞ ð6Þ A0ð

x

Þ ¼ AðQT 0

x

Þ ¼ Að 1 2Q T 1

x

Þ ¼ Að 1 2Q0

x

Þ ð7Þ Q0¼ 1 1 1 1 ;Q1¼ 1 1 1 1 ð8Þ Fig. 5shows a cascaded DFB structure[3]. The left half, H’0(zh, zv) and H’1(zh, zv), are the analysis filters, and the right half, F’o(zh, zv) and F’1(zh, zv), are the corresponding synthesis filters. The signals DS1DS4are identical to those inFig. 2(a) and their frequency par-titions are inFig. 2(b). This two-level analysis DFB structure con-sists of hourglass filters and quincunx sampling matrices. We rotate the quadrant filter H0(zh, zv) in (3) to obtain the hourglass fil-ter H’0(zh, zv). H’1(zh, zv), F’o(zh, zv) and F’1(zh, zv) are designed similarly.

The sizes of our proposed 2-D hourglass short-length ﬁlters (SLF) are 7 7 and 13 13. They are much smaller than the sizes (23 23 and 45 45) of their corresponding long-length ﬁlters (LLF)[23].Fig. 6shows the magnitude responses of SLF and LLF. Although the transition band of SLF seems wider than that of the LLF, SLF matches the image local variation well due to its small size.

Table 1shows the impacts of SLF and LLF on the DFB computa-tional complexities. We compare two DFB implementations, direct structure and ladder structure, on the non-zero SLF/LLF coeffi-cients. S is the size of input image. The numbers of multiplications and additions are proportional to S. The runtime is measured by running Matlab r2008b on a PC with Intel Core 2 Quad Q9400 CPU. The numbers of multiplications and additions include both convolution and down-sampling operations. When the sizes of the hourglass filters are 23 23, 45 45, 7 7 and 13 13, the numbers of nonzero coefficients are 145, 649, 17 and 65, respec-tively. For both the direct and the ladder structures, the SLF-based

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -1.5 -1 -0.5 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 magnitude magnitude phase ( π radian) phase ( π radian) frequency (_{π radian)}

frequency (π radian) frequency (π radian)

frequency (_{π radian)}

(b)

(a)

(d)

(c)

Fig. 3. (a) Magnitude response of b(z). (b) Phase response of b(z). (c) Magnitude response of b(z2

). (d) Phase response of b(z2 ). (-π,-π) 1 1 1 1 -1 -1 -1 -1 (a)α(zh2) (b)α(zv2) (c)α(zh, zv) (π,π) (π,π) (π,π) (-π,-π) (-π,-π) zh zh zh zv zv zv 1 -1 1 -1

(5)

DFB takes approximately only 10% multiplications and additions of those of the LLF-based DFB. In the runtime proﬁle, the SLF-based DFB saves roughly 80% computation time in both structures. The performance gap between our theoretical estimates (multiplica-tions and addi(multiplica-tions) and experimental measurements (runtime) are largely due to data transfer (disk access).

4. Mean-shift-based decision on subband selection

In the WBCT image coding scheme, we apply the directional transform to the LH1_{, HL}1_{, and HH}1_{subbands. Yet, only the} sub-band signal with significant energy in that direction would benefit from the directional transform. We thus try to locate the subbands with this property. Essentially, we identify the energy peaks and find their locations.

Mean shift technique is adopted to locate the energy peaks in the frequency spectrum. Mean shift is an iterative, nonparametric

estimator of the peak location[32][33]; it finds a path to local max-imum[34]. Let {xi}i=1. . .nbe an arbitrary n-point data set in the d-dimensional Euclidean space Rd_{. First, we calculate the mean shift} vector m(x) by(9), wherein x is the center of current window, h is the window radius, and K(x) is the flat kernel defined by(10). Then, we update the center by setting m(x) + x as the center of the next window. We repeat this process until m(x) converges to 0.

mðxÞ ¼ Pn i¼1xiKðxx_hiÞ Pn i¼1KðxxhiÞ x ð9Þ KðxÞ 1; if kxk 6 1 0; if kxk > 1 ð10Þ Fig. 7shows our proposed mean-shift-based decision process for selecting the subbands. To illustrate the decision ﬂow, we use a 512 512 pixel, 256 gray-level image as the input.

↓Q1 ↓Q1 ↓Q1 ↓Q1 DS1 DS2 DS3 DS4 ↓Q0 ↓Q0 ↑Q1 ↑Q1 ↑Q1 ↑Q1 ↑Q0 ↑Q0 H’0(zh, zv) H’0(zh, zv) H’0(zh, zv) H’1(zh, zv) H’1(zh, zv) H’1(zh, zv) F’1(zh, zv) F’1(zh, zv) F’1(zh, zv) F’0(zh, zv) F’0(zh, zv) F’0(zh, zv)

Fig. 5. A four-channel cascaded DFB.

Fig. 6. (a) LLF, whose size = 23 23[29]. (b) SLF, whose size = 7 7.

Table 1

The computational complexity and run time measured on the non-zero ﬁlter coefﬁcients.

LLF SLF

Direct structure Ladder structure Direct structure Ladder structure Number of Multiplications 4S(145 + 649 + 2) = 3124S 4S(144 + 2) = 584S 4S(17 + 65 + 2) = 336S 4S(16 + 2) = 68S Number of Additions 4S(145 + 649 + 2) = 3124S 4S(144 + 2) = 584S 4S(17 + 65 + 2) = 336S 4S(16 + 2) = 68S

S = 512 512 43.656 s 14.938 s 9.078 s 3.953 s

S = 256 256 10.906 s 3.813 s 1.797 s 0.854 s

(6)

4.1. Energy spectrum smoothing

We calculate the input image frequency spectrum by the 2-D discrete Fourier transform (2-D DFT). The frequency spectrum com-prises 512 512 discrete frequency components (DFC). The DFC is generally a complex number with the form in(11)and their energy levels are in form of(12). Herein, (x, y) represents the coordinate pair of a DFC, 1 6 x 6 512, and 1 6 y 6 512.

mðx; yÞ ¼ aðx; yÞ þ bðx; yÞi ð11Þ

cðx; yÞ ¼ ðaðx; yÞÞ2

þ ðbðx; yÞÞ2 ð12Þ

InFig. 8, we copy the left-most column to the right-most col-umn border and copy the up-most row to the bottom-most row border in order to get a symmetric energy spectrum. The zero fre-quency DFC is at (257, 257).

Fig. 9(a) shows the energy spectrum c(x, y) of the input image Pepper, wherein the energy levels are in log10scale, i.e., log10 (c(-x, y)). It contains many small peaks. These small peaks may cause misjudgment on cluster identiﬁcation. Therefore, we use a smooth-ing operator (deﬁned inFig. 9(c)) to reduce small peaks[31].Fig. 9

(b) shows the smoothed energy spectrum. The large energy peaks typically stand out after smoothing.

4.2. Choosing the representative energy level based on the low frequency components

Fig. 9(b) shows natural images contain strong low frequency components. We choose it as the basis for calculating the thresh-old value for identifying energy peaks. Fig. 10(a) shows the subband signals generated by WBCT and Fig. 10(b) shows the DFC coordinates in the upper half subband LH 4-0. The gray area is called the low frequency zone, and the white area is the high frequency zone. Because the upper half subband is symmetric to its lower half, we only look at the DFC in the upper half of LH1_. The upper half of LH1 _{is the region bounded by 129 6 x 6 385} and 1 6 y 6 129. Along each column x of LH1, we calculate the mean

l

(x) and the variance

r

(x) of the DFC by (13) and (14). We ﬁnd that the DFC magnitudes in the center three columns (256 6 x 6 258) usually have large means and small variances. Similar property holds for HL1_{. Therefore, we set the width of} low frequency zone in LH1 _{and HL}1 _{to 3 when the input image} size is 512 512.

l

ðxÞ ¼X 129 y¼1 log10cðx; yÞ 129 ;129 6 x 6 385 ð13Þ

r

ðxÞ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi X129 y¼1 ðlog10cðx;yÞÞ 2 129 X129 y¼1 log10cðx;yÞ 129 !2 v u u t ;1296x6385 ð14Þ

To detect the peaks, we calculate the representative energy lev-els of the low frequency components. Eq.(15)computes the DFC mean of the LH1_{low frequency zone, and}₍₁₆₎_{computes that of}

the HL1 _{low frequency zone. With these DFC means, we define} the representative energy level LH_L for LH1 by(17), and HL_L for HL1_by₍₁₈₎_{. Essentially, we like to select a threshold that} iden-tifies the peaks with ‘‘significant’’ energy. In(17), when the average energy level of low frequency components in HL subband is at least four times larger than that in the LH subband, we use the former as the threshold; otherwise, the latter. The parameter ‘‘log10(4)’’ de-notes the case that the large energy is at least 4 times of small ones. Correspondingly, the absolute magnitude of the large energy is at least twice of that of the small energy because the energy is the square of the absolute value. In this case, the difference in bit plan coding is significant. LH

l

¼ P129 y¼1 P258 x¼256log10cðx; yÞ 3 129 ð15Þ HL

l

¼ P258 y¼256 P129 x¼1log10ð4Þcðx; yÞ 3 129 ð16Þ

ifððHL

l

LH

l

Þ < log10ð4ÞÞLH L ¼ LH

l

; else LH L ¼ HL

l

ð17Þ

ifððLH

l

HL

l

Þ < log10ð4ÞÞHL L ¼ HL

l

; else HL L ¼ LH

l

ð18Þ

4.3. Deciding thresholds for directional subbands

A directional subband sometimes contains stronger energy level than the low frequency components. We consider this situation and adjust threshold in this step. We try to determine a peak detec-tion threshold for every WBCT subband. Take the subband LH 4-0 as an example. We only look at the upper half of LH 4-0 because the DFCs in the upper half of LH 4-0 are symmetric to those in the lower half of LH 4-0. In the upper half of LH 4-0, we ﬁrst con-sider only the DFC outside the low frequency zone. We calculate the mean LH_4-0_

l

and the variance LH_4-0_

r

outside the low frequency zone in LH 4-0, i.e., the c(x, y) of white area inFig. 10

(b), and construct a Gaussian distribution using the calculated mean and variance. InFig. 11, each Gaussian distribution approxi-mates its corresponding energy histogram well. Thus, the peak detection threshold for LH 4-0 is set by(19). The parameter b in

B. Choosing the Representative Energy Level based on Low Frequency Components

D. Peak Identification using a Mean-Shift-based Procedure A. Energy Spectrum Smoothing

C. Deciding Thresholds for Directional Subbands

Fig. 7. The ﬂowchart of the proposed mean-shift-based decision algorithm.

1 1 1 2 1 3 2 1 2 2 2 3 3 1 3 2 3 3 510 1 510 2 510 3 511 1 511 2 511 3 512 1 512 2 512 3 513 1 513 2 513 3 1 510 1 511 1 512 2 510 2 511 2 512 3 510 3 511 3 512 1 513 2 513 3 513 510 510 511 510 512 510 513 510 510 511 511 511 512 511 513 511 510 512 511 512 512 512 513 512 510 513 511 513 512 513 513 513 horizontal coordinate x 1 2 3 510 511 512 513 vertical coordinate y 1 2 3 510 511 512 513 x y = c(x, y) c (513, y ) = c (1, y), 1≤ y ≤512 c(x, 513) = c(x, 1), 1≤ x ≤513 (a)

Fig. 8. The coordinates of energy coefﬁcients c(x, y). The padded data are in gray background.

(7)

(19)is chosen to be 0.7 because we like to eliminate the 75% DFC candidates. Together with the representative energy level LH_L de-ﬁned earlier, 25% or fewer DFC candidates may be identiﬁed as en-ergy peaks. We repeat similar procedures on LH 4-1LH 4-3, and HL 4-0HL 4-3.

Generally, the transmission priority of HH1_{is lower than the} other subbands due to its lower information contents. Because of its low energy, we use the thresholds of its neighboring subbands to identify the energy peaks in HH1_{. For example, we set the} threshold HH_4-0_T of HH 4-0 by (20) using the parameters of HL 4-1.

LH 4 0 T ¼ MaxðLH L; LH 4 0

l

þ b LH 4 0

r

Þ ð19Þ

HH 4 0 T ¼ MaxðHL L; HL 4 1

l

þ b HL 4 1

r

Þ ð20Þ

4.4. Peak identiﬁcation using a mean-shift-based procedure

We like to identify a directional subband that has significant en-ergy by examining the discrete frequency components (DFC) of an image. This typically is caused by periodic texture patterns. And its corresponding DFC pattern is a cluster of DFCs with high energy. Thus, an energy peak in this paper is defined as a cluster of coeffi-cients (c(x, y) in a neighborhood) whose energy level is larger than the threshold. It has two properties: the energy level is high and these high-energy DFC coefficients are clustered in a small neigh-borhood. We use an image cluster identification scheme, Mean-Shift technique, to allocate them.

1) When a c(x, y) within a directional subband and outside the low frequency zone is greater than the threshold of that subband, its location (x, y) is set to be the center of a search window. We then calculate its mass center coordinates (xmass, ymass) by(21). The win-dow size is chosen to be 11 11, or, roughly, its radius r = 5, be-cause a small radius often leads to too many small peaks and a large radius sometimes misses peaks. In the search procedure, we extend the coefﬁcients outside the subband boundary by peri-odic extension. ðxmass;ymassÞ ¼ Pxþ5 m¼x5 Pyþ5 n¼y5m cðm;nÞ Pxþ5 m¼x5 Pyþ5 n¼y5cðm;nÞ ; Pxþ5 m¼x5 Pyþ5 n¼y5n cðm;nÞ Pxþ5 m¼x5 Pyþ5 n¼y5cðm;nÞ ! ð21Þ

We round xmassand ymassto the nearest integers and set the rounded (xmass, ymass) as the center of next search window. Then, we use(21)again to update the mass center. We repeat this proce-dure until the rounded (xmass, ymass) converges. Thus, a peak candi-date is identiﬁed.

2) The number of the peak candidates is recorded by a table d(x, y). The initial values of all entries of d(x, y) are 0. When we identify a DFC at (x, y) as an energy peak candidate, we increase d(x, y) by 1. When the table value of a speciﬁc location (x, y) is greater than 10 and it is also the largest d(x, y) within a 3 3 win-dow, the DFC located at (x, y) is judged as an energy peak. When one subband contains one or more energy peaks in the high frequency zone, it is considered to be suitable for directional decomposition.

Fig. 9. (a) Energy spectrum of image Pepper. (b) Smoothed energy spectrum of image Pepper. (c) Smoothing operator.

(a)

(b)

LL LH 4-0 LH 4-0 LH 4-1 LH 4-1 LH 4-2 LH 4-3 LH 4-2 LH4-3 HH 4-0 HH 4-0 HH 4-2 HH 4-2 HH 4-1 HH 4-1 HH 4-3 HH 4-3 HL 4-1 HL 4-1 HL 4-3 HL 4-3 HL 4-2 HL 4-2 HL 4-0 HL 4-0 π /2 1 129 257 385 _513π 0 -π /2 -π -π -π /2 0 π /2 π 1 129 257 385 513 horizontal coordinate x horizontal frequency Fx vertical frequency Fy vertical coordinate y 257 1 258 129 259 128 259 129 384 129 383 127 383 129 385 129 383 128 384 128 257 127 258 127 259 127 257 128 258 128 257 129 257 258 259 383 384 385 1 2 3 127 128 129 vertical coordinate y horizontal coordinate x 4 2 4 3 260 4 257 126 258 126 259 126 260 126 260 127 260 128 260 129 382 127 382 126 382 128 382 129 4 126 260 382 257 2 257 3 257 4 258 2 258 3 258 4 259 4 259 3 x y = c(x, y)

Fig. 10. (a) The subband frequency domain partition produced by WBCT. (b) The DFC coordinates in the upper half subband LH 4-0. The gray area in (a) and (b) is the low frequency zone.

(8)

Table 2shows some representative test images and their band-decomposition decision results for each subband. All images are images of 256 gray levels, and their sizes are 512 512 pixels. For each subband, the ‘‘(x, y)’’ column denotes the max energy peak location, and the ‘‘suitable for DT’’ column denotes the decision re-sult. AsTable 2shows, the directional transform is inadequate for all subbands of the test image Lena; some subbands of Barbara, Fin-gerprint, Pepper, Boat, and Couple are suitable for directional trans-form, and all subbands of Elaine benefit from the directional transform.Fig. 12shows the identified peaks by red dots. We fail to identify some peaks for two reasons. First, some peaks contain energy lower than the threshold. Second, when a peak is near the low frequency zone, clusters identified by the mean mean-shift scheme are occasionally in the low frequency zone.Fig. 19shows a portion of test images Barbara and Elaine. They contain periodic signals. Identifying these signals in the spatial domain is hard. These periodic signals are corresponding to energy peaks in the quency domain and thus we perform peak identification in fre-quency domain.

4.5. Computational Complexity

We now look at the computational complexity issue of our deci-sion algorithm. We examine the amount of multiplications and additions for the steps inFig. 7. We assume that the input image size is S = W H. Herein, W is the width of the input image and H is the height. We also assume that W and H are all power of 2 and we can implement the 2-D DFT in the radix-2 fast Fourier transform (FFT) structure.

1) In Step A ofFig. 7, we apply 2-D DFT to the input image, ob-tain its energy spectrum, and then apply a smoothing ﬁlter to the spectrum. The 2-D DFT is implemented by the radix-2 FFT, and thus the required numbers of real-value additions and multiplica-tions are given by(22) and (23), in which ceil(x) means the small-est integer greater than or equal to x. Next, Eq.(12)needs 1 real addition and 2 real multiplications to calculate the energy of a DFC. For the entire image, the required numbers of real additions and real multiplications are in(24) and (25). The smoothing oper-ator inFig. 9(c) requires 8 real additions and 1 real multiplication

Fig. 11. The DFC energy histograms of some directional subbands. Each histogram is approximated by a Gaussian distribution. The directional subbands and the corresponding images are (a) LH 4-0 of Boat, (b) HL 4-3 of Lena, (c) LH 4-3 of Pepper, and (d) HL 4-0 of Fingerprint.

Table 2

Some test images, their max energy peak location in each subband ((x, y)) and the decision result for each subband (suitable for DT).

Wavelet subband LH1 _HL1 _HH1

Test image (x, y) Suitable for DT (x, y) Suitable for DT (x, y) Suitable for DT

Barbara (213, 126) N (130, 366) Y (92, 384) N Fingerprint (257, 129) N (128, 200) Y N Lena (154, 121) N (122, 259) N (129, 390) N Pepper N N (32, 24) Y Boat N (118, 259) Y N Couple (236, 96) N (109, 259) Y N Elaine (177, 123) Y (107, 212) Y (83, 32) Y

(9)

for each c(x, y). Thus, the total numbers of real additions and mul-tiplications are given by(26) and (27). Finally, the overall numbers of real additions and real multiplications in Step A are(28) and (29).

Nreal addition in DFT¼ 2 Ncomplex multiplication in DFTþ 2 Ncomplex addition in DFT¼ 3 W H ðceilðlog2WÞ

þ ceilðlog2HÞÞ ð22Þ

Nreal multiplication in DFT¼ 4 Ncomplex multiplication in DFT¼ 2 W H

ðceilðlog2WÞ þ ceilðlog2HÞÞ ð23Þ

Nreal addition in calculating power¼ W H ð24Þ Nreal multiplication in calculating power¼ 2 W H ð25Þ Nreal addition in smoothing spectrum¼ 8 W H ð26Þ Nreal multiplication in smoothing spectrum¼ 2 W H ð27Þ Nreal addition in stepA¼ 3 W H ðceilðlog2WÞ þ ceilðlog2HÞÞ

þ W H þ 8 W H ð28Þ

Nreal multiplication in stepA¼ 2 W Hðceilðlog2WÞ

þ ceilðlog2HÞÞ 2 W H þ W H ð29Þ

2) Step B chooses the representative energy levels based on the low frequency zone. Eqs.(15) and (16)calculate the mean of the DFC energy in the low frequency zone. The heights of the low fre-quency zones in LH1_{and HL}1_{are (ceil(H/4) + 1) and (ceil(W/4) + 1),}

respectively. The width is Wlfz. Thus, the mean calculation (Step B) needs 2 divisions and Nreal addition in stepB real additions as shown in

(30). We choose Wlfz= 3 when S = 512 512.

Nreal addition in stepB¼ Wlfz ceil H=4ð ð Þ þ ceil W=4ð Þ þ 2Þ 2 ð30Þ

3) Step C decides the thresholds for directional subbands. The DFC number in each directional subband is WH/16, thus the DFC number in each half directional subband is WH/32. In addition to 2 real divisions, we need W H/32 real multiplications and (W H/16-2) real additions to calculate the mean and the variance of each half directional subband. LH1 _{and HL}1_together have 8 directional subbands in total. The numbers of real additions and real multiplications in step C are, therefore, given by(31) and (32).

Nreal addition in stepC¼ 8 ðW H=16 2Þ ¼ W H=2 16 ð31Þ Nreal multiplication in stepC¼ 8 W H=32 ¼ W H=4 ð32Þ

4) Step D identiﬁes the energy peaks. Eq.(21)needs 1 division, 242 multiplications and 480 additions. In total, the numbers of real additions and real multiplications in step D are in(33) and (34), wherein Nitis the iteration number. In our experiments, the mini-mal Nitis 11 (test image Baboon), the maximal Nitis 12487 (test im-age Barbara), and the averim-age Nitis 1697.

Nreal addition in stepD¼ Nit 480 ð33Þ

Nreal multiplication in stepD¼ Nit 242 ð34Þ

All in all,(35) and (36)give the total number of multiplications

and additions in the decision procedure. When

S = W H = 512 512, Wlfz= 3, Nin= 1697, the total number of real additions and real multiplications are 17,461,460 and 10,699,826.

Fig. 12. Energy spectrum of test images (a) Barbara, (b) Pepper, and (c) Elaine. Horizontal axis and vertical axis represent horizontal frequency and vertical frequency, respectively. The energy spectrums are all in log10scale. The red squares are the locations of the identiﬁed energy peaks.

Table 3

Computational complexity and run time for the systems with and without decision when LLF is adopted.

LLF without decision LLF with decision (fastest) LLF with decision (slowest) Ratio (fastes) (%) Ratio (slowest) (%)

Number of multiplications 114,819,072 10,699,826 125,518,898 9.32 109.32

Number of additions 114,819,072 17,461,460 132,280,521 15.21 115.21

Run time 11.613 s 1.385 s 13.012 s 11.93 112.05

Table 4

Computational complexity and run time for the systems with and without decision when SLF is adopted.

SLF without decision SLF with decision (fastest) SLF with decision (slowest) Ratio (fastest) (%) Ratio (slowest)(%)

Number of multiplications 13,369,344 10,699,826 24,069,170 80.03 180.03

Number of additions 13,369,344 17,461,460 30,830,804 130.61 230.61

(10)

Ntotal real addition¼ W H ð3 ceilðlog2WÞ þ 3

ceilðlog2HÞ þ 9 þ 1=2Þ 16 þ Wlfz

ðceilðH=4Þ þ ceilðW=4Þ þ 2Þ 2 þ Nit

480 ð35Þ

Ntotal real multiplication¼ W H ð2 ceilðlog2WÞ þ 2

ceilðlog2HÞ þ 3 þ 1=4Þ þ Nit 242 ð36Þ

Table 3andTable 4show the computational complexity and the run time of the entire system with and without decision, wherein the directional ﬁlters are LLF and SLF, respectively. With decision, the fastest case occurs when no directional transform is conducted on LH1_{, HL}1_{, and HH}1_{. And the slowest case occurs when we apply} the directional transform to all subbands. In Table 3, the image coding scheme with LLF and decision may save over 84% computa-tional load or 88% run time in the fastest case. In the slowest case, the decision process requires an additional 16% computational load

or 13% run time. InTable 4, the image coding scheme with SLF and decision saves about 48% run time in the fastest case and consumes 52% extra run time in the slowest case. On the average, the image coding schemes with decision require less run time.

5. New ZC context tables for ESCOT

Arithmetic coding methods encode the transformed/quantized coefﬁcients into a bit-stream. ESCOT is a bit-plane coding method and it uses its neighbors for the context model. Let the sequence xN = {xN, xN-1, . . ., x2, x1} represents one bit-plane of a coefﬁcient block. Because the bit-plane consists of binary symbols, i.e., xi

e

{0, 1}, the minimum code length of a binary sequence estimated based on the information theory is shown in(37), wherein P(xi|xi-1) is the condi-tional probability of xigiven xi1= {xi-1, xi-2, . . ., x2, x1}. Clearly, xi1 is the subset of xN_{. Assuming x}N_{is a Markov random sequence of} some ﬁnite order, we then can reduce the size of xi1 _{down to} xi1_{, which is a subsequence of x}i1_{. This x}i1_{is the context model} support [26][27]. Typically, xi1 _{includes the neighbors and the} (bit-plane) parents of xi. Ideally, the optimal context model gives the maximum mutual information[29].

LL

LH 4-0 LH 4-0 LH 4-1 LH 4-2 LH 4-3 LH 4-2 HH 4-0 HH 4-0 HH 4-2 HH 4-2 HH 4-1 HH 4-1 HH 4-3 HL 4-1 HL 4-1 HL 4-3 HL 4-3 HL 4-2 HL 4-2 HL 4-0 HL 4-0 π/2 π 0 -π/2 -π -π -π/2 0 π/2 π horizontal frequency Fx vertical frequency Fy LH 4-3 LH 4-1 HH 4-3

D2

A

H

V

D1

D2

(b)

(a)

Fig. 13. (a) The directional subbands produced by WBCT. (b) The spatial neighbor directions for coefﬁcient A.

100 200 300 400 500 50 100 150 200 250 300 350 400 450 500 0 50 100 150 200 250 20 40 60 80 100 120 140 20 40 60 80 100 120 140 0 0.05 0.1 0.15 0.2 0.25 0.3

(a)

(c)

Fx (π radian) Fy (π radian) Fx (π radian) Fy (π radian) Fx (π radian) Fy (π radian) 100 200 300 400 500 50 100 150 200 250 300 350 400 450 500 20 40 60 80 100 120 140 160 180 200

(e)

(b)

(d)

(f)

Fig. 14. (a) Input signal in spatial domain. (b) Input signal in frequency domain. (c) Filter response of DF_LH 4-0 in spatial domain. (d) Filter response of DF_LH 4-0 in frequency domain. (e) Output signal in spatial domain. (f) Output signal in frequency domain.

Table 5

ZC context table for 2-D wavelet subbands.

Wavelet subband LL LH HL HH Context H V D1+D2 V H D1+D2 H+V D1+D2 8 2 X X 2 X X X P3 7 1 P1 X 1 P1 X P1 2 6 1 0 P1 1 0 P1 0 2 5 1 0 0 1 0 0 P2 1 4 0 2 X 0 2 X 1 1 3 0 1 X 0 1 X 0 1 2 0 0 P2 0 0 P2 P2 0 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0

(11)

L ¼ log2

Yn

i¼1

Pðxijxi1Þ: ð37Þ

The original ESCOT considers only the 2-D DWT coefficients in the horizontal and the vertical directions. Yet, the coefficients in a certain directional subband may cluster along one specific direc-tion (different from the vertical or horizontal direcdirec-tions). The origi-nal context table fails to handle this case well. Therefore, we redesign the context models of ESCOT.

InFig. 13(a), the 13 subbands produced by WBCT are labeled as ‘‘LL’’, ‘‘HH 4-0’’, ‘‘LH 4-0’’, ‘‘HL 4-0’’, and likewise. InFig. 13(b), the edges passing through A can be H-A-H (0O_{), V-A-V (90}O_{), D1-A-D1} (45O_{), and D2-A-D2 (45}O_{). We denote the 0}O_{, 90}O_{, 45}O_{, 45}O directions as ‘‘H’’, ‘‘V’’, ‘‘D1’’, and ‘‘D2’’, respectively.

InFig. 14, we examine the effect of the directional ﬁlter LH 4-0 (DF_LH 4-0). A concentric-circle pattern, which has edges of all directions, is used as the input pattern.Fig. 14(a) and (b) show this

input signal and its frequency spectrum.Fig. 14(c) shows the spa-tial ﬁlter impulse response of DF_LH 4-0, which is roughly along

Table 6

ZC context table for directional subbands.

Directional subband LH 4-0 LH 4-1 HL 4-2 HL 4-3 LH 4-3 HL 4-1 LH 4-2 HL 4-0 HH 4-0 HH4-2 HH 4-1 HH 4-3 Context D2+H V D1 D1+H V D2 D2+V H D1 D1+V H D2 D1 H+V D2 D2 H+V D1 8 P2 X X P2 X X P2 X X P2 X X 2 X X 2 X X 7 1 P1 X 1 P1 X 1 P1 X 1 P1 X 1 P1 X 1 P1 X 6 1 0 P1 1 0 P1 1 0 P1 1 0 P1 1 0 P1 1 0 P1 5 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 4 0 2 X 0 2 X 0 2 X 0 2 X 0 P2 X 0 P2 X 3 0 1 X 0 1 X 0 1 X 0 1 X 0 1 X 0 1 X 2 0 0 2 0 0 2 0 0 2 0 0 2 0 0 2 0 0 2 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Fig. 15. Frequency magnitude responses of (a) LH 4-0 (b) LH 4-2 (c) HH 4-0 (d) HH 4-2.

Table 7

Abbreviations for the adopted tools in the image coding scheme. Directional

Transform

SLF Short length directional Filter. LLF Long length directional Filter. NDF No directional Filter. Decision

NDS1 No decision, applying directional transform on all subbands (LH1

, HL1

, and HH1

).

NDS2 No decision, directional transform not applied. WDS With decision, applying directional transform on the

chosen wavelet subbands. Entropy coder

O ESCOT with the original ZC context tables. P ESCOT with the proposed ZC context tables.

(12)

the H direction (slightly tilted to the D2 direction). Fig. 14 (d) shows the ﬁlter frequency magnitude response of DF_LH 4-0, whose energy clusters mainly along the vertical axis. In Fig. 14

(e), the ﬁltered output image contains mainly the spatial edges aligned with the H direction (slightly tilted to the D2 direction).

Fig. 14(f) shows the frequency spectrum of ﬁltered signals. Evi-dently, the dominated directions of the LH 4-0 outputs are H and D2. Hence, ‘‘H and D2’’ are the ﬁltered directions of LH 4-0.

Similarly, we identify the ﬁltered directions of the other direc-tional subbands. The ﬁltered directions of LH 4-1 are ‘‘H and D1’’,

those of HL 4-2 are ‘‘V and D2’’, and those of HL 4-3 are ‘‘V and D1’’. The ﬁltered directions of the four corner subbands (LH 4-2, HH 4-3, HH 4-1, and HL 4-0) are D2. And those of the other four corner subbands (LH 4-3, HH 4-2, HH 4-0, HL 4-1) are D1.

ESCOT uses three types of context models or context tables – the zero coding tables (ZC), the sign coding tables (SC) and the magni-tude refinement tables (MR). ESCOT codes bit-planes from the most significant bit-plane to the least significant bit-plane. ESCOT starts with ZC to code the beginning zeros until it hits the first non-zero bit. ESCOT uses ZC to code the magnitude of first non-non-zero bit

Fig. 16. PSNR of the image coding schemes with SLF and LLF (‘‘SLF+WDS+O’’ and ‘‘LLF+WDS+O’’).

Table 8

Run time of the image coding schemes with SLF and LLF.

Scheme SLF+WDS(HL1_{)+O (Barbara, Fingerprint, Boat, Couple, average)} _SLF+WDS(HH1_{)+O (Pepper)} _{SLF+WDS(LH1, HL}1_{, HH}1_{)+O (Elaine)}

Run time 4.547 s 4.550 s 8.023 s

Scheme LLF+WDS(HL1

)+O (Barbara, Fingerprint, Boat, Couple, average) LLF+WDS(HH1

)+O (Pepper) LLF+WDS(LH1, HL1

, HH1

)+O (Elaine)

Run time 23.031 s 23.026 s 62.484 s

(13)

and SC to code its sign. For the remaining bits, ESCOT uses MR to code their magnitudes. To match the characteristics of the WBCT coefficients, we alter the ZC context table in ESCOT. For the coeffi-cients in the ordinary 2-D wavelet subbands, we adopt the ZC con-text table (Table 5) in EBCOT[9]. But for the coefficients in the directional subbands, the proposedTable 6is the ZC context table. InTable 5andTable 6, each ‘‘context’’ denotes a model, and the numbers of non-zero coefficients are listed under the directions, H, V, and D1+D2, and X denotes ‘‘Do not care’’.Fig. 13(b) shows the neighbors and their notations we use in the entropy coding. The neighbors include vertical neighbors (V), horizontal neighbors

(H), left-lower and right-upper neighbors (D1), and left-upper and right-lower neighbors (D2). To code coefficient A in a wavelet subband of a bit-plane, we first calculate the number of non-zero coefficients in all directions. For 2-D wavelets, based on the sub-band location and the non-zero coefficient patterns, we decide which context inTable 5is to be used to code this bit of coefficient A. Similarly, we code the coefficients in the other directional sub-bands usingTable 6.

Fig. 15shows the frequency responses of the WBCT directional filters. We notice the aliasing phenomenon in WBCT. Because the directional filters are not ideal filters, their outputs contain aliasing components. Thus, the outputs of a certain filter populate not only along one direction but also along another direction (with less en-ergy). Consequently, the context model in arithmetic coding be-comes less accurate or its coding efficiency is reduced. We may reduce aliasing by adopting a sharper (and thus longer) filter but the computation time would then increase.

Fig. 18. MSSIM of the image coding schemes with and without decision (‘‘SLF+WDS+O’’, ‘‘SLF+NDS1+O’’, and ‘‘NDF+NDS2+O’’).

Table 9

Average Run Time of the Image Coding Schemes with and without Decision.

Scheme SLF+WDS+O SLF+NDS1+O NDF+NDS2+O

Run time 4.804 s 8.206 s 2.688 s

(14)

6. Experimental results

We have discussed the three proposed tools that enhance a WBCT image coding scheme in computation and/or complexity reduction. They are short length 2-D ﬁlters, a mean-shift-based decision, and new ZC context tables for ESCOT. In this section, we examine the impact of each tool towards the system mance. And, putting them together, we compare the overall perfor-mance between the 2-D DWT image coding scheme, the original WBCT image coding scheme, and the proposed WBCT image coding scheme with three new tools.

A few abbreviations are explained below. The original WBCT image coding scheme can apply directional filtering to either all subbands (NDS1) or no subband (NDS2). With our decision mech-anism (WDS), we adaptively choose the subbands for directional filtering. Moreover, the original WBCT scheme uses long length directional filters (LLF), and our proposed image coding scheme

uses short length directional filters (SLF) instead. The no direc-tional filtering (NDF) situation appears when either the WDS de-clares that no subband needs directional filtering or the NDS2 strategy is adopted. There are two options for ESCOT: the original context tables (O) designed for 2-D DWT coefficients or the pro-posed context tables (P) fine-tuned for the WBCT coefficients. Ta-ble 7summarizes all the aforementioned abbreviations.

The notation of an image coding scheme consists of three parts: the directional transform type, the decision, and the coder tables.

For example, the 2-D DWT image coding scheme is

‘‘NDF+NDS2+O’’, the original WBCT image coding scheme is ‘‘LLF+NDS1+O’’, and our proposed coding scheme with three tools is ‘‘SLF+WDS(HL1_{, HH}1_{)+P’’. Note that the subbands selected by} WDS are listed in the parenthesis after WDS, and thus ‘‘WDS(LH1, HL1_{, and HH}1_{)’’ is the same as ‘‘NDS1’’.}

Our test images are listed inTable 2. The experimental platform is Matlab r2008b on a PC with Intel Core 2 Quad Q9400 CPU. First,

Table 11

PSNR of the image coding schemes with the original and the new ZC context tables (directional ﬁlters = LLF).

Test image Coding Shceme 0.125 bpp 0.25 bpp 0.5 bpp 0.75 bpp 1.0 bpp

Barbara LLF+WDS+O 25.72 28.51 32.22 34.89 37.01 LLF+WDS+P 25.86 28.71 32.41 34.96 37.11 Fingerprint LLF+WDS+O 22.64 25.36 29.09 31.33 33.25 LLF+WDS+P 22.64 25.52 29.09 31.41 33.26 Pepper LLF+WDS+O 30.49 33.33 35.56 36.81 37.93 LLF+WDS+P 30.6 33.37 35.62 36.9 38.07 Elaine LLF+WDS+O 30.99 32.29 33.94 35.34 36.5 LLF+WDS+P 31.09 32.33 34 35.38 36.53 Boat LLF+WDS+O 28.81 32.28 36.13 38.6 40.46 LLF+WDS+P 28.8 32.39 36.22 38.67 40.58 Couple LLF+WDS+O 26.87 29.31 32.55 34.73 36.47 LLF+WDS+P 26.93 29.37 32.56 34.79 36.53 Table 12

Run time of the image coding schemes with different ZC context tables. Scheme SLF+WDS(HL1

)+O (Barbara, Fingerprint, Boat, Couple, average) SLF+WDS(HH1

)+O (Pepper) SLF+WDS(LH1 , HL1 , HH1 )+O (Elaine) Run time 4.547 s 4.550 s 8.023 s Scheme SLF+WDS(HL1

)+P (Barbara, Fingerprint, Boat, Couple, average) SLF+WDS(HH1

)+P (Pepper) SLF+WDS(LH1

, HL1

, HH1

)+P (Elaine)

Run time 4.203 s 4.177 s 7.813 s

Scheme LLF+WDS(HL1_{)+O (Barbara, Fingerprint, Boat, Couple, average)} _LLF+WDS(HH1_{)+O (Pepper)} _LLF+WDS(LH1_{, HL}1_{, HH}1_{)+O (Elaine)}

Run time 23.031 s 23.026 s 62.484 s

Scheme LLF+WDS(HL1

)+P (Barbara, Fingerprint, Boat, Couple, average) LLF+WDS(HH1

)+P (Pepper) LLF+WDS(LH1 , HL1 , HH1 )+P (Elaine) Run time 22.391 s 22.386 s 62.256 s Table 10

PSNR of the image coding schemes with the original and the new ZC context tables (directional ﬁlters = SLF).

Test image Coding Shceme 0.125 bpp 0.25 bpp 0.5 bpp 0.75 bpp 1.0 bpp

Barbara SLF+WDS(HL1 )+O 25.62 28.41 32.22 34.89 36.99 SLF+WDS(HL1 )+P 25.79 28.53 32.33 34.96 37.11 Fingerprint SLF+WDS(HL1 )+O 22.64 25.36 29.09 31.33 33.25 SLF+WDS(HL1 )+P 22.64 25.52 29.09 31.33 33.25 Pepper SLF+WDS(HH1_)+O _30.49 _33.34 _35.54 _36.85 _37.96 SLF+WDS(HH1 )+P 30.6 33.37 35.61 36.82 37.95 Elaine SLF+WDS(LH1 , HL1 , HH1 )+O 30.99 32.3 33.8 35.11 36.29 SLF+WDS(LH1 , HL1 , HH1 )+P 31.09 32.31 33.84 35.12 36.37 Boat SLF+WDS(HL1 )+O 28.88 32.32 36.17 38.68 40.52 SLF+WDS(HL1 )+P 28.9 32.42 36.26 38.78 40.58 Couple SLF+WDS(HL1_)+O _26.92 _29.33 _32.58 _34.81 _36.48 SLF+WDS(HL1 )+P 26.92 29.39 32.6 34.85 36.63

(15)

we show the impacts of ﬁlter length in terms of PSNR and run time by comparing ‘‘SLF+WDS+O’’ and ‘‘LLF+WDS+O’’. Fig. 16 shows their PSNR at various bitrates (bit per pixel, bpp). Obviously, the image coding scheme with SLF has similar PSNR performances as that with LLF.Table 8shows the run time of these two schemes and the image coding scheme with SLF consumes only 10–20% computational time of that with LLF.

Next, we present the impacts of decision algorithm in terms of

PSNR, MSSIM [35] and run time among ‘‘SLF+WDS+O’’,

‘‘SLF+NDS1+O’’, and ‘‘NDF+NDS2+O’’ (the 2-D DWT coding scheme). MSSIM represents mean of structural similarity. A higher MSSIM implies a better image subjective quality. Fig. 17 shows the PSNR of the image coding schemes with and without decision. The image coding scheme with decision (‘‘SLF+WDS+O’’) has simi-lar PSNR performance as those without decisions (‘‘SLF+NDS1+O’’ and ‘‘NDF+NDS2+O’’).Fig. 18shows the MSSIM of the image coding schemes with and without decision. Our proposed image coding scheme with decision (‘‘SLF+WDS+O’’) has similar MSSIM

perfor-mance as ‘‘SLF+NDS1+O’’ and has better MSSIM than

‘‘NDF+NDS2+O’’. The visual quality improvement is most obvious on some pictures such as Elaine.Fig. 19shows portions of the ori-ginal and the reconstructed images of Barbara and Elaine generated

by these three schemes. Noticeably, ‘‘SLF+WDS+O’’ and

‘‘SLF+NDS1+O’’ show more texture details than ‘‘NDF+NDS2+O’’.

Table 9shows the run time of these schemes. ‘‘SLF+WDS+O’’ saves about 50% computational time comparing to ‘‘SLF+NDS1+O’’ but it needs roughly 70% extra computational time comparing to ‘‘NDF+NDS2+O’’. In brief, the image coding scheme with decision, ‘‘SLF+WDS+O,’’ achieves a good balance between quality and speed. Next, we examine the effect of the new ESCOT context tables in terms of PSNR and run time.Table 10shows the PSNR of the image coding schemes with the original and the new ZC context tables when the directional ﬁlters are SLF. AndTable 11shows the PSNR when the directional ﬁlters are LLF. The image coding schemes with the new ZC context tables (‘‘SLF/LLF+WDS+P’’) have a slightly better PSNR performance than those with the original ZC context table (‘‘SLF/LLF+WDS+O’’) in all cases. Moreover,Table 12indicates that ‘‘SLF/LLF+WDS+P’’ consumes less computation time than its ‘‘SLF/LLF+WDS+O’’ counterpart in all cases. The context table of ‘‘O’’ considers 26 neighbors in a 3 3 3 cubic but that of ‘‘P’’ con-siders only 8 neighbors in a 3 3 square. Clearly, ‘‘P’’ uses fewer neighbors and consumes less computation. Thus, our proposed context tables can also speed up slightly the coding process.

At last, we compare the performance of the entire image coding scheme for three candidates: ‘‘LLF+NDS1+O’’ (the original WBCT image coding scheme), ‘‘NDF+NDS2+O’’ (the 2-D DWT image cod-ing scheme) and ‘‘SLF+WDS+P’’ (our proposed WBCT image codcod-ing scheme).Fig. 20shows the PSNR of these three coding schemes. Generally, our proposed ‘‘SLF+WDS+P’’ has better average PSNR than the ‘‘NDF+NDS2+O’’ and its average PSNR is comparable with that of ‘‘LLF+NDS1+O’’.Table 13 shows their run time. Our pro-posed scheme ‘‘SLF+WDS+P’’ saves more than 92% computing time than ‘‘LLF+NDS1+O’’ (the original WBCT image coding scheme). On the other hand, it costs 67% extra computing time than ‘‘NDF+NDS2+O’’ (the 2-D DWT image coding scheme). Clearly, our proposed scheme offers a good balance between computa-tional complexity and image visual quality.

7. Conclusions

The WBCT-based image coding approach is explored in this pa-per. We propose three components to enhance its performance. First, we design a short-length filter set (SLF) to speed up the filter-ing process. It provides similar codfilter-ing performance but requires only 10% of computational complexity of the original long-length filters (LLF). Second, we construct a mean-shift-based decision pro-cess to decide if a higher subband (HH1, HL1, or LH1) is appropriate for directional decomposition. Threshold values are carefully se-lected to identify the energy peaks in each candidate subband. Fi-nally, we design new zero-coding (ZC) context tables for ESCOT because the coefficients produced by directional decomposition have different statistical characteristics among near-by coeffi-cients. Compared with the conventional 2-D DWT coding scheme, our scheme provides better visual quality with a moderate addi-tional computaaddi-tional cost. Compared with the original WBCT cod-ing scheme, the proposed codcod-ing scheme provides comparable image quality (PSNR and MSSIM) but with significantly less com-puting time.

Table 13

Average run time of the 2-D DWT scheme (NDF+NDS2+O), the original WBCT scheme (LLF+NDS1+O), and the proposed scheme with three new tools (SLF+WDS+P).

Scheme SLF+WDS+P LLF+NDS1+O NDF+NDS2+O

Run time 4.499 s 62.469 s 2.688 s

(16)

Acknowledgements

The ﬁrst author would like to thank Dr. Jang-Jer Tsai for his valuable suggestions on improving the quality of this paper. This work was supported in parts by the NSC, Taiwan under Grant 98-2221-E-009-076, by the MOEA, Taiwan under Grant 99-EC-17-A-01-I1-0016, and by the Intelligent Information Communications Research Center, NCTU.

References

[1] D. Taubman, M.W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice, Kluwer, Norwell, MA, 2002.

[2] M. Vetterli, ‘‘Wavelets, approximation and compression’’, IEEE Signal Proc. Mag., pp. 59–73, Sep. 2001.

[3] M. N. Do, ‘‘Directional Multiresolution Image Representations’’, Ph.D. Dissertation, Swiss Fed. Inst. Technol., Lausanne, Switzerland, Nov. 2001. [4] D. Taubman, A. Zakhor, Orientation adaptive subband coding of images, IEEE

Trans. Image Process. 3 (4) (1994) 421–437.

[5] D. Wang, L. Zhang, A. Vincent, F. Speranza, Curved wavelet transform for image coding, IEEE Trans. Image Process. 15 (8) (2006) 2413–2421.

[6] V. Chappelier, C. Guillemot, Oriented wavelet transform for image compression and denoising, IEEE Trans. Image Process. 15 (10) (2006) 2892–2903. [7] W. Ding, F. Wu, X. Wu, S. Li, H. Li, Adaptive directional lifting-based wavelet

transform for image coding, IEEE Trans. Image Process. 16 (2) (2007) 416–427. [8] C.-L. Chang, B. Girod, Direction-adaptive discrete wavelet transform for image

compression, IEEE Trans. Image Process. 16 (5) (2007) 1289–1302.

[9] D. Taubman, High performance scalable image compression with EBCOT, IEEE Trans. Image Process. 9 (7) (2000) 1158–1170.

[10] C. Tian , S.S. Hemami, ‘‘An embedded image coding system based on tarp ﬁlter with classiﬁcation’’, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Process, Montreal, QC, Canada, May 2004, vol. 3, pp. 49–52.

[11] M.N. Do, M. Vetterli, The contourlet transform: an efﬁcient directional decomposition multiresolution image representation, IEEE Trans. Image Process. 14 (12) (2005) 2091–2106.

[12] Y. Lu, M.N. Do, ‘‘The ﬁner directional wavelet transform,’’ in: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia, PA, USA, March 2005.

[13] T.T. Nguyen, S. Oraintara, A class of multiresolution directional ﬁlter bank, IEEE Trans. Signal Process. 55 (3) (2007) 949–961.

[14] T.T. Nguyen, S. Oraintara, Multiresolution directional ﬁlterbanks: Theory, design, and applications, IEEE Trans. Signal Process 53 (10) (2005) 3895–3905. [15] I.W. Selesnick, R.G. Baraniuk, N.G. Kingsbury, The dual-tree complex wavelet

transform, IEEE Signal Process. Mag. 22 (2005) 123–151.

[16] R. Eslami, H. Radha, A new family of nonredundant transforms using hybrid wavelets and directional ﬁlter banks, IEEE Trans. Image Process. 16 (4) (2007) 1152–1167.

[17] R. Eslami, H. Radha, ‘‘Wavelet-based contourlet coding using an SPIHT-like algorithm,’’ in: Proceedings of Conference Information Sciences, Systems, Princeton, NJ, USA, March 2004, pp. 784–788.

[18] A. Said, W. Pearlman, A new, fast and efﬁcient image codec based on set partitioning in hierarchical trees, IEEE Trans. Circuits Syst. Video Technol. 6 (1996) 243–250.

[19] J.M. Shapiro, ‘‘An embedded hierarchical image coder using zerotrees of wavelet coefﬁcients’’, in: IEEE Data Compression Conference, Snowbird, UT, 1993, pp. 214–223.

[20] J. Yang, Y. Wang, W. Xu, Q. Dai, Image coding using dual-tree discrete wavelet transform, IEEE Trans. Image Process. 17 (9) (2008) 1555–1569.

[21] P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code, IEEE Trans. Commun. 31 (4) (1983) 532–540.

[22] R.H. Bamberger, M.J.T. Smith, A ﬁlter bank for the directional decomposition of images: Theory and design, IEEE Trans. Signal Process. 40 (4) (1992) 882–893. [23] S.-M. Phoong, C.W. Kim, P.P. Vaidyanathan, R. Ansari, A new class of two-channel biorthogonal ﬁlter banks and wavelet bases, IEEE Trans. Signal Process. 43 (3) (1995) 649–665.

[24] Y.P. Lin, P.P. Vaidyanathan, Theory and design of two-dimensional ﬁlter banks: a review, Multidimensional Syst. Signal Proc. 7 (1996) 263–330.

[25] A.V. Oppenheim, R.W. Schaffer, Discrete-Time Signal Processing, Englewood Cliffs, NJ, Prentice-Hall, 1989.

[26] J. Xu, Z. Xiong, S. Li, Y. Zhang, Three-dimensional embedded subband coding with optimized truncation (3D ESCOT), Appl. Comput. Harmonic Anal. 10 (2001) 290–315. Special issue on wavelet applications in engineering. [27] X. Wu, ‘‘High-order context modeling and embedded conditional entropy

coding of wavelet coefﬁcients for image compression’’, in: Proceedings of 31st Asilomar Conference Signals, Systems, Computers, November 1997, pp. 1378– 1382.

[28] S.-T. Hsiang, J.W. Woods, ‘‘Embedded image coding using zeroblocks of subband/wavelet coefﬁcients and context modeling’’, IEEE Int. Conf. Circuits Systems (ISCAS) 3 (2000) 662–665.

[29] Z. Liu, L. Karam, ‘‘Mutual information-based analysis of JPEG2000 contexts’’, IEEE Trans. Image Process. 14 (4) (2005) 411–422.

[30] C.-H. Hung, H.-M. Hang, ‘‘Image coding using short wavelet-based contourlet transform’’, in: IEEE International Conference on Image Proceedings (ICIP), San Diego, CA, USA, October 2008.

[31] C.-H. Hung, H.-M. Hang, ‘‘Decision-directed adaptive wavelet image coding with directional decomposition,’’ in: IEEE International Symposium on Circuits and Systems (ISCAS), Taipei, Taiwan, May 2009.

[32] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Mach. Intell. 17 (8) (1995) 790–799.

[33] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. 24 (5) (2002) 603–619. [34] D. Comaniciu, P. Meer, ‘‘Mean shift analysis and applications’’, in: Proceedings

of IEEE International Conference Computer Vision, 1999, pp. 1197–1203. [35] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment:

from error measurement to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.