Introduction - Image and Video Coding Based on Directional Wavelet Transforms

The discrete wavelet transform (DWT) [1]-[5] has been widely adopted in image and video coding in recent years. Wavelet-based image coding, such as JPEG2000 [6], consists of three stages: two-dimensional discrete wavelet transform (2-D DWT), coefficient quantization, and arithmetic coding. A digital image is first transformed by the 2-D DWT to produce a set of transform coefficients. After quantization, these coefficients are compressed into a binary stream by an entropy coding tool. For video signal compression, a wavelet-based interframe coding scheme, such as Vidwav [7], includes four stages: motion-compensated temporal filtering (MCTF) [8]-[13], 2-D DWT, quantization, and arithmetic coding. MCTF decomposes video frames into temporal low-pass images and high-pass residuals. Then, as in image coding, both the low-pass images and the residuals are processed by 2-D DWT, quantization, and arithmetic coding.
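The temporal decomposition described above can be illustrated with a minimal sketch. It assumes the simplest possible case: a Haar lifting step along time with zero motion, so each pixel is paired with the co-located pixel in the next frame. Real MCTF follows motion vectors and may use longer filters; the function names `temporal_haar` and `inverse_temporal_haar` are hypothetical and chosen only for this illustration.

```python
def temporal_haar(frame_a, frame_b):
    """One Haar lifting step along time for two equally sized frames.

    Assumption: zero motion, so pixel (y, x) in frame_a is paired with
    pixel (y, x) in frame_b. A real MCTF would follow motion vectors.
    """
    # Predict step: high-pass residual = second frame minus its prediction.
    high = [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
    # Update step: low-pass image = first frame plus half the residual,
    # which equals the average of the two frames.
    low = [[a + h / 2 for a, h in zip(row_a, row_h)]
           for row_a, row_h in zip(frame_a, high)]
    return low, high


def inverse_temporal_haar(low, high):
    """Undo temporal_haar by running the lifting steps in reverse order."""
    frame_a = [[l - h / 2 for l, h in zip(row_l, row_h)]
               for row_l, row_h in zip(low, high)]
    frame_b = [[a + h for a, h in zip(row_a, row_h)]
               for row_a, row_h in zip(frame_a, high)]
    return frame_a, frame_b


frame_a = [[10, 10, 10], [10, 10, 10]]
frame_b = [[10, 10, 10], [10, 14, 10]]  # only one pixel changes over time
low, high = temporal_haar(frame_a, frame_b)
print(high)  # nonzero only where the two frames differ
```

In this toy case the high-pass residual is zero everywhere except at the pixel that changed, which is why the residual frames are cheap to code when the temporal prediction is good. Both outputs would then go through the spatial 2-D DWT, quantization, and arithmetic coding stages.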

The one-dimensional discrete wavelet transform (1-D DWT) represents 1-D piecewise smooth signals well with few coefficients [5]. The 2-D DWT applies two 1-D DWTs along the horizontal and vertical axes and ignores the continuity of 2-D piecewise smooth signals. It therefore represents 2-D signals with many small coefficients and spreads the energy into the high-pass subbands [14][15]. Quantizing these coefficients to zero at low bit rates results in Gibbs artifacts at image edges [16].
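A small experiment can make this contrast concrete. The sketch below assumes an unnormalized one-level Haar transform as a stand-in for the wavelet filters (the filters used in practice are longer), and the helper names `haar_1d`, `haar_2d`, and `count_highpass_nonzeros` are hypothetical. It compares a 1-D step signal, a 2-D vertical edge, and a 2-D diagonal edge.

```python
def haar_1d(x):
    """One level of an unnormalized Haar DWT on an even-length sequence."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [x[i + 1] - x[i] for i in range(0, len(x), 2)]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def haar_2d(image):
    """One level of separable 2-D Haar: filter rows first, then columns."""
    row_low, row_high = [], []
    for row in image:
        lo, hi = haar_1d(row)
        row_low.append(lo)
        row_high.append(hi)

    def along_columns(m):
        lows, highs = [], []
        for col in transpose(m):
            lo, hi = haar_1d(col)
            lows.append(lo)
            highs.append(hi)
        return transpose(lows), transpose(highs)

    ll, lh = along_columns(row_low)
    hl, hh = along_columns(row_high)
    return ll, (lh, hl, hh)


def count_highpass_nonzeros(image):
    """Count nonzero coefficients in the three high-pass subbands."""
    _, highpass = haar_2d(image)
    return sum(1 for band in highpass for row in band for v in row if v != 0)


# 1-D piecewise constant signal: a single detail coefficient is nonzero.
signal = [4, 4, 4, 9, 9, 9, 9, 9]
_, detail = haar_1d(signal)
print(detail)

# 2-D edges of the same length, one axis-aligned and one diagonal.
n = 8
vertical_edge = [[1 if x >= 3 else 0 for x in range(n)] for y in range(n)]
diagonal_edge = [[1 if x > y else 0 for x in range(n)] for y in range(n)]
print(count_highpass_nonzeros(vertical_edge),
      count_highpass_nonzeros(diagonal_edge))
```

In this toy setting the vertical edge leaves 4 nonzero high-pass coefficients, all in one subband, while the diagonal edge leaves 12 spread over all three high-pass subbands. This is only an illustration of the effect, not a measurement on real images.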

Many 2-D multiresolution directional transforms have been proposed to solve this problem, including the directional filter banks [17]-[23] and the direction-adaptive wavelet transforms [27]-[32]. Directional filter banks use a set of pre-selected 2-D filters to perform multiresolution directional decomposition. Each filter corresponds to a basis function with specific spatial direction and resolution.

Directional filter banks can represent 2-D directional texture patterns by relatively few large coefficients. Do and Vetterli proposed the contourlet transform (CT) [17], which is composed of the Laplacian pyramid (LP) [24] and the directional filter bank (DFB) [25]. Lu and Do proposed the finer directional wavelet transform with additional 2-D directional resolution [18]. Nguyen and Oraintara re-designed DFBs and provided enhanced directional decomposition [19][20]. Selesnick et al. proposed the complex wavelet transform with good directionality and shift invariance [21]. Eslami and Radha proposed the wavelet-based contourlet transform (WBCT) and its extended version by applying DFBs to the high-pass subbands of the 2-D DWT [22][23]. Among them, the WBCT [22] technique is critically sampled, consumes comparatively little computational power, and requires no side information for decoding. Therefore, we focus on WBCT in this dissertation.

The direction-adaptive discrete wavelet transform (DA-DWT) technique partitions an image into local regions (blocks) and filters along the texture direction using the 1-D DWT lifting scheme [26]. It selects the optimal direction and block size by minimizing the prediction error under a bit constraint. Thus, DA-DWT compacts more energy into the spatial low-pass subbands and provides good compression performance [15]. Chang and Girod proposed a DA-DWT based image compression scheme with integer-pixel direction accuracy [27]. Ding et al. used interpolation to achieve quarter-pixel direction accuracy [28]. Liu and Ngan used a weighted function to avoid mismatch in the lifting scheme [29]. Dong et al. proposed a 2-D adaptive interpolation filter for better fractional-pixel accuracy [30]. Chang and Girod proposed another DA-DWT based on the quincunx subsampling pattern [31]. Xu and Wu combined different subsampling patterns and proposed the subsampling and direction-adaptive discrete wavelet transform (SA-DWT) [32]. DA-DWT is easy to implement and to integrate into wavelet-based image coding. Thus, DA-DWT is our second focus in this dissertation.

Arithmetic coding schemes compress the transformed and quantized coefficients into a bitstream. They aim to produce a scalable bitstream with minimum distortion at each constrained bit rate. They consider three types of correlations among the coefficients. First, the inter-subband coding methods, such as the set partitioning in hierarchical tree (SPIHT) method [33] and the embedded zerotrees of wavelet transform (EZW) method [34], exploit the inter-subband correlations in a tree structure. Second, the intra-subband coding methods partition the coefficients of one subband into several non-overlapping coding blocks and handle only the correlations among the neighbors within one coding block (the intra-subband correlations). Examples in this category are the embedded block coding with optimized truncation (EBCOT) method [35], the 3-D embedded subband coding with optimized truncation (3-D ESCOT) method [36], and the tarp-filter-based system that classifies coefficients to achieve embedding (TCE) method [37]. Third, the mixed inter-subband and intra-subband coding methods cover both the inter-subband and intra-subband correlations. Examples are the embedded conditional entropy coding of wavelet coefficients (ECECOW) method [38] and the embedded coding using zeroblocks of wavelet coefficients and context modeling (EZBC) method [39]. To save computing power, we use the intra-subband coding methods for single image compression in this dissertation.

Combining WBCT and 3-D ESCOT, a WBCT image coding scheme can achieve better coding performance than a regular 2-D DWT image coding scheme. However, the existing WBCT coding schemes have a few issues. They require a large amount of computation because the existing WBCT directional filters have large support. We also found that, for a specific picture, some WBCT frequency subbands do not need a further directional transform. Furthermore, the context table in 3-D ESCOT needs adjustment to match the characteristics of quantized WBCT coefficients.

To solve these issues, we propose three algorithms in this dissertation to enhance the WBCT image coding scheme. First, we suggest a set of short-length 2-D directional filters [40] and verify their performance. Second, we design a mean-shift-based decision scheme to dynamically select the proper subbands for directional transform [41]. Third, we re-design the context tables of 3-D ESCOT to match the data directionality. With these algorithms, our proposed scheme reduces the computational complexity of the original WBCT image coding scheme by 92% or more at similar visual quality [40].

DA-DWT first partitions an image into non-overlapping blocks. It then applies the 1-D DWT to each block along the candidate directions and calculates the corresponding prediction errors. Finally, it selects the candidate direction with the minimal prediction error as the most suitable direction for the block. For blocks in a smooth region, every candidate direction produces a similar prediction error. Thus, DA-DWT selects inconsistent directions for these blocks, which increases the side information. A similar situation arises for blocks in a similar-textured region.
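The selection rule and its weakness in smooth regions can be sketched as follows. This is a simplified illustration, not the DA-DWT of the cited works: it assumes integer-pixel directions modeled as a horizontal displacement `dx`, a two-tap average as the predict step, the sum of absolute residuals as the prediction error, and no bit constraint. The names `prediction_error` and `select_direction` are hypothetical.

```python
def prediction_error(block, dx):
    """Sum of absolute residuals when odd rows are predicted from the
    even rows above and below, displaced by dx columns (a hypothetical
    integer-pixel direction model).
    """
    height, width = len(block), len(block[0])
    total = 0
    # Border samples are skipped so every direction sees the same pixels.
    for y in range(1, height - 1, 2):
        for x in range(1, width - 1):
            predicted = (block[y - 1][x + dx] + block[y + 1][x - dx]) / 2
            total += abs(block[y][x] - predicted)
    return total


def select_direction(block, candidates=(-1, 0, 1)):
    """Return the candidate with minimal prediction error and all errors."""
    errors = {dx: prediction_error(block, dx) for dx in candidates}
    # On a tie, min() keeps the first candidate, an arbitrary choice.
    best = min(candidates, key=lambda dx: errors[dx])
    return best, errors


size = 8
# Diagonal stripes: the pixel value depends only on x + y.
stripes = [[((x + y) % 4) * 10 for x in range(size)] for y in range(size)]
# Smooth block: every pixel has the same value.
flat = [[50] * size for _ in range(size)]

print(select_direction(stripes))
print(select_direction(flat))
```

For the striped block only the direction that follows the stripes gives zero error, so the choice is unambiguous. For the flat block all candidates give the same error, so the selected direction depends only on the tie-breaking order, which is the kind of inconsistency that inflates side information.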

Tanaka et al. pre-filtered images with 2-D filters [42]. Pre-filtering reduces the number of candidate directions and makes the selected directions more consistent. If the filtered output of a block is below a threshold, the block is considered to lie in a smooth region and is processed by the 2-D DWT. However, selecting a suitable threshold for identifying smooth regions is hard. Aligning the directions of blocks in a similar-textured region also helps reduce side information. Maleki et al. proposed an alignment cost function over the entire image to align small blocks or blocks in smooth regions [15]. However, different local areas of the same image have different characteristics, so aligning directions based on local characteristics provides better results.

In wavelet-based video coding, we deal with motion-compensated prediction residuals instead of images. Kamisli and Lim showed that prediction residuals and images have different spatial characteristics [43]. Images have 2-D anisotropic structures while prediction residuals have 1-D anisotropic structures. Kamisli and Lim proposed 1-D DCT for compressing prediction residuals [43]. They also applied 1-D DA-DWT to prediction residuals [44]. They compared the compression performance based on the number of nonzero transformed coefficients instead of the number of bits.

Because of different spatial characteristics, 2-D DA-DWT compresses prediction residuals inefficiently. In wavelet-based video coding, temporal low-pass prediction residuals (T_L) are similar to images but high-pass ones (T_H) are similar to prediction residuals [45].

We propose three further algorithms in this dissertation to improve DA-DWT’s coding performance on images and prediction residuals. First, we suggest a direction alignment algorithm to reduce DA-DWT’s side information [46]. Second, we extend the suggested direction alignment algorithm to 2-D SA-DWT. Third, we analyze the characteristics of prediction residuals in the frequency domain and of their transformed coefficients. We applied 2-D DA-DWT to T_Ls in previous research [45]. Now, we suggest a 2-D MSA-DWT for compressing T_Hs. Our suggested direction alignment algorithm saves about 60% of the side information at the cost of an increase of about 3% in prediction error. It also improves DA-DWT’s coding gain by about 0.4 dB at low bit rates. 2-D MSA-DWT also provides better coding performance than 2-D SA-DWT on T_Hs, improving the coding gain by about 0.1 to 0.2 dB.

This dissertation is organized as follows. Chapter 2 introduces the adopted directional filter banks and direction-adaptive wavelet transforms. Chapter 3 introduces the temporal transform and the adopted arithmetic coding. Chapters 4 and 5 describe the proposed algorithms for WBCT and DA-DWT. Chapter 6 gives the experimental results and Chapter 7 contains the concluding remarks. The major contributions of this dissertation are listed as follows.

(1) We design short-length 2-D directional filters to save computational power of directional transform in WBCT.

(2) We propose a mean-shift-based decision scheme to dynamically select the proper subbands for directional transform.

(3) We fine-tune the context tables of 3-D ESCOT to match the data directionality.

(4) We propose a direction alignment algorithm for DA-DWT to reduce side information.

(5) We extend the proposed direction alignment algorithm to SA-DWT.

(6) We modify SA-DWT for compressing T_Hs in wavelet-based video coding.
