PSNR old: PSNR value obtained by the original coding procedure (MSSVC).
PSNR new: PSNR value obtained by the modified method.
Test sequences: 4CIF - Crew, Harbour, Soccer and City
CIF – BUS, FOOTBALL, FOREMAN and MOBILE
Table 4.4: PSNR of CREW
                                  PSNR new                       PSNR old
width  height  frames/s  Kbit/s   Y      U      V      total     Y      U      V      total
176    144     15.0      96       31.93  35.51  33.97  32.87     31.94  35.49  33.95  32.87
176    144     15.0      192      34.18  38.07  36.41  35.2      34.19  38.13  36.43  35.22
352    288     30.0      384      33.36  37.49  36.04  34.49     33.35  37.47  36.09  34.49
704    576     30.0      1500     35.8   39.61  39.63  37.08     35.80  39.58  39.61  37.06
704    576     60.0      3000     36.88  40.5   41     38.17     36.87  40.48  40.98  38.16
Table 4.5: PSNR of HARBOUR
PSNR new PSNR old
width height frames/s Kbit/s
Y U V total Y U V total
Table 4.6: PSNR of SOCCER
PSNR new PSNR old
width height frames/s Kbit/s
Y U V total Y U V total
                                  PSNR new                       PSNR old
width  height  frames/s  Kbit/s   Y      U      V      total     Y      U      V      total
176    144     15.0      64       30.8   40.14  40.88  34.03     30.80  40.14  40.88  34.03
176    144     15.0      128      33.5   41.77  43.66  36.57     33.50  41.77  43.69  36.57
352    288     30.0      256      30.55  41.18  42.82  34.36     30.54  41.07  42.83  34.36
352    288     30.0      512      32.98  41.99  43.61  36.25     32.98  42.01  43.62  36.26
704    576     30.0      1024     32.9   41.59  43.73  36.16     32.90  41.63  43.73  36.16
704    576     60.0      2048     35.07  43.17  45.06  38.08     35.06  43.15  45.06  38.08
Table 4.8: PSNR of BUS
PSNR new PSNR old
width height frames/s Kbit/s
Y U V total Y U V total
Table 4.9: PSNR of FOOTBALL
PSNR new PSNR old
width height frames/s Kbit/s
Y U V total Y U V total
Table 4.10: PSNR of FOREMAN
PSNR new PSNR old
width height frames/s Kbit/s
Y U V total Y U V total
Table 4.11: PSNR of MOBILE
                                  PSNR new                       PSNR old
width  height  frames/s  Kbit/s   Y      U      V      total     Y      U      V      total
176    144     7.5       48       22.54  27.47  26.37  24        22.54  27.47  26.51  24.02
176    144     15.0      64       23.08  28.23  27.31  24.64     23.08  28.24  27.29  24.64
352    288     15.0      128      23.8   28.48  28.04  25.29     23.80  28.53  28.10  25.3
352    288     15.0      256      26.94  31.37  30.76  28.32     26.95  31.38  30.77  28.32
352    288     30.0      384      28.57  32.74  32.39  29.9      28.57  32.74  32.35  29.9
4.6 Appendix B: Statistics of SB-Reach Depth and Block Size
Figure 4.6 to Figure 4.8 show the distributions of the block size and SB-reach depth selected in the tests of BUS, FOOTBALL and FOREMAN. According to these statistics, SB-reach disabled, 2x2 and 4x4 are the three most popular modes in block size selection. The SB-reach depth distribution shows that a depth of 2 is chosen most frequently.
Figure 4.6: (a) Block size distribution and (b) SB-reach depth distribution of BUS
Figure 4.7: (a) Block size distribution and (b) SB-reach depth distribution of FOOTBALL
Figure 4.8: (a) Block size distribution and (b) SB-reach depth distribution of FOREMAN
Table 4.12 shows the bit saving ratio and the overhead ratio of the coded BUS, FOOTBALL and FOREMAN bit streams. The bit saving ratio and the overhead ratio are defined in (4.1) and (4.2), and the meanings of a, b and c are shown in Figure 4.9.
Overhead ratio: $\frac{b}{b+c}$ (4.1)
Figure 4.9: ‘a’ is the original bitstream without the SB-reach method, and ‘b+c’ is the bitstream with the SB-reach method. ‘b’ is the overhead of the SB-reach method and ‘c’ is the reduced-size version of bitstream ‘a’ after the SB-reach method is applied.
Table 4.12: Bit saving ratio and overhead ratio for BUS, FOOTBALL and FOREMAN
decrease ratio(%) overhead ratio(%)
BUS 0.47 0.3
FOOTBALL 1.31 0.79
FOREMAN 0.53 0.36
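As a minimal sketch of how the two ratios could be computed from the bitstream sizes a, b and c of Figure 4.9: the overhead ratio follows b / (b + c) as in the text, while the form of the saving ratio, (a - (b + c)) / a, is an assumption made here for illustration. The function names and the sample sizes are hypothetical and are not measured data.

```python
def overhead_ratio(b: float, c: float) -> float:
    """Overhead ratio b / (b + c): share of the new bitstream spent on SB-reach overhead."""
    total = b + c
    if total <= 0:
        raise ValueError("b + c must be positive")
    return b / total


def saving_ratio(a: float, b: float, c: float) -> float:
    """ASSUMED definition: relative size reduction from 'a' to 'b + c'."""
    if a <= 0:
        raise ValueError("a must be positive")
    return (a - (b + c)) / a


# Hypothetical bitstream sizes in bits, for illustration only.
a_bits, b_bits, c_bits = 100_000, 500, 99_000
print(f"overhead ratio: {100 * overhead_ratio(b_bits, c_bits):.2f} %")
print(f"saving ratio:   {100 * saving_ratio(a_bits, b_bits, c_bits):.2f} %")
```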
Chapter 5
Directional Multiresolution Transform
5.1 Motivation
In an image or video coding scheme, the spatial transform plays an important role: it maps the image data to the frequency domain. For a typical image, the energy is concentrated in a few frequency bands, usually the lower ones, and this energy compaction property improves the compression efficiency. Several spatial transforms are popular in image compression, such as the FFT, KLT, DCT and wavelet transform. However, for reasons of performance and ease of realization, most image and video coding systems adopt the DCT or the wavelet transform. The 2-D DCT is adopted as the spatial transform module by JPEG, MPEG-1, MPEG-2 and H.264, while JPEG2000 and interframe wavelet coding schemes use the wavelet transform.
Wavelets are claimed to represent abrupt point changes and singularities more efficiently. Because of their good approximation performance in one dimension, wavelets are used very frequently in signal processing. In two dimensions, however, their performance is not as good. 2-D separable wavelets are well adapted to point singularities but perform poorly on line or curve singularities. In the past decade, Candes and Donoho [12] pioneered a new representation, named the curvelet, to approximate the behavior of 2-D smooth functions. Inspired by curvelets, Minh N. Do [13] proposed contourlets as a new image representation. We describe this method and apply contourlets to image coding in the following sections.
5.2 Contourlet
The contourlets proposed by Do combine the good properties of curvelets and subband decomposition. The contourlet transform decomposes an image in two main steps: (1) a global multiscale transform and (2) local directional transforms. The first step performs edge detection with a wavelet-like transform. In the second step, local directional transforms are used to cover the contour segments.
In practice, Do suggests a double filter bank approach. His pyramidal directional filter bank consists of the Laplacian pyramid and the directional filter bank. As shown in Figure 5.1, the Laplacian pyramid decomposes an image into a lower frequency band, with 1/4 of the original number of samples, and a higher frequency band. The higher frequency band is processed further by a directional filter bank, which can have 4, 8 or 16 bands. The lower frequency band can be kept as it is or decomposed further into low-pass and high-pass bands. The Laplacian pyramid captures the point discontinuities and the directional filter bank represents line-segment structures. Thus, the contourlet provides both a multiresolution decomposition and a directional decomposition of an image. Because the contourlet uses the Laplacian pyramid, it is not critically sampled and has a redundancy factor of up to 1.33.
Figure 5.1: Block diagram of the pyramidal directional filter bank. Multiscale decomposition is performed at the first stage. Downsampling is applied to the lower frequency band, and the higher frequency band is followed by a directional filter bank.
In the next two sections, we describe the design of Laplacian pyramid (LP) and directional filter bank (DFB).
5.3 Laplacian Pyramid
The Laplacian pyramid, proposed by Burt and Adelson [14], is used to achieve the multiscale decomposition. When the Laplacian pyramid is applied, a low-pass image is generated from the original image and then downsampled. The difference between the original image and the predicted image produced from the low-pass image produces
frame of the original image can be then generated.
The LP decomposition introduces oversampling with a ratio of 1.33, whereas the wavelet scheme is critically sampled. Intuitively, this drawback of the LP decomposition may reduce coding efficiency. However, the LP decomposition does not suffer from “scrambled” frequencies, which occur in the wavelet filter bank. This situation appears when the high-pass signal is downsampled: it is folded back into the low frequency band and its spectrum is reflected (see Figure 5.2). The LP decomposition downsamples only the low-pass channel, so frequency scrambling is avoided.
Figure 5.2: Illustration of “frequency scrambling.” Upper (Highpass (HP)): spectrum after high-pass filtering. Lower (Downsampled HP): spectrum after high-pass filtering and downsampling. The high-pass spectrum is folded back into the low frequency region.
The architecture of the LP is shown in Figure 5.3. H and G are orthogonal filters, X is the input image, C is the coarse version of X, and D is the difference between the original image and the image reconstructed from C.
Figure 5.3: The analysis side of the LP scheme. C is the coarse version of the original image and D is the difference between the input X and the prediction obtained from C.
The corresponding synthesis side has two inputs, C and D. C is upsampled and then filtered by G, and the result is added to D to generate the final reconstruction X’ (Figure 5.4).
Figure 5.4: The synthesis side of the LP scheme. X’ is the reconstructed image.
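The analysis and synthesis steps above can be sketched as follows. This is a one-dimensional illustration with downsampling factor 2, not the 2-D implementation used in this work. The short taps H_TAPS and G_TAPS are placeholders chosen for the example and are not the 9/7 filters; the function names are hypothetical. The sketch also shows that X’ equals X for any choice of H and G, because D stores exactly what the prediction from C misses.

```python
H_TAPS = [0.25, 0.5, 0.25]   # placeholder analysis low-pass filter (assumption)
G_TAPS = [0.5, 1.0, 0.5]     # placeholder synthesis/interpolation filter (assumption)


def fir(x, taps):
    """Centered FIR filtering with zero padding; output has the same length as x."""
    half = len(taps) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, t in enumerate(taps):
            i = n + half - k
            if 0 <= i < len(x):
                acc += t * x[i]
        out.append(acc)
    return out


def predict(c, g, length, m=2):
    """Upsample the coarse signal c by m (zero insertion) and filter it with g."""
    up = [0.0] * length
    up[::m] = c
    return fir(up, g)


def lp_analysis(x, h, g, m=2):
    """Return (C, D): the coarse signal and the prediction residual."""
    c = fir(x, h)[::m]                      # low-pass filter, then downsample
    p = predict(c, g, len(x), m)            # prediction of x from c
    d = [xi - pi for xi, pi in zip(x, p)]   # difference signal
    return c, d


def lp_synthesis(c, d, g, m=2):
    """Rebuild the signal as prediction(C) + D."""
    p = predict(c, g, len(d), m)
    return [pi + di for pi, di in zip(p, d)]


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
c, d = lp_analysis(x, H_TAPS, G_TAPS)
x_rec = lp_synthesis(c, d, G_TAPS)
print(len(c), len(d), max(abs(a - b) for a, b in zip(x, x_rec)))
```

In this 1-D sketch one level stores len(C) + len(D) = 1.5 times the input samples; in 2-D the coarse image has 1/4 of the samples, so one level stores 1.25 times the input.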
In realization, the two filters H and G have to be selected. We use the 9/7 filters for the LP structure. The coefficients of the 9/7 filters are shown in Table 5.1 [15].
Table 5.1: 9/7 filter taps
h[n]        g[n]
0.037829    -0.064539
-0.023849   -0.040690
-0.110624   0.418092
5.4 Directional Filter Bank
A 2-D directional filter bank (DFB) was proposed by Bamberger and Smith [16] in 1992. The DFB partitions the spectrum of 2-D data into wedge-shaped frequency regions, and each region corresponds to a subband. In realization, a tree structure is used to implement the DFB, and the number of partition regions depends on the number of levels of the tree. For example (Figure 5.5), if the tree structure has n levels, $2^n$ wedge-shaped frequency partitions are generated in the frequency domain.
Figure 5.5: DFB with a tree structure of n = 3 levels, which gives $2^3$ = 8 frequency partition regions.
The construction of DFB involves the QFB’s and fan filters. QFB (quincunx
Figure 5.6: QFB with sub-lattice sampling Q and fan filters. This also forms the first level of DFB with two directions.
In the current case, Q can only be Q0 or Q1, each of which represents a two-dimensional quincunx sub-lattice as shown in Figure 5.7. That is, Q0 and Q1 are applied to the coordinate indices
Figure 5.7: Quincunx sampling lattice
As the QFBs are expanded, the tree structure becomes larger and the number of directions of the DFB increases. In the following discussion, we use 4-direction and 8-direction DFBs in our image coding process.
5.4.1 Fan Filter Design
The fan filters are the key components of DFB. In this section, we will describe how to design these filters by using the biorthogonal fan filters designed by Phoong et al. [17].
To obtain the fan filters, diamond-shaped filters are designed first and then modulated into fan filters. An all-pass filter is first established in one dimension by (5.2). We use the coefficients listed in Table 5.2 (Phoong [17]) as the base of our filter design. The function β(z) is derived from {Vk} according to (5.2). In this case, N1 is 6 and β(z) is a 12-tap type II filter.
Table 5.2: The base coefficients of 23-45 fan filters
V1 0.630
Because the fan filters are 2-dimensional filters, the arrangement of the filter taps is 2-dimensional too. In Phoong's work, the diamond-shaped filter is the goal of the design. Figure 5.8 shows the ideal diamond-shaped filter.
Figure 5.8: Ideal diamond-shaped filter
The analysis and synthesis diamond-shaped filters are formed by (5.3). When β(z) is replaced by $\beta(z_0 z_1^{-1})$, the 1-dimensional case is turned into the 2-dimensional case.
With the 6 base coefficients, the analysis filters H0 and H1 have 2-D tap areas of 23x23 and 45x45, respectively, and the taps are distributed in a diamond shape. Figure 5.9 shows the impulse response (coefficients) of $H_0(z_0, z_1)$ and Figure 5.10 shows the spectrum of the designed $H_0(z_0, z_1)$.
Figure 5.9: $H_0(z_0, z_1)$ designed with 6 base taps.
Figure 5.10: The spectrum of $H_0(z_0, z_1)$. The spectrum is FFT-shifted. The values 0 and 1 on the axes represent -π and π, and the value 0.5 represents frequency 0.
When the diamond-shaped filter has been designed, the next step is to generate the fan filters. Fan filters can be viewed as the modulated diamond-shaped filter. The spectrum of the fan filters is that of the diamond-shaped filter with frequency shifted by –π or π along the X or Y axis. To shift spectrum by –π or π, the modulation operation is applied (5.4).
$$X[nT]\cos(2\pi n f_0 T) \leftrightarrow \frac{1}{2}X_d(f - f_0) + \frac{1}{2}X_d(f + f_0) \qquad (5.4)$$
In this case, the frequency shift value $f_0$ is set to π and the period T is set to $\frac{1}{2\pi}$. The time-domain formula can thus be rewritten as $X[nT]\cos(n\pi)$. The modulation function $\cos(n\pi)$ is in fact a sequence of alternating +1 and –1. We multiply the taps of the analysis and synthesis diamond-shaped filters by this modulation sequence. Figure 5.11 shows the spectrum of the analysis diamond-shaped filter $H_0(z_0, z_1)$ after its frequency is shifted by π along the Y axis.
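The effect of the modulation can be checked numerically. The sketch below multiplies the taps by (-1)^n along one index and verifies, with a plain DFT, that the spectrum moves by half the sampling frequency, that is, by π. The tap values are hypothetical, the helper names are introduced here for illustration, and the choice of the second index as the modulated axis is an assumption.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def modulate(taps_2d):
    """Multiply taps[n0][n1] by cos(pi * n1), i.e. by (-1) ** n1."""
    return [[v * (-1) ** n1 for n1, v in enumerate(row)] for row in taps_2d]


# Hypothetical low-pass taps along the modulated axis (illustration only).
row = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
shifted_row = modulate([row])[0]

original_spectrum = dft(row)
shifted_spectrum = dft(shifted_row)

# For an even length N, the modulated spectrum is the original one
# circularly shifted by N/2 bins, which corresponds to a shift of pi.
n_bins = len(row)
half = n_bins // 2
max_error = max(
    abs(shifted_spectrum[k] - original_spectrum[(k - half) % n_bins])
    for k in range(n_bins)
)
print(max_error < 1e-9)
```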
The input and the output energy levels of the 23-45 fan filters should be the same.
The DC value (the sum of all tap values) of the designed $H_0(z_0, z_1)$ and $F_1(z_0, z_1)$ is 0.5. On the other hand, the DC value of $H_1(z_0, z_1)$ and $F_0(z_0, z_1)$ is 1.
The DC levels of the signals after passing through the two analysis filters are not equal, but the DC value of each analysis and synthesis pair is the same, 0.5*1. Figure 5.12 illustrates the DC response of each filter in this structure.
Figure 5.12: Illustration of the DC levels in the DFB, from the input X through the analysis and synthesis sides to the output X'. The total DC level is 0.5. The DC level of Y0 is half of that of Y1 because of the different DC levels of $H_0(z_0, z_1)$ and $H_1(z_0, z_1)$.
We have to modify the DC levels of the fan filters so that the two channels have equal DC levels and the total system has a unit DC response. The simplest way to achieve this is to change the DC value of $H_0(z_0, z_1)$ and $F_1(z_0, z_1)$ from 0.5 to 1, that is, to multiply every tap value by 2. With this change, the signal after each analysis filter is at the same DC level and the DC response of the total system is 1.
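The DC adjustment amounts to a uniform rescaling of the taps. A minimal sketch follows, using a small hypothetical 3x3 tap array whose taps sum to 0.5; the array values and the function names are illustrative only and are not the designed 23-45 filters.

```python
def dc_gain(taps_2d):
    """DC value of a 2-D filter: the sum of all tap values."""
    return sum(sum(row) for row in taps_2d)


def normalize_dc(taps_2d, target=1.0):
    """Scale every tap by the same factor so that the DC value equals target."""
    gain = dc_gain(taps_2d)
    if gain == 0:
        raise ValueError("filter has zero DC gain and cannot be normalized")
    scale = target / gain
    return [[v * scale for v in row] for row in taps_2d]


# Hypothetical diamond-shaped taps with DC value 0.5 (illustration only).
h0 = [
    [0.0, 0.0625, 0.0],
    [0.0625, 0.25, 0.0625],
    [0.0, 0.0625, 0.0],
]
h0_scaled = normalize_dc(h0)
print(dc_gain(h0), dc_gain(h0_scaled))
```

Since the starting DC value is 0.5, the scale factor is 2, which matches the multiplication by 2 described above.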
5.4.2 Quincunx Sampling Lattice
Another key component of the DFB is the quincunx sampling lattice. We have shown the sampling lattices Q0 and Q1 [18] in formula (5.1). Besides performing lattice sampling, Q0 and Q1 also rotate the sampled image by 45° and -45°, respectively. Figure 5.13 shows an example of downsampling by Q0 and Q1.
Figure 5.13: (a) The original image. (b) Downsampling by Q0: the original image is downsampled and rotated by 45°. (c) Downsampling by Q1: the original image is downsampled and rotated by -45°.
If Q0 and Q1 are used to upsample an image, the image is rotated by –45° and 45°, respectively. The data of the rotated image are distributed as in Figure 5.7. The blank points are the zero-filled points after interpolation.
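To make the lattice sampling concrete, the sketch below downsamples a small image by a quincunx matrix Q using y[n] = x[Qn]. The matrices Q0 = [[1, -1], [1, 1]] and Q1 = [[1, 1], [-1, 1]] are assumed here as the usual quincunx forms; they should be checked against (5.1). The indexing convention image[m0][m1] and the function names are also assumptions made for this illustration.

```python
Q0 = ((1, -1), (1, 1))   # assumed form of Q0
Q1 = ((1, 1), (-1, 1))   # assumed form of Q1


def apply_matrix(q, n):
    """Return the lattice point m = Q n for an integer index pair n."""
    return (q[0][0] * n[0] + q[0][1] * n[1],
            q[1][0] * n[0] + q[1][1] * n[1])


def quincunx_downsample(image, q):
    """Return {n: image[m]} for every integer n whose point m = Q n lies in the image."""
    rows, cols = len(image), len(image[0])
    span = rows + cols
    out = {}
    for n0 in range(-span, span + 1):
        for n1 in range(-span, span + 1):
            m0, m1 = apply_matrix(q, (n0, n1))
            if 0 <= m0 < rows and 0 <= m1 < cols:
                out[(n0, n1)] = image[m0][m1]
    return out


# 4x4 test image whose pixel value encodes its own position: value = 4*m0 + m1.
image = [[4 * m0 + m1 for m1 in range(4)] for m0 in range(4)]
down0 = quincunx_downsample(image, Q0)
down1 = quincunx_downsample(image, Q1)

# Half of the pixels survive, and the output indices n are no longer a square grid:
# they form a rotated (diamond-shaped) region.
print(len(down0), sorted(down0))
```

Both matrices keep exactly the pixels whose coordinate sum m0 + m1 is even; they differ only in how those pixels are re-indexed, which corresponds to the two rotation directions.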
5.4.3 Patterns of Sampled Images
This section discusses the data formats that appear in the DFB data flow and the outcome when these data are downsampled by Q0 or Q1.
One of the data formats is simply the normal image representation, square-like or
Figure 5.14: Square-like discrete representation.
When the downsampling lattice is applied in a filter bank, we shift by one pixel in the X or Y direction on one branch of the filter bank, and the data are separated into the two forms shown in Figure 5.15(b)(c). In reconstruction, the two separated data sets can be combined directly without shifting if they are arranged in a normal data array. (The data are arranged along the positive X and Y axes.)
Figure 5.15: (a) Data partition of a rectangular-shaped image. (b)(c) The downsampled versions of a rectangular-shaped image. (c) has the same sampling lattice as (b), shifted by one pixel along the X or Y direction.
The other sampling pattern that appears in the DFB is the diamond-shaped image. At the first level of the QFB, an image passes through a fan filter bank and a downsampling lattice. As described before, the downsampling process rotates an image by 45°. Figure 5.16 shows an example of the diamond-shaped pattern.
Figure 5.16: The pattern of a diamond-shape like image.
When a diamond-shaped image is downsampled into two images, the sampling points should also be shifted by one pixel in the X or Y direction on one of the images. As we can see in Figure 5.17(a), the samples of one filter output contain the left border and those of the other contain the right border.
In reconstruction, each set of data is arranged as shown in Figure 5.17(b)(c). When combining these two separated sets of data, we need to shift one of them to avoid overlapping. This process is not needed when the original image has a rectangular shape.
Figure 5.17: (a) An example of the diamond-shaped image. (b) and (c) are the data sampled from (a) under the critical sampling condition. When (b) and (c) are combined to reconstruct (a), (c) has to be shifted to avoid data overlapping.
5.4.4 Equivalent Representation of DFB
In order to achieve eight or more directional frequency partitions, the QFBs have to be combined with resampling operations at the third level, as shown in Figure 5.18.
Figure 5.18: Block diagram of QFB with resampling operation
The resampling matrices are shown in (5.5). Using these sampling matrices does not change the image data rate but rearranges the positions of the image data. Note that R0R1 = R2R3 = I2. Figure 5.19 shows an example of “lena” rearranged by the resampling operation R0, and Figure 5.20 shows the output shapes of an image resampled by R0, R1, R2 and R3, respectively.
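$$R_0 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},\quad R_1 = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix},\quad R_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},\quad R_3 = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \qquad (5.5)$$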
Figure 5.19: An example of the resampling operation R0. Note that the data coordinates are changed by the “upsampling” process, although the data rate neither increases nor decreases.
Figure 5.20: Resampled images in four cases: (a) resampling by R0; (b) resampling by R1; (c) resampling by R2; (d) resampling by R3.
Figure 5.21 shows the third level of the DFB on the analysis side and its equivalent block diagram. Note that it is an example of the filter transform type. By replacing Ri and Qj, different frequency partitions can be generated.
The left side of Figure 5.21 is the original block diagram of the third level DFB.
If the “downsampling by R0” is moved forward into the next two branches, the filter taps of H0 and H1 have to be upsampled by R0. After upsampling, the frequency responses of H0 and H1 become F0 and F1. The combination of the two sampling matrices becomes P0 (P0 = R0*Q0), and Figure 5.22 shows an image downsampled by P0. R0 rearranges data but does not change the data rate, whereas Q0 performs a real downsampling. So P0 changes the data rate and reshapes the sampled image.
In more detail, the image downsampled by P0 is actually downsampled only along the X-axis, and the downsampled data are rearranged into the shape of a parallelogram.
To produce more directional frequency partitions, this structure can still be used; the main difference is only the change of sampling matrices. The equivalent architectures can be obtained by a similar method.
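The matrix relations used in this section can be verified with a few lines of integer arithmetic. The sketch assumes the commonly used forms R0 = [[1, 1], [0, 1]], R1 = [[1, -1], [0, 1]], R2 = [[1, 0], [1, 1]], R3 = [[1, 0], [-1, 1]] and Q0 = [[1, -1], [1, 1]]; these values and the helper names are assumptions for illustration and should be checked against (5.1) and (5.5).

```python
def matmul2(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def det2(a):
    """Determinant of a 2x2 matrix; its absolute value is the sampling density ratio."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


I2 = ((1, 0), (0, 1))
R0 = ((1, 1), (0, 1))    # assumed forms of the resampling matrices
R1 = ((1, -1), (0, 1))
R2 = ((1, 0), (1, 1))
R3 = ((1, 0), (-1, 1))
Q0 = ((1, -1), (1, 1))   # assumed form of the quincunx matrix

# Resampling matrices are unimodular: they rearrange samples without changing the rate.
print([det2(r) for r in (R0, R1, R2, R3)])

# R0 and R1 (and R2 and R3) undo each other.
print(matmul2(R0, R1) == I2, matmul2(R2, R3) == I2)

# Combined matrix P0 = R0 * Q0: determinant 2, so the data rate is halved.
P0 = matmul2(R0, Q0)
print(P0, det2(P0))
```

With these assumed matrices, P0 = [[2, 0], [1, 1]]. Its lattice points m = P0 n have an even first coordinate and an unrestricted second one, which agrees with the description of a downsampling along one axis combined with a shear into a parallelogram.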
Figure 5.21: An example of the third level in DFB. Left: the original block diagram. Right: the equivalent block diagram.
Figure 5.22: An image downsampled by P0, showing the original image and the resampled image on the (x, y) grid with origin (0,0).
If the outputs of the equivalent block diagram are connected to the channel or to the synthesis side, the sampling matrix Pi can be simplified to downsampling along one dimension, and no data rearrangement is needed. The equivalent block diagram simplifies the sampling process, but its drawback is that the number of distinct equivalent filters increases with the number of directional partitions.
5.4.5 The Architecture of Directional Filter Bank
In the previous two sections, we discussed the basic functions of the key components of the DFB. The DFB can be constructed from these components. Figure 5.6 shows the simplest two-direction filter bank. In the following, the four-direction and eight-direction filter banks are introduced.
First, we describe the four-direction filter bank. If the tree structure has 2 levels, a filter bank with 4 ($2^2$) directions is constructed. The analysis and synthesis sides each contain two stages of the tree structure, as shown in Figure 5.23. The two components described in the previous section are the keys to the whole architecture.
Figure 5.23: The architecture of the 4-direction filter bank.
The architecture of the 8-direction filter bank extends that of the 4-direction filter bank. The tree structure has 3 levels. The first and second stages are the same as in the 4-direction filter bank. On the third stage, the analysis and synthesis sides differ slightly from those of the first and second stages: there are additional sampling processes at the beginning of the analysis side and at the end of the synthesis side. Figure 5.24 shows the architecture of the 8-direction filter bank.
Figure 5.24: The architecture of the 8-direction filter bank. The first and second stages are extended from the 4-direction filter bank. The additional sampling processes are on the third stage.
As described in the previous section, the additional processes R0, R1, R2 and R3 can be moved forward on the QFB analysis side and backward on the synthesis side. Two components have to be adjusted to represent the equivalent system. One is the fan filters, whose taps are upsampled by Rn; the other is the sampling lattice, which changes from Qm to Qm×Rn. In this case, the matrix Qm×Rn can be simplified to downsampling along one dimension. Figure 5.25 shows the architecture of the 8-direction filter bank with the equivalent diagrams.
Figure 5.25: The architecture of the 8-direction filter bank. The equivalent diagram replaces the original diagram (Figure 5.24) on the third level.
5.5 Pyramidal Directional Filter Bank
The pyramidal directional filter bank (PDFB) combines directional and multiscale decomposition. The multiscale decomposition is applied to the data first and generates a high-pass image and a low-pass image; the low-pass image is subsampled while the high-pass image is not. The scheme can be iterated on the low-pass image a few times if needed. The directional decomposition is applied to the high-pass image, and the number of directions at each scale is defined by the user. Figure 5.26 shows the block diagram of the PDFB.
Figure 5.26: The block diagram of the PDFB. Multiscale decomposition is first applied to generate the low-pass and high-pass images. The low-pass image can be further decomposed using the same structure on the next level. A directional decomposition is applied to each high-pass channel.
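The sample count of such a decomposition can be estimated with a short calculation. The sketch below assumes that the low-pass image is subsampled by 2 in each dimension at every level, that the high-pass image keeps the resolution of its level, and that the directional filter bank is critically sampled, so it adds no further samples. The function names and the 512x512 example size are hypothetical.

```python
def pdfb_band_sizes(width, height, levels):
    """Return (high_pass_sizes, low_pass_size) for a pyramid iterated `levels` times."""
    high_pass_sizes = []
    w, h = width, height
    for _ in range(levels):
        high_pass_sizes.append((w, h))   # high-pass image is not subsampled
        w, h = w // 2, h // 2            # low-pass image: 1/4 of the samples
    return high_pass_sizes, (w, h)


def redundancy(width, height, levels):
    """Total number of stored coefficients divided by the number of input pixels."""
    high_pass_sizes, low_pass_size = pdfb_band_sizes(width, height, levels)
    total = sum(w * h for w, h in high_pass_sizes)
    total += low_pass_size[0] * low_pass_size[1]
    return total / (width * height)


# Example: a 512x512 image decomposed over 3 pyramid levels.
print(pdfb_band_sizes(512, 512, 3))
print(redundancy(512, 512, 3))   # 1 + 1/4 + 1/16 + 1/64 = 1.328125
```

As the number of levels grows, the ratio approaches 1 + 1/4 + 1/16 + ... = 4/3, which is the redundancy factor of about 1.33 quoted for the Laplacian pyramid.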