# A wavelet multiresolution compression technique for 3D stereoscopic image sequence based on mixed-resolution psychophysical experiments

P.-R. Chang, M.-J. Wu / Signal Processing: Image Communication 15 (2000) 705–727

Stereoscopic image transmission (storage) requires twice the conventional monocular transmission (storage) bandwidth. However, several schemes [2,5,8,14,16,17] have been developed to exploit the disparity relation and achieve compression ratios higher than those obtained by compressing the two pictures independently. To increase the compression ratio further, in this paper we combine the mixed-resolution coding technique [11,14,18] with disparity-compensated stereoscopic image compression. Mixed-resolution coding is a perceptually justified technique in which one eye is presented with a low-resolution picture and the other eye with a high-resolution picture. Psychophysical experiments [11,14,18] have shown that a stereo pair with one high-resolution image and one low-resolution image provides almost the same stereoscopic depth as a stereo pair with two high-resolution images. By combining the mixed-resolution coding and disparity-compensated techniques, the reference (left) high-resolution image sequence can be compressed by a motion-compensated scheme [19,20] independently of the other (right) image sequence, while the low-resolution right image sequence is predicted from the left image sequence at a lower resolution using the disparity relation. The low-resolution images are obtained using the well-known wavelet decomposition [1,10,20]. Another advantage of the wavelet decomposition is that it is very suitable for image compression [1,20]. After the wavelet decomposition, an image is divided into several layers of different importance. Subimages at different layers correspond to different resolutions and different frequency ranges, which match the frequency-selective properties of the human visual system. Antonini et al.
[1] proposed a multiresolution-codebook-based vector quantization (VQ) technique for the wavelet transform, where each subcodebook corresponds to a wavelet subimage at its resolution level. In Section 3, a VQ-based wavelet multiresolution compression technique is proposed to code the left image of a still stereo pair. For a stereo image sequence, however, we propose an interframe hybrid DPCM/DWT/VQ scheme to code the left image sequence, where DWT denotes the discrete wavelet transform. The DPCM produces a number of prediction error subimages for the motion-compensated wavelet-decomposed subimages in order to improve the reconstructed image quality. These prediction error subimages are vector quantized using a multiresolution codebook. For the low-resolution right image sequence, an inter-view hybrid DPCM/DWT/VQ scheme is applied; its DPCM generates a number of prediction error subimages for the disparity-compensated subimages, which are likewise vector quantized. Since the estimation of both the motion vectors and the disparity is the computational burden of the joint motion- and disparity-compensated technique, we apply the variable block-size multiresolution block matching method [17,20] to reduce the computational complexity. The estimated motion vectors and disparities are then DPCM coded, and all quantities are entropy-coded prior to transmission.

## 2. Stereoscopic image compression using mixed-resolution coding techniques

### 2.1. Theory of stereovision

The sense of stereovision is normally stimulated by viewing a true, three-dimensional scene. It is possible to stimulate the sense of stereovision artificially by acquiring two pictures of the same scene from separated positions, and presenting the left picture to the left eye and the right picture to the right eye. Two pictures acquired in this manner form a stereopair.
One of the most important ideas in the study of stereopairs is that of disparity. Fig. 1 illustrates the concept of disparity. Given a point A in the left picture, its matching point B in the right picture does not in general lie directly underneath A. The vector connecting B to A has been called the disparity, the stereo disparity, the binocular disparity, and the binocular parallax of the point pair (A, B). The disparity $d$ associated with the point pair (A, B) consists of two components: a horizontal component $d_x$ and a vertical component $d_y$.

Fig. 1. Stereo disparity: A and B are matching points in the stereopair, and d is the disparity vector.

Depending on the camera geometry being used, each component of the disparity can be either positive or negative. When negative disparity occurs, the scene being viewed appears to float in the space between the viewer's eyes and the monitor. This type of imagery cannot be reproduced without the aid of stereoscopic devices such as shutter glasses. In the case of a parallel-axes camera geometry, the vertical component of the disparity is always zero and the horizontal component is always positive. This implies that the parallel-axes geometry possesses a simple mathematical relationship between the disparity of a point pair and the distance to the object it represents. In general, the disparity vector $d$ can be used to predict one image of a stereopair from the other. For example, given the luminance level of the left picture at a position $p$, $I_L(p)$, the luminance level of the corresponding point in the right picture can be calculated as

$$I_R(p) = I_L(p + d), \qquad (1)$$

where $d$ denotes the disparity vector, directed from left to right.

### 2.2. Mixed-resolution coding for stereopair data compression using wavelet multiresolution techniques

Mixed-resolution coding is a perceptually justified technique for compressing stereopairs. The compression is achieved by presenting one eye with a low-resolution picture and the other eye with a high-resolution picture. Psychophysical experiments [11,14,18] have shown that a stereo pair with one high-resolution image and one lower-resolution image is sufficient to provide almost the same stereoscopic depth as a stereo pair with two high-resolution images. Thus, the eye/brain can easily fuse such stereopairs and perceive depth in them.
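As a concrete illustration of Eq. (1), the following sketch predicts a right view from a left view and a per-pixel horizontal disparity map. The array names, the integer-valued disparity, and the border-clamping behaviour are illustrative assumptions, not part of the paper:

```python
import numpy as np

def predict_right_from_left(left, disparity):
    """Predict the right view from the left view with a per-pixel
    horizontal disparity map: I_R(p) = I_L(p + d), as in Eq. (1).
    (Integer disparities and border clamping are simplifying choices.)"""
    h, w = left.shape
    right = np.empty_like(left)
    cols = np.arange(w)
    for r in range(h):
        # Fetch each right-image pixel from the disparity-shifted
        # position in the left image, clamping at the border.
        src = np.clip(cols + disparity[r], 0, w - 1)
        right[r] = left[r, src]
    return right

# Toy example: a constant disparity of 2 pixels shifts each row by 2.
left = np.arange(16.0).reshape(2, 8)
d = np.full((2, 8), 2, dtype=int)
right = predict_right_from_left(left, d)
```

In a real coder the disparity map would of course come from the block-based estimation described later, not be given per pixel.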
In summary, the concept of the mixed-resolution technique can be symbolically represented by the following equations:

$$\text{(Stereo image)} \approx \text{(High-resolution left image)} + \text{(Low-resolution right image)}, \qquad (2)$$

$$\text{(LR right image)} = \text{(LR left image)} + \text{(Disparity between both LR images)}, \qquad (3)$$

where LR denotes the low-resolution image. From the above discussion, mixed-resolution coding is able to significantly reduce the bit-rate required to transmit a stereo image with two high-resolution images. To implement the mixed-resolution coding, one notable technique is based on the well-known wavelet multiresolution signal representation [1,10,20]. In the remainder of this section, we give a brief review of the wavelet multiresolution technique. Wavelets are functions generated from one single function $\psi$ by dilations and translations:

$$\psi_{a,b}(x) = |a|^{-1/2}\,\psi\!\left(\frac{x-b}{a}\right), \quad a \in \mathbb{R}^{+},\ b \in \mathbb{R}, \qquad (4)$$

where the shift parameter $b$ gives the position of the wavelet, whereas the dilation parameter $a$ governs its frequency. The mother wavelet $\psi$ has to satisfy the admissibility condition

$$\int_{-\infty}^{\infty} |\hat{\Psi}(\omega)|^{2}\, |\omega|^{-1}\, d\omega < \infty, \qquad (5)$$

where $\hat{\Psi}$ is the Fourier transform of $\psi$.

For $a \ll 1$, the wavelet $\psi_{a,b}$ is a highly concentrated, "shrunken" version of the mother wavelet, with frequency content mostly in the high-frequency range. Conversely, for $a \gg 1$, $\psi_{a,b}$ is very much spread out and has mostly low frequencies. Grossman and Morlet [7] showed that any square-integrable function $f(x) \in L^{2}(\mathbb{R})$ can be represented in terms of a set of wavelet basis functions that cover all scales. Such a representation writes $f$ as an integral over $a$ and $b$ of $\psi_{a,b}$ with appropriate weighting coefficients [4]. In practice, one prefers to write $f$ as a discrete superposition, since a sum is preferable to an integral. Thus, one introduces the discretization $a = 2^{m}$, $b = n2^{m}$, with $m, n \in \mathbb{Z}$. The wavelet therefore becomes

$$\psi_{m,n}(x) = 2^{-m/2}\,\psi(2^{-m}x - n), \quad (m, n) \in \mathbb{Z}^{2}, \qquad (6)$$

where $m$ is a scaling parameter and $n$ is a shift parameter. The basic idea of the wavelet transform is to represent any arbitrary function $f$ as a superposition of wavelets,

$$f(x) = \sum_{m}\sum_{n} Wf(m,n)\,\psi_{m,n}(x), \qquad (7)$$

where the wavelet transform $Wf(m,n)$ is defined in terms of an inner product,

$$Wf(m,n) = \langle f(x), \psi_{m,n}(x)\rangle = \int_{-\infty}^{\infty} f(x)\,\psi_{m,n}(x)\,dx. \qquad (8)$$

$Wf(m,n)$ yields the detail signal of $f(x)$ at the resolution $2^{m}$, which is characterized by the set of such inner products,

$$W_{2^{m}} f = \left(\langle f(x), \psi_{m,n}(x)\rangle\right)_{n \in \mathbb{Z}} = \{Wf(m,n);\ n \in \mathbb{Z}\}, \qquad (9)$$

where $W_{2^{m}} f$ is called the discrete detail signal, or wavelet, at the resolution $2^{m}$. It contains the difference of information between the approximations of $f(x)$ at the resolutions $2^{m-1}$ and $2^{m}$. In a multiresolution analysis, one really has two functions: the mother wavelet $\psi$ and a scaling function $\phi$. Let $V_{m}$ denote the vector space spanned by the $\phi_{m,n}$, which are generated by the dilations and translations of the scaling function:

$$\phi_{m,n}(x) = 2^{-m/2}\,\phi(2^{-m}x - n). \qquad (10)$$

The vector space $V_{m}$ can be interpreted as the set of all possible approximations at the resolution $2^{m}$ of functions in $L^{2}(\mathbb{R})$.
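For intuition, one analysis/synthesis level of the decomposition just described can be sketched with the simplest orthonormal pair, the Haar filters. The paper does not fix a particular filter in this passage; Haar is chosen here only for brevity:

```python
import numpy as np

def haar_analysis(f):
    """One level of the discrete wavelet transform: the approximation
    (scaling-function branch) and the detail (wavelet branch), each
    subsampled by 2.  Haar filters are used purely for illustration."""
    f = np.asarray(f, dtype=float)
    approx = (f[0::2] + f[1::2]) / np.sqrt(2.0)   # low-pass branch
    detail = (f[0::2] - f[1::2]) / np.sqrt(2.0)   # high-pass branch
    return approx, detail

def haar_synthesis(approx, detail):
    """Perfect reconstruction: merge approximation and detail back,
    i.e. the Add( ) operation of Eq. (14) for the Haar case."""
    f = np.empty(2 * len(approx))
    f[0::2] = (approx + detail) / np.sqrt(2.0)
    f[1::2] = (approx - detail) / np.sqrt(2.0)
    return f

x = np.array([4.0, 2.0, 5.0, 7.0])
s, w = haar_analysis(x)
assert np.allclose(haar_synthesis(s, w), x)  # lossless round trip
```

Iterating `haar_analysis` on the approximation output produces exactly the pyramid of Eq. (11).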
These spaces $V_{m}$ form a ladder of successive approximation spaces, $\cdots \subset V_{2} \subset V_{1} \subset V_{0} \subset V_{-1} \subset V_{-2} \subset \cdots$, each with resolution $2^{m}$. For each $m$, the $\psi_{m,n}$ span a space $O_{m}$ which is exactly the orthogonal complement of $V_{m}$ in $V_{m-1}$; the discrete detail signal $W_{2^{m}} f$ therefore describes the information lost when going from an approximation of $f$ with resolution $2^{m-1}$ to the coarser approximation with resolution $2^{m}$. Hence, a discrete wavelet transform at resolution depth $M$ decomposes a signal $f(x)$ into a set of signal segments at different scales:

$$f(x) \rightarrow \{S_{2^{M}} f,\ W_{2^{M}} f,\ W_{2^{M-1}} f, \ldots, W_{2^{1}} f\}. \qquad (11)$$

Let $\{S_{2^{m}} f : m = 0, 1, \ldots, M\}$ represent a set of approximations of $f(x)$ at the resolution levels $\{m = 0, 1, \ldots, M\}$. $S_{2^{m}} f$ is defined as

$$S_{2^{m}} f = \left(\langle f(x), \phi_{m,n}(x)\rangle\right)_{n \in \mathbb{Z}}, \qquad (12)$$

where $Sf(m,n) = \langle f(x), \phi_{m,n}(x)\rangle$, $m, n \in \mathbb{Z}$. Obviously, $S_{2^{M}} f$ is the approximation of $f(x)$ at the lowest resolution level $M$, and $S_{2^{0}} f = f$ is the original signal. Therefore, $\{S_{2^{m}} f : m = 0, 1, \ldots, M\}$ forms a multiresolution pyramid where resolution increases as the layer index decreases. All layers in the pyramid structure are highly correlated, since the higher layers of the pyramid constitute a subset of the lower layers, i.e., $S_{2^{m}} f \subset S_{2^{m-1}} f$. In order to remove the interlayer redundancies, the pyramid $\{S_{2^{M}} f,\ W_{2^{m}} f : m = 1, \ldots, M\}$ is defined by

$$W_{2^{m}} f = \mathrm{Diff}\,(S_{2^{m-1}} f,\ S_{2^{m}} f), \quad m = 1, \ldots, M, \qquad (13)$$

$$S_{2^{m}} f = \mathrm{Add}\,(W_{2^{m+1}} f,\ S_{2^{m+1}} f), \quad m = 0, \ldots, M-1, \qquad (14)$$

which carries the information from layer $m$ to $m-1$ (finer approximation) or conveys the lost information from layer $m-1$ to $m$ (coarser approximation). The Diff( ) and Add( ) operations in (13) and (14) correspond to the expansions of the scaling function and wavelet functions, respectively [20]. The extension of the 1D wavelet transform to 2D is straightforward. A separable wavelet transform is one whose 2D scaling function $\phi(x, y)$ can be

written as

$$\phi(x, y) = \phi(x)\,\phi(y). \qquad (15)$$

Mallat [10] showed that the 2D wavelet at a given resolution $2^{m}$ can be completely represented by three separable orthogonal 2D wavelet basis functions in $L^{2}(\mathbb{R}^{2})$:

$$\psi^{1}_{m,n,l}(x, y) = \phi_{m,n}(x)\,\psi_{m,l}(y), \qquad (16)$$

$$\psi^{2}_{m,n,l}(x, y) = \psi_{m,n}(x)\,\phi_{m,l}(y), \qquad (17)$$

$$\psi^{3}_{m,n,l}(x, y) = \psi_{m,n}(x)\,\psi_{m,l}(y), \qquad (18)$$

where $n$ and $l$ are the shift parameters for the x- and y-directions, respectively. Therefore, a 2D dyadic wavelet transform of an image $f(x, y)$ between the scales $2^{1}$ and $2^{M}$ can be represented as a sequence of subimages

$$\{S_{2^{M}} f,\ [W^{j}_{2^{m}} f]_{1 \le m \le M,\ 1 \le j \le 3}\}, \qquad (19)$$

where $S_{2^{M}} f$ is the approximation of the image $f(x, y)$ at the lowest resolution $2^{M}$ and the three detail subimages at resolution $2^{m}$ are defined by

$$W^{j}_{2^{m}} f = \left(\langle f(x, y), \psi^{j}_{m,n,l}(x, y)\rangle\right)_{n, l \in \mathbb{Z}}, \quad 1 \le j \le 3. \qquad (20)$$

The 2D separable wavelet decomposition can be implemented first on the columns and then on the rows independently. Fig. 2 shows the data structure of the wavelet decomposition. $g(n) = (-1)^{n} h(-n+1)$ and $h(n) = \sqrt{2}\int \phi(x - n)\,\phi(2x)\,dx$ denote the 1D high-pass and low-pass filters, respectively. $\tilde{h}(n)$ and $\tilde{g}(n)$ are the conjugate filters of $h(n)$ and $g(n)$, respectively. The filter pair H and G, realized by $h(n)$ and $g(n)$, corresponds to the expansions of the scaling and wavelet functions, respectively. This decomposition provides subimages corresponding to different resolution levels and orientations, as shown in Fig. 3 for resolution depth 2 (M = 2). The wavelet reconstruction scheme of the image is illustrated in Fig. 4. Zhang and Zafar [20] have shown that the decomposed image forms a pyramid structure of up to M layers, with three detail subimages in each layer and one lowest-resolution subimage on top.

Fig. 2. Wavelet decomposition of an image $S_{2^{j+1}} f$ into $S_{2^{j}} f$, $W^{1}_{2^{j}} f$, $W^{2}_{2^{j}} f$ and $W^{3}_{2^{j}} f$.

Fig. 3. Frequency band distribution of the wavelet decomposition.

The pyramid structure of the 2D wavelet decomposition with resolution depth 2 is depicted in Fig. 5,

which consists of a total of seven subimages, $\{S_{2^{2}}, W^{1}_{2^{2}}, W^{2}_{2^{2}}, W^{3}_{2^{2}}, W^{1}_{2^{1}}, W^{2}_{2^{1}}, W^{3}_{2^{1}}\}$. The resolution decreases by a factor of 4 (2 in the horizontal direction and 2 in the vertical direction) with each additional layer.

Fig. 4. Wavelet reconstruction of an image $S_{2^{j+1}} f$ from $S_{2^{j}} f$, $W^{1}_{2^{j}} f$, $W^{2}_{2^{j}} f$ and $W^{3}_{2^{j}} f$.

Fig. 5. The pyramid structure of the wavelet decomposition and reconstruction.

After the wavelet decomposition, an image is divided into several layers of different importance. Subimages at different layers correspond to different resolutions and different frequency ranges, which match the frequency-selective properties of the human visual system. It is well known that human viewers are more sensitive to lower-frequency than to higher-frequency image components. Additionally, the energies after wavelet decomposition become highly nonuniform: the higher the layer, the higher the energy. For example, over 80% of the energy is concentrated in the subimage $S_{2^{2}}$. For the implementation of mixed-resolution coding, the low-resolution right image can be obtained by performing the wavelet decomposition.
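The column-then-row separable filtering described above can be sketched for one level as follows, again with Haar filters standing in for the paper's unspecified filter bank. One level maps an image to the approximation S and three oriented detail subimages:

```python
import numpy as np

def haar_step(a, axis):
    """Low- and high-pass filtering plus dyadic subsampling along one axis."""
    a = np.moveaxis(np.asarray(a, dtype=float), axis, 0)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def dwt2_level(image):
    """One level of the separable 2D wavelet decomposition of Fig. 2:
    filter the columns, then the rows, yielding the approximation S
    and three oriented detail subimages W1, W2, W3."""
    lo_c, hi_c = haar_step(image, axis=0)   # columns first
    S,  W1 = haar_step(lo_c, axis=1)        # then rows
    W2, W3 = haar_step(hi_c, axis=1)
    return S, W1, W2, W3

img = np.arange(16.0).reshape(4, 4)
S, W1, W2, W3 = dwt2_level(img)
# Each subimage covers a quarter of the original area (2x2 here),
# and for an orthonormal filter pair the total energy is preserved.
```

Applying `dwt2_level` again to `S` yields the seven-subimage pyramid of a two-level decomposition.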

However, in order to achieve a still higher compression ratio, the next section presents a disparity-compensated multiresolution coding scheme which is able to compress the stereo image with the aid of low-resolution disparity estimation.

## 3. Disparity-compensated wavelet multiresolution coding for still stereo image compression

This section presents a disparity-compensated wavelet multiresolution coding scheme for still stereo images. The left image is compressed independently of the right image using a combination of the discrete wavelet transform (DWT) and vector quantization (VQ) [9,12,15]. Since the disparity can be used to predict the right image of a stereopair from the left image, only the disparity has to be transmitted, together with the reconstruction (prediction) error, or residual image, for the disparity-compensated right image in order to improve the reconstructed image quality. Instead of transmitting the right image directly, this section proposes a new inter-view hybrid DPCM/DWT/SQ scheme to obtain both the disparity and the prediction error (residual) image. Furthermore, following the mixed-resolution psychophysical experiments, the disparity is estimated from the left/right images at a low resolution, and the prediction error image is likewise obtained by the inter-view DPCM scheme from the low-resolution disparity-compensated right image.

### 3.1. Intraframe hybrid DWT/VQ scheme for still left images

Fig. 6 shows the basic architecture of the intraframe hybrid DWT/VQ scheme applied to still image compression for M = 2. The wavelet transform organizes the information of the original image into several resolutions. The image is split into 3×M detail subimages of wavelet coefficients (resolution levels 1 to M) and one subimage at the lowest resolution level M.
Since the statistics of the subimage at the lowest resolution are similar to those of the original image, a DPCM technique is used to encode it. Only the detail subimages of wavelet coefficients are vector quantized. Antonini et al. [1] have shown that the statistics of the detail (wavelet coefficient) subimages are well modeled by the generalized Gaussian distribution. They used the generalized Gaussian probability density to design the vector quantizer for each detail subimage.

Fig. 6. Intraframe hybrid DWT/VQ scheme for still left images.
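The DPCM coding of the lowest-resolution subimage can be sketched as follows. The raster-scan previous-pixel predictor and the uniform residual quantizer are illustrative assumptions; the paper does not specify the predictor or step size here:

```python
import numpy as np

def dpcm_encode(sub, step=3):
    """Raster-scan DPCM of the lowest-resolution subimage: predict each
    pixel by the previously *reconstructed* one and quantize the residual
    uniformly.  (Previous-pixel prediction is an illustrative choice.)"""
    flat = np.asarray(sub, dtype=float).ravel()
    indices = np.empty(flat.size, dtype=int)
    recon = np.empty_like(flat)
    pred = 0.0
    for i, v in enumerate(flat):
        e = v - pred
        q = int(round(e / step))       # quantizer index, to be entropy-coded
        recon[i] = pred + q * step     # decoder can form the same value
        indices[i] = q
        pred = recon[i]
    return indices, recon.reshape(np.shape(sub))
```

Because the prediction is formed from already-reconstructed values, a decoder that receives only the quantizer indices stays in lockstep with the encoder.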

Vector quantization has proven to be a powerful tool for digital image compression [9,12,15]. The principle involves encoding a sequence of samples (a vector) rather than encoding each sample individually. Encoding is performed by approximating the sequence to be coded by a vector belonging to a catalogue of shapes, usually known as a codebook. Let $x$ be an $n$-component source vector with joint probability density function (pdf) $f_X(x) = f_X(x_1, x_2, \ldots, x_n)$. A vector quantizer VQ of dimension $n$ and size $L$ is defined as a function that maps an arbitrary vector $x \in \mathbb{R}^{n}$ into one of $L$ output or reproduction vectors $y_1, y_2, \ldots, y_L$, called codewords, belonging to $\mathbb{R}^{n}$. Thus we have the map

$$\mathrm{VQ}: \mathbb{R}^{n} \rightarrow Y, \qquad (21)$$

where $Y = \{y_1, y_2, \ldots, y_L\}$ is the set of reproduction vectors, called the codebook, and $\mathrm{VQ}(x) = y_i$ if $x \in C_i$, where $C_i$ is the region corresponding to $y_i$. The vector quantizer is completely specified by listing the $L$ codewords $y_i$ and their corresponding nonoverlapping partitions $C_i$ ($i = 1, 2, \ldots, L$) of $\mathbb{R}^{n}$, called Voronoi regions. A Voronoi region is defined by

$$C_i = \{x \in \mathbb{R}^{n} :\ \|x - y_i\| \le \|x - y_j\|,\ \forall j \ne i\}, \qquad (22)$$

and represents the subset of vectors of $\mathbb{R}^{n}$ that are well matched by the codeword $y_i$ of the codebook; $\|\cdot\|$ denotes the $L_2$ norm. The total distortion per dimension of this quantizer is then given by

$$d_n = \frac{1}{n} E\{\|x - \mathrm{VQ}(x)\|^{2}\} = \frac{1}{n} \sum_{i=1}^{L} \int_{x \in C_i} \|x - y_i\|^{2} f_X(x)\, dx. \qquad (23)$$

For the purposes of transmission or storage, a binary word $c_i$ of length $b_i$ bits, called the index of the codeword, is assigned to each reproduction vector $y_i$. Thus, vector quantization can also be seen as a combination of two functions: an encoder that views the input vector $x$ and generates the index of the reproduction vector specified by $\mathrm{VQ}(x)$, and a decoder that uses this index to generate the reproduction vector $y_i$ via the same codebook as the coder. The average binary word length is given by the formula
$$H(y) = -\sum_{i=1}^{L} p(y_i) \log_2 p(y_i)\ \text{bits/vector}, \qquad (24)$$

which is the so-called entropy measure of the codebook; it specifies the minimum bit-rate necessary to achieve a distortion $d_n$. The codebook is created and optimized using the well-known Linde–Buzo–Gray (LBG) [9] classification algorithm with the mean-square-error criterion (23). This algorithm performs a classification based on a training set comprising vectors belonging to different images; it converges iteratively toward a locally optimal codebook. In addition, a globally optimal codebook can be obtained using simulated annealing techniques [15]. However, global codebook design has drawbacks: it results in edge smoothing (loss of resolution), and it is not easy to take into account the properties of the human visual system. To tackle this difficulty, a multiresolution codebook [1] has been proposed to preserve edges. The multiresolution codebook contains a number of subcodebooks for the wavelet coefficient (detail) subimages from resolution level 1 to M. Each subcodebook is designed for the detail subimages at its corresponding resolution level. A subcodebook has a low distortion level and contains few words, which clearly facilitates the search for the best code vector; the coding computational load is thereby reduced. In the multiresolution codebook, each detail subimage is associated with one subcodebook generated by LBG, and all the reproduction vectors for this subimage are quantized based on this subcodebook. Hence, before applying the Huffman entropy coding, binary code words of equal length are used to represent the quantizer output. Antonini et al. [1] have proposed an optimal bit allocation method, which takes into consideration the fact that the human eye is not equally sensitive to subimages at all spatial frequencies.
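A minimal sketch of the LBG training loop and nearest-codeword encoding follows. The function names are illustrative, and a fixed iteration count stands in for the usual convergence test on the distortion:

```python
import numpy as np

def lbg_codebook(vectors, size, iters=20, seed=0):
    """Train a subcodebook with the LBG iteration: assign each training
    vector to its nearest codeword (its Voronoi region, Eq. (22)), then
    replace each codeword by the centroid of its region."""
    rng = np.random.default_rng(seed)
    vectors = np.asarray(vectors, dtype=float)
    book = vectors[rng.choice(len(vectors), size, replace=False)].copy()
    for _ in range(iters):
        d2 = ((vectors[:, None, :] - book[None, :, :]) ** 2).sum(-1)
        nearest = d2.argmin(axis=1)
        for i in range(size):
            members = vectors[nearest == i]
            if len(members):               # keep empty cells unchanged
                book[i] = members.mean(axis=0)
    return book

def vq_encode(vectors, book):
    """Map each vector to the index of its nearest codeword."""
    d2 = ((np.asarray(vectors, float)[:, None, :] - book[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

In the multiresolution codebook, one such subcodebook would be trained per detail subimage, on training vectors drawn from that resolution level only.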
According to the contrast sensitivity data collected by Campbell and Robson [3], a controlled degree of noise shaping across the subimages $W^{j}_{2^{m}}$ is defined as follows:

$$B_{m,j} = r_m \log\!\big(\sigma_{m,j}^{\,b_{m,j}}\big), \quad 1 \le m \le M,\ 1 \le j \le 3, \qquad (25)$$

where $\sigma_{m,j}$ is the standard deviation corresponding to $W^{j}_{2^{m}}$, and the values of $r_m$ and $b_{m,j}$ are chosen experimentally in order to match human vision. To apply noise shaping across the VQ subimages, the total distortion of the image for a total target bit-rate $R_T$ is given by

$$D_T(R_T) = \frac{1}{2^{2M}}\, D^{M}_{\mathrm{SQ+DPCM}}\big(R^{M}_{\mathrm{SQ+DPCM}}\big) + \sum_{m=1}^{M}\sum_{j=1}^{3} \frac{1}{2^{2m}}\, D_{m,j}(R_{m,j})\, B_{m,j}, \qquad (26)$$

where $D^{M}_{\mathrm{SQ+DPCM}}(R^{M}_{\mathrm{SQ+DPCM}})$ corresponds to the DPCM MSE distortion with a scalar quantizer of $R^{M}_{\mathrm{SQ+DPCM}}$ bits per pixel, and $D_{m,j}(R_{m,j})$ denotes the MSE distortion in the VQ coding of the subimage $W^{j}_{2^{m}}$ at $R_{m,j}$ bits per pixel. Note that $D^{M}_{\mathrm{SQ+DPCM}} = G^{-1} D^{M}_{\mathrm{SQ}}$, where $G$ denotes the prediction gain and $D^{M}_{\mathrm{SQ}}$ denotes the PCM MSE distortion. Therefore, the problem of finding an optimal bit assignment for each subimage vector quantizer is formulated as

$$\min_{R_{m,j}}\ D_T(R_T) \qquad (27)$$

subject to

$$R_T = \frac{1}{2^{2M}}\, R^{M}_{\mathrm{SQ+DPCM}} + \sum_{m=1}^{M}\sum_{j=1}^{3} \frac{1}{2^{2m}}\, R_{m,j}. \qquad (28)$$

This minimization problem can be solved using Lagrange multipliers. The optimal bit allocation for M = 2 is carried out according to the suggestion of Antonini et al. [1]; the resulting bit assignment is illustrated in Fig. 7. Subimage $W^{3}_{2^{1}}$ (diagonal orientation) is discarded. Subimages $W^{1}_{2^{1}}$, $W^{2}_{2^{1}}$ and $W^{3}_{2^{2}}$ are coded using 256-vector subcodebooks (codeword size 4×4), resulting in a 0.5 b/pixel rate, while subimages $W^{1}_{2^{2}}$ and $W^{2}_{2^{2}}$ are coded at a 2 b/pixel rate using 256-vector subcodebooks (codeword size 2×2). Finally, the lowest-resolution subimage is DPCM coded at 8 b/pixel.

Fig. 7. Bit-rate allocation for subimages in the intraframe hybrid DWT/VQ scheme for still color left images.

### 3.2. Inter-view hybrid DPCM/DWT/SQ scheme

The aim of disparity estimation is the matching of corresponding picture elements in two simultaneous two-dimensional (2D) pictures of the same 3D scene, viewed under different perspective angles. Two of those pictures may be the left and right

views of a stereopair. A number of block-based disparity-compensated methods have been proposed for coding stereopairs [17,19,20]. With block methods, it is assumed that the disparity between the left and right images is constant within a small two-dimensional block $B_d$ of pels. Therefore, the disparity $d$ can be estimated by minimizing the $\ell_1$ norm of the disparity prediction error,

$$\mathrm{DPE}(d) = \sum_{z \in B_d} |I_L(z) - I_R(z - d)|. \qquad (29)$$

Consider a block of $P_d \times Q_d$ pels centered around a pel $z = (z_1, z_2)$ in the left image (the reference image). Assume that the maximum horizontal and vertical disparities are $p_d$ pels and $q_d$ pels, respectively. Thus, the search region in the right image is an area containing $(P_d + 2p_d)(Q_d + 2q_d)$ pels. A simplified version of the criterion of (29) is given by

$$\mathrm{DPE}(z, x, y) = \frac{1}{P_d Q_d} \sum_{p \le P_d} \sum_{q \le Q_d} |I_L(z_1 + p,\ z_2 + q) - I_R(z_1 + p + x,\ z_2 + q + y)|, \qquad (30)$$

where $-p_d \le x \le p_d$, $-q_d \le y \le q_d$, and $z_1$ and $z_2$ are the x- and y-coordinates of $z$, respectively. The minimization of the disparity prediction error of (30) can be performed using any one of the four promising methods used in motion estimation [13]: (i) full search, (ii) 2D-logarithmic search, (iii) three-step search, and (iv) modified conjugate direction. According to the mixed-resolution psychophysical experiments, the block-based disparity estimation is performed at low resolution. The computational complexity of the low-resolution disparity estimation becomes smaller owing to the smaller search area at the lowest resolution. At the receiver, the low-resolution right subimage is estimated using the disparity from the low-resolution left subimage. A full-size reconstruction is obtained by upsampling by a factor of 4 and reconstructing with the synthesis low-pass filter. However, in order to improve the reconstruction quality of the low-resolution right image, the disparity has to be transmitted, together with the reconstruction error.
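A full-search minimization of Eq. (30) can be sketched as follows. The function and variable names are illustrative; `z` is taken as a (row, column) anchor, with `x` the horizontal and `y` the vertical candidate offset:

```python
import numpy as np

def block_disparity(left, right, z, P=8, Q=8, pd=4, qd=2):
    """Full-search block matching for Eq. (30): find the offset (x, y),
    |x| <= pd and |y| <= qd, minimizing the mean absolute prediction
    error between the P x Q left-image block anchored at z and the
    displaced block of the right image."""
    r0, c0 = z
    ref = left[r0:r0 + Q, c0:c0 + P].astype(float)
    best, best_xy = np.inf, (0, 0)
    for y in range(-qd, qd + 1):
        for x in range(-pd, pd + 1):
            r, c = r0 + y, c0 + x
            if r < 0 or c < 0 or r + Q > right.shape[0] or c + P > right.shape[1]:
                continue  # candidate block falls outside the image
            cand = right[r:r + Q, c:c + P].astype(float)
            dpe = np.abs(ref - cand).mean()
            if dpe < best:
                best, best_xy = dpe, (x, y)
    return best_xy, best
```

The 2D-logarithmic, three-step, and conjugate-direction searches visit only a subset of these candidate offsets, trading a possible loss of optimality for speed.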
In this section, we present an inter-view hybrid DPCM/DWT/SQ scheme (shown in Fig. 8) to both improve the reconstructed image quality and achieve a higher compression ratio.

Fig. 8. Inter-view hybrid DPCM/DWT/SQ scheme.

An original right image $S$ is first decomposed into a number of subimages $\{S_{2^{M}}, W^{j}_{2^{m}};\ m = 1, \ldots, M,\ j = 1, 2, 3\}$. Only the lowest-resolution

subimage $S_{2^{M}}$ is considered in our system. Using the block-based disparity compensation scheme, the disparity-compensated right subimage $\hat{S}_{2^{M}}$ is estimated from its corresponding low-resolution left subimage using the low-resolution disparity. Then the lowest-resolution prediction error subimage $\mathrm{PED}_M$ for the disparity-compensated subimage $\hat{S}_{2^{M}}$ is formed. Since the pixels in $\mathrm{PED}_M$ are largely uncorrelated, the lowest-resolution reconstruction error subimage $\mathrm{PED}_M$ is coded using an optimal Lloyd–Max scalar quantizer instead of DPCM. Furthermore, in order to increase the compression ratio, a DCT-based compression with DPCM coding of the DC term and scalar quantization of the AC terms could be applied to $\mathrm{PED}_M$. However, Zhang and Zafar [20] have shown that DCT-based compression may introduce undesired blocking artifacts. Thus, in this paper, we consider only the scalar quantizer for $\mathrm{PED}_M$. Finally, the disparity vectors are DPCM-coded, and all quantities are entropy-coded prior to transmission according to a variable-length code table quite similar to Table B.10 for motion vectors in the MPEG-2 standard [6].

## 4. Joint motion/disparity-compensated wavelet multiresolution coding for stereo image sequence compression

Stereo image sequence processing requires the estimation of the displacements created by the motion of objects and also by the disparity between the two views of the 3D scene projected onto the two images. The estimation of the motion and disparity displacements may be performed separately. The motion estimation problem is similar to that of disparity estimation.
The associated motion prediction error for the left (reference) image sequence is given by

$$\mathrm{MPE}(\Delta) = \sum_{z \in B_m} |I_{L,k}(z) - I_{L,k-1}(z - \Delta)|, \qquad (31)$$

where $I_{L,k}(z)$ denotes video frame $k$ at $z = (x, y)$ of the left image sequence, $\Delta$ represents the motion vector, and $B_m$ is the 2D block for motion estimation, of size $P_m \times Q_m$ pels. Assume that the maximum horizontal and vertical displacements are $p_m$ and $q_m$, respectively. Thus, the search area in the previous frame contains $(P_m + 2p_m)(Q_m + 2q_m)$ pels. Similarly, the simplified version of (31) can be written as

$$\mathrm{MPE}(z, x, y) = \frac{1}{P_m Q_m} \sum_{p \le P_m} \sum_{q \le Q_m} |I_{L,k}(z_1 + p,\ z_2 + q) - I_{L,k-1}(z_1 + p + x,\ z_2 + q + y)|, \qquad (32)$$

where $-p_m \le x \le p_m$, $-q_m \le y \le q_m$, and $z = (z_1, z_2)$. The motion vector can be found by applying any one of the promising search methods [13] to the minimization of (32). In order to further reduce both the computational complexity and the searching time of the above four methods, an efficient hierarchical block matching algorithm [17,20] has been applied to both motion and disparity estimation, in which agreement of a large block is first attained and the block size is subsequently and progressively decreased. One approach to hierarchical block matching uses multiple resolution versions of the image, with a variable block size at each level of the pyramid [17,20]. In a multiresolution motion estimation (MRME) scheme, the motion vector field is first calculated for the lowest-resolution subimage, which sits on the top of the pyramid [17,20]. Motion vectors at lower layers of the pyramid are refined using the motion information obtained at higher layers and propagated to the next pyramid level until the highest-resolution level is reached. The motivation for using the MRME approach is the inherent structure of the wavelet representation. MRME schemes significantly reduce the searching and matching time and provide a smooth motion vector field. A video frame is decomposed into two levels.
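The coarse-to-fine refinement just described can be sketched for two levels as follows. The helper, block sizes, search ranges, and the mean-absolute-error criterion are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def full_search(prev, cur, z, bs, search, init=(0, 0)):
    """Exhaustive block matching: find the offset (dy, dx) around `init`
    minimizing the mean absolute error between the bs x bs block of
    `cur` anchored at z and the displaced block of `prev`."""
    r0, c0 = z
    block = cur[r0:r0 + bs, c0:c0 + bs].astype(float)
    best, best_v = np.inf, init
    for dy in range(init[0] - search, init[0] + search + 1):
        for dx in range(init[1] - search, init[1] + search + 1):
            r, c = r0 + dy, c0 + dx
            if 0 <= r and 0 <= c and r + bs <= prev.shape[0] and c + bs <= prev.shape[1]:
                err = np.abs(block - prev[r:r + bs, c:c + bs].astype(float)).mean()
                if err < best:
                    best, best_v = err, (dy, dx)
    return best_v

def mrme_two_level(prev_hi, cur_hi, prev_lo, cur_lo, z_hi, bs=4):
    """Two-level multiresolution motion estimation: estimate a coarse
    vector on the low-resolution pair, double it, then refine it with
    a small +/-1 search at the finer level."""
    z_lo = (z_hi[0] // 2, z_hi[1] // 2)
    coarse = full_search(prev_lo, cur_lo, z_lo, bs // 2, search=3)
    init = (2 * coarse[0], 2 * coarse[1])            # scale by 2, cf. Eq. (33)
    return full_search(prev_hi, cur_hi, z_hi, bs, search=1, init=init)
```

Because the refinement window is only ±1 pixel, most of the search cost is paid at the quarter-size level, which is the point of the multiresolution scheme.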
A total of seven subimages are obtained, with three subimages at the first level and four on the top level, including the subimage $S_{2^{2}}$ with the lowest frequency band. It is well known that human vision is more sensitive to errors in low frequencies than to those incurred in higher bands, and is selective in spatial orientation and position; e.g., errors in smooth areas are more disturbing to a viewer than

those near edges. The subimage $S_{2^{2}}$ contains a large percentage of the total energy, though it is only 1/16th of the original video frame size. Additionally, errors in higher-layer subimages are propagated and expanded into all subsequent lower-layer subimages. To tackle this difficulty, Zhang and Zafar [20] have proposed a variable block-size MRME scheme that takes all these factors into consideration. Assume that $b_M$ is the size of the block used at the lowest resolution, or highest layer, M. The size of the block varies with the resolution level: at level $m$ it is $b_M \times 2^{M-m}$, and at level $m+1$ the block becomes $b_M \times 2^{M-m-1}$ in pixels of that level. If $\Delta^{m+1}_i(x, y)$ represents the motion vector centered at $(x, y)$ for the subimage $W^{i}_{2^{m+1}}$ belonging to the $(m+1)$th layer, the initial estimate $\Delta^{m}_i$ of the motion vector for the same block at the $m$th level of the pyramid is

$$\Delta^{m}_i(x, y) = 2\Delta^{m+1}_i(x, y), \quad i = 1, 2, 3. \qquad (33)$$

At the $m$th pyramid level, a full search is used within a small search area about the displaced position $z + \Delta^{m}_i(x, y)$. In other words, the motion vectors at level $m$ are given by

$$\Delta^{m}_i(x, y) = 2\Delta^{m+1}_i(x, y) + (\Delta x, \Delta y) = 2^{M-m}\Delta^{M}(x, y) + (\Delta x, \Delta y), \qquad (34)$$

where $\Delta^{M}(x, y)$ denotes the motion vector for the subimage $S_{2^{M}}$ and $(\Delta x, \Delta y)$ is the incremental motion vector found by a full search with a reduced search area. Fig. 9 shows an example of the proposed variable block-size MRME scheme. First, the motion vectors $\Delta^{M}$ for the highest-layer subimage $S_{2^{M}}$ are estimated by a full search with a block size $b_M$ of 2×2. These motion vectors are then scaled appropriately to be used as initial estimates for motion estimation in the higher-resolution subimages. An alternative approach for finding the incremental motion vector at level $m$ is to use $2^{M-m}$ times the motion vector of level M as the initial estimate for a full search with a relatively small search area. Similarly, Tzovaras et al.
[17] have shown that the above variable block-size multiresolution block matching techniques are also valid for disparity estimation, in order to reduce the amount of processing time. This particular disparity estimation scheme is called variable block-size multiresolution disparity estimation (MRDE).

Fig. 9. Variable block-size multiresolution motion estimation (MRME) of the left image sequence.

The main difference between MRME and MRDE is that the disparity prediction error of MRDE is given by

$$\mathrm{DPE}_k(z, x, y) = \frac{1}{P_d Q_d} \sum_{p \le P_d} \sum_{q \le Q_d} |I_{L,k}(z_1 + p,\ z_2 + q) - I_{R,k}(z_1 + p + x,\ z_2 + q + y)|, \qquad (35)$$

where $I_{L,k}(z)$ and $I_{R,k}(z)$ are the $k$th left and right video frames at position $z$. Note that the MRDE is performed on the left and right frames at the same time instant. Similarly, following the mixed-resolution psychophysical experiments, this section presents a low-resolution MRDE scheme for disparity estimation and compensation, in order to improve the computational efficiency of the estimation. The low-resolution MRDE scheme illustrated in Fig. 10 requires a two-level wavelet decomposition. The first-level wavelet decomposition is used to obtain the lowest-resolution left and right video frames. The second-level wavelet decomposition is conducted to perform the MRDE at the lowest resolution. Fig. 11 shows the block flow diagram of the joint motion/disparity-compensated wavelet multiresolution coding for stereo image sequences. However, the quality of the reconstructed stereo image sequence may be degraded using only the

motion and disparity vectors in some cases. In order to improve the reconstructed image quality, the motion/disparity vectors have to be transmitted together with the prediction errors. The next two sections propose two novel schemes to achieve this goal.

Fig. 10. Variable block-size multiresolution disparity estimation (MRDE) of the low-resolution right/left image sequences.

4.1. Interframe hybrid DPCM/DWT/VQ scheme for the left image sequence using a variable block-size MRME scheme

The left image stream is compressed using the hybrid DPCM/DWT/VQ scheme illustrated in Fig. 12, independent of the right image stream. Using wavelet decomposition, an original image $S$ is first decomposed into subimages $\{S_M, W_m^j;\ m = 1, \ldots, M,\ j = 1, 2, 3\}$. After applying the block-based motion compensation, the prediction error subimages $\{\mathrm{PEM}_M, \mathrm{PEM}_m^j;\ m = 1, \ldots, M,\ j = 1, 2, 3\}$ for the motion-compensated subimages $\{\hat{S}_M, \hat{W}_m^j;\ m = 1, \ldots, M,\ j = 1, 2, 3\}$ are obtained. The lowest-resolution prediction error subimage $\mathrm{PEM}_M$ is PCM-coded using an optimum scalar quantizer instead of DPCM, since the pixels in $\mathrm{PEM}_M$ are uncorrelated. However, the prediction error

Fig. 11. Encoder-decoder structure for the joint motion/disparity-compensated stereoscopic image sequence compression scheme.
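As a concrete illustration of the coarse-to-fine search of Eqs. (33) and (34), the sketch below estimates a motion field level by level: the coarser vector is doubled to seed the finer level, then refined by a small full search. It is a minimal stand-in, not the authors' implementation — a plain 2×2-average pyramid replaces the wavelet subimages, the matching cost is SAD, and all function and variable names are ours.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return float(np.abs(a.astype(np.float64) - b.astype(np.float64)).sum())

def full_search(cur, ref, top_left, bsize, center, radius):
    """Full search in `ref` around displacement `center` for the block of `cur` at `top_left`."""
    y0, x0 = top_left
    block = cur[y0:y0 + bsize, x0:x0 + bsize]
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y0 + center[0] + dy, x0 + center[1] + dx
            if 0 <= ry and 0 <= rx and ry + bsize <= ref.shape[0] and rx + bsize <= ref.shape[1]:
                cost = sad(block, ref[ry:ry + bsize, rx:rx + bsize])
                if best is None or cost < best:
                    best, best_mv = cost, (center[0] + dy, center[1] + dx)
    return best_mv

def mrme(cur_pyr, ref_pyr, bsize=2, radius=1):
    """Coarse-to-fine motion estimation, Eqs. (33)-(34): v_m = 2*v_{m+1} + increment.
    cur_pyr/ref_pyr list the pyramid from coarsest to finest; returns the finest field."""
    mv = {}
    for level, (cur, ref) in enumerate(zip(cur_pyr, ref_pyr)):
        prev = mv.get(level - 1)  # motion field of the next-coarser level, if any
        field = {}
        for y in range(0, cur.shape[0] - bsize + 1, bsize):
            for x in range(0, cur.shape[1] - bsize + 1, bsize):
                if prev is None:
                    init = (0, 0)
                else:
                    # Eq. (33): double the co-located coarser vector as the initial estimate
                    key = (y // 2 // bsize * bsize, x // 2 // bsize * bsize)
                    init = tuple(2 * v for v in prev[key])
                # Eq. (34): refine with a small full search about the displaced position
                field[(y, x)] = full_search(cur, ref, (y, x), bsize, init, radius)
        mv[level] = field
    return mv[len(cur_pyr) - 1]
```

For a frame shifted vertically by two pixels, the coarse level finds a one-pixel shift, which the finer level doubles and confirms with a radius-1 refinement.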

Fig. 12. Interframe hybrid DPCM/DWT/VQ scheme for the left image sequence using a variable block-size MRME.

subimages $\mathrm{PEM}_m^j$ are vector quantized using a multiresolution codebook, where each error image has its corresponding subcodebook. Fig. 13 shows the bit assignment for $\mathrm{PEM}_M$ and $\mathrm{PEM}_m^j$, $j = 1, 2, 3$; $1 \leq m \leq M$ and $M = 2$. The motion vectors are DPCM-coded, and all quantities are then entropy-coded prior to transmission. Furthermore, the variable block-size MRME scheme is applied to perform the motion estimation in order to improve the computational efficiency of the proposed interframe compression scheme. For color video frames, the motion vectors obtained from the luminance component are used as the motion vectors for both the Cr and Cb chrominance components. Since the luminance component contains more than 60% of the total energy of the original image and the Cr and Cb chrominance components carry less than 20% of the total energy, the size of the Cr or Cb component is 1/4th of the original video frame size. Thus, each chrominance component is decomposed into four subimages. Fig. 13(b, c) shows the bit assignment for each chrominance component.

4.2. Inter-view hybrid DPCM/DWT/VQ scheme for the low-resolution right image sequence using a variable block-size MRDE technique

All the low-resolution right video frames can be estimated from their corresponding left video frames using the low-resolution variable block-size MRDE procedure described above. In order to improve the image quality of the reconstructed right video frames, both the disparity and reconstruction (prediction) errors should be transmitted. We use an architecture similar to the inter-view hybrid DPCM/DWT/SQ scheme proposed in Section 3.2 to implement this concept. Fig. 14 shows its basic structure. The main difference between Figs. 8 and 14 is that Fig.
14 contains a two-level wavelet decomposition and a variable block-size multiresolution disparity estimation. Both the

Fig. 13. Bit-rate allocation for prediction error subimages, PEM, in the interframe hybrid DPCM/DWT/VQ scheme using MRME.

Fig. 14. Inter-view hybrid DPCM/DWT/VQ scheme for low-resolution right and left image sequences using MRDE.
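The disparity criterion of Eq. (35) is the same SAD-type cost, evaluated between the left and right frames of one time instant. The minimal sketch below (our own names; a small 2-D full search stands in for the multiresolution schedule) makes the criterion concrete:

```python
import numpy as np

def dpe(left, right, z, x, y, b):
    """Disparity prediction error of Eq. (35): mean absolute difference between the
    b-by-b block of the left frame at z = (zy, zx) and the right-frame block
    displaced by (x, y)."""
    zy, zx = z
    lb = left[zy:zy + b, zx:zx + b].astype(np.float64)
    rb = right[zy + y:zy + y + b, zx + x:zx + x + b].astype(np.float64)
    return float(np.abs(lb - rb).sum()) / (b * b)

def estimate_disparity(left, right, z, b=2, radius=2):
    """Full-search disparity for one block: the (x, y) minimizing Eq. (35)."""
    zy, zx = z
    best, best_d = None, (0, 0)
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            # keep the displaced block inside the right frame
            if 0 <= zy + y and 0 <= zx + x and zy + y + b <= right.shape[0] and zx + x + b <= right.shape[1]:
                cost = dpe(left, right, z, x, y, b)
                if best is None or cost < best:
                    best, best_d = cost, (x, y)
    return best_d
```

For rectified stereo pairs, disparity is predominantly horizontal, so in practice the vertical search radius can be kept very small.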

lowest-resolution left and right video frames are obtained by the first-level wavelet decomposition. The resulting low-resolution left and right video frames are again decomposed into the second-level lowest-resolution subimage and the wavelet subimages at different scales and resolutions. The second-level prediction error subimages $\{\mathrm{PED}_M, \mathrm{PED}_m^j;\ m = 1, \ldots, M,\ j = 1, 2, 3\}$ for the low-resolution disparity-compensated subimages are formed using the low-resolution variable block-size multiresolution disparity compensation (MRDE) scheme. Like the motion-compensated interframe compression scheme, $\mathrm{PED}_M$ is coded using an optimum scalar quantizer, and the subimages $\mathrm{PED}_m^j$ are coded using a multiresolution vector quantizer. Fig. 15(a) shows the bit assignment for $\mathrm{PED}_M$ and $\mathrm{PED}_m^j$. For color stereo video frames, the disparity vectors obtained from the luminance components are used as the disparity vectors for the Cr and Cb chrominance components. Fig. 15(b, c) shows the bit assignment for the Cr and Cb chrominance components, respectively. Certainly, the proposed inter-view DPCM/DWT/VQ scheme with variable block-size MRDE can be applied directly to still stereo images in place of the inter-view hybrid DPCM/DWT/SQ scheme in order to improve its computational efficiency.

5. Simulation results

To examine the performance of the proposed compression methods, a still stereoscopic color test image "Achoo", with the left and right pictures illustrated in Fig. 16, is considered in our system. Each picture is a 640×480 color image with eight bits for each of the R, G and B color components. Since the RGB color components are strongly correlated, encoding a color image in the RGB domain is not very efficient for image compression.
By transforming the RGB signals to the Y, Cr, Cb domain, nearly decorrelated components are produced, and most of the signal energy is contained within the luminance (Y) signal. For simplicity, both the left and right pictures are decomposed into their lowest-resolution subimages and detail subimages using a Daubechies wavelet [4]. According to the mixed-resolution psychophysical experiments, the low-resolution disparity can be obtained from the luminance

Fig. 15. Bit-rate allocation for prediction error subimages, PED, in the inter-view hybrid DPCM/DWT/VQ scheme using MRDE.
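The RGB-to-YCrCb transformation mentioned above can be sketched with the standard ITU-R BT.601 coefficients. The paper does not state which conversion matrix it uses, so the coefficients below are a common assumption rather than the authors' exact choice:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an H x W x 3 RGB image to Y, Cb, Cr planes (BT.601, full range).
    Most of the signal energy ends up in Y, which is why the chrominance planes
    can be coded with fewer bits."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b          # luminance
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return y, cb, cr
```

On a neutral (gray or white) pixel the chroma planes sit at the mid-level 128, which is what makes them cheap to code for largely desaturated content.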

Fig. 16. Still color stereoscopic image "Achoo". (a) Left image; (b) right image.

Fig. 17. Wavelet decomposition of the right picture of Achoo.

Fig. 18. Low-resolution disparity field for the still stereoscopic image "Achoo".

components of the two resulting lowest-resolution pictures. However, in order to improve the computational efficiency of disparity estimation, a variable block-size MRDE scheme is applied to these two lowest-resolution pictures, which requires a second-level wavelet decomposition of both. Fig. 17 shows the image decomposition of the right picture after performing a two-level wavelet decomposition. The resulting low-resolution disparity field is illustrated in Fig. 18. For the left picture, its lowest-resolution subimage is DPCM-encoded, and its three detail subimages are vector quantized according to the bit assignment of Fig. 7. On the other hand, an inter-view hybrid DPCM/DWT/VQ scheme using MRDE is applied to the right picture instead of the original inter-view hybrid DPCM/DWT/SQ scheme. The proposed compression techniques are also valid for the Cr and Cb chrominance components, except that their disparities are obtained from the Y component. Finally, all quantities are entropy-coded. Throughout this paper, two measures are used to evaluate the compression performance: the compression ratio and the peak signal-to-noise ratios (PSNR) for the Y, Cr and Cb components. The compression ratio is defined by

$$\mathrm{CR} = \frac{\text{the number of bits in the original image}}{\text{the number of bits in the compressed image}}. \tag{36}$$
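The two-level decomposition shown in Fig. 17 splits each picture into a lowest-resolution subimage and three detail subimages per level. The paper uses a Daubechies wavelet [4]; purely to keep the example short, the sketch below uses the simpler Haar filters instead (an assumption, and the function names are ours), but the pyramid structure is the same:

```python
import numpy as np

def haar_level(im):
    """One level of a 2-D Haar-style DWT: returns (S, W1, W2, W3) =
    (approximation, horizontal, vertical, diagonal detail subimages)."""
    a = im[0::2, 0::2].astype(np.float64)  # top-left pixel of each 2x2 cell
    b = im[0::2, 1::2].astype(np.float64)  # top-right
    c = im[1::2, 0::2].astype(np.float64)  # bottom-left
    d = im[1::2, 1::2].astype(np.float64)  # bottom-right
    s  = (a + b + c + d) / 4.0   # lowest-resolution subimage
    w1 = (a - b + c - d) / 4.0   # horizontal detail
    w2 = (a + b - c - d) / 4.0   # vertical detail
    w3 = (a - b - c + d) / 4.0   # diagonal detail
    return s, w1, w2, w3

def haar_pyramid(im, levels=2):
    """Multilevel decomposition: recursively split the approximation subimage.
    Returns (S_M, details), where details[m] holds the three W subimages of level m+1."""
    details = []
    s = im.astype(np.float64)
    for _ in range(levels):
        s, w1, w2, w3 = haar_level(s)
        details.append((w1, w2, w3))
    return s, details
```

With two levels, a 640×480 frame yields a 160×120 lowest-resolution subimage, matching the 1/16th-size $S_M$ discussed earlier.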

Fig. 19. Left image reconstruction.

Fig. 20. Low-resolution right image reconstruction with upsampling.

Fig. 21. The 8th stereoscopic image frame of Spiral Ball. (a) Left image frame; (b) right image frame.

Fig. 22. Left image frame reconstruction.

Fig. 23. Low-resolution right image frame reconstruction.

Fig. 24. Motion vector field obtained from the 8th and 9th left image frames. (a) Lowest-resolution subimages $S$'s; (b)–(f) wavelet subimages $W$'s.

The peak signal-to-noise ratio in decibels (dB) is computed as

$$\mathrm{PSNR}_X = 20 \log_{10} \frac{I_X}{\mathrm{RMSE}_X}, \quad X = \mathrm{Y}, \mathrm{Cr}, \mathrm{Cb}, \tag{37}$$

where $I_X$ denotes the peak-to-peak value of the $X$ component of the input color image, and $\mathrm{RMSE}_X$ is the root mean-squared reconstruction error between the $X$ components of the input and reconstructed color images. It should be noted that the $\mathrm{PSNR}_X$ values for the low-resolution right picture are obtained at the low-resolution level.

By performing our still stereoscopic color image compression technique, the compression ratio for the color stereo image Achoo is found to be 42.3, where the total sizes of the original and compressed images are $1.0368 \times 10^{6}$ and $2.451 \times 10^{4}$ bytes, respectively. Fig. 19 shows the reconstructed left image. Its signal-to-noise ratios are $\mathrm{PSNR}_{\mathrm{Y},L} = 38.52$ dB, $\mathrm{PSNR}_{\mathrm{Cr},L} = 34.02$ dB and $\mathrm{PSNR}_{\mathrm{Cb},L} = 33.44$ dB. Fig. 20 shows a full-size reconstructed right image, obtained by upsampling the resulting low-resolution right image. Its signal-to-noise ratios at the low resolution are $\mathrm{PSNR}_{\mathrm{Y},R} = 39.72$ dB, $\mathrm{PSNR}_{\mathrm{Cr},R} = 38.80$ dB and $\mathrm{PSNR}_{\mathrm{Cb},R} = 38.22$ dB. However, its

Fig. 25. Low-resolution disparity field obtained from the 8th low-resolution left and right image frames. (a) Lowest-resolution subimages $S$'s; (b)–(d) wavelet subimages $W$'s.
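Eqs. (36) and (37) translate directly into code. The sketch below (our own naming) computes both performance measures for one color component:

```python
import numpy as np

def compression_ratio(original_size, compressed_size):
    """Eq. (36): size of the original image over size of the compressed image
    (the two sizes must be in the same units, e.g. both bits or both bytes)."""
    return original_size / compressed_size

def psnr(component, reconstructed, peak=255.0):
    """Eq. (37): PSNR_X = 20 log10(I_X / RMSE_X) for one color component,
    where `peak` is the peak-to-peak value I_X of that component."""
    err = component.astype(np.float64) - reconstructed.astype(np.float64)
    rmse = np.sqrt(np.mean(err ** 2))
    return 20.0 * np.log10(peak / rmse)
```

For the low-resolution right picture, the same `psnr` function is applied either at the reduced resolution or, for a stricter comparison, after upsampling to the full resolution of the left picture; the two evaluations differ, as the Achoo figures above show.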

PSNRs become worse, i.e., $\mathrm{PSNR}_{\mathrm{Y},R} = 34.62$ dB, $\mathrm{PSNR}_{\mathrm{Cr},R} = 32.45$ dB and $\mathrm{PSNR}_{\mathrm{Cb},R} = 31.73$ dB, when they are evaluated at the same resolution level as the left images. Next, it is of interest to conduct a similar study for a color stereoscopic image sequence. The proposed joint motion/disparity-compensated compression techniques were tested on the first 20 fields of the color (24 bits/pixel) stereo image sequence "Spiral Ball" (size 640×480), in the 3D library (delic side) of the 3D museum constructed by the Multimedia Creators Network of Pioneer Electronic Corp. The size of the block used at the lowest resolution level for both the MRME and MRDE schemes is set to 2×2. The left and right images of the 8th Spiral Ball stereo image frame are illustrated in Fig. 21. Figs. 22 and 23 show the left image reconstruction by the interframe DPCM/DWT/VQ decoder and the right image reconstruction by the inter-view DPCM/DWT/VQ decoder, respectively. Fig. 24 depicts the motion vector field obtained from the 8th and 9th left image frames using the variable block-size MRME technique. On the other hand, the disparity field obtained from the 8th low-resolution left and right image frames using the variable block-size MRDE technique is illustrated in Fig. 25. In Fig. 26, 20 fields of the left color image sequence, including the Y, Cr and Cb components, are given with the respective PSNR per field obtained by the proposed codec. In Fig. 27, the same results are shown for the low-resolution right image sequence. It is notable that the PSNR for each component of both the left and right image frames is greater than 33 dB. Furthermore, the compression ratio of the Spiral Ball image sequence over 20 fields is found to be 91.02, where the total sizes of the original and compressed image sequences over 20 fields are $3.6864 \times 10^{7}$ and $4.05 \times 10^{5}$ bytes (or $2.025 \times 10^{4}$ bytes/field), respectively.

6.
Conclusion

This paper has introduced a joint motion/disparity-compensated wavelet multiresolution coding scheme, based on mixed-resolution psychophysical experiments, which is capable of achieving

Fig. 26. PSNR for the left color image sequence.

Fig. 27. PSNR for the low-resolution right color image sequence.

a high compression ratio for typical stereoscopic image sequences, without any significant loss in the perceived 3D stereo image quality. A variable block-size multiresolution block matching is applied to perform the estimation of both the motion and disparity vectors in an efficient manner. Both the interframe and inter-view DPCM/DWT/VQ schemes produce the motion and disparity vectors together with their associated prediction error subimages, in order to both increase the compression ratio and improve the reconstructed image quality. Results show that these schemes achieve a high compression ratio of 91.02 for the Spiral Ball stereo image sequence.

References

[1] M. Antonini et al., Image coding using wavelet transform, IEEE Trans. Image Process. 1 (2) (April 1992) 205–220.
[2] H. Aydinoglu et al., Compression of multi-view images, in: Proceedings of the First IEEE International Conference on Image Processing, Austin, TX, USA, November 1994, pp. 385–389.
[3] F.W. Campbell, J.G. Robson, Application of Fourier analysis to the visibility of gratings, J. Physiol. 197 (1968) 551–556.
[4] I. Daubechies, Orthonormal bases of compactly supported wavelets, Commun. Pure Appl. Math. XLI (1988) 909–996.
[5] R.E.H. Franich et al., Stereo-enhanced displacement estimation by genetic block matching, SPIE Visual Commun. Image Process. 2094 (1993) 362–371.
[6] Generic Coding of Moving Pictures and Associated Audio Information: Video, ISO/IEC International Standard 13818-2, 1995.
[7] A. Grossmann, J. Morlet, Decomposition of Hardy functions into square integrable wavelets of constant shape, SIAM J. Math. 15 (1984) 723–736.
[8] S.
Sethuraman et al., A multiresolution framework for stereoscopic image sequence compression, in: Proceedings of the First IEEE International Conference on Image Processing, Austin, TX, USA, November 1994, pp. 361–365.
[9] Y. Linde, A. Buzo, R.M. Gray, An algorithm for vector quantizer design, IEEE Trans. Commun. 28 (January 1980) 84–95.
[10] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Mach. Intell. 11 (7) (July 1989) 674–693.
[11] T. Mitsuhashi, Subjective image position in stereoscopic TV systems – consideration on comfortable stereoscopic images, SPIE 2179 (1994) 259–265.

[12] N.M. Nasrabadi, R.A. King, Image coding using vector quantization: a review, IEEE Trans. Commun. 36 (August 1988) 957–971.
[13] A. Netravali, B. Haskell, Digital Pictures – Representation and Compression, Plenum Press, New York, 1988.
[14] M.G. Perkins, Data compression of stereopairs, IEEE Trans. Commun. 40 (4) (April 1992) 684–696.
[15] K. Rose, E. Gurewitz, G.C. Fox, A deterministic annealing approach to clustering, Pattern Recognition Lett. 11 (1990) 589–594.
[16] A. Tamtaoui, C. Labit, Constrained disparity and motion estimation for 3DTV image sequence coding, Signal Processing: Image Communication 4 (1) (November 1991) 45–54.
[17] D. Tzovaras, M.G. Strintzis, H. Sahinoglou, Evaluation of multiresolution block matching techniques for motion and disparity estimation, Signal Processing: Image Communication 6 (1994) 59–67.
[18] G. Westheimer, S.P. McKee, Stereoscopic acuity with defocused and spatially filtered retinal images, J. Opt. Soc. Amer. 70 (7) (July 1980) 772–778.
[19] S. Zafar, Y.Q. Zhang, B. Jabbari, Multiscale video representation using multiresolution motion compensation and wavelet decomposition, IEEE J. Selected Areas Commun. 11 (1) (January 1993) 24–34.
[20] Y.Q. Zhang, S. Zafar, Motion-compensated wavelet transform coding for color video compression, IEEE Trans. Circuits Systems Video Technol. 2 (3) (September 1992) 285–296.

