Progressive Image Segmentation by Wavelet Transform for Content-Based Image Retrieval
[...] by various basic waveforms for different applications [2]. In our proposed scheme, we first adopt the Haar wavelet transform as the analysis tool [7], since the transformed high-frequency coefficients directly represent the differences between neighboring pixels. Thus, by passing the YUV pixel sequences through the Haar wavelet transform, we apply the whitening idea to the composition of the processed color signals. The colors of distinct objects are classified by analyzing the distributions of the composed color signals obtained from the high-frequency coefficients. Based on histogram analysis and the efficiency requirement, we use the technique of sub-variance equalization [8] to determine appropriate thresholds for pixel clustering and to merge similar colors. High-frequency color signals below these thresholds are regarded as color changes caused by lighting inside a single object, so removing them facilitates the grouping of clusters. The above processing is applied recursively to the LH, HL and HH subbands at each level of the wavelet transform domain, and the criteria increase level by level.

By testing various images thoroughly in our experiments, we found that a good visual segmentation is obtained when the process terminates as soon as a level ratio falls below a critical threshold; the wavelet processing stops at that point. Given an appropriate critical threshold, the proposed method stops automatically at lower transform levels if an image contains only a few simple objects. In contrast, complicated images do not satisfy the halting condition until higher processing levels. At the end of the decomposition, color reference vectors [9] are extracted from the color coefficients of the LL subband at the last level. Then all pixels are grouped into their corresponding regions to sketch the significant objects. Since the objects decomposed by our method are close to human visual perception, we can select the significant objects for further analysis to capture more features, such as object shapes and spatial relationships. Such features should be simple and meaningful for image representation and retrieval.

This paper is organized as follows. Section II reviews some important related research. Section III presents the proposed progressive wavelet-based method and the color signal composition. The experiments and results are demonstrated in Section IV. Finally, Section V concludes the paper.

II. Related works

The QBIC system [10] is one of the most famous image retrieval systems. It was developed by the IBM Almaden Research Center and integrates color, texture, shape and sketch into the descriptor of each image in the database. Other well-known systems include Virage [11], Photobook [12] and VisualSEEK [13]. Virage and Photobook further permit users to specify more flexible queries owing to their more sophisticated representations of image properties. In VisualSEEK, the color and texture features of several regions are used to index an image. All of the above systems involve image segmentation, which is becoming increasingly important for content-based image retrieval in modern image databases. The Blobworld system [14] uses a feature extraction algorithm to precisely isolate natural objects in a joint color-texture-position feature space. Although the segmented results are precise, the computational cost is still too high for query by image example. However, segmentation becomes simpler and more efficient if we focus only on the entire objects of an image, regardless of the texture inside each object.

To specify object regions by uniform or smooth colors for image retrieval, the color segmentation algorithm proposed in [2] uses a simple color classifier to perform adaptive color classification on fine art images. It takes a region-based approach to support image annotation. It requires the desired classes to be pre-specified, so that human knowledge can be applied to obtain good segmentation results for particular objects. For more prominent objects, Lin et al. [6] proposed an effective block-based color segmentation method. In this method, each basic 8 × 8 pixel block is initially assigned a representative color, which is the mean color of the pixels inside the block. Starting from the basic blocks, the algorithm recursively merges adjacent blocks with the closest colors into larger regions. During the merging procedure, a suitable number of classes is determined automatically by detecting the maximal change between a merged picture and the previous one. With this algorithm, the basic contours of the separated objects are clearly displayed, but finer details are not.

Jacobs et al. [15] used the Haar wavelet transform to generate a feature vector for each image, so that features including color, texture and shape can be captured very quickly. However, for lack of a segmentation technique to extract individual objects, the user cannot perform partial queries on the image database. To overcome such drawbacks, the WALRUS system [16] employs a novel similarity model, which uses sliding windows of varying sizes to decompose an image into many rectangular regions based on the proximity of their signatures. Although this algorithm is effective and regular, the shapes of objects are not actually expressed, and the significant objects cannot be identified because the image is segmented into superfine regions.
Moreover, the representation produced by such an algorithm is too verbose to support efficient, or even effective, image retrieval over a large image database. Therefore, we propose a wavelet-based technique to improve the efficiency and effectiveness of image segmentation. After effective segmentation, the significant objects in an image can be extracted to support image retrieval by indexing the features of the objects.

III. The proposed segmentation algorithm

To achieve image segmentation effectively and efficiently, a wavelet-based image segmentation algorithm is proposed in this paper for isolating the significant objects with complete contours. Our method automatically decomposes an image into separate objects that agree with visual perception. Moreover, the proposed technique applies the fastest wavelet transform to perform the progressive processing. Progressive coding is widely used in scalable transmission systems [7], and the concept of progressive processing is used to construct the segmentation framework in this paper. The detailed processing is described in the following four subsections.

A. Progressive Procedure of Wavelet Transform

The discrete wavelet transform (DWT) achieves entropy concentration, de-correlation and feature detection, and thus provides excellent signal analysis. It decomposes a discrete time sequence into coefficients of different frequency resolutions, so that, theoretically, different wavelet subbands can be analyzed and processed individually. As the target image is progressively decomposed, the influence and effect can be traced from the update of each individual subband at the various levels. In this paper, our experimental results indicate that it is adequate, without loss of generality, to classify all subbands into only two parts after each Haar wavelet transform. One part consists of the LL-subband coefficients at the current level of the wavelet transform, i.e., the coefficients of the present lowest band. The other contains all the high-frequency coefficients in the LH, HL and HH subbands of all levels. Except at the final decomposition, only the latter part is retained to analyze chromatic and luminance differences and to perform color smoothing for all objects after each decomposition. Our proposed algorithm can therefore segment objects very rapidly in image retrieval applications.

We adopt YUV images and pass their Y, U and V components through the 2D Haar DWT. In our method, the so-called high-frequency coefficients are the transform coefficients in the HH, HL and LH subbands of all levels, and the so-called low-frequency coefficients are those located in the LL subband of the current transform domain. Owing to the characteristics of the Haar wavelet transform, the color differences between the pixels of two adjacent areas are directly or indirectly represented by the high-frequency coefficients. Hereafter, we denote the Y, U and V high-frequency coefficients in the HL, LH and HH subbands of the $i$th level as $Y_{Hi}$, $U_{Hi}$ and $V_{Hi}$, respectively, and the low-frequency ones in the lowest LL subband of the $i$th level as $Y_{Li}$, $U_{Li}$ and $V_{Li}$, respectively. For simplicity, generality and perceptual considerations, we use the transformed Y, U and V coefficients to compose the color signals that serve as the analyzed medium of our proposed algorithm.

B. Color Signal Composition

Motivated by the whitening transform idea in [17], the transformed Y, U and V coefficients at the $i$th level are weighted and combined to form the proposed color signals:

$$c_{Hi} = \frac{Y_{Hi}}{\lambda_{Hy}} + \frac{U_{Hi}}{\lambda_{Hu}} + \frac{V_{Hi}}{\lambda_{Hv}}, \quad \text{and} \quad c_{Li} = \frac{Y_{Li}}{\lambda_{Ly}} + \frac{U_{Li}}{\lambda_{Lu}} + \frac{V_{Li}}{\lambda_{Lv}}, \tag{1}$$

where $\lambda_{Hy}$, $\lambda_{Hu}$ and $\lambda_{Hv}$, as well as $\lambda_{Ly}$, $\lambda_{Lu}$ and $\lambda_{Lv}$, are the eigenvalues of all high-frequency subband coefficients and of the lowest LL-band coefficients of Y, U and V up to the present decomposition status, respectively. To fit the dynamic ranges of the different bands, this inverse weighting by the whitening transform gives the Y, U and V signals evenly significant influence, which effectively matches human perceptual interest. To keep the processing fast for a large image collection while retaining the function of whitening, the computational load of (1) should be reduced. Based on simulation verification, we simplify (1) by replacing the eigenvalues with the means of the high-frequency subbands and of the lowest LL band:

$$c_{Hi} = \frac{Y_{Hi}}{M_{Hy}} + \frac{U_{Hi}}{M_{Hu}} + \frac{V_{Hi}}{M_{Hv}}, \quad \text{and} \quad c_{Li} = \frac{Y_{Li}}{M_{Ly}} + \frac{U_{Li}}{M_{Lu}} + \frac{V_{Li}}{M_{Lv}}, \tag{2}$$

where $M_{Hy}$, $M_{Hu}$, $M_{Hv}$, $M_{Ly}$, $M_{Lu}$ and $M_{Lv}$ are the Y, U and V means of all high-frequency subband coefficients and of the lowest LL-band coefficients, respectively. After using (2) to obtain the composed color signals, the proposed segmentation procedure analyzes the histograms of these signals for color merging and modification.
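As a rough illustration of subsections A and B, the following Python sketch performs one level of a 2D Haar transform on a single channel and then composes a high-frequency color signal in the spirit of (2). It is a minimal sketch under stated assumptions, not the authors' implementation: the averaging normalization, the HL/LH naming convention, the use of absolute coefficient magnitudes, the `eps` guard against a zero mean, and the function names `haar2d_level`, `mean_abs` and `compose_high` are all illustrative choices that the paper does not specify.

```python
def haar2d_level(img):
    """One level of a 2D Haar transform on a 2D list with even dimensions.

    Assumption: averaging normalization (divide by 4); the paper does not
    state which normalization it uses. Returns (LL, HL, LH, HH), each half
    the size of the input in both directions.
    """
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, len(img), 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 4)  # local average
            hl_row.append((a - b + d - e) / 4)  # left-right difference
            lh_row.append((a + b - d - e) / 4)  # top-bottom difference
            hh_row.append((a - b - d + e) / 4)  # diagonal difference
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh


def mean_abs(bands):
    """Mean absolute value over a sequence of 2D subbands (an assumed
    reading of the 'means' M in Eq. (2))."""
    values = [abs(v) for band in bands for row in band for v in row]
    return sum(values) / len(values)


def compose_high(y_bands, u_bands, v_bands, eps=1e-9):
    """Composed high-frequency signal c_H = |Y|/M_y + |U|/M_u + |V|/M_v.

    Each argument is a sequence (HL, LH, HH) of 2D subbands for one channel.
    Assumption: magnitudes are used, and eps avoids division by zero when a
    channel has no high-frequency energy.
    """
    m_y = max(mean_abs(y_bands), eps)
    m_u = max(mean_abs(u_bands), eps)
    m_v = max(mean_abs(v_bands), eps)
    composed = []
    for yb, ub, vb in zip(y_bands, u_bands, v_bands):
        composed.append([
            [abs(y) / m_y + abs(u) / m_u + abs(v) / m_v
             for y, u, v in zip(yr, ur, vr)]
            for yr, ur, vr in zip(yb, ub, vb)
        ])
    return composed
```

To continue to the next level, `haar2d_level` would be applied again to the returned LL subband of each channel, while the high-frequency subbands of all levels are kept for the histogram analysis.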
C. Suppression Threshold Determination

Each time the proposed procedure performs the wavelet transform, the composed color signals in the high-frequency subbands are partitioned into two regions according to their strengths. The region with relatively smaller signals is considered to represent fine color changes within individual objects. Because such values may blur the judgement of boundaries between neighboring objects, they should be neglected in order to clarify the contours and smooth the colors of objects. Fig. 1 shows one of the experimental examples, in which the high-frequency signals in the region $R_0$ are truncated to zero. The parameter $\tau_i$ at the cutting line is the suppression threshold determined after the $i$th decomposition; it partitions the signals into two regions, $R_0$ and $R_1$. In the following, we describe the progressive wavelet transform and the determination of the suppression threshold in detail.

To generalize the truncation procedure, the elimination of high-frequency signals below the suppression threshold can be formulated recursively along the progressive decomposition. When the $i$th-level wavelet transform of the LL subband is performed, the corresponding histogram is generated and denoted $h_i(c_{Hi})$. A local suppression threshold $\tau_i$ is determined for dropping the high-frequency signals and the corresponding Y, U and V coefficients at the $i$th level. Meanwhile, the $i$th global suppression threshold $T_i$ is used to partition the high-frequency signals of all levels; signals smaller than $T_i$ are set to zero. The histogram processing function $f_i(C_{Hi})$ is defined recursively as

$$f_1(C_{H1}) = h_1(c_{H1}) \cdot U(c_{H1} - \tau_1),$$
$$f_i(C_{Hi}) = \left[ f_{i-1}(C_{H(i-1)}) + h_i(c_{Hi}) \cdot U(c_{Hi} - \tau_i) \right] \cdot U(C_{Hi} - T_i), \quad \text{for } i > 1, \tag{3}$$

where $c_{Hi}$ denotes the high-frequency color signals of the $i$th level and $C_{Hi}$ denotes the global high-frequency signals from the first level up to the $i$th one. The unit-step function $U(x)$ used for truncating the color signals is

$$U(x) = \begin{cases} 1, & x \ge 0; \\ 0, & x < 0. \end{cases} \tag{4}$$

It masks signals below the thresholds $\tau_i$ or $T_i$ to zero. With the recursive formulas in (3), the unimportant signals are dropped and color smoothing is obtained quickly.

To obtain such a histogram, the suppression thresholds $\tau_i$ and $T_i$ must be decided in advance. Observing Fig. 1, we find that the histogram is too irregular and sprawling to be modeled with well-known distributions such as a multi-Gaussian function, especially after the recursive process in (3). In addition, processing speed is one of the focuses of the proposed algorithm. Therefore, we use a variance equalization technique to derive the suppression thresholds by dividing the histogram of the signals into two regions with equal variances. In Fig. 1, as soon as the $(i-1)$th level LL band is decomposed by the Haar wavelet transform, the histogram $h_i(c_{Hi})$ is separated into two regions, $R_i^0$ and $R_i^1$. Then their individual variances $V_i^0$ and $V_i^1$ are computed as

$$V_i^0 = \sum_{c_{Hi} \in R_i^0} \frac{p_i(c_{Hi}) \cdot (c_{Hi} - E_i^0)^2}{\sum_{C_{Hi} \in R_i^0} p_i(C_{Hi})}, \quad \text{where } E_i^0 = \sum_{c_{Hi} \in R_i^0} \frac{p_i(c_{Hi}) \cdot c_{Hi}}{\sum_{C_{Hi} \in R_i^0} p_i(C_{Hi})},$$
$$V_i^1 = \sum_{c_{Hi} \in R_i^1} \frac{p_i(c_{Hi}) \cdot (c_{Hi} - E_i^1)^2}{\sum_{C_{Hi} \in R_i^1} p_i(C_{Hi})}, \quad \text{where } E_i^1 = \sum_{c_{Hi} \in R_i^1} \frac{p_i(c_{Hi}) \cdot c_{Hi}}{\sum_{C_{Hi} \in R_i^1} p_i(C_{Hi})}, \tag{5}$$

where $E_i^0$ and $E_i^1$ are the means of regions $R_i^0$ and $R_i^1$, respectively. By continuously moving the intermediate boundary $\tau$ toward the region of larger variance, the most proper boundary, denoted $\tau_i$, is obtained by stopping when the difference between $V_i^0$ and $V_i^1$ is minimal or small enough. Therefore, the suppression threshold determination can be formulated as

$$\tau_i = \operatorname{Arg}_{\tau} \{ \operatorname{Min}( | V_i^1 - V_i^0 | ) \}, \tag{6}$$

where Arg{.} returns the argument that meets the criterion inside the braces. Similar processing can be performed on the global high-frequency subbands to yield the global suppression threshold $T_i$. With the recursive procedure above, the unimportant color differences are progressively truncated to zero level by level, so that the population of zero high-frequency signals grows. Thus, the colors of merged neighboring pixels become closer and closer with each Haar wavelet transform. Finally, every object is identified by a homogeneous color, which facilitates its isolation.

D. Significant Object Indication

The proposed progressive procedure decomposes images in more detail as the level of the wavelet transform increases. We now face the problem of how many decomposition levels give the best result. A critical threshold should be chosen so that the major individual objects are successfully separated according to human visual perception. The eliminating criteria have an important property across levels: their values tend to increase with the decomposition level of the wavelet transform. The reason is that the high-frequency signals below the criteria are removed by forcing them to zero.
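The variance equalization of (5) and (6) and the unit-step truncation of (4) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' procedure. It works on raw signal samples instead of a histogram, which gives the same variances as (5) if $p_i$ is taken to be the histogram count. It also replaces the iterative boundary movement described in the text with an exhaustive search over candidate thresholds. The function names are illustrative.

```python
def region_variance(values):
    """Population variance of a list of samples (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def suppression_threshold(signals):
    """Return the tau that minimizes |V1 - V0|, as in Eq. (6).

    R0 holds signals below tau and R1 holds signals at or above tau, which
    matches the unit step U(c - tau) of Eq. (4). Assumption: candidates are
    the distinct signal values, searched exhaustively. Returns None when
    the signals cannot be split into two non-empty regions.
    """
    candidates = sorted(set(signals))
    best_tau, best_gap = None, None
    for tau in candidates[1:]:  # skip the minimum so R0 is never empty
        r0 = [c for c in signals if c < tau]
        r1 = [c for c in signals if c >= tau]
        gap = abs(region_variance(r1) - region_variance(r0))
        if best_gap is None or gap < best_gap:
            best_tau, best_gap = tau, gap
    return best_tau


def truncate(signals, tau):
    """Apply U(c - tau): keep signals at or above tau, zero the rest."""
    return [c if c >= tau else 0.0 for c in signals]
```

For a large image, building a histogram first and evaluating (5) over its bins would avoid the repeated passes over all samples that this sketch makes.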
Simultaneously, dropping insignificant signals causes the lower region in the current decomposition to have a lower variance than in the previous one, which makes the eliminating criteria increase at the next level of transformation. However, such processing does not affect the larger high-frequency signals, i.e., the significant color differences. Based on this phenomenon, a level ratio for the progressive processing is defined as

$$\gamma_i = \frac{\operatorname{Max}\{c_{Hi}\} - \tau_i}{\operatorname{Max}\{c_{Hi}\}}, \tag{7}$$

where the level ratio $\gamma_i$ is obtained after the $i$th decomposition by the Haar wavelet transform. We then select a proper critical threshold $\xi$ and define the process termination criterion as follows: as soon as the ratio $\gamma_i$ is less than $\xi$, the progressive procedure halts immediately. Certainly, the critical threshold should be designed according to the nature of the images, individual query requirements and the human visual perception of interest, so that the segmentation stops automatically according to the image characteristics.

At the end of the proposed segmentation, the remaining larger high-frequency coefficients effectively strengthen the discrimination between the significant objects. The Y, U and V coefficients in the lowest LL band are re-quantized by an experimental quantizer so that the number of color vectors is effectively constrained. These YUV vectors then serve as key color references: each pixel is mapped to the most likely object by searching for the nearest key color reference. The key color references become the color features of the image objects and are inserted into the index of the image database for subsequent content-based image retrieval applications.

IV. Simulation Results

To demonstrate its execution speed and effectiveness, the proposed segmentation method was implemented in C++ on a personal computer with an Intel Pentium 233 CPU and 64 MB of RAM. The total number of test images is about 1000, each of size 256×256. The image types include trademarks, natural scenes and textures. Image contents vary within each type; some pictures are simple, while others are complex to different degrees. To segment all images appropriately and automatically, the choice of the critical threshold $\xi$ is significant in our approach. Since the number of wavelet processing levels determines the segmentation result, a different number of levels should be found for different types of images through off-line analysis. Here, the value of $\xi$ controls the termination of the progressive process: the segmentation algorithm halts when the level ratio is less than or equal to $\xi$. For simple pictures, such as trademarks or pictures with few simple objects, the level ratios are already quite small at lower levels, i.e., the first or second level, because most of the large color variation occurs only on the edges of objects. These simple images are decomposed well at lower levels. Complicated images, in contrast, have larger level ratios, and the results at lower levels do not fit visual perception; they always need segmentation at higher levels. Thus, $\xi$ is the key parameter determining the quality of segmentation. According to our experiments, the segmented results are close to human observation when $\xi$ is 0.01 for most images. Consequently, we are also able to analyze various kinds of images to derive suitable $\xi$ values that may lead the recursion of the segmentation algorithm to converge more naturally.

All experimental results are shown in Fig. 2. The pictures shown in this paper were randomly selected from the 1000 test images. The original images are shown in the first column, with their names given below them. The pictures in the middle columns are the segmentation results of our proposed method with $\xi = 0.01$: the second and third columns show the results of the algorithm without and with the whitening process, respectively. Under the segmented images, the terminal levels of the progressive process and the execution times are listed. The pictures in the right column show the segmented results of Lin's method [6] and their execution times.

In Fig. 2, both RGB001 and RGB002 are trademarks, and RGB001 is simpler. They are decomposed at only level 1 and level 2, respectively. For simple natural pictures with clear boundaries and few objects, such as RGB003 and RGB004, the algorithm stops at level 3 and level 4, respectively. The segmented results are clearly better than those of Lin's method. Complicated pictures, such as RGB005, RGB006 and RGB007, usually contain more objects with distinct color features, or vague boundaries caused by shadows or defocus. These pictures are not decomposed well until level 5 or higher. Basically, more complicated pictures with active backgrounds or textures require the decomposition to continue to higher levels. However, the decomposition depth does not grow without limit even as image complexity keeps increasing. In our simulations, no matter how active the pictures are, seven levels of decomposition are enough, as the last two rows of Fig. 2 show. However, we suggest that developers use other models to capture the features of such pictures, because it is difficult even for humans to recognize objects in them correctly.
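The termination rule built on the level ratio of (7) and the final mapping of pixels to key color references can be sketched as follows. This is a minimal sketch under stated assumptions. The text states the halting test both as "less than" and as "less than or equal to" $\xi$, and the sketch uses the latter. The squared Euclidean distance in YUV space is an assumed reading of "nearest", and the function names are illustrative.

```python
def level_ratio(signals, tau):
    """Level ratio of Eq. (7): (max - tau) / max over the level's signals.

    Assumption: if the largest signal is not positive, there is nothing
    left to separate and the ratio is reported as 0.0.
    """
    peak = max(signals)
    if peak <= 0:
        return 0.0
    return (peak - tau) / peak


def should_halt(signals, tau, xi=0.01):
    """Stop the progressive decomposition once the level ratio reaches xi.

    xi = 0.01 is the value the paper reports as working for most images.
    """
    return level_ratio(signals, tau) <= xi


def nearest_reference(pixel, references):
    """Index of the key color reference closest to a (Y, U, V) pixel.

    Assumption: closeness is squared Euclidean distance in YUV space.
    """
    def squared_distance(ref):
        return sum((p - r) ** 2 for p, r in zip(pixel, ref))

    return min(range(len(references)),
               key=lambda k: squared_distance(references[k]))
```

In a full pipeline, `should_halt` would be checked after each decomposition level, and `nearest_reference` would be applied to every pixel once the key color references have been extracted from the last LL band.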
However, the segmented results of our method are more effective and meaningful than those of Lin's method. In terms of computation time, our method takes far less than Lin's method: at most half of its computation time, even in the worst case. Most of the pictures can be decomposed successfully within one second. Significantly, compared with the method without whitening factors, Fig. 2 shows that using the whitening factors in the color signal composition yields more complete contours of the significant objects, such as the black cat and the purple ball in the third row of this series of test pictures. In addition, as Fig. 2 shows, the segmented objects have more of the desired signatures and fewer redundant details, so the segmentation results match the human visual perception of interest fairly well.

V. Conclusions

Image segmentation is an important task in many applications. In this paper, we develop an effective and efficient method based on the progressive Haar wavelet transform to segment the significant objects out of an image. By exploiting the natural characteristics of each image, the proper number of decomposition levels is determined automatically. In our experiments, the segmentation results are close to human visual perception. The proposed segmentation mechanism can be applied directly to image retrieval as in Lin [6]. The results show that the proposed mechanism indeed improves efficiency and effectiveness. Further, with the whitening factors used in the color signal composition, the segmented objects have quite complete outlines, retain more of the desired features and have more color redundancies eliminated than without the whitening factors. Moreover, more significant features and signatures, e.g., shapes and representative colors, can be extracted from our segmentation results. Consequently, after the proposed segmentation algorithm is performed, the shape and significant features belonging to an object can be tracked and captured precisely for content-based image retrieval.

VI. References

1. S. Belongie, C. Carson, H. Greenspan and J. Malik, "Color- and texture-based image segmentation using EM and its application to content-based image retrieval," the 6th International Conference on Computer Vision, pp. 675-682, 1998.
2. E. Saber, A. M. Tekalp, R. Eschbach and K. Knox, "Automatic Image Annotation Using Adaptive Color Classification," Graphical Models and Image Processing, Vol. 58, No. 2, pp. 115-126, 1996.
3. C. S. Won and H. Derin, "Unsupervised segmentation of noisy and textured image using Markov random field," CVGIP: Graphical Models and Image Processing, Vol. 54, No. 4, pp. 308-328, 1992.
4. H. Derin and W. Cole, "Segmentation of textured image using Gibbs random field," Computer Vision, Graphics and Image Processing, Vol. 35, pp. 72-98, 1986.
5. D. Cortez, P. Nunes, M. M. D. Sequeira and F. Pereira, "Image segmentation towards new image representation methods," Signal Processing: Image Communication, Vol. 6, pp. 485-498, 1995.
6. H. C. Lin, L. L. Wang and S. N. Yang, "Color image retrieval based on hidden Markov models," IEEE Transactions on Image Processing, Vol. 6, No. 2, pp. 332-339, 1997.
7. H. S. Malvar, "Fast progressive wavelet coding," Data Compression Conference Proceedings, pp. 336-343, 1999.
8. T. H. Reeves and M. E. Jernigan, "Multiscale-based image enhancement," IEEE 1997 Canadian Conference on Electrical and Computer Engineering, Vol. 2, pp. 500-503, 1997.
9. A. D. Bimbo, M. Mugnaini, P. Pala and F. Turco, "Visual Querying by Color Perceptive Regions," Pattern Recognition, Vol. 31, No. 9, pp. 1241-1253, 1998.
10. W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker and C. Faloutsos, "The QBIC project: querying images by content using color, texture, and shape," Research Report, No. 9203, IBM Almaden Research Center, 1993.
11. A. Gupta and R. Jain, "Visual Information Retrieval," Communications of the ACM, Vol. 40, No. 5, pp. 69-79, 1997.
12. A. Pentland, R. W. Picard and S. Sclaroff, "Photobook: Content-Based Manipulation of Image Databases," Storage and Retrieval for Image and Video Databases II, San Jose, 1995.
13. J. R. Smith and S. F. Chang, "Visually Searching the Web For Content," IEEE Multimedia, Vol. 3, pp. 12-20, 1997.
14. C. Carson, M. Thomas, S. Belongie, J. M. Hellerstein and J. Malik, "Blobworld: A System for Region-Based Image Indexing and Retrieval," VISUAL, pp. 509-516, 1999.
15. C. E. Jacobs, A. Finkelstein and D. H. Salesin, "Fast multiresolution image querying," International Proceeding of SIGGRAPH 95, Annual Conference Series, pp. 277-286, 1995.
16. A. Natsev, R. Rastogi and K. Shim, "WALRUS: A Similarity Retrieval Algorithm for Image Databases," in Proceedings of ACM SIGMOD on Management of Data, pp. 395-406, 1999.
17. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press, Inc., 1990.

Figure 1. The decision of suppression threshold $\tau_i$. (Plot of the histogram $h_i(c_{Hi})$ against $c_{Hi}$; the threshold $\tau_i$ separates the regions $R_0$ and $R_1$.)

Figure 2. The test pictures and segmentation results. (The images are not reproduced here; the terminal levels and execution times listed under them are as follows.)

Original image   Proposed method without whitening factors   Proposed method with whitening factors   Lin's result
RGB001           Level 1, 0.206 sec                          Level 1, 0.206 sec                       4.212 sec
RGB002           Level 2, 0.606 sec                          Level 2, 0.608 sec                       4.415 sec
RGB003           Level 3, 0.861 sec                          Level 3, 0.861 sec                       3.975 sec
RGB004           Level 4, 1.081 sec                          Level 4, 1.082 sec                       4.4 sec
RGB005           Level 5, 1.375 sec                          Level 5, 1.377 sec                       4.221 sec
RGB006           Level 6, 1.777 sec                          Level 6, 1.78 sec                        4.413 sec
RGB007           Level 7, 2.021 sec                          Level 7, 2.02 sec                        4.434 sec