Chapter 2 Related Works
2.1 The Magic Wand
The Magic Wand is a popular object selection tool in Adobe Photoshop that lets the user select an area of similar color by marking a point or a region. The user interface of the Magic Wand [3], taken from Adobe Photoshop 7, is shown in Figure 2-1. Selection starts when the user manually marks a seed pixel on the image. The area then grows from the seed pixel to a region of connected pixels whose colors all fall within a certain tolerance of the color statistics of the seed pixel. The method is fast; however, the user usually has to specify many seed points to select the complete area of a target object.
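The seed-and-tolerance behavior described above can be illustrated with a small flood fill. This is only a sketch under stated assumptions, not Photoshop's actual implementation: the function name `magic_wand`, the 4-connected neighborhood, and the per-channel tolerance test against the seed color are all choices made for this example.

```python
from collections import deque


def magic_wand(image, seed, tolerance):
    """Return the set of (row, col) pixels connected to `seed` whose color
    differs from the seed color by at most `tolerance` in every channel."""
    rows, cols = len(image), len(image[0])
    seed_color = image[seed[0]][seed[1]]

    def similar(color):
        return all(abs(a - b) <= tolerance for a, b in zip(color, seed_color))

    selected = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the 4-connected neighbors (an assumption of this sketch).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in selected:
                if similar(image[nr][nc]):
                    selected.add((nr, nc))
                    queue.append((nr, nc))
    return selected


RED, BLUE = (250, 10, 10), (10, 10, 250)
image = [
    [RED, RED, BLUE],
    [RED, BLUE, BLUE],
    [BLUE, BLUE, RED],  # this red pixel is not connected to the seed
]
region = magic_wand(image, seed=(0, 0), tolerance=20)
```

The isolated red pixel in the bottom-right corner is not selected, which is why a user typically needs several seed points to cover a complete object.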
Figure 2-1 User interface of Magic Wand [3].
2.2 The Intelligent Scissors
Intelligent Scissors [8] is an object segmentation method proposed by Mortensen and Barrett in 1995. The working model of this method is shown in Figure 2-2. It allows the user to segment the object from the background by roughly tracing the object’s boundary with the mouse. The foreground object is separated from the background by finding the “minimum cost contour” from the cursor position back to the last “seed” point. The process is analogous to a graph search problem in which the goal is to find the optimal path between a start node and a set of goal nodes. If the computed path does not meet the predefined requirement, additional user-specified seed points can be added to refine the result.
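The graph search view can be sketched with Dijkstra's algorithm on a pixel grid. The sketch below is an illustration only: the local cost map `cost` (low values along strong edges), the 4-connected neighborhood, and the function name `min_cost_path` are assumptions of this example and are not taken from [8].

```python
import heapq


def min_cost_path(cost, start, goal):
    """Dijkstra search on a 4-connected grid. cost[r][c] is paid when a
    pixel is entered; returns (path from start to goal, total cost)."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0}
    parent = {}
    heap = [(0, start)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            break
        if dist > best[node]:
            continue  # stale heap entry
        r, c = node
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols:
                new_dist = dist + cost[nr][nc]
                if new_dist < best.get(nxt, float("inf")):
                    best[nxt] = new_dist
                    parent[nxt] = node
                    heapq.heappush(heap, (new_dist, nxt))
    # Walk back from the goal (cursor) to the start (seed point).
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1], best[goal]


# Low cost (1) marks pixels on an object boundary, high cost (9) elsewhere.
cost = [
    [1, 9, 9],
    [1, 9, 9],
    [1, 1, 1],
]
path, total = min_cost_path(cost, start=(0, 0), goal=(2, 2))
```

The returned path follows the low-cost pixels, which mimics how the contour snaps to the object boundary between the seed point and the cursor.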
Figure 2-2 Working model of Intelligent Scissors [8].
2.3 The Bayes Matting
Image matting is the process of compositing two different images into one seamlessly blended image. In 2001, Chuang et al. [12] proposed a digital matting method based on Bayesian theory, which models color distributions probabilistically to achieve full alpha mattes. Before extracting the image object, Bayes Matting [12] needs a “trimap” that classifies the pixels of an image into three types: (1) definite foreground pixels $T_F$, (2) definite background pixels $T_B$, and (3) uncertain pixels $T_U$.
The trimap is generated by the user, who marks the definite foreground $T_F$ and the definite background $T_B$ with strokes of two different colors. The working interface is illustrated in Figure 2-3, where white line segments represent the foreground and red line segments represent the background. After the foreground and background have been indicated, the color of each pixel in the remaining region $T_U$ is determined by a compositing equation:
$$C = \alpha F + (1 - \alpha) B, \tag{2.1}$$
where $F$ and $B$ represent the color models of $T_F$ and $T_B$, respectively, $\alpha$ is the pixel’s opacity component used to linearly blend between foreground and background, and $C$ is the composition result.
Figure 2-3 Working model of Bayes Matting [12].
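As a small numeric illustration of the compositing equation (2.1), the sketch below blends one foreground color with one background color. The function name `composite` and the sample colors are hypothetical values chosen for this example.

```python
def composite(alpha, foreground, background):
    """Apply C = alpha * F + (1 - alpha) * B to each color channel."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


F = (200.0, 100.0, 0.0)   # hypothetical foreground color (R, G, B)
B = (0.0, 100.0, 200.0)   # hypothetical background color (R, G, B)

C = composite(0.25, F, B)  # a mostly transparent foreground pixel
```

With an opacity of 1 the result equals the foreground color, and with an opacity of 0 it equals the background color; matting solves the inverse problem of estimating the opacity, F and B from the observed C.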
2.4 The Graph Cut
The Graph Cut [13] is a foreground object extraction technique that uses a setting similar to that of Bayes Matting, including the “trimap” and probabilistic color models. The method was proposed in 2001 by Boykov and Jolly [13], and its goal is to achieve robust segmentation even when the foreground and background color distributions of an image are not well separated. The working interface is similar to that of Bayes Matting: the user specifies the foreground and background with strokes of different colors, as shown in Figure 2-4. In the illustration, white strokes represent the foreground and red strokes represent the background.
After the user has marked the foreground and background, the method defines a cost function called the “Gibbs” energy function. As shown in Eq. 2.2, the “Gibbs” energy function consists of two parts: a data term U, which evaluates how well the opacity distribution fits the input data, and a smoothness term V:
$$E(\alpha, \theta, z) = U(\alpha, \theta, z) + V(\alpha, z), \tag{2.2}$$
where $\alpha$ is the opacity value associated with each pixel, $\theta$ is the color model for the foreground and background defined by the user, and $z$ is the input data.
After the energy function is fully defined, the segmenting process starts by finding a global minimum of the “Gibbs” energy function:
$$\hat{\alpha} = \arg\min_{\alpha} E(\alpha, \theta, z). \tag{2.3}$$
The minimization step is done by using a standard minimum cut algorithm proposed in the study of Boykov and Jolly [13].
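To make the roles of the two energy terms concrete, the sketch below minimizes an energy of the form E = U + V on a tiny one-dimensional "image". Everything specific here is an assumption made for illustration: the data term is taken as the squared distance to a per-label mean gray value, the smoothness term as a fixed penalty `gamma` for each pair of neighboring pixels with different labels, and the minimum is found by brute-force enumeration rather than by the minimum cut algorithm of [13].

```python
from itertools import product


def gibbs_energy(alpha, theta, z, gamma=50.0):
    """E(alpha, theta, z) = U + V for a 1-D row of gray values z."""
    # U (assumed form): squared distance of each pixel to the mean of the
    # model it is assigned to; smaller means a better fit to the data.
    u = sum((z[i] - theta[alpha[i]]) ** 2 for i in range(len(z)))
    # V (assumed form): penalty for every neighboring pair of pixels
    # that carries different labels.
    v = gamma * sum(alpha[i] != alpha[i + 1] for i in range(len(z) - 1))
    return u + v


z = [12, 10, 60, 200, 205, 198]   # hypothetical gray values
theta = {0: 10.0, 1: 200.0}       # background mean, foreground mean

# Brute force over all 2^6 labelings; feasible only for toy inputs.
best = min(product((0, 1), repeat=len(z)),
           key=lambda a: gibbs_energy(a, theta, z))
best_energy = gibbs_energy(best, theta, z)
```

A minimum cut solver reaches the same kind of global minimum for binary labelings without enumerating them, which is what makes the approach practical for real images.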
Figure 2-4 Working model of Graph Cut [13].
2.5 The Grab Cut
The Grab Cut [6], proposed by Rother et al. in 2004, is an object extraction method that extends the Graph Cut [13] described in Section 2.4. It improves on the Graph Cut method in three respects. First, the monochrome image model represented by a histogram is replaced by a Gaussian Mixture Model (GMM). Second, the one-shot minimum cut estimation algorithm is replaced by a more powerful iterative procedure that alternates between estimation and parameter learning. Finally, the demands on region specification by the user are relaxed by allowing incomplete labeling. The user needs to specify only $T_B$ for the “trimap”, which is done by simply placing a rectangle around the object, as illustrated in Figure 2-5.
Figure 2-5 Working model of GrabCut [6].
Chapter 3 The Proposed Method
This chapter presents the details of the proposed object cutout method. Section 3.1 gives an overview of the proposed scheme. The working model and user interface are presented in Section 3.2. Section 3.3 describes the image data model used in the proposed method, and the iterative segmentation algorithm is described in Section 3.4.
3.1 Method Overview
discriminating the foreground object from the background image. In the proposed scheme, the user draws a rectangle encompassing the target object, which provides the information needed to build the Gaussian Mixture Model (GMM) hierarchically as a representation of the color probability statistics of the object and the background. The details of this step are discussed in Section 3.3.
The segmentation algorithm proposed in this study is an iterative method based on region growing. It starts from the rectangle defined by the user and gradually extends the background region inward to fit the boundary of the target object. The process iteratively updates the GMM that represents the color distributions of the foreground object and the background area, so that the rectangle shrinks gradually to fit the boundary of the object. The details are discussed in Section 3.4.
Figure 3-1 Block diagram of the proposed scheme.
3.2 User Interaction
Given an input image that contains an object of interest, the goal of this study is to provide a simple and efficient tool for the user to select and cut out the object.
The interaction between the user and the system should be as simple as possible, so that the method is feasible for novice users and on low-resolution display devices. The user interface for extracting an object from the input image in the proposed method is illustrated in Figure 3-2. The only action required of the user is to draw a rectangle surrounding the target object using the mouse or another pointing device; all of the remaining segmentation activities are carried out by the system automatically.
Figure 3-2 User interface of the proposed scheme.
This object specification method reduces the amount of input the user must give to the system; however, producing a satisfactory result becomes a big challenge for the segmentation algorithm because the information about the foreground object is inadequate. In the proposed method, the area outside the rectangle is definite background, while the area inside the rectangle contains the target object as well as part of the background. The proposed way of building the foreground color model by using the user-defined background as a “guide” is discussed in Section 3.4.
3.3 Image Data Modeling
A Gaussian Mixture Model (GMM) is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with different parameters. This characteristic lets the GMM describe the color distribution of an image closely, because a picture is usually composed of several objects or regions, each of which is dominated by certain representative colors.
To generate a GMM, the first step is to decide the number of components $K$ in the model, that is, the number of Gaussian Models (GMs) in the GMM.
After the number of components is decided, the parameters of each GM must be determined. A GM can be characterized by $\theta_l$:
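$$\theta_l = \{\mu_l, \Sigma_l,\ l = 1, 2, \ldots, K\}, \tag{3.1}$$
where $\mu_l$ is the mean and $\Sigma_l$ is the covariance matrix of the data. The Gaussian probability distribution of each pixel can be represented as:
$$P(z_n, \theta_l) = \frac{1}{\sqrt{(2\pi)^{3} |\Sigma_l|}} \exp\left(-\frac{1}{2} (z_n - \mu_l)^{T} \Sigma_l^{-1} (z_n - \mu_l)\right), \tag{3.2}$$
where $z_n$ is the three-dimensional RGB color vector of the $n$-th pixel.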
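The Gaussian probability of a pixel color under one component can be evaluated directly from the mean and covariance matrix. The sketch below uses the standard multivariate normal density for three-dimensional RGB vectors; the helper names `det3`, `inv3` and `gaussian_pdf` and the sample parameters are assumptions of this example, and the 3 × 3 determinant and inverse are written out by hand only to keep the example free of external libraries.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def inv3(m):
    """Inverse of a 3x3 matrix: adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = det3(m)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def gaussian_pdf(z, mean, cov):
    """Density of the 3-D normal distribution N(mean, cov) at color z."""
    diff = [zi - mi for zi, mi in zip(z, mean)]
    inv = inv3(cov)
    # Quadratic form diff^T * cov^{-1} * diff.
    quad = sum(diff[r] * inv[r][c] * diff[c]
               for r in range(3) for c in range(3))
    norm = math.sqrt((2.0 * math.pi) ** 3 * det3(cov))
    return math.exp(-0.5 * quad) / norm


mean = (200.0, 50.0, 50.0)                      # hypothetical reddish component
cov = [[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 100.0]]
p_peak = gaussian_pdf(mean, mean, cov)          # density at the mean color
p_far = gaussian_pdf((150.0, 50.0, 50.0), mean, cov)
```

The density is highest at the component mean and falls off with the covariance-weighted distance, so `p_far` is smaller than `p_peak`.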
An agglomerative hierarchical clustering method is designed in the proposed scheme to assign pixels with similar color attributes to the same GM. Agglomerative hierarchical clustering is a "bottom-up" approach in which each data sample starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Usually, the two closest elements according to the chosen distance metric are merged. After each merging operation, the distances between clusters are recalculated. Each agglomeration occurs at a greater distance between clusters than the previous one, and the process can be stopped either when the clusters are too far apart to be merged or when the number of clusters reaches a predefined threshold. Compared with other methods, this clustering approach has the advantages of simplicity and high flexibility in the number of clusters.
The main goal of this clustering step is to find appropriate representative colors for the foreground object and the background image, which form the basis of the later segmentation activities. It is assumed that the input picture is an RGB image with $n$ pixels $Z = \{z_1, z_2, \ldots, z_n\}$, in which both the foreground (the object to be cut out) and the background are composed of multiple principal colors, and that the major colors of the foreground are distinguishable from those of the background. The mean color is used to represent the color of a group of pixels, and the Euclidean distance is employed to measure the distance between two groups of pixels.
The conventional agglomerative hierarchical clustering method is modified in two major ways to fit the requirements of the proposed scheme. (a) A protection set is added to postpone the merging of two clusters that both contain a large quantity of pixels. (b) When a cluster can be merged into either of two candidate clusters at the same distance, it tends to be merged into the cluster with the larger quantity of pixels.
The steps of the proposed clustering algorithm are summarized in Figure 3-3. In the proposed method, each color is initialized as a separate cluster. The pixel count of each color is calculated, and the $K$ clusters with the highest numbers of pixels are put in the protection set $\mathbf{S}$. Because the algorithm operates in RGB color space and the RGB values of each pixel range from 0 to 255, the full set $\mathbf{C}$ is:
$$\mathbf{C} = \{c_{0,0,0}, c_{1,0,0}, \ldots, c_{255,255,255}\}. \tag{3.3}$$
The protection set $\mathbf{S}$ prevents clusters with large pixel counts from being merged too quickly and thus preserves the main components/colors of the image.
Figure 3-3 The proposed agglomerative clustering algorithm.
Step 1: Count the number of pixels of each color in $Z$, and initialize each color as a cluster: $\mathbf{C} = \{c_{r,g,b}\}$, $r, g, b = 0, 1, \ldots, 255$, where $c_{r,g,b}$ represents the number of pixels with color value $(r, g, b)$.
Step 2: Sort the elements of $\mathbf{C}$ in non-increasing order.
Step 3: Put the top $K$ elements of $\mathbf{C}$ in the protection set $\mathbf{S}$.
Step 4: Sequentially fetch an unprocessed cluster $c_i$ from the top of $\mathbf{C}$, and merge cluster $c_j$ into $c_i$, $i \neq j$, if (1) the distance between the two clusters is smaller than the merge radius threshold $R$, i.e., $dist(c_j, c_i) < R$, and (2) $c_j$ is not in $\mathbf{S}$.
Step 5: Repeat Step 4 until all the clusters in $\mathbf{C}$ are processed.
Step 6: Update the centroids of the new clusters, and set the threshold $R = \rho \times R$, where $\rho > 1.0$ is a user-defined radius increasing rate.
Step 7: Repeat Steps 2 to 6 until the number of clusters in $\mathbf{C}$ is smaller than $\beta \times K$, where $\beta$ is the control parameter for the number of candidate clusters.
Step 8: Reset the merge radius $R$ to its initial value.
Step 9: Repeat Steps 2 to 6 without considering the protection set until the number of clusters in $\mathbf{C}$ is no larger than $K$.
Step 10: End of algorithm.
After the protection set is determined, adjacent clusters are merged to form new clusters. The proposed method merges every cluster $c_j$ into cluster $c_i$, $i \neq j$, if the distance between the two clusters is smaller than the merge radius threshold $R$ and $c_i$ and $c_j$ are not both in $\mathbf{S}$. The merging proceeds from the biggest cluster to the smallest one, which reduces the influence of outliers on the clusters because bigger clusters usually have higher resistance to noise. When all of the clusters in $\mathbf{C}$ have been processed, the centroids of the merged clusters are updated, and the radius $R$, which sets the threshold for merging two clusters, is increased at rate $\rho$:
$$R_{new} = \rho \times R_{old}. \tag{3.4}$$
The cluster merging process repeats until the number of clusters is less than $\beta \times K$, where $\beta$ is a user-defined parameter that controls when this stage of the clustering process stops. Step 8 then resets the merge radius $R$ to its initial value, and clustering starts again without the constraint of the protection set $\mathbf{S}$. The clustering procedure terminates when the number of clusters is equal to or smaller than the threshold $K$.
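The clustering procedure described in this section can be sketched in a few lines. This is an interpretation under stated assumptions rather than the implementation used in this study: clusters are stored as (color sum, pixel count) pairs, a cluster only absorbs clusters that come after it in the sorted order, protected clusters are never absorbed, centroids are held fixed during one pass, and the function names and parameter values are hypothetical. The rate `rho` and the control parameter `beta` are both assumed to be greater than 1 so that the loops terminate.

```python
import math
from collections import Counter


def centroid(cluster):
    total, count = cluster
    return tuple(t / count for t in total)


def merge_pass(clusters, radius, n_protected):
    """One sweep: sort by size, then let each cluster absorb smaller,
    unprotected clusters whose centroid lies within `radius`."""
    clusters = sorted(clusters, key=lambda c: c[1], reverse=True)
    centers = [centroid(c) for c in clusters]  # fixed during the pass
    alive = [True] * len(clusters)
    for i in range(len(clusters)):             # largest cluster first
        if not alive[i]:
            continue
        total, count = clusters[i]
        for j in range(i + 1, len(clusters)):
            if not alive[j] or j < n_protected:  # merged already or protected
                continue
            if math.dist(centers[i], centers[j]) < radius:
                total = tuple(a + b for a, b in zip(total, clusters[j][0]))
                count += clusters[j][1]
                alive[j] = False
        clusters[i] = (total, count)
    return [c for c, keep in zip(clusters, alive) if keep]


def cluster_colors(pixels, k, radius0, rho=2.0, beta=1.5):
    # One cluster per distinct color: (sum of colors, pixel count).
    clusters = [(tuple(v * n for v in color), n)
                for color, n in Counter(pixels).items()]
    radius = radius0
    while len(clusters) >= beta * k:   # stage with the protection set
        clusters = merge_pass(clusters, radius, k)
        radius *= rho                  # Eq. (3.4)
    radius = radius0                   # reset the merge radius
    while len(clusters) > k:           # stage without the protection set
        clusters = merge_pass(clusters, radius, 0)
        radius *= rho
    return [(centroid(c), c[1]) for c in clusters]


pixels = ([(250, 0, 0)] * 50 + [(245, 5, 0)] * 5 + [(0, 0, 250)] * 40
          + [(5, 0, 245)] * 4 + [(120, 120, 120)] * 3)
result = cluster_colors(pixels, k=2, radius0=10.0)
```

In this toy input the two dominant colors stay separate because both are in the protection set, while the near-duplicate colors and the small gray group are absorbed as the merge radius grows.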
3.4 Iterative Segmentation
The proposed image cutout method is an iterative segmentation method based on region growing. It adaptively adjusts the color models to better represent the color distributions of the image. Region growing [14] is a region-based image segmentation method; it is also classified as a pixel-based image cutout technique because it involves the selection of initial “seed points”. The main goal of region growing is to partition an image into regions by examining the neighboring pixels of the initial “seed points” and determining whether these neighbors should be added to the region. To decide whether a pixel joins the region, a logical predicate $\boldsymbol{L}$ is specified, which in our case is the color distribution model.
The basic formulation of region growing, as defined by R. C. Gonzalez and R. E. Woods, is shown in Figure 3-4. Parts 1 and 2 state that the segmentation of the image must be complete, which means that every pixel must be in a region, and that all the pixels inside a region must be connected. Part 3 indicates that all the regions must be disjoint. Part 4 deals with the properties that must be satisfied by the pixels in a segmented region. For example, $\boldsymbol{L}(R_i)$ = TRUE if all the pixels in $R_i$ belong to the same color model. The last part states that regions $R_i$ and $R_j$ are different in the sense of the logical predicate $\boldsymbol{L}$.
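For reference, the five parts described above are commonly written as follows. This is a sketch of the standard textbook formulation with the predicate named $\boldsymbol{L}$ as in this chapter; the symbols $R$ for the entire image region (not the merge radius of Section 3.3) and $n$ for the number of regions are notation assumed here.

```latex
\begin{align*}
&(1)\quad \bigcup_{i=1}^{n} R_i = R \\
&(2)\quad R_i \text{ is a connected region}, \quad i = 1, 2, \ldots, n \\
&(3)\quad R_i \cap R_j = \emptyset \quad \text{for all } i \neq j \\
&(4)\quad \boldsymbol{L}(R_i) = \text{TRUE}, \quad i = 1, 2, \ldots, n \\
&(5)\quad \boldsymbol{L}(R_i \cup R_j) = \text{FALSE} \quad \text{for any adjacent regions } R_i \text{ and } R_j
\end{align*}
```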
Before the cutout algorithm is performed, the clustering method proposed in Section 3.3 is applied to build two Gaussian Mixture Models that represent the color distributions of the foreground and the background. The parameters that define a model extend (3.1):
$$\theta(\alpha, l) = \{\mu(\alpha, l), \Sigma(\alpha, l), \pi(\alpha, l),\ \alpha = 0, 1,\ l = 1, \ldots, K\}, \tag{3.5}$$
where $\alpha$ is 0 for the background, and 1 for the foreground. $\pi(\alpha, l)$ represents the weight of the Gaussian Model inside the GMM:
$$\pi = \frac{\text{the number of pixels in this Gaussian Model}}{\text{total number of pixels in the Gaussian Mixture Model}}. \tag{3.6}$$
We assign a parameter $\alpha_n$ to each pixel to indicate the color model it belongs to: 0 stands for the background and 1 for the foreground. The initial classification is determined by the user-defined area: the region outside the rectangle is the background ($\alpha_n = 0$) and the region inside is the foreground ($\alpha_n = 1$).
The proposed region growing based iterative cutout algorithm is given below:
Figure 3-5 The proposed image cutout algorithm.
The proposed algorithm first initializes a set $\mathbf{S}$ with the pixels $z_n$ that lie on the user-marked rectangle as seed points for the iterative region growing process. For each pixel $z_n$ in set $\mathbf{S}$, the Gaussian probability distribution function of each Gaussian Model is evaluated to determine the likelihood of the pixel. The following rule is evaluated to assign $\alpha_n$ to pixel $z_n$:
$$\alpha_n := \arg\max_{\alpha}\ \pi(\alpha, l) \times P(z_n, \theta(\alpha, l)). \tag{3.7}$$
This process decides whether the pixel belongs to the background ($\alpha_n = 0$) or the foreground ($\alpha_n = 1$).
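The assignment rule can be sketched as follows. Two simplifications are assumed for brevity and are not taken from the text: each component uses a diagonal covariance (one variance per channel) instead of a full covariance matrix, and the score of a model is taken as the maximum of weight times density over its components. The names `diag_gaussian_pdf` and `assign_label` and all parameter values are hypothetical.

```python
import math


def diag_gaussian_pdf(z, mean, var):
    """Normal density with a diagonal covariance (simplifying assumption)."""
    quad = sum((x - m) ** 2 / v for x, m, v in zip(z, mean, var))
    norm = math.sqrt((2.0 * math.pi) ** len(z) * math.prod(var))
    return math.exp(-0.5 * quad) / norm


def assign_label(z, gmms):
    """gmms[alpha] is a list of (weight, mean, var) components.
    Returns the alpha whose best component maximizes weight * density."""
    def score(alpha):
        return max(w * diag_gaussian_pdf(z, m, v) for w, m, v in gmms[alpha])
    return max(gmms, key=score)


var = (400.0, 400.0, 400.0)
gmms = {
    0: [(0.7, (20.0, 120.0, 30.0), var),    # background: green leaves
        (0.3, (90.0, 60.0, 20.0), var)],    # background: brown soil
    1: [(1.0, (240.0, 220.0, 30.0), var)],  # foreground: yellow petals
}

petal_label = assign_label((235.0, 210.0, 40.0), gmms)
leaf_label = assign_label((25.0, 110.0, 35.0), gmms)
```

A yellowish pixel is assigned to the foreground model and a greenish pixel to the background model, which is the decision each seed pixel goes through during region growing.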
[Algorithm listing of Figure 3-5: region growing based iterative image cutout algorithm, consisting of an Initialization part (pixels on the rectangle drawn by the user) and an Iterative segmentation part.]
After each pixel has been assigned to a Gaussian Mixture Model, if the newly assigned $\alpha_n$ of a pixel indicates the background ($\alpha_n = 0$), the neighbors of pixel $z_n$ that lie inside the box are put into the set $\mathbf{S}$. The assignment process is repeated until the set $\mathbf{S}$ is empty ($\mathbf{S} = \emptyset$).
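One growing pass of this procedure can be sketched as follows. The sketch is illustrative only: the classifier is passed in as a function `classify` (in the proposed method this role is played by the GMM-based assignment rule), the image holds scalar values instead of RGB colors, a 4-connected neighborhood is assumed, and the rectangle is given as inclusive (top, left, bottom, right) coordinates.

```python
from collections import deque


def grow_background(image, rect, classify):
    """Grow the background inward from the rectangle border.
    Returns {(row, col): alpha} for every pixel inside the rectangle."""
    top, left, bottom, right = rect
    # Every pixel inside the box starts as foreground (alpha = 1).
    alpha = {(r, c): 1
             for r in range(top, bottom + 1)
             for c in range(left, right + 1)}
    # Seed set S: the pixels lying on the rectangle itself.
    seeds = deque(p for p in alpha
                  if p[0] in (top, bottom) or p[1] in (left, right))
    visited = set(seeds)
    while seeds:                       # repeat until S is empty
        r, c = seeds.popleft()
        alpha[(r, c)] = classify(image[r][c])
        if alpha[(r, c)] == 0:         # background: keep growing inward
            for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if n in alpha and n not in visited:
                    visited.add(n)
                    seeds.append(n)
    return alpha


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 9, 9, 9, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
labels = grow_background(image, rect=(0, 0, 4, 4),
                         classify=lambda v: 1 if v > 4 else 0)
foreground = {p for p, a in labels.items() if a == 1}
```

Growth stops at pixels classified as foreground, so the center pixel is never visited and keeps its initial foreground label.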
The next step is to adjust the Gaussian Mixture Models using the result of the preceding assignment. The effect of this step is to bring the color models closer to the actual color distributions of the image. Because the region inside the user-defined box is used to represent the foreground, a proportion of the background is included in it. To exclude this background, the segmentation algorithm is executed iteratively so that the color models approximate reality more closely. The algorithm terminates when the values of $\alpha_n$ no longer change. Figure 3-6(a) shows an example of two 5-component GMMs in the R-G color space; the two GMMs overlap considerably. After the iterative segmentation, however, the GMMs are much better separated (Figure 3-6(b)).
Figure 3-6 Evolution of the Gaussian Mixture Model. (a) The original Gaussian Mixture Model. (b) The result of the Gaussian Mixture Model after segmentation.
Chapter 4 Experiment Results
This chapter presents and discusses the experimental results of the proposed method. The hardware used to run the test program is a notebook with an Intel Core i5-3210 2.50 GHz CPU and 4 GB of DDR3 RAM. The program was written in C# on Microsoft .NET Framework 4.0 and ran on Windows 8. All of the test images are in JPG file format.
Two types of target objects are cut out from images using the proposed scheme. The first kind of object has an explicit contour, and the boundary of the object can be clearly recognized by the human eye. The second kind of object has a complicated contour, such as human hair or animal fur, which is difficult to segment from the background even when the operation is done manually by the user in contemporary image processing software. The cutout experiments on the two kinds of objects are shown in Section 4.1 and Section 4.2, respectively. To demonstrate the performance of the proposed scheme, its cutout results are compared with the results obtained with the popular image cutout tool GrabCut [6].
4.1 Cutout Object with Explicit Contour
Four test images with explicit object contours, namely (a) yellow flower, (b) orange flower, (c) metal object, and (d) building, are tested in this experiment. The following experiments show the results of cutting out these target objects from the input images.
A. Yellow flower
The first experiment aims to cut out the yellow flower from the input image shown in Figure 4-1(a), whose size is 1024 × 768 pixels. Figure 4-1(b) shows the rectangle drawn by the user, which is done by moving the mouse to the top-left corner, pressing the mouse button, dragging the mouse to the bottom-right corner, and pressing the mouse button again. The rectangle encloses the target object “yellow flower”. The cutout result of the proposed scheme is shown in Figure 4-1(c), and the result obtained with the GrabCut technique is shown in Figure 4-1(d). The two images show that both schemes successfully segment the flower from the background. The flower extracted by the proposed scheme is complete and has a clear-cut boundary, while the object extracted using GrabCut includes a small green area that is over-segmented into the flower, marked with a blue box in Figure 4-1(d). To test the execution speed of the two