

Chapter 1 Introduction

1.3 Organization of This Thesis

The remainder of this thesis is organized as follows. In chapter 2, we survey research on image retrieval and discuss several issues that need to be considered. We then review the concepts of morphological operations and morphological granulometry. In chapter 3, we present our methods, which use the granulometric distribution, primitives of the granulometric distribution, and the granulometric histogram to describe object sizes. We then define how to evaluate the similarity between two images. In chapter 4, we experiment with different features and compare the performance of our method with that of other methods.

In chapter 5, the conclusion and future work will be stated.

Chapter 2

Previous Research

We describe several color models in section 2.1, and color features which are used for image retrieval in section 2.2. In section 2.3, the basic morphological operators will be introduced. Finally, the distance functions for measuring image similarity are described in section 2.4.

2.1 Color Models

Color is an important factor in human vision. In this section, we introduce several color models, such as RGB, HSI, YIQ, YUV, XYZ, and L*a*b*. We also list the results of [3], which give the characteristics of the above color models.

2.1.1 RGB Color Model

In the RGB model, each color appears in its primary spectral components of Red, Green, and Blue. This model is based on a Cartesian coordinate system, and its color subspace of interest is the cube shown in Fig. 2-1. It is a hardware-oriented model, most commonly used for color monitors and a broad class of color video cameras. Processing images directly in the RGB color space is therefore fast. However, RGB does not match human perception well. We can summarize by saying that RGB is ideal for image color generation, but its use for color description is much more limited.

Fig. 2-1 RGB color cube, with axes R, G, and B and the corners black, white, red, green, blue, yellow, cyan, and magenta.

2.1.2 HSI Color Model

The HSI color model is composed of Hue, Saturation, and Intensity. It is also referred to as the HSV color model, with the term Value used instead of Intensity. Intensity is the brightness value; Hue and Saturation encode the chromaticity values. HSI is very important and attractive for image processing because it represents colors in a way similar to how the human eye senses them.

Given an image in RGB color format, the H component of each RGB pixel is obtained using the equation [1]

$$H = \begin{cases} \theta, & \text{if } B \le G \\ 360^\circ - \theta, & \text{if } B > G \end{cases} \qquad (1)$$

where

$$\theta = \cos^{-1}\left\{ \frac{\tfrac{1}{2}\left[(R-G) + (R-B)\right]}{\left[(R-G)^2 + (R-B)(G-B)\right]^{1/2}} \right\} \qquad (2)$$

The saturation component is given by

$$S = 1 - \frac{3}{R+G+B}\min(R, G, B) \qquad (3)$$

Finally, the intensity component is given by

$$I = \frac{1}{3}(R + G + B) \qquad (4)$$

Fig. 2-2 shows the HSI model based on color circles. The Hue component describes the color itself as an angle in [0, 360] degrees: 0 degrees means red, 60 degrees yellow, 120 degrees green, 240 degrees blue, and 300 degrees magenta. The Saturation component signals how much the color is diluted with white; the range of the S component is [0, 1]. The Intensity range is also [0, 1], where 0 means black and 1 means white.

Fig. 2-2 HSI color model.
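As a concrete illustration, the following Python sketch applies Eqs. (1)-(4) to a single RGB pixel normalized to [0, 1]. The epsilon guard and the clipping of the arccosine argument are our additions to avoid numerical problems; they are not part of the original equations.

```python
import numpy as np

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB values in [0, 1] to HSI using Eqs. (1)-(4)."""
    eps = 1e-10  # guard against division by zero
    # Eq. (2): angle of the color vector relative to the red axis
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    # Eq. (1): hue in [0, 360) degrees
    h = theta if b <= g else 360.0 - theta
    # Eq. (3): saturation in [0, 1]
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b + eps)
    # Eq. (4): intensity in [0, 1]
    i = (r + g + b) / 3.0
    return h, s, i

print(rgb_to_hsi(1.0, 0.0, 0.0))  # pure red -> hue near 0, saturation 1
```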

2.1.3 YUV and YIQ Color Model

In the YUV color model, Y represents the luminance of a color, while U and V represent its chrominance. The conversion from RGB to YUV is given by [2].

$$\begin{aligned} Y &= 0.30R + 0.59G + 0.11B \\ U &= 0.493(B - Y) \\ V &= 0.877(R - Y) \end{aligned} \qquad (5)$$

The YIQ is used in NTSC color TV broadcasting. The original meanings of these names came from combinations of analog signals – I for in-phase chrominance, and Q for quadrature chrominance. The conversion of RGB to YIQ is given by [2].

$$\begin{aligned} Y &= 0.299R + 0.587G + 0.114B \\ I &= 0.596R - 0.275G - 0.321B \\ Q &= 0.212R - 0.523G + 0.311B \end{aligned} \qquad (6)$$

The luminance-chrominance color models (YIQ, YUV) have proven effective. Hence, they are also adopted in image compression standards such as JPEG and JPEG2000. The YUV color space is very similar to the YIQ color space, and both were proposed for use with the NTSC standard; however, because U and V do not correspond as well to actual human perceptual color sensitivities, NTSC uses I and Q instead.
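Since Eq. (6) is a linear transform, it can be written as a single matrix multiplication. The short Python sketch below does so; the constant and function names are our own.

```python
import numpy as np

# Eq. (6): the RGB-to-YIQ transform written as a 3x3 matrix
RGB_TO_YIQ = np.array([
    [0.299,  0.587,  0.114],
    [0.596, -0.275, -0.321],
    [0.212, -0.523,  0.311],
])

def rgb_to_yiq(rgb_pixel):
    """rgb_pixel: array-like of (R, G, B) values."""
    return RGB_TO_YIQ @ np.asarray(rgb_pixel, dtype=float)

print(rgb_to_yiq([1.0, 1.0, 1.0]))  # gray input -> I and Q are zero
```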

2.1.4 XYZ Color Model

The conversion of RGB to XYZ is given by [2].

$$\begin{aligned} X &= 0.607R + 0.174G + 0.200B \\ Y &= 0.299R + 0.587G + 0.114B \\ Z &= 0.000R + 0.066G + 1.116B \end{aligned} \qquad (7)$$

2.1.5 L*a*b* Color Model

The L*a*b* color model (also known as CIELAB) is a perceptually uniform color system proposed by the CIE. The model is shown in Fig. 2-3. The L* value represents luminance and ranges from 0 (black) to 100 (white). The extremes of a* correspond to red and green, while b* ranges from yellow to blue.

Fig. 2-3 L*a*b* color model.

The conversion of XYZ to L*a*b* is given by [2].

$$\begin{aligned} L^* &= 116\,(Y/Y_n)^{1/3} - 16 \\ a^* &= 500\left[(X/X_n)^{1/3} - (Y/Y_n)^{1/3}\right] \\ b^* &= 200\left[(Y/Y_n)^{1/3} - (Z/Z_n)^{1/3}\right] \end{aligned} \qquad (8)$$

where $X_n$, $Y_n$, and $Z_n$ are the tristimulus values of the reference white.

2.1.6 Comparison of the Above Color Models

Each of the above color models has been claimed by various researchers to be the most suitable one for particular reasons. In Table 2-1, we list the comparison [3] of these color models in the following five terms.

A. Is the color model device independent?

B. Is the color model perceptually uniform?

C. Is the color model linear?

D. Is the color model intuitive?

E. Is the color model robust against varying imaging conditions?

Table 2-1 Characteristics of each color model, in terms A-E above (Y = yes, N = no).

RGB: A: N, B: N, C: N, D: N. E: dependent on viewing direction, object geometry, direction of the illumination, and the intensity and color of the illumination.

HSI: A: N, B: N, C: N for H and S, Y for I, D: Y. E: H is dependent on the color of the illumination; S is dependent on highlights and a change in the color of the illumination; I is dependent on viewing direction, object geometry, direction of the illumination, and the intensity and color of the illumination.

YUV and YIQ: A: Y, B: N, C: Y, D: N. E: dependent on viewing direction, object geometry, highlights, direction of the illumination, and the intensity and color of the illumination.

XYZ: A: Y, B: N, C: Y, D: N. E: dependent on viewing direction, object geometry, highlights, direction of the illumination, and the intensity and color of the illumination.

L*a*b*: A: Y, B: Y, C: N, D: N. E: dependent on viewing direction, object geometry, highlights, direction of the illumination, and the intensity and color of the illumination.

2.2 Color Features

The most frequently cited visual features for image retrieval are color, texture, and shape.

Among them, the color feature is the most commonly used. It is robust to complex backgrounds and independent of image size and orientation. In this section, we introduce two color features: the color histogram and color moments.

2.2.1 Color Histogram

The color histogram is the most popular feature used in image retrieval. It is a global image feature. Its advantages are simple computation and invariance to translation, rotation, and scaling of images.

A color histogram of an image counts the number of pixels with a given pixel value in red, green, and blue (RGB). To save storage and smooth out differences between similar but unequal images, the histogram is defined coarsely, with bins quantized to 8 bits in total: 3 bits each for red and green, and 2 bits for blue.
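A minimal Python sketch of this 3-3-2 quantization is given below; the function name and the normalization to unit sum (used later for histogram intersection in section 2.4.1) are our choices.

```python
import numpy as np

def rgb_histogram_332(image):
    """Coarse 8-bit color histogram: 3 bits red, 3 bits green, 2 bits blue.

    image: uint8 array of shape (H, W, 3). Returns a normalized
    256-bin histogram whose entries sum to 1.
    """
    r = image[..., 0] >> 5           # keep the top 3 bits of red
    g = image[..., 1] >> 5           # keep the top 3 bits of green
    b = image[..., 2] >> 6           # keep the top 2 bits of blue
    codes = (r << 5) | (g << 2) | b  # pack into one 8-bit bin index
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()         # normalize so the sum equals unity
```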

2.2.2 Color Moment Primitives

The color distribution of an image can be characterized by color moments. Primitives of color moments [6] can be viewed as a partial image feature. This feature captures both local and global information of an image, so that partially similar images can also be retrieved.

The first color moment of the i-th color component (i = 1, 2, 3) is defined by

$$M_{i,1} = \frac{1}{N}\sum_{j=1}^{N} p_{ij} \qquad (9)$$

where $p_{ij}$ is the value of the i-th color component of the j-th pixel and N is the total number of pixels in the image. The h-th color moment, h = 2, 3, …, of the i-th color component is then defined as

$$M_{i,h} = \left[\frac{1}{N}\sum_{j=1}^{N} \left(p_{ij} - M_{i,1}\right)^h\right]^{1/h} \qquad (10)$$

The first H moments of each color component of an image s are taken to form a feature vector, CT, which is defined as

$$CT = [\alpha_1 M_{1,1}, \ldots, \alpha_1 M_{1,H},\ \alpha_2 M_{2,1}, \ldots, \alpha_2 M_{2,H},\ \alpha_3 M_{3,1}, \ldots, \alpha_3 M_{3,H}] \qquad (11)$$

where Z = H × 3 is the dimension of CT and $\alpha_1, \alpha_2, \alpha_3$ are the weights for Y, I, and Q.

Based on the above definition, an image is first divided into X non-overlapping blocks. For each block a, the h-th color moment of its i-th color component is denoted by $M^a_{i,h}$. The feature vector, $CB_a$, of block a is then represented as

$$CB_a = [cb_{a,1}, cb_{a,2}, \ldots, cb_{a,Z}] \qquad (12)$$

From the above definition we obtain X feature vectors. To speed up image retrieval, we find a few representative feature vectors to stand for them. To reach this aim, a progressive constructive clustering algorithm [7] is used to classify all the $CB_a$'s into several clusters, and the central vector of each cluster is regarded as a representative vector, called a primitive of the image. The central vector, $PC_k$, of the k-th cluster is the mean of the cluster's members,

$$PC_k = \frac{1}{n_k}\sum_{CB_a \in \text{cluster } k} CB_a \qquad (13)$$

where $n_k$ is the number of blocks in the k-th cluster.
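To make the block-moment features concrete, here is a minimal Python sketch under our reading of Eqs. (9)-(12); the signed h-th root (so that odd central moments keep their sign) is an implementation choice of ours, not stated in the original.

```python
import numpy as np

def block_color_moments(block, H=3, weights=(1.0, 1.0, 1.0)):
    """Feature vector CB_a of one block: the first H moments per channel.

    block: float array of shape (h, w, 3) in an opponent space such as YIQ.
    weights: per-channel weights (alpha_1, alpha_2, alpha_3).
    """
    features = []
    for i in range(3):
        p = block[..., i].ravel()
        m1 = p.mean()                 # Eq. (9): first moment (mean)
        features.append(weights[i] * m1)
        for h in range(2, H + 1):     # Eq. (10): higher central moments
            central = np.mean((p - m1) ** h)
            # signed h-th root so odd moments keep their sign
            root = np.sign(central) * abs(central) ** (1.0 / h)
            features.append(weights[i] * root)
    return np.array(features)         # length Z = 3 * H
```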

2.3 Morphological Operators

Mathematical morphology offers a powerful tool for extracting image components that are useful in the representation and description of region shape, such as boundaries, skeletons, texture, and the convex hull. Mathematical morphology is a set-theoretic method. Sets in mathematical morphology represent the shapes of objects in an image. The operations of mathematical morphology were originally defined as set operations and shown to be useful for image processing.

In general, the morphological approach is based on binary images. In a binary image, each pixel can be viewed as an element of $Z^2$. Gray-scale digital images can be represented as sets whose components are in $Z^3$: two components are the coordinates of a pixel, and the third corresponds to its discrete intensity value. A morphological operation takes as input a source image and a structuring element, which is another image usually smaller than the source image.

The structuring element is a predetermined geometric shape, and there are some common structuring elements as shown in Fig. 2-4.

Fig. 2-4 Examples of structuring elements.

Here, we discuss morphological operators on binary images [1, 7]. Let A be a source image and B a structuring element in $Z^2$, with elements $\vec{a}$ and $\vec{b}$ respectively, and let $\phi$ denote the empty set.

2.3.1 Basic Definitions

The Translation of A by the point $\vec{x}$ in $Z^2$, denoted $A_{\vec{x}}$, is defined by

$$A_{\vec{x}} = \{\vec{a} + \vec{x} \mid \vec{a} \in A\} = A + \vec{x} \qquad (14)$$

where the plus sign refers to vector addition.

And the Reflection of B, denoted $\hat{B}$, is defined as

$$\hat{B} = \{-\vec{b} \mid \vec{b} \in B\} \qquad (15)$$

The examples of Translation and Reflection are shown in Fig. 2-5.

Fig. 2-5 Examples of (a) Translation and (b) Reflection.

2.3.2 Dilation, Erosion, Opening, and Closing of Binary Images

We introduce two fundamental morphological operations on binary images, Dilation and Erosion, and two operators, Closing and Opening, that are built from them.

The Dilation of A by B, denoted $D_B(A)$, is defined as

$$D_B(A) = A \oplus B = \{\vec{x} \mid \hat{B}_{\vec{x}} \cap A \neq \phi\} \qquad (16)$$

where B is the structuring element.

And the Erosion of A by B, denoted $E_B(A)$, is defined as

$$E_B(A) = A \ominus B = \{\vec{x} \mid B_{\vec{x}} \subseteq A\} \qquad (17)$$

Fig. 2-6(c) and (d) are examples of Dilation and Erosion. The dilation of A by B is the set of all displacements $\vec{x}$ such that $\hat{B}$ and A overlap by at least one nonzero element. The erosion of A by B is the set of all points $\vec{x}$ such that B translated by $\vec{x}$ is contained in A.

The Closing of set A by structuring element B, denoted $C_B(A)$, is defined as

$$C_B(A) = A \bullet B = E_{\hat{B}}(D_B(A)) = (A \oplus B) \ominus \hat{B} \qquad (18)$$

And the Opening of set A by structuring element B, denoted $O_B(A)$, is defined as

$$O_B(A) = A \circ B = D_{\hat{B}}(E_B(A)) = (A \ominus B) \oplus \hat{B} \qquad (19)$$

Examples of closing and opening are shown in Fig. 2-6(e) and (f). The closing of A by B is simply the dilation of A by B, followed by the erosion of the result by $\hat{B}$. The opening of A by B is simply the erosion of A by B, followed by the dilation of the result by $\hat{B}$.

Further examples of closing and opening are shown in Fig. 2-7(b) and (c). As the figure shows, closing connects thin lines and fills small holes, while opening keeps only the objects larger than the structuring element and removes small thin lines.

Fig. 2-6 (a) Set A. (b) Structuring element B. (c) The Dilation of A by B. (d) The Erosion of A by B. (e) The Closing of A by B. (f) The Opening of A by B.

Fig. 2-7 (a) Original image. (b)(c) Closing and opening with a 3×3 structuring element, respectively.
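The binary operators of Eqs. (16)-(19) are available in common libraries; the following sketch uses scipy.ndimage to reproduce the hole-filling behavior described above. The toy image and structuring element are our own.

```python
import numpy as np
from scipy import ndimage

A = np.zeros((9, 9), dtype=bool)
A[2:7, 2:7] = True               # a 5x5 square object
A[4, 4] = False                  # one-pixel hole inside the object
B = np.ones((3, 3), dtype=bool)  # 3x3 structuring element

dil = ndimage.binary_dilation(A, structure=B)  # Eq. (16): grows the object
ero = ndimage.binary_erosion(A, structure=B)   # Eq. (17): shrinks the object
clo = ndimage.binary_closing(A, structure=B)   # Eq. (18): fills the small hole
ope = ndimage.binary_opening(A, structure=B)   # Eq. (19): removes parts too thin to contain B

print(clo[4, 4])  # True: closing filled the one-pixel hole
```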

2.3.3 Extension to Grayscale Images

In this section, we extend the basic operations of dilation and erosion to gray-level images. Throughout the discussion that follows, we deal with digital image functions of the form f(x, y) and b(x, y), where f(x, y) is the gray-scale image and b(x, y) is a structuring element.

Gray-scale dilation of f by b, denoted by $f \oplus b$, is defined as

$$(f \oplus b)(s, t) = \max\{f(s - x, t - y) + b(x, y) \mid (s - x), (t - y) \in D_f;\ (x, y) \in D_b\} \qquad (20)$$

where $D_f$ and $D_b$ are the domains of f and b, respectively. As before, b is the structuring element of the morphological process, but note that b is now a function rather than a set.

Because dilation is based on choosing the maximum value of f + b in a neighborhood defined by the shape of the structuring element, the general effect of performing dilation on a gray-scale image is two-fold: (1) if all the values of the structuring element are positive, the output image tends to be brighter than the input; and (2) dark details either are reduced or eliminated, depending on how their values and shapes relate to the structuring element used for dilation, as illustrated in Fig. 2-8(b).

Gray-scale erosion of f by b, denoted by $f \ominus b$, is defined as

$$(f \ominus b)(s, t) = \min\{f(s + x, t + y) - b(x, y) \mid (s + x), (t + y) \in D_f;\ (x, y) \in D_b\} \qquad (21)$$

where $D_f$ and $D_b$ are the domains of f and b. Because erosion is based on choosing the minimum value of f − b in a neighborhood defined by the shape of the structuring element, the general effect of performing erosion on a gray-scale image is two-fold: (1) if all the values of the structuring element are positive, the output image tends to be darker than the input; and (2) bright details are reduced or eliminated, depending on the structuring element used, as illustrated in Fig. 2-8(c).

Fig. 2-8 (a) The original Lena image. (b) Dilation of the Lena image. (c) Erosion of the Lena image.

Closing and opening are used to smooth the contours of objects. In gray-level images, opening is used for dark objects on a brighter background, and closing is used for bright objects on a darker background, as illustrated in Fig. 2-9(b) and (c).

Fig. 2-9 (a) The original Lena image. (b) Closing of the Lena image. (c) Opening of the Lena image.
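A short sketch of the gray-scale counterparts, again using scipy.ndimage; the flat (all-zero) 3×3 structuring element and the random stand-in image are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

f = np.random.randint(0, 256, size=(64, 64)).astype(np.uint8)  # stand-in image
b = np.zeros((3, 3))  # flat 3x3 structuring element

dil = ndimage.grey_dilation(f, structure=b)  # Eq. (20): local maximum, brightens
ero = ndimage.grey_erosion(f, structure=b)   # Eq. (21): local minimum, darkens
clo = ndimage.grey_closing(f, structure=b)   # suppresses small dark details
ope = ndimage.grey_opening(f, structure=b)   # suppresses small bright details

# With a flat structuring element containing the origin, dilation is
# extensive and erosion is anti-extensive:
assert (dil >= f).all() and (ero <= f).all()
```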

2.3.4 Morphological Gradient

The Morphological Gradient of an image, denoted G, is defined as

$$G = D_B(A) - E_B(A) = (A \oplus B) - (A \ominus B) \qquad (22)$$

or

$$G = D_B(A) - A = (A \oplus B) - A \qquad (23)$$

or

$$G = A - E_B(A) = A - (A \ominus B) \qquad (24)$$

The morphological gradient highlights sharp gray-scale transitions in the source image. In other words, the morphological gradient can extract the boundary of an object. However, it is sensitive to the shape of the chosen structuring element. An example of the morphological gradient of the gray-scale Lena image with a 3×3 structuring element is shown in Fig. 2-10.

Fig. 2-10 (a) The original Lena image. (b) Morphological gradient of the Lena image.
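As a sketch of Eq. (22), the function below subtracts a gray-scale erosion from a gray-scale dilation; the step-edge test image is our own illustration.

```python
import numpy as np
from scipy import ndimage

def morphological_gradient(f, size=(3, 3)):
    """Eq. (22): dilation minus erosion with a flat structuring element."""
    dil = ndimage.grey_dilation(f, size=size).astype(int)
    ero = ndimage.grey_erosion(f, size=size).astype(int)
    return dil - ero  # large values at sharp gray-level transitions

# A vertical step edge: the gradient responds only near the transition
f = np.zeros((8, 8), dtype=np.uint8)
f[:, 4:] = 200
g = morphological_gradient(f)
print(g[0])  # nonzero only in the columns adjacent to the edge
```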

2.3.5 Morphological Granulometry

Granulometries were first introduced by G. Matheron [8] and have proven very useful in image analysis, finding application in tasks such as texture classification, pattern analysis, and image segmentation. A granulometry can simply be defined as a series of morphological openings with structuring elements of increasing size. The granulometric function maps each structuring element size to the number of image pixels removed by the opening with the corresponding structuring element. The binary granulometry is easily extended to a grayscale form by replacing the binary opening with a grayscale opening.

There are three kinds of granulometric functions [9]:

1. The number of particles of $\gamma_{S_k}(f)$.

2. The surface area of $\gamma_{S_k}(f)$, often called the size distribution. This measure is typically chosen to be the area (number of ON pixels) in the binary case, and the volume (sum of all pixel values) in the grayscale case.

3. The loss of surface area between $\gamma_{S_k}(f)$ and $\gamma_{S_{k+1}}(f)$, called the granulometric curve, or pattern spectrum.

Here $\gamma$ is the binary morphological opening, f represents the original image, and $S_k$, k = 1, 2, …, is a sequence of opening structuring elements of increasing size.
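The following Python sketch computes the size distribution and the pattern spectrum of a binary image using square structuring elements of increasing size; the choice of squares and the toy image are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def pattern_spectrum(f, max_size=6):
    """Granulometric curve: surface-area loss between successive openings.

    f: binary image. Openings use (2k+1) x (2k+1) squares, k = 1..max_size.
    """
    areas = [int(f.sum())]
    for k in range(1, max_size + 1):
        se = np.ones((2 * k + 1, 2 * k + 1), dtype=bool)
        opened = ndimage.binary_opening(f, structure=se)
        areas.append(int(opened.sum()))    # size distribution
    areas = np.array(areas)
    return areas[:-1] - areas[1:]          # pattern spectrum

# Two squares of different sizes give two peaks in the spectrum,
# one at each particle size
f = np.zeros((40, 40), dtype=bool)
f[2:7, 2:7] = True       # 5x5 particle, removed by the 7x7 opening
f[20:31, 20:31] = True   # 11x11 particle, removed by the 13x13 opening
print(pattern_spectrum(f))
```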

2.3.6 Morphological Primitives

This feature was proposed by J.-S. Wu [10]. Before the morphological operation, we first transform the RGB color model into the YIQ model for the sake of human perception. Then we down-sample each image to suppress noise. Let $f(x, y) = (Y_{xy}, I_{xy}, Q_{xy})$ denote the color value of the down-sampled image located at (x, y). A 9-tuple context vector is then defined over the three color channels.

From the above definition, we obtain both the color and shape information of a pixel within the filter. A large morphological gradient value identifies an edge pixel, and a small one identifies a pixel in a flat region. We then gather the color information in the neighborhood of the edge pixels. Therefore, two pixels with the same color value might not have the same context.

After morphological context extraction, many similar contexts remain. We therefore use the following algorithm to cluster them into a small number of more representative contexts.

1. Initially, we extract the first pixel's context, and this context is viewed as the center of the first cluster.

2. Extract the next pixel's context, $C_{x,y}$, and find the nearest cluster center, $MP_{nearest}$, among the currently existing clusters. If the nearest distance is smaller than a threshold $T_d$, assign this context to the nearest cluster and update $MP_{nearest}$. Otherwise, construct a new cluster whose center is $C_{x,y}$.

3. Repeat step 2 until all pixels are processed.

In the above algorithm, $MP_{nearest}$ is the existing cluster center with the minimum distance to the current context $C_{x,y}$.
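A minimal Python sketch of steps 1-3 is given below. Interpreting "update $MP_{nearest}$" as a running-mean update of the cluster center, and using the Euclidean distance, are our assumptions.

```python
import numpy as np

def progressive_clustering(contexts, T_d):
    """One-pass clustering of context vectors (sketch of steps 1-3).

    contexts: iterable of 1-D feature vectors, in pixel order.
    T_d: distance threshold for joining an existing cluster.
    Returns the list of cluster centers (the image's primitives).
    """
    centers, counts = [], []
    for c in contexts:
        c = np.asarray(c, dtype=float)
        if not centers:                                  # step 1
            centers.append(c.copy()); counts.append(1)
            continue
        d = [np.linalg.norm(c - m) for m in centers]
        k = int(np.argmin(d))                            # nearest center
        if d[k] < T_d:                                   # step 2: join and
            counts[k] += 1                               # update the mean
            centers[k] += (c - centers[k]) / counts[k]
        else:                                            # new cluster
            centers.append(c.copy()); counts.append(1)
    return centers
```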

2.4 Distance Functions for Similarity Measure

For measuring perceptual similarity between images, many distance functions have been used. Here we will introduce Minkowski-like metrics [4] and Dynamic Partial Function [5].

2.4.1 Intersection

Histogram intersection is the standard measure used for color histograms. First, a color histogram $H_i$ is generated for each image i in the database. Next, each histogram is normalized so that its sum equals unity, which effectively removes the effect of image size. The histogram is then stored in the database. For any selected model image, its histogram $H_m$ is intersected with every database image histogram $H_i$ according to the equation

$$\text{intersection} = \sum_{j=1}^{n} \min(H_{ij}, H_{mj}) \qquad (28)$$

where j denotes histogram bin j, each histogram having n bins. The closer the intersection value is to 1, the better the image match.
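Eq. (28) amounts to a bin-wise minimum followed by a sum; a short Python sketch:

```python
import numpy as np

def histogram_intersection(h_i, h_m):
    """Eq. (28): sum of bin-wise minima of two normalized histograms."""
    return np.minimum(h_i, h_m).sum()  # 1.0 means identical histograms

h1 = np.array([0.5, 0.3, 0.2])
h2 = np.array([0.4, 0.4, 0.2])
print(histogram_intersection(h1, h2))  # 0.9
```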

Although the histogram is fast to compute, the intersection value is sensitive to color quantization, and two different images may have similar color histograms, as shown in Fig. 2-11. Therefore, using the color histogram alone as the color feature of an image is not sufficient for image retrieval.

Fig. 2-11 Two different images with similar color histograms.

2.4.2 Primitive Similarity

First, we provide several definitions. The k-th primitive of a query image q is represented as

$$PC_k^q = [pc_{k,1}^q, pc_{k,2}^q, \ldots, pc_{k,Z}^q], \quad k = 1, 2, \ldots, m \qquad (29)$$

where m is the number of primitives in the query image. The λ-th primitive of a matching image s is denoted as

$$PC_\lambda^s = [pc_{\lambda,1}^s, pc_{\lambda,2}^s, \ldots, pc_{\lambda,Z}^s] \qquad (30)$$

The distance between $PC_k^q$ and $PC_\lambda^s$ is defined as follows:

$$D\_PC_{k,\lambda}^{q,s} = \sum_{i=1}^{Z} \left( pc_{k,i}^q - pc_{\lambda,i}^s \right)^2 \qquad (31)$$

The minimum distance between $PC_k^q$ and all primitives of s is defined by

$$D\_PC_k^{q,s} = \min_{\lambda} D\_PC_{k,\lambda}^{q,s} \qquad (32)$$

The distance between the query image q and the matching image s is then obtained by accumulating the minimum distances $D\_PC_k^{q,s}$ over all primitives of q, weighted by the cluster sizes, where $n_k^q$ is the size of the k-th cluster. The similarity measure $Sim^{q,s}$ between q and s is defined as a decreasing function of this distance. Note that the larger $Sim^{q,s}$ a matching image has, the more similar it is to the query image.

2.4.3 Minkowski-Like Metrics

Each image can be represented by a p-dimensional feature vector $(x_1, x_2, \ldots, x_p)$, where each dimension represents an extracted feature. The quantity $\Delta d_i = |x_i - y_i|$, i = 1, 2, …, p, is defined as the feature distance in feature channel i. Minkowski-like distance functions are commonly used to measure similarity. The distance between any two images, X and Y, is defined as

$$D(X, Y) = \left( \sum_{i=1}^{p} (\Delta d_i)^r \right)^{1/r} \qquad (35)$$

When r = 2, this is the Euclidean or L2 distance, and when r is a fraction, it defines a fractional function.

2.4.4 Dynamic Partial Function

Unlike Minkowski-like metrics, DPF does not consider all features for similarity measurement. We first define the similar data set

$$\Delta_m = \{\text{the smallest } m\ \Delta d_i\text{'s of } (\Delta d_1, \Delta d_2, \ldots, \Delta d_p)\} \qquad (36)$$

then the dynamic partial distance between two image X and Y, is defined as

1/

where m and r are tunable parameters. When m= p , DPF degenerates to a Minkowski-like metric. When m< p , it counts the smallest m feature distances between two images.
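A compact Python sketch of Eqs. (35)-(37); the sample vectors are our own and merely show how DPF discounts a single badly mismatched channel.

```python
import numpy as np

def minkowski(x, y, r=2):
    """Eq. (35): Minkowski-like distance over all p feature channels."""
    d = np.abs(np.asarray(x) - np.asarray(y))
    return (d ** r).sum() ** (1.0 / r)

def dpf(x, y, m, r=2):
    """Eq. (37): keep only the m smallest channel distances (Eq. (36))."""
    d = np.sort(np.abs(np.asarray(x) - np.asarray(y)))[:m]
    return (d ** r).sum() ** (1.0 / r)

x = np.array([1.0, 2.0, 3.0, 9.0])
y = np.array([1.1, 2.2, 2.9, 0.0])
print(minkowski(x, y))  # dominated by the one large mismatch
print(dpf(x, y, m=3))   # ignores the largest channel difference
```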

Studies have shown [5] that the Minkowski approach does not model human perception well, and that the performance of DPF is superior to that of widely used distance functions. For instance, at a recall of 80 percent, the retrieval precision of DPF was 84 percent on the test data set used.

2.4.5 The DPF Enhancement

2.4.5.1 Thresholding Method

Here, the features whose difference is smaller than θ are selected into the similar feature set $A_\theta$. The thresholding distance of an image pair (X, Y) is then defined, analogously to DPF, as

$$D_\theta(X, Y) = \left( \sum_{\Delta d_i \in A_\theta} (\Delta d_i)^r \right)^{1/r} \qquad (38)$$

2.4.5.2 Sampling Method

The main idea of Sampling-DPF is to try different settings of m simultaneously, on the assumption that some of the sampling points will be near the optimal m. Suppose we sample N settings of m, denoted $m_n$, n = 1, …, N. At each sampling $m_n$, a ranking list, denoted $R_n(\Phi, X)$, is generated. Finally, the sampling method obtains N ranking lists and produces the final ranking, denoted $R(\Phi, X)$.

Two methods are suggested: one uses the smallest rank as the final rank (SR); the other uses the average distance (or rank) as the final distance (AR).
