

5.2 Image Pre-processing and Feature Extraction

5.2.3 Visual Keyword Generation

Similar to the (text) keywords that represent the key information of a document, the visual keyword is proposed to capture the key visual characteristics of an image. In general, an image can be characterized by a few objects, each of which is usually composed of one or a few nearly homogeneous regions. For each of these regions, a set of visual features such as color, texture, and shape can be extracted to represent the region. With these visual features, the visual keyword is defined as follows.

Definition 5.2.1 (Visual Keyword). Given a homogeneous region i in an image, the visual keyword ωi is a triple of Gaussian mixture models (GMMs) {Gsi, Gci, Gti} that formulate the spatial, color, and texture features of the region i.

The 2D GMM Gsi approximates the spatial features (location and shape) of the region i according to its means and the covariance matrices. The other GMMs Gci and Gti formulate the average and variation of color and texture features over the region i by their means and the covariance matrices, respectively.

The key points of the visual keyword representation are as follows: (1) precise segmentation or a skillful sketch of the region is no longer needed, and (2) approximating a region by mixture Gaussian distributions makes the search for similar regions more flexible and robust. An example of the visual keyword is shown in Fig. 5.5. The three visual keywords, shown as the elliptic regions ω1, ω2, and ω3, are created to cover the two sails and the boat body. As we can see, the shape and the location of these regions can be formulated by three 2D mixture Gaussian distributions, respectively.

Figure 5.5: An example of the visual keyword. The visual keywords ω1, ω2 and ω3 are used to represent the sailboat in the image.

A visual keyword is generated to formulate the spatial, color, and texture features of a homogeneous region via two steps: (1) the spatial modelling and (2) the color and texture modelling.

Spatial Modelling

For illustration purposes, we often use an elliptic region to depict a 2D Gaussian distribution. The shape of an elliptic region can be altered by changing the parameters (the mean, the covariance matrix, and the prior probability) of its corresponding 2D Gaussian distribution. Thus, an arbitrarily shaped region can be approximated by the union of several elliptic regions.
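The link between a 2D Gaussian and its elliptic region can be sketched as follows. This is a minimal illustration, assuming the ellipse is taken as a contour of constant Mahalanobis distance; the function name `gaussian_ellipse` and the scale parameter `n_std` are hypothetical and do not come from the text.

```python
import numpy as np

def gaussian_ellipse(mean, cov, n_std=2.0):
    """Ellipse {x : (x - mean)^T cov^{-1} (x - mean) = n_std^2}.

    Returns the center, the semi-axis lengths (major first), and the
    orientation of the major axis in radians. n_std is an assumed scale.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Each semi-axis is n_std standard deviations along a principal axis.
    semi_axes = n_std * np.sqrt(eigvals[::-1])
    # The eigenvector of the largest eigenvalue gives the major-axis direction.
    major_dir = eigvecs[:, -1]
    angle = np.arctan2(major_dir[1], major_dir[0])
    return mean, semi_axes, angle

# A Gaussian that is wider along the horizontal axis than the vertical one.
center, semi_axes, angle = gaussian_ellipse([50.0, 80.0],
                                            [[400.0, 0.0], [0.0, 100.0]])
```

Changing the mean moves the ellipse, while changing the covariance matrix stretches or rotates it, which is why a union of such ellipses can approximate an irregular region.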

In the following, we will present the methods and procedures of adjusting the parameter values of a 2D mixture Gaussian distribution.

For a given homogeneous region Ai = {x(l) : l = 1, 2, . . . , L} and its corresponding Homogeneous Region Array (HRA) Hi, where L is the number of pixels in Ai, suppose a 2D mixture Gaussian distribution ps(x(l) | ωi) formulates the spatial feature of Ai. Define ps(x(l) | θs,ri, ωi) as a Gaussian cluster to comprise ps(x(l) | ωi), i.e.,

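$$p_s(x^{(l)} \mid \omega_i) = \sum_{r_i=1}^{N_i} P_s(\theta_{s,r_i} \mid \omega_i)\, p_s(x^{(l)} \mid \theta_{s,r_i}, \omega_i), \tag{5.2.5}$$

where θs,ri represents the parameter set {µs,ri, Σs,ri}, and Ps(θs,ri | ωi) denotes the prior probability of the cluster ri. By definition, $\sum_{r_i=1}^{N_i} P_s(\theta_{s,r_i} \mid \omega_i) = 1$, where Ni is the number of clusters in ps(x(l) | ωi). Suppose the cluster ri is a 2D Gaussian distribution:

$$p_s(x^{(l)} \mid \theta_{s,r_i}, \omega_i) = \frac{\exp\left\{-\frac{1}{2}\,(x^{(l)} - \mu_{s,r_i})^{T}\, \Sigma_{s,r_i}^{-1}\, (x^{(l)} - \mu_{s,r_i})\right\}}{2\pi\,|\Sigma_{s,r_i}|^{1/2}}. \tag{5.2.6}$$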
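As a small numerical illustration of (5.2.5) and (5.2.6), the following sketch evaluates a 2D Gaussian cluster and the prior-weighted mixture at a batch of pixel coordinates. The function names and the example parameter values are assumptions made for illustration only.

```python
import numpy as np

def gaussian_2d(x, mean, cov):
    """Density (5.2.6) at each row of x, where x has shape (L, 2)."""
    diff = np.atleast_2d(x) - mean
    # Squared Mahalanobis distance of every pixel to the cluster mean.
    mahal = np.einsum("li,ij,lj->l", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * mahal) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

def mixture_2d(x, priors, means, covs):
    """Mixture density (5.2.5): prior-weighted sum of the Gaussian clusters."""
    return sum(p * gaussian_2d(x, m, c) for p, m, c in zip(priors, means, covs))

# Two illustrative clusters; the priors must sum to one.
priors = [0.6, 0.4]
means = [np.array([10.0, 10.0]), np.array([20.0, 15.0])]
covs = [np.array([[9.0, 0.0], [0.0, 4.0]]), np.array([[4.0, 1.0], [1.0, 4.0]])]
pixels = np.array([[10.0, 10.0], [20.0, 15.0], [40.0, 40.0]])
density = mixture_2d(pixels, priors, means, covs)
```

Pixels close to either cluster mean receive a noticeably higher density than the distant pixel at (40, 40).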

The dissimilarity between the HRA Hi and the 2D mixture Gaussian distribution ps(x(l) | ωi) can be measured by the cross-entropy function

$$E = -\sum_{l=1}^{L} H_i(x^{(l)}) \ln\big(p_s(x^{(l)} \mid \omega_i)\big), \tag{5.2.7}$$

regarded as an error function between the region Ai and the visual keyword ωi. By applying the EM algorithm, (5.2.7) is minimized by the following update equations for the parameters of the 2D mixture Gaussian distribution: At each epoch j,

The EM iteration continues until (5.2.7) becomes less than a given threshold. Fig. 5.6 illustrates the spatial modelling of a sailboat. The original image with a reference point (the black dot) is depicted in Fig. 5.6(a), and the corresponding homogeneous region and its spatial model, a 2D GMM composed of two Gaussian clusters, are depicted in Fig. 5.6(b) and (c), respectively.
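A minimal sketch of one way to minimize (5.2.7) is given below. It assumes the standard EM updates for a GMM in which each pixel is weighted by its HRA value Hi(x(l)); these weighted updates are an assumption and are not necessarily the exact update equations of this section. The stopping rule is also swapped for illustration: the loop stops when the decrease in E falls below `tol`, rather than when E itself drops below a threshold. The names `weighted_em`, `tol`, `max_epochs`, and `reg` are hypothetical.

```python
import numpy as np

def gaussian_2d(x, mean, cov):
    """2D Gaussian density of (5.2.6) at each row of x, shape (L, 2)."""
    diff = x - mean
    mahal = np.einsum("li,ij,lj->l", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * mahal) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

def cluster_terms(x, priors, means, covs):
    """Column r holds prior_r * density_r for every pixel; shape (L, N)."""
    return np.stack([p * gaussian_2d(x, m, c)
                     for p, m, c in zip(priors, means, covs)], axis=1)

def cross_entropy(x, h, priors, means, covs):
    """Error function (5.2.7); the tiny constant guards against log(0)."""
    mixture = cluster_terms(x, priors, means, covs).sum(axis=1)
    return -np.sum(h * np.log(mixture + 1e-300))

def weighted_em(x, h, priors, means, covs, tol=1e-6, max_epochs=200, reg=1e-6):
    """Assumed HRA-weighted EM; returns the fitted parameters and final E."""
    priors = np.array(priors, dtype=float)
    means = np.array(means, dtype=float)
    covs = np.array(covs, dtype=float)
    error = cross_entropy(x, h, priors, means, covs)
    for _ in range(max_epochs):
        # E-step: cluster responsibilities, scaled by the HRA weight of each pixel.
        terms = cluster_terms(x, priors, means, covs)
        resp = terms / (terms.sum(axis=1, keepdims=True) + 1e-300)
        w = resp * h[:, None]
        mass = w.sum(axis=0)
        # M-step: weighted priors, means, and covariance matrices.
        priors = mass / mass.sum()
        means = (w.T @ x) / mass[:, None]
        for r in range(len(priors)):
            diff = x - means[r]
            # reg keeps each covariance matrix invertible (an added safeguard).
            covs[r] = (w[:, r, None] * diff).T @ diff / mass[r] + reg * np.eye(2)
        new_error = cross_entropy(x, h, priors, means, covs)
        converged = error - new_error < tol
        error = new_error
        if converged:
            break
    return priors, means, covs, error

# Synthetic "region": two blobs of pixel coordinates, all with HRA weight 1.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(size=(200, 2)), rng.normal(size=(200, 2)) + 10.0])
h = np.ones(len(x))
init = ([0.5, 0.5], [[1.0, 1.0], [8.0, 8.0]], [np.eye(2), np.eye(2)])
start_error = cross_entropy(x, h, *init)
priors, means, covs, error = weighted_em(x, h, *init)
```

With a binary HRA, pixels outside the region carry zero weight and do not influence the fit, so only the pixels of Ai shape the elliptic clusters.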

Color and Texture Modelling

After the spatial modelling is done, the homogeneous region Ai is approximated by the 2D mixture Gaussian distribution ps(x(l) | ωi). Suppose ps(x(l) | ωi) consists of Ni Gaussian clusters; then Ai can be divided into Ni elliptic regions, {ar1, ar2, . . . , arNi}, each of which corresponds to a Gaussian cluster in ps(x(l) | ωi). For each elliptic region, its color and texture features are modelled by one of the Gaussian clusters in Gci and Gti, respectively. In the following, only the color modelling is illustrated. The notations and formulas for texture modelling can be obtained by replacing the subscript c by t.

Figure 5.6: The spatial feature modelling of a sail region. (a) The original image with a reference point shown as a black dot on a sail. (b) The corresponding homogeneous region of the reference point. (c) The modelling results of a 2D mixture Gaussian approximation on the sail region. The pictures in (b) and (c) are shown as gray level images.

Suppose a Gaussian distribution pc(cx(l) | θc,ri, ωi) is used to approximate the color (texture) feature distribution in an elliptic region ari. Let cx(l) be a Dc-dimensional color (texture) feature vector at a pixel x(l); then the visual keyword ωi models the color (texture) with a GMM pc(cx(l) | ωi) composed of Ni Gaussian clusters:

$$p_c(c_{x^{(l)}} \mid \omega_i) = \sum_{r_i=1}^{N_i} P_c(\theta_{c,r_i} \mid \omega_i)\, p_c(c_{x^{(l)}} \mid \theta_{c,r_i}, \omega_i), \tag{5.2.8}$$

where θc,ri represents the parameter set {µc,ri, Σc,ri}, and Pc(θc,ri | ωi) denotes the prior probability of the cluster ri. Suppose pc(cx(l) | θc,ri, ωi) is a Gaussian distribution,

where N(ari) denotes the number of pixels in the elliptic region ari.
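One plausible way to obtain the color parameters of each elliptic region is sketched below. It assumes that every pixel has already been assigned to exactly one elliptic region (for example, to its most probable spatial cluster), that the mean and covariance are plain sample statistics over the N(ari) pixels of the region, and that the prior is the fraction of pixels falling in the region. All three choices, as well as the names `color_clusters` and `labels`, are assumptions for illustration.

```python
import numpy as np

def color_clusters(colors, labels, n_clusters):
    """Per-region color statistics.

    colors: (L, Dc) array of color feature vectors, one per pixel.
    labels: (L,) array giving the index of the elliptic region of each pixel
            (assumed hard assignment).
    """
    n_pixels, dim = colors.shape
    priors = np.zeros(n_clusters)
    means = np.zeros((n_clusters, dim))
    covs = np.zeros((n_clusters, dim, dim))
    for r in range(n_clusters):
        region = colors[labels == r]
        count = len(region)              # plays the role of N(a_r)
        if count == 0:
            continue                     # empty region: leave zeros in place
        priors[r] = count / n_pixels     # assumed prior: share of the pixels
        means[r] = region.mean(axis=0)
        diff = region - means[r]
        covs[r] = diff.T @ diff / count  # sample covariance over the region
    return priors, means, covs

# Four pixels with 3D color features, split evenly over two elliptic regions.
colors = np.array([[1.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [0.0, 2.0, 2.0],
                   [0.0, 4.0, 2.0]])
labels = np.array([0, 0, 1, 1])
priors, means, covs = color_clusters(colors, labels, n_clusters=2)
```

The same routine applies to texture by passing texture feature vectors in place of the color ones.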

After modelling the spatial, color, and texture features of an elliptic region ari, ps(x(l) | θs,ri, ωi), pc(cx(l) | θc,ri, ωi), and pt(tx(l) | θt,ri, ωi) can be merged into a Gaussian cluster as

$$p(z^{(l)} \mid \theta_{r_i}, \omega_i) = p_s(x^{(l)} \mid \theta_{s,r_i}, \omega_i)\, p_c(c_{x^{(l)}} \mid \theta_{c,r_i}, \omega_i)\, p_t(t_{x^{(l)}} \mid \theta_{t,r_i}, \omega_i), \tag{5.2.12}$$

where z(l) = (x(l), cx(l), tx(l))T and θri = (θs,ri, θc,ri, θt,ri). Then, for the homogeneous region Ai, the visual keyword ωi formulates its spatial, color, and texture features by a unified GMM:

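$$p(z^{(l)} \mid \omega_i) = \sum_{r_i=1}^{N_i} P(\theta_{r_i} \mid \omega_i)\, p(z^{(l)} \mid \theta_{r_i}, \omega_i),$$

where P(θri | ωi) denotes the prior probability of the cluster ri.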
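The merging step in (5.2.12) can be sketched numerically as follows. The sketch assumes that the unified GMM weights each merged cluster by a single prior probability, stored here under the hypothetical key `prior`; the dictionary layout and the function names are likewise illustrative only.

```python
import numpy as np

def gaussian_pdf(v, mean, cov):
    """Gaussian density of arbitrary dimension at a single vector v."""
    v = np.asarray(v, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = v - mean
    mahal = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** v.size * np.linalg.det(cov))
    return np.exp(-0.5 * mahal) / norm

def keyword_density(x, c, t, clusters):
    """Prior-weighted sum over clusters of the product in (5.2.12).

    Each cluster is a dict with an assumed 'prior' and (mean, cov) pairs
    under the keys 's' (spatial), 'c' (color), and 't' (texture).
    """
    total = 0.0
    for cl in clusters:
        merged = (gaussian_pdf(x, *cl["s"])
                  * gaussian_pdf(c, *cl["c"])
                  * gaussian_pdf(t, *cl["t"]))
        total += cl["prior"] * merged
    return total

# Two illustrative clusters with 2D position, 3D color, and 2D texture.
clusters = [
    {"prior": 0.7,
     "s": (np.array([10.0, 10.0]), np.array([[9.0, 0.0], [0.0, 4.0]])),
     "c": (np.array([0.8, 0.1, 0.1]), 0.01 * np.eye(3)),
     "t": (np.array([0.2, 0.5]), 0.05 * np.eye(2))},
    {"prior": 0.3,
     "s": (np.array([20.0, 15.0]), np.array([[4.0, 1.0], [1.0, 4.0]])),
     "c": (np.array([0.1, 0.1, 0.9]), 0.01 * np.eye(3)),
     "t": (np.array([0.6, 0.4]), 0.05 * np.eye(2))},
]
value = keyword_density(np.array([11.0, 9.0]),
                        np.array([0.75, 0.12, 0.1]),
                        np.array([0.25, 0.45]),
                        clusters)
```

Because the three factors are multiplied, each merged cluster is itself a Gaussian over z(l) whose covariance matrix is block-diagonal, with the spatial, color, and texture covariances as its blocks.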

Since a visual keyword takes the form of a mixture Gaussian distribution, its difference from another visual keyword can be measured by (3.1.9), which is described in Section 3.1. When the spatial relations among regions are of concern, the visual string is generated, as presented in the following section.
