3.2 The EM based Multiple-Instance Learning Method

When image retrieval is treated as a multiple-instance problem, each image is labelled according to the user's subjective concept, which the user conveys through a set of positive example images and a set of negative example images. The user expresses the desired concept through the interface provided by the system, and the concept may be a concrete object or an abstract idea. For example, when the user wants to find images containing a ‘waterfall’, the region (subimage) of an image that shows the ‘waterfall’ is one instance. The user labels an image as a positive example if it contains at least one instance of the ‘waterfall’, and as a negative example if it contains none.
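The labelling rule can be sketched in a few lines of Python. The `Bag` class and its field names are hypothetical, introduced only for illustration; in a real system the per-region flags are hidden from the learner and only the image-level label is observed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bag:
    """One image, viewed as a bag of region-level instances (hypothetical type)."""
    name: str
    # True where a region shows the concept; hidden from the learner in practice.
    instance_has_concept: List[bool]


def bag_label(bag: Bag) -> bool:
    """An image is positive if at least one of its regions shows the concept."""
    return any(bag.instance_has_concept)


falls = Bag("falls.jpg", [False, True, False])   # one 'waterfall' region
desert = Bag("desert.jpg", [False, False])       # no 'waterfall' region
print(bag_label(falls), bag_label(desert))
```

The asymmetry is the point of the sketch: a positive label says only that some region matches, while a negative label constrains every region of the image.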

This definition matches the way a CBIR system is used when the user submits example images. The user submits several images of interest to describe a concept, in the manner of Query-by-Example. These examples are submitted because each of them contains one or more regions that match the user's concept.

The label given by the user applies to the content of the whole image rather than to the individual instances within it, so the system does not know which class each instance in an image belongs to. In other words, CBIR has the defining characteristic of the multiple-instance learning problem. The goal of multiple-instance learning is to find a concept that correctly classifies the images of the training set and predicts the labels of new images.

3.2.1 The Energy Function of the EM based Multiple-Instance Learning Method

In Multiple-Instance learning, conceptually related (positive) images and conceptually unrelated (negative) images are used for reinforced and antireinforced learning of the image class the user is interested in. Each positive training image contains at least one subimage of interest related to the desired image class, and each negative training image should contain no subimage related to the desired image class. The goal of Multiple-Instance learning is to find the optimal point of the image class in the feature space: a point that is close to the intersection of the feature vectors extracted from the subimages of the positive training images and far from the union of the feature vectors extracted from the subimages of the negative training images.

Figure 3.1: The schematic of the positive and negative images for the class t

For example, suppose one wants to train an image class t with P_t positive images and N_t negative images. Each positive image has V_t⁺ instances (i.e. |P₁| = V₁⁺, |P₂| = V₂⁺, · · · , |P_t| = V_t⁺), and each negative image has V_t⁻ instances (i.e. |N₁| = V₁⁻, |N₂| = V₂⁻, · · · , |N_t| = V_t⁻). The feature vector extracted from the k-th instance of the i-th positive image is denoted X⁺_ik, and the feature vector extracted from the k-th instance of the i-th negative image is denoted X⁻_ik. The schematic of the positive and negative images for the class t is shown in Figure 3.1.

The probability that X⁺_ik belongs to class t is P(t | X⁺_ik), and the probability that X⁻_ik belongs to class t is P(t | X⁻_ik). A measure called Diverse Density evaluates how many different positive images have feature vectors near a point t and how far the negative feature vectors are from that point. The Diverse Density for a class t is defined as [18]

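$$DD_t = \prod_{i=1}^{P_t} \left( 1 - \prod_{k=1}^{V_t^{+}} \bigl( 1 - P(t \mid X^{+}_{ik}) \bigr) \right) \prod_{i=1}^{N_t} \left( \prod_{k=1}^{V_t^{-}} \bigl( 1 - P(t \mid X^{-}_{ik}) \bigr) \right). \tag{3.1}$$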
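Eq. (3.1) can be evaluated directly once the per-instance probabilities are known. The following is a minimal sketch, assuming the probabilities P(t | X_ik) are already available as nested lists; the function name and the toy numbers are illustrative and do not come from the source.

```python
from math import prod
from typing import Sequence


def diverse_density(pos_bags: Sequence[Sequence[float]],
                    neg_bags: Sequence[Sequence[float]]) -> float:
    """Evaluate Eq. (3.1) from per-instance probabilities P(t | X_ik).

    pos_bags[i][k] holds P(t | X+_ik); neg_bags[i][k] holds P(t | X-_ik).
    """
    dd = 1.0
    for bag in pos_bags:
        # Probability that at least one instance of a positive image is in class t.
        dd *= 1.0 - prod(1.0 - p for p in bag)
    for bag in neg_bags:
        # Probability that no instance of a negative image is in class t.
        dd *= prod(1.0 - p for p in bag)
    return dd


# Toy input (assumed values): two positive images and one negative image.
pos = [[0.9, 0.1], [0.2, 0.8]]
neg = [[0.1, 0.05]]
print(diverse_density(pos, neg))
```

A single negative instance with probability close to 1 drives the whole product toward zero, which is why a high Diverse Density requires the candidate point to stay away from every negative feature vector.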

3.2.2 The computation of Diverse Density

The optimal point of the class t lies where the Diverse Density is maximized. It can be obtained by taking the first partial derivatives of Eq. (3.1) with respect to the parameters of the class t and setting them to zero.

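Suppose the density function of the class t is a D-dimensional Gaussian mixture with uncorrelated features. Its parameters are the mean µ_tcd and the variance σ²_tcd of each dimension d in each cluster c of the class t, together with the prior probability p_tc of each cluster c. The parameter estimates of the class t are derived from the conditions

$$\frac{\partial}{\partial \mu_{tcd}} DD_t = 0, \qquad \frac{\partial}{\partial \sigma^{2}_{tcd}} DD_t = 0, \qquad \ldots$$

which lead to the update equations (3.2), (3.3), and (3.4), where

$$Q_{tc}(X^{?}_{ik}) = \frac{P(t \mid X^{?}_{ik})\, P(c \mid X^{?}_{ik}, t)}{1 - P(t \mid X^{?}_{ik})},$$

$$P(X^{?}_{ik} \mid c, t) = \frac{1}{\prod_{d=1}^{D} (2\pi\sigma^{2}_{tcd})^{1/2}} \exp\left( -\frac{1}{2} \sum_{d=1}^{D} \left( \frac{x^{?}_{ikd} - \mu_{tcd}}{\sigma_{tcd}} \right)^{2} \right),$$

l denotes the iteration number of the EM algorithm introduced below, and the notation ? stands for + or −.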
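The cluster density P(X | c, t) with uncorrelated features is straightforward to compute. The sketch below assumes plain Python lists for the feature vector and parameters. The `cluster_posterior` helper applies the usual Bayes rule for a mixture with priors p_tc; that form is an assumption made for illustration and is not taken from the source's own E-step equations.

```python
import math
from typing import List, Sequence


def component_density(x: Sequence[float],
                      mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """P(X | c, t) for one cluster with uncorrelated (diagonal) features."""
    norm = 1.0
    quad = 0.0
    for x_d, mu_d, var_d in zip(x, mean, var):
        norm *= math.sqrt(2.0 * math.pi * var_d)   # (2*pi*sigma^2)^(1/2) per dimension
        quad += (x_d - mu_d) ** 2 / var_d          # ((x - mu) / sigma)^2 per dimension
    return math.exp(-0.5 * quad) / norm


def cluster_posterior(x: Sequence[float],
                      means: Sequence[Sequence[float]],
                      variances: Sequence[Sequence[float]],
                      priors: Sequence[float]) -> List[float]:
    """P(c | X, t) over all clusters by Bayes' rule (assumed form, see lead-in)."""
    joint = [p * component_density(x, m, v)
             for p, m, v in zip(priors, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]


# Toy one-dimensional mixture with two clusters (assumed values).
print(cluster_posterior([0.0], [[-1.0], [1.0]], [[1.0], [1.0]], [0.5, 0.5]))
```

For feature vectors far from every cluster, the densities can underflow to zero; a practical implementation would work with log-densities instead.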

Based on Eqs. (3.2), (3.3), and (3.4), an EM based Multiple-Instance learning algorithm for learning these parameters was proposed [48]. The algorithm consists of two steps: the expectation step (E-step) and the maximization step (M-step). It is described as follows.

EM based Multiple-Instance learning algorithm

1. Choose an initial point in the feature space, and let its parameters be µ⁽⁰⁾_tcd, σ²⁽⁰⁾_tcd, and p⁽⁰⁾_tc.

2. E-Step: Using the current model parameters µ^(l)_tcd, σ^2(l)_tcd, and p^(l)_tc together with Eqs. (3.5) and (3.6), estimate P^(l)(c|X^?_ik, t) and P^(l)(t|X^?_ik).

M-Step: Using the estimated P^(l)(c|X^?_ik, t) and P^(l)(t|X^?_ik), compute the new model parameters µ^(l+1)_tcd, σ^2(l+1)_tcd, and p^(l+1)_tc according to Eqs. (3.2), (3.3), and (3.4).

3. Calculate the diverse density DD^(l+1). If DD^(l+1) − DD^(l) is smaller than a predefined threshold ε, stop the process. Otherwise, increment the iteration number l and return to step 2.
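The control flow of the three steps can be sketched as a generic loop. This is only a skeleton under stated assumptions: `e_step`, `m_step`, and `diverse_density` are hypothetical callables standing in for the model-specific equations, and `max_iter` is an added safeguard that is not part of the algorithm as described.

```python
import math
from typing import Callable, Tuple, TypeVar

Params = TypeVar("Params")
Resp = TypeVar("Resp")


def run_em(params: Params,
           e_step: Callable[[Params], Resp],
           m_step: Callable[[Params, Resp], Params],
           diverse_density: Callable[[Params], float],
           eps: float = 1e-6,
           max_iter: int = 100) -> Tuple[Params, float, int]:
    """Alternate E- and M-steps until the gain in diverse density is below eps."""
    dd_old = diverse_density(params)        # DD at the initial point (step 1)
    for l in range(max_iter):
        resp = e_step(params)               # step 2, E-step
        params = m_step(params, resp)       # step 2, M-step
        dd_new = diverse_density(params)    # step 3
        if dd_new - dd_old < eps:
            return params, dd_new, l + 1
        dd_old = dd_new
    return params, dd_old, max_iter


# Toy stand-ins (assumed): a scalar "parameter" that moves halfway toward 2.0.
toy_params, toy_dd, toy_iters = run_em(
    0.0,
    e_step=lambda x: x,
    m_step=lambda x, r: x + 0.5 * (2.0 - r),
    diverse_density=lambda x: math.exp(-(x - 2.0) ** 2),
)
print(toy_params, toy_dd, toy_iters)
```

Because the test is on the signed difference, the loop also stops when the diverse density decreases, which matches the stopping rule as stated in step 3.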

As shown above, the algorithm provides a complete procedure for maximizing the diverse density of the given multiple instances X⁺_ik and X⁻_ik. Furthermore, the EM based learning framework converts the multiple-instance problem into a single-instance one by using the EM algorithm to estimate and maximize the responsibility of each instance for the label of its bag.

However, how to determine the proper number of clusters in the Gaussian mixture model of each class, that is, the model selection problem, remains an important issue, which we tackle in the next section.

Besides, to build a more powerful model, we consider the relationship between weights and features. For example, when the user clicks a point to indicate the concept of the image, the importance of the neighboring pixels should decrease gradually as their distance from that point increases. This weighting will also be included in future systems.
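One way to picture such a distance-dependent weight is a smooth falloff around the clicked point. The sketch below uses a Gaussian falloff, which is an assumption made for illustration; the text only states that the importance decreases gradually with distance. The function name and the `sigma` parameter are hypothetical.

```python
import math


def click_weight(px: float, py: float,
                 cx: float, cy: float,
                 sigma: float = 20.0) -> float:
    """Weight of pixel (px, py) for a click at (cx, cy).

    Gaussian falloff is an assumed choice; sigma (in pixels) is hypothetical.
    """
    dist_sq = (px - cx) ** 2 + (py - cy) ** 2
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


# The weight is largest at the click and shrinks with distance.
for offset in (0, 10, 50):
    print(offset, click_weight(10 + offset, 10, 10, 10))
```

A larger `sigma` spreads the user's emphasis over a wider neighborhood; any other monotonically decreasing function of distance would serve the same purpose.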

In the next section, we combine the proposed EM based Multiple-Instance Learning Method with the probabilistic variant of the decision-based modular neural networks (PDBNN) [50] to form a new learning model, the Multiple-Instance Neural Network (MINN). The new model lets the user's concept form in the training phase and obtains relevance feedback from the user in the testing phase. We also propose a new method of extracting instances from the image that takes the weighting of the features into account.
