Multiple Instance Learning

In document 誌謝 (Page 40-44)

Because we adopt the multiple instance learning (MIL) assumption for this problem, a multiple instance classifier serves as the core component of the system.

This section describes the MIL methods applied in the experiment.

3.5.1 Single Instance Learning: a Naive Approach

A naive approach to MIL learns directly from the instances in the bags, labeling each training instance with the label of the bag it belongs to. In contrast to multiple instance learning, this approach is called “Single Instance Learning (SIL)”. Labeling every instance in a negative bag as negative is consistent with the MIL assumption. However, every instance in a positive bag is also regarded as positive, so the negative instances in positive bags are mislabeled as positive. Despite this mislabeling, SIL performs well on some problems: Soumya[25] provided an empirical comparison showing that SIL is superior to some MIL algorithms.
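The labeling rule of SIL can be sketched in a few lines. The following is only a minimal illustration, assuming bags are stored as lists of feature tuples; the function name `sil_training_set` and the toy data are hypothetical and not part of the original system.

```python
from typing import List, Sequence, Tuple

Instance = Sequence[float]
Bag = List[Instance]


def sil_training_set(bags: List[Bag], bag_labels: List[int]) -> Tuple[List[Instance], List[int]]:
    """Flatten bags into instances; each instance inherits its bag's label."""
    instances: List[Instance] = []
    labels: List[int] = []
    for bag, bag_label in zip(bags, bag_labels):
        for x in bag:
            instances.append(x)
            labels.append(bag_label)
    return instances, labels


# Toy data (hypothetical): one negative bag and one positive bag.
bags = [[(0.1, 0.2), (0.0, 0.3)], [(0.9, 0.8), (0.2, 0.1)]]
bag_labels = [-1, +1]
sil_X, sil_y = sil_training_set(bags, bag_labels)
print(sil_y)
```

Any standard supervised classifier can then be trained on the flattened instances. Note that the second instance of the positive bag receives the label +1 even if it is actually negative, which is the mislabeling discussed above.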

3.5.2 Semi-Supervised Approach

MissSVM (Multi-Instance learning by Semi-Supervised Support Vector Machine) is an approach proposed by Zhou[33] that combines multiple instance learning with semi-supervised learning. MIL learns from bags, whose instance labels are uncertain, while semi-supervised learning learns from both labeled and unlabeled data. Hence, MIL can be viewed as a special case of semi-supervised learning.

Zhou[33] defines MIL by unfolding the instances from the bags. Let $\mathcal{X}$ be the instance space. A bag is defined as $X_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}\}$ with $x_{i,j} \in \mathcal{X}$, where $n_i$ is the number of instances in $X_i$. The training bags are concatenated by placing negative bags before positive bags. The bag set is $\{X_1, X_2, \ldots, X_q, X_{q+1}, \ldots, X_{q+p-1}, X_m\}$, where $X_1$ to $X_q$ are negative bags and $X_{q+1}$ to $X_m$ are positive bags. All bags are unfolded without changing the order. The instance set is $\{x_{1,1}, x_{1,2}, \ldots, x_{1,n_1}, x_{2,1}, \ldots, x_{m,1}, x_{m,2}, \ldots, x_{m,n_m}\}$. The set is then re-indexed as $\{x_1, x_2, \ldots, x_{T_L}, \ldots, x_T\}$, in which $T = \sum_{i=1}^{m} n_i$ and $T_L = \sum_{i=1}^{q} n_i$. The original bag set is thus transformed into an instance set, on which the problem is defined as Definition 1.

Definition 1. Given a set of labeled negative instances $\{(x_1, -1), (x_2, -1), \ldots, (x_{T_L}, -1)\}$ and a set of unlabeled instances $\{x_{T_L+1}, x_{T_L+2}, \ldots, x_T\}$, learn the target function $F_s : \mathcal{X} \to \{+1, -1\}$ s.t. each positive bag $X_i$ contains at least one positive instance.

This definition is a semi-supervised task with a constraint. For any unseen bag $X = \{x_{*,1}, x_{*,2}, \ldots, x_{*,n}\}$, the prediction function $F$ is defined as:

$$F(X) = \begin{cases} +1, & \text{if there exists a } j \in \{1, 2, \ldots, n\} \text{ s.t. } F_s(x_{*,j}) = +1 \\ -1, & \text{otherwise} \end{cases}$$


Once the MIL problem is formulated as a semi-supervised learning problem, a semi-supervised learning algorithm can be applied to solve it.
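The unfolding step and the bag-level prediction rule can be illustrated as follows. This is a minimal sketch of the data transformation only, not of MissSVM itself; the names `unfold_bags`, `predict_bag`, and `toy_f_s` are hypothetical, and `toy_f_s` stands in for whatever instance-level function a semi-supervised learner would produce.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Bag = List[Instance]


def unfold_bags(bags: List[Bag], bag_labels: List[int]) -> Tuple[List[Instance], int]:
    """Place negative bags before positive bags, unfold them in order, and
    return the instance list together with T_L (number of labeled negatives)."""
    negative = [bag for bag, y in zip(bags, bag_labels) if y == -1]
    positive = [bag for bag, y in zip(bags, bag_labels) if y == +1]
    instances = [x for bag in negative + positive for x in bag]
    t_l = sum(len(bag) for bag in negative)
    return instances, t_l


def predict_bag(bag: Bag, f_s: Callable[[Instance], int]) -> int:
    """Bag-level prediction F: +1 if any instance is predicted +1, else -1."""
    return +1 if any(f_s(x) == +1 for x in bag) else -1


# Hypothetical instance-level classifier, used only for illustration.
def toy_f_s(x: Instance) -> int:
    return +1 if x[0] > 0.5 else -1


toy_bags = [[(0.9,), (0.1,)], [(0.2,), (0.3,), (0.0,)]]
toy_labels = [+1, -1]
unfolded, t_l = unfold_bags(toy_bags, toy_labels)
print(t_l, len(unfolded))
print(predict_bag(toy_bags[0], toy_f_s), predict_bag(toy_bags[1], toy_f_s))
```

The first `t_l` entries of the unfolded list are the labeled negative instances of Definition 1, and the remaining entries are the unlabeled instances coming from positive bags.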

3.5.3 Multiple-Instance Classification Algorithm

MICA (multiple-instance classification algorithm), proposed by Mangasarian[20], solves the MIL problem by representing each bag as a convex combination of the instances in the bag. Linear programming is then used to find the solution, which is applied to predict unseen bags.

Given a positive bag, which is a set of instances $X = \{x_1, x_2, \ldots, x_n\}$, there is a set of coefficients $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ with $0 \le \alpha_i \le 1$ and $\sum_{i=1}^{n} \alpha_i = 1$ such that $x = \alpha^{T} X$.

In the coordinate space, $x$ is a point of the convex hull spanned by the instances in the bag, and a decision boundary is found that places $x$ on the positive side. In this way the objective function realizes the MIL assumption.
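The following sketch illustrates why classifying the convex combination is consistent with the MIL assumption when the classifier is linear. It does not implement the linear program of MICA; the weights `w`, the bias `b`, the coefficients `alpha`, and the helper names are hypothetical values chosen for illustration.

```python
from typing import List, Sequence, Tuple

Instance = Sequence[float]


def convex_combination(bag: List[Instance], alpha: Sequence[float]) -> Tuple[float, ...]:
    """Return x = sum_i alpha_i * x_i for coefficients on the simplex."""
    if any(a < 0 or a > 1 for a in alpha) or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("alpha must satisfy 0 <= alpha_i <= 1 and sum to 1")
    dim = len(bag[0])
    return tuple(sum(a * x[d] for a, x in zip(alpha, bag)) for d in range(dim))


def linear_score(w: Sequence[float], b: float, x: Instance) -> float:
    """Decision value <w, x> + b of a linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


bag = [(0.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
alpha = (0.25, 0.5, 0.25)
w, b = (1.0, 1.0), -2.0

x_rep = convex_combination(bag, alpha)
rep_score = linear_score(w, b, x_rep)
instance_scores = [linear_score(w, b, x) for x in bag]
print(x_rep, rep_score, instance_scores)
```

Because the coefficients sum to one, the score of the combined point equals the weighted average of the instance scores, so it can never exceed the largest instance score. If the combined point lies on the positive side, at least one instance in the bag does as well.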

3.5.4 Support Vector Machine for Multiple Instance Learning

Andrew[3] proposed “Support Vector Machines (SVM) for Multiple-Instance Learning (MIL)”, which comprises two types of SVM for the MIL problem: mi-SVM and MI-SVM. Both handle data in bag format but differ in the optimization target: MI-SVM optimizes at the bag level, while mi-SVM optimizes at the instance level.

According to the MIL assumption, all instances in negative bags are negative. For positive bags, the assumption only states that at least one instance is positive; therefore, the labels of all instances in positive bags are unknown. Andrew formalizes the assumption as Definition 2 and formulates it as a linear constraint, shown in Formula (3.2).

Definition 2. Given a bag set $B = \{X \mid X \text{ is a bag}\}$, each $X = \{x\}$ is associated with a label $Y \in \{1, -1\}$, and each instance $x \in X$ carries a label $y$. When $Y_I = -1$, all instances in $X_I$ are labeled $-1$. When $Y_I = 1$, the individual instance labels are not determined but are constrained by $Y_I = \max_{x_i \in X_I} y_i$.






$$\sum_{i \in I} \frac{y_i + 1}{2} \ge 1, \ \forall I \text{ s.t. } Y_I = 1, \qquad y_i = -1, \ \forall I \text{ s.t. } Y_I = -1 \tag{3.2}$$
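The constraint can be checked mechanically for a candidate labeling of one bag. The sketch below is illustrative only; the function name `satisfies_mil_constraint` is hypothetical.

```python
from typing import Sequence


def satisfies_mil_constraint(instance_labels: Sequence[int], bag_label: int) -> bool:
    """Check the MIL constraint for one bag with instance labels in {-1, +1}.

    Negative bag: every instance label must be -1.
    Positive bag: sum_i (y_i + 1) / 2 >= 1, i.e. at least one label is +1.
    """
    if bag_label == -1:
        return all(y == -1 for y in instance_labels)
    return sum((y + 1) // 2 for y in instance_labels) >= 1


print(satisfies_mil_constraint([-1, 1, -1], 1))    # positive bag with one positive
print(satisfies_mil_constraint([-1, -1, -1], 1))   # positive bag with no positive
print(satisfies_mil_constraint([-1, -1], -1))      # negative bag, all negative
```

The term $(y_i + 1)/2$ maps a label of $-1$ to 0 and a label of $+1$ to 1, so the sum simply counts the positive instances in the bag.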


The two types of SVM for MIL are explained below using standard SVM notation[27].


mi-SVM aims at maximizing the instance margin. The process learns the best labeling pattern for the instances in positive bags. Without violating the constraint defined in Formula (3.2), each instance in a positive bag is assigned a label $y \in \{1, -1\}$, whereas all instances in negative bags are assigned $y = -1$. The support vector machine learns a model $w$ from the assigned labeling by minimizing the SVM objective function. The mi-SVM solution is the model $w$ with the maximal instance margin, defined as Formula (3.3).

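$$w = \arg\min_{w} \min_{\{y_i\},\, b,\, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i} \xi_i \tag{3.3}$$

$$\text{s.t. } \forall i: \ y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad y_i \in \{-1, 1\}$$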
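Because the labels in positive bags are optimization variables, the problem is commonly solved by alternating between training an SVM on the current labels and re-assigning the labels from the SVM scores. The text above does not specify the solver, so the following is only a sketch of one such label update under that assumption; the function name `update_positive_bag_labels` is hypothetical, and the scores would come from the currently trained SVM.

```python
from typing import List, Sequence


def update_positive_bag_labels(scores: Sequence[float]) -> List[int]:
    """Re-assign instance labels in one positive bag from decision values.

    Each instance takes the sign of its score. If that leaves the bag with
    no positive instance, the highest-scoring instance is forced to +1 so
    that the labeling still satisfies the MIL constraint.
    """
    labels = [1 if s > 0 else -1 for s in scores]
    if all(y == -1 for y in labels):
        best = max(range(len(scores)), key=lambda i: scores[i])
        labels[best] = 1
    return labels


print(update_positive_bag_labels([-0.5, 0.3, -1.2]))
print(update_positive_bag_labels([-0.5, -0.1, -2.0]))
```

In such a scheme the update is repeated, retraining the SVM each time, until the labels in the positive bags no longer change.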



MI-SVM aims at maximizing the bag margin. Each positive bag is represented by one instance in the bag (the witness), and all negative bags are expanded into negative instances. The best witness in each positive bag is selected by optimizing the SVM objective function. The MI-SVM solution is defined as Formula (3.4), where $s$ is a selector that chooses the witness.

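$$w = \arg\min_{w} \min_{s,\, b,\, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{I} \xi_I \tag{3.4}$$

$$\text{s.t. } \forall I: \ Y_I = -1 \,\wedge\, -\langle w, x_i \rangle - b \ge 1 - \xi_I, \ \forall i \in I, \quad \text{or} \quad Y_I = 1 \,\wedge\, \langle w, x_{s(I)} \rangle + b \ge 1 - \xi_I, \quad \text{and } \xi_I \ge 0.$$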
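For a fixed model $(w, b)$, a natural choice of the selector is the instance with the largest decision value in the bag. The sketch below shows only this selection step, assuming a linear model; the function name `select_witness` and the numeric values are hypothetical.

```python
from typing import List, Sequence

Instance = Sequence[float]


def select_witness(bag: List[Instance], w: Sequence[float], b: float) -> int:
    """Return s(I): the index of the instance with the largest <w, x> + b."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in bag]
    return max(range(len(bag)), key=lambda i: scores[i])


positive_bag = [(0.0, 1.0), (3.0, 1.0), (1.0, 1.0)]
w_demo, b_demo = (1.0, -1.0), 0.0
witness_index = select_witness(positive_bag, w_demo, b_demo)
print(witness_index, positive_bag[witness_index])
```

Only the witness of each positive bag enters the constraints of Formula (3.4), whereas every instance of a negative bag contributes its own constraint.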


3.5.5 Multiple Instance Learning for Sparse Positive Bags

The family of sparse positive MIL algorithms deals with data in which positive bags contain few positive instances. Bunescu[9] proposed three scenarios of sparse positive MIL, which are explained as follows:

Sparse MIL

Sparse MIL (sMIL) modifies the constraint of SIL. In SIL, every instance in a negative bag is regarded as negative and every instance in a positive bag as positive. sMIL instead assumes that only a few instances in a positive bag are truly positive, so it favors solutions in which positive bags contain few positive instances. sMIL also accounts for bag size: it loosens the constraint when a bag is large, because a positive instance is harder to find in a large sparse positive bag. sMIL is equivalent to SIL when the positive bag contains only one instance.
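The dependence on bag size can be made concrete with a small calculation. The sketch assumes that the weakest labeling consistent with the MIL assumption is one instance at score +1 and all others at score -1, so that the average score of a positive bag of size n must reach (2 - n) / n. This form is an assumption made here for illustration and should be checked against [9]; the function name `smil_margin_target` is hypothetical.

```python
def smil_margin_target(bag_size: int) -> float:
    """Required average score of a positive bag under the stated assumption:
    one instance at +1 and the remaining bag_size - 1 instances at -1."""
    if bag_size < 1:
        raise ValueError("bag_size must be at least 1")
    return (2 - bag_size) / bag_size


for n in (1, 2, 4, 10):
    print(n, smil_margin_target(n))
```

For a bag of size one the target equals 1, which is the SIL margin, and the target decreases toward -1 as the bag grows, which matches the loosening of the constraint for large bags described above.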

Sparse Transductive MIL

The transductive SVM modifies the standard SVM into a constrained version in which the decision boundary is kept as far from the unlabeled data as possible. In MIL, the instances in positive bags can be viewed as unlabeled instances, since the assumption “at least one instance in the positive bag is positive” leaves their individual labels uncertain. Sparse Transductive MIL (stMIL) replaces the original SVM with a transductive SVM.

Sparse Balanced MIL

Sparse balanced MIL (sbMIL) combines the advantages of SIL and sMIL: the former favors positive bags that are rich in positive instances, while the latter favors sparse positive bags. sbMIL uses the parameter η to set the fraction of positive instances in positive bags. It first trains a decision function with sMIL on the original bags, and then relabels the instances in positive bags according to η: the instances are ranked by the score of this decision function, the top (η × positive bag size) instances in a positive bag are regarded as positive, and the rest of the bag as negative. The relabeled bags are then used to train the final decision function with SIL, which produces the final predictions.
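The relabeling step of sbMIL can be sketched as follows. The scores are assumed to come from the initial sMIL decision function, and rounding η × bag size up to the next integer is an assumption made here so that each positive bag keeps at least one positive instance; the function name `relabel_positive_bag` is hypothetical.

```python
import math
from typing import List, Sequence


def relabel_positive_bag(scores: Sequence[float], eta: float) -> List[int]:
    """Label the top ceil(eta * bag size) instances +1 and the rest -1."""
    k = math.ceil(eta * len(scores))  # assumption: round up
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [-1] * len(scores)
    for i in ranked[:k]:
        labels[i] = 1
    return labels


bag_scores = [0.2, -1.0, 0.9, -0.3]
print(relabel_positive_bag(bag_scores, 0.5))
print(relabel_positive_bag(bag_scores, 0.25))
```

The relabeled instances, together with the instances of the negative bags, form an ordinary supervised training set for the final SIL classifier.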
