Since we adopt the multiple instance learning (MIL) assumption for the problem, we choose a multiple instance classifier as the core process of our system.

This section describes the MIL methods that are applied in the experiments.

**3.5.1** **Single Instance Learning: a Naive Approach**

A naive approach to MIL learns directly from the instances in the bags. The instances used for training are labeled according to the bags they belong to. In contrast to multiple instance learning, this approach is called "Single Instance Learning (SIL)". Without violating the MIL assumption, instances in negative bags are certainly labeled as negative. However, all instances in positive bags are regarded as positive, which causes some negative instances to be mislabeled as positive. Even though it suffers from mislabeling, SIL performs well on some problems. Soumya[25] provided an empirical comparison showing that SIL is superior to some MIL algorithms.
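As a minimal sketch (assuming bags are simply lists of feature vectors; the function name is ours, not from the cited work), the SIL transformation just flattens bags into labeled instances that any standard classifier can consume:

```python
def sil_unfold(bags, bag_labels):
    """Flatten bags into instance-level training data, copying each
    bag's label to all of its instances (the SIL approximation)."""
    instances, labels = [], []
    for bag, label in zip(bags, bag_labels):
        for x in bag:
            instances.append(x)
            labels.append(label)
    return instances, labels
```

The resulting instance set can be fed to any supervised learner; the mislabeling discussed above is confined to the instances copied from positive bags.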

**3.5.2** **Semi-Supervised Approach**

MissSVM (Multi-Instance learning by Semi-Supervised Support Vector Machine), proposed by Zhou[33], combines multiple instance learning with semi-supervised learning. MIL learns from bags, whose instances carry uncertain labels; semi-supervised learning learns from both labeled and unlabeled data. Hence, MIL can be viewed as a special case of semi-supervised learning.

Zhou[33] gives the definition of MIL by unfolding the instances from bags. Let $\mathcal{X}$ be the instance space. A bag is defined as $X_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}\}$, $x_{i,j} \in \mathcal{X}$, with $n_i$ the length of $X_i$. The training bags are concatenated by placing negative bags before positive bags. The bag set is $\{X_1, X_2, \ldots, X_q, X_{q+1}, \ldots, X_{q+p}\}$ with $m = q + p$, where $X_1$ to $X_q$ are negative bags and $X_{q+1}$ to $X_m$ are positive bags. All bags are unfolded without changing the order, giving the instance set $\{x_{1,1}, x_{1,2}, \ldots, x_{1,n_1}, x_{2,1}, \ldots, x_{m,1}, x_{m,2}, \ldots, x_{m,n_m}\}$. The set is then re-indexed as $\{x_1, x_2, \ldots, x_{T_L}, \ldots, x_T\}$, in which $T = \sum_{i=1}^{m} n_i$ and $T_L = \sum_{i=1}^{q} n_i$. The original bag set is thus transformed into an instance set, and the problem is stated as Definition 1.

**Definition 1.** *Given a set of labeled negative instances $\{(x_1, -1), (x_2, -1), \ldots, (x_{T_L}, -1)\}$ and a set of unlabeled instances $\{x_{T_L+1}, x_{T_L+2}, \ldots, x_T\}$, learn the target function $F^s : \mathcal{X} \to \{+1, -1\}$ s.t. each positive bag $X_i$ contains at least one positive instance.*

The definition is a semi-supervised task with a constraint. For any unseen bag $X_* = \{x_{*,1}, x_{*,2}, \ldots, x_{*,n_*}\}$, the prediction function $F$ is defined as:

$$
F(X_*) =
\begin{cases}
+1, & \text{if there exists a } j \in \{1, 2, \ldots, n_*\} \text{ s.t. } F^s(x_{*,j}) = +1 \\
-1, & \text{otherwise}
\end{cases}
\tag{3.1}
$$

After formulating the MIL problem as a semi-supervised learning problem, any semi-supervised learning algorithm can be applied to solve it.
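The reformulation can be sketched as follows (the function names are ours for illustration, not MissSVM's API): negative bags unfold into labeled instances, positive bags form the unlabeled pool, and bag prediction follows Formula (3.1):

```python
def to_semi_supervised(bags, bag_labels):
    """Unfold bags: instances from negative bags become labeled (-1);
    instances from positive bags form the unlabeled pool."""
    labeled, unlabeled = [], []
    for bag, Y in zip(bags, bag_labels):
        if Y == -1:
            labeled.extend((x, -1) for x in bag)
        else:
            unlabeled.extend(bag)
    return labeled, unlabeled

def predict_bag(f_s, bag):
    """Formula (3.1): a bag is positive iff some instance is positive."""
    return +1 if any(f_s(x) == +1 for x in bag) else -1
```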

**3.5.3** **Multiple-Instance Classification Algorithm**

MICA (multiple-instance classification algorithm) is an algorithm proposed by Mangasarian[20], which solves the MIL problem by representing each bag as a convex combination of the instances in the bag. Linear programming is then adopted to predict the unseen bags.

Given a positive bag, which is a set of instances $X = \{x_1, x_2, \ldots, x_n\}$, there is a set of coefficients $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$, $0 \le \alpha_i \le 1$, $\sum_{i=1}^{n} \alpha_i = 1$, such that $x = \alpha^T X$. In the coordinate space, $x$ is a point in the convex hull of the bag's instances, and a decision boundary that places $x$ on the positive side is sought. The objective function thereby realizes the MIL assumption.
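The convex-combination representation can be sketched as follows (a toy illustration of the bag representation only, not Mangasarian's linear program):

```python
def convex_combination(alpha, bag):
    """Compute x = alpha^T X for a bag of d-dimensional instances,
    checking that alpha defines a valid convex combination."""
    assert all(0.0 <= a <= 1.0 for a in alpha)
    assert abs(sum(alpha) - 1.0) < 1e-9
    d = len(bag[0])
    return [sum(a * x[j] for a, x in zip(alpha, bag)) for j in range(d)]
```

MICA searches over such coefficients so that the representative point $x$ of each positive bag falls on the positive side of the learned boundary.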

**3.5.4** **Support Vector Machine for Multiple Instance Learning**

Andrew[3] proposed the algorithm "Support Vector Machines (SVM) for Multiple-Instance Learning (MIL)", which contains two types of SVM for the MIL problem: mi-SVM and MI-SVM. Both deal with data in bag format but differ in the optimization target: MI-SVM optimizes at the bag level while mi-SVM optimizes at the instance level.

According to the MIL assumption, all instances in negative bags are definitely negative. In positive bags, the assumption only tells that at least one instance is positive; therefore, the labels of the instances in positive bags are unknown. The assumption is defined as Definition 2 by Andrew and formulated as a linear constraint, which is illustrated in Formula (3.2).

**Definition 2.** *Given a bag set $B = \{X_I \mid X_I \text{ is a bag}\}$, where each bag $X_I$ is associated with a label $Y_I \in \{1, -1\}$ and each instance $x_i \in X_I$ carries a label $y_i$. When $Y_I = -1$, all instances in $X_I$ are labeled with $-1$. If $Y_I = 1$, the instance labels are not determined but are constrained by $Y_I = \max_{x_i \in X_I} y_i$.*

$$
\sum_{i \in I} \frac{y_i + 1}{2} \ge 1, \;\; \forall I \text{ s.t. } Y_I = 1;
\qquad
y_i = -1, \;\; \forall I \text{ s.t. } Y_I = -1
\tag{3.2}
$$
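Formula (3.2) can be checked mechanically for one bag; this small helper (our own, for illustration) tests whether a candidate instance labeling is admissible:

```python
def satisfies_constraint(Y, ys):
    """Check Formula (3.2) for one bag: a positive bag (Y == 1) needs at
    least one y_i == +1; a negative bag needs every y_i == -1.
    The term (y_i + 1) / 2 maps labels {-1, +1} to {0, 1}."""
    if Y == 1:
        return sum((y + 1) // 2 for y in ys) >= 1
    return all(y == -1 for y in ys)
```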

The two types of SVM for MIL are explained with the notations used in SVM[27].

**mi-SVM**

mi-SVM aims at maximizing the instance margin. The process learns the best labeling pattern in the positive bags. Without violating the constraint defined in Formula (3.2), each instance in a positive bag is assigned a label $y \in \{1, -1\}$; on the other hand, instances in negative bags are all assigned $y = -1$. The support vector machine learns a model $\mathbf{w}$ based on the assigned pattern, subject to the SVM objective function. The solution of mi-SVM is the model $\mathbf{w}$ with the maximal instance margin and is defined as Formula (3.3).

$$
\mathbf{w} = \arg\min_{\{y_i\}} \min_{\mathbf{w}, b, \xi} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i
$$
$$
\text{s.t. } \forall i : \; y_i(\langle \mathbf{w}, x_i \rangle + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; y_i \in \{-1, 1\}
\tag{3.3}
$$
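In practice mi-SVM is solved by alternating between training an SVM on the current labels and re-imputing the labels in positive bags. The re-labeling step can be sketched as follows (our own simplification; `decision` stands in for the current SVM decision function):

```python
def relabel_positive_bags(decision, bags, bag_labels):
    """One mi-SVM imputation step: label each instance in a positive bag
    by the sign of its current decision value; if a positive bag ends up
    with no positive instance, flip its highest-scoring instance so that
    the constraint of Formula (3.2) still holds."""
    all_labels = []
    for bag, Y in zip(bags, bag_labels):
        if Y == -1:
            all_labels.append([-1] * len(bag))
            continue
        scores = [decision(x) for x in bag]
        ys = [1 if s > 0 else -1 for s in scores]
        if 1 not in ys:
            ys[scores.index(max(scores))] = 1
        all_labels.append(ys)
    return all_labels
```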

**MI-SVM**

MI-SVM aims at maximizing the bag margin. Each positive bag is represented by one instance (the witness) in the bag, and all negative bags are expanded into negative instances. The best witness in each positive bag is selected by optimizing the SVM objective function. The solution of MI-SVM is defined as Formula (3.4), where $s$ is a selector that decides the witness.

$$
\mathbf{w} = \arg\min_{s} \min_{\mathbf{w}, b, \xi} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_I \xi_I,
$$
$$
\text{s.t. } \forall I : \; Y_I = -1 \;\wedge\; -\langle \mathbf{w}, x_i \rangle - b \ge 1 - \xi_I, \; \forall i \in I,
$$
$$
\text{or } Y_I = 1 \;\wedge\; \langle \mathbf{w}, x_{s(I)} \rangle + b \ge 1 - \xi_I, \;\text{ and } \xi_I \ge 0.
\tag{3.4}
$$
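The witness-selection step in MI-SVM's alternating optimization can be sketched as follows (again, `decision` stands in for the current SVM decision function; the helper is ours):

```python
def select_witnesses(decision, bags, bag_labels):
    """For each positive bag, pick the instance with the highest decision
    value as its witness s(I); negative bags contribute all instances."""
    witnesses, negatives = [], []
    for bag, Y in zip(bags, bag_labels):
        if Y == 1:
            witnesses.append(max(bag, key=decision))
        else:
            negatives.extend(bag)
    return witnesses, negatives
```

The SVM is then retrained on the selected witnesses plus the expanded negative instances, and the two steps repeat until the selection stabilizes.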

**3.5.5** **Multiple Instance Learning for Sparse Positive Bags**

This series of sparse positive MIL algorithms deals with data where few of the instances in the positive bags are positive. Bunescu[9] proposed three variants of sparse positive MIL, which are explained as follows:

**Sparse MIL**

Sparse MIL (sMIL) modifies the constraint of SIL. In SIL, every instance in a negative bag is regarded as negative, and likewise for positive bags. sMIL instead assumes that only a few instances in a positive bag are really positive, so it favors the situation where positive bags contain few positive instances. sMIL also models large bags: it relaxes the constraint when the bag size is large, because it is harder to find a positive instance in a sparse positive bag. sMIL is equivalent to SIL when there is only one instance in the positive bag.
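As we recall Bunescu[9]'s formulation (a sketch under that assumption), the positive-bag constraint is placed on the bag mean with a right-hand side of $(2 - |X|)/|X|$, which equals the standard margin of $1$ when $|X| = 1$ and shrinks as the bag grows, which is exactly the relaxation described above:

```python
def smil_margin_target(bag_size):
    """Right-hand side of the sMIL positive-bag constraint,
    (2 - |X|) / |X|: equal to 1 for singleton bags (reducing to SIL)
    and smaller for larger, sparser bags (a looser constraint)."""
    return (2 - bag_size) / bag_size
```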

**Sparse Transductive MIL**

The transductive SVM modifies the standard SVM to a constrained version in which the decision boundary is assumed to lie as far from the unlabeled data as possible. In the MIL problem, instances in positive bags can be viewed as unlabeled instances, since the assumption "at least one instance in the positive bag is positive" means the labels in positive bags are uncertain. Sparse Transductive MIL (stMIL) replaces the original SVM with a transductive SVM.

**Sparse Balanced MIL**

Sparse balanced MIL (sbMIL) takes advantage of both SIL and sMIL: the former favors positive-rich bags while the latter favors sparse positive bags. sbMIL adopts a parameter $\eta$ that decides the percentage of positive instances in positive bags, and first trains a decision function as sMIL does with the original bags. The instances in positive bags are then adjusted according to the distribution decided by $\eta$: given the prediction scores, the top ($\eta \times$ positive bag size) instances in each positive bag are regarded as positive and the rest in the bag are negative. The rearranged bags are used for training the final decision function with SIL and for predicting the final results.
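The $\eta$-driven relabeling step for a single positive bag can be sketched as follows (our own illustration of the ranking rule described above):

```python
def sbmil_relabel(scores, eta):
    """Relabel one positive bag from its instance scores: the top
    (eta * bag size) instances become +1 (at least one, to respect the
    MIL assumption), the rest become -1."""
    k = max(1, int(eta * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    labels = [-1] * len(scores)
    for i in order[:k]:
        labels[i] = 1
    return labels
```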