
Active Learning by Learning

Wei-Ning Hsu

Department of Electrical Engineering, National Taiwan University

mhng1580@gmail.com

Hsuan-Tien Lin

Department of Computer Science and Information Engineering, National Taiwan University

htlin@csie.ntu.edu.tw

Abstract

Pool-based active learning is an important technique that helps reduce labeling efforts within a pool of unlabeled instances. Currently, most pool-based active learning strategies are constructed based on some human-designed philosophy; that is, they reflect what human beings assume to be “good labeling questions.”

However, while such human-designed philosophies can be useful on specific data sets, it is often difficult to establish the theoretical connection of those philosophies to the true learning performance of interest. In addition, given that a single human-designed philosophy is unlikely to work on all scenarios, choosing and blending those strategies under different scenarios is an important but challenging practical task. This paper tackles this task by letting the machines adaptively “learn” from the performance of a set of given strategies on a particular data set. More specifically, we design a learning algorithm that connects active learning with the well-known multi-armed bandit problem. Further, we postulate that, given an appropriate choice for the multi-armed bandit learner, it is possible to estimate the performance of different strategies on the fly. Extensive empirical studies of the resulting ALBL algorithm confirm that it performs better than state-of-the-art strategies and a leading blending algorithm for active learning, all of which are based on human-designed philosophy.

1 Introduction

Active learning is a machine learning setup that enables machines to cleverly “ask questions” to reduce the amount of labeling efforts (Settles 2010). The vast majority of research work on active learning is focused on transforming the human philosophy of asking questions into programmable strategies (Tong and Koller 2002; Donmez and Carbonell 2008; Huang, Jin, and Zhou 2010). When the philosophy happens to match the characteristics of the data set on hand, the corresponding strategy can result in promising practical performance. However, there are many different philosophies behind different “human-asked” questions, and no single philosophy is likely to satisfy the characteristics of every data set. For any given data set, properly choosing the strategies is thus an important practical task.


Consider the scenario when we were children. We were not commanded to ask questions based on a single philosophy. Instead, it is our nature to experiment with several different (perhaps given) philosophies, and gradually determine the advantages and disadvantages of each philosophy in different situations by evaluating our learning performance along the way. That is, we learn our own powerful strategy as we grow, rather than merely implementing one fixed strategy. In other words, we conduct active learning by learning, as opposed to active learning by acting.

In this paper, we study how the machines can likewise conduct active learning by learning, instead of merely acting with a single human-designed strategy. We consider a practical task that mimics our childhood learning scenario: letting the machine learn to adaptively choose among some existing strategies based on their estimated contributions to the performance measure of interest. We connect the task to the multi-armed bandit problem for adaptive decision making (Beygelzimer et al. 2011). Consequently, we propose a novel approach for active learning that involves modifying a state-of-the-art method to solve the problem, and deriving a reward scheme that closely relates to the performance measure of interest. The proposed approach, shorthanded ALBL for active learning by learning, essentially results in a clever probabilistic blending of the strategies subject to their time-varying performance on the given data set, and echoes adaptive blending approaches for other machine learning problems (Jacobs et al. 1991).

We also empirically compare ALBL with several human-designed strategies, and demonstrate that ALBL is indeed able to use the derived reward to adaptively choose the best strategy and therefore achieve the best performance. Furthermore, experimental results show that ALBL is not only significantly better than a naive blending approach, but is also often better than the state-of-the-art adaptive blending approach COMB (Baram, El-Yaniv, and Luz 2004), which is based on a human-designed criterion during blending. The results indicate that ALBL is a superior choice for blending human-designed strategies adaptively.

The remainder of this paper is organized as follows. Section 2 outlines the active learning problem and reviews related work. Section 3 presents and illustrates the proposed ALBL approach. We discuss experimental results in Section 4 and conclude in Section 5.


2 Related Work

Active learning can be divided into two categories: stream-based and pool-based. In stream-based active learning, each instance is drawn from some distribution in a streaming manner and the learner has to decide immediately whether to query the label of this instance or not. Although their data access is more restricted, stream-based active learning algorithms (Cohn, Atlas, and Ladner 1994; Chu et al. 2011; Beygelzimer, Dasgupta, and Langford 2009) generally come with more solid theoretical performance guarantees.

Pool-based active learning (Settles 2010) is a more realistic setup that allows more flexible access to data and will be the focus of this paper. In pool-based active learning problems, a learner is presented with an unlabeled pool and a labeled pool in the beginning. We denote the whole data pool of n instances by D = {x_1, x_2, ..., x_{n_u}, (x_{n_u+1}, y_{n_u+1}), ..., (x_n, y_n)}, where the input instances x_i ∈ R^d and y_i is the label of x_i. The pool D is the union of the unlabeled pool D_u that contains the first n_u instances, and the labeled pool D_l of the other n − n_u instances along with their labels. The initial D is assumed to be generated i.i.d. from some distribution P(x, y) before the labels y_i are hidden to form D_u. Note that in general, we can only access a small D_l initially, while the unlabeled pool D_u is relatively large.

Given the initial D_l, the learner trains a classifier f_0. During the iterations t = 1, ..., T, by considering D_l, D_u, and f_{t−1}, the learner selects one instance x_j ∈ D_u to query its label. This instance is then moved to the labeled pool, and the learner can train a new classifier f_t with the updated D_l. With a small query budget T, the goal of active learning is to maximize the average test accuracy of those f_t, where the test accuracy can be defined from a separate test set that is also sampled from P(x, y) like D.
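The iterative protocol above can be summarized in a short sketch. This is a minimal illustration only: it assumes scikit-learn's SVC as the underlying classifier, numpy arrays for the pool, and a pluggable select_query function standing in for an arbitrary query strategy; none of these names come from the paper.

```python
from sklearn.svm import SVC

def pool_based_active_learning(X, y, labeled_idx, select_query, T, X_test, y_test):
    """Run T query iterations; return the test accuracy of each classifier f_t."""
    labeled = list(labeled_idx)
    clf = SVC().fit(X[labeled], y[labeled])           # f_0 trained on the initial D_l
    accuracies = []
    for t in range(1, T + 1):
        unlabeled = [i for i in range(len(X)) if i not in labeled]
        j = select_query(clf, X, unlabeled)           # strategy inspects f_{t-1} and D_u
        labeled.append(j)                             # query y_j and move x_j into D_l
        clf = SVC().fit(X[labeled], y[labeled])       # retrain to obtain f_t
        accuracies.append(clf.score(X_test, y_test))  # test accuracy of f_t
    return accuracies
```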

Existing works on pool-based active learning predominantly focus on establishing reasonable criteria for selecting which instance to label. One popular criterion is called uncertainty sampling (Lewis and Gale 1994), which queries the instance that is most uncertain to the classifier f_{t−1}. For example, Tong and Koller (2002) propose to query the instance closest to the decision boundary of the SVM-trained f_{t−1} (Vapnik 1998). Uncertainty sampling assumes that f_{t−1} only needs fine-tuning around the boundary, and thus can be less satisfactory if f_{t−1} is not good enough.
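For concreteness, the uncertainty-sampling criterion could be plugged into the loop above roughly as follows; reading "closest to the decision boundary" as the smallest |decision_function| value is an assumption of this sketch, not a detail taken from Tong and Koller (2002).

```python
import numpy as np

def uncertain_select(clf, X, unlabeled):
    """Pick the unlabeled instance closest to the SVM decision boundary."""
    margins = np.abs(clf.decision_function(X[unlabeled]))  # distance-like score to the boundary
    return unlabeled[int(np.argmin(margins))]
```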

Representative sampling resolves the caveats of uncertainty sampling by taking both the uncertainty and the representativeness of an instance into account. Researchers have proposed a wide variety of measurements for the representativeness. For example, Nguyen and Smeulders (2004) and Donmez, Carbonell, and Bennett (2007) claim that an instance around the boundary is more representative if it resides in a denser neighborhood, and propose a density-weighted criterion. Another route for measuring representativeness is through clustering. The PSDS approach (Donmez and Carbonell 2008) establishes a distance function through clustering to estimate the representativeness of instances. On the other hand, Dasgupta and Hsu (2008) propose a more sophisticated hierarchical clustering view, and use the cluster information to calculate representativeness. Some other work, such as the QUIRE approach (Huang, Jin, and Zhou 2010), measures the representativeness by estimating the possible label-assignments for unlabeled instances.

However, there are usually no strong connections between the human-designed criterion and the performance measure of interest. In addition, what a human believes to be good questions may not work well on every data set and in every situation. This deficiency hints at the need for choosing from several different algorithms in a data-dependent and adaptive manner. One existing solution is called COMB (Baram, El-Yaniv, and Luz 2004), which will be further discussed in Section 3.3 after we introduce our solution.

3 The Proposed Approach

Our solution is a novel approach called Active Learning by Learning (ALBL). The approach solves the task of choosing from a candidate set of existing algorithms adaptively based on their estimated contributions to the learning performance on a given data set. Our design is based on a well-known adaptive learning problem called the multi-armed bandit problem (Robbins 1985; Vermorel and Mohri 2005). Next, we introduce the multi-armed bandit problem and connect it to the task above.

The multi-armed bandit problem simulates what a gambler would do in a casino. Assume that the gambler is given K bandit machines and a budget of T iterations. The gambler is then asked to sequentially decide which machine to pull in each iteration t = 1, ..., T. On being pulled, the bandit machine randomly provides a reward from a machine-specific distribution unknown to the gambler. The goal of the gambler is to maximize the total rewards earned through the sequence of decisions. To earn the most rewards, the gambler typically has to consider the trade-off between exploitation (choosing the “luckier” machines) and exploration (checking which machines are the “lucky” ones).
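As a toy illustration of this exploitation/exploration trade-off (and only of that; ALBL itself relies on EXP4.P, introduced below), an ε-greedy gambler could be simulated as follows, with invented Bernoulli reward rates.

```python
import random

def epsilon_greedy_bandit(pull, K, T, eps=0.1):
    """Play K bandit machines for T rounds, mostly exploiting the best empirical mean."""
    counts, sums = [0] * K, [0.0] * K
    total = 0.0
    for t in range(T):
        if random.random() < eps or 0 in counts:       # explore: try a random machine
            k = random.randrange(K)
        else:                                          # exploit: best empirical mean so far
            k = max(range(K), key=lambda i: sums[i] / counts[i])
        r = pull(k)                                    # machine k returns a random reward
        counts[k] += 1; sums[k] += r; total += r
    return total

# Example: three machines with Bernoulli rewards of different (hidden) success rates.
rates = [0.2, 0.5, 0.8]
print(epsilon_greedy_bandit(lambda k: float(random.random() < rates[k]), K=3, T=1000))
```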

Our key idea is to draw an analogy between our task and the multi-armed bandit problem. Since we hope to explore the performance of existing algorithms while exploiting the one with the best performance, it is intuitive to make each algorithm represent one bandit machine in the multi-armed bandit problem. The analogy faces two immediate difficulties: how to identify an appropriate multi-armed bandit method to solve the problem, and how to design a reward scheme that connects the goal of active learning to the goal of the multi-armed bandit problem.

Next, we resolve the two difficulties and explain our proposed approach in detail. Then, we introduce one state-of-the-art approach that not only works for the task but also is based on the multi-armed bandit problem, and discuss our key differences to that approach.

3.1 Choice of Multi-armed Bandit Method

Our first task is to identify an appropriate multi-armed bandit method for ALBL. We solve the task by looking at the characteristics of our assumed rewards, which are associated with the learning performance. First, it is intuitive that the rewards are not independent random variables across the iterations, because the learning performance generally grows as D_l becomes larger. Second, the contributions to the learning performance can be time-varying because different algorithms may perform differently in different iterations (Donmez, Carbonell, and Bennett 2007). Thus, making statistical assumptions about the rewards may be difficult.

The scenario above matches the so-called adversarial setting in the multi-armed bandit problem (Auer et al. 2002). One state-of-the-art method that comes with a strong theoretical guarantee for the adversarial setting is called EXP4.P (Beygelzimer et al. 2011), which is an improvement on the earlier EXP4 (Auer et al. 2002) method. We thus consider EXP4.P as the core solver for our proposed ALBL approach. Both EXP4 and EXP4.P define the concept of experts, which can be viewed as soft mixtures of active learning algorithms in our analogy. For simplicity, in this paper, we only consider special experts that correspond to single algorithms instead of soft mixtures.

Next, let us look at the skeleton of ALBL with EXP4.P as the core solver. EXP4.P adaptively maintains a weight vector w(t) in iteration t, where the k-th component w_k(t) is the non-negative weight of the k-th expert that simply corresponds to the k-th active learning algorithm. The weight vector w(t) is then scaled to a probability vector p(t) ∈ [p_min, 1]^K with some parameter p_min > 0. EXP4.P randomly chooses an expert (active learning algorithm in ALBL) based on p(t), and obtains the reward r of the choice.

Without loss of generality, assume that the k-th algorithm a_k is chosen by EXP4.P in ALBL. Then, the query request of a_k should be followed to query from D_u. To accommodate the possibility that a_k would want to make a probabilistic query, we introduce the query vector ψ_k(t) ∈ [0, 1]^{n_u}, where its j-th component ψ_k^j(t) indicates the preference of the k-th algorithm on querying the label of x_j ∈ D_u in iteration t. The query vector should represent a probability distribution of querying from D_u; that is, $\sum_{j=1}^{n_u} \psi_k^j(t) = 1$. Deterministic active learning algorithms could simply return a degenerate query vector that contains a single 1 on its most preferred instance, and 0 elsewhere.

There are two probabilistic decision steps above. First, EXP4.P uses p(t) to choose an active learning algorithm, and then, ALBL takes ψ_k(t) to query the label of some x ∈ D_u. The two steps can be combined to directly sample x ∈ D_u based on $q_j(t) = \sum_{k=1}^{K} p_k(t)\, \psi_k^j(t)$, the probability of querying the j-th instance in the t-th iteration.

Note that different algorithms a_k may actually suggest querying the same instance. Thus, following one algorithm in ALBL is virtually akin to following the other algorithms that make the same suggestion. In our analogy, the situation would correspond to getting the rewards from multiple bandit machines at the same time, which is something that EXP4.P does not consider. The original EXP4.P only updates the w_k(t) of the chosen expert k with a re-scaled reward $\frac{r}{p_k(t)}$. Considering the special situation above, we modify EXP4.P and take $\frac{r}{q_j(t)}$, the reward re-scaled by the probability of the queried instance x_j, to update the w_k(t) of all the algorithms k that make the same suggestion on querying x_j. When only one algorithm suggests x_j, our update formula is equivalent to the one in EXP4.P.
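A compact sketch of these two probabilistic steps and the shared-credit update might look like the following. The exponential-weight update shown here is a simplified stand-in for the full EXP4.P rule (which additionally involves a confidence term), so the constants, the scaling into [p_min, 1], and the function names are all illustrative assumptions.

```python
import numpy as np

def choose_and_update(w, psi, reward_fn, p_min=0.05, eta=0.1):
    """One ALBL-style iteration: blend the experts' query vectors, sample an instance,
    and update every expert that suggested the sampled instance."""
    K, n_u = psi.shape                            # psi[k, j]: algorithm k's preference for instance j
    p = (1 - K * p_min) * w / w.sum() + p_min     # scale weights into probabilities >= p_min
    q = p @ psi                                   # q[j] = sum_k p[k] * psi[k, j]
    j = np.random.choice(n_u, p=q)                # sample the instance to query
    r = reward_fn(j)                              # e.g., IW-ACC of the retrained classifier
    for k in range(K):
        if psi[k, j] > 0:                         # every algorithm that suggested x_j gets credit
            w[k] *= np.exp(eta * psi[k, j] * r / q[j])
    return j, w

# Toy usage: three experts over five unlabeled instances, constant reward.
w = np.ones(3)
psi = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0, 0.0],
                [0.2, 0.2, 0.2, 0.2, 0.2]])
j, w = choose_and_update(w, psi, reward_fn=lambda j: 0.7)
```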

3.2 Choice of Reward Function

After modifying EXP4.P as the core solver within the proposed ALBL approach, the remaining task is to design a proper reward function. The ideal reward function shall be the test accuracy of f_t, because the cumulative reward that EXP4.P targets would then correspond to the average test accuracy achieved during the iterations of active learning.

However, the test accuracy is impossible to obtain in the real world because a test set is generally not available for active learning due to the costliness of labeling.

Another possibility is the training accuracy of f_t. However, training accuracy may not be the best choice for two reasons. First, it suffers from the inevitable training bias when selecting the best f_t based on the labeled data. Second, it suffers from the sampling bias when using active learning to strategically query the unlabeled instances.

Because ALBL samples from the unlabeled pool probabilistically based on the values of q_j(t), it is actually possible to correct the sampling bias using those values. One correction technique, called importance weighting, was originally designed for stream-based active learning (Beygelzimer, Dasgupta, and Langford 2009). The technique is also utilized in a pool-based active learning algorithm (Ganti and Gray 2012) to select a proper f_t. Here, we extend the technique for a different purpose: providing a proper reward function, called IMPORTANCE-WEIGHTED-ACCURACY, for the modified EXP4.P within ALBL. In particular, we apply established results for estimating the expected loss with importance weighting (Ganti and Gray 2012) on the 0/1 loss, which is simply the negative accuracy.

To understand the key idea within the IMPORTANCE-WEIGHTED-ACCURACY technique, let us first assume that the data pool D is fully labeled and each example in D is generated i.i.d. from some distribution that will also be used for testing. Then, it is well-known that for a fixed classifier f, the average accuracy $\frac{1}{n}\sum_{i=1}^{n} \llbracket y_i = f(x_i) \rrbracket$ on D is an unbiased estimator of the test accuracy of f, where i is used to index instances in the entire pool.

The average accuracy requires all examples in D to be labeled. The IMPORTANCE-WEIGHTED-ACCURACY technique utilizes sampling to form another unbiased estimator that does not need all the y_i (Ganti and Gray 2012). Attach a probability value q_i > 0 to each example (x_i, y_i) ∈ D, take those values to sample one (x, y), and use binary random variables s_i ∈ {0, 1} to denote the outcome of the sampling. Then, for each (x_i, y_i), let $c_i = \llbracket y_i = f(x_i) \rrbracket$, and the expected value of $\frac{s_i c_i}{q_i}$ over the sampling process is simply c_i. Thus, $\frac{1}{n}\sum_{i=1}^{n} \frac{s_i c_i}{q_i} = \frac{1}{n}\cdot\frac{c}{q}$ is also an unbiased estimator of the test accuracy of f, where c and q denote the accuracy and the sampling probability of the sampled example. That is, the accuracy c of the sampled example can be re-weighted by $\frac{1}{nq}$ to form a simple unbiased estimator of the test accuracy of f.
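A few lines of simulation make the unbiasedness argument concrete; the data below are made up purely for the check and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
c = rng.integers(0, 2, size=n).astype(float)   # c_i = 1 if f classifies example i correctly
q = rng.dirichlet(np.ones(n))                  # sampling probabilities q_i > 0, summing to 1

true_acc = c.mean()                            # (1/n) sum_i c_i : needs every label
estimates = []
for _ in range(20000):
    i = rng.choice(n, p=q)                     # sample one example according to q
    estimates.append(c[i] / (n * q[i]))        # re-weight its accuracy by 1 / (n q_i)

print(true_acc, np.mean(estimates))            # the two numbers should nearly agree
```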

Recall that the proposed ALBL effectively takes q_j(t) to sample one instance from D_u, which does not fully match the discussions above. In particular, sampling from only D_u means that instances (x_i, y_i) ∈ D_l are attached with q_i = 0. Thus, Ganti and Gray (2012) propose to allow re-querying labeled examples with a non-zero probability. Similarly, we design ALBL by incorporating a special algorithm (bandit machine) RANDOM that randomly selects one instance from the entire data pool. The design strengthens ALBL in two aspects. First, no modification of the other active learning algorithms is needed, making it possible to re-use existing algorithms easily. Second, RANDOM serves as a naïve sampling strategy that is sometimes competitive with active learning algorithms (see Section 4). Incorporating RANDOM provides ALBL with an alternative when other human-designed strategies fail. Note that RANDOM is sufficient for serving the need, but not necessary. One interesting future direction is to study possibilities other than RANDOM.

Because instances in D_l can now be re-queried, we now assume ψ_k(t) to be of length n rather than n_u. Assume that instance i_t is queried in iteration t of ALBL, and let $W_t = (q_{i_t}(t))^{-1}$. For any fixed classifier f, define the IMPORTANCE-WEIGHTED-ACCURACY (IW-ACC) after τ iterations to be

$$\text{IW-ACC}(f, \tau) = \frac{1}{nT} \sum_{t=1}^{\tau} W_t \llbracket y_{i_t} = f(x_{i_t}) \rrbracket.$$

Then, the following theorem shows that IW-ACC(f, τ) is an unbiased estimator of the test accuracy of f if the (x_i, y_i) are generated i.i.d. from the test distribution.

Theorem 1. For any τ, $\mathbb{E}[\text{IW-ACC}(f, \tau)] = \frac{1}{n}\sum_{i=1}^{n} \llbracket y_i = f(x_i) \rrbracket$, where the expectation is taken over the randomness of sampling independently in iterations 1, 2, ..., τ.

Proof. The theorem can be proved by averaging the simple estimators obtained from each iteration, and is a special case of Theorem 1 of Ganti and Gray (2012).
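In practice, IW-ACC could be accumulated from the quantities ALBL already tracks (the queried indices i_t, the weights W_t = 1/q_{i_t}(t), and the current classifier), along the lines of the hypothetical helper below; the names are ours, not the paper's.

```python
def iw_acc(f, queried, weights, X, y, n, T):
    """Importance-weighted accuracy of classifier f after the recorded queries.

    queried: indices i_1, ..., i_tau of the examples sampled so far
    weights: W_1, ..., W_tau with W_t = 1 / q_{i_t}(t)
    """
    total = sum(W * float(f.predict(X[i:i + 1])[0] == y[i])
                for i, W in zip(queried, weights))
    return total / (n * T)
```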

The proposed ALBL simply takes IW-ACC(f_t, t) as the reward in the t-th iteration to evaluate how much the chosen algorithm a_k helps in getting a better f_t. Combining the ideas of modifying EXP4.P, incorporating RANDOM, and taking IW-ACC as the reward, we list the full ALBL approach in Algorithm 1. The probabilistic nature of EXP4.P and the use of RANDOM allow the reward to be an unbiased estimator of the test performance, making ALBL truly by learning, that is, connected with the performance measure of interest.

3.3 A Related Blending Approach

COMB (Baram, El-Yaniv, and Luz 2004) is a state-of-the-art adaptive blending approach that applies EXP4 to solve the task of adaptively choosing active learning algorithms. At first glance, ALBL seems similar to COMB in applying multi-armed bandit methods for the task. A closer look reveals two key differences, as discussed below.

Whereas ALBL takes the candidate active learning algorithms as the bandit machines, COMB draws an analogy that takes the unlabeled examples as the bandit machines instead. As a consequence, COMB has to choose from a large and varying number of bandit machines, and also has to cope with the restriction that each bandit machine can only be pulled once. These properties contradict the original setting of EXP4, which considers a moderate and fixed number of bandit machines, each of which can be pulled multiple times. Thus, COMB relies on quite a few heuristic modifications of the original EXP4 to work properly.

Further, the COMB approach takes a human-designed criterion called CLASSIFICATION ENTROPY MAXIMIZATION (CEM) as the reward. CEM is defined as the entropy of the f_t-predicted labels in D_u. While some empirical evidence shows that CEM can sometimes match the test accuracy, there is no formal connection between CEM and the test accuracy. The ALBL approach, on the other hand, takes an unbiased estimator for the test accuracy as the reward, which is directly related to the performance measure of interest and can hence be called by learning.
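For reference, the entropy-of-predicted-labels criterion could be computed roughly as follows; this is our reading of CEM as described above, with invented names, and not necessarily the exact formula used by COMB.

```python
import numpy as np

def cem(f, X_unlabeled):
    """Entropy of the label distribution that classifier f predicts on the unlabeled pool."""
    preds = f.predict(X_unlabeled)
    _, counts = np.unique(preds, return_counts=True)
    p = counts / counts.sum()                 # empirical distribution of predicted labels
    return float(-(p * np.log(p)).sum())      # higher entropy = more balanced predictions
```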

4 Experiments

We incorporate four algorithms within our proposed ALBL approach. The first algorithm is called RANDOM, which was previously introduced in Section 3.2. The other three are existing active learning algorithms introduced in Section 2: UNCERTAIN (Tong and Koller 2002), PSDS (Donmez and Carbonell 2008), and QUIRE (Huang, Jin, and Zhou 2010). Each of the algorithms covers a different design philosophy used by humans in active learning. In addition, as we shall show next, each algorithm performs strongly on some data sets but can be relatively worse on the others. That is, no algorithm is an overall winner, which suggests that choosing and blending the algorithms subject to different data sets is important. We take SVM (Vapnik 1998) as the underlying classifier and use LibSVM (Chang and Lin 2011) with all the default parameters to train the classifier.

We first compare ALBL with the four algorithms it incorporates. Next, we examine whether ALBL can compete with a naïve blending approach that uses a fixed ratio. Finally, we demonstrate the benefits of using the unbiased estimator in ALBL by comparing it with two related approaches: the state-of-the-art blending approach COMB (Baram, El-Yaniv, and Luz 2004) and a modified version of ALBL, called ALBL-TRAIN, that takes the training accuracy as the reward.

We take six real-world data sets (liver, sonar, vehicle, breast, diabetes, heart) from the UCI Repository (Bache and Lichman 2013). For each data set, we reserve 80% of the instances as the training set, and retain the other 20% as the test set to evaluate the test accuracy (see Section 2). Then, from the training set, we randomly select one instance of each class as the initial labeled pool D_l. Each experiment is averaged over ten runs.

4.1 ALBL versus Underlying Algorithms

Figure 1 shows the comparison of ALBL with the underlying algorithms on test accuracy, which confirms our statement that no single algorithm can provide consistently superior performance. For instance, QUIRE (the purple curve) performs strongly on diabetes; PSDS (the blue curve) is promising on sonar; UNCERTAIN (the green curve) dominates on vehicle and liver.

From Figure 1, we see that ALBL is usually close to the best curves of the four underlying algorithms, except on liver, which is harder to learn (accuracy close to 50%). In Table 1, we further compare ALBL with the four algorithms using a two-sample t-test at the 95% significance level when querying different percentages of instances from the unlabeled pool.
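The win/tie/loss entries in Table 1 can be thought of as the outcome of a routine like the one below, applied to the per-run accuracies of ALBL and one baseline at a given query percentage; the use of scipy.stats.ttest_ind and the tie convention are our assumptions.

```python
from scipy import stats

def win_tie_loss(acc_albl, acc_base, alpha=0.05):
    """Compare two lists of per-run test accuracies with a two-sample t-test."""
    _, p = stats.ttest_ind(acc_albl, acc_base)
    if p >= alpha:
        return "tie"                       # difference not significant at the 95% level
    mean_a = sum(acc_albl) / len(acc_albl)
    mean_b = sum(acc_base) / len(acc_base)
    return "win" if mean_a > mean_b else "loss"
```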


Algorithm 1 ACTIVE LEARNING BY LEARNING

Input: D = (D_u, D_l): data pool; T: query budget; A = {a_1, ..., a_K}: active learning algorithms, one of which is RANDOM

Initialize:
1: set t = 1, budget used = 0
2: while budget used < T do
3:   run EXP4.P for one iteration and obtain the choice vector p(t)
4:   for all x_i in D, calculate q_i(t) using p(t) from EXP4.P and ψ_k(t) from all the a_k's
5:   sample an instance x_{i_t} based on q_i(t), and record W_t = (q_{i_t}(t))^{-1}
6:   if x_{i_t} ∈ D_u (i.e., has not been queried) then
7:     query y_{i_t}, move (x_{i_t}, y_{i_t}) from D_u to D_l, and train a new classifier f_t with the new D_l
8:     budget used = budget used + 1
9:   else
10:    f_t = f_{t−1} because D_l is not changed
11:  end if
12:  calculate reward r = IW-ACC(f_t, t)
13:  feed r to a modified EXP4.P that updates the weights of all the algorithms (experts) that suggest x_{i_t}
14:  t = t + 1
15: end while

Table 1: Win/tie/loss counts of ALBL versus underlying algorithms based on a two-sample t-test

rank   | percentage of queried instances
       | 5%      10%     15%     20%     30%     40%     50%     total
1st    | 1/5/0   1/5/0   1/4/1   0/5/1   0/6/0   0/6/0   0/6/0   3/37/2
2nd    | 2/4/0   1/5/0   1/5/0   0/5/1   1/5/0   1/5/0   0/6/0   6/35/1
3rd    | 2/4/0   1/5/0   1/5/0   1/5/0   2/4/0   1/5/0   2/4/0   10/32/0
4th    | 4/2/0   4/2/0   3/3/0   2/4/0   4/2/0   4/2/0   4/2/0   25/17/0
total  | 9/15/0  7/17/0  6/17/1  3/19/2  7/17/0  6/18/0  6/18/0  44/121/3

The four algorithms are ranked in terms of their mean accuracy on the data set under the particular percentage of queried instances. The results demonstrate that ALBL often yields comparable performance with the better of the four algorithms and can sometimes perform even better. Note that the comparable performance to the better algorithms readily demonstrates that ALBL can solve the challenging task of making reasonable choices from several different algorithms in a data-dependent and adaptive manner.

4.2 ALBL versus Fixed Combination

One question that may be asked is whether it is necessary to use dynamic sampling weights p(t), such as those used by ALBL. To answer the question, we compare ALBL with a naïve blending approach, named FIXEDCOMB, that takes fixed sampling weights. The performance of ALBL is compared to that of FIXEDCOMB when incorporating two active learning algorithms, one of which reaches the best performance and the other the worst performance on each data set. Further, we consider the sampling weight ratios 10:0, 8:2, 6:4, 5:5, 4:6, 2:8, and 0:10 in FIXEDCOMB. Owing to space limitations and readability, only selected curves for two data sets, breast and diabetes, are shown. The two underlying algorithms are UNCERTAIN and QUIRE for breast, and PSDS and UNCERTAIN for diabetes. Experiments on the other data sets show similar results.

Figure 1: Test accuracy of ALBL and the underlying algorithms (RAND, UNCERTAIN, PSDS, QUIRE) on (a) breast, (b) diabetes, (c) heart, (d) liver, (e) sonar, and (f) vehicle. Each panel plots accuracy against the percentage of unlabelled data queried.

Figure 2: Test accuracy of ALBL versus FIXEDCOMB on (a) breast and (b) diabetes. Each panel plots accuracy against the percentage of unlabelled data queried, for ALBL and several fixed weight ratios (e.g., 10:0, 6:4, 5:5 or 4:6, and 0:10).

Figure 2 reveals two drawbacks of FIXEDCOMB. First, similar to the difficulty of choosing the underlying algorithms, deciding the best weight ratio beforehand is a very challenging endeavor. For instance, on breast, the best weight ratio is 6:4 for querying 10% of the instances, whereas on diabetes, the best weight ratio is 4:6. The second drawback is that FIXEDCOMB cannot capture the time-varying behavior of the underlying algorithms. For instance, on breast, weight ratio 0:10 is the least favored in the beginning, but surpasses weight ratio 5:5 in the end. ALBL resolves the drawbacks by dynamically adjusting the weights towards a better ratio based on the estimated learning performance of the algorithms. Thus, ALBL achieves competitive or even superior performance to the best ratio in the FIXEDCOMB family. The results justify the adoption of EXP4.P to dynamically decide the sampling weights.

4.3 ALBL versus Two Adaptive Approaches

Next, we compare ALBL with COMB, another blending approach based on a modified EXP4 algorithm and a human-designed reward function called CEM. To make the comparison more fair, we show the results on a sibling version of COMB based on EXP4.P (as in ALBL) coupled with CEM. Some side experiments show that the sibling version performs very similarly to the original COMB approach. In addition, as a baseline adaptive active learning approach, we also include in the comparison a modified version of ALBL, called ALBL-TRAIN, that takes the training accuracy of the classifier as the reward.

Figure 3 shows the comparison between ALBL, COMB, and ALBL-TRAIN on test accuracy. The three algorithms sometimes reach comparable performance, such as on diabetes. On most of the other data sets, ALBL achieves superior performance to those of COMB and ALBL-TRAIN. We further analyze the superior performance by evaluating IW-ACC, CEM, the training accuracy, and the true test accuracy at each iteration of ALBL, and depict two representative results in Figure 4. For diabetes, all three estimators are quite close in tracing the test accuracy in Figure 4(a), which explains the comparable performance in Figure 3(b). On the other hand, for heart in Figure 4(b), CEM is rather inaccurate in estimating the test accuracy, and the training accuracy overestimates the test accuracy when there are only a few instances. The inaccurate estimations result in worse performance of COMB and ALBL-TRAIN than ALBL. The results confirm the benefits of using a sound estimator of the learning performance (IW-ACC) as the reward instead of a human-designed criterion such as CEM.

Figure 3: Test accuracy of ALBL, COMB, and ALBL-TRAIN on (a) breast, (b) diabetes, (c) heart, (d) liver, (e) sonar, and (f) vehicle. Each panel plots accuracy against the percentage of unlabelled data queried.

Figure 4: Different estimations of the true test accuracy on (a) diabetes and (b) heart. Each panel plots the true test accuracy and the estimates used by ALBL, COMB, and ALBL-TRAIN against the percentage of unlabelled data queried.

5 Conclusion

We propose a pool-based active learning approach, ALBL, that allows the machines to conduct active learning by learning. ALBL adaptively and intelligently chooses among various existing active learning strategies by using their learning performance as feedback. We utilize the famous EXP4.P algorithm from the multi-armed bandit problem for the adaptive choice, and estimate the learning performance with IMPORTANCE-WEIGHTED-ACCURACY. Extensive empirical studies lead to three key findings. First, ALBL is effective in making intelligent choices, and is often comparable to or even superior to the best of the existing strategies. Second, ALBL is effective in making adaptive choices, and is often superior to naïve blending approaches that randomly choose the strategies based on a fixed ratio. Third, ALBL is effective in utilizing the learning performance, and is often superior to the human-criterion-based blending approach COMB. These findings indicate that the proposed ALBL is a favorable and promising approach in practice.


6 Acknowledgments

We thank the anonymous reviewers and the members of the NTU Computational Learning Lab for valuable suggestions. This work is partially supported by the Ministry of Science and Technology of Taiwan via the grant MOST 103-2221-E-002-148-MY3.

References

Auer, P.; Cesa-Bianchi, N.; Freund, Y.; and Schapire, R. E. 2002. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32(1):48–77.

Bache, K., and Lichman, M. 2013. UCI machine learning repository.

Baram, Y.; El-Yaniv, R.; and Luz, K. 2004. Online choice of active learning algorithms. The Journal of Machine Learning Research 5:255–291.

Beygelzimer, A.; Langford, J.; Li, L.; Reyzin, L.; and Schapire, R. E. 2011. Contextual bandit algorithms with supervised learning guarantees. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, 19–26.

Beygelzimer, A.; Dasgupta, S.; and Langford, J. 2009. Importance weighted active learning. In Proceedings of the 26th International Conference on Machine Learning, 49–56.

Chang, C.-C., and Lin, C.-J. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2:27:1–27:27.

Chu, W.; Zinkevich, M.; Li, L.; Thomas, A.; and Tseng, B. 2011. Unbiased online active learning in data streams. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 195–203.

Cohn, D.; Atlas, L.; and Ladner, R. 1994. Improving generalization with active learning. Machine Learning 15(2):201–221.

Dasgupta, S., and Hsu, D. 2008. Hierarchical sampling for active learning. In Proceedings of the 25th International Conference on Machine Learning, 208–215.

Donmez, P., and Carbonell, J. G. 2008. Paired-sampling in density-sensitive active learning. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics.

Donmez, P.; Carbonell, J. G.; and Bennett, P. N. 2007. Dual strategy active learning. In Proceedings of the 18th European Conference on Machine Learning, 116–127.

Ganti, R., and Gray, A. 2012. UPAL: Unbiased pool based active learning. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 422–431.

Huang, S.-J.; Jin, R.; and Zhou, Z.-H. 2010. Active learning by querying informative and representative examples. In Advances in Neural Information Processing Systems, volume 23, 892–900.

Jacobs, R. A.; Jordan, M. I.; Nowlan, S. J.; and Hinton, G. E. 1991. Adaptive mixtures of local experts. Neural Computation 3(1):79–87.

Lewis, D. D., and Gale, W. A. 1994. A sequential algorithm for training text classifiers. In Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval, 3–12.

Nguyen, H. T., and Smeulders, A. 2004. Active learning using pre-clustering. In Proceedings of the 21st International Conference on Machine Learning, 623–630.

Robbins, H. 1985. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers. Springer. 169–177.

Settles, B. 2010. Active learning literature survey. Technical report, University of Wisconsin, Madison.

Tong, S., and Koller, D. 2002. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research 2:45–66.

Vapnik, V. 1998. Statistical Learning Theory. Wiley.

Vermorel, J., and Mohri, M. 2005. Multi-armed bandit algorithms and empirical evaluation. In Proceedings of the 16th European Conference on Machine Learning, 437–448.
