
Active Learning by Learning

Wei-Ning Hsu

Department of Electrical Engineering, National Taiwan University

mhng1580@gmail.com

Hsuan-Tien Lin

Department of Computer Science and Information Engineering, National Taiwan University

htlin@csie.ntu.edu.tw

Abstract

Pool-based active learning is an important technique that helps reduce labeling efforts within a pool of unlabeled instances. Currently, most pool-based active learning strategies are constructed based on some human-designed philosophy; that is, they reflect what human beings assume to be “good labeling questions.”

However, while such human-designed philosophies can be useful on specific data sets, it is often difficult to establish the theoretical connection of those philosophies to the true learning performance of interest. In addition, given that a single human-designed philosophy is unlikely to work on all scenarios, choosing and blending those strategies under different scenarios is an important but challenging practical task. This paper tackles this task by letting the machines adaptively “learn” from the performance of a set of given strategies on a particular data set. More specifically, we design a learning algorithm that connects active learning with the well-known multi-armed bandit problem. Further, we postulate that, given an appropriate choice for the multi-armed bandit learner, it is possible to estimate the performance of different strategies on the fly. Extensive empirical studies of the resulting ALBL algorithm confirm that it performs better than state-of-the-art strategies and a leading blending algorithm for active learning, all of which are based on human-designed philosophy.

1 Introduction

Active learning is a machine learning setup that enables machines to cleverly “ask questions” to reduce the amount of labeling efforts (Settles 2010). The vast majority of research work on active learning is focused on transforming the human philosophy of asking questions into programmable strategies (Tong and Koller 2002; Donmez and Carbonell 2008; Huang, Jin, and Zhou 2010). When the philosophy happens to match the characteristics of the data set on hand, the corresponding strategy can result in promising practical performance. However, there are many different philosophies behind different “human-asked” questions, and no single philosophy is likely to satisfy the characteristics of every data set. For any given data set, properly choosing the strategies is thus an important practical task.


Consider the scenario when we were children. We were not commanded to ask questions based on a single philosophy. Instead, it is our nature to experiment with several different (perhaps given) philosophies, and gradually determine the advantages and disadvantages of each philosophy in different situations by evaluating our learning performance along the way. That is, we learn our own powerful strategy as we grow, rather than merely implementing one fixed strategy. In other words, we conduct active learning by learning, as opposed to active learning by acting.

In this paper, we study how the machines can likewise conduct active learning by learning, instead of merely acting with a single human-designed strategy. We consider a practical task that mimics our childhood learning scenario: letting the machine learn to adaptively choose among some existing strategies based on their estimated contributions to the performance measure of interest. We connect the task to the multi-armed bandit problem for adaptive decision making (Beygelzimer et al. 2011). Consequently, we propose a novel approach for active learning that involves modifying a state-of-the-art method to solve the problem, and deriving a reward scheme that closely relates to the performance measure of interest. The proposed approach, shorthanded ALBL for active learning by learning, essentially results in a clever probabilistic blending of the strategies subject to their time-varying performance on the given data set, and echoes adaptive blending approaches for other machine learning problems (Jacobs et al. 1991).

We also empirically compare ALBL with several human-designed strategies, and demonstrate that ALBL is indeed able to use the derived reward to adaptively choose the best strategy and therefore achieve the best performance. Furthermore, experimental results show that ALBL is not only significantly better than a naive blending approach, but is also often better than the state-of-the-art adaptive blending approach COMB (Baram, El-Yaniv, and Luz 2004), which is based on a human-designed criterion during blending. The results indicate that ALBL is a superior choice for blending human-designed strategies adaptively.

The remainder of this paper is organized as follows. Section 2 outlines the active learning problem and reviews related work. Section 3 presents and illustrates the proposed ALBL approach. We discuss experimental results in Section 4 and conclude in Section 5.


2 Related Work

Active learning can be divided into two categories: stream-based and pool-based. In stream-based active learning, each instance is drawn from some distribution in a streaming manner and the learner has to decide immediately whether to query the label of this instance or not. Although their data access is more restricted, stream-based active learning algorithms (Cohn, Atlas, and Ladner 1994; Chu et al. 2011; Beygelzimer, Dasgupta, and Langford 2009) generally come with more solid theoretical performance guarantees.

Pool-based active learning (Settles 2010) is a more realistic setup that allows more flexible access to data and will be the focus of this paper. In pool-based active learning problems, a learner is presented with an unlabeled pool and a labeled pool in the beginning. We denote the whole data pool of n instances by D = {x_1, x_2, ..., x_{n_u}, (x_{n_u+1}, y_{n_u+1}), ..., (x_n, y_n)}, where the input instances x_i ∈ R^d and y_i is the label of x_i. The pool D is the union of the unlabeled pool D_u that contains the first n_u instances, and the labeled pool D_l of the other n − n_u instances along with their labels. The initial D is assumed to be generated i.i.d. from some distribution P(x, y) before the labels y_i are hidden to form D_u. Note that in general, we can only access a small D_l initially, while the unlabeled pool D_u is relatively large.

Given the initial D_l, the learner trains a classifier f_0. During the iterations t = 1, ..., T, by considering D_l, D_u, and f_{t−1}, the learner selects one instance x_j ∈ D_u to query its label. This instance is then moved to the labeled pool, and the learner can train a new classifier f_t with the updated D_l. With a small query budget T, the goal of active learning is to maximize the average test accuracy of those f_t, where the test accuracy can be defined from a separate test set that is also sampled from P(x, y) like D.
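The iterative protocol above can be summarized in a short sketch. This is a minimal illustration only: it assumes scikit-learn's SVC as the underlying classifier, numpy arrays for the pool, and a pluggable select_query function standing in for an arbitrary query strategy; none of these names come from the paper.

```python
from sklearn.svm import SVC

def pool_based_active_learning(X, y, labeled_idx, select_query, T, X_test, y_test):
    """Run T query iterations; return the test accuracy of each classifier f_t."""
    labeled = list(labeled_idx)
    clf = SVC().fit(X[labeled], y[labeled])           # f_0 trained on the initial D_l
    accuracies = []
    for t in range(1, T + 1):
        unlabeled = [i for i in range(len(X)) if i not in labeled]
        j = select_query(clf, X, unlabeled)           # strategy inspects f_{t-1} and D_u
        labeled.append(j)                             # query y_j and move x_j into D_l
        clf = SVC().fit(X[labeled], y[labeled])       # retrain to obtain f_t
        accuracies.append(clf.score(X_test, y_test))  # test accuracy of f_t
    return accuracies
```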

Existing works on pool-based active learning predominantly focus on establishing reasonable criteria for selecting which instance to label. One popular criterion is called uncertainty sampling (Lewis and Gale 1994), which queries the instance that is most uncertain to the classifier f_{t−1}. For example, Tong and Koller (2002) propose to query the instance closest to the decision boundary of the SVM-trained f_{t−1} (Vapnik 1998). Uncertainty sampling assumes that f_{t−1} only needs fine-tuning around the boundary, and thus can be less satisfactory if f_{t−1} is not good enough.
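For concreteness, the uncertainty-sampling criterion could be plugged into the loop above roughly as follows; reading "closest to the decision boundary" as the smallest |decision_function| value is an assumption of this sketch, not a detail taken from Tong and Koller (2002).

```python
import numpy as np

def uncertain_select(clf, X, unlabeled):
    """Pick the unlabeled instance closest to the SVM decision boundary."""
    margins = np.abs(clf.decision_function(X[unlabeled]))  # distance-like score to the boundary
    return unlabeled[int(np.argmin(margins))]
```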

Representative sampling resolves the caveats of uncertainty sampling by taking both the uncertainty and the representativeness of an instance into account. Researchers have proposed a wide variety of measurements for the representativeness. For example, Nguyen and Smeulders (2004) and Donmez, Carbonell, and Bennett (2007) claim that an instance around the boundary is more representative if it resides in a denser neighborhood, and propose a density-weighted criterion. Another route for measuring representativeness is through clustering. The PSDS approach (Donmez and Carbonell 2008) establishes a distance function through clustering to estimate the representativeness of instances. On the other hand, Dasgupta and Hsu (2008) propose a more sophisticated hierarchical clustering view, and use the cluster information to calculate representativeness. Some other work, such as the QUIRE approach (Huang, Jin, and Zhou 2010), measures the representativeness by estimating the possible label-assignments for unlabeled instances.

However, there are usually no strong connections between the human-designed criterion and the performance measure of interest. In addition, what a human believes to be good questions may not work well on every data set and in every situation. This deficiency hints at the need for choosing from several different algorithms in a data-dependent and adaptive manner. One existing solution is called COMB (Baram, El-Yaniv, and Luz 2004), which will be further discussed in Section 3.3 after we introduce our solution.

3 The Proposed Approach

Our solution is a novel approach called Active Learning by Learning (ALBL). The approach solves the task of choosing from a candidate set of existing algorithms adaptively based on their estimated contributions to the learning performance on a given data set. Our design is based on a well-known adaptive learning problem called the multi-armed bandit problem (Robbins 1985; Vermorel and Mohri 2005). Next, we introduce the multi-armed bandit problem and connect it to the task above.

The multi-armed bandit problem simulates what a gambler would do in a casino. Assume that the gambler is given K bandit machines and a budget of T iterations. The gambler is then asked to sequentially decide which machine to pull in each iteration t = 1, ..., T. On being pulled, the bandit machine randomly provides a reward from a machine-specific distribution unknown to the gambler. The goal of the gambler is to maximize the total rewards earned through the sequence of decisions. To earn the most rewards, the gambler typically has to consider the trade-off between exploitation (choosing the “luckier” machines) and exploration (checking which machines are the “lucky” ones).
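As a toy illustration of this exploitation/exploration trade-off (and only of that; ALBL itself relies on EXP4.P, introduced below), an ε-greedy gambler could be simulated as follows, with invented Bernoulli reward rates.

```python
import random

def epsilon_greedy_bandit(pull, K, T, eps=0.1):
    """Play K bandit machines for T rounds, mostly exploiting the best empirical mean."""
    counts, sums = [0] * K, [0.0] * K
    total = 0.0
    for t in range(T):
        if random.random() < eps or 0 in counts:       # explore: try a random machine
            k = random.randrange(K)
        else:                                          # exploit: best empirical mean so far
            k = max(range(K), key=lambda i: sums[i] / counts[i])
        r = pull(k)                                    # machine k returns a random reward
        counts[k] += 1; sums[k] += r; total += r
    return total

# Example: three machines with Bernoulli rewards of different (hidden) success rates.
rates = [0.2, 0.5, 0.8]
print(epsilon_greedy_bandit(lambda k: float(random.random() < rates[k]), K=3, T=1000))
```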

Our key idea is to draw an analogy between our task and the multi-armed bandit problem. Since we hope to explore the performance of existing algorithms while exploiting the one with the best performance, it is intuitive to make each algorithm represent one bandit machine in the multi-armed bandit problem. The analogy faces two immediate difficulties: how to identify an appropriate multi-armed bandit method to solve the problem, and how to design a reward scheme that connects the goal of active learning to the goal of the multi-armed bandit problem.

Next, we resolve the two difficulties and explain our proposed approach in detail. Then, we introduce one state-of-the-art approach that not only works for the task but also is based on the multi-armed bandit problem, and discuss our key differences to that approach.

3.1 Choice of Multi-armed Bandit Method

Our first task is to identify an appropriate multi-armed bandit method for ALBL. We solve the task by looking at the characteristics of our assumed rewards, which are associated with the learning performance. First, it is intuitive that the rewards are not independent random variables across the iterations, because the learning performance generally grows as D_l becomes larger. Second, the contributions to the learning performance can be time-varying because different algorithms may perform differently in different iterations (Donmez, Carbonell, and Bennett 2007). Thus, making statistical assumptions about the rewards may be difficult.

The scenario above matches the so-called adversarial setting in the multi-armed bandit problem (Auer et al. 2002). One state-of-the-art method that comes with a strong theoretical guarantee for the adversarial setting is called EXP4.P (Beygelzimer et al. 2011), which is an improvement on the earlier EXP4 (Auer et al. 2002) method. We thus consider EXP4.P as the core solver for our proposed ALBL approach. Both EXP4 and EXP4.P define the concept of experts, which can be viewed as soft mixtures of active learning algorithms in our analogy. For simplicity, in this paper, we only consider special experts that correspond to single algorithms instead of soft mixtures.

Next, let us look at the skeleton of ALBL with EXP4.P as the core solver. EXP4.P adaptively maintains a weight vector w(t) in iteration t, where the k-th component w_k(t) is the non-negative weight of the k-th expert that simply corresponds to the k-th active learning algorithm. The weight vector w(t) is then scaled to a probability vector p(t) ∈ [p_min, 1]^K with some parameter p_min > 0. EXP4.P randomly chooses an expert (active learning algorithm in ALBL) based on p(t), and obtains the reward r of the choice.

Without loss of generality, assume that the k-th algorithm a_k is chosen by EXP4.P in ALBL. Then, the query request of a_k should be followed to query from D_u. To accommodate the possibility that a_k would want to make a probabilistic query, we introduce the query vector ψ_k(t) ∈ [0, 1]^{n_u}, where its j-th component ψ_k^j(t) indicates the preference of the k-th algorithm on querying the label of x_j ∈ D_u in iteration t. The query vector should represent a probability distribution of querying from D_u; that is, $\sum_{j=1}^{n_u} \psi_k^j(t) = 1$. Deterministic active learning algorithms could simply return a degenerate query vector that contains a single 1 on its most preferred instance, and 0 elsewhere.

There are two probabilistic decision steps above. First, EXP4.P uses p(t) to choose an active learning algorithm, and then, ALBL takes ψ_k(t) to query the label of some x ∈ D_u. The two steps can be combined to directly sample x ∈ D_u based on $q_j(t) = \sum_{k=1}^{K} p_k(t)\, \psi_k^j(t)$, the probability of querying the j-th instance in the t-th iteration.

Note that different algorithms a_k may actually suggest querying the same instance. Thus, following one algorithm in ALBL is virtually akin to following the other algorithms that make the same suggestion. In our analogy, the situation would correspond to getting the rewards from multiple bandit machines at the same time, which is something that EXP4.P does not consider. The original EXP4.P only updates the w_k(t) of the chosen expert k with a re-scaled reward $\frac{r}{p_k(t)}$. Considering the special situation above, we modify EXP4.P and take $\frac{r}{q_j(t)}$, the reward re-scaled by the probability of the queried instance x_j, to update the w_k(t) of all the algorithms k that make the same suggestion on querying x_j. When only one algorithm suggests x_j, our update formula is equivalent to the one in EXP4.P.
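A compact sketch of these two probabilistic steps and the shared-credit update might look like the following. The exponential-weight update shown here is a simplified stand-in for the full EXP4.P rule (which additionally involves a confidence term), so the constants, the scaling into [p_min, 1], and the function names are all illustrative assumptions.

```python
import numpy as np

def choose_and_update(w, psi, reward_fn, p_min=0.05, eta=0.1):
    """One ALBL-style iteration: blend the experts' query vectors, sample an instance,
    and update every expert that suggested the sampled instance."""
    K, n_u = psi.shape                            # psi[k, j]: algorithm k's preference for instance j
    p = (1 - K * p_min) * w / w.sum() + p_min     # scale weights into probabilities >= p_min
    q = p @ psi                                   # q[j] = sum_k p[k] * psi[k, j]
    j = np.random.choice(n_u, p=q)                # sample the instance to query
    r = reward_fn(j)                              # e.g., IW-ACC of the retrained classifier
    for k in range(K):
        if psi[k, j] > 0:                         # every algorithm that suggested x_j gets credit
            w[k] *= np.exp(eta * psi[k, j] * r / q[j])
    return j, w

# Toy usage: three experts over five unlabeled instances, constant reward.
w = np.ones(3)
psi = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0, 0.0],
                [0.2, 0.2, 0.2, 0.2, 0.2]])
j, w = choose_and_update(w, psi, reward_fn=lambda j: 0.7)
```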

3.2 Choice of Reward Function

After modifying EXP4.P as the core solver within the proposed ALBL approach, the remaining task is to design a proper reward function. The ideal reward function shall be the test accuracy of f_t, because the cumulative reward that EXP4.P targets would then correspond to the average test accuracy achieved during the iterations of active learning.

However, the test accuracy is impossible to obtain in the real world because a test set is generally not available for active learning due to the costliness of labeling.

Another possibility is the training accuracy of f_t. However, training accuracy may not be the best choice for two reasons. First, it suffers from the inevitable training bias when selecting the best f_t based on the labeled data. Second, it suffers from the sampling bias when using active learning to strategically query the unlabeled instances.

Because ALBL samples from the unlabeled pool probabilistically based on the values of q_j(t), it is actually possible to correct the sampling bias using those values. One correction technique, called importance weighting, was originally designed for stream-based active learning (Beygelzimer, Dasgupta, and Langford 2009). The technique is also utilized in a pool-based active learning algorithm (Ganti and Gray 2012) to select a proper f_t. Here, we extend the technique for a different purpose: providing a proper reward function, called IMPORTANCE-WEIGHTED-ACCURACY, for the modified EXP4.P within ALBL. In particular, we apply established results for estimating the expected loss with importance weighting (Ganti and Gray 2012) on the 0/1 loss, which is simply the negative accuracy.

To understand the key idea within the IMPORTANCE-WEIGHTED-ACCURACY technique, let us first assume that the data pool D is fully labeled and each example in D is generated i.i.d. from some distribution that will also be used for testing. Then, it is well-known that for a fixed classifier f, the average accuracy $\frac{1}{n}\sum_{i=1}^{n} \llbracket y_i = f(x_i) \rrbracket$ on D is an unbiased estimator of the test accuracy of f, where i is used to index instances in the entire pool.

The average accuracy requires all examples in D to be labeled. The IMPORTANCE-WEIGHTED-ACCURACY technique utilizes sampling to form another unbiased estimator that does not need all the y_i (Ganti and Gray 2012). Attach a probability value q_i > 0 to each example (x_i, y_i) ∈ D, take those values to sample one (x, y), and use binary random variables s_i ∈ {0, 1} to denote the outcome of the sampling. Then, for each (x_i, y_i), let $c_i = \llbracket y_i = f(x_i) \rrbracket$, and the expected value of $\frac{s_i c_i}{q_i}$ over the sampling process is simply c_i. Thus, $\frac{1}{n}\sum_{i=1}^{n} \frac{s_i c_i}{q_i} = \frac{1}{n}\cdot\frac{c}{q}$ is also an unbiased estimator of the test accuracy of f, where c and q denote the accuracy and the sampling probability of the sampled example. That is, the accuracy c of the sampled example can be re-weighted by $\frac{1}{nq}$ to form a simple unbiased estimator of the test accuracy of f.
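A few lines of simulation make the unbiasedness argument concrete; the data below are made up purely for the check and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
c = rng.integers(0, 2, size=n).astype(float)   # c_i = 1 if f classifies example i correctly
q = rng.dirichlet(np.ones(n))                  # sampling probabilities q_i > 0, summing to 1

true_acc = c.mean()                            # (1/n) sum_i c_i : needs every label
estimates = []
for _ in range(20000):
    i = rng.choice(n, p=q)                     # sample one example according to q
    estimates.append(c[i] / (n * q[i]))        # re-weight its accuracy by 1 / (n q_i)

print(true_acc, np.mean(estimates))            # the two numbers should nearly agree
```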

Recall that the proposed ALBL effectively takes q_j(t) to sample one instance from D_u, which does not fully match the discussions above. In particular, sampling from only D_u means that instances (x_i, y_i) ∈ D_l are attached with q_i = 0. Thus, Ganti and Gray (2012) propose to allow re-querying labeled examples with a non-zero probability. Similarly, we design ALBL by incorporating a special algorithm (bandit machine) RANDOM that randomly selects one instance from the entire data pool. The design strengthens ALBL in two aspects. First, no modification of the other active learning algorithms is needed, making it possible to re-use existing algorithms easily. Second, RANDOM serves as a naïve sampling strategy that is sometimes competitive with active learning algorithms (see Section 4). Incorporating RANDOM provides ALBL with an alternative when other human-designed strategies fail. Note that RANDOM is sufficient for serving the need, but not necessary. One interesting future direction is to study possibilities other than RANDOM.

Because instances in D_l can now be re-queried, we now assume ψ_k(t) to be of length n rather than n_u. Assume that instance i_t is queried in iteration t of ALBL, and let $W_t = (q_{i_t}(t))^{-1}$. For any fixed classifier f, define the IMPORTANCE-WEIGHTED-ACCURACY (IW-ACC) after τ iterations to be

$$\text{IW-ACC}(f, \tau) = \frac{1}{nT} \sum_{t=1}^{\tau} W_t \llbracket y_{i_t} = f(x_{i_t}) \rrbracket.$$

Then, the following theorem shows that IW-ACC(f, τ) is an unbiased estimator of the test accuracy of f if the (x_i, y_i) are generated i.i.d. from the test distribution.

Theorem 1. For any τ, $\mathbb{E}[\text{IW-ACC}(f, \tau)] = \frac{1}{n}\sum_{i=1}^{n} \llbracket y_i = f(x_i) \rrbracket$, where the expectation is taken over the randomness of sampling independently in iterations 1, 2, ..., τ.

Proof. The theorem can be proved by averaging the simple estimators obtained from each iteration, and is a special case of Theorem 1 of Ganti and Gray (2012).
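In practice, IW-ACC could be accumulated from the quantities ALBL already tracks (the queried indices i_t, the weights W_t = 1/q_{i_t}(t), and the current classifier), along the lines of the hypothetical helper below; the names are ours, not the paper's.

```python
def iw_acc(f, queried, weights, X, y, n, T):
    """Importance-weighted accuracy of classifier f after the recorded queries.

    queried: indices i_1, ..., i_tau of the examples sampled so far
    weights: W_1, ..., W_tau with W_t = 1 / q_{i_t}(t)
    """
    total = sum(W * float(f.predict(X[i:i + 1])[0] == y[i])
                for i, W in zip(queried, weights))
    return total / (n * T)
```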

The proposed ALBL simply takes IW-ACC(f_t, t) as the reward in the t-th iteration to evaluate how much the chosen algorithm a_k helps in getting a better f_t. Combining the ideas of modifying EXP4.P, incorporating RANDOM, and taking IW-ACC as the reward, we list the full ALBL approach in Algorithm 1. The probabilistic nature of EXP4.P and the use of RANDOM allow the reward to be an unbiased estimator of the test performance, making ALBL truly by learning, that is, connected with the performance measure of interest.

3.3 A Related Blending Approach

COMB (Baram, El-Yaniv, and Luz 2004) is a state-of-the-art adaptive blending approach that applies EXP4 to solve the task of adaptively choosing active learning algorithms. At first glance, ALBL seems similar to COMB in applying multi-armed bandit methods for the task. A closer look reveals two key differences, as discussed below.

Whereas ALBL takes the candidate active learning algorithms as the bandit machines, COMB draws an analogy that takes the unlabeled examples as the bandit machines instead. As a consequence, COMB has to choose from a large and varying number of bandit machines, and also has to cope with the restriction that each bandit machine can only be pulled once. These properties contradict the original setting of EXP4, which considers a moderate and fixed number of bandit machines, each of which can be pulled multiple times. Thus, COMB relies on quite a few heuristic modifications of the original EXP4 to work properly.

Further, the COMB approach takes a human-designed criterion called CLASSIFICATION ENTROPY MAXIMIZATION (CEM) as the reward. CEM is defined as the entropy of the f_t-predicted labels in D_u. While some empirical evidence shows that CEM can sometimes match the test accuracy, there is no formal connection between CEM and the test accuracy. The ALBL approach, on the other hand, takes an unbiased estimator for the test accuracy as the reward, which is directly related to the performance measure of interest and can hence be called by learning.
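For reference, the entropy-of-predicted-labels criterion could be computed roughly as follows; this is our reading of CEM as described above, with invented names, and not necessarily the exact formula used by COMB.

```python
import numpy as np

def cem(f, X_unlabeled):
    """Entropy of the label distribution that classifier f predicts on the unlabeled pool."""
    preds = f.predict(X_unlabeled)
    _, counts = np.unique(preds, return_counts=True)
    p = counts / counts.sum()                 # empirical distribution of predicted labels
    return float(-(p * np.log(p)).sum())      # higher entropy = more balanced predictions
```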

4 Experiments

We incorporate four algorithms within our proposed ALBL approach. The first algorithm is called RANDOM, which was previously introduced in Section 3.2. The other three are existing active learning algorithms introduced in Section 2: UNCERTAIN (Tong and Koller 2002), PSDS (Donmez and Carbonell 2008), and QUIRE (Huang, Jin, and Zhou 2010). Each of the algorithms covers a different design philosophy used by humans in active learning. In addition, as we shall show next, each algorithm performs strongly on some data sets but can be relatively worse on the others. That is, no algorithm is an overall winner, which suggests that choosing and blending the algorithms subject to different data sets is important. We take SVM (Vapnik 1998) as the underlying classifier and use LibSVM (Chang and Lin 2011) with all the default parameters to train the classifier.

We first compare ALBL with the four algorithms it incorporates. Next, we examine whether ALBL can compete with a naïve blending approach that uses a fixed ratio. Finally, we demonstrate the benefits of using the unbiased estimator in ALBL by comparing it with two related approaches: the state-of-the-art blending approach COMB (Baram, El-Yaniv, and Luz 2004) and a modified version of ALBL, called ALBL-TRAIN, that takes the training accuracy as the reward.

We take six real-world data sets (liver, sonar, vehicle, breast, diabetes, heart) from the UCI Repository (Bache and Lichman 2013). For each data set, we reserve 80% of the instances as the training set, and retain the other 20% as the test set to evaluate the test accuracy (see Section 2). Then, from the training set, we randomly select one instance of each class as the initial labeled pool D_l. Each experiment is averaged over ten runs.

4.1 ALBL versus Underlying Algorithms

Figure 1 shows the comparison of ALBL with the underlying algorithms on test accuracy, which confirms our statement that no single algorithm can provide consistently superior performance. For instance, QUIRE (the purple curve) performs strongly on diabetes; PSDS (the blue curve) is promising on sonar; UNCERTAIN (the green curve) dominates on vehicle and liver.

From Figure 1, we see that ALBL is usually close to the best curves of the four underlying algorithms, except on liver, which is harder to learn (accuracy close to 50%). In Table 1, we further compare ALBL with the four algorithms using a two-sample t-test at the 95% significance level when querying different percentages of instances from the unlabeled pool.
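The win/tie/loss entries in Table 1 can be thought of as the outcome of a routine like the one below, applied to the per-run accuracies of ALBL and one baseline at a given query percentage; the use of scipy.stats.ttest_ind and the tie convention are our assumptions.

```python
from scipy import stats

def win_tie_loss(acc_albl, acc_base, alpha=0.05):
    """Compare two lists of per-run test accuracies with a two-sample t-test."""
    _, p = stats.ttest_ind(acc_albl, acc_base)
    if p >= alpha:
        return "tie"                       # difference not significant at the 95% level
    mean_a = sum(acc_albl) / len(acc_albl)
    mean_b = sum(acc_base) / len(acc_base)
    return "win" if mean_a > mean_b else "loss"
```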


Algorithm 1 ACTIVE LEARNING BY LEARNING

Input: D = (D_u, D_l): data pool; T: query budget; A = {a_1, ..., a_K}: active learning algorithms, one of which is RANDOM

Initialize:
1: set t = 1, budget used = 0
2: while budget used < T do
3:   run EXP4.P for one iteration and obtain the choice vector p(t)
4:   for all x_i in D, calculate q_i(t) using p(t) from EXP4.P and ψ_k(t) from all the a_k's
5:   sample an instance x_{i_t} based on q_i(t), and record W_t = (q_{i_t}(t))^{-1}
6:   if x_{i_t} ∈ D_u (i.e., has not been queried) then
7:     query y_{i_t}, move (x_{i_t}, y_{i_t}) from D_u to D_l, and train a new classifier f_t with the new D_l
8:     budget used = budget used + 1
9:   else
10:    f_t = f_{t−1} because D_l is not changed
11:  end if
12:  calculate reward r = IW-ACC(f_t, t)
13:  feed r to a modified EXP4.P that updates the weights of all the algorithms (experts) that suggest x_{i_t}
14:  t = t + 1
15: end while

Table 1: Win/tie/loss counts of ALBL versus underlying algorithms based on a two-sample t-test

rank   | percentage of queried instances
       | 5%      10%     15%     20%     30%     40%     50%     total
1st    | 1/5/0   1/5/0   1/4/1   0/5/1   0/6/0   0/6/0   0/6/0   3/37/2
2nd    | 2/4/0   1/5/0   1/5/0   0/5/1   1/5/0   1/5/0   0/6/0   6/35/1
3rd    | 2/4/0   1/5/0   1/5/0   1/5/0   2/4/0   1/5/0   2/4/0   10/32/0
4th    | 4/2/0   4/2/0   3/3/0   2/4/0   4/2/0   4/2/0   4/2/0   25/17/0
total  | 9/15/0  7/17/0  6/17/1  3/19/2  7/17/0  6/18/0  6/18/0  44/121/3

The four algorithms are ranked in terms of their mean accuracy on the data set under the particular percentage of queried instances. The results demonstrate that ALBL often yields comparable performance with the better of the four algorithms and can sometimes perform even better. Note that the comparable performance to the better algorithms readily demonstrates that ALBL can solve the challenging task of making reasonable choices from several different algorithms in a data-dependent and adaptive manner.

4.2 ALBL versus Fixed Combination

One question that may be asked is whether it is necessary to use dynamic sampling weights p(t), such as those used by ALBL. To answer the question, we compare ALBL with a naïve blending approach, named FIXEDCOMB, that takes fixed sampling weights. The performance of ALBL is compared to that of FIXEDCOMB when incorporating two active learning algorithms, one of which reaches the best performance and the other the worst performance on each data set. Further, we consider the sampling weight ratios 10:0, 8:2, 6:4, 5:5, 4:6, 2:8, and 0:10 in FIXEDCOMB. Owing to space limitations and readability, only selected curves for two data sets, breast and diabetes, are shown. The two underlying algorithms are UNCERTAIN and QUIRE for breast, and PSDS and UNCERTAIN for diabetes. Experiments on the other data sets show similar results.

Figure 1: Test accuracy of ALBL and the underlying algorithms (RAND, UNCERTAIN, PSDS, QUIRE) on (a) breast, (b) diabetes, (c) heart, (d) liver, (e) sonar, and (f) vehicle. Each panel plots accuracy against the percentage of unlabelled data queried.

Figure 2: Test accuracy of ALBL versus FIXEDCOMB on (a) breast and (b) diabetes. Each panel plots accuracy against the percentage of unlabelled data queried, for ALBL and several fixed weight ratios (e.g., 10:0, 6:4, 5:5 or 4:6, and 0:10).

Figure 2 reveals two drawbacks of FIXEDCOMB. First, similar to the difficulty of choosing the underlying algorithms, deciding the best weight ratio beforehand is a very challenging endeavor. For instance, on breast, the best weight ratio is 6:4 for querying 10% of the instances, whereas on diabetes, the best weight ratio is 4:6. The second drawback is that FIXEDCOMB cannot capture the time-varying behavior of the underlying algorithms. For instance, on breast, weight ratio 0:10 is the least favored in the beginning, but surpasses weight ratio 5:5 in the end. ALBL resolves the drawbacks by dynamically adjusting the weights towards a better ratio based on the estimated learning performance of the algorithms. Thus, ALBL achieves competitive or even superior performance to the best ratio in the FIXEDCOMB family. The results justify the adoption of EXP4.P to dynamically decide the sampling weights.

4.3 ALBL versus Two Adaptive Approaches

Next, we compare ALBL with COMB, another blending approach based on a modified EXP4 algorithm and a human-designed reward function called CEM. To make the comparison more fair, we show the results on a sibling version of COMB based on EXP4.P (as in ALBL) coupled with CEM. Some side experiments show that the sibling version performs very similarly to the original COMB approach. In addition, as a baseline adaptive active learning approach, we also include in the comparison a modified version of ALBL, called ALBL-TRAIN, that takes the training accuracy of the classifier as the reward.

Figure 3 shows the comparison between ALBL, COMB, and ALBL-TRAIN on test accuracy. The three algorithms sometimes reach comparable performance, such as on diabetes. On most of the other data sets, ALBL achieves superior performance to those of COMB and ALBL-TRAIN. We further analyze the superior performance by evaluating IW-ACC, CEM, the training accuracy, and the true test accuracy at each iteration of ALBL, and depict two representative results in Figure 4. For diabetes, all three estimators are quite close in tracing the test accuracy in Figure 4(a), which explains the comparable performance in Figure 3(b). On the other hand, for heart in Figure 4(b), CEM is rather inaccurate in estimating the test accuracy, and the training accuracy overestimates the test accuracy when there are only a few instances. The inaccurate estimations result in worse performance of COMB and ALBL-TRAIN than ALBL. The results confirm the benefits of using a sound estimator of the learning performance (IW-ACC) as the reward instead of a human-designed criterion such as CEM.

Figure 3: Test accuracy of ALBL, COMB, and ALBL-TRAIN on (a) breast, (b) diabetes, (c) heart, (d) liver, (e) sonar, and (f) vehicle. Each panel plots accuracy against the percentage of unlabelled data queried.

Figure 4: Different estimations of the true test accuracy on (a) diabetes and (b) heart. Each panel plots the true test accuracy and the estimates used by ALBL, COMB, and ALBL-TRAIN against the percentage of unlabelled data queried.

5 Conclusion

We propose a pool-based active learning approach, ALBL, that allows the machines to conduct active learning by learning. ALBL adaptively and intelligently chooses among various existing active learning strategies by using their learning performance as feedback. We utilize the famous EXP4.P algorithm from the multi-armed bandit problem for the adaptive choice, and estimate the learning performance with IMPORTANCE-WEIGHTED-ACCURACY. Extensive empirical studies lead to three key findings. First, ALBL is effective in making intelligent choices, and is often comparable to or even superior to the best of the existing strategies. Second, ALBL is effective in making adaptive choices, and is often superior to naïve blending approaches that randomly choose the strategies based on a fixed ratio. Third, ALBL is effective in utilizing the learning performance, and is often superior to the human-criterion-based blending approach COMB. These findings indicate that the proposed ALBL is a favorable and promising approach in practice.


6 Acknowledgments

We thank the anonymous reviewers and the members of the NTU Computational Learning Lab for valuable suggestions. This work is partially supported by the Ministry of Science and Technology of Taiwan via the grant MOST 103-2221-E-002-148-MY3.

References

Auer, P.; Cesa-Bianchi, N.; Freund, Y.; and Schapire, R. E. 2002. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32(1):48–77.

Bache, K., and Lichman, M. 2013. UCI machine learning repository.

Baram, Y.; El-Yaniv, R.; and Luz, K. 2004. Online choice of active learning algorithms. The Journal of Machine Learning Research 5:255–291.

Beygelzimer, A.; Langford, J.; Li, L.; Reyzin, L.; and Schapire, R. E. 2011. Contextual bandit algorithms with supervised learning guarantees. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, 19–26.

Beygelzimer, A.; Dasgupta, S.; and Langford, J. 2009. Importance weighted active learning. In Proceedings of the 26th International Conference on Machine Learning, 49–56.

Chang, C.-C., and Lin, C.-J. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2:27:1–27:27.

Chu, W.; Zinkevich, M.; Li, L.; Thomas, A.; and Tseng, B. 2011. Unbiased online active learning in data streams. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 195–203.

Cohn, D.; Atlas, L.; and Ladner, R. 1994. Improving generalization with active learning. Machine Learning 15(2):201–221.

Dasgupta, S., and Hsu, D. 2008. Hierarchical sampling for active learning. In Proceedings of the 25th International Conference on Machine Learning, 208–215.

Donmez, P., and Carbonell, J. G. 2008. Paired-sampling in density-sensitive active learning. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics.

Donmez, P.; Carbonell, J. G.; and Bennett, P. N. 2007. Dual strategy active learning. In Proceedings of the 18th European Conference on Machine Learning, 116–127.

Ganti, R., and Gray, A. 2012. UPAL: Unbiased pool based active learning. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, 422–431.

Huang, S.-J.; Jin, R.; and Zhou, Z.-H. 2010. Active learning by querying informative and representative examples. In Advances in Neural Information Processing Systems, volume 23, 892–900.

Jacobs, R. A.; Jordan, M. I.; Nowlan, S. J.; and Hinton, G. E. 1991. Adaptive mixtures of local experts. Neural Computation 3(1):79–87.

Lewis, D. D., and Gale, W. A. 1994. A sequential algorithm for training text classifiers. In Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval, 3–12.

Nguyen, H. T., and Smeulders, A. 2004. Active learning using pre-clustering. In Proceedings of the 21st International Conference on Machine Learning, 623–630.

Robbins, H. 1985. Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers. Springer. 169–177.

Settles, B. 2010. Active learning literature survey. Technical report, University of Wisconsin, Madison.

Tong, S., and Koller, D. 2002. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research 2:45–66.

Vapnik, V. 1998. Statistical Learning Theory. Wiley.

Vermorel, J., and Mohri, M. 2005. Multi-armed bandit algorithms and empirical evaluation. In Proceedings of the 16th European Conference on Machine Learning, 437–448.
