Machine Learning (NTU, Fall 2008) instructor: Hsuan-Tien Lin

Homework #6

TA in charge: Hanhsing Tu, Room 536
RELEASE DATE: 12/11/2008
DUE DATE: 12/18/2008, 4:00 pm IN CLASS
TA SESSION: 12/17/2008, noon to 2:00 pm IN R106

Unless granted by the instructor in advance, you must turn in a hard copy of your solutions (without the source code) for all problems. For problems marked with (*), please follow the guidelines on the course website and upload your source code to designated places.

Discussions on course materials and homework solutions are encouraged. But you should write the final solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but not copied from.

Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework solutions and/or source codes to your classmates at any time. In order to maximize the level of fairness in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will be punished according to the honesty policy.

You should write your solutions in English with the common math notations introduced in class or in the problems. We do not accept solutions written in any other languages.

6.1 Bayesian Universe

(1) (15%) ASSUME that the universe generates an example $(x, y)$ by the following procedure:

(a) generate $x$ from some probability density function $P(x)$;
(b) use some fixed $(w, \theta)$ to evaluate $\rho = \langle w, x \rangle - \theta$;
(c) generate $y \in \mathbb{R}$ from $\rho$ by the probability density function $P(y \mid \rho) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(y-\rho)^2}{2}\right)$.

If each $(x_n, y_n)$ within $Z = \{(x_n, y_n)\}_{n=1}^{N}$ is generated i.i.d. from the procedure above, what is the likelihood $P(Z \mid (w, \theta))$? Prove that linear regression (see Problem 2.3-(1)) equivalently gives the maximum likelihood estimate of $(w, \theta)$.
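As a sanity check on notation (this is only the immediate i.i.d. factorization from the procedure above, not the full solution), the likelihood takes the form

```latex
P\bigl(Z \mid (w,\theta)\bigr)
  = \prod_{n=1}^{N} P(x_n)\, P(y_n \mid \rho_n)
  = \prod_{n=1}^{N} P(x_n)\,
    \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{\bigl(y_n - \langle w, x_n \rangle + \theta\bigr)^2}{2}\right),
```

where $\rho_n = \langle w, x_n \rangle - \theta$.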

(2) (20%) ASSUME that the universe generates an example $(x, y)$ by the following procedure:

(a) generate $(w, \theta)$ from $P(w, \theta) = \frac{1}{(\sqrt{2\pi})^{d+1} \cdot \sigma^{d+1}} \cdot \exp\left(-\frac{\|w\|_2^2 + \theta^2}{2\sigma^2}\right)$;
(b) generate $x$ from some probability density function $P(x)$;
(c) use the "fixed" $(w, \theta)$ to evaluate $\rho = \langle w, x \rangle - \theta$;
(d) generate $y \in \mathbb{R}$ from $\rho$ by the probability density function $P(y \mid \rho) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(y-\rho)^2}{2}\right)$.

If each $(x_n, y_n)$ within $Z = \{(x_n, y_n)\}_{n=1}^{N}$ is generated i.i.d. from the procedure above, and assume that the constant $P(Z) = Q$, what is the posterior $P((w, \theta) \mid Z)$? Prove that regularized linear regression (see Problem 2.3-(3)) equivalently gives the maximum a posteriori estimate of $(w, \theta)$.

In particular, what is the relationship between $\lambda$ (in Problem 2.3-(3)) and $\sigma$ (here)?
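For reference (just Bayes' rule with the assumed constant $P(Z) = Q$ plugged in), the posterior is

```latex
P\bigl((w,\theta) \mid Z\bigr)
  = \frac{P\bigl(Z \mid (w,\theta)\bigr)\, P(w,\theta)}{P(Z)}
  = \frac{1}{Q}\, P\bigl(Z \mid (w,\theta)\bigr)\, P(w,\theta).
```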

(3) (15%) ASSUME that the universe generates an example $(x, y)$ by the following procedure:

(a) generate $x$ from some probability density function $P(x)$;
(b) use some fixed $(w, \theta)$ to evaluate $\rho = \langle w, x \rangle - \theta$;
(c) evaluate $Q_{+} = \exp(\rho/2)$ and $Q_{-} = \exp(-\rho/2)$;
(d) generate $y \in \{+, -\}$ with the probability distribution $Q_y / (Q_{+} + Q_{-})$.

If each $(x_n, y_n)$ within $Z = \{(x_n, y_n)\}_{n=1}^{N}$ is generated i.i.d. from the procedure above, what is the likelihood $P(Z \mid (w, \theta))$? Prove that logistic regression (see Problem 2.4) equivalently gives the maximum likelihood estimate of $(w, \theta)$.
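Note that steps (c)-(d) generate $y = +$ with exactly the logistic function of $\rho$:

```latex
P(y = + \mid \rho)
  = \frac{Q_+}{Q_+ + Q_-}
  = \frac{\exp(\rho/2)}{\exp(\rho/2) + \exp(-\rho/2)}
  = \frac{1}{1 + \exp(-\rho)}.
```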


6.2 Power of Adaptive Boosting

The adaptive boosting (AdaBoost) algorithm, as shown in the class slides, is as follows:

• Input: $Z = \{(x_n, y_n)\}_{n=1}^{N}$.

• Set $u_n = \frac{1}{N}$ for all $n$.

• For $t = 1, 2, \cdots, T$,

– Learn a simple rule $h_t$ such that $h_t$ solves
$$h_t = \mathop{\mathrm{argmin}}_{h} \sum_{n=1}^{N} u_n \cdot I[y_n \neq h(x_n)],$$
with the help of some base learner $A_b$.

– Compute the weighted error
$$\epsilon_t = \frac{1}{\sum_{m=1}^{N} u_m} \sum_{n=1}^{N} u_n \cdot I[y_n \neq h_t(x_n)]$$
and the confidence
$$\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t}.$$

– Emphasize the training examples that do not agree with $h_t$: $u_n = u_n \cdot \exp\left(-\alpha_t y_n h_t(x_n)\right)$.

• Output: combined function $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.

In this problem, we will prove that AdaBoost can reach $\nu(H) = 0$ if $T$ is large enough and every hypothesis $h_t$ satisfies $\epsilon_t \leq \epsilon < \frac{1}{2}$.
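For concreteness, here is a minimal Python sketch of the loop above. The base learner is passed in as a function (names like `base_learn` are illustrative, not prescribed by the homework), and no tie-breaking or degenerate-$\epsilon_t$ handling is shown.

```python
import numpy as np

def adaboost(X, y, T, base_learn):
    """Sketch of the AdaBoost loop above.

    X: (N, d) array of examples; y: (N,) array of +/-1 labels.
    base_learn(X, y, u) returns a classifier h with h(X) in {-1, +1}^N.
    """
    N = len(y)
    u = np.full(N, 1.0 / N)                       # u_n = 1/N for all n
    hs, alphas = [], []
    for t in range(T):
        h = base_learn(X, y, u)                   # h_t minimizes the u-weighted error
        pred = h(X)
        eps = np.sum(u * (pred != y)) / np.sum(u)  # weighted error epsilon_t
        alpha = 0.5 * np.log((1.0 - eps) / eps)    # confidence alpha_t
        u = u * np.exp(-alpha * y * pred)          # emphasize disagreeing examples
        hs.append(h)
        alphas.append(alpha)

    def H(Xq):  # H(x) = sign(sum_t alpha_t h_t(x))
        return np.sign(sum(a * h(Xq) for a, h in zip(alphas, hs)))
    return H
```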

(1) (5%) Let $U^{(t-1)} = \sum_{n=1}^{N} u_n$ at the beginning of the $t$-th iteration. What is $U^{(0)}$?

(2) (10%) According to the AdaBoost algorithm above, for $t \geq 1$, prove that
$$U^{(t)} = \frac{1}{N} \sum_{n=1}^{N} \exp\left(-y_n \sum_{\tau=1}^{t} \alpha_\tau h_\tau(x_n)\right).$$

(3) (5%) By the result in (2), prove that $\nu(H) \leq U^{(T)}$.

(4) (10%) According to the AdaBoost algorithm above, for $t \geq 1$, prove that $U^{(t)} = U^{(t-1)} \cdot 2\sqrt{\epsilon_t(1-\epsilon_t)}$.

(5) (5%) Using $0 \leq \epsilon_t \leq \epsilon < \frac{1}{2}$, for $t \geq 1$, prove that $\sqrt{\epsilon_t(1-\epsilon_t)} \leq \sqrt{\epsilon(1-\epsilon)}$.

(6) (5%) Using $\epsilon < \frac{1}{2}$, prove that $\sqrt{\epsilon(1-\epsilon)} \leq \frac{1}{2} \exp\left(-2\left(\frac{1}{2} - \epsilon\right)^2\right)$.

(7) (5%) Using the results above, prove that $U^{(T)} \leq \exp\left(-2T\left(\frac{1}{2} - \epsilon\right)^2\right)$.

(8) (5%) Using the results above, argue that after $T = O(\log N)$ iterations, $\nu(H) = 0$.
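A sketch of how (8) follows from the earlier parts: since $\nu(H)$ is an average of $N$ zero/one indicators, it can only take values in $\{0, \frac{1}{N}, \frac{2}{N}, \ldots\}$, so it suffices to push $U^{(T)}$ below $\frac{1}{N}$:

```latex
\nu(H) \;\le\; U^{(T)} \;\le\; \exp\!\left(-2T\left(\tfrac{1}{2}-\epsilon\right)^2\right)
\;<\; \frac{1}{N}
\quad\text{whenever}\quad
T \;>\; \frac{\ln N}{2\left(\frac{1}{2}-\epsilon\right)^2} = O(\log N).
```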


6.3 Experiments with Bootstrap Aggregation (*)

(1) (20%) Implement the decision stump learning algorithm $A_{ds}$. That is, let
$$h_{s,i,\theta}(x) = \mathrm{sign}\left(s \cdot (x)_i - \theta\right),$$
where $s \in \{-1, +1\}$, $i \in \{1, 2, \ldots, d\}$, and $\theta \in \mathbb{R}$. Given a weighted training set $Z = \{(x_n, y_n, u_n)\}_{n=1}^{N}$,
$$A_{ds}(Z) = \mathop{\mathrm{argmin}}_{h_{s,i,\theta}} \sum_{n=1}^{N} u_n \cdot I[y_n \neq h_{s,i,\theta}(x_n)].$$

Run the algorithm on the following set for training (with $u_n = \frac{1}{N}$ for all $n$):

http://www.csie.ntu.edu.tw/~htlin/course/ml08fall/data/hw6_train.dat

and the following set for testing:

http://www.csie.ntu.edu.tw/~htlin/course/ml08fall/data/hw6_test.dat

Let $g$ be the decision function returned from $A_{ds}$. Report $\nu(g)$ and $\hat{\pi}(g)$. Briefly state your findings.
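A brute-force Python sketch of $A_{ds}$ follows. It assumes the `.dat` files are whitespace-separated with the label in the last column (an assumption about the data format, not stated in the problem), and it tries, for every sign and dimension, one threshold below all points plus every mid-point between consecutive sorted values.

```python
import numpy as np

def stump_learn(X, y, u):
    """Brute-force A_ds: minimize sum_n u_n * I[y_n != sign(s * x_i - theta)]."""
    N, d = X.shape
    best_err = np.inf
    for i in range(d):
        for s in (-1, +1):
            z = s * X[:, i]
            zs = np.sort(z)
            # thresholds: one below all points, then mid-points between neighbors
            thetas = np.concatenate(([zs[0] - 1.0], (zs[:-1] + zs[1:]) / 2.0))
            for theta in thetas:
                pred = np.where(z - theta > 0, 1, -1)   # sign, with ties sent to -1
                err = np.sum(u * (pred != y))
                if err < best_err:
                    best_err, best_s, best_i, best_theta = err, s, i, theta

    def h(Xq):  # the returned decision function h_{s,i,theta}
        return np.where(best_s * Xq[:, best_i] - best_theta > 0, 1, -1)
    return h

# hypothetical usage: report the training and test error rates of g
train = np.loadtxt("hw6_train.dat")              # features ..., label in last column
test = np.loadtxt("hw6_test.dat")
X, y = train[:, :-1], train[:, -1]
g = stump_learn(X, y, np.full(len(y), 1.0 / len(y)))
print(np.mean(g(X) != y))                        # in-sample error of g
print(np.mean(g(test[:, :-1]) != test[:, -1]))   # test error of g
```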

(2) (30%) Implement the bootstrap aggregation (bagging) algorithm with decision stumps (i.e., use $A_{ds}$ as $A_b$ below):

• Input: $Z = \{(x_n, y_n)\}_{n=1}^{N}$.

• For $t = 1, 2, \ldots, T$,

– generate $Z^{(t)}$ from $Z$ by bootstrapping: uniformly sampling $N$ examples from $Z$ with replacement;

– let $h_t = A_b(Z^{(t)})$ and $\alpha_t = 1$.

• Output: combined function $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.

Use a total of $T = 100$ iterations. Let $H_t(x) = \mathrm{sign}\left(\sum_{\tau=1}^{t} \alpha_\tau h_\tau(x)\right)$. Plot $\nu(H_t)$ and $\hat{\pi}(H_t)$ as functions of $t$ on the same figure. Briefly state your findings.
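A minimal bagging sketch, reusing the hypothetical `stump_learn` above; the bootstrap resample is the only new ingredient, and keeping the list of stumps makes $H_t$ cheap to evaluate for every prefix $t$ when plotting:

```python
import numpy as np

def bagging(X, y, T, base_learn, seed=0):
    """Bagging sketch: T bootstrap rounds, each hypothesis weighted alpha_t = 1."""
    rng = np.random.default_rng(seed)
    N = len(y)
    hs = []
    for t in range(T):
        idx = rng.integers(0, N, size=N)      # sample N examples with replacement
        u = np.full(N, 1.0 / N)               # uniform weights on the resample
        hs.append(base_learn(X[idx], y[idx], u))
    return hs                                 # H_t(x) = sign(sum of first t votes)

# hypothetical usage: training-error curve of H_t for t = 1, ..., T
hs = bagging(X, y, 100, stump_learn)
score = np.zeros(len(y))
for t, h in enumerate(hs, start=1):
    score += h(X)                             # alpha_t = 1 for every stump
    err_t = np.mean(np.sign(score) != y)      # nu(H_t) on the training set
```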

(3) (Bonus 5%) Prove that you can implement an $A_{ds}$ that runs in time $O(N \log N)$ instead of the brute-force implementation that takes $O(N^2)$.
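One standard route to the bonus, sketched below under the assumption that it matches the intended argument: fix $s$ and a dimension, sort the feature values once in $O(N \log N)$, then sweep the threshold from left to right; moving $\theta$ past one more point flips exactly that point's prediction, so the weighted error can be updated in $O(1)$ per step. (Duplicate feature values need a little extra care, omitted here.)

```python
import numpy as np

def stump_1d_fast(z, y, u):
    """O(N log N) stump along one feature z, for s = +1 (sketch)."""
    order = np.argsort(z)                     # O(N log N): the only sort needed
    zs, ys, us = z[order], y[order], u[order]
    # theta below all points: predict +1 everywhere, so err = weight of y = -1
    err = np.sum(us[ys == -1])
    best_err, best_theta = err, zs[0] - 1.0
    for n in range(len(zs)):
        # moving theta just past zs[n] flips example n's prediction to -1
        err += us[n] if ys[n] == +1 else -us[n]
        theta = (zs[n] + zs[n + 1]) / 2.0 if n + 1 < len(zs) else zs[n] + 1.0
        if err < best_err:
            best_err, best_theta = err, theta
    return best_err, best_theta
```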

6.4 Experiments with Adaptive Boosting (*)

Implement the AdaBoost algorithm (as in Problem 6.2 above) with decision stumps (i.e., use $A_{ds}$ as $A_b$).

Run the algorithm on the following set for training:

http://www.csie.ntu.edu.tw/~htlin/course/ml08fall/data/hw6_train.dat

and the following set for testing:

http://www.csie.ntu.edu.tw/~htlin/course/ml08fall/data/hw6_test.dat

(1) (30%) Use a total of $T = 100$ iterations. Let $H_t(x) = \mathrm{sign}\left(\sum_{\tau=1}^{t} \alpha_\tau h_\tau(x)\right)$. Plot $\nu(H_t)$, $\hat{\pi}(H_t)$, AND $U^{(t)}$ (see the definition above) as functions of $t$ on the same figure. Briefly state your findings.
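A matplotlib sketch of the bookkeeping for this part, assuming (as in the earlier problems) that $\nu$ and $\hat{\pi}$ denote the training and test error rates, and reusing the hypothetical `stump_learn` above; the running weighted-vote scores make evaluating every $H_t$ cheap:

```python
import numpy as np
import matplotlib.pyplot as plt

# assumed already loaded: X, y (training) and Xt, yt (test); stump_learn as above
T, N = 100, len(y)
u = np.full(N, 1.0 / N)
score_tr, score_te = np.zeros(N), np.zeros(len(yt))
nu, pi_hat, U = [], [], []
for t in range(T):
    h = stump_learn(X, y, u)
    pred = h(X)
    eps = np.sum(u * (pred != y)) / np.sum(u)
    alpha = 0.5 * np.log((1.0 - eps) / eps)
    u = u * np.exp(-alpha * y * pred)         # AdaBoost re-weighting step
    U.append(np.sum(u))                       # U^{(t)} right after the update
    score_tr += alpha * pred                  # running vote of H_t on training set
    score_te += alpha * h(Xt)                 # ... and on the test set
    nu.append(np.mean(np.sign(score_tr) != y))
    pi_hat.append(np.mean(np.sign(score_te) != yt))

plt.plot(range(1, T + 1), nu, label="nu(H_t)")
plt.plot(range(1, T + 1), pi_hat, label="pi_hat(H_t)")
plt.plot(range(1, T + 1), U, label="U^(t)")
plt.xlabel("t")
plt.legend()
plt.show()
```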

(2) (20%) Compare your plots in Problem 6.3 and Problem 6.4. Briefly state your findings.
