
Machine Learning (NTU, Fall 2009) instructor: Hsuan-Tien Lin

Homework #2

TA in charge: Ken-Yi Lin
RELEASE DATE: 10/05/2009
DUE DATE: 10/12/2009, 4:20 pm IN CLASS

TA SESSION: 10/09/2009 (Friday), noon to 2:00 pm IN R106

Unless granted an exception by the instructor in advance, you must turn in a written/printed copy of your solutions (without the source code) for all problems. For problems marked with (*), please follow the guidelines on the course website and upload your source code to the designated places.

Any form of cheating, lying, or plagiarism will not be tolerated. Students can get zero scores and/or fail the class and/or be kicked out of school and/or receive other punishments for such misconduct.

Discussions on course materials and homework solutions are encouraged. But you should write the final solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but not copied from.

Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework solutions and/or source code to your classmates at any time. In order to maximize the level of fairness in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will be punished according to the honesty policy.

You should write your solutions in English with the common math notations introduced in class or in the problems. We do not accept solutions written in any other languages.

2.1 Marbles and Coins

(1) (5%) Do Exercise 1.9 of LFD.

(2) (5%) Do Exercise 1.10 of LFD.

(3) (5%) Do Exercise 1.11-1 of LFD.

(4) (5%) Do Exercise 1.11-2 of LFD.

(5) (5%) Do Exercise 1.11-3 of LFD.

2.2 Learning Games

(1) (5%) Do Exercise 1.12-1 of LFD.

(2) (5%) Do Exercise 1.12-2 of LFD.

(3) (5%) Do Exercise 1.12-3 of LFD.

(4) (5%) Do Exercise 1.12-4 of LFD.

(5) (10%) Do Exercise 2.1 of LFD.

2.3 Probably Approximately Correct

Read the derivation that links Equation (1.6) to Equation (2.1) on LFD Page 2-2. In particular, let

ε(M, N, δ) = √( (1/(2N)) · ln(2M/δ) ).

(1) (5%) Take δ = 0.03 and M = 1; how many examples do we need to make ε(M, N, δ) ≤ 0.05?

(2) (5%) Take δ = 0.03 and M = 100; how many examples do we need to make ε(M, N, δ) ≤ 0.05?


(3) (5%) Take δ = 0.03 and M = 10000; how many examples do we need to make ε(M, N, δ) ≤ 0.05?

The title of this problem, Probably Approximately Correct, states what we can interpret from (2.1) if we have enough training examples. “Probably” means the statement is true with a high probability (≥ 1 − δ). “Approximately” means that every Eout(g) is close to Ein(g) (within ε). “Correct” means that we can guarantee Eout(g) to be small (by getting some decision function g with small Ein(g)).
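As a concrete check of how the bound behaves, the short Python sketch below evaluates ε(M, N, δ) as defined above and searches for the smallest N meeting a target value. The function names are illustrative, and the numbers are taken from the problem statement (δ = 0.03, target 0.05).

import math

def epsilon(M, N, delta):
    # epsilon(M, N, delta) = sqrt( (1/(2N)) * ln(2M/delta) ), as defined above.
    return math.sqrt(math.log(2.0 * M / delta) / (2.0 * N))

def min_examples(M, delta=0.03, target=0.05):
    # Solving sqrt(ln(2M/delta) / (2N)) <= target for N gives
    # N >= ln(2M/delta) / (2 * target^2); round up to the nearest integer.
    return math.ceil(math.log(2.0 * M / delta) / (2.0 * target ** 2))

for M in (1, 100, 10000):
    N = min_examples(M)
    print(M, N, epsilon(M, N, 0.03))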

2.4 Adaptive Perceptron Learning (*)

We know that the perceptron learning rule works like this: In each iteration, pick a random (x(t), y(t)) and compute ρ(t) = w(t) · x(t). If y(t) · ρ(t) ≤ 0, update w by

w(t+1) ← w(t) + y(t) · x(t).

One may argue that this algorithm does not take the “closeness” between ρ(t) and y(t) into consideration.

Let’s look at another perceptron learning algorithm: In each iteration, pick a random (x(t), y(t)) and compute ρ(t) = w(t) · x(t). If y(t) · ρ(t) ≤ 1, update w by

w(t+1) ← w(t) + η · (y(t) − ρ(t)) · x(t),

where η is some constant. That is, if ρ(t) agrees with y(t) strongly (their product is greater than 1), the algorithm does nothing. On the other hand, the further ρ(t) is from y(t), the more the algorithm changes w(t). In this problem, you are asked to implement this algorithm and check its performance.
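As a reference point for the parts below, here is a minimal Python/NumPy sketch of the update rule just described. The function name, the use of a leading 1 coordinate in x for the bias term, and the random-pick strategy are illustrative assumptions, not a required implementation.

import numpy as np

def adaline_train(X, y, eta, max_updates=1000, rng=None):
    # X: (N, d) inputs (assumed to include a constant 1 coordinate for the bias),
    # y: (N,) labels in {+1, -1}, eta: the learning-rate constant from the text.
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(X.shape[1])
    updates = 0
    while updates < max_updates:
        if np.all(y * (X @ w) > 1):
            break                                # converged: no more possible updates
        i = rng.integers(len(y))                 # pick a random (x(t), y(t))
        rho = w @ X[i]                           # rho(t) = w(t) . x(t)
        if y[i] * rho <= 1:
            w = w + eta * (y[i] - rho) * X[i]    # w(t+1) <- w(t) + eta*(y(t) - rho(t))*x(t)
            updates += 1
    return w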

(1) (6%) Generate a training data set of size 100 as directed in Homework Problem 1.4. Generate a test data set of size 10000 from the same process. Run the algorithm above with η = 100 on the training data set until it converges (no more possible updates) or a maximum of 1000 updates has been reached to get g. Plot the training data set, the target function f, and the final hypothesis g on the same figure. Estimate the out-of-sample error with the test set. (One possible end-to-end setup is sketched after part (5) below.)

(2) (6%) Use the data set in (1) and redo everything with η = 1.

(3) (6%) Use the data set in (1) and redo everything with η = 0.01.

(4) (6%) Use the data set in (1) and redo everything with η = 0.0001.

(5) (6%) Compare the results that you get from (1) to (4).
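For parts (1) through (4), the sketch below shows one way the experiments might be wired together, reusing adaline_train from the sketch above. The data-generation step is an assumption, since Homework Problem 1.4 is not restated in this handout: it uses the common LFD-style setup of inputs drawn uniformly from [-1, 1]^2 labeled by a random linear target f, which may differ from the actual problem. All helper names are hypothetical.

import numpy as np

def make_data(n, w_f, rng):
    # Assumed setup (see the caveat above): inputs uniform in [-1, 1]^2 with a
    # leading 1 for the bias, labels given by the sign of the linear target f.
    X = np.hstack([np.ones((n, 1)), rng.uniform(-1.0, 1.0, size=(n, 2))])
    y = np.sign(X @ w_f)
    return X, y

rng = np.random.default_rng(0)
w_f = rng.standard_normal(3)                  # hypothetical target function f
X_train, y_train = make_data(100, w_f, rng)   # training set of size 100
X_test, y_test = make_data(10000, w_f, rng)   # test set of size 10000

for eta in (100, 1, 0.01, 0.0001):
    w_g = adaline_train(X_train, y_train, eta, max_updates=1000, rng=rng)
    e_out = np.mean(np.sign(X_test @ w_g) != y_test)   # test-set estimate of Eout(g)
    print("eta =", eta, "estimated Eout =", e_out)

# For part (1), the training points, the target f, and the hypothesis g can be
# plotted on one figure (e.g. with matplotlib) as the lines w[0] + w[1]·x1 + w[2]·x2 = 0.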

The algorithm above is a variant of the so-called Adaline (Adaptive Linear Neuron) algorithm for perceptron learning.
