
Machine Learning (NTU, Fall 2012) instructor: Hsuan-Tien Lin

Homework #2

RELEASE DATE: 09/27/2012

DUE DATE: 10/11/2012, BEFORE THE END OF CLASS

QUESTIONS ABOUT HOMEWORK MATERIALS ARE WELCOME ON THE FORUM.

Unless granted by the instructor in advance, you must turn in a printed/written copy of your solutions (without the source code) for all problems. For problems marked with (*), please follow the guidelines on the course website and upload your source code to designated places.

Any form of cheating, lying, or plagiarism will not be tolerated. Students can get zero scores and/or fail the class and/or be kicked out of school and/or receive other punishments for such misconduct.

Discussions on course materials and homework solutions are encouraged. But you should write the final solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but not copied from.

Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework solutions and/or source codes to your classmates at any time. In order to maximize the level of fairness in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will be punished according to the honesty policy.

You should write your solutions in English with the common math notations introduced in class or in the problems. We do not accept solutions written in any other languages.

2.1 Simplified No-Free-Lunch Theorem

(1) (8%) Do Problem 1.10(a) of LFD.

(2) (8%) Do Problem 1.10(b) of LFD.

(3) (8%) Do Problem 1.10(c) of LFD.

(4) (8%) Do Problem 1.10(d) of LFD.

(5) (8%) Do Problem 1.10(e) of LFD.

2.2 A Concrete Learning Case

In this homework, we will use a concrete case of learning to help understand the whole framework. We will take the following settings:

• X = [−1, 1] × [−1, 1], Y = {−1, +1}

• f(x) = sign(x1 − 2x2), with sign(0) = +1 for simplicity

• H = {h1, h2}. That is, M = 2.

• h1(x) = sign(−x2)

• h2(x) = sign(x1)

• P(x) is the uniform distribution on X.
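Several of the later parts call for numerical calculation, so it may help to encode the setting above directly. A minimal Python sketch (the names `sign`, `f`, `h1`, `h2`, and `sample_x` are mine, chosen to mirror the notation of the problem):

```python
import random

def sign(v):
    # the problem takes sign(0) = +1
    return 1 if v >= 0 else -1

def f(x):
    # target function f(x) = sign(x1 - 2 x2), with x = (x1, x2)
    return sign(x[0] - 2 * x[1])

def h1(x):
    # hypothesis h1(x) = sign(-x2)
    return sign(-x[1])

def h2(x):
    # hypothesis h2(x) = sign(x1)
    return sign(x[0])

def sample_x(rng=random):
    # one i.i.d. draw from the uniform distribution on [-1, 1] x [-1, 1]
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))
```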

(1) (10%) Calculate Eout(h1) and Eout(h2). Which is worse? (Hint: h2 should be worse!)

(2) (10%) Consider generating N = 10 examples by sampling i.i.d. from P(x) and labeling them with f(x). Numerically calculate the exact probability that Ein(h1) = 0.
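Because the N examples are drawn i.i.d., P[Ein(h1) = 0] = p^N with p = P[h1(x) = f(x)]; the exact value of p follows from the geometry of the region where h1 and f disagree. The sketch below only estimates it by Monte Carlo as a sanity check, not as the exact answer the problem asks for (all names are illustrative):

```python
import random

def sign(v):
    # the problem takes sign(0) = +1
    return 1 if v >= 0 else -1

def estimate_p_ein_zero(N=10, trials=200000, seed=0):
    # Estimate P[Ein(h1) = 0] as p**N, where p = P[h1(x) = f(x)] is
    # approximated by Monte Carlo over the uniform distribution on X.
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if sign(-x2) == sign(x1 - 2 * x2):  # h1(x) == f(x)
            agree += 1
    return (agree / trials) ** N
```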

(3) (10%) Let ε = 0.99 Eout(h1). Calculate the Hoeffding bound 2 exp(−2ε²N) for N = 10. Argue that this choice of ε allows the event [Ein(h1) = 0] to be included in the BAD event [|Ein(h1) − Eout(h1)| > ε] in Hoeffding's inequality. Compare with the probability you just calculated and briefly state your findings.
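The bound itself is a one-line computation; the value of ε must come from your answer to part (1). A small helper (the function name is mine), written with an `M` parameter so the same code also covers the extended bound in part (6):

```python
import math

def hoeffding_bound(eps, N, M=1):
    # Two-sided Hoeffding bound for M hypotheses:
    # P[exists h with |Ein(h) - Eout(h)| > eps] <= 2 * M * exp(-2 eps^2 N)
    return 2 * M * math.exp(-2 * eps ** 2 * N)
```

For this part, evaluate `hoeffding_bound(0.99 * e_out_h1, N=10)` with `e_out_h1` taken from part (1); for part (6), pass `M=2`.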



(4) (10%) Consider a learning algorithm that always returns the hypothesis h ∈ H with the smaller Ein as the final hypothesis g. When there is a tie, the hypothesis h1 will be returned.

Illustrate a data set on which h2 (hint: the worse hypothesis) will be returned instead of h1 (hint: the better hypothesis) by the algorithm.

(5) (10%) Numerically calculate the exact probability of generating a size-10 data set (with P) such that h2 will be returned instead of h1 by the algorithm.
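A Monte Carlo sanity check for this probability can be sketched as follows; the exact calculation the problem asks for would instead work with the probabilities of the per-point agreement/disagreement regions of h1 and h2 against f (all names below are mine):

```python
import random

def sign(v):
    # the problem takes sign(0) = +1
    return 1 if v >= 0 else -1

def estimate_p_h2_returned(N=10, trials=200000, seed=0):
    # Estimate P[the algorithm returns h2]: since ties go to h1,
    # h2 is returned exactly when Ein(h2) < Ein(h1) on the data set.
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        ein1 = ein2 = 0
        for _ in range(N):
            x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
            y = sign(x1 - 2 * x2)     # label from the target f
            ein1 += sign(-x2) != y    # h1 errs on this example
            ein2 += sign(x1) != y     # h2 errs on this example
        count += ein2 < ein1
    return count / trials
```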

(6) (10%) Let ε = 0.5 (Eout(h2) − Eout(h1)). Calculate the extended Hoeffding bound 2 · 2 exp(−2ε²N) for N = 10. Argue that this choice of ε allows the event [the algorithm returns h2] to be included in the BAD event [max over h ∈ {h1, h2} of |Ein(h) − Eout(h)| > ε] in the extended Hoeffding inequality. Compare with the probability you just calculated and briefly state your findings.

2.3 Multiple Coins and Multiple Bins

(1) (7%) Do Exercise 1.10(a) of LFD.

(2) (7%) Do Exercise 1.10(b) of LFD.

(3) (7%) Do Exercise 1.10(c) of LFD.

(4) (7%) Do Exercise 1.10(d) of LFD.

(5) (7%) Do Exercise 1.10(e) of LFD.

2.4 Probably Approximately Correct

Read the derivation that links Equation (1.6) to Equation (2.1) on Page 40 of LFD. Then,

(1) (5%) Do Problem 2.1(a) of LFD.

(2) (5%) Do Problem 2.1(b) of LFD.

(3) (5%) Do Problem 2.1(c) of LFD.

The title of this problem, Probably Approximately Correct, states what we can interpret from (2.1) if we have enough training examples. “Probably” means the statement is true with high probability (≥ 1 − δ). “Approximately” means that every Eout(g) is close to Ein(g) (within ε). “Correct” means that we can guarantee Eout(g) to be small (by getting some hypothesis g with small Ein(g)).
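Under the union-bound (finite-M) form of Hoeffding's inequality used in LFD, the link between (1.6) and (2.1) is a direct inversion: set the right-hand side of the bound to δ and solve for ε,

```latex
\delta = 2M e^{-2\epsilon^{2} N}
\quad\Longrightarrow\quad
\epsilon = \sqrt{\frac{1}{2N}\ln\frac{2M}{\delta}},
\qquad\text{so with probability} \ge 1-\delta:\quad
E_{\mathrm{out}}(g) \le E_{\mathrm{in}}(g) + \sqrt{\frac{1}{2N}\ln\frac{2M}{\delta}}.
```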

2.5 Experiments with Another Perceptron Algorithm (*)

(1) (10%) Do Problem 1.5(a) of LFD.

(2) (10%) Do Problem 1.5(b) of LFD.

(3) (10%) Do Problem 1.5(c) of LFD.

(4) (10%) Do Problem 1.5(d) of LFD.

(5) (10%) Do Problem 1.5(e) of LFD.

2.6 Friend or Foe

(1) (Bonus 5%) Do Exercise 1.12 of LFD. You need to provide a convincing reason to get the points.

(2) (Bonus 5%) If we replace your “friend” in the exercise with your “foe”, you need to be extremely careful about the guarantee you can provide. Do you see any traps in the setup and/or the choices? Write down your own version of the guarantee that you would provide to your foe, and state the convincing reasons behind it.

