Machine Learning (NTU, Fall 2009) instructor: Hsuan-Tien Lin

Homework #4

TA in charge: Te-Kang Jan
RELEASE DATE: 11/02/2009
DUE DATE: 11/16/2009, 4:20 pm IN CLASS

TA SESSION: 11/13/2009 (Friday), noon to 2:00 pm IN R106

Unless granted by the instructor in advance, you must turn in a written/printed copy of your solutions (without the source code) for all problems. For problems marked with (*), please follow the guidelines on the course website and upload your source code to designated places.

Any form of cheating, lying, or plagiarism will not be tolerated. Students can get zero scores and/or fail the class and/or be kicked out of school and/or receive other punishments for such misconduct.

Discussions on course materials and homework solutions are encouraged. But you should write the final solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but not copied from.

Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework solutions and/or source codes to your classmates at any time. In order to maximize the level of fairness in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will be punished according to the honesty policy.

You should write your solutions in English with the common math notations introduced in class or in the problems. We do not accept solutions written in any other languages.

4.1 More on Growth Function and VC Dimension

(1) (4%) Show that $m_{\mathcal{H}}(2N) \le \left(m_{\mathcal{H}}(N)\right)^2$, and hence change equation (2.11) of LFD to another generalization bound which only involves $m_{\mathcal{H}}(N)$.

(2) (4%) Let $\mathcal{H} = \{h_1, h_2, \ldots, h_M\}$ with some finite $M$. Prove that $d_{VC}(\mathcal{H}) \le \log_2 M$.

(3) (6%) For hypothesis sets $\mathcal{H}_1, \mathcal{H}_2, \cdots, \mathcal{H}_K$ with finite VC dimensions $d_{VC}(\mathcal{H}_k)$, derive and prove the tightest lower bound and the tightest upper bound that you can get on $d_{VC}\left(\bigcap_{k=1}^{K} \mathcal{H}_k\right)$.

(4) (6%) For hypothesis sets $\mathcal{H}_1, \mathcal{H}_2, \cdots, \mathcal{H}_K$ with finite VC dimensions $d_{VC}(\mathcal{H}_k)$, derive and prove the tightest lower bound and the tightest upper bound that you can get on $d_{VC}\left(\bigcup_{k=1}^{K} \mathcal{H}_k\right)$.

4.2 The Hat of Linear Regression

(1) (5%) Do Exercise 3.3-1 of LFD.

(2) (5%) Do Exercise 3.3-2 of LFD.

(3) (5%) Do Exercise 3.3-3 of LFD.

(4) (5%) Do Exercise 3.3-4 of LFD.

(5) (5%) Do Exercise 3.3-5 of LFD.

4.3 The Feature Transforms

(1) (5%) Do Exercise 3.6 of LFD.

(2) (5%) Do Exercise 3.7 of LFD.

(3) (5%) Do Exercise 3.11 of LFD.

4.4 Linear Programming for Classification (*)

In this problem we consider a linear programming formulation for classification. The linear program is slightly different from the one introduced in class. For this problem, you can use any linear programming package on the platform of your choice (for example, linprog in MATLAB).

(1) (3%) The data is linearly separable when there exists some $\mathbf{u}$ such that $\mathrm{sign}(\mathbf{u}^T \mathbf{x}_n) = y_n$ for all $n$. If the data is linearly separable, prove that there exists some $\mathbf{w}$ such that $y_n(\mathbf{w}^T \mathbf{x}_n) \ge 1$ for all $n$.

(2) (3%) Consider the optimization problem

$$
\begin{aligned}
\min_{\mathbf{w},\,\xi} \quad & \sum_{n=1}^{N} \xi_n \\
\text{s.t.} \quad & y_n(\mathbf{w}^T \mathbf{x}_n) \ge 1 - \xi_n, \quad n = 1, 2, \cdots, N, \\
& \xi_n \ge 0, \quad n = 1, 2, \cdots, N.
\end{aligned}
$$

Argue that the data is linearly separable if and only if the optimal $(w_0, w_1, \cdots, w_d, \xi_1, \xi_2, \cdots, \xi_N)$ satisfies $\sum_{n=1}^{N} \xi_n = 0$.

(3) (7%) Generate a linearly separable training set and the associated test set as directed by Problem 2.4(1). Implement the linear program above (a setup sketch appears after part (5) below) and run it on the sets above. Report the following numbers: $\sum_{n=1}^{N} \xi_n$, $E_{\text{in}}(\mathbf{w})$, $E_{\text{out}}(\mathbf{w})$. Briefly state your findings.

(4) (7%) Generate a training set that is not linearly separable and the associated test set as directed by Problem 3.3(1). Implement the linear program above and run it on the sets above. Report the following numbers: $\sum_{n=1}^{N} \xi_n$, $E_{\text{in}}(\mathbf{w})$, $E_{\text{out}}(\mathbf{w})$. Briefly state your findings.

(5) (Bonus 5%) Does the optimal $\mathbf{w}$ change if you replace the 1 in the constraints with 10? Does the line represented by the optimal $\mathbf{w}$ change? Why or why not?
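For reference, the following is a minimal, non-graded sketch of how the linear program above can be handed to an off-the-shelf solver. It uses scipy.optimize.linprog in place of the MATLAB linprog mentioned above, and the toy separable dataset at the bottom is a hypothetical stand-in for the generator of Problem 2.4(1), whose specification is not reproduced here.

# Sketch only: stack z = [w; xi] and translate the program into
# linprog's standard form  min c^T z  s.t.  A_ub z <= b_ub.
import numpy as np
from scipy.optimize import linprog

def solve_lp_classifier(X, y):
    """Solve  min sum(xi)  s.t.  y_n (w^T x_n) >= 1 - xi_n,  xi_n >= 0.

    X: (N, d) raw inputs; y: (N,) labels in {-1, +1}.
    Returns w = (w_0, w_1, ..., w_d) and the slack values xi.
    """
    N, d = X.shape
    Z = np.hstack([np.ones((N, 1)), X])                  # prepend x_0 = 1 for the bias w_0
    c = np.concatenate([np.zeros(d + 1), np.ones(N)])    # objective: sum of slacks only
    # y_n (w^T z_n) >= 1 - xi_n   <=>   -y_n z_n^T w - xi_n <= -1
    A_ub = np.hstack([-y[:, None] * Z, -np.eye(N)])
    b_ub = -np.ones(N)
    bounds = [(None, None)] * (d + 1) + [(0, None)] * N  # w free, xi >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:d + 1], res.x[d + 1:]

def zero_one_error(w, X, y):
    """Fraction of misclassified points under sign(w^T x)."""
    Z = np.hstack([np.ones((len(X), 1)), X])
    return np.mean(np.sign(Z @ w) != y)

# Illustrative run on a hypothetical separable dataset (replace with the
# generator prescribed in Problem 2.4(1)).
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(100, 2))
y_train = np.sign(X_train[:, 0] + X_train[:, 1])         # separable target, illustration only
w, xi = solve_lp_classifier(X_train, y_train)
print("sum of slacks:", xi.sum(), "E_in:", zero_one_error(w, X_train, y_train))

Note that linprog's default variable bounds are [0, infinity), so the bounds argument above explicitly leaves the entries of w unconstrained while keeping each slack nonnegative.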

4.5 Polynomial Regression and Overfitting (*)

(1) (20%) Do a variant of Exercise 4.2 with the following changes:

• Change $\mathcal{H}_2$ to $\mathcal{H}_1$ (i.e., the original linear regression).

• Change $\mathcal{H}_{10}$ to $\mathcal{H}_6$ (sixth-order polynomials).

Plot your results with figures like those in Figure 4.2. Compare yours with Figure 4.2 and briefly state your findings.
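As a rough illustration only (not the graded experiment), the sketch below fits first- and sixth-order polynomials by least squares and compares in-sample and out-of-sample squared errors. The target function, noise level, and sample sizes of Exercise 4.2 are specified in LFD; the placeholder target and values below are assumptions for illustration.

# Sketch only: least-squares polynomial fits for H_1 and H_6 under placeholder settings.
import numpy as np

def fit_poly(x, y, degree):
    """Least-squares fit of a degree-`degree` polynomial; returns its coefficients."""
    V = np.vander(x, degree + 1, increasing=True)   # columns [1, x, x^2, ..., x^degree]
    coef, *_ = np.linalg.lstsq(V, y, rcond=None)
    return coef

def mse(coef, x, y):
    """Mean squared error of the fitted polynomial on (x, y)."""
    V = np.vander(x, len(coef), increasing=True)
    return np.mean((V @ coef - y) ** 2)

rng = np.random.default_rng(1)
f = lambda x: x ** 2            # placeholder target, NOT the one defined in LFD
sigma, N = 0.5, 20              # placeholder noise level and training-set size
x_train = rng.uniform(-1, 1, N)
y_train = f(x_train) + sigma * rng.standard_normal(N)
x_test = rng.uniform(-1, 1, 2000)
y_test = f(x_test) + sigma * rng.standard_normal(2000)

for degree, name in [(1, "H_1"), (6, "H_6")]:
    coef = fit_poly(x_train, y_train, degree)
    print(name, "E_in:", mse(coef, x_train, y_train), "E_out:", mse(coef, x_test, y_test))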

