
Machine Learning Techniques (NTU, Spring 2017) instructor: Hsuan-Tien Lin

Homework #4

RELEASE DATE: 05/23/2017 DUE DATE: 06/20/2017, BEFORE 14:00

QUESTIONS ABOUT HOMEWORK MATERIALS ARE WELCOMED ON THE FACEBOOK FORUM.

Unless granted by the instructor in advance, you must turn in a printed/written copy of your solutions (without the source code) for all problems.

For problems marked with (*), please follow the guidelines on the course website and upload your source code to designated places. You are encouraged to (but not required to) include a README to help the TAs check your source code. Any programming language/platform is allowed.

Any form of cheating, lying, or plagiarism will not be tolerated. Students can get zero scores and/or fail the class and/or be kicked out of school and/or receive other punishments for such misconduct.

Discussions on course materials and homework solutions are encouraged. But you should write the final solutions alone and understand them fully. Books, notes, and Internet resources can be consulted, but not copied from.

Since everyone needs to write the final solutions alone, there is absolutely no need to lend your homework solutions and/or source codes to your classmates at any time. In order to maximize the level of fairness in this class, lending and borrowing homework solutions are both regarded as dishonest behaviors and will be punished according to the honesty policy.

You should write your solutions in English or Chinese with the common math notations introduced in class or in the problems. We do not accept solutions written in any other languages.

This homework set comes with 160 points and 40 bonus points. In general, every homework set would come with a full credit of 160 points, with some possible bonus points.

Random Forest

1.

If bootstrapping is used to sample N' = pN examples out of N examples and N is very large, argue that approximately e^(−p) · N of the examples will not be sampled at all.
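For intuition only (the question asks for an argument, not a simulation), the claim can be checked numerically. The short NumPy sketch below simply counts how many of N examples are never drawn in pN draws with replacement and compares the fraction against e^(−p).

import numpy as np

# Illustrative check only: fraction of examples never drawn in p*N
# bootstrap draws with replacement, compared against exp(-p).
rng = np.random.default_rng(0)
N = 100_000
for p in (0.5, 1.0, 2.0):
    draws = rng.integers(0, N, size=int(p * N))   # p*N draws with replacement
    never_sampled = N - np.unique(draws).size     # examples not drawn at all
    print(f"p = {p}: empirical {never_sampled / N:.4f} vs e^(-p) = {np.exp(-p):.4f}")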

2.

Consider a Random Forest G that consists of three binary classification trees g1, g2, g3, where the trees have test 0/1 errors Eout(g1) = 0.15, Eout(g2) = 0.25, Eout(g3) = 0.35. What is the possible range of Eout(G)? Justify your answer.
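One way to build intuition for this question (not a substitute for the justification it asks for) is to view the joint error behavior of the three trees as a distribution over the eight possible "which trees err" patterns and ask how small or large the majority-vote error can be under the given marginals. The sketch below explores this with a linear program, assuming SciPy is available.

import itertools
import numpy as np
from scipy.optimize import linprog

e = [0.15, 0.25, 0.35]                                  # Eout(g1), Eout(g2), Eout(g3)
patterns = list(itertools.product([0, 1], repeat=3))    # which trees err on a point

# Equality constraints: total probability 1, plus the marginal error rate of each tree.
A_eq = [[1.0] * 8] + [[pat[k] for pat in patterns] for k in range(3)]
b_eq = [1.0] + e

# The uniform-vote forest errs exactly on patterns where at least 2 of the 3 trees err.
c = np.array([1.0 if sum(pat) >= 2 else 0.0 for pat in patterns])

lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun      # smallest achievable Eout(G)
hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun    # largest achievable Eout(G)
print(f"Eout(G) lies (numerically) in [{lo:.3f}, {hi:.3f}]")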

3.

Consider a Random Forest G that consists of K binary classification trees g1, . . . , gK, where K is an odd integer. Each gk has test 0/1 error Eout(gk) = ek. Prove or disprove that (2/(K+1)) · Σ_{k=1}^{K} ek upper bounds Eout(G).

Gradient Boosting

4.

For the gradient boosted decision tree, if a tree with only one constant node is returned as g1, and if g1(x) = 2, then after the first iteration every sn is updated from 0 to a new constant α1·g1(xn). What is sn? Prove your answer.

5.

For the gradient boosted decision tree, after updating all sn in iteration t using the steepest η as αt, what is the value of Σ_{n=1}^{N} sn·gt(xn)? Prove your answer.
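For reference on the notation in Questions 4-5, here is a minimal sketch of one round of squared-error gradient boosting: it assumes gt has already been fitted to the current residuals, and the closed-form steepest η is simply the one-dimensional least-squares solution. The interface and the toy data are illustrative assumptions, not part of the problem statement.

import numpy as np

def gbdt_round(s, y, g_values):
    """One squared-error gradient-boosting round.

    s        -- current scores sn (all zeros before the first iteration)
    y        -- targets yn
    g_values -- gt(xn) for the tree fitted to the residuals yn - sn
    Returns the steepest step alpha_t and the updated scores.
    """
    residual = y - s
    # One-dimensional least squares: alpha = argmin_eta sum_n (residual_n - eta * g_n)^2
    alpha = residual @ g_values / (g_values @ g_values)
    return alpha, s + alpha * g_values

# Toy usage with an arbitrary fitted gt on three examples.
y = np.array([1.0, 3.0, 5.0])
s = np.zeros_like(y)
g = np.array([0.5, 2.0, -1.0])
alpha, s = gbdt_round(s, y, g)
print(alpha, s)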

6.

If gradient boosting is coupled with linear regression (without regularization) instead of decision trees, prove or disprove that the optimal α1 = 1. (A 10% bonus can be given if your proof for either case is rigorous and works for general polynomial regression.)

7.

If gradient boosting is coupled with linear regression (without regularization) instead of decision trees, prove or disprove that the optimal g2(x) = 0. (A 10% bonus can be given if your proof for either case is rigorous and works for general polynomial regression.)


Neural Network

8.

Consider a Neural Network with sign(s) instead of tanh(s) as the transformation functions. That is, consider Multi-Layer Perceptrons. In addition, we will take +1 to mean logic TRUE, and −1 to mean logic FALSE. Assume that all xi below are either +1 or −1. Write down the weights wi for the following perceptron

gA(x) = sign( Σ_{i=0}^{d} wi·xi )

to implement OR(x1, x2, . . . , xd). Explain your answer.
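If you want to sanity-check a candidate weight vector before writing up the answer, a brute-force test over all ±1 assignments is cheap. The sketch below assumes the usual convention x0 = +1 and takes a hypothetical weight list w of length d+1 that you supply; the weights in the example call are a placeholder, not a suggested answer.

import itertools
import numpy as np

def implements_or(w, d):
    """Check whether sign(sum_i w[i] * x_i), with x_0 fixed to +1, equals
    OR(x_1, ..., x_d) on every +/-1 assignment. w has length d + 1."""
    for x in itertools.product([-1, 1], repeat=d):
        s = w[0] + np.dot(w[1:], x)
        target = 1 if any(xi == 1 for xi in x) else -1   # OR in the +/-1 encoding
        if np.sign(s) != target:
            return False
    return True

# Placeholder weights for d = 3; replace with your own candidate.
print(implements_or([0.0, 1.0, 1.0, 1.0], d=3))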

9.

Continuing from Question 8, among the following choices of D, write down the smallest D for some 5-D-1 Neural Network to implement XOR((x)1, (x)2, (x)3, (x)4, (x)5). Explain your implementation.

(It is not so easy to prove the smallest choice, so let’s leave the proof for the bonus.)

10.

For a Neural Network with at least one hidden layer and tanh(s) as the transformation functions on all neurons (including the output neuron), when all the initial weights w_ij^(ℓ) are set to 0, what gradient components are also 0? Justify your answer.

11.

For a Neural Network with one hidden layer and tanh(s) as the transformation functions on all neurons (including the output neuron), prove that for the backprop algorithm (with gradient descent), when all the initial weights w_ij^(ℓ) are set to 1, then w_ij^(1) = w_i(j+1)^(1) for all i and 1 ≤ j < d^(1).

Experiments with Random Forest

Implement the Bagging algorithm with N' = N and couple it with your decision tree in HW3 to make a preliminary random forest G_RF. Produce T = 30000 trees with bagging. Compute Ein and Eout using the 0/1 error.

Run the algorithm on the following set for training (i.e. re-use the HW3 datasets):

hw3_train.dat

and the following set for testing:

hw3_test.dat
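A minimal sketch of the bagging loop for Questions 12-14 is given below. It assumes whitespace-separated data files with the label in the last column, labels in {−1, +1}, and uses scikit-learn's DecisionTreeClassifier purely as a stand-in for the HW3 tree you are asked to re-use; swap in your own implementation.

import numpy as np
from sklearn.tree import DecisionTreeClassifier   # stand-in for your own HW3 C&RT tree

def load(path):
    data = np.loadtxt(path)
    return data[:, :-1], data[:, -1]               # features, labels in {-1, +1}

def bagged_forest(X, y, T, rng):
    """Bagging with N' = N: each tree is trained on N examples drawn with replacement."""
    N = len(y)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)           # bootstrap sample of size N
        tree = DecisionTreeClassifier(criterion="gini")   # replace with your HW3 tree
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def zero_one_error(trees, X, y):
    """0/1 error of the uniform-vote forest built from the given trees."""
    votes = np.sum([t.predict(X) for t in trees], axis=0)
    return np.mean(np.sign(votes) != y)

rng = np.random.default_rng(0)
X_train, y_train = load("hw3_train.dat")
X_test, y_test = load("hw3_test.dat")
forest = bagged_forest(X_train, y_train, T=30000, rng=rng)
print("Ein(G_RF) =", zero_one_error(forest, X_train, y_train))
print("Eout(G_RF) =", zero_one_error(forest, X_test, y_test))

Keeping the per-tree predictions around also makes it easy to produce the Ein(gt) histogram of Question 12 and the Ein(Gt)/Eout(Gt) curves of Questions 13-14 by accumulating the votes one tree at a time.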

12.

(*) Plot a histogram of Ein(gt) over the 30000 trees.

13.

(*) Let Gt = “the random forest with the first t trees”. Plot a curve of t versus Ein(Gt).

14.

(*) Continuing from Question 13, plot a curve of t versus Eout(Gt). Briefly compare it with the curve in Question 13 and state your findings.

Now, ‘prune’ your decision tree algorithm by restricting it to have one branch only. That is, the tree is simply a decision stump determined by the Gini index. Make a random ‘forest’ G_RS with those decision stumps with Bagging, as in Questions 12-14, with T = 30000. Compute Ein and Eout using the 0/1 error.
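For Questions 15-16, the sketch below shows one way to implement a single-split decision stump chosen by Gini index; it is an illustrative implementation under the usual convention that a stump splits one feature at one threshold and lets each branch predict its majority label, which may differ in detail from the branching criterion you used in HW3. The bagging loop itself is the same as in the earlier sketch, with these two functions in place of the tree.

import numpy as np

def gini(y):
    """Gini impurity of a +/-1 label vector (0 for an empty or pure branch)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y == 1)
    return 1.0 - p ** 2 - (1.0 - p) ** 2

def fit_stump(X, y):
    """Pick the (feature, threshold) pair minimizing size-weighted Gini impurity."""
    best = (np.inf, 0, -np.inf)                    # (impurity, feature index, threshold)
    for i in range(X.shape[1]):
        for theta in np.unique(X[:, i]):
            left = X[:, i] <= theta
            score = left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])
            if score < best[0]:
                best = (score, i, theta)
    _, i, theta = best
    left = X[:, i] <= theta
    # Each branch predicts its majority label (ties broken towards +1).
    labels = (np.sign(np.sum(y[left])) or 1.0, np.sign(np.sum(y[~left])) or 1.0)
    return i, theta, labels

def predict_stump(stump, X):
    i, theta, (left_label, right_label) = stump
    return np.where(X[:, i] <= theta, left_label, right_label)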

15.

(*) Again, let Gt = “the random forest with the first t decision stumps”. Plot a curve of t versus Ein(Gt).

16.

(*) Continuing from Question 15, plot a curve of t versus Eout(Gt). Briefly compare it with the curve in Question 15 and state your findings.


Bonus: Crazy XOR

17.

(10%) Continuing from Question 8, prove or disprove that D = d is the smallest D that allows for implementing XOR((x)1, (x)2, . . . , (x)d) with a d-D-1 feed-forward neural network with sign(s) as the transformation function (such a neural network is also called a Linear Threshold Circuit).

18.

(10%) Continuing from Question 8, if you are allowed to use D neurons (including the one for output) to implement XOR((x)1, (x)2, . . . , (x)d), but can connect the neurons in whatever way as long as it is feed-forward (such as connecting the input directly to neurons in other “layers”), what is the smallest D (that you can find) for implementing the function? Explain your implementation.

You can refer to http://www.nature.com/nature/journal/v475/n7356/fig_tab/nature10262_F2.html for a possible construction using two neurons for d = 3.

