Machine Learning (NTU, Fall 2011) instructor: Hsuan-Tien Lin
Homework #0
RELEASE DATE: 09/13/2011 DUE DATE: NONE
1 Probability and Statistics
(1) (combinatorics)
Let $C(N, K) = 1$ for $K = 0$ or $K = N$, and $C(N, K) = C(N-1, K) + C(N-1, K-1)$ for $N \ge 1$.
Prove that $C(N, K) = \frac{N!}{K!(N-K)!}$ for $N \ge 1$ and $0 \le K \le N$.
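A quick numerical sanity check can complement the proof, though it does not replace the induction argument. The sketch below implements the recurrence directly and compares it with the closed form $N!/(K!(N-K)!)$ for small $N$. The function names `c_rec` and `c_closed` are hypothetical helpers, not part of the assignment.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def c_rec(n: int, k: int) -> int:
    """C(N, K) defined by the recurrence in the problem statement."""
    if k == 0 or k == n:
        return 1
    return c_rec(n - 1, k) + c_rec(n - 1, k - 1)


def c_closed(n: int, k: int) -> int:
    """Closed form N! / (K! (N - K)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# Collect every (N, K) with 1 <= N <= 20 and 0 <= K <= N where the two disagree.
mismatches = [
    (n, k)
    for n in range(1, 21)
    for k in range(n + 1)
    if c_rec(n, k) != c_closed(n, k)
]
print(mismatches)
```

An empty list means the two definitions agree on the tested range.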
(2) (counting)
What is the probability of getting exactly 6 heads when flipping 10 fair coins?
What is the probability of getting a full house (XXXYY) when randomly drawing 5 cards out of a deck of 52 cards?
(3) (conditional probability)
If your friend flipped a fair coin three times and tells you that one of the tosses resulted in heads, what is the probability that all three tosses resulted in heads?
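Small conditional-probability questions like this one can be checked by enumerating the sample space. The sketch below is one way to do so in Python; the helper `conditional` is hypothetical, and it assumes that "one of the tosses resulted in heads" is read as "at least one toss", which is an interpretation rather than something the problem states explicitly.

```python
from fractions import Fraction
from itertools import product

# All 2**3 equally likely outcomes of three fair coin flips.
outcomes = list(product("HT", repeat=3))


def conditional(event, given):
    """P(event | given), computed by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))


# Assumption: the friend's statement means "at least one head".
p = conditional(
    event=lambda o: o.count("H") == 3,
    given=lambda o: o.count("H") >= 1,
)
print(p)
```

Changing the `given` predicate (for example to "exactly one head") shows how strongly the answer depends on how the statement is interpreted.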
(4) (Bayes theorem)
A program selects a random integer $X$ like this: a random bit is first generated uniformly. If the bit is 0, $X$ is drawn uniformly from $\{0, 1, \ldots, 7\}$; otherwise, $X$ is drawn uniformly from $\{0, -1, -2, -3\}$.
If we get an $X$ from the program with $|X| = 1$, what is the probability that $X$ is negative?
(5) (union/intersection)
If $P(A) = 0.3$ and $P(B) = 0.4$,
what is the maximum possible value of $P(A \cap B)$?
what is the minimum possible value of $P(A \cap B)$?
what is the maximum possible value of $P(A \cup B)$?
what is the minimum possible value of $P(A \cup B)$?
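(6) (mean/variance)
Let mean $\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n$ and variance $\sigma_X^2 = \frac{1}{N-1} \sum_{n=1}^{N} (X_n - \bar{X})^2$. Prove that
$$\sigma_X^2 = \frac{N}{N-1} \left( \frac{1}{N} \sum_{n=1}^{N} X_n^2 - \bar{X}^2 \right).$$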
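The identity to be proved says that the variance with the $N-1$ divisor equals $\frac{N}{N-1}$ times (the mean of the squares minus the square of the mean). It can be spot-checked numerically before attempting the proof. The sketch below uses Python's standard `statistics` module, whose `variance` function divides by $N-1$; the sample data and seed are arbitrary choices made for illustration.

```python
import random
from statistics import mean, variance

random.seed(0)  # arbitrary seed, only for reproducibility
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
n = len(xs)

# Left-hand side: sample variance with the N - 1 divisor.
lhs = variance(xs)

# Right-hand side: N / (N - 1) * (mean of squares - square of mean).
x_bar = mean(xs)
rhs = n / (n - 1) * (sum(x * x for x in xs) / n - x_bar ** 2)

print(abs(lhs - rhs) < 1e-9)
```

Agreement up to floating-point error is only evidence, not a proof; the algebraic argument is still required.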
(7) (Gaussian distribution)
If $X_1$ and $X_2$ are independent random variables, where $p(X_1)$ is Gaussian with mean 2 and variance 1, and $p(X_2)$ is Gaussian with mean 3 and variance 4, let $Z = X_1 + X_2$. Prove that $p(Z)$ is Gaussian, and determine its mean and variance.
2 Linear Algebra
(1) (rank)
What is the rank of
$$\begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{pmatrix}?$$
(2) (inverse)
What is the inverse of
$$\begin{pmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{pmatrix}?$$
(3) (eigenvalues/eigenvectors)
What are the eigenvalues and eigenvectors of
$$\begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ 1 & 1 & 1 \end{pmatrix}?$$
(4) (singular value decomposition)
For a real matrix $M$, let $M = U \Sigma V^T$ be its singular value decomposition. Define $M^\dagger = V \Sigma^\dagger U^T$, where
$\Sigma^\dagger[i][j] = \frac{1}{\Sigma[i][j]}$ when $\Sigma[i][j]$ is nonzero, and 0 otherwise. Prove that $M M^\dagger M = M$.
(5) (PD/PSD)
A symmetric real matrix $A$ is positive definite (PD) iff $x^T A x > 0$ for all $x \neq 0$, and positive semi-definite (PSD) if "$>$" is changed to "$\ge$". Prove:
(a) For any real matrix $Z$, $ZZ^T$ is PSD.
(b) $A$ is PD iff all eigenvalues of $A$ are strictly positive.
(6) (inner product)
Consider $x \in \mathbb{R}^d$ and some $u \in \mathbb{R}^d$ with $\|u\| = 1$.
What is the maximum value of $u^T x$?
What is the minimum value of $u^T x$?
What is the minimum value of $|u^T x|$?
(7) (distance)
Consider two parallel hyperplanes in $\mathbb{R}^d$:
$H_1: w^T x = +3$,
$H_2: w^T x = -2$,
where $w$ is the normal vector. What is the distance between $H_1$ and $H_2$?
3 Calculus
(1) (differential and partial differential)
Let $f(x) = \ln(1 + e^{-2x})$. What is $\frac{df(x)}{dx}$? Let $g(x, y) = e^x + e^{2y} + e^{3xy^2}$. What is $\frac{\partial g(x, y)}{\partial y}$?
(2) (chain rule)
Let $f(x, y) = xy$, $x(u, v) = \cos(u + v)$, $y(u, v) = \sin(u - v)$. What is $\frac{\partial f}{\partial v}$?
(3) (integral)
What is $\int_{5}^{10} \frac{2}{x - 3} \, dx$?
(4) (gradient and Hessian)
Let $E(u, v) = (u e^v - 2 v e^{-u})^2$. Calculate the gradient $\nabla E$ and the Hessian $\nabla^2 E$ at $u = 1$ and $v = 1$.
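A hand-derived gradient can be cross-checked against a finite-difference approximation. The sketch below does this in Python, assuming the function is $E(u, v) = (u e^v - 2 v e^{-u})^2$; the helper name `numeric_gradient` and the step size `eps` are illustrative choices, not part of the assignment.

```python
from math import exp


def E(u: float, v: float) -> float:
    """E(u, v) = (u e^v - 2 v e^{-u})^2, as read from the problem statement."""
    return (u * exp(v) - 2.0 * v * exp(-u)) ** 2


def numeric_gradient(f, u: float, v: float, eps: float = 1e-5):
    """Central-difference approximation of (dE/du, dE/dv) at (u, v)."""
    dE_du = (f(u + eps, v) - f(u - eps, v)) / (2 * eps)
    dE_dv = (f(u, v + eps) - f(u, v - eps)) / (2 * eps)
    return dE_du, dE_dv


grad = numeric_gradient(E, 1.0, 1.0)
print(grad)
```

The same idea, applied to each component of the gradient, gives a rough numerical check of the Hessian entries as well.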
(5) (Taylor's expansion)
Let $E(u, v) = (u e^v - 2 v e^{-u})^2$. Write down the second-order Taylor expansion of $E$ around $u = 1$ and $v = 1$.
(6) (optimization)
For some given $A > 0$, $B > 0$, solve
$$\min_{\alpha} \; A e^{\alpha} + B e^{-2\alpha}.$$
(7) (vector calculus)
Let $w$ be a vector in $\mathbb{R}^d$ and $E(w) = \frac{1}{2} w^T A w + b^T w$ for some symmetric matrix $A$ and vector $b$.
Prove that the gradient $\nabla E(w) = Aw + b$ and the Hessian $\nabla^2 E(w) = A$.
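The claimed gradient $Aw + b$ of $E(w) = \frac{1}{2} w^T A w + b^T w$ can be compared with a finite-difference estimate on a small example. The sketch below uses plain Python lists; the matrix `A`, the vectors `b` and `w`, and the helper names are hypothetical values chosen only for illustration, with `A` symmetric as the problem requires.

```python
def quad(A, b, w):
    """E(w) = 0.5 * w^T A w + b^T w for list-based A, b, w."""
    d = len(w)
    wAw = sum(w[i] * A[i][j] * w[j] for i in range(d) for j in range(d))
    return 0.5 * wAw + sum(b[i] * w[i] for i in range(d))


def numeric_grad(A, b, w, eps=1e-6):
    """Central-difference approximation of the gradient of E at w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grad.append((quad(A, b, up) - quad(A, b, down)) / (2 * eps))
    return grad


# Hypothetical example values; A must be symmetric.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
w = [0.5, -2.0]

# The gradient claimed in the problem statement: A w + b.
claimed = [sum(A[i][j] * w[j] for j in range(2)) + b[i] for i in range(2)]
print(numeric_grad(A, b, w), claimed)
```

Replacing `A` with a non-symmetric matrix makes the two results differ, which illustrates why the symmetry assumption matters in the proof.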
(8) (quadratic programming)
Following the previous question, if $A$ is not only symmetric but also positive definite (PD), prove that the solution of $\operatorname{argmin}_w E(w)$ is $-A^{-1} b$.
(9) (optimization with linear constraint)
Consider
$$\min_{w_1, w_2, w_3} \; \frac{1}{2}\left(w_1^2 + 2w_2^2 + 3w_3^2\right) \quad \text{subject to} \quad w_1 + w_2 + w_3 = 11.$$
Refresh your memory on "Lagrange multipliers" and show that the optimal solution must happen at $w_1 = \lambda$, $2w_2 = \lambda$, $3w_3 = \lambda$. Use this property to solve the problem.
(10) (optimization with linear constraints)
Let $w$ be a vector in $\mathbb{R}^d$ and $E(w)$ be a convex differentiable function of $w$. Prove that the optimal solution to
$$\min_{w} \; E(w) \quad \text{subject to} \quad Aw + b = 0$$
must happen at $\nabla E(w) + \lambda^T A = 0$ for some vector $\lambda$. (Hint: If not, let $u$ be the residual when projecting $\nabla E(w)$ to the span of the rows of $A$. Show that for some very small $\eta$, $w - \eta \cdot u$ is a feasible solution that improves $E$.)