Machine Learning (NTU, Fall 2009) instructor: Hsuan-Tien Lin
Homework #0
TA in charge: Chun-Wei Liu RELEASE DATE: 09/14/2009
DUE DATE: NONE
1 Probability and Statistics
(1) (combinatorics)
Let C(N, K) = 1 for K = 0 or K = N, and C(N, K) = C(N − 1, K) + C(N − 1, K − 1) for N ≥ 1 and 0 < K < N.
Prove that $C(N, K) = \frac{N!}{K!(N-K)!}$ for N ≥ 1 and 0 ≤ K ≤ N.
(2) (counting)
What is the probability of getting exactly 6 heads when flipping 10 fair coins?
What is the probability of getting a full house (XXXYY) when randomly drawing 5 cards out of a deck of 52 cards?
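For problems (1) and (2), a quick numerical sanity check may help. The Python sketch below uses only the standard library (`math.comb` needs Python 3.8+). It compares the recursive definition of C(N, K) with the closed form for small N, which is evidence rather than a proof, and evaluates both counting probabilities exactly with `fractions.Fraction`. The helper names `c_recursive` and `c_closed_form` are illustrative and not part of the assignment.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def c_recursive(n: int, k: int) -> int:
    """C(N, K) exactly as defined in problem (1): base cases plus the recursion."""
    if k == 0 or k == n:
        return 1
    return c_recursive(n - 1, k) + c_recursive(n - 1, k - 1)

def c_closed_form(n: int, k: int) -> int:
    """The closed form N! / (K! (N - K)!) that problem (1) asks you to prove."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# Spot-check the identity for small N (evidence, not a proof).
for n in range(1, 15):
    for k in range(0, n + 1):
        assert c_recursive(n, k) == c_closed_form(n, k)

# Problem (2a): exactly 6 heads among 10 fair coins; all 2**10 sequences are equally likely.
p_six_heads = Fraction(comb(10, 6), 2 ** 10)

# Problem (2b): choose the rank of the triple (13 ways) and 3 of its 4 suits,
# then a different rank for the pair (12 ways) and 2 of its 4 suits.
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)
p_full_house = Fraction(full_houses, comb(52, 5))

print(p_six_heads, float(p_full_house))
```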
(3) (conditional probability)
If your friend flipped a fair coin three times and tells you that at least one of the tosses resulted in heads, what is the probability that all three tosses resulted in heads?
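One way to check problem (3) is to enumerate the sample space. This sketch assumes the friend's statement means that at least one toss came up heads, which is the usual reading. Under a different reading, such as "a specific toss was heads", the conditioning event and the answer change.

```python
from fractions import Fraction
from itertools import product

# All 2**3 equally likely outcomes of three fair tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Assumption: "one of the tosses resulted in heads" means AT LEAST one head.
given = [o for o in outcomes if "H" in o]
all_heads = [o for o in given if o == ("H", "H", "H")]

# P(all heads | at least one head) = |all_heads| / |given| under the uniform measure.
p = Fraction(len(all_heads), len(given))
print(p)
```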
(4) (Bayes' theorem)
A program selects a random integer X like this: a random bit is first generated uniformly. If the bit is 0, X is drawn uniformly from {0, 1, . . . , 7}; otherwise, X is drawn uniformly from {0, −1, −2, −3}.
If we get an X from the program with |X| = 1, what is the probability that X is negative?
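Problem (4) can be checked by writing out the joint distribution of the random bit and X, then conditioning on |X| = 1 directly. The sketch below does this with exact fractions; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution P(bit, X) for the program in problem (4).
joint = defaultdict(Fraction)      # missing keys start at Fraction(0)
for x in range(0, 8):              # bit 0: X uniform on {0, 1, ..., 7}
    joint[(0, x)] += Fraction(1, 2) * Fraction(1, 8)
for x in (0, -1, -2, -3):          # bit 1: X uniform on {0, -1, -2, -3}
    joint[(1, x)] += Fraction(1, 2) * Fraction(1, 4)

# Condition on the observed event |X| = 1 and compute P(X < 0 | |X| = 1).
evidence = sum(p for (_, x), p in joint.items() if abs(x) == 1)
negative = sum(p for (_, x), p in joint.items() if abs(x) == 1 and x < 0)
posterior = negative / evidence
print(posterior)
```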
(5) (union/intersection)
If P(A) = 0.3 and P(B) = 0.4,
what is the maximum possible value of P(A ∩ B)?
what is the minimum possible value of P(A ∩ B)?
what is the maximum possible value of P(A ∪ B)?
what is the minimum possible value of P(A ∪ B)?
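For problem (5), a small hypothetical model shows which values are attainable: a uniform sample space of 10 outcomes, A fixed to 3 of them, and B ranging over every 4-outcome subset. Enumerating these only shows what this particular model can reach. Arguing that no probability space can go beyond those values still takes an inequality argument.

```python
from fractions import Fraction
from itertools import combinations

N_OUTCOMES = 10                    # hypothetical sample space {0, ..., 9}, uniform
A = frozenset(range(3))            # P(A) = 3/10

def prob(event: frozenset) -> Fraction:
    """Probability of an event under the uniform measure on N_OUTCOMES outcomes."""
    return Fraction(len(event), N_OUTCOMES)

# Enumerate every event B with P(B) = 4/10 and record P(A ∩ B) and P(A ∪ B).
intersections, unions = [], []
for b in combinations(range(N_OUTCOMES), 4):
    B = frozenset(b)
    intersections.append(prob(A & B))
    unions.append(prob(A | B))

print("P(A∩B) range in this model:", min(intersections), max(intersections))
print("P(A∪B) range in this model:", min(unions), max(unions))
```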
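(6) (mean/variance)
Let the mean be $\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n$ and the variance be $\sigma_X^2 = \frac{1}{N-1}\sum_{n=1}^{N} (X_n - \bar{X})^2$. Prove that
$$\sigma_X^2 = \frac{N}{N-1}\left(\frac{1}{N}\sum_{n=1}^{N} X_n^2 - \bar{X}^2\right).$$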
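The identity in problem (6) can be spot-checked with exact rational arithmetic, so both sides are compared with `==` rather than a floating-point tolerance. The random test data below is hypothetical. Passing the check is evidence for the identity, not a proof.

```python
import random
from fractions import Fraction

def sample_variance(xs):
    """Left-hand side: (1/(N-1)) * sum of squared deviations from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def shortcut_variance(xs):
    """Right-hand side: N/(N-1) * (mean of squares - square of the mean)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return Fraction(n, n - 1) * (mean_sq - mean ** 2)

# Rational inputs keep every step exact; N >= 2 so that N - 1 > 0.
rng = random.Random(0)
for _ in range(100):
    size = rng.randint(2, 12)
    xs = [Fraction(rng.randint(-50, 50), rng.randint(1, 9)) for _ in range(size)]
    assert sample_variance(xs) == shortcut_variance(xs)
```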
(7) (Gaussian distribution)
If $X_1$ and $X_2$ are independent random variables, where $p(X_1)$ is Gaussian with mean 2 and variance 1 and $p(X_2)$ is Gaussian with mean −3 and variance 4, let $Z = X_1 + X_2$. Prove that $p(Z)$ is Gaussian, and determine its mean and variance.
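For problem (7), a Monte Carlo simulation can check the mean and variance you derive. It says nothing about whether Z is actually Gaussian; that part needs the proof, for example via convolution or moment-generating functions. Note that Python's `random.gauss(mu, sigma)` takes a standard deviation, so variance 4 corresponds to `sigma = 2`. The seed and sample size are arbitrary choices.

```python
import random

rng = random.Random(2009)            # arbitrary seed for reproducibility
n = 200_000
# random.gauss(mu, sigma) takes the standard deviation: variance 4 means sigma = 2.
z = [rng.gauss(2.0, 1.0) + rng.gauss(-3.0, 2.0) for _ in range(n)]

z_mean = sum(z) / n
z_var = sum((v - z_mean) ** 2 for v in z) / (n - 1)   # unbiased sample variance
print(f"sample mean {z_mean:.3f}, sample variance {z_var:.3f}")
```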
2 Linear Algebra
(1) (rank)
What is the rank of
$$\begin{bmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{bmatrix}?$$
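The rank in problem (1) can be checked by Gaussian elimination over the rationals, which avoids floating-point tolerance issues. This sketch assumes the nine entries of the matrix are listed row by row.

```python
from fractions import Fraction

def rank(matrix):
    """Rank via Gaussian elimination over the rationals (exact, no tolerance needed)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0                                   # index of the next pivot row
    for col in range(n_cols):
        # Find a row at or below r with a nonzero entry in this column.
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every row below the pivot.
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r                                # number of pivots found

M = [[1, 2, 1],
     [1, 0, 3],
     [1, 1, 2]]
print(rank(M))
```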
(2) (inverse)
What is the inverse of
$$\begin{bmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{bmatrix}?$$
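A Gauss-Jordan sketch for problem (2): it reduces [A | I] to [I | A⁻¹] with exact fractions, and multiplying the result back by A is a convenient check of a hand-computed inverse. As with the rank problem, it assumes the entries are listed row by row.

```python
from fractions import Fraction

def inverse(matrix):
    """Gauss-Jordan elimination on [A | I] over the rationals; raises if A is singular."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear this column in every other row.
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]         # right half is now A^{-1}

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[0, 2, 4],
     [2, 4, 2],
     [3, 3, 1]]
A_inv = inverse(A)
# Check: A times its inverse should be the identity.
print(matmul(A, A_inv))
```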
(3) (eigenvalues/eigenvectors)
What are the eigenvalues and eigenvectors of
$$\begin{bmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{bmatrix}?$$
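For problem (3), the eigenvalues are the roots of the characteristic polynomial det(λI − A). This sketch computes its coefficients for a 3×3 matrix from the trace, the sum of principal 2×2 minors, and the determinant, then searches for integer roots using the rational root theorem. It reports only distinct integer roots (no multiplicities, no irrational or complex roots), and `is_eigenpair` merely verifies a candidate eigenvector you supply. The helper names are illustrative.

```python
def char_poly_3x3(a):
    """Coefficients (1, c2, c1, c0) of det(lambda*I - A) for a 3x3 matrix A."""
    trace = a[0][0] + a[1][1] + a[2][2]
    # Sum of the three principal 2x2 minors.
    minors = (a[0][0] * a[1][1] - a[0][1] * a[1][0]
              + a[0][0] * a[2][2] - a[0][2] * a[2][0]
              + a[1][1] * a[2][2] - a[1][2] * a[2][1])
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
           - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
           + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    return (1, -trace, minors, -det)

def integer_roots(coeffs):
    """Distinct integer roots of a monic integer polynomial (they must divide c0)."""
    c0 = coeffs[-1]
    if c0 == 0:
        raise ValueError("this sketch assumes a nonzero constant term")
    candidates = [s * d for d in range(1, abs(c0) + 1) if c0 % d == 0 for s in (1, -1)]

    def evaluate(x):
        value = 0
        for c in coeffs:                    # Horner's rule
            value = value * x + c
        return value

    return sorted(x for x in candidates if evaluate(x) == 0)

def is_eigenpair(a, lam, v):
    """True iff v is nonzero and A v == lam * v (exact integer arithmetic)."""
    av = [sum(a[i][j] * v[j] for j in range(3)) for i in range(3)]
    return any(x != 0 for x in v) and av == [lam * x for x in v]

A = [[3, 1, 1],
     [2, 4, 2],
     [-1, -1, 1]]
coeffs = char_poly_3x3(A)
print(coeffs, integer_roots(coeffs))
```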
(4) (singular value decomposition)
For a real matrix $M$, let $M = U\Sigma V^T$ be its singular value decomposition. Define $M^\dagger = V\Sigma^\dagger U^T$, where $\Sigma^\dagger[i][j] = 1/\Sigma[i][j]$ when $\Sigma[i][j]$ is nonzero, and 0 otherwise. Prove that $MM^\dagger M = M$.
(5) (PD/PSD)
A symmetric real matrix $A$ is positive definite (PD) iff $x^T A x > 0$ for all $x \neq 0$, and positive semi-definite (PSD) if “>” is changed to “≥”. Prove:
(a) For any real matrix $Z$, $ZZ^T$ is PSD.
(b) A is PD iff all eigenvalues of A are strictly positive.
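Part (a) of problem (5) can be sanity-checked numerically. For random integer matrices Z (not necessarily square) and vectors x, the sketch evaluates x^T(ZZ^T)x and compares it with ‖Z^T x‖². Integer arithmetic keeps the comparison exact. This is a spot check, not a proof, and it does not address part (b).

```python
import random

def quad_form_zzt(z, x):
    """Compute x^T (Z Z^T) x by first forming Z Z^T explicitly."""
    n, m = len(z), len(z[0])
    zzt = [[sum(z[i][k] * z[j][k] for k in range(m)) for j in range(n)] for i in range(n)]
    return sum(x[i] * zzt[i][j] * x[j] for i in range(n) for j in range(n))

def norm_sq_zt_x(z, x):
    """Compute ||Z^T x||^2 directly."""
    n, m = len(z), len(z[0])
    zt_x = [sum(z[i][k] * x[i] for i in range(n)) for k in range(m)]
    return sum(v * v for v in zt_x)

rng = random.Random(1)
for _ in range(1000):
    n, m = rng.randint(1, 5), rng.randint(1, 5)      # Z is n x m, possibly non-square
    z = [[rng.randint(-5, 5) for _ in range(m)] for _ in range(n)]
    x = [rng.randint(-5, 5) for _ in range(n)]
    q = quad_form_zzt(z, x)
    assert q == norm_sq_zt_x(z, x) and q >= 0
```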
(6) (inner product)
Consider $x \in \mathbb{R}^d$ and some $u \in \mathbb{R}^d$ with $\|u\| = 1$.
What is the maximum value of $u^T x$?
What is the minimum value of $u^T x$?
What is the minimum value of $|u^T x|$?
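For problem (6), random unit directions give a feel for the range of u^T x. The sketch draws unit vectors by normalizing Gaussian samples and checks that every value stays within ±‖x‖ (Cauchy-Schwarz). It then constructs the directions that reach the endpoints and, assuming d ≥ 2, a unit vector orthogonal to x via one Gram-Schmidt step. The vector x is a hypothetical nonzero example.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def random_unit_vector(d, rng):
    """Uniformly random direction in R^d: normalize a standard Gaussian sample."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(v, v))
    return [t / norm for t in v]

rng = random.Random(0)
x = [3.0, -1.0, 2.0, 0.5]            # hypothetical nonzero example, d = 4
norm_x = math.sqrt(dot(x, x))

# Every sampled u^T x should lie in [-||x||, ||x||] (Cauchy-Schwarz).
samples = [dot(random_unit_vector(len(x), rng), x) for _ in range(20_000)]
assert all(abs(s) <= norm_x + 1e-12 for s in samples)

# The endpoints are attained by u = x/||x|| and u = -x/||x||.
u_plus = [t / norm_x for t in x]
u_minus = [-t for t in u_plus]

# For d >= 2, one Gram-Schmidt step on a basis vector gives a unit u orthogonal to x.
e = [0.0, 1.0, 0.0, 0.0]             # assumed not parallel to x
w = [ei - dot(e, x) / dot(x, x) * xi for ei, xi in zip(e, x)]
u_perp = [t / math.sqrt(dot(w, w)) for t in w]

print(min(samples), max(samples), dot(u_plus, x), dot(u_minus, x), dot(u_perp, x))
```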
(7) (distance)
Consider two parallel hyperplanes in $\mathbb{R}^d$:
$$H_1: w^T x = +3, \qquad H_2: w^T x = -2,$$
where $w$ is the normal vector. What is the distance between $H_1$ and $H_2$?
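Problem (7) can be checked on a concrete case. The sketch assumes a hypothetical normal vector w = (1, 2, 2) in R³ and places one point on each hyperplane along the direction of w. It then compares their separation with a general formula for the distance between parallel hyperplanes; `hyperplane_distance` is an illustrative helper name.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def hyperplane_distance(w, b1, b2):
    """Distance between the parallel hyperplanes w^T x = b1 and w^T x = b2 (w nonzero)."""
    return abs(b1 - b2) / math.sqrt(dot(w, w))

w = [1.0, 2.0, 2.0]                          # hypothetical normal vector, ||w|| = 3
# The point of each hyperplane closest to the origin is b * w / ||w||^2.
p1 = [3.0 * t / dot(w, w) for t in w]        # w^T p1 = 3, so p1 lies on H1
p2 = [-2.0 * t / dot(w, w) for t in w]       # w^T p2 = -2, so p2 lies on H2
# p1 - p2 is parallel to w, i.e., perpendicular to both planes, so its length is the gap.
direct = math.dist(p1, p2)
print(hyperplane_distance(w, 3.0, -2.0), direct)
```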
3 Calculus
(1) (differential)
Let $f(x) = \ln(1 + e^{-2x})$. What is $\frac{df(x)}{dx}$?
(2) (partial differential)
Let $f(x, y) = e^{x} + e^{2y} + e^{3xy^2}$. What is $\frac{\partial f(x, y)}{\partial y}$?
(3) (chain rule)
Let $f(x, y) = xy$, $x(u, v) = \cos(u + v)$, $y(u, v) = \sin(u - v)$. What is $\frac{\partial f}{\partial v}$?
(4) (integral)
What is $\int_{5}^{10} \frac{2}{x - 3}\,dx$?
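The closed forms for problems (1) through (4) can be checked numerically. The sketch below uses central finite differences for the derivatives and composite Simpson's rule for the integral. The evaluation points are arbitrary, and the integrand assumes the integral is of 2/(x − 3). Agreement to about six digits is a good sign; a mismatch points to an algebra slip.

```python
import math

def central_diff(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x); error is O(h^2) plus rounding."""
    return (f(x + h) - f(x - h)) / (2 * h)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def f1(x):
    """Problem (1): ln(1 + e^{-2x}); log1p is accurate when e^{-2x} is small."""
    return math.log1p(math.exp(-2 * x))

def f2(x, y):
    """Problem (2): e^x + e^{2y} + e^{3xy^2}."""
    return math.exp(x) + math.exp(2 * y) + math.exp(3 * x * y ** 2)

def f3(u, v):
    """Problem (3): f = xy with x = cos(u + v), y = sin(u - v)."""
    return math.cos(u + v) * math.sin(u - v)

x0, y0, u0, v0 = 0.3, 0.2, 0.4, 0.7              # arbitrary evaluation points
d1 = central_diff(f1, x0)                        # df/dx at x0
d2 = central_diff(lambda y: f2(x0, y), y0)       # partial in y, x held at x0
d3 = central_diff(lambda v: f3(u0, v), v0)       # partial in v, u held at u0
i4 = simpson(lambda x: 2 / (x - 3), 5.0, 10.0)   # problem (4)
print(d1, d2, d3, i4)
```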
(5) (gradient and Hessian)
Let $E(u, v) = (u e^{v} - 2 v e^{-u})^2$. Calculate the gradient $\nabla E$ and the Hessian $\nabla^2 E$ at $u = 1$ and $v = 1$.
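For problem (5), finite differences give an independent check on a hand-derived gradient and Hessian. The step sizes h are typical choices that balance truncation error against floating-point rounding, not tuned values.

```python
import math

def E(u, v):
    """E(u, v) = (u e^v - 2 v e^{-u})^2."""
    return (u * math.exp(v) - 2 * v * math.exp(-u)) ** 2

def numerical_gradient(f, u, v, h=1e-5):
    """Central differences for (df/du, df/dv)."""
    return ((f(u + h, v) - f(u - h, v)) / (2 * h),
            (f(u, v + h) - f(u, v - h)) / (2 * h))

def numerical_hessian(f, u, v, h=1e-4):
    """Second-order central differences for the symmetric 2x2 Hessian."""
    fuu = (f(u + h, v) - 2 * f(u, v) + f(u - h, v)) / h ** 2
    fvv = (f(u, v + h) - 2 * f(u, v) + f(u, v - h)) / h ** 2
    fuv = (f(u + h, v + h) - f(u + h, v - h)
           - f(u - h, v + h) + f(u - h, v - h)) / (4 * h ** 2)
    return [[fuu, fuv], [fuv, fvv]]

grad = numerical_gradient(E, 1.0, 1.0)
hess = numerical_hessian(E, 1.0, 1.0)
print(grad, hess)
```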
(6) (optimization)
For some given A > 0, B > 0, solve
$$\min_{\alpha} \; A e^{\alpha} + B e^{-2\alpha}.$$
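Because A e^α + B e^{−2α} is convex when A, B > 0 (a positive combination of convex exponentials), a simple ternary search on a wide bracket can check the closed-form minimizer you derive. The values A = 1 and B = 4 and the bracket [−20, 20] are hypothetical choices for illustration.

```python
import math

def objective(alpha, a, b):
    """A e^alpha + B e^{-2 alpha}."""
    return a * math.exp(alpha) + b * math.exp(-2 * alpha)

def ternary_search(f, lo, hi, iters=200):
    """Minimize a convex (hence unimodal) f on [lo, hi], keeping 2/3 of the bracket each step."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2          # minimizer cannot lie in (m2, hi]
        else:
            lo = m1          # minimizer cannot lie in [lo, m1)
    return (lo + hi) / 2

A, B = 1.0, 4.0                      # hypothetical positive constants
alpha_hat = ternary_search(lambda t: objective(t, A, B), -20.0, 20.0)
print(alpha_hat, objective(alpha_hat, A, B))
```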