Machine Learning Foundations (NTU, November 2016) instructor: Hsuan-Tien Lin
Homework #0
1 Probability and Statistics
(1) (combinatorics)
Let C(N, K) = 1 for K = 0 or K = N, and C(N, K) = C(N − 1, K) + C(N − 1, K − 1) for N ≥ 1.
Prove that C(N, K) = N! / (K!(N − K)!) for N ≥ 1 and 0 ≤ K ≤ N.
(2) (counting)
What is the probability of getting exactly 4 heads when flipping 10 fair coins?
What is the probability of getting a full house (XXXYY) when randomly drawing 5 cards out of a deck of 52 cards?
(3) (conditional probability)
If your friend flipped a fair coin three times and tells you that one of the tosses resulted in heads, what is the probability that all three tosses resulted in heads?
(4) (Bayes theorem)
A program selects a random integer X like this: a random bit is first generated uniformly. If the bit is 0, X is drawn uniformly from {0, 1, ..., 7}; otherwise, X is drawn uniformly from {0, −1, −2, −3}.
If we get an X from the program with |X| = 1, what is the probability that X is negative?
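As a sanity check on a hand-derived answer, the sampling procedure described in this problem can be simulated. The sketch below gives only a Monte Carlo estimate, not a derivation. The function names `sample_x` and `estimate_negative_given_abs_one`, the seed, and the trial count are illustrative choices and are not part of the assignment.

```python
import random


def sample_x(rng):
    """Draw X as described: a uniform random bit picks which set X comes from."""
    bit = rng.randrange(2)
    if bit == 0:
        return rng.randrange(8)      # uniform on {0, 1, ..., 7}
    return -rng.randrange(4)         # uniform on {0, -1, -2, -3}


def estimate_negative_given_abs_one(trials=200_000, seed=0):
    """Estimate P(X < 0 | |X| = 1) by counting over simulated draws."""
    rng = random.Random(seed)
    hits = 0        # draws with |X| = 1
    negative = 0    # of those, draws with X < 0
    for _ in range(trials):
        x = sample_x(rng)
        if abs(x) == 1:
            hits += 1
            if x < 0:
                negative += 1
    return negative / hits


if __name__ == "__main__":
    print(estimate_negative_given_abs_one())
```

The estimate should agree with the value obtained from Bayes' theorem up to sampling noise; a noticeable gap suggests an error in the derivation.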
(5) (union/intersection)
If P (A) = 0.3 and P (B) = 0.4,
what is the maximum possible value of P (A ∩ B)?
what is the minimum possible value of P (A ∩ B)?
what is the maximum possible value of P (A ∪ B)?
what is the minimum possible value of P (A ∪ B)?
2 Linear Algebra
(1) (rank)
What is the rank of
1 2 1
1 0 3
1 1 2
?
(2) (inverse)
What is the inverse of
0 2 4
2 4 2
3 3 1
?
(3) (eigenvalues/eigenvectors)
What are the eigenvalues and eigenvectors of
3 1 1
2 4 2
−1 −1 1
?
(4) (singular value decomposition)
(a) For a real matrix M, let M = UΣV^T be its singular value decomposition. Define M† = VΣ†U^T, where Σ†[i][j] = 1/Σ[i][j] when Σ[i][j] is nonzero, and 0 otherwise. Prove that MM†M = M.
(b) If M is invertible, prove that M† = M^(−1).
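The construction of M† in part (a) can be tried numerically before attempting the proof. The following is a minimal sketch assuming NumPy is available and M is square, so that Σ and Σ† have the same shape. The helper name `pseudo_inverse` and the tolerance `tol`, used to decide which singular values count as nonzero in floating point, are assumptions made for illustration.

```python
import numpy as np


def pseudo_inverse(M, tol=1e-12):
    """Build M† = V Σ† U^T from the SVD M = U Σ V^T (square M assumed)."""
    U, s, Vt = np.linalg.svd(M)          # s holds the singular values
    # Invert nonzero singular values; leave (numerically) zero ones at 0.
    s_dagger = np.array([1.0 / x if x > tol else 0.0 for x in s])
    return Vt.T @ np.diag(s_dagger) @ U.T


if __name__ == "__main__":
    # A singular example: the second row is twice the first.
    M = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],
                  [1.0, 0.0, 1.0]])
    M_dagger = pseudo_inverse(M)
    print(np.allclose(M @ M_dagger @ M, M))
```

A numerical check like this illustrates the identities on one matrix only; it does not replace the proofs asked for in (a) and (b).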
(5) (PD/PSD)
A symmetric real matrix A is positive definite (PD) iff x^T A x > 0 for all x ≠ 0, and positive semi-definite (PSD) if “>” is changed to “≥”. Prove:
(a) For any real matrix Z, ZZ^T is PSD.
(b) A symmetric A is PD iff all eigenvalues of A are strictly positive.
(6) (inner product)
Consider x ∈ R^d and some u ∈ R^d with ‖u‖ = 1.
What is the maximum value of u^T x? What u results in the maximum value?
What is the minimum value of u^T x? What u results in the minimum value?
What is the minimum value of |u^T x|? What u results in the minimum value?
3 Calculus
(1) (differential and partial differential)
Let f(x) = ln(1 + e^(−2x)). What is df(x)/dx?
Let g(x, y) = e^x + e^(2y) + e^(3xy^2). What is ∂g(x, y)/∂y?
(2) (chain rule)
Let f(x, y) = xy, x(u, v) = cos(u + v), y(u, v) = sin(u − v). What is ∂f/∂v?
(3) (gradient and Hessian)
Let E(u, v) = (ue^v − 2ve^(−u))^2. Calculate the gradient ∇E and the Hessian ∇^2 E at u = 1 and v = 1.
(4) (Taylor’s expansion)
Let E(u, v) = (ue^v − 2ve^(−u))^2. Write down the second-order Taylor’s expansion of E around u = 1 and v = 1.
(5) (optimization)
For some given A > 0, B > 0, solve
min_α Ae^α + Be^(−2α).
(6) (vector calculus)
Let w be a vector in R^d and E(w) = (1/2) w^T A w + b^T w for some symmetric matrix A and vector b.
Prove that the gradient ∇E(w) = Aw + b and the Hessian ∇^2 E(w) = A.
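The gradient formula in this problem can be compared against finite differences as a quick plausibility check. The sketch below assumes NumPy; the helper `numerical_gradient`, the step size `h`, and the randomly generated A, b, and w are illustrative assumptions and are not part of the assignment.

```python
import numpy as np


def E(w, A, b):
    """E(w) = (1/2) w^T A w + b^T w."""
    return 0.5 * w @ A @ w + b @ w


def numerical_gradient(f, w, h=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = h
        grad[i] = (f(w + step) - f(w - step)) / (2 * h)
    return grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((4, 4))
    A = Z + Z.T                      # symmetrize so that A = A^T
    b = rng.standard_normal(4)
    w = rng.standard_normal(4)
    approx = numerical_gradient(lambda v: E(v, A, b), w)
    print(np.allclose(approx, A @ w + b, atol=1e-5))
```

Agreement at a few random points is evidence that the formula is right, but the problem still asks for a proof, for example by expanding E(w) componentwise.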