Machine Learning Foundations/Techniques (NTU, Fall 2020) instructor: Hsuan-Tien Lin
Homework #0
RELEASE DATE: 09/15/2020 DUE DATE: NONE
1 Probability and Statistics
(1) (foundations: combinatorics)
Let C(N, K) = 1 for K = 0 or K = N, and C(N, K) = C(N − 1, K) + C(N − 1, K − 1) for 1 ≤ K ≤ N − 1.
Prove that $C(N, K) = \frac{N!}{K!(N-K)!}$ for N ≥ 1 and 0 ≤ K ≤ N.
(2) (foundations: counting)
What is the probability of getting exactly 4 heads when flipping 10 fair coins?
What is the probability of getting a full house (XXXYY) when randomly drawing 5 cards out of a deck of 52 cards?
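The recursion in (1) and the counts in (2) can be sanity-checked numerically. The sketch below is in Python (our choice; the handout contains no code), and `c_rec` and `c_closed` are hypothetical helper names. It compares the recursive definition with the factorial formula for small N, and computes both probabilities as exact fractions, counting a full house as: pick the rank of the triple and 3 of its 4 suits, then a different rank for the pair and 2 of its 4 suits.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def c_rec(n: int, k: int) -> int:
    """C(N, K) from the recursive definition in (1)."""
    if k == 0 or k == n:
        return 1
    return c_rec(n - 1, k) + c_rec(n - 1, k - 1)


def c_closed(n: int, k: int) -> int:
    """The closed form N! / (K! (N - K)!) that (1) asks you to prove."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# (1) The two definitions agree for all 1 <= N <= 20 and 0 <= K <= N.
assert all(c_rec(n, k) == c_closed(n, k) for n in range(1, 21) for k in range(n + 1))

# (2a) Exactly 4 heads among 10 fair coins: C(10, 4) favorable of 2^10 outcomes.
p_four_heads = Fraction(comb(10, 4), 2 ** 10)

# (2b) Full house: rank and 3 suits for the triple, another rank and 2 suits for the pair.
full_houses = 13 * comb(4, 3) * 12 * comb(4, 2)
p_full_house = Fraction(full_houses, comb(52, 5))

print(p_four_heads, float(p_four_heads))
print(p_full_house, float(p_full_house))
```

Using `Fraction` keeps the answers exact rather than rounded floats.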
(3) (foundations: conditional probability)
If your friend flipped a fair coin three times and tells you that at least one of the tosses resulted in heads, what is the probability that all three tosses resulted in heads?
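A brute-force enumeration makes the conditioning explicit. This Python sketch assumes the friend's statement means "at least one toss is heads" and simply counts the equally likely outcomes that survive the condition.

```python
from fractions import Fraction
from itertools import product

# All 2^3 = 8 equally likely outcomes of three fair tosses.
outcomes = list(product("HT", repeat=3))

# Keep only outcomes consistent with "at least one toss is heads".
given = [o for o in outcomes if "H" in o]
all_heads = [o for o in given if o == ("H", "H", "H")]

# Conditional probability = favorable outcomes / surviving outcomes.
p_all_heads = Fraction(len(all_heads), len(given))
print(len(given), p_all_heads)
```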
(4) (foundations: Bayes theorem)
A program selects a random integer X like this: a random bit is first generated uniformly. If the bit is 0, X is drawn uniformly from {0, 1, . . . , 7}; otherwise, X is drawn uniformly from {0, −1, −2, −3}.
If we get an X from the program with |X| = 1, what is the probability that X is negative?
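The question can also be answered by building the full distribution of X and applying Bayes' rule directly. A minimal Python sketch with exact fractions; `p_x` is a hypothetical name for the probability table P(X = x).

```python
from fractions import Fraction

# p_x[x] = P(X = x): choose a branch with probability 1/2, then X uniformly within it.
p_x = {}
for x in range(0, 8):            # bit 0: X uniform on {0, 1, ..., 7}
    p_x[x] = p_x.get(x, Fraction(0)) + Fraction(1, 2) * Fraction(1, 8)
for x in range(0, -4, -1):       # bit 1: X uniform on {0, -1, -2, -3}
    p_x[x] = p_x.get(x, Fraction(0)) + Fraction(1, 2) * Fraction(1, 4)

# Bayes' rule: P(X < 0 | |X| = 1) = P(X = -1) / (P(X = 1) + P(X = -1)).
evidence = sum(p for x, p in p_x.items() if abs(x) == 1)
p_negative = sum(p for x, p in p_x.items() if abs(x) == 1 and x < 0) / evidence
print(evidence, p_negative)
```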
(5) (foundations: union/intersection) If P(A) = 0.3 and P(B) = 0.4,
what is the maximum possible value of P(A ∩ B)?
what is the minimum possible value of P(A ∩ B)?
what is the maximum possible value of P(A ∪ B)?
what is the minimum possible value of P(A ∪ B)?
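The four extremes follow from inclusion-exclusion, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), together with 0 ≤ P(·) ≤ 1 and monotonicity. A small Python sketch of those bounds (the function names are hypothetical); since P(A) + P(B) ≤ 1 here, the extremes are attained by taking A ⊆ B or by taking A and B disjoint.

```python
from fractions import Fraction


def intersection_bounds(pa, pb):
    # Lower: P(A) + P(B) - P(A ∪ B) >= P(A) + P(B) - 1; upper: A ∩ B lies inside both events.
    return max(Fraction(0), pa + pb - 1), min(pa, pb)


def union_bounds(pa, pb):
    # Lower: A ∪ B contains both events; upper: P(A) + P(B), capped at 1.
    return max(pa, pb), min(Fraction(1), pa + pb)


pa, pb = Fraction(3, 10), Fraction(4, 10)
print(intersection_bounds(pa, pb))
print(union_bounds(pa, pb))
```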
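(6) (techniques: mean/variance) Let the mean be $\bar{X} = \frac{1}{N}\sum_{n=1}^{N} X_n$ and the variance be $\sigma_X^2 = \frac{1}{N-1}\sum_{n=1}^{N} (X_n - \bar{X})^2$. Prove that
$$\sigma_X^2 = \frac{N}{N-1}\left(\frac{1}{N}\sum_{n=1}^{N} X_n^2 - \bar{X}^2\right).$$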
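A quick numerical check of the identity, assuming Python and synthetic Gaussian data (any real numbers would do). `statistics.variance` also uses the 1/(N − 1) normalization, so all three printed values should agree up to round-off.

```python
import random
import statistics

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(50)]   # synthetic data for the check
n = len(xs)
mean = sum(xs) / n

# Definition: squared deviations normalized by 1/(N - 1).
lhs = sum((x - mean) ** 2 for x in xs) / (n - 1)
# Claimed identity: N/(N - 1) times (mean of squares minus squared mean).
rhs = n / (n - 1) * (sum(x * x for x in xs) / n - mean ** 2)

print(lhs, rhs, statistics.variance(xs))
```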
2 Linear Algebra
(1) (foundations: rank) What is the rank of
$$\begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{pmatrix}?$$
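For a concrete matrix, the rank can be confirmed numerically. This sketch assumes NumPy is available; `np.linalg.matrix_rank` estimates the rank from singular values with a tolerance, which is reliable for a small integer matrix like this one. The explicit row relation shows why the rank drops.

```python
import numpy as np

M = np.array([[1, 2, 1],
              [1, 0, 3],
              [1, 1, 2]])

# matrix_rank counts singular values above a small tolerance.
print(np.linalg.matrix_rank(M))

# The third row is the average of the first two, so the rows are linearly dependent.
print(np.array_equal(2 * M[2], M[0] + M[1]))
```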
(2) (foundations: inverse) What is the inverse of
$$\begin{pmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{pmatrix}?$$
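A numerical cross-check for a hand-computed inverse, assuming NumPy: the determinant must be nonzero for the inverse to exist, and both products with the inverse should give the identity.

```python
import numpy as np

A = np.array([[0, 2, 4],
              [2, 4, 2],
              [3, 3, 1]], dtype=float)

print(np.linalg.det(A))        # nonzero, so A is invertible
A_inv = np.linalg.inv(A)
print(A_inv)

# A^{-1} must be a two-sided inverse.
print(np.allclose(A @ A_inv, np.eye(3)), np.allclose(A_inv @ A, np.eye(3)))
```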
(3) (foundations: eigenvalues/eigenvectors) What are the eigenvalues and eigenvectors of
$$\begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{pmatrix}?$$
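NumPy's `np.linalg.eig` returns eigenvalues and unit-norm eigenvectors (as columns), which is handy for checking a hand derivation (NumPy assumed). For a repeated eigenvalue the returned eigenvectors are only one basis of the eigenspace, so they can differ from a hand-picked basis while still being correct; the check A v = λ v is what matters.

```python
import numpy as np

A = np.array([[3, 1, 1],
              [2, 4, 2],
              [-1, -1, 1]], dtype=float)

# w holds the eigenvalues; column j of V is a unit-norm eigenvector for w[j].
w, V = np.linalg.eig(A)
for lam, v in zip(w, V.T):
    # Every returned pair should satisfy A v = lambda v up to round-off.
    print(np.round(lam, 6), np.round(v, 4), np.allclose(A @ v, lam * v))
```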
(4) (foundations: singular value decomposition)
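(a) For a real m × n matrix M, let $M = U \Sigma V^T$ be its singular value decomposition. Define $M^\dagger = V \Sigma^\dagger U^T$, where $\Sigma^\dagger$ is the n × m matrix with $\Sigma^\dagger[i][j] = 1/\Sigma[j][i]$ when $\Sigma[j][i]$ is nonzero, and 0 otherwise. Prove that $M M^\dagger M = M$.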
(b) If M is invertible, prove that $M^\dagger = M^{-1}$.
(5) (foundations: PD/PSD)
A symmetric real matrix A is positive definite (PD) iff $x^T A x > 0$ for all $x \neq 0$, and positive semi-definite (PSD) if “>” is changed to “≥”. Prove:
(a) For any real matrix Z, $Z Z^T$ is PSD.
(b) A symmetric A is PD iff all eigenvalues of A are strictly positive.
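The definitions in (4) and (5) translate directly into NumPy. In this sketch (NumPy assumed; `pinv_from_svd` and `is_pd` are hypothetical helper names), Σ† is built with the transposed shape n × m, and "nonzero" singular values are decided with a relative tolerance, an assumption needed in floating point. The PSD and PD checks use `np.linalg.eigvalsh`, which is meant for symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(0)


def pinv_from_svd(M, rtol=1e-10):
    """M^dagger = V Sigma^dagger U^T; singular values below rtol * max are treated as zero."""
    U, s, Vt = np.linalg.svd(M)          # full SVD: U is m x m, Vt is n x n
    m, n = M.shape
    sigma_dag = np.zeros((n, m))         # Sigma^dagger has the transposed shape n x m
    cutoff = rtol * s.max()
    for i, sv in enumerate(s):
        if sv > cutoff:
            sigma_dag[i, i] = 1.0 / sv
    return Vt.T @ sigma_dag @ U.T


# (4a) A rank-deficient 4 x 3 matrix: M M^dagger M should reproduce M.
M = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
M_dag = pinv_from_svd(M)
print(np.allclose(M @ M_dag @ M, M))

# (4b) For an invertible square matrix the construction gives the ordinary inverse.
B = rng.standard_normal((3, 3)) + 3 * np.eye(3)
print(np.allclose(pinv_from_svd(B), np.linalg.inv(B)))

# (5a) Z Z^T is PSD: its eigenvalues are >= 0 up to round-off.
Z = rng.standard_normal((4, 2))
print(np.linalg.eigvalsh(Z @ Z.T).min() >= -1e-10)


# (5b) A symmetric matrix is PD iff its smallest eigenvalue is strictly positive.
def is_pd(A):
    return bool(np.linalg.eigvalsh(A).min() > 0)


print(is_pd(np.array([[2.0, 1.0], [1.0, 2.0]])), is_pd(np.array([[1.0, 2.0], [2.0, 1.0]])))
```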
(6) (foundations: inner product)
Consider $x \in \mathbb{R}^d$ and some $u \in \mathbb{R}^d$ with $\|u\| = 1$.
What is the maximum value of $u^T x$? What u results in the maximum value?
What is the minimum value of $u^T x$? What u results in the minimum value?
What is the minimum value of $|u^T x|$? What u results in the minimum value?
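A Monte Carlo sketch (NumPy assumed) that illustrates the Cauchy-Schwarz bound |u^T x| ≤ ‖u‖ ‖x‖ = ‖x‖ and the unit vectors that attain its extremes, assuming x ≠ 0 and d ≥ 2. For d = 1 the only unit vectors are u = ±1, so the minimum of |ux| is |x| rather than 0.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
norm_x = np.linalg.norm(x)

# Sample many random unit vectors u and record u^T x.
U = rng.standard_normal((100_000, 5))
U /= np.linalg.norm(U, axis=1, keepdims=True)
vals = U @ x

# Every sample lies in [-||x||, ||x||].
print(vals.max(), vals.min(), norm_x)

# The extremes are attained at u = x / ||x|| and u = -x / ||x||.
u_star = x / norm_x
print(u_star @ x, -u_star @ x)

# For d >= 2, a unit u orthogonal to x (one Gram-Schmidt step) gives |u^T x| = 0.
v = rng.standard_normal(5)
v -= (v @ u_star) * u_star
v /= np.linalg.norm(v)
print(abs(v @ x))
```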
(7) (foundations: distance)
Consider two parallel hyperplanes in $\mathbb{R}^d$:
$H_1: w^T x = +3$, $H_2: w^T x = -2$,
where w is the normal vector. What is the distance between $H_1$ and $H_2$?
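One way to see the distance: start at a point on $H_1$, move along the normal direction w, and measure how far you travel before $w^T x$ drops from 3 to −2. Because that segment is parallel to w, it is perpendicular to both hyperplanes. The NumPy sketch below (a random nonzero w, an assumption for illustration) does exactly that and compares the step length with 5/‖w‖.

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.standard_normal(4)            # an arbitrary nonzero normal vector

# A point on H1 (w^T x = 3): the multiple of w that satisfies the equation.
x1 = 3.0 * w / (w @ w)
# Move along -w until w^T x has dropped by 3 - (-2) = 5, landing on H2.
x2 = x1 - 5.0 * w / (w @ w)

print(w @ x1, w @ x2)
# The step is parallel to w, hence perpendicular to both hyperplanes.
print(np.linalg.norm(x1 - x2), 5.0 / np.linalg.norm(w))
```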
3 Calculus
(1) (foundations: differential and partial differential) Let $f(x) = \ln(1 + e^{-2x})$. What is $\frac{df(x)}{dx}$? Let $g(x, y) = e^{x} + e^{2y} + e^{3xy^2}$. What is $\frac{\partial g(x, y)}{\partial y}$?
(2) (foundations: chain rule)
Let $f(x, y) = xy$, $x(u, v) = \cos(u + v)$, $y(u, v) = \sin(u - v)$. What is $\frac{\partial f}{\partial v}$?
(3) (foundations: integral)
What is $\int_{5}^{10} \frac{2}{x - 3}\, dx$?
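Hand-derived answers to (1), (2), and (3) can be checked numerically with central finite differences and a midpoint quadrature rule. The candidate formulas in this Python sketch were worked out for illustration and are not given in the handout; g is read as $e^{x} + e^{2y} + e^{3xy^2}$, and the evaluation points are arbitrary.

```python
import math


def central_diff(f, t, h=1e-5):
    """Symmetric finite-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


# (1) f(x) = ln(1 + e^{-2x}); candidate derivative -2 / (1 + e^{2x}).
f = lambda x: math.log(1 + math.exp(-2 * x))
df = lambda x: -2 / (1 + math.exp(2 * x))

# (1) g(x, y) = e^x + e^{2y} + e^{3xy^2}; candidate dg/dy = 2e^{2y} + 6xy e^{3xy^2}.
g = lambda x, y: math.exp(x) + math.exp(2 * y) + math.exp(3 * x * y ** 2)
dg_dy = lambda x, y: 2 * math.exp(2 * y) + 6 * x * y * math.exp(3 * x * y ** 2)

# (2) f = xy with x = cos(u + v), y = sin(u - v); candidate df/dv = -cos(2v).
f_uv = lambda u, v: math.cos(u + v) * math.sin(u - v)
df_dv = lambda u, v: -math.cos(2 * v)

x0, y0, u0, v0 = 0.3, -0.4, 0.7, 0.2
print(central_diff(f, x0), df(x0))
print(central_diff(lambda y: g(x0, y), y0), dg_dy(x0, y0))
print(central_diff(lambda v: f_uv(u0, v), v0), df_dv(u0, v0))

# (3) Composite midpoint rule for the integral of 2 / (x - 3) on [5, 10],
# compared with the antiderivative 2 ln(x - 3) evaluated at the endpoints.
n = 100_000
h = (10 - 5) / n
midpoint = sum(2 / (5 + (i + 0.5) * h - 3) for i in range(n)) * h
print(midpoint, 2 * math.log(7 / 2))
```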
(4) (foundations: gradient and Hessian)
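Let $E(u, v) = (u e^{v} - 2 v e^{-u})^2$. Calculate the gradient
$$\nabla E(u, v) = \begin{pmatrix} \frac{\partial E}{\partial u} \\ \frac{\partial E}{\partial v} \end{pmatrix}$$
and the Hessian
$$H(u, v) = \begin{pmatrix} \frac{\partial^2 E}{\partial u \partial u} & \frac{\partial^2 E}{\partial u \partial v} \\ \frac{\partial^2 E}{\partial v \partial u} & \frac{\partial^2 E}{\partial v \partial v} \end{pmatrix}$$
at u = 1 and v = 1.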
(5) (foundations: Taylor’s expansion)
Let $E(u, v) = (u e^{v} - 2 v e^{-u})^2$. Write down the second-order Taylor’s expansion of E around u = 1 and v = 1.
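For (4) and (5), writing E = r² with r(u, v) = u e^v − 2v e^{−u} gives the derivatives through the product rule: ∇E = 2r∇r and ∇²E = 2(∇r ∇r^T + r ∇²r). The Python sketch below codes those formulas (our derivation, not from the handout), checks the gradient at (1, 1) against central differences, and confirms that the error of the second-order Taylor model shrinks roughly like t³ along the step (t, t).

```python
import math


def E(u, v):
    return (u * math.exp(v) - 2 * v * math.exp(-u)) ** 2


def grad_hess(u, v):
    """Gradient and Hessian of E = r^2, with r = u e^v - 2 v e^{-u}."""
    r = u * math.exp(v) - 2 * v * math.exp(-u)
    r_u = math.exp(v) + 2 * v * math.exp(-u)
    r_v = u * math.exp(v) - 2 * math.exp(-u)
    r_uu = -2 * v * math.exp(-u)
    r_uv = math.exp(v) + 2 * math.exp(-u)
    r_vv = u * math.exp(v)
    grad = (2 * r * r_u, 2 * r * r_v)
    h_uv = 2 * (r_u * r_v + r * r_uv)
    hess = ((2 * (r_u ** 2 + r * r_uu), h_uv),
            (h_uv, 2 * (r_v ** 2 + r * r_vv)))
    return grad, hess


def taylor2(du, dv, u0=1.0, v0=1.0):
    """Second-order Taylor model of E around (u0, v0), evaluated at (u0 + du, v0 + dv)."""
    (g_u, g_v), ((h_uu, h_uv), (_, h_vv)) = grad_hess(u0, v0)
    return (E(u0, v0) + g_u * du + g_v * dv
            + 0.5 * (h_uu * du ** 2 + 2 * h_uv * du * dv + h_vv * dv ** 2))


# (4) Compare the analytic gradient with central differences at (1, 1).
(gu, gv), hess = grad_hess(1.0, 1.0)
h = 1e-6
print(gu, (E(1 + h, 1) - E(1 - h, 1)) / (2 * h))
print(gv, (E(1, 1 + h) - E(1, 1 - h)) / (2 * h))
print(hess)

# (5) The Taylor model's error along (t, t) should shrink roughly like t^3.
for t in (1e-1, 1e-2, 1e-3):
    print(t, abs(E(1 + t, 1 + t) - taylor2(t, t)))
```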
(6) (foundations: optimization) For some given A > 0, B > 0, solve
$$\min_{\alpha}\; A e^{\alpha} + B e^{-2\alpha}.$$
(7) (foundations: vector calculus)
Let w be a vector in $\mathbb{R}^d$ and $E(w) = \frac{1}{2} w^T A w + b^T w$ for some symmetric matrix A and vector b.
Prove that the gradient $\nabla E(w) = Aw + b$ and the Hessian $\nabla^2 E(w) = A$.
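Numerical spot checks for (6) and (7), in Python with NumPy assumed. For (6), `coef_a = 2` and `coef_b = 5` are hypothetical example values for A and B, and ternary search is used only because the objective is strictly convex (hence unimodal); the closed form in the comment comes from setting the derivative to zero. For (7), a random symmetric matrix is compared against central differences of E, which are exact for a quadratic up to round-off.

```python
import math
import numpy as np

# (6) Example constants standing in for A, B > 0 (hypothetical values).
coef_a, coef_b = 2.0, 5.0


def F(alpha):
    return coef_a * math.exp(alpha) + coef_b * math.exp(-2 * alpha)


def ternary_min(f, lo, hi, iters=200):
    """Shrink [lo, hi] around the minimizer of a unimodal f."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


alpha_num = ternary_min(F, -10.0, 10.0)
# Stationary point of F'(alpha) = A e^alpha - 2B e^{-2 alpha} = 0.
alpha_closed = math.log(2 * coef_b / coef_a) / 3
print(alpha_num, alpha_closed)

# (7) Compare grad E = A w + b and Hessian = A with finite differences.
rng = np.random.default_rng(3)
d = 4
S = rng.standard_normal((d, d))
A = (S + S.T) / 2                     # a random symmetric matrix
b = rng.standard_normal(d)


def E(w):
    return 0.5 * w @ A @ w + b @ w


w0 = rng.standard_normal(d)
h = 1e-4
eye = np.eye(d)
num_grad = np.array([(E(w0 + h * e) - E(w0 - h * e)) / (2 * h) for e in eye])
num_hess = np.array([[(E(w0 + h * (eye[i] + eye[j])) - E(w0 + h * (eye[i] - eye[j]))
                       - E(w0 - h * (eye[i] - eye[j])) + E(w0 - h * (eye[i] + eye[j])))
                      / (4 * h * h) for j in range(d)] for i in range(d)])
print(np.allclose(num_grad, A @ w0 + b), np.allclose(num_hess, A, atol=1e-5))
```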
(8) (foundations: quadratic programming)
Following the previous question, if A is not only symmetric but also positive definite (PD), prove that the solution of $\operatorname{argmin}_w E(w)$ is $-A^{-1}b$.
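A NumPy sketch under the assumption of a randomly generated PD matrix (built as S S^T + dI): solving Aw = −b gives the stationary point, and because E(w* + δ) − E(w*) = ½ δ^T A δ > 0 for δ ≠ 0, random perturbations should never decrease E. Solving the linear system is generally preferred over forming A^{-1}; the explicit inverse appears only for comparison.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
S = rng.standard_normal((d, d))
A = S @ S.T + d * np.eye(d)      # symmetric positive definite by construction
b = rng.standard_normal(d)


def E(w):
    return 0.5 * w @ A @ w + b @ w


# Stationary point: grad E(w) = A w + b = 0, solved without forming A^{-1}.
w_star = np.linalg.solve(A, -b)

# For PD A, E(w* + delta) - E(w*) = 0.5 delta^T A delta > 0 whenever delta != 0.
deltas = rng.standard_normal((1000, d))
print(all(E(w_star + dl) > E(w_star) for dl in deltas))
print(np.allclose(w_star, -np.linalg.inv(A) @ b))
```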
(9) (techniques: optimization with linear constraint) Consider
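$$\min_{w_1, w_2, w_3}\; \frac{1}{2}\left(w_1^2 + 2 w_2^2 + 3 w_3^2\right) \quad \text{subject to} \quad w_1 + w_2 + w_3 = 11.$$
Refresh your memory on “Lagrange multipliers” and show that the optimal solution must satisfy $w_1 = \lambda$, $2 w_2 = \lambda$, $3 w_3 = \lambda$ for some λ. Use this property to solve the problem.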
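With exact fractions, the stationarity conditions reduce the problem to one equation in λ. This Python sketch assumes the Lagrangian L(w, λ) = ½(w1² + 2w2² + 3w3²) − λ(w1 + w2 + w3 − 11); substituting w_i = λ/c_i with c = (1, 2, 3) into the constraint determines λ.

```python
from fractions import Fraction

# Stationarity of L gives c_i * w_i = lam, i.e. w_i = lam / c_i, with c = (1, 2, 3).
c = [Fraction(1), Fraction(2), Fraction(3)]

# Substitute into the constraint: lam * (1/1 + 1/2 + 1/3) = 11.
lam = Fraction(11) / sum(1 / ci for ci in c)
w = [lam / ci for ci in c]

# Objective value 1/2 * sum_i c_i w_i^2 at the stationary point.
objective = sum(ci * wi ** 2 for ci, wi in zip(c, w)) / 2
print(lam, w, objective)
```

Because the objective is a strictly convex quadratic and the constraint is linear, this stationary point is the unique minimizer.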
(10) (techniques: optimization with linear constraints)
Let w be a vector in $\mathbb{R}^d$ and E(w) be a convex differentiable function of w. Prove that the optimal solution to
$$\min_w E(w) \quad \text{subject to} \quad Aw + b = 0$$
must satisfy $\nabla E(w) + A^T \lambda = 0$ for some vector λ. (Hint: If not, let u be the residual when projecting $\nabla E(w)$ onto the span of the rows of A. Show that for some very small η > 0, w − η · u is a feasible solution that improves E.)
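The hint's argument can be watched on a concrete instance. This NumPy sketch assumes the convex objective E(w) = ½‖w − c‖² and random constraint data (all hypothetical). It solves the stationarity condition, written in column-vector form as ∇E(w) + A^T λ = 0, together with Aw + b = 0; it then takes a feasible but non-optimal point, projects ∇E onto the row space of A, and verifies that stepping against the residual u keeps feasibility (Au = 0) while lowering E.

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 5, 2
A = rng.standard_normal((k, d))          # k linear constraints A w + b = 0
b = rng.standard_normal(k)
c = rng.standard_normal(d)


def E(w):
    # Example convex objective (an assumption for this sketch).
    return 0.5 * np.sum((w - c) ** 2)


def grad_E(w):
    return w - c


def row_space_residual(g):
    # Component of g orthogonal to the span of the rows of A (least-squares projection).
    lam, *_ = np.linalg.lstsq(A.T, g, rcond=None)
    return g - A.T @ lam


# Solve the KKT system: grad E(w) + A^T lam = 0 and A w + b = 0.
K = np.block([[np.eye(d), A.T], [A, np.zeros((k, k))]])
sol = np.linalg.solve(K, np.concatenate([c, -b]))
w_opt, lam_opt = sol[:d], sol[d:]
print(np.allclose(A @ w_opt + b, 0), np.allclose(grad_E(w_opt) + A.T @ lam_opt, 0))

# The hint's argument at a feasible but non-optimal point.
w_feas = w_opt + row_space_residual(rng.standard_normal(d))   # still satisfies A w + b = 0
u = row_space_residual(grad_E(w_feas))                        # nonzero residual
eta = 1e-3
w_new = w_feas - eta * u
print(np.allclose(A @ w_new + b, 0), E(w_new) < E(w_feas))
```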