Machine Learning Foundations/Techniques (NTU, Fall 2020) instructor: Hsuan-Tien Lin

Homework #0

RELEASE DATE: 09/15/2020 DUE DATE: NONE

1 Probability and Statistics

(1) (foundations: combinatorics)

Let C(N, K) = 1 for K = 0 or K = N , and C(N, K) = C(N − 1, K) + C(N − 1, K − 1) for N ≥ 1.

Prove that $C(N, K) = \frac{N!}{K!\,(N-K)!}$ for N ≥ 1 and 0 ≤ K ≤ N.

(2) (foundations: counting)

What is the probability of getting exactly 4 heads when flipping 10 fair coins?

What is the probability of getting a full house (XXXYY) when randomly drawing 5 cards out of a deck of 52 cards?
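A hand-derived answer to questions (1) and (2) can be sanity-checked numerically. The sketch below is only an illustration, not a proof: it compares the recurrence from question (1) with the factorial formula for small N, and counts coin-flip outcomes by brute-force enumeration. The function names (`C`, `closed_form`, `recurrence_matches`, `count_heads_outcomes`) are hypothetical helpers introduced here.

```python
from itertools import product
from math import factorial


def C(n, k):
    """Recurrence from question (1): 1 on the boundary, Pascal's rule inside."""
    if k == 0 or k == n:
        return 1
    return C(n - 1, k) + C(n - 1, k - 1)


def closed_form(n, k):
    """Factorial formula N! / (K! (N - K)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def recurrence_matches(max_n=12):
    """Check that both definitions agree for 1 <= N <= max_n and 0 <= K <= N."""
    return all(
        C(n, k) == closed_form(n, k)
        for n in range(1, max_n + 1)
        for k in range(n + 1)
    )


def count_heads_outcomes(flips, heads):
    """Count flip sequences with exactly `heads` heads by enumerating all 2**flips."""
    return sum(1 for seq in product("HT", repeat=flips) if seq.count("H") == heads)
```

Dividing `count_heads_outcomes(10, 4)` by `2 ** 10` gives a probability to compare against a hand-computed answer. Agreement on small cases does not replace the requested proof.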

(3) (foundations: conditional probability)

If your friend flipped a fair coin three times and tells you that one of the tosses resulted in heads, what is the probability that all three tosses resulted in heads?

(4) (foundations: Bayes theorem)

A program selects a random integer X like this: a random bit is first generated uniformly. If the bit is 0, X is drawn uniformly from {0, 1, . . . , 7}; otherwise, X is drawn uniformly from {0, −1, −2, −3}.

If we get an X from the program with |X| = 1, what is the probability that X is negative?
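One way to check a Bayes-rule calculation like this is to enumerate the joint distribution exactly. The sketch below does so with exact fractions. The helper names (`joint_distribution`, `conditional`) are assumptions made for illustration and are not part of the assignment.

```python
from fractions import Fraction


def joint_distribution():
    """Return {(bit, x): probability} for the two-stage program in question (4)."""
    joint = {}
    for x in range(0, 8):          # bit 0: X uniform on {0, 1, ..., 7}
        joint[(0, x)] = Fraction(1, 2) * Fraction(1, 8)
    for x in (0, -1, -2, -3):      # bit 1: X uniform on {0, -1, -2, -3}
        joint[(1, x)] = Fraction(1, 2) * Fraction(1, 4)
    return joint


def conditional(joint, event, given):
    """P(event | given), where both arguments are predicates on (bit, x) keys."""
    p_given = sum(p for key, p in joint.items() if given(key))
    p_both = sum(p for key, p in joint.items() if given(key) and event(key))
    return p_both / p_given


joint = joint_distribution()
posterior = conditional(
    joint,
    event=lambda key: key[1] < 0,        # X is negative
    given=lambda key: abs(key[1]) == 1,  # observed |X| = 1
)
print(posterior)
```

Using `Fraction` keeps the result exact, so it can be compared directly with a hand-derived fraction.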

(5) (foundations: union/intersection)

If P(A) = 0.3 and P(B) = 0.4,

what is the maximum possible value of P(A ∩ B)?

what is the minimum possible value of P(A ∩ B)?

what is the maximum possible value of P(A ∪ B)?

what is the minimum possible value of P(A ∪ B)?

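(6) (techniques: mean/variance)

Let the mean be $\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n$ and the variance be $\sigma_X^2 = \frac{1}{N-1} \sum_{n=1}^{N} (X_n - \bar{X})^2$. Prove that

$$\sigma_X^2 = \frac{N}{N-1} \left( \frac{1}{N} \sum_{n=1}^{N} X_n^2 - \bar{X}^2 \right).$$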
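Before proving the identity in question (6), it may help to confirm it numerically on sample data. The following is a minimal sketch: the function names and the random test data (including the seed) are arbitrary choices made here, not part of the assignment.

```python
import random


def sample_variance_definition(xs):
    """Variance as defined: sum of squared deviations from the mean over (N - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_identity(xs):
    """Right-hand side of the identity: N / (N - 1) * (mean of squares - mean squared)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_of_squares = sum(x * x for x in xs) / n
    return n / (n - 1) * (mean_of_squares - mean ** 2)


random.seed(0)  # arbitrary seed, only for reproducibility
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
gap = abs(sample_variance_definition(xs) - sample_variance_identity(xs))
print(gap)  # expected to be at floating-point rounding level
```

The two forms are algebraically equal, so any difference comes from floating-point rounding. For data with a large mean relative to its spread, the second form can lose precision.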

2 Linear Algebra

(1) (foundations: rank)

What is the rank of $\begin{bmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{bmatrix}$?

(2) (foundations: inverse)

What is the inverse of $\begin{bmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{bmatrix}$?

(3) (foundations: eigenvalues/eigenvectors)

What are the eigenvalues and eigenvectors of $\begin{bmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{bmatrix}$?


(4) (foundations: singular value decomposition)

(a) For a real matrix M, let M = UΣV^T be its singular value decomposition. Define M^† = VΣ^†U^T, where Σ^†[i][j] = 1/Σ[i][j] when Σ[i][j] is nonzero, and 0 otherwise. Prove that MM^†M = M.

(b) If M is invertible, prove that M^† = M⁻¹.

(5) (foundations: PD/PSD)

A symmetric real matrix A is positive definite (PD) iff x^TAx > 0 for all x ≠ 0, and positive semi-definite (PSD) if “>” is changed to “≥”. Prove:

(a) For any real matrix Z, ZZ^T is PSD.

(b) A symmetric A is PD iff all eigenvalues of A are strictly positive.

(6) (foundations: inner product)

Consider x ∈ R^d and some u ∈ R^d with ‖u‖ = 1.

What is the maximum value of u^Tx? What u results in the maximum value?

What is the minimum value of u^Tx? What u results in the minimum value?

What is the minimum value of |u^Tx|? What u results in the minimum value?

(7) (foundations: distance)

Consider two parallel hyperplanes in R^d:

H₁: w^Tx = +3, H₂: w^Tx = −2,

where w is the normal vector. What is the distance between H₁ and H₂?

3 Calculus

(1) (foundations: differential and partial differential)

Let $f(x) = \ln(1 + e^{-2x})$. What is $\frac{df(x)}{dx}$? Let $g(x, y) = e^{x} + e^{2y} + e^{3xy^2}$. What is $\frac{\partial g(x, y)}{\partial y}$?

(2) (foundations: chain rule)

Let f(x, y) = xy, x(u, v) = cos(u + v), y(u, v) = sin(u − v). What is $\frac{\partial f}{\partial v}$?

(3) (foundations: integral)

What is $\int_{5}^{10} \frac{2}{x - 3} \, dx$?
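A closed-form answer to the integral in question (3) can be compared against a numerical estimate. The sketch below uses composite Simpson's rule, a standard quadrature method chosen here for illustration. The function name `simpson` and the number of subintervals are assumptions.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate weights 4 (odd index) and 2 (even index).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Integrand from question (3): 2 / (x - 3) on [5, 10].
approx = simpson(lambda x: 2.0 / (x - 3.0), 5.0, 10.0)
print(approx)
```

The integrand is smooth on [5, 10] because the singularity at x = 3 lies outside the interval, so the estimate should agree with a correct closed form to many digits.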

(4) (foundations: gradient and Hessian)

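Let $E(u, v) = (u e^{v} - 2 v e^{-u})^2$. Calculate the gradient

$$\nabla E(u, v) = \begin{bmatrix} \frac{\partial E}{\partial u} \\ \frac{\partial E}{\partial v} \end{bmatrix}$$

and the Hessian

$$H(u, v) = \begin{bmatrix} \frac{\partial^2 E}{\partial u \, \partial u} & \frac{\partial^2 E}{\partial u \, \partial v} \\ \frac{\partial^2 E}{\partial v \, \partial u} & \frac{\partial^2 E}{\partial v \, \partial v} \end{bmatrix}$$

at u = 1 and v = 1.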
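Hand-derived gradients and Hessians are easy to get wrong, so a finite-difference check can be useful. The following is a rough sketch using central differences. The helper names (`numeric_gradient`, `numeric_hessian`) and the step sizes are assumptions chosen for illustration, and the results are approximations rather than exact values.

```python
import math


def E(u, v):
    """E(u, v) = (u e^v - 2 v e^(-u))^2 from question (4)."""
    return (u * math.exp(v) - 2.0 * v * math.exp(-u)) ** 2


def numeric_gradient(f, u, v, h=1e-5):
    """Central-difference approximation of (dE/du, dE/dv)."""
    dE_du = (f(u + h, v) - f(u - h, v)) / (2 * h)
    dE_dv = (f(u, v + h) - f(u, v - h)) / (2 * h)
    return dE_du, dE_dv


def numeric_hessian(f, u, v, h=1e-4):
    """Central-difference approximation of the 2x2 Hessian."""
    f0 = f(u, v)
    d2_uu = (f(u + h, v) - 2 * f0 + f(u - h, v)) / h ** 2
    d2_vv = (f(u, v + h) - 2 * f0 + f(u, v - h)) / h ** 2
    d2_uv = (
        f(u + h, v + h) - f(u + h, v - h) - f(u - h, v + h) + f(u - h, v - h)
    ) / (4 * h ** 2)
    # The mixed partial is computed once and reused for both off-diagonal entries.
    return ((d2_uu, d2_uv), (d2_uv, d2_vv))


grad = numeric_gradient(E, 1.0, 1.0)
hess = numeric_hessian(E, 1.0, 1.0)
print(grad)
print(hess)
```

If the analytic expressions evaluated at u = 1, v = 1 differ from these estimates by much more than the step size suggests, the derivation likely contains an error.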

(5) (foundations: Taylor’s expansion)

Let $E(u, v) = (u e^{v} - 2 v e^{-u})^2$. Write down the second-order Taylor’s expansion of E around u = 1 and v = 1.


(6) (foundations: optimization)

For some given A > 0, B > 0, solve

$$\min_{\alpha} \; A e^{\alpha} + B e^{-2\alpha}.$$

(7) (foundations: vector calculus)

Let w be a vector in R^d and $E(w) = \frac{1}{2} w^T A w + b^T w$ for some symmetric matrix A and vector b.

Prove that the gradient ∇E(w) = Aw + b and the Hessian ∇²E(w) = A.

(8) (foundations: quadratic programming)

Following the previous question, if A is not only symmetric but also positive definite (PD), prove that the solution of argmin_wE(w) is −A⁻¹b.

(9) (techniques: optimization with linear constraint)

Consider

$$\min_{w_1, w_2, w_3} \; \frac{1}{2} \left( w_1^2 + 2 w_2^2 + 3 w_3^2 \right) \quad \text{subject to} \quad w_1 + w_2 + w_3 = 11.$$

Refresh your memory on “Lagrange multipliers” and show that the optimal solution must happen on w₁= λ, 2w₂= λ, 3w₃= λ. Use the property to solve the problem.

(10) (techniques: optimization with linear constraints)

Let w be a vector in R^d and E(w) be a convex differentiable function of w. Prove that the optimal solution to

$$\min_{w} \; E(w) \quad \text{subject to} \quad Aw + b = 0$$

must happen at ∇E(w) + λ^TA = 0 for some vector λ. (Hint: If not, let u be the residual when projecting ∇E(w) to the span of the rows of A. Show that for some very small η, w − η · u is a feasible solution that improves E.)
