Academic year: 2022


Machine Learning Foundations/Techniques (NTU, Fall 2020) instructor: Hsuan-Tien Lin

Homework #0

RELEASE DATE: 09/15/2020 DUE DATE: NONE

1 Probability and Statistics

(1) (foundations: combinatorics)

Let $C(N, K) = 1$ for $K = 0$ or $K = N$, and $C(N, K) = C(N-1, K) + C(N-1, K-1)$ for $N \ge 1$.

Prove that $C(N, K) = \frac{N!}{K!(N-K)!}$ for $N \ge 1$ and $0 \le K \le N$.

(2) (foundations: counting)

What is the probability of getting exactly 4 heads when flipping 10 fair coins?

What is the probability of getting a full house (XXXYY) when randomly drawing 5 cards out of a deck of 52 cards?
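The recurrence in (1) and the counts in (2) can be sanity-checked numerically. The following is a minimal sketch, not a proof: it compares the recurrence against Python's `math.comb` for small N and then counts outcomes directly. The function name `C` mirrors the problem statement; all other names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def C(n, k):
    """C(N, K) computed from the recurrence in problem (1)."""
    if k == 0 or k == n:
        return 1
    return C(n - 1, k) + C(n - 1, k - 1)


# The recurrence agrees with N! / (K! (N-K)!) for every small case tried here.
recurrence_ok = all(
    C(n, k) == comb(n, k) for n in range(1, 15) for k in range(n + 1)
)

# Exactly 4 heads in 10 fair flips: C(10, 4) favourable outcomes out of 2^10.
p_four_heads = Fraction(C(10, 4), 2 ** 10)

# Full house: rank of the triple, 3 of its 4 suits, rank of the pair, 2 of its 4 suits.
full_house_hands = 13 * C(4, 3) * 12 * C(4, 2)
p_full_house = Fraction(full_house_hands, C(52, 5))

print(recurrence_ok, p_four_heads, p_full_house)
```

Exact fractions are used so that the counting argument, rather than floating-point rounding, determines the result.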

(3) (foundations: conditional probability)

If your friend flipped a fair coin three times and tells you that one of the tosses resulted in heads, what is the probability that all three tosses resulted in heads?
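A conditional probability over a small sample space can be checked by brute-force enumeration. The sketch below assumes that "one of the tosses resulted in heads" means "at least one toss was heads"; under a different reading the conditioning set would change.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three fair tosses.
outcomes = list(product("HT", repeat=3))

# Conditioning event (assumed reading): at least one head.
at_least_one_head = [o for o in outcomes if "H" in o]

# Event of interest inside the conditioning event: all three heads.
all_heads = [o for o in at_least_one_head if o == ("H", "H", "H")]

p_all_heads_given_one = Fraction(len(all_heads), len(at_least_one_head))
print(p_all_heads_given_one)
```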

(4) (foundations: Bayes theorem)

A program selects a random integer X like this: a random bit is first generated uniformly. If the bit is 0, X is drawn uniformly from {0, 1, . . . , 7}; otherwise, X is drawn uniformly from {0, −1, −2, −3}.

If we get an X from the program with |X| = 1, what is the probability that X is negative?
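Bayes' theorem for this two-stage program can be verified by tabulating the joint distribution of the bit and X. This is a small sketch with exact fractions; the dictionary `joint` and the variable names are illustrative, not part of the assignment.

```python
from fractions import Fraction

# Joint probability of (bit, x): the bit is uniform, then X is uniform on its set.
joint = {}
for x in range(0, 8):
    joint[(0, x)] = Fraction(1, 2) * Fraction(1, 8)
for x in (0, -1, -2, -3):
    joint[(1, x)] = Fraction(1, 2) * Fraction(1, 4)

# P(|X| = 1) and P(X < 0 and |X| = 1), summed over the joint table.
p_abs_one = sum(p for (bit, x), p in joint.items() if abs(x) == 1)
p_neg_and_abs_one = sum(
    p for (bit, x), p in joint.items() if abs(x) == 1 and x < 0
)

# Bayes: P(X < 0 | |X| = 1) = P(X < 0, |X| = 1) / P(|X| = 1).
p_negative_given_abs_one = p_neg_and_abs_one / p_abs_one
print(p_negative_given_abs_one)
```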

(5) (foundations: union/intersection)

If $P(A) = 0.3$ and $P(B) = 0.4$,

what is the maximum possible value of $P(A \cap B)$?

what is the minimum possible value of $P(A \cap B)$?

what is the maximum possible value of $P(A \cup B)$?

what is the minimum possible value of $P(A \cup B)$?

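(6) (techniques: mean/variance)

Let mean $\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n$ and variance $\sigma_X^2 = \frac{1}{N-1} \sum_{n=1}^{N} (X_n - \bar{X})^2$. Prove that

$$\sigma_X^2 = \frac{N}{N-1} \left( \frac{1}{N} \sum_{n=1}^{N} X_n^2 - \bar{X}^2 \right).$$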
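Before proving the variance identity, it may help to see both sides agree on concrete data. The sketch below draws hypothetical samples (the seed, sample size, and distribution are arbitrary choices) and compares the two expressions up to floating-point error.

```python
import random

random.seed(0)
N = 50
xs = [random.gauss(0.0, 2.0) for _ in range(N)]  # hypothetical data

mean = sum(xs) / N

# Left-hand side: the definition with the 1 / (N - 1) factor.
lhs = sum((x - mean) ** 2 for x in xs) / (N - 1)

# Right-hand side: N / (N - 1) times (mean of squares minus squared mean).
rhs = N / (N - 1) * (sum(x * x for x in xs) / N - mean ** 2)

gap = abs(lhs - rhs)
print(lhs, rhs, gap)
```

A numerical match on one sample is only a sanity check; the proof itself comes from expanding the square and using the definition of the mean.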

2 Linear Algebra

(1) (foundations: rank) What is the rank of

$$\begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{pmatrix}?$$

(2) (foundations: inverse) What is the inverse of

$$\begin{pmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{pmatrix}?$$

(3) (foundations: eigenvalues/eigenvectors) What are the eigenvalues and eigenvectors of

$$\begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{pmatrix}?$$


(4) (foundations: singular value decomposition)

(a) For a real matrix $M$, let $M = U \Sigma V^T$ be its singular value decomposition. Define $M^\dagger = V \Sigma^\dagger U^T$, where $\Sigma^\dagger[i][j] = \frac{1}{\Sigma[i][j]}$ when $\Sigma[i][j]$ is nonzero, and 0 otherwise. Prove that $M M^\dagger M = M$.

(b) If $M$ is invertible, prove that $M^\dagger = M^{-1}$.

(5) (foundations: PD/PSD)

A symmetric real matrix $A$ is positive definite (PD) iff $x^T A x > 0$ for all $x \ne 0$, and positive semi-definite (PSD) if “>” is changed to “≥”. Prove:

(a) For any real matrix $Z$, $Z Z^T$ is PSD.

(b) A symmetric A is PD iff all eigenvalues of A are strictly positive.

(6) (foundations: inner product)

Consider $x \in \mathbb{R}^d$ and some $u \in \mathbb{R}^d$ with $\|u\| = 1$.

What is the maximum value of $u^T x$? What $u$ results in the maximum value?

What is the minimum value of $u^T x$? What $u$ results in the minimum value?

What is the minimum value of $|u^T x|$? What $u$ results in the minimum value?

(7) (foundations: distance)

Consider two parallel hyperplanes in $\mathbb{R}^d$:

$$H_1: w^T x = +3, \qquad H_2: w^T x = -2,$$

where $w$ is the normal vector. What is the distance between $H_1$ and $H_2$?
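One way to build intuition for the hyperplane distance is to pick a concrete normal vector, take the point on each hyperplane that lies along $w$, and measure the gap. The sketch below uses a hypothetical $w = (3, 4)$ in two dimensions; the helper `point_on_plane` is illustrative.

```python
import math


def point_on_plane(w, c):
    """Point on {x : w^T x = c} closest to the origin: c * w / ||w||^2."""
    norm_sq = sum(wi * wi for wi in w)
    return [c * wi / norm_sq for wi in w]


w = [3.0, 4.0]  # hypothetical normal vector with ||w|| = 5

x1 = point_on_plane(w, 3.0)   # lies on H1: w^T x = +3
x2 = point_on_plane(w, -2.0)  # lies on H2: w^T x = -2

# x1 - x2 is parallel to w, so its length is the distance between the planes.
distance = math.dist(x1, x2)

# Candidate closed form to compare against: |3 - (-2)| / ||w||.
formula = abs(3.0 - (-2.0)) / math.sqrt(sum(wi * wi for wi in w))
print(distance, formula)
```

Because both points sit on the line through the origin in the direction of $w$, the segment between them is perpendicular to both hyperplanes.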

3 Calculus

(1) (foundations: differential and partial differential)

Let $f(x) = \ln(1 + e^{-2x})$. What is $\frac{df(x)}{dx}$? Let $g(x, y) = e^{x} + e^{2y} + e^{3xy^2}$. What is $\frac{\partial g(x, y)}{\partial y}$?

(2) (foundations: chain rule)

Let $f(x, y) = xy$, $x(u, v) = \cos(u + v)$, $y(u, v) = \sin(u - v)$. What is $\frac{\partial f}{\partial v}$?

(3) (foundations: integral)

What is $\int_{5}^{10} \frac{2}{x - 3} \, dx$?

(4) (foundations: gradient and Hessian)

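Let $E(u, v) = (u e^{v} - 2 v e^{-u})^2$. Calculate the gradient

$$\nabla E(u, v) = \begin{pmatrix} \frac{\partial E}{\partial u} \\ \frac{\partial E}{\partial v} \end{pmatrix}$$

and the Hessian

$$H(u, v) = \begin{pmatrix} \frac{\partial^2 E}{\partial u^2} & \frac{\partial^2 E}{\partial u \partial v} \\ \frac{\partial^2 E}{\partial v \partial u} & \frac{\partial^2 E}{\partial v^2} \end{pmatrix}$$

at $u = 1$ and $v = 1$.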
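A hand-derived gradient can be checked against central finite differences. The sketch below writes the gradient via the chain rule with $h = u e^{v} - 2 v e^{-u}$, so that $\nabla E = 2h \, (\partial h / \partial u, \partial h / \partial v)$; the function names and the step size `eps` are illustrative choices.

```python
import math


def E(u, v):
    return (u * math.exp(v) - 2.0 * v * math.exp(-u)) ** 2


def grad_E(u, v):
    """Analytic gradient from the chain rule: 2 h * (dh/du, dh/dv)."""
    h = u * math.exp(v) - 2.0 * v * math.exp(-u)
    dh_du = math.exp(v) + 2.0 * v * math.exp(-u)
    dh_dv = u * math.exp(v) - 2.0 * math.exp(-u)
    return (2.0 * h * dh_du, 2.0 * h * dh_dv)


def numeric_grad(f, u, v, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    du = (f(u + eps, v) - f(u - eps, v)) / (2.0 * eps)
    dv = (f(u, v + eps) - f(u, v - eps)) / (2.0 * eps)
    return (du, dv)


analytic = grad_E(1.0, 1.0)
numeric = numeric_grad(E, 1.0, 1.0)
print(analytic, numeric)
```

The same idea extends to the Hessian by applying finite differences to each component of `grad_E`.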

(5) (foundations: Taylor’s expansion)

Let $E(u, v) = (u e^{v} - 2 v e^{-u})^2$. Write down the second-order Taylor expansion of $E$ around $u = 1$ and $v = 1$.


(6) (foundations: optimization) For some given $A > 0$, $B > 0$, solve

$$\min_{\alpha} \; A e^{\alpha} + B e^{-2\alpha}.$$

(7) (foundations: vector calculus)

Let $w$ be a vector in $\mathbb{R}^d$ and $E(w) = \frac{1}{2} w^T A w + b^T w$ for some symmetric matrix $A$ and vector $b$.

Prove that the gradient $\nabla E(w) = A w + b$ and the Hessian $\nabla^2 E(w) = A$.

(8) (foundations: quadratic programming)

Following the previous question, if $A$ is not only symmetric but also positive definite (PD), prove that the solution of $\operatorname{argmin}_{w} E(w)$ is $-A^{-1} b$.
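The claimed minimizer can be tried out on a small concrete instance. The sketch below uses a hypothetical 2x2 symmetric PD matrix and vector (neither comes from the assignment), computes $-A^{-1} b$ exactly, and checks that the gradient $A w + b$ vanishes there and that a few nearby points have larger $E$.

```python
from fractions import Fraction as F

A = [[F(2), F(1)], [F(1), F(3)]]  # hypothetical symmetric PD matrix
b = [F(1), F(-1)]                 # hypothetical vector


def E(w):
    """E(w) = (1/2) w^T A w + b^T w."""
    quad = sum(w[i] * A[i][j] * w[j] for i in range(2) for j in range(2))
    return quad / 2 + sum(b[i] * w[i] for i in range(2))


def gradient(w):
    """Gradient A w + b from the previous question."""
    return [sum(A[i][j] * w[j] for j in range(2)) + b[i] for i in range(2)]


# Closed-form 2x2 inverse, then w* = -A^{-1} b.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
w_star = [-sum(A_inv[i][j] * b[j] for j in range(2)) for i in range(2)]

# E should be strictly larger at each of these perturbed points.
steps = [(F(1, 10), 0), (0, F(1, 10)), (F(-1, 10), F(1, 5))]
never_better = all(
    E([w_star[0] + s, w_star[1] + t]) > E(w_star) for s, t in steps
)
print(w_star, gradient(w_star), never_better)
```

Checking a handful of perturbations does not replace the proof, which relies on the Hessian $A$ being PD.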

(9) (techniques: optimization with linear constraint) Consider

$$\min_{w_1, w_2, w_3} \; \frac{1}{2} (w_1^2 + 2 w_2^2 + 3 w_3^2) \quad \text{subject to } w_1 + w_2 + w_3 = 11.$$

Refresh your memory on “Lagrange multipliers” and show that the optimal solution must happen on $w_1 = \lambda$, $2 w_2 = \lambda$, $3 w_3 = \lambda$. Use the property to solve the problem.
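Once the stationarity conditions $w_1 = \lambda$, $2 w_2 = \lambda$, $3 w_3 = \lambda$ are established, the remaining arithmetic is to substitute them into the constraint. A minimal sketch of that substitution, with illustrative variable names:

```python
from fractions import Fraction

coeffs = [1, 2, 3]  # objective is (1/2) * sum(c_i * w_i^2)
total = 11          # constraint w1 + w2 + w3 = 11

# Stationarity gives c_i * w_i = lam, so w_i = lam / c_i.
# Substituting into the constraint: lam * sum(1 / c_i) = total.
lam = Fraction(total) / sum(Fraction(1, c) for c in coeffs)
w = [lam / c for c in coeffs]

objective = sum(c * wi * wi for c, wi in zip(coeffs, w)) / 2
print(lam, w, objective)
```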

(10) (techniques: optimization with linear constraints)

Let $w$ be a vector in $\mathbb{R}^d$ and $E(w)$ be a convex differentiable function of $w$. Prove that the optimal solution to

$$\min_{w} \; E(w) \quad \text{subject to } A w + b = 0$$

must happen at $\nabla E(w) + \lambda^T A = 0$ for some vector $\lambda$. (Hint: If not, let $u$ be the residual when projecting $\nabla E(w)$ to the span of the rows of $A$. Show that for some very small $\eta$, $w - \eta \cdot u$ is a feasible solution that improves $E$.)
