Machine Learning (NTU, Fall 2011) instructor: Hsuan-Tien Lin

Homework #0

RELEASE DATE: 09/13/2011 DUE DATE: NONE

1 Probability and Statistics

(1) (combinatorics)

Let $C(N, K) = 1$ for $K = 0$ or $K = N$, and $C(N, K) = C(N-1, K) + C(N-1, K-1)$ for $N \ge 1$.

Prove that $C(N, K) = \frac{N!}{K!(N-K)!}$ for $N \ge 1$ and $0 \le K \le N$.
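A numerical sanity check can help before attempting the proof. The sketch below is only an illustration, not a proof: it evaluates $C(N, K)$ purely through the recurrence stated in the problem and compares it with the factorial formula for small $N$. The helper names `C`, `closed_form`, and `check` are hypothetical and chosen here for illustration.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def C(N, K):
    """C(N, K) computed only from the recurrence given in the problem."""
    if K == 0 or K == N:
        return 1
    # Valid for 0 < K < N, so both sub-calls stay inside 0 <= K' <= N - 1.
    return C(N - 1, K) + C(N - 1, K - 1)


def closed_form(N, K):
    """The factorial formula N! / (K! (N - K)!)."""
    return factorial(N) // (factorial(K) * factorial(N - K))


def check(max_n=12):
    """True if recurrence and closed form agree for all 1 <= N <= max_n."""
    return all(
        C(N, K) == closed_form(N, K)
        for N in range(1, max_n + 1)
        for K in range(0, N + 1)
    )
```

Agreement on small cases suggests the induction on $N$ is worth setting up; it does not replace it.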

(2) (counting)

What is the probability of getting exactly 6 heads when flipping 10 fair coins?

What is the probability of getting a full house (XXXYY) when randomly drawing 5 cards out of a deck of 52 cards?

(3) (conditional probability)

If your friend flipped a fair coin three times, and tells you that one of the tosses resulted in heads, what is the probability that all three tosses resulted in heads?

(4) (Bayes theorem)

A program selects a random integer $X$ like this: a random bit is first generated uniformly. If the bit is 0, $X$ is drawn uniformly from $\{0, 1, \ldots, 7\}$; otherwise, $X$ is drawn uniformly from $\{0, -1, -2, -3\}$.

If we get an $X$ from the program with $|X| = 1$, what is the probability that $X$ is negative?

(5) (union/intersection)

If $P(A) = 0.3$ and $P(B) = 0.4$,

what is the maximum possible value of $P(A \cap B)$?

what is the minimum possible value of $P(A \cap B)$?

what is the maximum possible value of $P(A \cup B)$?

what is the minimum possible value of $P(A \cup B)$?

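(6) (mean/variance)

Let mean $\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n$ and variance $\sigma_X^2 = \frac{1}{N-1} \sum_{n=1}^{N} (X_n - \bar{X})^2$. Prove that

$\sigma_X^2 = \frac{N}{N-1} \left( \frac{1}{N} \sum_{n=1}^{N} X_n^2 - \bar{X}^2 \right).$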
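As a quick check that the two expressions for the variance describe the same quantity, the sketch below computes both on synthetic data. The function names and the randomly generated sample are assumptions made for illustration; the comparison is numerical and does not substitute for the algebraic proof.

```python
import random


def sample_variance_direct(xs):
    """(1 / (N - 1)) * sum over n of (X_n - mean)^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_identity(xs):
    """(N / (N - 1)) * ((1 / N) * sum over n of X_n^2 - mean^2)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return n / (n - 1) * (mean_sq - mean ** 2)


# Hypothetical data: 50 draws from a standard normal with a fixed seed.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(50)]
gap = abs(sample_variance_direct(data) - sample_variance_identity(data))
```

The two results should differ only by floating-point rounding, so `gap` is expected to be tiny rather than exactly zero.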

(7) (Gaussian distribution)

If $X_1$ and $X_2$ are independent random variables, where $p(X_1)$ is Gaussian with mean 2 and variance 1, and $p(X_2)$ is Gaussian with mean 3 and variance 4. Let $Z = X_1 + X_2$. Prove $p(Z)$ is Gaussian, and determine its mean and variance.

2 Linear Algebra

(1) (rank)

What is the rank of $\begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{pmatrix}$?

(2) (inverse)

What is the inverse of $\begin{pmatrix} 0 & 2 & 4 \\ 2 & 4 & 2 \\ 3 & 3 & 1 \end{pmatrix}$?

(3) (eigenvalues/eigenvectors)

What are the eigenvalues and eigenvectors of $\begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ 1 & 1 & 1 \end{pmatrix}$?

(4) (singular value decomposition)

For a real matrix $M$, let $M = U \Sigma V^T$ be its singular value decomposition. Define $M^\dagger = V \Sigma^\dagger U^T$, where $\Sigma^\dagger[i][j] = \frac{1}{\Sigma[i][j]}$ when $\Sigma[i][j]$ is nonzero, and $0$ otherwise. Prove that $M M^\dagger M = M$.
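To see the definition in action, the sketch below builds a small rank-deficient $M$ from hand-picked factors and checks the identity numerically. Everything here is an assumption for illustration: the factors are 2x2 rotation matrices (orthogonal by construction), the middle matrix is square and diagonal, and the helper names are hypothetical. It is a single example, not a proof.

```python
import math


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    return [list(row) for row in zip(*A)]


def rotation(theta):
    """2x2 rotation matrix, used here as an orthogonal factor."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def sigma_dagger(S):
    """Entrywise reciprocal of nonzero entries, 0 elsewhere (square S assumed)."""
    return [[1.0 / v if v != 0 else 0.0 for v in row] for row in S]


U = rotation(0.7)
V = rotation(-1.1)
S = [[3.0, 0.0], [0.0, 0.0]]  # one zero singular value, so M is singular

M = matmul(matmul(U, S), transpose(V))
M_dagger = matmul(matmul(V, sigma_dagger(S)), transpose(U))
MMdM = matmul(matmul(M, M_dagger), M)
max_err = max(abs(MMdM[i][j] - M[i][j]) for i in range(2) for j in range(2))
```

The example deliberately includes a zero singular value, since that is the case where an ordinary inverse does not exist and the "0 otherwise" branch of the definition matters.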

(5) (PD/PSD)

A symmetric real matrix $A$ is positive definite (PD) iff $x^T A x > 0$ for all $x \ne 0$, and positive semi-definite (PSD) if ">" is changed to "$\ge$". Prove:

(a) For any real matrix $Z$, $Z Z^T$ is PSD.

(b) $A$ is PD iff all eigenvalues of $A$ are strictly positive.
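The PSD definition can be probed numerically for part (a). The sketch below, under the assumption of a randomly generated 3x2 matrix `Z` and random test vectors (all hypothetical), evaluates $x^T (Z Z^T) x$ and compares it with $\|Z^T x\|^2$. Sampling finitely many vectors cannot establish the claim for all $x$; it only illustrates what the definition asks for.

```python
import random


def quad_form_ZZt(Z, x):
    """x^T (Z Z^T) x, forming Z Z^T explicitly."""
    n, m = len(Z), len(Z[0])
    ZZt = [
        [sum(Z[i][k] * Z[j][k] for k in range(m)) for j in range(n)]
        for i in range(n)
    ]
    return sum(x[i] * ZZt[i][j] * x[j] for i in range(n) for j in range(n))


def norm_sq_Zt_x(Z, x):
    """Squared Euclidean norm of Z^T x."""
    n, m = len(Z), len(Z[0])
    y = [sum(Z[i][k] * x[i] for i in range(n)) for k in range(m)]
    return sum(v * v for v in y)


# Hypothetical inputs with a fixed seed for reproducibility.
rng = random.Random(1)
Z = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
trials = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]

# Small negative tolerance absorbs floating-point rounding.
all_nonneg = all(quad_form_ZZt(Z, x) >= -1e-12 for x in trials)
```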

(6) (inner product)

Consider $x \in \mathbb{R}^d$ and some $u \in \mathbb{R}^d$ with $\|u\| = 1$.

What is the maximum value of $u^T x$?

What is the minimum value of $u^T x$?

What is the minimum value of $|u^T x|$?

(7) (distance)

Consider two parallel hyperplanes in $\mathbb{R}^d$:

$H_1: w^T x = +3$,

$H_2: w^T x = -2$,

where $w$ is the normal vector. What is the distance between $H_1$ and $H_2$?

3 Calculus

(1) (differential and partial differential)

Let $f(x) = \ln(1 + e^{-2x})$. What is $\frac{df(x)}{dx}$? Let $g(x, y) = e^x + e^{2y} + e^{3xy^2}$. What is $\frac{\partial g(x, y)}{\partial y}$?

(2) (chain rule)

Let $f(x, y) = xy$, $x(u, v) = \cos(u + v)$, $y(u, v) = \sin(u - v)$. What is $\frac{\partial f}{\partial v}$?

(3) (integral)

What is $\int_{5}^{10} \frac{2}{x - 3} \, dx$?

(4) (gradient and Hessian)

Let $E(u, v) = (u e^v - 2 v e^{-u})^2$. Calculate the gradient $\nabla E$ and the Hessian $\nabla^2 E$ at $u = 1$ and $v = 1$.

(5) (Taylor's expansion)

Let $E(u, v) = (u e^v - 2 v e^{-u})^2$. Write down the second-order Taylor's expansion of $E$ around $u = 1$ and $v = 1$.

(6) (optimization)

For some given $A > 0$, $B > 0$, solve

$\min_{\alpha} \; A e^{\alpha} + B e^{-2\alpha}.$

(7) (vector calculus)

Let $w$ be a vector in $\mathbb{R}^d$ and $E(w) = \frac{1}{2} w^T A w + b^T w$ for some symmetric matrix $A$ and vector $b$.

Prove that the gradient $\nabla E(w) = Aw + b$ and the Hessian $\nabla^2 E(w) = A$.
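A finite-difference comparison is one way to sanity-check a gradient formula before proving it. The sketch below evaluates the quadratic $E(w)$ on a hypothetical 2x2 symmetric matrix and compares central differences with $Aw + b$. The specific values of `A`, `b`, `w`, and the step size `h` are assumptions chosen for illustration.

```python
def E(w, A, b):
    """E(w) = 0.5 * w^T A w + b^T w."""
    n = len(w)
    quad = sum(w[i] * A[i][j] * w[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(b[i] * w[i] for i in range(n))


def numeric_grad(w, A, b, h=1e-6):
    """Central-difference approximation of the gradient of E at w."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grad.append((E(w_plus, A, b) - E(w_minus, A, b)) / (2 * h))
    return grad


def claimed_grad(w, A, b):
    """The formula to be proved: A w + b."""
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) + b[i] for i in range(n)]


# Hypothetical test values; A must be symmetric for the formula to apply.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
w = [0.5, -2.0]
grad_err = max(
    abs(p - q) for p, q in zip(numeric_grad(w, A, b), claimed_grad(w, A, b))
)
```

Trying an asymmetric `A` in the same check is a useful exercise: the two gradients then disagree, which shows where the symmetry assumption enters the proof.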

(8) (quadratic programming)

Following the previous question, if $A$ is not only symmetric but also positive definite (PD), prove that the solution of $\mathrm{argmin}_w \, E(w)$ is $-A^{-1} b$.

(9) (optimization with linear constraint)

Consider

$\min_{w_1, w_2, w_3} \; \frac{1}{2} (w_1^2 + 2 w_2^2 + 3 w_3^2)$ subject to $w_1 + w_2 + w_3 = 11$.

Refresh your memory on "Lagrange multipliers" and show that the optimal solution must happen on $w_1 = \lambda$, $2 w_2 = \lambda$, $3 w_3 = \lambda$. Use the property to solve the problem.

(10) (optimization with linear constraints)

Let $w$ be a vector in $\mathbb{R}^d$ and $E(w)$ be a convex differentiable function of $w$. Prove that the optimal solution to

$\min_{w} \; E(w)$ subject to $Aw + b = 0$

must happen at $\nabla E(w) + \lambda^T A = 0$ for some vector $\lambda$. (Hint: If not, let $u$ be the residual when projecting $\nabla E(w)$ to the span of the rows of $A$. Show that for some very small $\eta$, $w - \eta \cdot u$ is a feasible solution that improves $E$.)
