Financial Time Series I and Methods of Statistical Prediction
Homework 2: Review on Basics
1. (a) We know that about 40% of the mathematicians in the United States are women.
So the probability that there are 3 or fewer women in a randomly selected group of 11 mathematicians is
$$P(X \le 3) = \sum_{x=0}^{3} \binom{11}{x}\, 0.4^{x}\, 0.6^{11-x} = 0.29628$$
(b) The simulation can be run with the following command.
> sum(rbinom(10000,11,0.4)<=3)/10000
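As a cross-check, the exact probability in (a) and the simulated estimate in (b) can be computed side by side. This is a minimal sketch; the seed and the variable names are assumptions made here for reproducibility, not part of the original solution.

```r
# Exact P(X <= 3) for X ~ Binomial(11, 0.4), via the binomial CDF
exact <- pbinom(3, size = 11, prob = 0.4)

# The same probability written as the term-by-term sum used in part (a)
by_sum <- sum(dbinom(0:3, size = 11, prob = 0.4))

# Monte Carlo estimate as in part (b); the seed is an arbitrary choice
set.seed(1)
n_sim <- 10000
simulated <- mean(rbinom(n_sim, size = 11, prob = 0.4) <= 3)

print(c(exact = exact, by_sum = by_sum, simulated = simulated))
```

The simulated value varies from run to run, with a standard error of roughly 0.005 for 10000 draws, so it should agree with the exact value only to about two decimal places.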
2. (a) We test
H0: the grade distributions of the two years are the same.
Ha: the grade distributions of the two years are different.
We can use a chi-square test here.
$$\sum_{i=1}^{5} \frac{(o_i - e_i)^2}{e_i} = 9.7469 > \chi^2_{4,\,0.95} = 9.49$$
We have enough evidence to reject H0 at the 5% level and conclude that the two distributions differ.
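The critical value and the decision can be reproduced from the reported statistic alone. The sketch below takes the value 9.7469 and the 4 degrees of freedom from the solution as given; the observed and expected counts are not restated here, so the statistic itself is not recomputed.

```r
# Test statistic and degrees of freedom as reported in the solution
stat <- 9.7469
df   <- 4

# Upper 5% critical value of the chi-square distribution with 4 df
crit <- qchisq(0.95, df = df)

# p-value: probability of a statistic at least this large under H0
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

reject <- stat > crit
print(c(critical_value = crit, p_value = p_value))
print(reject)
```

The p-value falls just below 0.05, which matches the statistic exceeding the critical value by only a small margin.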
(b) We need to test H0: $\mu_1 = \mu_2$ vs. Ha: $\mu_1 \neq \mu_2$.
$$Z = \frac{2.975 - 2.97}{\sqrt{\dfrac{1.33^2}{7646} + \dfrac{1.24^2}{2002}}} = 0.0561 < Z_{0.025} = 1.96$$
We do not have enough evidence to reject H0. The data are consistent with the 1998 mean being the same as the 1997 mean.
(c) In (b), we conclude that the mean of 1997 is not statistically different from the mean of 1998.
But by (a), the grade distribution of 1998 differs from the grade distribution of 1997 in the following way. The performance of students in 1998 is more concentrated than in 1997: fewer students get poor grades and more students get higher grades, while the proportion of students getting the best grade decreases. This exercise is a reminder that two populations having equal means does not imply that their distributions are the same.
3. Those studies only tell us that there is a negative correlation between hours spent watching TV and scores on reading tests. We cannot be sure that there is a causal relationship between the two. There may be a lurking factor that is correlated with hours spent watching TV and that also lowers reading scores, for example hours spent on reading. In that case it is this factor, not TV itself, that affects the reading score. To establish that watching TV makes people less able to read, we would need a well-designed experiment that controls for other effects.
4. If a man's height is denoted by X, the height of his wife is Y = 0.92X. The correlation of their heights must be 1 because Y is an exact linear function of X with a positive slope.
$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{\mathrm{Cov}(X, 0.92X)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(0.92X)}} = \frac{0.92 \times \mathrm{Cov}(X, X)}{0.92 \times \mathrm{Var}(X)} = 1$$
5. (a) In this problem, we have only 10 rats instead of 15, and we observe all 10 survival times, (14, 17, 27, 18, 12, 8, 22, 13, 19, 12). Suppose the survival times follow an exponential distribution with density $f(x) = \frac{1}{\theta}\exp\left(-\frac{x}{\theta}\right)$, $x \ge 0$.
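i. Maximum likelihood estimate
The likelihood function is
$$L(\theta) = \prod_{i=1}^{10} f(x_i) = \prod_{i=1}^{10} \left(\frac{1}{\theta}\exp\left(-\frac{x_i}{\theta}\right)\right) = \left(\frac{1}{\theta}\right)^{10} \exp\left(-\frac{\sum_{i=1}^{10} x_i}{\theta}\right)$$
Taking logs gives $l(\theta) = \log L(\theta) = -10\log\theta - \frac{\sum_{i=1}^{10} x_i}{\theta}$. Setting its derivative to zero at $\hat\theta$:
$$\frac{d}{d\theta} l(\theta)\Big|_{\theta=\hat\theta} = -\frac{10}{\hat\theta} + \frac{\sum_{i=1}^{10} x_i}{\hat\theta^2} = 0 \quad\Rightarrow\quad \hat\theta = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{162}{10} = 16.2$$
The MLE of the mean survival time is $\hat\theta = 16.2$.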
ii. Method of moments
We need E(X) and the sample mean.
$$E(X) = \theta, \qquad \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = 16.2$$
By the method of moments, the MME of the mean survival time is $\tilde\theta = 16.2$.
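Both estimates in (a) can be checked numerically. The sketch below maximises the exponential log-likelihood with `optimize` and compares the result with the sample mean; the search interval (1, 100) and the variable names are assumptions chosen for illustration.

```r
# Observed survival times from part (a)
x <- c(14, 17, 27, 18, 12, 8, 22, 13, 19, 12)

# Log-likelihood of the exponential model parameterised by its mean theta
loglik <- function(theta) -length(x) * log(theta) - sum(x) / theta

# Numerical maximiser over an assumed search interval
theta_mle <- optimize(loglik, interval = c(1, 100), maximum = TRUE)$maximum

# Closed form: the sample mean, which is also the method-of-moments estimate
theta_closed <- mean(x)

print(c(numerical = theta_mle, closed_form = theta_closed))
```

With complete data the MLE and the method-of-moments estimate coincide, so the two printed values should agree up to the tolerance of the optimiser.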
(b) We assume that we observe 10 survival times and 5 censored (truncated) times, (14, 17, 27, 18, 12, 8, 22, 13, 19, 12, 30+, 30+, 30+, 30+, 30+). We do not observe when these 5 rats die. Suppose the survival times follow an exponential distribution with density $f(x) = \frac{1}{\theta}\exp\left(-\frac{x}{\theta}\right)$, $x \ge 0$, and define $c_i$ as the indicator of whether observation $i$ is censored ($c_i = 1$) or not ($c_i = 0$).
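i. Maximum likelihood estimate
The likelihood function is
$$L(\theta) = \prod_{i=1}^{15} f(x_i)^{(1-c_i)}\,[1 - F(x_i)]^{c_i} = \prod_{i=1}^{15} \left(\frac{1}{\theta}\exp\left(-\frac{x_i}{\theta}\right)\right)^{(1-c_i)} \left(\exp\left(-\frac{x_i}{\theta}\right)\right)^{c_i} = \left(\frac{1}{\theta}\right)^{10} \exp\left(-\frac{\sum_{i=1}^{15} x_i}{\theta}\right)$$
Taking logs gives $l(\theta) = \log L(\theta) = -10\log\theta - \frac{\sum_{i=1}^{15} x_i}{\theta}$. Setting its derivative to zero at $\hat\theta$:
$$\frac{d}{d\theta} l(\theta)\Big|_{\theta=\hat\theta} = -\frac{10}{\hat\theta} + \frac{\sum_{i=1}^{15} x_i}{\hat\theta^2} = 0 \quad\Rightarrow\quad \hat\theta = \frac{\sum_{i=1}^{15} x_i}{10} = \frac{312}{10} = 31.2$$
The MLE of the mean survival time is $\hat\theta = 31.2$.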
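ii. Method of moments
Treating each censored time as an observed value of 30,
$$E(X) = \theta = m_1 = m'_1 = \frac{\sum_{i=1}^{15} x_i}{15} = \frac{312}{15} = 20.8$$
The MME of the mean survival time is $\tilde\theta = 20.8$.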
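The two estimates in (b) can be reproduced with a short sketch. The censoring indicator `cens` and the search interval (1, 200) are names and choices assumed here for illustration; the log-likelihood counts only the 10 uncensored observations in the log term but sums all 15 recorded times.

```r
# Recorded times: 10 observed deaths followed by 5 times censored at 30
x    <- c(14, 17, 27, 18, 12, 8, 22, 13, 19, 12, rep(30, 5))
cens <- c(rep(0, 10), rep(1, 5))   # 1 = censored, 0 = death observed

# Censored exponential log-likelihood, parameterised by the mean theta
loglik <- function(theta) -sum(1 - cens) * log(theta) - sum(x) / theta

# Numerical maximiser over an assumed search interval
theta_mle <- optimize(loglik, interval = c(1, 200), maximum = TRUE)$maximum

# Method-of-moments value, treating censored times as if they were deaths at 30
theta_mme <- mean(x)

print(c(mle = theta_mle, mme = theta_mme))
```

The MLE divides the total time by the number of observed deaths, whereas the moment estimate divides by all 15 rats, which is why the two values differ once censoring is present.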
6. (a) Assume a normal population. Because the sample size is only 20, we need to use the t-distribution.
$$\text{C.I.} = \left[\bar{x} \pm t_{0.025,19} \times \frac{s}{\sqrt{n}}\right] = \left[87.395 - 2.093 \times \frac{0.5175}{\sqrt{20}},\; 87.395 + 2.093 \times \frac{0.5175}{\sqrt{20}}\right] = [87.1528, 87.6372]$$
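The interval can be recomputed from the summary statistics quoted above. This is a minimal sketch that uses only the sample size, mean and standard deviation from the solution; the variable names are assumptions.

```r
# Summary statistics quoted in the solution
n    <- 20
xbar <- 87.395
s    <- 0.5175

# Two-sided 95% interval: 2.5% in each tail of the t distribution with n - 1 df
t_crit     <- qt(0.975, df = n - 1)
half_width <- t_crit * s / sqrt(n)

ci <- c(lower = xbar - half_width, upper = xbar + half_width)
print(round(ci, 4))
```

With the raw observations available, `t.test(data)$conf.int` would give the same interval directly.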
(b) A normal probability plot can be produced with the following command.
> qqnorm(data)
7. (a) Based on the definition of a confidence interval, Y follows a binomial distribution with n = 100 and p = 0.95.
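(b) Using the normal approximation with a continuity correction,
$$P\{Y \ge 90\} = P\left\{\frac{Y - 95}{\sqrt{100 \times 0.95 \times 0.05}} \ge \frac{90 - 0.5 - 95}{\sqrt{100 \times 0.95 \times 0.05}}\right\} = P\{Z \ge -2.52\} = 0.9941$$
$$P\{Y \ge 95\} = P\left\{\frac{Y - 95}{\sqrt{100 \times 0.95 \times 0.05}} \ge \frac{95 - 0.5 - 95}{\sqrt{100 \times 0.95 \times 0.05}}\right\} = P\{Z \ge -0.23\} = 0.5910$$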
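Since p = 0.95 is close to 1, it may be worth comparing the normal approximation with the exact binomial tail probabilities. The sketch below computes both; the variable names are assumptions, and the z-values are not rounded to two decimals, so the approximate results can differ from table look-ups in the third or fourth decimal place.

```r
n <- 100
p <- 0.95
mu    <- n * p                   # mean of Y
sigma <- sqrt(n * p * (1 - p))   # standard deviation of Y

# Normal approximation with continuity correction, as in the solution
approx_90 <- pnorm((90 - 0.5 - mu) / sigma, lower.tail = FALSE)
approx_95 <- pnorm((95 - 0.5 - mu) / sigma, lower.tail = FALSE)

# Exact binomial tails, using P(Y >= k) = P(Y > k - 1)
exact_90 <- pbinom(89, size = n, prob = p, lower.tail = FALSE)
exact_95 <- pbinom(94, size = n, prob = p, lower.tail = FALSE)

print(rbind(approx = c(approx_90, approx_95),
            exact  = c(exact_90, exact_95)))
```

The first column of the printed matrix corresponds to P{Y ≥ 90} and the second to P{Y ≥ 95}.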
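8.
$$E(X) = \int_0^\infty x \cdot \frac{1}{\Gamma(\alpha)} \cdot \frac{1}{\beta^{\alpha}} \cdot x^{\alpha-1} \exp\left[-\frac{x}{\beta}\right] dx = \int_0^\infty \frac{1}{\Gamma(\alpha)} \cdot \frac{1}{\beta^{\alpha}} \cdot x^{(\alpha+1)-1} \exp\left[-\frac{x}{\beta}\right] dx$$
$$= \frac{\beta}{\Gamma(\alpha)} \cdot \Gamma(\alpha+1) \int_0^\infty \frac{1}{\Gamma(\alpha+1)} \cdot \frac{1}{\beta^{\alpha+1}} \cdot x^{(\alpha+1)-1} \exp\left[-\frac{x}{\beta}\right] dx = \alpha\beta$$
$$E(X^2) = \int_0^\infty x^2 \cdot \frac{1}{\Gamma(\alpha)} \cdot \frac{1}{\beta^{\alpha}} \cdot x^{\alpha-1} \exp\left[-\frac{x}{\beta}\right] dx = \int_0^\infty \frac{1}{\Gamma(\alpha)} \cdot \frac{1}{\beta^{\alpha}} \cdot x^{(\alpha+2)-1} \exp\left[-\frac{x}{\beta}\right] dx$$
$$= \frac{\beta^2}{\Gamma(\alpha)} \cdot \Gamma(\alpha+2) \int_0^\infty \frac{1}{\Gamma(\alpha+2)} \cdot \frac{1}{\beta^{\alpha+2}} \cdot x^{(\alpha+2)-1} \exp\left[-\frac{x}{\beta}\right] dx = \alpha(\alpha+1)\beta^2$$
In each case the remaining integral equals 1 because its integrand is a gamma density. Therefore
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \alpha(\alpha+1)\beta^2 - (\alpha\beta)^2 = \alpha\beta^2$$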
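The closed forms for the mean and variance can be checked by numerical integration. The parameter values below are arbitrary illustrative assumptions (any positive shape and scale would do), and `dgamma` is called with `scale` so that it matches the density used in the derivation.

```r
# Illustrative (assumed) parameter values
shape_a <- 2.5   # alpha
scale_b <- 1.5   # beta

# Gamma density with shape alpha and scale beta
dens <- function(x) dgamma(x, shape = shape_a, scale = scale_b)

# First and second moments by numerical integration over (0, Inf)
m1 <- integrate(function(x) x   * dens(x), lower = 0, upper = Inf)$value
m2 <- integrate(function(x) x^2 * dens(x), lower = 0, upper = Inf)$value

print(c(mean_numeric = m1, mean_formula = shape_a * scale_b))
print(c(var_numeric = m2 - m1^2, var_formula = shape_a * scale_b^2))
```

Agreement is limited by the default tolerance of `integrate`, so small differences in the later decimal places are expected.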
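9. We know that $f_X(x) = (2\pi)^{-1/2} \exp\left(-\frac{x^2}{2}\right)$ and $Y = g(X) = \exp(X)$, which is a monotone increasing function with inverse $g^{-1}(y) = \log y$.
$$f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy} g^{-1}(y)\right| = (2\pi)^{-1/2} \cdot \exp\left(-\frac{(\log y)^2}{2}\right) \cdot \frac{1}{y}, \quad y > 0$$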
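The derived density is that of the standard lognormal distribution, so it can be compared with R's built-in `dlnorm`. This is a small sketch; the function name `f_Y` and the evaluation points are assumptions chosen for illustration.

```r
# Density of Y = exp(X), X ~ N(0, 1), from the change-of-variables formula
f_Y <- function(y) (2 * pi)^(-1/2) * exp(-(log(y))^2 / 2) / y

# A few arbitrary positive evaluation points
y <- c(0.1, 0.5, 1, 2, 5)

# Side-by-side comparison with the built-in lognormal density
print(cbind(y = y,
            derived = f_Y(y),
            dlnorm  = dlnorm(y, meanlog = 0, sdlog = 1)))
```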