Topic 3: Tests in Parametric Models
Hypothesis Testing By Likelihood Methods
• Let H0 denote a null hypothesis to be tested. Typically, we may represent H0
as a specified family F0 of distributions for the data.
• For any test procedure T, we shall denote by Tn the version based on a sample of size n.
• The function
βn(T, F ) = PF(Tn rejects H0),
defined for distribution function F , is called the power function of Tn (or of T ).
– For F ∈ F0, βn(T, F ) represents the probability of a Type I error.
– The quantity
\[ \alpha_n(T, \mathcal{F}_0) = \sup_{F \in \mathcal{F}_0} \beta_n(T, F) \]
is called the size of the test.
– For F ∉ F0, the quantity 1 − βn(T, F) represents the probability of a Type II error.
• Usually, attention is confined to consistent tests: for fixed F ∉ F0, βn(T, F) → 1 as n → ∞.
• Also, attention is usually confined to unbiased tests: for F ∉ F0, βn(T, F) ≥ αn(T, F0).
A general way to compare two such test procedures is through their power functions. In this regard we shall use the concept of asymptotic relative efficiency (ARE).
• For two test procedures TA and TB, suppose that a performance criterion is tightened in such a way that the respective sample sizes nA and nB for TA and TB to perform “equivalently” tend to ∞ but have ratio nA/nB tending to some limit. Then the limit represents the ARE of procedure TB relative to procedure TA and is denoted by e(TB, TA).
• The earliest approach to ARE was introduced by Pitman (1949). In this approach, two test sequences T = {Tn} and U = {Un} are compared as the Type I and Type II error probabilities tend to positive limits α and 1 − β, respectively.
• In order that αn → α > 0 and simultaneously 1 − βn → 1 − β > 0, it is necessary to consider βn(·) evaluated at an alternative F(n) converging at a suitable rate to the null hypothesis F0.
• In justification of this approach, we might argue that large sample sizes would be relevant in practice only if the alternative of interest were close to the null hypothesis and thus hard to distinguish with only a small sample.
To demonstrate the above point, we consider the following example.
Example 3.11 Let X1, . . . , Xn be iid with X1 ∼ N (µ, 1).
• Test H0 : µ = 0 versus H1 : µ = µ0 > 0.
• Construct a test with Type I error probability α = 0.05 and Type II error probability β = 0.2005.
• Reject H0 if $\sqrt{n}\,\bar X_n > 1.645$.
• Note that
\[ \beta = P(\sqrt{n}\,\bar X_n \le 1.645 \mid \mu = \mu_0) = \Phi(1.645 - \sqrt{n}\,\mu_0). \]
• If n → ∞ and µ0 is a fixed positive constant, β → 0.
• To ensure β = 0.2005, we need $1.645 - \sqrt{n}\,\mu_0 = -0.84$, that is, $\mu_0 = 2.485\,n^{-1/2}$.
• Do you notice that µ0 now changes with n, so that it is no longer a fixed alternative?
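The contrast between a fixed and a shrinking alternative in Example 3.11 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `type2_error` and the fixed alternative µ0 = 0.5 are illustrative choices, not part of the example.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cdf


def type2_error(n, mu0, crit=1.645):
    """Type II error of the test 'reject if sqrt(n) * mean > crit' when mu = mu0."""
    return PHI(crit - sqrt(n) * mu0)


for n in (10, 100, 1000):
    fixed = type2_error(n, 0.5)                # fixed alternative (0.5 is an arbitrary choice)
    local = type2_error(n, 2.485 / sqrt(n))    # shrinking alternative from the example
    print(n, round(fixed, 4), round(local, 4))
```

For the fixed alternative the Type II error collapses to 0 as n grows, while along µ0 = 2.485 n^(−1/2) it stays at Φ(−0.84), independent of n.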
Test Statistics for A Simple Null Hypothesis
Although the theory of the following three tests is of most value for composite null hypotheses, it is convenient to begin with a simple null hypothesis. Consider testing H0 : θ = θ0 ∈ R^s versus H1 : θ ≠ θ0.
Likelihood Ratio Test
• A likelihood ratio statistic,
\[ \Lambda_n = \frac{L(\theta_0; x)}{\sup_{\theta \in \Theta} L(\theta; x)}, \]
was introduced by Neyman and Pearson (1928).
• Λn takes values in the interval [0, 1] and H0 is to be rejected for sufficiently small values of Λn.
• The rationale behind LR tests is that when H0 is true, Λn tends to be close to 1, whereas when H1 is true, Λn tends to be close to 0.
• The test may be carried out in terms of the statistic λn = −2 log Λn.
• For finite n, the null distribution of λn will generally depend on n and on the form of pdf of X.
• LR tests are closely related to MLE’s.
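• Denote the MLE by $\hat\theta$. For asymptotic analysis, expanding $\log L(\theta_0; x)$ about $\hat\theta$ in a Taylor series (the first-order term vanishes because $\hat\theta$ solves the likelihood equations), we get
\[
\lambda_n = -2\Big[-\sum_{i=1}^n \log f(X_i; \hat\theta) + \sum_{i=1}^n \log f(X_i; \theta_0)\Big]
= 2\Big[\tfrac12 (\theta_0 - \hat\theta)^T \Big\{-\sum_{i=1}^n \frac{\partial^2}{\partial\theta_j\,\partial\theta_k} \log f(X_i; \theta)\Big|_{\theta=\theta^*}\Big\} (\theta_0 - \hat\theta)\Big],
\]
where $\theta^*$ lies between $\hat\theta$ and $\theta_0$.
• Since $\theta^*$ is consistent under H0,
\[
\lambda_n = n(\hat\theta - \theta_0)^T \Big\{-\frac1n \sum_{i=1}^n \frac{\partial^2}{\partial\theta_j\,\partial\theta_k} \log f(X_i; \theta)\Big|_{\theta=\theta_0}\Big\} (\hat\theta - \theta_0) + o_P(1).
\]
By the asymptotic normality of $\hat\theta$ and
\[
-n^{-1} \sum_{i=1}^n \frac{\partial^2}{\partial\theta_j\,\partial\theta_k} \log f(X_i; \theta)\Big|_{\theta=\theta_0} \xrightarrow{P} I(\theta_0),
\]
λn has, under H0, a limiting chi-squared distribution with s degrees of freedom.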
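Example 3.12 Consider the testing problem H0 : θ = θ0 versus H1 : θ ≠ θ0 based on iid X1, . . . , Xn from the uniform distribution U(0, θ).
• $L(\theta_0; x) = \theta_0^{-n}\, 1\{x_{(n)} \le \theta_0\}$
• $\hat\theta = x_{(n)}$ (MLE) and $\sup_{\theta\in\Theta} L(\theta; x) = x_{(n)}^{-n}$
• We have
\[ \Lambda_n = \begin{cases} (X_{(n)}/\theta_0)^n, & X_{(n)} \le \theta_0, \\ 0, & X_{(n)} > \theta_0. \end{cases} \]
• Reject H0 if $X_{(n)} > \theta_0$ or $X_{(n)}/\theta_0 < c^{1/n}$.
• What is the asymptotic distribution of λn?
• What is $P(n \log(X_{(n)}/\theta_0) \le c)$ where c < 0? The resulting distribution of λn is not the $\chi^2_1$ distribution that the regular theory above would suggest. (Why?)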
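The question in Example 3.12 can be explored by simulation. Under H0, P(X(n)/θ0 ≤ t) = t^n for 0 ≤ t ≤ 1, so P(λn > x) = exp(−x/2) for every n, which is the chi-squared tail with 2 (not 1) degrees of freedom. The sketch below checks this empirically; the sample size, number of replications and seed are arbitrary choices made for illustration.

```python
import math
import random


def lr_statistic_uniform(sample, theta0):
    """lambda_n = -2 log Lambda_n for H0: theta = theta0 in the U(0, theta) model."""
    x_max = max(sample)
    if x_max > theta0:
        return math.inf  # Lambda_n = 0 in this case
    return -2.0 * len(sample) * math.log(x_max / theta0)


def simulate(n=50, reps=20000, theta0=1.0, seed=0):
    """Draw `reps` samples of size n under H0 and return the values of lambda_n."""
    rng = random.Random(seed)
    return [
        lr_statistic_uniform([rng.uniform(0.0, theta0) for _ in range(n)], theta0)
        for _ in range(reps)
    ]


lam = simulate()
mean_lam = sum(lam) / len(lam)                  # chi^2_2 has mean 2, chi^2_1 has mean 1
tail = sum(v > 3.841 for v in lam) / len(lam)   # 3.841 is the 95% point of chi^2_1
print(mean_lam, tail, math.exp(-3.841 / 2))
```

If the χ²_1 limit applied, the empirical tail frequency beyond 3.841 would be near 0.05; here it should instead be near exp(−3.841/2).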
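Example 3.13 Consider the testing problem $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 \ne \sigma_0^2$ based on iid X1, . . . , Xn from the normal distribution $N(\mu_0, \sigma^2)$.
• $L(\theta_0; x) = (2\pi\sigma_0^2)^{-n/2} \exp\big[-\sum_i (x_i - \mu_0)^2 / (2\sigma_0^2)\big]$
• $\hat\sigma^2 = n^{-1} \sum_i (x_i - \mu_0)^2$ (MLE) and $\sup_{\theta\in\Theta} L(\theta; x) = (2\pi\hat\sigma^2)^{-n/2} \exp(-n/2)$.
• We have
\[ \Lambda_n = \Big(\frac{\hat\sigma^2}{\sigma_0^2}\Big)^{n/2} \exp\Big[\frac n2 - \frac{\sum_i (x_i - \mu_0)^2}{2\sigma_0^2}\Big] \]
or, under H0,
\[ \lambda_n = -n\Big[\ln\Big(\frac1n \sum_{i=1}^n Z_i^2\Big) + \Big(1 - \frac1n \sum_{i=1}^n Z_i^2\Big)\Big], \]
where Z1, . . . , Zn are iid N(0, 1).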
• Fact: Using the CLT, we have
\[ \frac{n^{-1} \sum_{i=1}^n Z_i^2 - 1}{\sqrt{2/n}} \xrightarrow{d} N(0, 1) \]
or
\[ \frac n2 \Big(\frac1n \sum_{i=1}^n Z_i^2 - 1\Big)^2 \xrightarrow{d} \chi^2_1. \]
• Note that $\ln u \approx -(1 - u) - (1 - u)^2/2$ when u is near 1, and $n^{-1} \sum_{i=1}^n Z_i^2 \to 1$ in probability by the LLN.
• A common question in Taylor series approximation is how many terms to keep. In this example, the question is whether to use the first-order approximation ln u ≈ −(1 − u) instead of the second-order approximation used above. With the first-order approximation, the bracketed term in λn is approximated by 0 while its multiplier n tends to ∞, so we end up with the difficulty of finding $\lim_n a_n b_n$ when $\lim_n a_n = \infty$ and $\lim_n b_n = 0$.
• We conclude that λn has a limiting chi-squared distribution with 1 degree of freedom.
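The second-order step in Example 3.13 can be written out as a worked equation. This is a sketch of the heuristic argument only; the shorthand $u_n$ is introduced here for illustration and the remainder term is not controlled rigorously.

```latex
\begin{align*}
\lambda_n &= -n\bigl[\ln u_n + (1 - u_n)\bigr],
  \qquad u_n = \frac1n \sum_{i=1}^n Z_i^2, \\
&\approx -n\Bigl[-(1 - u_n) - \tfrac12 (1 - u_n)^2 + (1 - u_n)\Bigr] \\
&= \frac n2 \,(u_n - 1)^2 \;\xrightarrow{d}\; \chi^2_1 .
\end{align*}
```

The first-order terms cancel exactly, which is why the quadratic term determines the limit.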
The Wald Test
• Let $\hat\theta_n$ denote a consistent, asymptotically normal, and asymptotically efficient sequence of solutions of the likelihood equations:
\[ \sqrt{n}(\hat\theta_n - \theta) \xrightarrow{d} N(0, I^{-1}(\theta)) \quad \text{as } n \to \infty. \]
• Because I(θ) is continuous in θ, we have $I(\hat\theta_n) \xrightarrow{P} I(\theta)$ as n → ∞.
• Replacing the matrix
\[ -\frac1n \sum_{i=1}^n \frac{\partial^2}{\partial\theta_j\,\partial\theta_k} \log f(X_i; \theta)\Big|_{\theta=\theta_0} \]
by $I(\hat\theta_n)$ in the large-sample approximation of λn, we get a second statistic,
\[ W_n = n(\hat\theta_n - \theta_0)^T I(\hat\theta_n)(\hat\theta_n - \theta_0), \]
which was introduced by Wald (1943).
• By Slutsky’s theorem, Wn converges in distribution to $\chi^2_s$.
• For the construction of a confidence region, one generates $\{\theta_0 : W_n \le \chi^2_{s,\alpha}\}$, which is an ellipsoid in R^s.
• As a remark, for the construction of a confidence region based on λn, one generates $\{\theta_0 : \lambda_n \le \chi^2_{s,\alpha}\}$, which is not necessarily an ellipsoid in R^s.
The Rao Score Tests
• Both the Wald and likelihood ratio tests require evaluation of $\hat\theta_n$. Now we consider a test for which this is not necessary.
• Denote the likelihood score vector
\[ q(x; \theta) = (q_1(x; \theta), \ldots, q_s(x; \theta))^T, \quad \text{where} \quad q_j(x; \theta) = \frac{\partial}{\partial\theta_j} \log f(x; \theta). \]
• Write $Q(\theta) = \sum_{i=1}^n q(X_i; \theta)$. By the central limit theorem, $n^{-1/2} Q(\theta_0) \xrightarrow{d} N(0, I(\theta_0))$.
• A third statistic,
\[ V_n = [n^{-1/2} Q(\theta_0)]^T I^{-1}(\theta_0) [n^{-1/2} Q(\theta_0)] = n^{-1} Q(\theta_0)^T I^{-1}(\theta_0) Q(\theta_0), \]
was introduced by Rao (1948).
Again, it has a limiting $\chi^2_s$ distribution.
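Example 3.14 Consider a sample X1, . . . , Xn from the logistic distribution with density
\[ f_\theta(x) = \frac{e^{x-\theta}}{(1 + e^{x-\theta})^2}. \]
• $q(x; \theta) = -1 + 2e^{x-\theta}/(1 + e^{x-\theta})$ and
\[ Q(\theta_0) = -n + 2 \sum_{i=1}^n \frac{e^{x_i-\theta_0}}{1 + e^{x_i-\theta_0}}. \]
• I(θ) = 1/3 for all θ.
• The Rao score test is therefore based on the test statistic
\[ \sqrt{\frac3n} \sum_{i=1}^n \frac{e^{x_i-\theta_0} - 1}{1 + e^{x_i-\theta_0}}, \]
which is asymptotically N(0, 1) under H0 and whose square is Vn; H0 is rejected when its absolute value is large.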
• In this case, the MLE does not have an explicit expression and therefore the Wald and likelihood ratio tests are less convenient.
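The score statistic of Example 3.14 needs no optimization, which the following minimal sketch illustrates. The function name `rao_score_logistic` and the data are hypothetical; the code uses the identity (e^t − 1)/(e^t + 1) = tanh(t/2) for numerical stability and refers the statistic to its limiting N(0, 1) distribution.

```python
import math
from statistics import NormalDist


def rao_score_logistic(xs, theta0):
    """Return (z, two-sided p-value) for H0: theta = theta0 in the logistic location model."""
    n = len(xs)
    # (e^t - 1) / (e^t + 1) = tanh(t / 2), which avoids overflow for large |t|.
    q_sum = sum(math.tanh((x - theta0) / 2.0) for x in xs)
    z = math.sqrt(3.0 / n) * q_sum          # n^{-1/2} Q(theta0) / sqrt(I), with I = 1/3
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


sample = [0.3, 1.2, -0.5, 2.0, 0.8, -1.1, 1.7]   # hypothetical data
z, p_value = rao_score_logistic(sample, theta0=0.5)
print(z, p_value)
```

The square of `z` is the statistic Vn of the text; with only seven observations the normal approximation is rough, so the p-value here is purely illustrative.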
The three test statistics we discuss are asymptotically equivalent under H0. However, they do differ in computation and ease of interpretation.
• All three statistics have the same limit chi-squared distribution with degree of freedom s under the null hypothesis. The limiting distribution can be found by the following lemma.
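Lemma 1 Under regularity conditions,
(i) $n^{1/2}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, I^{-1}(\theta_0))$;
(ii) $n(\hat\theta_n - \theta_0)^T I(\theta_0)(\hat\theta_n - \theta_0) \xrightarrow{d} \chi^2_s$;
(iii) $n^{-1} Q(\theta_0)^T I^{-1}(\theta_0) Q(\theta_0) \xrightarrow{d} \chi^2_s$;
(iv) $\lambda_n - n(\hat\theta_n - \theta_0)^T I(\theta_0)(\hat\theta_n - \theta_0) \xrightarrow{P} 0$.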
• Both the likelihood ratio test and the Wald test require calculating an efficient estimator ˆθn, while the Rao test does not and is therefore the most convenient from the computational point of view.
• The Wald test, being based on the studentized difference $I^{1/2}(\hat\theta_n)\,[\sqrt{n}(\hat\theta_n - \theta)]$, is more easily interpretable and has the advantage that it immediately yields confidence regions for θ.
• The Wald test has the drawback, not shared by the other two, that it is only asymptotically but not exactly invariant under reparametrization.
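For simplicity, consider s = 1 and η = g(θ), where g is assumed to be differentiable and strictly increasing. The Wald statistic for testing η = η0 (= g(θ0)) is
\[ [g(\hat\theta_n) - g(\theta_0)] \sqrt{n I^*(\hat\eta_n)} = \sqrt{n I(\hat\theta_n)}\,(\hat\theta_n - \theta_0) \cdot \frac{g(\hat\theta_n) - g(\theta_0)}{\hat\theta_n - \theta_0} \cdot \frac{1}{g'(\hat\theta_n)}, \]
where $I^*(\eta) = I(\theta)/[g'(\theta)]^2$ is the information in the η-parametrization. The product of the second and third factors tends to 1 as $\hat\theta_n \to \theta_0$ but typically will differ from 1 for finite n.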
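The lack of invariance can be seen on a small numerical case. The sketch below assumes a Bernoulli(p) model with the logit reparametrization η = log(p/(1 − p)); neither appears in the text above and both are chosen only for illustration. It computes the squared Wald statistic in each parametrization, using I(p) = 1/(p(1 − p)) and hence information p(1 − p) for η.

```python
import math


def wald_two_parametrizations(successes, n, p0):
    """Squared Wald statistics for H0: p = p0, in p and in eta = logit(p)."""
    p_hat = successes / n
    # Original parameter: W = n (p_hat - p0)^2 I(p_hat), with I(p) = 1 / (p (1 - p)).
    w_p = n * (p_hat - p0) ** 2 / (p_hat * (1.0 - p_hat))
    # Reparametrized: information for eta is I(p) / g'(p)^2 = p (1 - p).
    eta_hat = math.log(p_hat / (1.0 - p_hat))
    eta0 = math.log(p0 / (1.0 - p0))
    w_eta = n * (eta_hat - eta0) ** 2 * (p_hat * (1.0 - p_hat))
    return w_p, w_eta


w_p, w_eta = wald_two_parametrizations(successes=35, n=50, p0=0.5)
print(round(w_p, 3), round(w_eta, 3))
```

The two values test the same hypothesis with the same data yet differ noticeably at n = 50; the difference shrinks as the estimate approaches the null value.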
Example 3.15 Consider a sequence of n independent trials, with s possible outcomes for each trial.
• Let θj denote the probability of occurrence of the jth outcome in any given trial.
• Let Nj denote the number of occurrences of the jth outcome in the series of n trials.
• The MLEs of the θj’s are Nj/n.
• The three test statistics λn, Wn and Vn for testing H0 : θ = θ0 against H1 : θ ≠ θ0 are easily seen to be
\[ \lambda_n = 2 \sum_{j=1}^s N_j \log\Big(\frac{N_j}{n\theta_{j0}}\Big), \quad W_n = \sum_{j=1}^s \frac{(N_j - n\theta_{j0})^2}{N_j}, \quad V_n = \sum_{j=1}^s \frac{(N_j - n\theta_{j0})^2}{n\theta_{j0}}. \]
• Both Wn and Vn are referred to as chi-squared goodness of fit statistics; the latter is often called the Pearson chi-squared statistic. Its large sample properties were first derived by Pearson (1900).
Pearson’s chi-square statistic is easily remembered as
\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}. \]
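• Let us now consider the behavior of λn, Wn and Vn under “local” alternatives, that is, for a sequence {θn} of the form
\[ \theta_n = \theta_0 + n^{-1/2}\Delta, \quad \text{where } \Delta = (\Delta_1, \ldots, \Delta_s)^T. \]
• Suppose that the convergences expressed in the above lemma may be established uniformly in θ for θ in a neighborhood of θ0.
• It then would follow that, under θn,
\[ n^{1/2}(\hat\theta - \theta_0) = n^{1/2}(\hat\theta - \theta_n) + \Delta \xrightarrow{d} N(\Delta, I^{-1}(\theta_0)), \]
\[ n^{-1/2} Q(\theta_0) = I(\theta_0)\, n^{1/2}(\hat\theta - \theta_0) + o_{P_{\theta_n}}(1) \xrightarrow{d} N(I(\theta_0)\Delta, I(\theta_0)), \]
and
\[ \lambda_n - W_n \xrightarrow{P_{\theta_n}} 0. \]
• It then follows that the statistics λn, Wn and Vn each converge in distribution to $\chi^2_s(\Delta^T I(\theta_0) \Delta)$, the noncentral chi-squared distribution with s degrees of freedom and noncentrality parameter $\Delta^T I(\theta_0) \Delta$.
• Therefore, under appropriate regularity conditions, the statistics λn, Wn and Vn are asymptotically equivalent in distribution, both under the null hypothesis and under local alternatives converging sufficiently fast.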
• However, at fixed alternatives these equivalences are not anticipated to hold.
Example 3.16 (Testing a Genetic Theory)
• In experiments on pea breeding, Mendel observed the different kinds of seeds obtained by crosses from peas with round yellow seeds and peas with wrinkled green seeds.
• Possible types of progeny were: (1) round yellow; (2) wrinkled yellow; (3) round green; and (4) wrinkled green.
• Assume the seeds are produced independently. We can think of each seed as being the outcome of a multinomial trial with possible outcomes numbered 1, 2, 3, 4 as above and associated probabilities of occurrence θ1, θ2, θ3, θ4.
• Mendel’s theory predicted that θ1 = 9/16, θ2 = θ3 = 3/16, θ4 = 1/16.
• Data: n = 556, n1 = 315, n2 = 101, n3 = 108, n4 = 32.
• Pearson’s chi-square statistic is
\[ \frac{(315 - 556 \times 9/16)^2}{312.75} + \frac{(3.25)^2}{104.25} + \frac{(3.75)^2}{104.25} + \frac{(2.75)^2}{34.75} = 0.47, \]
which has a p-value of about 0.93 when referred to a $\chi^2_3$ table.
There is insufficient evidence to reject Mendel’s hypothesis. (Why don’t we state that we accept Mendel’s hypothesis?)
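The three multinomial statistics of Example 3.15 can be evaluated on Mendel's counts. The sketch below uses only the standard library; the function names are illustrative, all observed counts are assumed positive (Wn divides by them), and the p-value uses the closed form of the chi-squared survival function with 3 degrees of freedom.

```python
import math


def multinomial_stats(counts, probs0):
    """Return (lambda_n, W_n, V_n) for H0: theta = probs0; assumes all counts > 0."""
    n = sum(counts)
    expected = [n * p for p in probs0]
    lam = 2.0 * sum(o * math.log(o / e) for o, e in zip(counts, expected))
    wald = sum((o - e) ** 2 / o for o, e in zip(counts, expected))
    rao = sum((o - e) ** 2 / e for o, e in zip(counts, expected))   # Pearson chi-square
    return lam, wald, rao


def chi2_sf_3df(x):
    """P(chi^2_3 > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


counts = [315, 101, 108, 32]
probs0 = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
lam, wald, rao = multinomial_stats(counts, probs0)
print(round(lam, 3), round(wald, 3), round(rao, 3), round(chi2_sf_3df(rao), 3))
```

All three statistics come out close to one another here, as the asymptotic equivalence under H0 would lead one to expect for n = 556.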
Topic 5: Tests in Nonparametric Models
Sign, permutation, and rank tests
• In a nonparametric problem, a UMP, UMPU, or UMPI test usually does not exist.
• Nonparametric tests are derived using some intuitively appealing ideas. They are commonly referred to as distribution-free tests, since almost no assumption is imposed on the population under consideration.
• Sign test:
– Let X1, . . . , Xn be iid random variables from F , u be a fixed constant, and p = F (u).
– Consider the problem of testing H0 : p ≤ p0 versus H1 : p > p0, or testing H0 : p = p0 versus H1 : p ≠ p0, where p0 is a fixed constant in (0, 1).
– Let $\Delta_i = 1\{X_i - u \le 0\}$, i = 1, . . . , n. Then ∆1, . . . , ∆n are iid binary random variables with p = P(∆i = 1).
– For testing H0 : p ≤ p0 versus H1 : p > p0, it follows from the Neyman-Pearson lemma and the monotone likelihood ratio property that the test
\[ T_*(Y) = \begin{cases} 1, & Y > m, \\ \gamma, & Y = m, \\ 0, & Y < m, \end{cases} \]
is UMP among tests based on the ∆i’s, where $Y = \sum_{i=1}^n \Delta_i$ and the constants m and γ are chosen so that the test has size α.
– For testing H0 : p = p0 versus H1 : p ≠ p0, the test
\[ T_*(Y) = \begin{cases} 1, & Y < c_1 \text{ or } Y > c_2, \\ \gamma_i, & Y = c_i,\ i = 1, 2, \\ 0, & c_1 < Y < c_2, \end{cases} \]
is UMPU among tests based on the ∆i’s.
– Since Y is equal to the number of nonnegative signs of the (u − Xi)’s, tests based on T∗ are called sign tests.
– One can easily extend the sign tests to the case where p = P(X1 ∈ B).
• Let (X1, Y1), . . . , (Xn, Yn) (matched pairs) be iid random variables from F. By using $\Delta_i = 1\{X_i - Y_i - u \le 0\}$, one can obtain sign tests for hypotheses concerning P(X1 − Y1 ≤ u).
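For the one-sided sign test, the constants m and γ follow from the Binomial(n, p0) distribution of Y at the boundary p = p0. The sketch below is one way to compute them; the function name and the numerical inputs are illustrative assumptions.

```python
from math import comb


def sign_test_constants(n, p0, alpha):
    """Find (m, gamma) with P(Y > m) + gamma * P(Y = m) = alpha for Y ~ Binomial(n, p0)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    pmf = [comb(n, k) * p0 ** k * (1.0 - p0) ** (n - k) for k in range(n + 1)]
    tail = 0.0                                # running value of P(Y > m)
    for m in range(n, -1, -1):
        if tail + pmf[m] > alpha:             # including Y = m fully would exceed alpha
            return m, (alpha - tail) / pmf[m]
        tail += pmf[m]
    raise ValueError("no cutoff found; check the inputs")


def sign_statistic(xs, u):
    """Y = number of observations with X_i - u <= 0."""
    return sum(1 for x in xs if x - u <= 0)


m, gamma = sign_test_constants(n=10, p0=0.5, alpha=0.05)
print(m, gamma)
```

With these constants the test rejects outright when Y > m and rejects with probability γ when Y = m, so its size is exactly α despite the discreteness of Y.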
• Permutation tests:
– Let Xi1, . . . , Xini, i = 1, 2, be two independent samples iid from Fi, i = 1, 2, respectively. Here Fi’s are cdf’s on R.
– Think of the two-sample problem in the parametric (normal) setting. Problems of this type arise from the comparison of two treatments.
– Remove the parametric assumption and assume that the Fi’s are in the nonparametric family F containing all continuous cdf’s on R.
– Consider the problem of testing
H0 : F1 = F2 versus H1 : F1 ≠ F2.
– Let X = (Xij, j = 1, . . . , ni, i = 1, 2), n = n1 + n2, and α be a given significance level. A test T(X) satisfying
\[ \frac{1}{n!} \sum_{z \in \pi(x)} T(z) = \alpha \]
is called a permutation test, where π(x) is the set of n! points obtained from x ∈ R^n by permuting the components of x.
• For rank tests, we only consider the Wilcoxon rank-sum test.
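In practice a permutation test is often carried out through a permutation p-value. The sketch below enumerates all ways of relabelling the pooled observations, which is equivalent to enumerating the n! permutations up to reorderings within each group. The choice of test statistic (difference in sample means, one-sided) is an assumption made for illustration; the definition above does not prescribe one.

```python
from itertools import combinations


def permutation_pvalue(x1, x2):
    """One-sided exact permutation p-value for 'sample 2 tends to be larger'."""
    pooled = list(x1) + list(x2)
    n1, n = len(x1), len(pooled)
    total = sum(pooled)

    def stat(sum1):
        # mean of group 2 minus mean of group 1, given the sum of group 1
        return (total - sum1) / (n - n1) - sum1 / n1

    observed = stat(sum(x1))
    hits = count = 0
    for idx in combinations(range(n), n1):    # indices relabelled as group 1
        count += 1
        if stat(sum(pooled[i] for i in idx)) >= observed - 1e-12:
            hits += 1
    return hits / count


p_val = permutation_pvalue([1, 2, 3], [4, 5, 6])
print(p_val)
```

Rejecting when this p-value is at most α gives a test whose level is at most α conditionally on the pooled data; exact size α in the sense of the definition generally requires randomization on the boundary.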
Rank Tests for Comparing Two Treatments
• For comparing a new treatment or procedure with the standard method, N subjects (patients, students, etc.) are divided at random into a group of n who will receive a new treatment and a control group of m who will be treated by the standard method.
• At the termination of the study, the subjects are ranked either directly or according to some response that measures the success of the treatment, such as a test score in an educational or psychological investigation.
• The hypothesis H0 of no treatment effect is rejected, and the superiority of the new treatment acknowledged, if the n treated subjects rank sufficiently high. (Here it is assumed that the success of the treatment is indicated by an increased response; if instead the aim is to decrease the response, H0 is rejected when the n treated subjects rank sufficiently low.)
• Let the ranks of the treated subjects be denoted by S1, . . . , Sn, where we shall assume that they are numbered in increasing order. Denote the sum of the treatment ranks WS = S1 + · · · + Sn.
• The hypothesis H0 is then rejected and the treatment judged to be effective when WS is sufficiently large, say, when WS ≥ c. Here the constant c is determined by the equation
PH0(WS ≥ c) = α.
• The test defined above is known as the Wilcoxon rank-sum test.
• Let X1, . . . , Xm and Y1, . . . , Yn be independent, the X’s identically distributed with distribution F and the Y’s identically distributed with distribution G.
Here the Y’s are responses to a treatment.
• Then H0 : F = G and Ha : Y is stochastically larger than X, i.e., G(t) ≤ F(t) for all t but G ≠ F.
• Let the ranks of the X’s be denoted by R1, . . . , Rm. If we substitute the R’s for the X’s and the S’s for the Y’s in the two-sample t-test statistic, we obtain
\[ \Big(\frac{nm}{N}\Big)^{1/2} \frac{\frac1n \sum_{i=1}^n S_i - \frac1m \sum_{j=1}^m R_j}{\Big\{(N-2)^{-1}\Big[\sum_{i=1}^n \big(S_i - \frac{N+1}{2}\big)^2 + \sum_{j=1}^m \big(R_j - \frac{N+1}{2}\big)^2\Big]\Big\}^{1/2}}. \]
• This statistic is equivalent to the Wilcoxon statistic WS, the sum of the ranks of the treatment group.
– Write WXY for the number of pairs (Xi, Yj) with Xi < Yj.
– It can be shown that
\[ W_S - \tfrac12 n(n+1) = W_{XY}. \]
– WXY is usually known as the Mann-Whitney statistic.
– Let φ(Xi, Yj) = 1 if Xi < Yj, and 0 otherwise. Then
\[ W_{XY} = \sum_{i=1}^m \sum_{j=1}^n \phi(X_i, Y_j). \tag{1} \]
– We shall prove that WXY is asymptotically normal as m and n tend to infinity.
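The relation between the rank sum WS and the Mann-Whitney count WXY can be verified directly on data. The sketch below assumes there are no ties (as under continuous F and G); the function name and the numbers are made up for illustration.

```python
def rank_sum_and_mann_whitney(xs, ys):
    """Return (W_S, W_XY): rank sum of the Y's in the pooled sample and #{(i, j): X_i < Y_j}."""
    pooled = sorted(list(xs) + list(ys))
    rank = {value: i + 1 for i, value in enumerate(pooled)}   # valid only without ties
    w_s = sum(rank[y] for y in ys)
    w_xy = sum(1 for x in xs for y in ys if x < y)
    return w_s, w_xy


xs = [1.1, 3.4, 2.0]            # controls (hypothetical)
ys = [2.5, 4.0, 0.7, 5.2]       # treated (hypothetical)
w_s, w_xy = rank_sum_and_mann_whitney(xs, ys)
n = len(ys)
print(w_s, w_xy, w_s - n * (n + 1) // 2)
```

The identity holds because the rank of each Y counts the X's below it plus the Y's at or below it, and the latter contributions sum to n(n + 1)/2.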
• The method of proof consists in replacing the variable WXY by a sum of independent random variables, which is asymptotically equivalent to WXY and to which the central limit theorem can then be applied.
• It is natural for this purpose to try a sum of the form
\[ S = \sum_{i=1}^m a_i(X_i) + \sum_{j=1}^n b_j(Y_j), \tag{2} \]
but how should one choose the functions ai and bj?
• The following “projection method”, introduced in a different context by Hajek (1961), produces the ai and bj most likely to succeed in the sense of minimizing E(WXY − S)².
• This approach is due to Hoeffding (1948), and is applicable to a large class of statistics, the so-called U-statistics.
– Note that
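\[ \theta(F, G) = \int F \, dG = P(X \le Y). \]
– An unbiased estimator of θ(F, G) is
\[ U = \frac{1}{nm} \sum_{i=1}^m \sum_{j=1}^n I(X_i \le Y_j), \]
which equals WXY/(mn) with probability one when F and G are continuous.
– A statistic that can be written in this form is called a U-statistic.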
– Note that the popularity of this projection method is due to Hajek (1968), who gives the following result.
Lemma 2 (Hoeffding) Let Z1, . . . , Zn be independent random variables and S = S(Z1, . . . , Zn) any statistic satisfying E(S²) < ∞. Then the random variable
\[ S^* = \sum_{i=1}^n E(S \mid Z_i) - (n - 1) E(S) \]
satisfies E(S∗) = E(S) and
\[ E(S - S^*)^2 = Var(S) - Var(S^*). \]
Remarks:
1. The random variable S∗ is called the projection of S on Z1, . . . , Zn.
2. Note that it is conveniently a sum of independent random variables.
3. In cases where E(S − S∗)² → 0 at a suitable rate as n → ∞, the asymptotic normality of S may be established by applying classical theory to S∗.
Proof of Hoeffding’s Lemma.
• Without loss of generality, we can assume that E(S) = 0.
• Consider the problem of finding the sum
\[ T = \sum_{i=1}^n k_i(Z_i) \tag{3} \]
for which E(S − T)² is as small as possible; the minimizing T may be considered the “projection” of S onto the linear space formed by the functions of the form (3).
• Let
\[ r_i(z_i) = E(S \mid Z_i = z_i) \tag{4} \]
be the conditional expectation of S given Zi = zi, and let
\[ S^* = \sum_{i=1}^n r_i(Z_i). \tag{5} \]
That S∗ is the desired minimizing function is an immediate consequence of the following identity, which holds for all statistics S with mean zero and all T of the form (3) for which the required expectations exist:
\[ E(S - T)^2 = E(S - S^*)^2 + E(S^* - T)^2. \tag{6} \]
• To prove the above identity, write
E(S − T )2 = E[(S − S∗) + (S∗ − T )]2.
– Squaring the right-hand side proves (6) if it can be shown that
\[ E[(S - S^*)(S^* - T)] = 0. \tag{7} \]
– Since the left-hand side of (7) is the sum of the expectations of
\[ [r_i(Z_i) - k_i(Z_i)](S - S^*), \tag{8} \]
it is enough to show that the expectation of (8) is zero for all i.
– We shall prove this by showing that the conditional expectation of (8) given Zi is zero.
– In the conditional expectation of this product, the first factor can be taken out of the expectation sign since it depends only on Zi, so that it is finally only necessary to show that the conditional expectation of S − S∗ given Zi is zero.
– Now
\[ E[(S - S^*) \mid Z_i] = E\Big\{S - r_i(Z_i) - \sum_{j \ne i} r_j(Z_j) \,\Big|\, Z_i\Big\}. \]
– From the definition of ri(Zi), it is seen that the conditional expectation of S − ri(Zi) given Zi is zero.
– On the other hand, since Zi and Zj are independent, the conditional expectation of rj(Zj) given Zi is equal to the unconditional expectation of rj(Zj), which by the definition of rj is equal to E(S) and hence equal to zero.
– This completes the proof of (7) and therefore of (6).
• A useful special case of (6) is obtained by putting T = 0, which gives after rearrangement
\[ E(S - S^*)^2 = E(S^2) - E(S^{*2}) = Var(S) - Var(S^*). \tag{9} \]
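The projection identity can be confirmed by exact enumeration on a small discrete case. The sketch below assumes three independent Bernoulli variables with arbitrary success probabilities and an arbitrary non-additive statistic S; both are hypothetical choices, and S∗ is built exactly as in Lemma 2.

```python
from itertools import product

probs = (0.3, 0.5, 0.8)                 # hypothetical P(Z_i = 1), i = 1, 2, 3
outcomes = list(product((0, 1), repeat=len(probs)))


def S(z):
    """An arbitrary statistic that is not a sum of functions of single Z_i."""
    return z[0] * z[1] + max(z[1], z[2])


def weight(z):
    """Probability of the outcome z under independence."""
    w = 1.0
    for zi, p in zip(z, probs):
        w *= p if zi == 1 else 1.0 - p
    return w


def expect(f):
    return sum(weight(z) * f(z) for z in outcomes)


def cond_mean(i, value):
    """E(S | Z_i = value), computed by exact enumeration."""
    num = sum(weight(z) * S(z) for z in outcomes if z[i] == value)
    den = sum(weight(z) for z in outcomes if z[i] == value)
    return num / den


ES = expect(S)


def S_star(z):
    """Projection: sum_i E(S | Z_i) - (n - 1) E(S)."""
    return sum(cond_mean(i, z[i]) for i in range(len(z))) - (len(z) - 1) * ES


var_S = expect(lambda z: (S(z) - ES) ** 2)
var_S_star = expect(lambda z: (S_star(z) - ES) ** 2)
mse = expect(lambda z: (S(z) - S_star(z)) ** 2)
print(mse, var_S - var_S_star)
```

The two printed numbers agree up to floating-point error, and the positive gap between Var(S) and Var(S∗) measures how far S is from being a sum of independent terms.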
• Before we apply Hoeffding’s lemma to the WXY-statistic (1), we will calculate the expectation and variance of WXY.
• Set θ = (F, G). Since Eθ[φ(X, Y)] = Pθ[X < Y], we obtain
\[ E_\theta(W_{XY}) = mnp, \tag{10} \]
where p = Pθ[X < Y].
• Similarly, we have
\[ Var_\theta(W_{XY}) = nmp(1 - p) + nm(n - 1)(q_1 - p^2) + nm(m - 1)(q_2 - p^2), \tag{11} \]
where q1 = Pθ[X1 < min(Y1, Y2)] and q2 = Pθ[Y1 > max(X1, X2)].
• Note that under H0, if F is continuous, p = 1/2 while q1 = q2 = 1/3, since, among three independent identically distributed variables, each one is equally likely to be the minimum or the maximum.
• We then have Eθ(WXY) = mn/2 and Varθ(WXY) = mn(N + 1)/12 under H0.
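The null mean and variance can be checked exactly for small samples. Under H0 with continuous F, every set of n ranks for the treated group is equally likely, so the sketch below enumerates all such sets and uses WXY = WS − n(n + 1)/2; the function name and the sample sizes are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def null_moments(m, n):
    """Exact mean and variance of W_XY under H0, by enumerating all rank assignments."""
    N = m + n
    values = [
        sum(treated_ranks) - n * (n + 1) // 2      # W_XY = W_S - n(n + 1)/2
        for treated_ranks in combinations(range(1, N + 1), n)
    ]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, var


m, n = 3, 4
mean, var = null_moments(m, n)
print(mean, var, Fraction(m * n, 2), Fraction(m * n * (m + n + 1), 12))
```

Exact rational arithmetic is used so that the comparison with mn/2 and mn(N + 1)/12 is free of rounding error.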
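• Put
\[ \psi(x, y) = \phi(x, y) - p. \tag{12} \]
Note that
\[ E[\psi(X_\alpha, Y_\beta) \mid X_i = x] = \begin{cases} E\psi(x, Y_\beta), & \text{if } \alpha = i, \\ 0, & \text{if } \alpha \ne i, \end{cases} \]
and
\[ E[\psi(X_\alpha, Y_\beta) \mid Y_j = y] = \begin{cases} E\psi(X_\alpha, y), & \text{if } \beta = j, \\ 0, & \text{if } \beta \ne j. \end{cases} \]
• Put $\psi_{10}(x) = E_Y \psi(x, Y)$ and $\psi_{01}(y) = E_X \psi(X, y)$.
• The projection of WXY − mnp given by Hoeffding’s lemma is $n \sum_{i=1}^m \psi_{10}(X_i) + m \sum_{j=1}^n \psi_{01}(Y_j)$. Consider
\[ U = \sqrt{m}\Big[\frac1m \sum_{i=1}^m \psi_{10}(X_i) + \frac1n \sum_{j=1}^n \psi_{01}(Y_j)\Big] \quad \text{and} \quad S = \sqrt{m}\,[(mn)^{-1} W_{XY} - p]. \]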
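• Note that
\[ Var(S) \to q_1 - p^2 + \frac mn (q_2 - p^2), \quad Var(U) = Var(\psi_{10}(X)) + \frac mn Var(\psi_{01}(Y)), \quad E(S - U)^2 = Var(S) - Var(U). \]
Observe that $Var(\psi_{10}(X)) = q_1 - p^2$ and $Var(\psi_{01}(Y)) = q_2 - p^2$. Indeed, for j ≠ k, $E[\psi(x_1, Y_j)\psi(x_1, Y_k)] = [\psi_{10}(x_1)]^2$ by the independence of Yj and Yk, so that
\[ E_X[\psi_{10}(X)]^2 = E[\psi(X, Y_j)\psi(X, Y_k)] = Cov(\psi(X, Y_j), \psi(X, Y_k)) = q_1 - p^2, \]
and similarly for $\psi_{01}$.
We then conclude that E(S − U)² → 0.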
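Theorem 1 Suppose that F and G are continuous and that 0 < Pθ[X < Y] < 1. Then
\[ \frac{S - E_\theta(S)}{\sqrt{Var_\theta(S)}} \xrightarrow{d} N(0, 1) \quad \text{as } \min(n, m) \to \infty. \]
Remark. Reject H0 when
\[ \frac{W_{XY} - \frac12 nm}{\sqrt{\frac{1}{12} nm(N + 1)}} \ge z(1 - \alpha). \]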
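The rejection rule in the Remark translates into a few lines of code. The sketch below assumes no ties and uses the standard library normal quantile for z(1 − α); the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilcoxon_normal_test(xs, ys, alpha=0.05):
    """One-sided large-sample Wilcoxon test; returns (z, reject)."""
    m, n = len(xs), len(ys)
    N = m + n
    w_xy = sum(1 for x in xs for y in ys if x < y)      # Mann-Whitney count
    z = (w_xy - m * n / 2.0) / sqrt(m * n * (N + 1) / 12.0)
    return z, z >= NormalDist().inv_cdf(1.0 - alpha)


xs = [float(i) for i in range(10)]            # controls (hypothetical)
ys = [x + 100.0 for x in xs]                  # treated, all larger than every control
z, reject = wilcoxon_normal_test(xs, ys)
print(round(z, 4), reject)
```

For small m and n the exact null distribution of WXY is preferable to the normal approximation.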
Pitman efficiency of the Wilcoxon rank-sum test relative to the two-sample t-test
We turn now to the comparison of the performance of the Wilcoxon and two-sample t tests. At first sight it would appear that a good reason for using the Wilcoxon test is that it has a guaranteed probability of type I error, and a good reason against using it is its inefficient use of the data.
• We assume that the X’s and Y ’s have the same variance σ2 and means µ1 and µ2.
• Although the t test does not have a guaranteed probability of type I error, if n and m are moderately large, H0 is true, and F has a finite second moment, then the probability of type I error of the t test is fairly close to that specified by the normal model.
• Recall that the two-sample t statistic is given by
\[ T = \sqrt{\frac{nm}{N}}\, \frac{\bar Y - \bar X}{s_2}, \tag{13} \]
where
\[ s_2^2 = \frac{\sum_{i=1}^m (X_i - \bar X)^2 + \sum_{j=1}^n (Y_j - \bar Y)^2}{N - 2}. \tag{14} \]
• We start by obtaining an approximation to the critical value and power of the t test. Note that $s_2^2 \xrightarrow{P} \sigma^2$ as min(n, m) → ∞. It follows from Slutsky’s theorem and the central limit theorem that when µ1 = µ2, T converges in law to a N(0, 1) random variable as min(n, m) → ∞.
• Then the t test that rejects H0 when T ≥ tN −2(1 − α) has approximately level α regardless of the shape of F and G and z(1 − α) is an approximate critical value as we claimed above.
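• If µ1 ≠ µ2, let δ = (µ2 − µ1)/σ. Then, arguing as above, if $\sqrt{nm/N}\,\delta$ stays bounded, $T - \sqrt{nm/N}\,\delta$ has approximately a N(0, 1) distribution for all F and G with σ² < ∞. We can then approximate the probability $P_\theta(T \ge t_{N-2}(1 - \alpha))$ by
\[ \beta_T = P_\theta[T \ge z(1 - \alpha)] \approx 1 - \Phi\big(z(1 - \alpha) - \sqrt{nm/N}\,\delta\big) = \Phi\big(z(\alpha) + \sqrt{nm/N}\,\delta\big). \]
• For the Wilcoxon test,
\[
\begin{aligned}
\beta_W &= P_\theta\Big[W_{XY} \ge \tfrac12 nm + z(1 - \alpha)\sqrt{\tfrac{1}{12} nm(N + 1)}\Big] \\
&= P_\theta\Bigg[\frac{W_{XY} - E_\theta(W_{XY})}{\sqrt{var_\theta(W_{XY})}} \ge \frac{nm(\frac12 - p) + z(1 - \alpha)\sqrt{\frac{1}{12} nm(N + 1)}}{\sqrt{var_\theta(W_{XY})}}\Bigg] \\
&\approx 1 - \Phi\Bigg(\frac{nm(\frac12 - p) + z(1 - \alpha)\sqrt{\frac{1}{12} nm(N + 1)}}{\sqrt{var_\theta(W_{XY})}}\Bigg).
\end{aligned}
\]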
• Consider the case that X ∼ N(µ1, σ²), Y ∼ N(µ2, σ²), n = m and α = 0.05. Suppose that δ = (µ2 − µ1)/σ = 0.5.
• Suppose we want to have power β = 0.9.
For the t-test, solve
\[ -1.645 + \sqrt{\frac{(N/2)(N/2)}{N}} \cdot 0.5 = 1.282 \]
and get N = 16 · (2.927)² ≈ 137.
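For the Wilcoxon test:
\[ p = P_\theta(X < Y) = \Phi\Big(\frac{\mu_2 - \mu_1}{\sqrt2\,\sigma}\Big), \quad q_1 = P\Big(Z_1 < \frac{\Delta}{\sqrt2}, Z_2 < \frac{\Delta}{\sqrt2}\Big), \quad q_2 = P\Big(Z_1 < \frac{\Delta}{\sqrt2}, Z_3 < \frac{\Delta}{\sqrt2}\Big), \]
where $Z_1 = [X_1 - Y_1 - (\mu_1 - \mu_2)]/(\sqrt2\,\sigma)$, $Z_2 = [X_1 - Y_2 - (\mu_1 - \mu_2)]/(\sqrt2\,\sigma)$, $Z_3 = [X_2 - Y_1 - (\mu_1 - \mu_2)]/(\sqrt2\,\sigma)$, and ∆ = (µ2 − µ1)/σ is the standardized shift denoted δ above.
• Note that (Z1, Z2) ∼ N(0, 0, 1, 1, 1/2) and (Z1, Z3) ∼ N(0, 0, 1, 1, 1/2), that is, each pair is bivariate normal with zero means, unit variances and correlation 1/2. When ∆ = 0.5, p = 0.638 and q1 = q2 = 0.483, so we have $\beta_W \approx \Phi(-1.729 + 0.355\sqrt{N/2}) = 0.9$. Hence, N ≈ 144.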
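The two sample-size calculations can be reproduced numerically. The sketch below takes X ∼ N(0, 1) and Y ∼ N(δ, 1) without loss of generality, obtains p and q1 by a simple midpoint rule (q2 = q1 by symmetry in this normal shift model), and uses the variance formula (11) with n = m rather than its large-sample simplification, so the Wilcoxon answer may differ by a few units from the rounded value quoted above. Function names, grid size and integration limits are assumptions made for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

ND = NormalDist()


def t_test_total_n(delta, alpha=0.05, power=0.9):
    """Solve z(alpha) + sqrt(N/4) * delta = z(power) for N, with n = m = N/2."""
    z = ND.inv_cdf(1.0 - alpha) + ND.inv_cdf(power)
    return 4.0 * (z / delta) ** 2


def p_and_q(delta, grid=4000, lim=8.0):
    """p = P(X < Y) and q1 = P(X < min(Y1, Y2)) by the midpoint rule."""
    h = 2.0 * lim / grid
    p = q1 = 0.0
    for k in range(grid):
        x = -lim + (k + 0.5) * h
        dens = exp(-x * x / 2.0) / sqrt(2.0 * pi)   # density of X at x
        tail = 1.0 - ND.cdf(x - delta)              # P(Y > x)
        p += dens * tail * h
        q1 += dens * tail * tail * h
    return p, q1


def wilcoxon_power(n, p, q1, alpha=0.05):
    """Normal approximation to the power with n = m, q2 = q1, variance from (11)."""
    N = 2 * n
    var = n * n * (p * (1.0 - p) + 2.0 * (n - 1) * (q1 - p * p))
    crit = n * n / 2.0 + ND.inv_cdf(1.0 - alpha) * sqrt(n * n * (N + 1) / 12.0)
    return 1.0 - ND.cdf((crit - n * n * p) / sqrt(var))


p, q1 = p_and_q(0.5)
n = 2
while wilcoxon_power(n, p, q1) < 0.9:
    n += 1
N_t, N_w = t_test_total_n(0.5), 2 * n
print(round(p, 3), round(q1, 3), round(N_t, 1), N_w)
```

The ratio of the two sample sizes gives a finite-sample impression of the efficiency of the Wilcoxon test relative to the t-test at the normal model.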