Financial Time Series I Topic 4: Discrete Data: Contingency Tables Hung Chen Department of Mathematics National Taiwan University 10/30/2002


OUTLINE

1. Probability Model
   – Mean and Variance
   – Limiting Distribution
   – Mining of association rules in Basket Analysis
2. Goodness of Fit Test
   – Embedding and Nested Models
   – Test of Independence
3. Logistic Regression for Binary Response
   – Logit or Probit
   – Likelihood Equation
   – Likelihood Ratio Test
4. Tests for A Simple Null Hypothesis
   – Likelihood Ratio Test
   – Wald Test
   – Rao's Score Test

Contingency Tables

We start with a probability model to describe the data summarized in a contingency table.

• Consider a sequence of n independent trials, with k possible outcomes for each trial. For a 2 × 2 table, k = 4 and n is the total number of observations.

• Let pj denote the probability of occurrence of the jth outcome in any given trial (Σ_{j=1}^k pj = 1).

• Let nj denote the number of occurrences of the jth outcome in the series of n trials (Σ_{j=1}^k nj = n). (n1, . . . , nk) is called the "cell frequency vector" associated with the n trials.

• The exact distribution of (n1, . . . , nk) is the multinomial distribution MN(n, p), where p = (p1, . . . , pk).

• E(nj) = npj, Var(nj) = npj(1 − pj), and Cov(ni, nj) = −npipj for i ≠ j, so that E(n1, . . . , nk) = np and Cov((n1, . . . , nk)) = n(Dp − p^T p), where Dp = diag(p).
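These moment formulas can be checked numerically. The sketch below builds the mean vector np and covariance matrix n(Dp − p^T p) for a hypothetical p and n (both made up for illustration):

```python
import numpy as np

# Hypothetical cell probabilities and number of trials.
n = 100
p = np.array([0.2, 0.3, 0.5])

mean = n * p                               # E(n_1, ..., n_k) = n p
cov = n * (np.diag(p) - np.outer(p, p))    # Cov = n (Dp - p^T p)
```

Each row of the covariance matrix sums to zero, reflecting the linear constraint Σ_j nj = n.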

• Let p̂ = n^{-1}(n1, . . . , nk) be the vector of sample proportions, and set Un = √n(p̂ − p). Then E(Un) = 0 and Cov(Un) = Dp − p^T p.

We now use the Cramér-Wold device to prove asymptotic multivariate normality of cell frequency vectors.

Theorem. The random vector Un converges in distribution to the k-variate normal with mean 0 and covariance Dp − p^T p.

• Write Un = (u1, . . . , uk) and compute the characteristic function E exp(it Σ_{j=1}^k λj uj) for arbitrary fixed λ1, . . . , λk.

• Observe that, using uj = nj/√n − √n pj and the multinomial identity E exp(Σ_j sj nj) = (Σ_j pj e^{sj})^n,

  E exp(it Σ_{j=1}^k λj uj)
    = E exp(Σ_{j=1}^k it λj (nj/√n − √n pj))
    = exp(−it √n Σ_{j=1}^k λj pj) · E exp((it/√n) Σ_{j=1}^k λj nj)
    = exp(−it √n Σ_{j=1}^k λj pj) · [Σ_{j=1}^k pj exp(it λj/√n)]^n
    = [Σ_{j=1}^k pj exp((it/√n)(λj − Σ_{i=1}^k λi pi))]^n
    = [Σ_{j=1}^k pj (1 + (it/√n)(λj − Σ_i λi pi) − (t²/2n)(λj − Σ_i λi pi)² + o(n^{-1}))]^n
    = [1 − (t²/2n) Σ_{j=1}^k pj (λj − Σ_i λi pi)² + o(n^{-1})]^n
    → exp(−(t²/2) (λ1, . . . , λk)(Dp − p^T p)(λ1, . . . , λk)^T),

  since Σ_j pj (λj − Σ_i λi pi)² = Σ_j pj λj² − (Σ_i λi pi)² = λ(Dp − p^T p)λ^T.

• The limit is the characteristic function of the multivariate normal distribution with mean vector 0 and covariance matrix Dp − p^T p, so the theorem follows from the Cramér-Wold device.

Assumptions:

• Every individual in the population under study can be classified as falling into one and only one of k categories; we say that the categories are mutually exclusive and exhaustive.

• A randomly selected member of the population falls into the k categories with probabilities given by the vector of cell probabilities
  p = (p1, p2, . . . , pk), with Σ_{i=1}^k pi = 1.

• Here the cells are strung out into a line for purposes of indexing only; their arrangement and ordering do not reflect anything about the characteristics of individuals falling into a particular cell.

• The pi reflect the relative frequency of each category in the population.

• Mining of association rules in Basket Analysis:

– A basket bought at the food store consists of items such as: Apples, Bread, Coke, Milk, Tissues.

– Data on all baskets are available (through cash registers).

– Goal: discover association rules of the form
  Bread&Milk => Coke&Tissues.

– This analysis is also called linkage analysis or item analysis.

– Properties of association rules:

  ∗ The support of the rule is the proportion of baskets with Bread&Milk&Coke&Tissues.

  ∗ The confidence of the rule is
    Sup(Bread&Milk&Coke&Tissues)/Sup(Bread&Milk),
    which is simply an estimated conditional probability in statistical terms.

  ∗ The lift of the rule is
    Sup(Bread&Milk&Coke&Tissues)/[Sup(Bread&Milk) · Sup(Coke&Tissues)].
    How do you connect it with P(A ∩ B)/[P(A)P(B)]?

– Search for rules with high confidence and support.

  ∗ Will the results be affected by randomness?

  ∗ Add the requirement that the rule is statistically significant in the test against independence (i.e., against lift = 1).

  ∗ The number of such tests to be performed in a moderate problem reaches tens of thousands.

– You can put all of them in a huge contingency table.
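On a toy basket dataset (entirely hypothetical), the support, confidence, and lift of the rule Bread&Milk => Coke&Tissues can be computed directly from the definitions above:

```python
# Hypothetical baskets (each basket is a set of items).
baskets = [
    {"Bread", "Milk", "Coke", "Tissues"},
    {"Bread", "Milk", "Coke"},
    {"Bread", "Milk"},
    {"Apples", "Coke", "Tissues"},
    {"Bread", "Milk", "Coke", "Tissues"},
]

def support(items):
    # Proportion of baskets containing every item in `items`.
    return sum(items <= b for b in baskets) / len(baskets)

sup_rule = support({"Bread", "Milk", "Coke", "Tissues"})
conf = sup_rule / support({"Bread", "Milk"})                 # est. conditional prob.
lift = sup_rule / (support({"Bread", "Milk"}) * support({"Coke", "Tissues"}))
```

A lift near 1 is what one would expect under independence of the two item sets.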


2 × 2 Tables

• As an example, we might be interested in whether hair color is related to eye color. Conduct a study by collecting a random sample and counting the number of people who fall in each cell of the cross-classification determined by hair color and eye color.

• When the cells are defined in terms of the categories of two or more variables, a structure relating to the nature of the data is imposed. The natural structure for two variables is often a rectangular array with columns corresponding to the categories of one variable and rows to categories of the second variable; three variables create layers of two-way tables, and so on.

• The simplest contingency table is based on four cells, and the categories depend on two variables. The four cells are arranged in a 2 × 2 table whose two rows correspond to the categorical variable A and whose two columns correspond to the second categorical variable B. Double subscripts refer to the position of the cells in our arrangement.

• The first subscript gives the category number of variable A, the second that of variable B, and the two-dimensional array is displayed as a grid with two rows and two columns.

• The probability pij is the probability of an individual being in category i of variable A and category j of variable B. Usually, we have some theory in mind which can be checked in terms of hypothesis testing, such as
  H0 : p = π (π a fixed value).

• Then the problem can be phrased as n observations from the k-cell multinomial distribution with cell probabilities p1, . . . , pk, and we encounter the problem of proving asymptotic multivariate normality of cell frequency vectors.

• To test H0, we can proceed with the Pearson chi-square test, which rejects H0 if X² is too large, where
  X² = Σ_{i=1}^k (ni − nπi)²/(nπi).

This test statistic was first derived by Pearson (1900). Two questions then need answers. The first is how large X² must be to count as "too large." The second is whether the Pearson chi-square test is a reasonable testing procedure. These questions will be tackled by deriving the asymptotic distribution of the Pearson chi-square statistic under H0 and under a local alternative to H0.

• Using matrix notation, X² can be written as
  X² = Un Dπ^{-1} Un^T,
  where Un = √n(p̂ − π), p̂ = n^{-1}(n1, . . . , nk), and Dπ = diag(π).

• Let g(x) = x Dπ^{-1} x^T for x = (x1, . . . , xk). Evidently, g is a continuous function of x. It can be shown that Un →d U, where U has the multivariate normal distribution N(0, Dπ − π^T π). Then we have
  Un Dπ^{-1} Un^T →d U Dπ^{-1} U^T.
  Thus the asymptotic distribution of X² under H0 is the distribution of U Dπ^{-1} U^T, where U has the N(0, Dπ − π^T π) distribution. This reduces the problem to finding the distribution of a quadratic form of a multivariate normal random vector. The step from Un →d U to g(Un) →d g(U) is the continuous mapping theorem, a close relative of the δ method discussed below.

• We now state without proof the following general result on the distribution of a quadratic form of a multivariate normal random variable. It can be found in Chapter 3b of Rao (1973) and Chapter 3.5 of Serfling (1980).

Theorem. If X = (X1, . . . , Xd) has the multivariate normal distribution N(0, Σ) and Y = X A X^T for some symmetric matrix A, then L[Y] = L[Σ_{i=1}^d λi Zi²], where Z1², . . . , Zd² are independent chi-square variables with one degree of freedom each and λ1, . . . , λd are the eigenvalues of A^{1/2} Σ (A^{1/2})^T.

• Applying the above theorem to the present problem, we see that L[U Dπ^{-1} U^T] = L[Σ_{i=1}^k λi Zi²], where the λi are the eigenvalues of
  B = Dπ^{-1/2} (Dπ − π^T π) Dπ^{-1/2} = I − √π^T √π,
  where √π = (√π1, . . . , √πk).

• Now it remains to find the eigenvalues of B. Since B² = B and B is symmetric, the eigenvalues of B are all either 1 or 0. Moreover,
  Σ_{i=1}^k λi = tr(B) = k − 1.

• Therefore, we establish the result that under the simple hypothesis H0, Pearson's chi-square statistic X² has an asymptotic chi-square distribution with k − 1 degrees of freedom.
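A minimal numeric sketch of the resulting test, with hypothetical counts and null probabilities π, using the k − 1 degrees of freedom just derived:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical data: counts for k = 3 cells and a simple null H0: p = pi.
counts = np.array([18, 30, 52])
pi = np.array([0.2, 0.3, 0.5])
n = counts.sum()

x2 = np.sum((counts - n * pi) ** 2 / (n * pi))   # Pearson's X^2
pval = chi2.sf(x2, df=len(counts) - 1)           # k - 1 = 2 degrees of freedom
```

Here the large p-value indicates no evidence against H0 for these (made-up) counts.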

Remarks:

• We have just examined the limiting distribution of the Pearson chi-square statistic under H0.

• In essence, the δ method requires two ingredients: first, a random variable (denoted here by θ̂n) whose distribution depends on a real-valued parameter θ in such a way that

  L[√n(θ̂n − θ)] → N(0, σ²(θ));   (1)

  and second, a function f(x) that can be differentiated at x = θ, so that it possesses the following expansion about θ:

  f(x) = f(θ) + (x − θ) f′(θ) + o(|x − θ|) as x → θ.   (2)

• The δ method for finding approximate means and variances (asymptotic mean and asymptotic variance) of a function of a random variable is justified by the following theorem.

Theorem (The one-dimensional δ method). If θ̂n is a real-valued random variable and θ is a real-valued parameter such that (1) holds, and if f is a function satisfying (2), then the asymptotic distribution of f(θ̂n) is given by

  L[√n(f(θ̂n) − f(θ))] → N(0, σ²(θ)[f′(θ)]²).   (3)

Proof. Set Ωn = R, Ω = Ω1 × Ω2 × · · · × Ωn × · · · = ×_{n=1}^∞ Ωn, and let Pn be the probability distribution of θ̂n on R. Note that Ω is the set of all sequences {tn} such that tn ∈ Ωn. We define two subsets of Ω:

  S = {{tn} ∈ Ω : tn − θ = O(n^{-1/2})},
  T = {{tn} ∈ Ω : f(tn) − f(θ) − (tn − θ)f′(θ) = o(n^{-1/2})}.

Since f satisfies (2), S ⊂ T. By (1), we have

  n^{1/2}(θ̂n − θ) = OP(1) and hence θ̂n − θ = OP(n^{-1/2}).   (4)

Note that S occurs in probability, and hence T also occurs in probability since S ⊂ T. Finally,

  f(θ̂n) − f(θ) − (θ̂n − θ)f′(θ) = oP(n^{-1/2})   (5)

or

  √n(f(θ̂n) − f(θ)) = √n(θ̂n − θ)f′(θ) + oP(1).   (6)

Now let Vn = √n(f(θ̂n) − f(θ)), Un = √n(θ̂n − θ), and g(x) = x f′(θ) for all real numbers x. Then (6) may be rewritten as

  Vn = g(Un) + oP(1),

and the conclusion (3) follows from (1) and Slutsky's theorem.
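The conclusion (3) can be checked by a small Monte Carlo experiment. The setup below (sample mean of Exp(1) draws, f(x) = x², so σ(θ)|f′(θ)| = 1 · 2 = 2 at θ = 1) is an illustrative choice, not taken from the text:

```python
import numpy as np

# Monte Carlo check of the delta method: the standard deviation of
# sqrt(n)(f(theta_hat) - f(theta)) should approach sigma(theta)|f'(theta)| = 2.
rng = np.random.default_rng(0)
n, reps = 1000, 2000
theta_hat = rng.exponential(1.0, size=(reps, n)).mean(axis=1)  # reps copies of the mean
vals = np.sqrt(n) * (theta_hat ** 2 - 1.0)                      # f(x) = x^2, theta = 1
sd = vals.std()
```

With these sizes the empirical sd should be close to the δ-method value 2, up to Monte Carlo error.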

Goodness-of-Fit to Composite Multinomial Models

Consider a sample from a population in genetic equilibrium with respect to a single gene with two alleles. If we assume the three different genotypes are identifiable, we are led to suppose that there are three types of individuals whose frequencies are given by the so-called Hardy-Weinberg proportions

  p1 = θ², p2 = 2θ(1 − θ), p3 = (1 − θ)², where 0 < θ < 1.

• In the Hardy-Weinberg model, the probability model describing the data is multinomial with parameter falling in
  Θ = {θ = (θ1, θ2, θ3) : θi ≥ 0, Σ_{i=1}^3 θi = 1}.

• The theory we want to test can be described by a multinomial with parameter falling in
  Θ0 = {(η², 2η(1 − η), (1 − η)²) : 0 ≤ η ≤ 1},
  which is a one-dimensional curve in the two-dimensional parameter space Θ.

• To test the adequacy of the Hardy-Weinberg model means testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1, where Θ1 = Θ − Θ0.

In general, we can describe Θ0 parametrically as

  Θ0 = {(θ1(η), . . . , θk(η)) : η ∈ Ξ},

where η = (η1, . . . , ηq)^T, Ξ is a subset of q-dimensional space, and the map η → (θ1(η), . . . , θk(η))^T takes Ξ into Θ0. To avoid trivialities we assume q < k − 1.

Now we consider the likelihood ratio test for H0 versus H1.

• Let p(n1, . . . , nk; θ) denote the frequency function.
  – Maximize p(n1, . . . , nk; θ) over θ ∈ Θ0.
  – Denote the maximizer by η̂ = (η̂1, . . . , η̂q).
  – The logarithm of sup_{θ∈Θ0} L(θ; x) is Σ_{i=1}^k ni log θi(η̂) up to a constant.

• The logarithm of sup_{θ∈Θ} L(θ; x) is Σ_{i=1}^k ni log(ni/n) up to a constant.

• Suppose that we can define θj′ = gj(θ), j = 1, . . . , r, where the gj are chosen so that H0 becomes equivalent to: (θ1′, . . . , θq′)^T ranges over an open subset of R^q and θj′ = θ0j for specified values θ0j, j = q + 1, . . . , r.

  For example, to test the Hardy-Weinberg model we set θ1′ = θ1, θ2′ = θ2 − 2√θ1(1 − √θ1) and test H0 : θ2′ = 0.

• Applying the standard result on likelihood ratio tests, under H0, λn approximately has a χ²_{r−q} distribution for large n.

Example 1. Consider the Hardy-Weinberg model.

• η̂ = (2n1 + n2)/(2n).

• Reject H0 if λn ≥ χ²1(1 − α), with
  θ(η̂) = ( ((2n1 + n2)/(2n))², (2n1 + n2)(2n3 + n2)/(2n²), ((2n3 + n2)/(2n))² )^T.

The Wald statistic and the Rao score statistic are also approximately χ²_{r−q} distributed for large n under H0.

• Wald statistic:
  Wn = Σ_{j=1}^k [Nj − nθj(η̂)]² / Nj.

• Rao score statistic:
  Sn = Σ_{j=1}^k [Nj − nθj(η̂)]² / (nθj(η̂)).

• Both have the familiar Pearson form; the score statistic is exactly Pearson's χ²:
  SUM (Observed − Expected)² / Expected.
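Putting Example 1 together on hypothetical genotype counts (the numbers below are made up for illustration): estimate η, form the fitted probabilities θ(η̂), and compute the score/Pearson statistic with k − 1 − q = 1 degree of freedom:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical genotype counts (AA, Aa, aa).
n1, n2, n3 = 190, 470, 340
n = n1 + n2 + n3

eta = (2 * n1 + n2) / (2 * n)                            # MLE of eta
theta = np.array([eta ** 2, 2 * eta * (1 - eta), (1 - eta) ** 2])
counts = np.array([n1, n2, n3])

s = np.sum((counts - n * theta) ** 2 / (n * theta))      # score / Pearson statistic
pval = chi2.sf(s, df=1)                                  # k - 1 - q = 3 - 1 - 1 = 1
```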

Example 2 (The Fisher Linkage Model).

• A self-crossing of maize heterozygous on two characteristics (starchy versus sugary; green base leaf versus white base leaf) leads to four possible offspring types: (1) sugary-white; (2) sugary-green; (3) starchy-white; (4) starchy-green.

• (N1, . . . , N4) has a MN(n, θ1, . . . , θ4) distribution.

• Fisher (1958) specifies that
  θ1 = (2 + η)/4, θ2 = θ3 = (1 − η)/4, θ4 = η/4,
  where η is an unknown number between 0 and 1.

• To test the validity of the linkage model we would take
  Θ0 = {((2 + η)/4, (1 − η)/4, (1 − η)/4, η/4) : 0 ≤ η ≤ 1},
  a "one-dimensional curve" in the three-dimensional parameter space Θ.

• The likelihood equation under H0 becomes
  n1/(2 + η) − (n2 + n3)/(1 − η) + n4/η = 0.

• We obtain critical values from the χ²1 table.
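The likelihood equation has no convenient closed form but is monotone in η on (0, 1), so it is easy to solve numerically; the counts below are illustrative:

```python
from scipy.optimize import brentq

# Illustrative offspring counts for the four types.
n1, n2, n3, n4 = 1997, 906, 904, 32

def score(eta):
    # Left-hand side of the likelihood equation under H0.
    return n1 / (2 + eta) - (n2 + n3) / (1 - eta) + n4 / eta

# score -> +inf as eta -> 0+ and -> -inf as eta -> 1-, so a root exists.
eta_hat = brentq(score, 1e-8, 1 - 1e-8)
```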

Testing Independence of Classification in Contingency Tables

• Many important characteristics have only two categories. An individual either is or is not inoculated against a disease; is or is not a smoker; is male or female; and so on.

• We often want to know whether such characteristics are linked or are independent. For example, do smoking and lung cancer have any relation to each other?

• Let us call the possible categories or states of the first characteristic A and Ā and of the second B and B̄.
  – A randomly selected individual from the population can be one of four types: AB, AB̄, ĀB, ĀB̄.
  – Denote the probabilities of these types by θ11, θ12, θ21, θ22, respectively.

• Independent classification means that the events (being an A) and (being a B) are independent, or in terms of the θij,
  θij = (θi1 + θi2)(θ1j + θ2j).

• The data are assembled in what is called a 2 × 2 contingency table.

• Testing independence can be put as H0 : θ ∈ Θ0 versus H1 : θ ∉ Θ0, where Θ0 is a two-dimensional subset of Θ given by
  Θ0 = {(η1η2, η1(1 − η2), η2(1 − η1), (1 − η1)(1 − η2)) : 0 ≤ η1, η2 ≤ 1}.
  The chi-square test has 1 degree of freedom.

• For θ ∈ Θ0, η̂1 = (n11 + n12)/n and η̂2 = (n11 + n21)/n.

• Pearson's statistic is
  X² = n Σ_{i=1}^2 Σ_{j=1}^2 [Nij − (Ni1 + Ni2)(N1j + N2j)/n]² / [(Ni1 + Ni2)(N1j + N2j)].

• Pearson's statistic can be rewritten as Z², where
  Z = [ N11/(N11 + N21) − N12/(N12 + N22) ] · √{ n(N11 + N21)(N12 + N22) / [(N11 + N12)(N21 + N22)] }.
  Thus,
  Z = √n [P̂(A|B) − P̂(A|B̄)] [ P̂(B)P̂(B̄) / (P̂(A)P̂(Ā)) ]^{1/2},
  where P̂ is the empirical distribution.

• The sign of Z indicates the direction in which the data deviate from independence.
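The identity X² = Z² is easy to verify numerically on a hypothetical 2 × 2 table:

```python
import numpy as np

# Hypothetical 2x2 table of counts N_ij.
N = np.array([[30.0, 10.0],
              [20.0, 40.0]])
n = N.sum()
R = N.sum(axis=1)            # row totals
C = N.sum(axis=0)            # column totals

# Pearson's statistic in the form above.
x2 = n * np.sum((N - np.outer(R, C) / n) ** 2 / np.outer(R, C))

# Z as the signed square root.
z = (N[0, 0] / C[0] - N[0, 1] / C[1]) * np.sqrt(n * C[0] * C[1] / (R[0] * R[1]))
```

Here z is positive, indicating that A is over-represented among the B's relative to independence.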

a × b Contingency Tables

• Consider contingency tables for two nonnumerical characteristics having a and b states, respectively, a, b ≥ 2.

• If we take a sample of size n from a population and classify each member according to both characteristics, we obtain a vector Nij, i = 1, . . . , a, j = 1, . . . , b, where Nij is the number of individuals of type i for characteristic 1 and j for characteristic 2.

• {Nij : 1 ≤ i ≤ a, 1 ≤ j ≤ b} is multinomially distributed with cell probabilities {θij : 1 ≤ i ≤ a, 1 ≤ j ≤ b}, where
  θij = P(a randomly selected individual is of type i for characteristic 1 and j for characteristic 2).

• The hypothesis that the characteristics are assigned independently becomes H0 : θij = ηi1 ηj2 for 1 ≤ i ≤ a, 1 ≤ j ≤ b, where the ηi1, ηj2 are nonnegative and
  Σ_{i=1}^a ηi1 = Σ_{j=1}^b ηj2 = 1.

• The Nij can be arranged in an a × b contingency table. Write Cj = Σ_{i=1}^a Nij and Ri = Σ_{j=1}^b Nij.

• Pearson's χ² for the hypothesis of independence is
  X² = n Σ_{i=1}^a Σ_{j=1}^b (Nij − RiCj/n)² / (RiCj),
  which has approximately a χ²_{(a−1)(b−1)} distribution under H0.
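A sketch on a hypothetical 2 × 3 table, comparing the formula above with `scipy.stats.chi2_contingency` (continuity correction disabled):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 table.
N = np.array([[20, 30, 25],
              [30, 20, 25]])
n = N.sum()
R = N.sum(axis=1)
C = N.sum(axis=0)

# Manual Pearson statistic as in the formula above.
x2 = n * np.sum((N - np.outer(R, C) / n) ** 2 / np.outer(R, C))

# scipy's version, which also returns the (a-1)(b-1) degrees of freedom.
stat, pval, dof, expected = chi2_contingency(N, correction=False)
```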

Logistic Regression for Binary Response

• Consider Bernoulli responses Y that can take on only the values 0 and 1. Examples are
  – medical trials where, at the end of the trial, the patient either has recovered (Y = 1) or has not recovered (Y = 0);
  – election polls where a voter either supports a proposition (Y = 1) or does not (Y = 0);
  – market research where a potential customer either desires a new product (Y = 1) or does not (Y = 0);
  – multiple-choice tests where an examinee either gets the correct answer (Y = 1) or does not (Y = 0).

• Assume the distribution of Y depends on a known covariate vector z in R^p.

• Assume that the data are grouped or replicated so that for each fixed i we observe the number of successes Xi = Σ_{j=1}^{mi} Yij, where Yij is the response on the jth of the mi trials in block i, 1 ≤ i ≤ k. Thus, we observe independent X1, . . . , Xk with Xi binomial Bin(mi, πi), where πi = π(zi) is the probability of success for a case with covariate vector zi.

• Consider the logistic transform g(π), usually called the logit:
  η = g(π) = log[π/(1 − π)].

• We choose the following parametric model for π(z):
  logit(π(z)) = z^T β.
  This model allows each component of z to take values in all of R.

• The above model is called the logistic linear regression model. In practice, the probit g1(π) = Φ^{-1}(π), where Φ is the N(0, 1) cdf, and the log-log transform g2(π) = log[−log(1 − π)] are also used.

• The log likelihood ℓ(π(β)) ≡ ℓN(β) of β = (β1, . . . , βp)^T is, with N = Σ_{i=1}^k mi,
  ℓN(β) = Σ_{j=1}^p βj Tj − Σ_{i=1}^k mi log(1 + exp(zi^T β)) + Σ_{i=1}^k log (mi choose Xi),
  where Tj = Σ_{i=1}^k zij Xi.

• The likelihood equations are Z^T(X − µ) = 0, where Z = (zij)_{k×p}; this follows by observing that
  µi = E(Xi) = mi πi and E(Tj) = Σ_{i=1}^k zij µi.

• The MLE β̂ of β solves Eβ(Tj) = Tj, j = 1, . . . , p.

• To solve these nonlinear equations, we use the Newton-Raphson algorithm.

• The Fisher information matrix is Z^T W Z, where W = diag{mi πi(1 − πi)}_{k×k}.
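A minimal Newton-Raphson (Fisher scoring) sketch for the grouped logistic model; the blocks Z, m, X below are hypothetical. Each iteration solves (Z^T W Z) δ = Z^T(X − µ):

```python
import numpy as np

# Hypothetical grouped data: k = 5 blocks, p = 2 (intercept + one covariate).
Z = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
m = np.array([20.0, 25.0, 30.0, 25.0, 20.0])   # trials per block
X = np.array([3.0, 8.0, 15.0, 18.0, 17.0])     # successes per block

beta = np.zeros(2)
for _ in range(25):
    pi = 1.0 / (1.0 + np.exp(-Z @ beta))        # fitted success probabilities
    W = np.diag(m * pi * (1.0 - pi))            # Fisher information weights
    step = np.linalg.solve(Z.T @ W @ Z, Z.T @ (X - m * pi))
    beta = beta + step

pi = 1.0 / (1.0 + np.exp(-Z @ beta))
score = Z.T @ (X - m * pi)                      # ~ 0 at the MLE
```

At convergence the likelihood equations Z^T(X − µ̂) = 0 hold to numerical precision.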

Testing

• Let ω = {η : ηi = zi^T β, β ∈ R^p}. Consider two different kinds of tests.
  – Let Ω = R^k. Test H0 : η ∈ ω versus H1 : η ∈ Ω \ ω.
  – Let ω0 be a q-dimensional linear subspace of ω with q < p. Test H0 : η ∈ ω0 versus H1 : η ∈ ω \ ω0.

• For the first set of hypotheses, the MLEs of πi and µi are Xi/mi and Xi. The log-likelihood ratio test statistic is
  2 Σ_{i=1}^k [Xi log(Xi/µ̂i) + Xi′ log(Xi′/µ̂i′)],
  where Xi′ = mi − Xi and µ̂i′ = mi − µ̂i.
  – Note that it just measures the distance between the fit µ̂ based on the model ω and the data.
  – By the multivariate delta method, it has asymptotically a χ²_{k−p} distribution for η ∈ ω as mi → ∞, i = 1, . . . , k < ∞.

• For the second set of hypotheses, the log-likelihood ratio test statistic is
  2 Σ_{i=1}^k [Xi log(µ̂i/µ̂i⁰) + Xi′ log(µ̂i′/µ̂i⁰′)],
  where µ̂⁰ is the MLE of µ under H0 and µ̂i⁰′ = mi − µ̂i⁰. It has an asymptotic χ²_{p−q} distribution as mi → ∞, i = 1, . . . , k < ∞.
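For the first statistic, a sketch with hypothetical grouped data and the simplest submodel (a common success probability for all blocks, whose MLE is ΣXi/Σmi) shows the deviance computation:

```python
import numpy as np

# Hypothetical grouped binomial data; submodel: one common success probability.
m = np.array([10.0, 10.0, 10.0, 10.0])
X = np.array([3.0, 4.0, 6.0, 7.0])

pi_hat = X.sum() / m.sum()          # MLE under the submodel
mu_hat = m * pi_hat                 # fitted means mu_i
Xc = m - X                          # failure counts X_i'

# LR statistic (deviance) against the saturated model.
dev = 2 * np.sum(X * np.log(X / mu_hat) + Xc * np.log(Xc / (m - mu_hat)))
```

Under the submodel, dev is approximately chi-squared with k − p = 4 − 1 = 3 degrees of freedom for large mi.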

Tests for A Simple Null Hypothesis

• Let X1, . . . , Xn be iid with X1 ∼ N(θ, 1).
  – Test H0 : θ = 0 versus H1 : θ = θ0 > 0.
  – How do we find a good test for this simple hypothesis?

• More generally, consider testing H0 : θ = θ0 ∈ R^s versus H1 : θ ≠ θ0.

• We consider three large-sample tests.

Likelihood Ratio Test

• The likelihood ratio statistic
  Λn = L(θ0; x) / sup_{θ∈Θ} L(θ; x)
  was introduced by Neyman and Pearson (1928).

• Λn takes values in the interval [0, 1], and H0 is rejected for sufficiently small values of Λn.

• The rationale behind LR tests is that when H0 is true, Λn tends to be close to 1, whereas when H1 is true, Λn tends to be close to 0.

• The test may be carried out in terms of the statistic
  λn = −2 log Λn.

• For finite n, the null distribution of λn will generally depend on n and on the form of the pdf of X.

• LR tests are closely related to MLEs.

• Denote the MLE by θ̂. For asymptotic analysis, expand λn in a Taylor series about θ̂:
  λn = −2 [Σ_{i=1}^n log f(Xi; θ0) − Σ_{i=1}^n log f(Xi; θ̂)]
     = (θ̂ − θ0)^T [− Σ_{i=1}^n ∂²/∂θj∂θk log f(Xi; θ)|_{θ=θ*}] (θ̂ − θ0),
  where θ* lies between θ̂ and θ0; the first-order term vanishes because the score is zero at θ̂.

• Since θ̂ is consistent,
  λn = n(θ̂ − θ0)^T [−(1/n) Σ_{i=1}^n ∂²/∂θj∂θk log f(Xi; θ)|_{θ=θ0}] (θ̂ − θ0) + oP(1).
  By the asymptotic normality of θ̂ and
  −(1/n) Σ_{i=1}^n ∂²/∂θj∂θk log f(Xi; θ)|_{θ=θ0} →P I(θ0),
  λn has, under H0, a limiting chi-squared distribution with s degrees of freedom.

Example 3. Consider the testing problem H0 : σ² = σ0² versus H1 : σ² ≠ σ0² based on iid X1, . . . , Xn from the normal distribution N(µ0, σ²) with µ0 known.

• L(θ0; x) = (2πσ0²)^{-n/2} exp(−Σi (xi − µ0)²/2σ0²).

• σ̂² = n^{-1} Σi (xi − µ0)² (MLE) and
  sup_{θ∈Θ} L(θ; x) = (2πσ̂²)^{-n/2} exp(−n/2).

• We have
  Λn = (σ̂²/σ0²)^{n/2} exp( n/2 − Σi (xi − µ0)²/2σ0² ),
  or, under H0,
  λn = −n [ ln((1/n) Σ_{i=1}^n Zi²) + (1 − (1/n) Σ_{i=1}^n Zi²) ],
  where Z1, . . . , Zn are iid N(0, 1).

• Fact: using the CLT, we have
  [ (1/n) Σ_{i=1}^n Zi² − 1 ] / √(2/n) →d N(0, 1),
  or
  (n/2) [ (1/n) Σ_{i=1}^n Zi² − 1 ]² →d χ²1.

• Note that ln u ≈ −(1 − u) − (1 − u)²/2 when u is near 1, and (1/n) Σ_{i=1}^n Zi² → 1 in probability by the LLN.

• A common question in Taylor series approximation is how many terms to keep. Here it refers to using the first-order approximation ln u ≈ −(1 − u) in contrast to the second-order approximation we use. With only the first-order approximation, we would face the difficulty of finding lim_n an bn when lim_n an = ∞ and lim_n bn = 0.

• We conclude that λn has a limiting chi-squared distribution with 1 degree of freedom.
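The closed form for λn can be cross-checked against the likelihood ratio computed directly from the normal log likelihood (simulated N(0, 1) data under H0; the seed is arbitrary):

```python
import numpy as np

# Simulated data under H0 (mu0 known).
rng = np.random.default_rng(1)
mu0, sigma0 = 0.0, 1.0
x = rng.normal(mu0, sigma0, size=200)
n = len(x)

w = np.mean((x - mu0) ** 2) / sigma0 ** 2        # (1/n) sum Z_i^2
lam = -n * (np.log(w) + 1.0 - w)                 # closed form above

# Direct computation from the likelihood ratio.
S = np.sum((x - mu0) ** 2)
loglik = lambda s2: -0.5 * n * np.log(2 * np.pi * s2) - S / (2 * s2)
lam2 = -2.0 * (loglik(sigma0 ** 2) - loglik(S / n))
```

Both routes give the same λn; it is nonnegative since ln w ≤ w − 1.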

The Wald Test

• Let θ̂n denote a consistent, asymptotically normal, and asymptotically efficient sequence of solutions of the likelihood equations:
  √n(θ̂n − θ) →d N(0, I^{-1}(θ)) as n → ∞.

• Because I(θ) is continuous in θ, we have I(θ̂n) →P I(θ) as n → ∞.

• Replacing the matrix −(1/n) Σ_{i=1}^n ∂²/∂θj∂θk log f(Xi; θ)|_{θ=θ0} by I(θ̂n) in the large-sample approximation of λn, we get a second statistic,
  Wn = n(θ̂n − θ0)^T I(θ̂n)(θ̂n − θ0),
  which was introduced by Wald (1943).

• By Slutsky's theorem, Wn converges in distribution to χ²s under H0.

• For the construction of a confidence region, one generates {θ0 : Wn ≤ χ²_{s,α}}, which is an ellipsoid in R^s.

• As a remark, for the construction of a confidence region based on λn, one generates {θ0 : λn ≤ χ²_{s,α}}, which is not necessarily an ellipsoid in R^s.

The Rao Score Test

• Both the Wald and likelihood ratio tests require evaluation of θ̂n. Now we consider a test for which this is not necessary.

• Denote the likelihood score vector by
  q(x; θ) = (q1(x; θ), . . . , qs(x; θ))^T,
  where
  qj(x; θ) = ∂/∂θj log f(x; θ).

• Write Q(θ) = Σ_{i=1}^n q(Xi; θ). By the central limit theorem,
  n^{-1/2} Q(θ0) →d N(0, I(θ0)).

• A third statistic,
  Vn = [n^{-1/2} Q(θ0)]^T I^{-1}(θ0) [n^{-1/2} Q(θ0)] = n^{-1} Q(θ0)^T I^{-1}(θ0) Q(θ0),
  was introduced by Rao (1948). Again, it has a limiting χ²s distribution under H0.

Example 4. Consider a sample X1, . . . , Xn from the logistic distribution with density
  fθ(x) = e^{x−θ} / (1 + e^{x−θ})².

• q(x; θ) = −1 + 2e^{x−θ}/(1 + e^{x−θ}) and
  Q(θ0) = −n + 2 Σ_{i=1}^n e^{xi−θ0}/(1 + e^{xi−θ0}).

• I(θ) = 1/3 for all θ.

• Since I(θ0) = 1/3, Vn = (3/n) Q(θ0)², and the Rao score test rejects H0 for large values of
  √(3/n) | Σ_{i=1}^n (e^{xi−θ0} − 1)/(1 + e^{xi−θ0}) |.

• In this case, the MLE does not have an explicit expression, and therefore the Wald and likelihood ratio tests are less convenient.
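A short sketch of the score statistic for Example 4, on a small made-up sample:

```python
import numpy as np

# Hypothetical sample; score test of H0: theta = theta0 in the logistic model.
theta0 = 0.0
x = np.array([-1.2, 0.3, 0.8, -0.5, 1.9, 0.1])

e = np.exp(x - theta0)
Q = np.sum((e - 1) / (1 + e))     # Q(theta0); note -1 + 2e/(1+e) = (e-1)/(1+e)
n = len(x)
Vn = (3.0 / n) * Q ** 2           # uses I(theta) = 1/3
```

H0 is rejected when Vn exceeds the χ²1 critical value; no MLE is needed.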

Example 5. Consider a sequence of n independent trials, with s possible outcomes for each trial.

• Let θj denote the probability of occurrence of the jth outcome in any given trial.

• Let Nj denote the number of occurrences of the jth outcome in the series of n trials.

• The MLEs of the θj are Nj/n.

• The three test statistics λn, Wn, and Vn for testing H0 : θ = θ0 against H1 : θ ≠ θ0 are easily seen to be
  λn = 2 Σ_{j=1}^s Nj log(Nj/(nθj0)),
  Wn = Σ_{j=1}^s (Nj − nθj0)²/Nj,
  Vn = Σ_{j=1}^s (Nj − nθj0)²/(nθj0).
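On hypothetical counts, the three statistics are numerically close, as the asymptotic theory suggests:

```python
import numpy as np

# Hypothetical counts and a simple multinomial null.
N = np.array([25.0, 32.0, 43.0])
theta0 = np.array([0.25, 0.35, 0.40])
n = N.sum()

lam = 2 * np.sum(N * np.log(N / (n * theta0)))       # likelihood ratio
W = np.sum((N - n * theta0) ** 2 / N)                # Wald
V = np.sum((N - n * theta0) ** 2 / (n * theta0))     # Rao score / Pearson
```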

• Both Wn and Vn are referred to as chi-squared goodness-of-fit statistics; the latter is often called the Pearson chi-squared statistic. Its large-sample properties were first derived by Pearson (1900).

Pearson's chi-square statistic is easily remembered as
  χ² = SUM (Observed − Expected)² / Expected.

Example 6 (Testing a Genetic Theory).

• In experiments on pea breeding, Mendel observed the different kinds of seeds obtained by crosses from peas with round yellow seeds and peas with wrinkled green seeds.

• Possible types of progeny were: (1) round yellow; (2) wrinkled yellow; (3) round green; and (4) wrinkled green.

• Assume the seeds are produced independently. We can think of each seed as the outcome of a multinomial trial with possible outcomes numbered 1, 2, 3, 4 as above and associated probabilities of occurrence θ1, θ2, θ3, θ4.

• Mendel's theory predicted that θ1 = 9/16, θ2 = θ3 = 3/16, θ4 = 1/16.

• Data: n = 556, n1 = 315, n2 = 101, n3 = 108, n4 = 32.

• Pearson's chi-square statistic is
  (315 − 556 × 9/16)²/312.75 + (3.25)²/104.25 + (3.75)²/104.25 + (2.75)²/34.75 ≈ 0.47,
  which has a large p-value under the χ²3 distribution, so the data are consistent with Mendel's theory.
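The arithmetic of Example 6 can be reproduced, together with the p-value from the χ²3 distribution:

```python
from scipy.stats import chi2

# Mendel's pea data and the predicted cell probabilities.
observed = [315, 101, 108, 32]
n = 556
probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

x2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
pval = chi2.sf(x2, df=3)   # simple null: k - 1 = 3 degrees of freedom
```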
