
11 THE GMM ESTIMATION

11.1 Consistency and Asymptotic Normality
11.2 Regularity Conditions and Identification
11.3 The GMM Interpretation of the OLS Estimation
11.4 The GMM Interpretation of the MLE
11.5 The GMM Estimation in the Over-Identification Case
11.6 The GMM Interpretation of the Instrumental Variable Estimation
11.7 The Restricted GMM Estimation
11.7.1 Comparing Restricted and Unrestricted GMM Estimators
11.8 Hypothesis Testing
11.8.1 Wald Test
11.8.2 The Minimum χ² Test
11.8.3 The Lagrange Multiplier Test
11.9 The GMM Interpretation of the Restricted OLS Estimation

Chapter 11

THE GENERALIZED METHOD OF MOMENTS ESTIMATION

Suppose a random sample $x_1, x_2, \ldots, x_n$ is drawn from a population which is characterized by the parameter $\boldsymbol{\theta}$ whose true value is $\boldsymbol{\theta}^*$. If we can identify a vector of functions $g(x;\boldsymbol{\theta})$ of the random variable $x$ and the parameter $\boldsymbol{\theta}$ such that the true parameter value $\boldsymbol{\theta}^*$ uniquely solves the following population moment condition

$$E[g(x;\boldsymbol{\theta})] = \mathbf{0}, \tag{11.1}$$

while the estimator $\hat{\boldsymbol{\theta}}$ is the unique solution to the sample moment condition

$$\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}) = \mathbf{0}, \tag{11.2}$$

then, under some regularity conditions, we can show that $\hat{\boldsymbol{\theta}}$ is consistent and asymptotically normal:

$$\hat{\boldsymbol{\theta}} \overset{A}{\sim} N\!\left(\boldsymbol{\theta}^*,\ \frac{1}{n}\, G(\boldsymbol{\theta}^*)^{-1}\,\Omega(\boldsymbol{\theta}^*)\, G(\boldsymbol{\theta}^*)^{\prime\,-1}\right), \tag{11.3}$$

where $\boldsymbol{\theta}^*$ denotes the true value of the parameter $\boldsymbol{\theta}$,

$$G(\boldsymbol{\theta}) = E\!\left[\frac{\partial g(x;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}\right], \tag{11.4}$$

and

$$\Omega(\boldsymbol{\theta}) = E\!\left[g(x;\boldsymbol{\theta})\,g(x;\boldsymbol{\theta})'\right]. \tag{11.5}$$

Any estimator defined in such a setup is referred to as a Generalized Method of Moments (GMM) estimator. The approach of first identifying a moment condition and then deriving the corresponding GMM estimator from its sample counterpart has become a very popular way of generating new estimators in econometrics.

A Simple Example  Given a random sample $x_1, x_2, \ldots, x_n$ drawn from an unspecified population with population mean $\mu$ and variance $\sigma^2$, we have derived the asymptotic properties of the sample mean $\bar{x}$ as an estimator of $\mu$ by directly applying the law of large numbers and the central limit theorem. We now show that the asymptotic analysis of the sample mean fits into the GMM framework.


Let's consider the function $g(x_i;\mu) \equiv x_i - \mu$, which gives the following population moment condition:

$$E[g(x_i;\mu)] = E(x_i) - \mu = 0.$$

It is obvious that the only solution for $\mu$ is the true value of the population mean, to which $E(x_i)$ is equal. The sample counterpart of the population moment condition is

$$\frac{1}{n}\sum_{i=1}^{n} g(x_i;\mu) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu) = 0,$$

and the solution for $\mu$ is nothing but the sample mean $\bar{x}$. So the sample mean $\bar{x}$ is actually a GMM estimator of $\mu$. Consequently, we can apply the general results for the GMM estimator to establish the consistency and asymptotic normality of the GMM estimator $\bar{x}$. It is also easy to show that the asymptotic variance of the GMM estimator $\bar{x}$ is $\sigma^2/n$. Although the GMM argument here appears tedious, the idea is important and has wide applicability.
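To make the simple example concrete, the following minimal simulation (a Python/NumPy sketch; the normal population, sample size, and number of replications are illustrative choices, not part of the text) solves the sample moment condition for $\mu$, which is just the sample mean, and checks that the dispersion of the resulting estimates is close to the asymptotic standard error $\sigma/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, mu_true, sigma = 200, 5000, 1.5, 2.0

estimates = []
for _ in range(reps):
    x = rng.normal(mu_true, sigma, size=n)
    # The sample moment condition (1/n) * sum(x_i - mu) = 0 is solved by the sample mean.
    estimates.append(x.mean())

estimates = np.array(estimates)
print("mean of mu_hat       :", estimates.mean())        # close to mu_true
print("std. dev. of mu_hat  :", estimates.std(ddof=1))   # close to sigma / sqrt(n)
print("asymptotic std. error:", sigma / np.sqrt(n))
```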

11.1 Consistency and Asymptotic Normality

The following argument for proving the consistency of the GMM estimator helps illustrate the key idea of the GMM approach.

We first note that, if the second moment of $g(x;\boldsymbol{\theta})$ exists, the law of large numbers implies

$$\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}) \xrightarrow{\ p\ } E[g(x;\boldsymbol{\theta})]. \tag{11.6}$$

It means that the population moment condition (11.1) can be approximated by the sample moment condition (11.2). If the estimator $\hat{\boldsymbol{\theta}}$ solves the sample moment condition (11.2) irrespective of the sample size, then its probability limit must also solve the probability limit of (11.2), which is the population moment condition (11.1). But by definition the true parameter value $\boldsymbol{\theta}^*$ uniquely solves the population moment condition (11.1), so the probability limit of $\hat{\boldsymbol{\theta}}$ must be equal to the true parameter value $\boldsymbol{\theta}^*$. That is, $\hat{\boldsymbol{\theta}}$ is a consistent estimator of $\boldsymbol{\theta}^*$. In other words, if we know the true parameter value is the solution to a certain population moment condition, then the solution to its sample counterpart will be a consistent estimator.

The proof of asymptotic normality is based on a Taylor expansion and the central limit theorem. Given that $\hat{\boldsymbol{\theta}}$ converges in probability to $\boldsymbol{\theta}^*$ and that $g$ is differentiable with respect to $\boldsymbol{\theta}$, for sufficiently large $n$ the first-order Taylor expansion of (11.2) around the true value $\boldsymbol{\theta}^*$ gives the following approximation:

$$\mathbf{0} = \frac{1}{n}\sum_{i=1}^{n} g(x_i;\hat{\boldsymbol{\theta}}) \approx \frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}^*) + \left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'}\right](\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}^*) \tag{11.7}$$

or

$$\sqrt{n}\,(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}^*) \approx -\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'}\right]^{-1} \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}^*). \tag{11.8}$$

Provided that the second moment of $\partial g(x_i;\boldsymbol{\theta}^*)/\partial\boldsymbol{\theta}'$ exists, the law of large numbers again implies

$$\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'} \xrightarrow{\ p\ } G(\boldsymbol{\theta}^*), \tag{11.9}$$

and the central limit theorem implies

$$\sqrt{n}\left\{\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}^*) - E[g(x;\boldsymbol{\theta}^*)]\right\} \xrightarrow{\ d\ } u \sim N\!\left(\mathbf{0},\ \Omega(\boldsymbol{\theta}^*)\right), \tag{11.10}$$

where $E[g(x;\boldsymbol{\theta}^*)] = \mathbf{0}$. Consequently,

$$\sqrt{n}\,(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}^*) \xrightarrow{\ d\ } -G(\boldsymbol{\theta}^*)^{-1} u \sim N\!\left(\mathbf{0},\ G(\boldsymbol{\theta}^*)^{-1}\,\Omega(\boldsymbol{\theta}^*)\, G(\boldsymbol{\theta}^*)^{\prime\,-1}\right), \tag{11.11}$$

which implies (11.3).

The Estimator of the Asymptotic Variance-Covariance Matrix  Based on the law of large numbers and (11.9), it is readily seen that the following statistic is a consistent estimator of the asymptotic variance-covariance matrix $\frac{1}{n}\,G(\boldsymbol{\theta}^*)^{-1}\Omega(\boldsymbol{\theta}^*)G(\boldsymbol{\theta}^*)^{\prime\,-1}$ of the GMM estimator $\hat{\boldsymbol{\theta}}$:

$$\frac{1}{n}\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\hat{\boldsymbol{\theta}})}{\partial\boldsymbol{\theta}'}\right]^{-1}\left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\hat{\boldsymbol{\theta}})\,g(x_i;\hat{\boldsymbol{\theta}})'\right]\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\hat{\boldsymbol{\theta}})}{\partial\boldsymbol{\theta}'}\right]^{\prime\,-1}. \tag{11.12}$$
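As an illustration of the sandwich estimator (11.12), the sketch below (Python/NumPy; the two moment functions, the exponential population, and the finite-difference Jacobian are all invented for illustration, not taken from the text) estimates $(\mu, \sigma^2)$ from the just-identified conditions $g(x;\mu,\sigma^2) = (x - \mu,\ (x-\mu)^2 - \sigma^2)$ and then forms the estimated asymptotic variance-covariance matrix.

```python
import numpy as np

def g(x, theta):
    """Moment functions g(x; mu, sig2) = (x - mu, (x - mu)**2 - sig2); one row per observation."""
    mu, sig2 = theta
    return np.column_stack([x - mu, (x - mu) ** 2 - sig2])

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)       # any population with enough finite moments

# Just-identified case (m = k = 2): the sample moment conditions are solved exactly by
theta_hat = np.array([x.mean(), x.var()])      # np.var uses 1/n, matching the moment condition

def G_hat(theta, eps=1e-6):
    """(1/n) * sum_i dg(x_i; theta)/dtheta', by central finite differences of the averaged moments."""
    gbar = lambda t: g(x, t).mean(axis=0)
    cols = [(gbar(theta + dt) - gbar(theta - dt)) / (2 * eps)
            for dt in np.eye(len(theta)) * eps]
    return np.column_stack(cols)

G = G_hat(theta_hat)                                             # approximately [[-1, 0], [0, -1]] here
Omega = g(x, theta_hat).T @ g(x, theta_hat) / len(x)
avar = np.linalg.inv(G) @ Omega @ np.linalg.inv(G).T / len(x)    # (1/n) G^{-1} Omega G'^{-1}
print("theta_hat          :", theta_hat)
print("asymptotic std. err:", np.sqrt(np.diag(avar)))
```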

11.2 Regularity Conditions and Identification

In proving the consistency and asymptotic normality of the GMM estimator, we have used the law of large numbers and the central limit theorem. Obviously, certain assumptions are required before we can apply these theorems. The assumptions that ensure the validity of the GMM estimation are called regularity conditions, and they can be divided into four categories:

1. Conditions that ensure the differentiability of $g(x;\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$. For example, $g(x;\boldsymbol{\theta})$ is usually assumed to be twice continuously differentiable with respect to $\boldsymbol{\theta}$.

2. Conditions that restrict the moments of $g(x;\boldsymbol{\theta})$ and its derivatives with respect to $\boldsymbol{\theta}$. For example, the second moments of $g(x;\boldsymbol{\theta})$ and its first derivative are usually assumed to be finite.

3. Conditions that restrict the range of the possible values which the parameter $\boldsymbol{\theta}$ can take. For example, $\boldsymbol{\theta}$ is not allowed to take infinite values and the true value $\boldsymbol{\theta}^*$ may not be at the boundary of the permissible range of $\boldsymbol{\theta}$ (if $\boldsymbol{\theta}^*$ is on the boundary of the permissible range of $\boldsymbol{\theta}$, then convergence to $\boldsymbol{\theta}^*$ cannot take place freely from all directions).

4. The solution to the population moment condition $E[g(x;\boldsymbol{\theta})] = \mathbf{0}$ must be unique and the unique solution must be the true value $\boldsymbol{\theta}^*$ of the parameter.


The first three categories of regularity conditions are somewhat technical and are routinely assumed. However, we do need to make a special effort to check the validity of the last one in each application. This last condition is referred to as the identification condition because it allows us to identify the true parameter value $\boldsymbol{\theta}^*$ for estimation.

An obvious necessary condition for identification is that the row dimension, say $m$, of the vector $g(x;\boldsymbol{\theta})$ is no less than the row dimension, say $k$, of the parameter vector $\boldsymbol{\theta}$. That is, the number of individual population moment conditions cannot be smaller than the number of parameters to be estimated. If $m < k$, then the population moment condition has multiple solutions, at most one of which can be the true value, so that the resulting GMM estimator does not necessarily converge to the true parameter value. This is the so-called under-identification problem.

The identification condition is implicitly assumed in the previous analysis of the GMM estimation. In fact, we have made the stronger assumption that $m = k$, so that the derivative $G(\boldsymbol{\theta})$ of $g(x;\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$ is a square and invertible matrix. This is the so-called just-identification case. In Section 11.5 we will examine the over-identification case with $m > k$.

11.3 The GMM Interpretation of the OLS Estimation

For the linear regression model¹

$$y_i = x_i'\boldsymbol{\beta} + \varepsilon_i, \tag{11.13}$$

let's assume that the sample $\{y_i, x_i'\}$, $i = 1, \ldots, n$, is i.i.d. and that $E(\varepsilon_i) = 0$. In the present framework, the explanatory variables $x_i$ are stochastic and, following the arguments in Chapter 10, we have to assume the following population moment condition:

$$E(x_i\varepsilon_i) = E[x_i(y_i - x_i'\boldsymbol{\beta})] = \mathbf{0}. \tag{11.14}$$

The dimensions of $x_i$ and of the zero vector on the right-hand side are both $k$. So we have in fact $k$ population moment conditions, which are just enough for us to estimate the $k$ parameters in $\boldsymbol{\beta}$; i.e., we have a just-identification case.² The corresponding sample moment condition is

$$\frac{1}{n}\sum_{i=1}^{n} x_i(y_i - x_i'\boldsymbol{\beta}) = \mathbf{0}, \tag{11.15}$$

which can be written as

$$\frac{1}{n}X'(y - X\boldsymbol{\beta}) = \mathbf{0} \qquad \text{or} \qquad X'X\boldsymbol{\beta} = X'y. \tag{11.16}$$

¹We treat $\boldsymbol{\beta}$ not only as the notation for the regression coefficients but also as their true values. Such notational ambiguity has existed throughout the earlier chapters. A better notation for the true values of the regression coefficients would be $\boldsymbol{\beta}^*$.

²If $x_i$ contains the constant term 1, then one of the moment conditions is $E(y_i - x_i'\boldsymbol{\beta}) = 0$, or $E(y_i) = E(x_i)'\boldsymbol{\beta}$.


But this is equivalent to the first-order condition for the OLS estimation. Hence the OLS estimator, which solves the above sample moment conditions, can be considered a GMM estimator.

In order to apply the asymptotic theory for the GMM estimation, we first evaluate³

$$\Omega(\boldsymbol{\beta}) \equiv E(x_i\varepsilon_i\varepsilon_i x_i') = E\!\left[E(\varepsilon_i^2|x_i)\,x_i x_i'\right] = E(\sigma^2 x_i x_i') = \sigma^2 E(x_i x_i') \tag{11.17}$$

and

$$G(\boldsymbol{\beta}) \equiv E\!\left[\frac{\partial\, x_i(y_i - x_i'\boldsymbol{\beta})}{\partial\boldsymbol{\beta}'}\right] = -E(x_i x_i'). \tag{11.18}$$

It is important to note that one of the assumptions (Assumption 5) we made for the multiple linear regression model is

$$\lim_{n\to\infty} \frac{1}{n}X'X = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} x_i x_i' = Q,$$

with $Q$ a finite and positive definite matrix, so we can equate $E(x_i x_i')$ to $Q$. That is, we have $\Omega(\boldsymbol{\beta}) = \sigma^2 Q$ and $G(\boldsymbol{\beta}) = -Q$.

Now, following the general asymptotic theory for the GMM estimation, we have that the OLS estimator $\mathbf{b}$ is consistent:

$$\mathbf{b} \xrightarrow{\ p\ } \boldsymbol{\beta}, \tag{11.19}$$

and

$$\sqrt{n}\,(\mathbf{b} - \boldsymbol{\beta}) \xrightarrow{\ d\ } N\!\left(\mathbf{0},\ \sigma^2 Q^{-1}\right). \tag{11.20}$$
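A quick numerical check of this interpretation (a Python/NumPy sketch with a made-up design; nothing here is from the text): solving the sample moment condition $X'(y - X\boldsymbol{\beta}) = \mathbf{0}$ reproduces the OLS estimator, and the sample moments evaluated at the solution are numerically zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # constant plus two regressors
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# Just-identified moment conditions E[x_i (y_i - x_i' beta)] = 0: the sample counterpart
# X'(y - X beta) = 0 is exactly the OLS normal equation.
b_gmm = np.linalg.solve(X.T @ X, X.T @ y)
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("GMM solution        :", b_gmm)
print("OLS solution        :", b_ols)                         # identical up to rounding
print("sample moments at b :", X.T @ (y - X @ b_gmm) / n)     # numerically zero
```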

11.4 The GMM Interpretation of the MLE

Suppose the sample $\{x_i\}$, $i = 1, \ldots, n$, is i.i.d. with density function $f(x|\boldsymbol{\theta})$, where $\boldsymbol{\theta}$ is an unknown $k$-dimensional parameter to be estimated. We have shown in Chapter 9 that

$$E\!\left[\frac{\partial \ln f(x_i|\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}}\right] = \mathbf{0}, \tag{11.21}$$

which can be viewed as $k$ population moment conditions that are just enough for us to estimate the $k$-dimensional parameter $\boldsymbol{\theta}$. The corresponding sample counterpart is

$$\frac{1}{n}\sum_{i=1}^{n} \frac{\partial \ln f(x_i|\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \mathbf{0}, \tag{11.22}$$

and the solution, denoted by $\hat{\boldsymbol{\theta}}$, is precisely the MLE of $\boldsymbol{\theta}$. In other words, the MLE can be viewed as a GMM estimator.

³Here, we have further assumed that $E(\varepsilon_i^2|x_i) = \sigma^2$ (i.e., $\varepsilon_i$ is homoscedastic with respect to $x_i$), which will be true if $x_i$ is assumed to be nonstochastic.


In order to apply the asymptotic theory for the GMM estimation, let's first define

$$\Omega(\boldsymbol{\theta}) = E\!\left[\frac{\partial \ln f(x_i|\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,\frac{\partial \ln f(x_i|\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}\right] \tag{11.23}$$

and

$$G(\boldsymbol{\theta}) = E\!\left[\frac{\partial^2 \ln f(x_i|\boldsymbol{\theta})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right]. \tag{11.24}$$

It has also been shown in Chapter 9 that $\Omega(\boldsymbol{\theta}) = -G(\boldsymbol{\theta})$. Now, following the general asymptotic theory for the GMM estimation, we have the well-known results that the MLE $\hat{\boldsymbol{\theta}}$ is consistent and⁴

$$\sqrt{n}\,(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}^*) \xrightarrow{\ d\ } N\!\left(\mathbf{0},\ \Omega(\boldsymbol{\theta}^*)^{-1}\right). \tag{11.25}$$
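As a small worked instance of this interpretation (an invented example, not from the text): for an i.i.d. Poisson($\lambda$) sample the score is $\partial \ln f(x|\lambda)/\partial\lambda = x/\lambda - 1$, so the sample score condition (11.22) is solved by $\hat{\lambda} = \bar{x}$, which is exactly the MLE. The sketch below checks this and compares the GMM sandwich standard error with the usual $\sqrt{\hat{\lambda}/n}$.

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true, n = 4.0, 2000
x = rng.poisson(lam_true, size=n)

# Score (moment) function g(x; lam) = d ln f / d lam = x / lam - 1; its sample average is
# zero exactly at lam = x-bar, which is also the MLE.
lam_hat = x.mean()

score = x / lam_hat - 1.0
G = np.mean(-x / lam_hat ** 2)                 # (1/n) * sum of d g(x_i; lam) / d lam
Omega = np.mean(score ** 2)
avar = (1.0 / G) * Omega * (1.0 / G) / n       # sandwich form G^{-1} Omega G^{-1} / n
print("lam_hat               :", lam_hat)
print("GMM/sandwich std. err :", np.sqrt(avar))
print("usual MLE std. err    :", np.sqrt(lam_hat / n))
```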

11.5 The GMM Estimation in the Over-Identification Case

If, in the population moment condition

$$E[g(x;\boldsymbol{\theta})] = \mathbf{0}, \tag{11.26}$$

the row dimension of $g$ is strictly greater than the row dimension of the parameter vector $\boldsymbol{\theta}$, then it is not possible to solve its sample counterpart

$$\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}) = \mathbf{0} \tag{11.27}$$

exactly, because the number of equations is greater than the number of parameters to be solved for. What we can do in such a case is to find a value of $\boldsymbol{\theta}$ that makes the sample moment condition as close to zero as possible, based on the following quadratic form:

$$\min_{\boldsymbol{\theta}}\ \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]' W \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right], \tag{11.28}$$

where $W$ is some positive definite weighting matrix of constants.

⁴In Chapter 9 we did not assume the sample to be identically distributed; i.e., the density functions $f_i(x_i|\boldsymbol{\theta})$ carry the subscript $i$, indicating that they may all be different. In such a case, the variance-covariance matrix of the asymptotic distribution is the inverse of

$$-\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} E\!\left[\frac{\partial^2 \ln f_i(x_i|\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}'}\right].$$

It is readily seen that such a matrix reduces to $\Omega(\boldsymbol{\theta}^*) = -G(\boldsymbol{\theta}^*)$ in the present i.i.d. case.


Given the assumption that $G(\boldsymbol{\theta})$ has full column rank and some additional regularity conditions, $\hat{\boldsymbol{\theta}}$ is consistent. To see this we note that the first-order condition for the minimization problem (11.28) is

$$\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}\right]' W \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right] = \mathbf{0}, \tag{11.29}$$

which can be viewed as the sample counterpart of the moment conditions

$$G(\boldsymbol{\theta})'\, W\, E[g(x;\boldsymbol{\theta})] = \mathbf{0}. \tag{11.30}$$

Given that $G(\boldsymbol{\theta})$ has full column rank and that $W$ is nonsingular, only the true parameter value $\boldsymbol{\theta}^*$ can satisfy these moment conditions, which in turn implies that the GMM estimator $\hat{\boldsymbol{\theta}}$ is consistent.

We can further show that $\hat{\boldsymbol{\theta}}$ is asymptotically normal:⁵

$$\hat{\boldsymbol{\theta}} \overset{A}{\sim} N\!\left(\boldsymbol{\theta}^*,\ \frac{1}{n}\left[G(\boldsymbol{\theta}^*)'WG(\boldsymbol{\theta}^*)\right]^{-1} G(\boldsymbol{\theta}^*)'W\,\Omega(\boldsymbol{\theta}^*)\,W G(\boldsymbol{\theta}^*)\left[G(\boldsymbol{\theta}^*)'WG(\boldsymbol{\theta}^*)\right]^{-1}\right). \tag{11.31}$$

Obviously, different weighting matrices $W$ will give different estimators with different asymptotic variance-covariance matrices. That is, the efficiency of the resulting GMM estimator depends on the weighting matrix $W$.

⁵Given that $\hat{\boldsymbol{\theta}}$ converges in probability to $\boldsymbol{\theta}^*$ and that $g$ is twice differentiable with respect to $\boldsymbol{\theta}$, for sufficiently large $n$ the Taylor expansion of (11.29) around the true value $\boldsymbol{\theta}^*$ gives the following approximation:

$$\mathbf{0} = \left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\hat{\boldsymbol{\theta}})}{\partial\boldsymbol{\theta}'}\right]' W \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\hat{\boldsymbol{\theta}})\right] \approx \left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'}\right]' W \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}^*)\right] + \left\{\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'}\right]' W \left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'}\right] + S\right\}(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}^*)$$

or

$$\sqrt{n}\,(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}^*) \approx -\left\{\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'}\right]' W \left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'}\right] + S\right\}^{-1}\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'}\right]' W \left[\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}^*)\right],$$

where $S$ is a $k\times k$ matrix whose $j$-th column is

$$\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2 g(x_i;\boldsymbol{\theta}^*)}{\partial\boldsymbol{\theta}'\,\partial\theta_j}\right]' W \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta}^*)\right].$$

We note that (11.6) implies that $S$ converges in probability to zero. Thus, by (11.9) and (11.10), we have

$$\sqrt{n}\,(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}^*) \xrightarrow{\ d\ } -\left[G(\boldsymbol{\theta}^*)'WG(\boldsymbol{\theta}^*)\right]^{-1} G(\boldsymbol{\theta}^*)'W\cdot u \sim N\!\left(\mathbf{0},\ \left[G(\boldsymbol{\theta}^*)'WG(\boldsymbol{\theta}^*)\right]^{-1} G(\boldsymbol{\theta}^*)'W\,\Omega(\boldsymbol{\theta}^*)\,W G(\boldsymbol{\theta}^*)\left[G(\boldsymbol{\theta}^*)'WG(\boldsymbol{\theta}^*)\right]^{-1}\right).$$


It can be shown that⁶

$$\left[G(\boldsymbol{\theta}^*)'WG(\boldsymbol{\theta}^*)\right]^{-1} G(\boldsymbol{\theta}^*)'W\,\Omega(\boldsymbol{\theta}^*)\,W G(\boldsymbol{\theta}^*)\left[G(\boldsymbol{\theta}^*)'WG(\boldsymbol{\theta}^*)\right]^{-1} \;\geq\; \left[G(\boldsymbol{\theta}^*)'\,\Omega(\boldsymbol{\theta}^*)^{-1} G(\boldsymbol{\theta}^*)\right]^{-1} \tag{11.32}$$

for any positive definite $W$. This finding implies that the most efficient GMM estimator $\hat{\boldsymbol{\theta}}$ can be obtained by setting $W = \widetilde{\Omega}^{-1}$ for any consistent estimator $\widetilde{\Omega}$ of $\Omega(\boldsymbol{\theta}^*) = E[g(x;\boldsymbol{\theta}^*)\,g(x;\boldsymbol{\theta}^*)']$ and then solving the following minimization problem:

$$\min_{\boldsymbol{\theta}}\ \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]' \widetilde{\Omega}^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]. \tag{11.33}$$

The resulting GMM estimator is denoted again as $\hat{\boldsymbol{\theta}}$, which from now on will represent such an efficient GMM estimator. It is readily seen that $\hat{\boldsymbol{\theta}}$ is consistent and asymptotically normal:

$$\hat{\boldsymbol{\theta}} \overset{A}{\sim} N\!\left(\boldsymbol{\theta}^*,\ \frac{1}{n}\left[G(\boldsymbol{\theta}^*)'\,\Omega(\boldsymbol{\theta}^*)^{-1} G(\boldsymbol{\theta}^*)\right]^{-1}\right). \tag{11.34}$$

The derivation of the GMM estimator $\hat{\boldsymbol{\theta}}$ with an over-identified moment condition essentially requires a two-stage procedure because a preliminary estimator is needed for calculating the weighting matrix $\widetilde{\Omega}$. A particularly simple choice of $\widetilde{\Omega}$ is

$$\frac{1}{n}\sum_{i=1}^{n} g(x_i;\tilde{\boldsymbol{\theta}})\,g(x_i;\tilde{\boldsymbol{\theta}})', \tag{11.35}$$

where $\tilde{\boldsymbol{\theta}}$ is a preliminary estimator of $\boldsymbol{\theta}$, which can be any consistent estimator of $\boldsymbol{\theta}$. A common one can be derived by solving the following simpler minimization problem:

$$\min_{\boldsymbol{\theta}}\ \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]' \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]. \tag{11.36}$$

That is, the preliminary consistent estimator $\tilde{\boldsymbol{\theta}}$ itself is a GMM estimator based on an especially simple weighting matrix $W = I$.

The asymptotic variance-covariance matrix can be consistently estimated by

$$\frac{1}{n}\left\{\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\hat{\boldsymbol{\theta}})}{\partial\boldsymbol{\theta}'}\right]'\left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\hat{\boldsymbol{\theta}})\,g(x_i;\hat{\boldsymbol{\theta}})'\right]^{-1}\left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\hat{\boldsymbol{\theta}})}{\partial\boldsymbol{\theta}'}\right]\right\}^{-1}, \tag{11.37}$$

which can be compared to the one for the just-identified case in (11.12).
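To illustrate the two-stage procedure, here is a sketch (Python with NumPy and SciPy; the exponential population and the particular pair of moment conditions $E(x - \mu) = 0$ and $E(x^2 - 2\mu^2) = 0$ are invented for illustration) for an over-identified case with $m = 2$ moment conditions and $k = 1$ parameter: a first stage with $W = I$ gives a preliminary estimate $\tilde{\mu}$, which is used to build $\widetilde{\Omega}$ and the efficient second-stage estimator.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
x = rng.exponential(scale=3.0, size=1000)          # true mu = 3; for this family E[x^2] = 2*mu^2

def gmat(mu):
    """n x 2 matrix of moment functions (x_i - mu, x_i**2 - 2*mu**2)."""
    return np.column_stack([x - mu, x ** 2 - 2.0 * mu ** 2])

def objective(mu, W):
    gbar = gmat(mu).mean(axis=0)
    return gbar @ W @ gbar

# Stage 1: preliminary estimator with W = I, as in (11.36).
mu_tilde = minimize_scalar(objective, args=(np.eye(2),),
                           bounds=(0.1, 20.0), method="bounded").x

# Stage 2: efficient estimator with W = Omega_tilde^{-1}, as in (11.33) and (11.35).
g_tilde = gmat(mu_tilde)
Omega_tilde = g_tilde.T @ g_tilde / len(x)
mu_hat = minimize_scalar(objective, args=(np.linalg.inv(Omega_tilde),),
                         bounds=(0.1, 20.0), method="bounded").x

print("first-stage estimate:", mu_tilde)
print("two-stage estimate  :", mu_hat)
print("sample mean         :", x.mean())
```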

⁶Let $G \equiv G(\boldsymbol{\theta}^*)$ and $\Omega \equiv \Omega(\boldsymbol{\theta}^*)$, and define $A = (G'WG)^{-1}G'W - (G'\Omega^{-1}G)^{-1}G'\Omega^{-1}$. Direct multiplication shows that

$$(G'WG)^{-1}G'W\,\Omega\,WG(G'WG)^{-1} - (G'\Omega^{-1}G)^{-1} = A\,\Omega\,A',$$

which is necessarily positive semidefinite since $\Omega$ is positive definite. We therefore have $(G'WG)^{-1}G'W\,\Omega\,WG(G'WG)^{-1} \geq (G'\Omega^{-1}G)^{-1}$.


11.6 The GMM Interpretation of the Instrumental Variable Estimation

For the linear regression model (11.13), suppose the stochastic explanatory variables $x_i$ are endogenous; i.e.,

$$E(x_i\varepsilon_i) = E[x_i(y_i - x_i'\boldsymbol{\beta})] \neq \mathbf{0}. \tag{11.38}$$

The analysis in Chapter 10 indicates that the OLS estimation will not be consistent. To estimate the regression coefficients $\boldsymbol{\beta}$, we need to employ certain instrumental variables $z_i$ such that $\mathrm{Cov}(z_i, x_i) \neq O$ and

$$\mathrm{Cov}(z_i, \varepsilon_i) = E(z_i\varepsilon_i) = E[z_i(y_i - x_i'\boldsymbol{\beta})] = \mathbf{0}. \tag{11.39}$$

Here, let's assume the dimension $m$ of $z_i$ is greater than or equal to $k$, the number of explanatory variables in $x_i$. The condition (11.39) can now be viewed as the (over-identified or just-identified) moment conditions we need for conducting the GMM estimation of the linear regression model (11.13).

In order to implement the GMM estimation, we first evaluate⁷

$$\Omega(\boldsymbol{\beta}) \equiv E(z_i\varepsilon_i\varepsilon_i z_i') = E(\varepsilon_i^2 z_i z_i') = E\!\left[E(\varepsilon_i^2|z_i)\,z_i z_i'\right] = E(\sigma^2 z_i z_i') = \sigma^2 E(z_i z_i'). \tag{11.40}$$

The GMM estimation of $\boldsymbol{\beta}$ is then based on

$$\min_{\boldsymbol{\beta}}\ \left[\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i'\boldsymbol{\beta})\right]' \left(\sigma^2\,\frac{1}{n}\sum_{i=1}^{n} z_i z_i'\right)^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i'\boldsymbol{\beta})\right], \tag{11.41}$$

which can be written as⁸

$$\min_{\boldsymbol{\beta}}\ \frac{1}{n}(y - X\boldsymbol{\beta})'Z(Z'Z)^{-1}Z'(y - X\boldsymbol{\beta}), \tag{11.42}$$

where $Z = [\,z_1\ z_2\ \cdots\ z_n\,]'$. It is readily seen that the solution to this minimization problem is

$$\hat{\boldsymbol{\beta}} = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1} X'Z(Z'Z)^{-1}Z'y, \tag{11.43}$$

which is also referred to as the instrumental variable (IV) estimator of $\boldsymbol{\beta}$.⁹

⁷Here, we have further assumed that $E(\varepsilon_i^2|z_i) = \sigma^2$; i.e., $\varepsilon_i$ is homoscedastic with respect to $z_i$.

⁸We drop the scalar $\sigma^2$ from the expression. Doing so does not affect the derivation of the GMM estimator of $\boldsymbol{\beta}$.

⁹In Chapter 10 we suggested a two-stage estimation using the instrumental variables. It is readily seen that the resulting two-stage estimator is identical to (11.43).


In order to apply the asymptotic theory for the GMM estimation, we need

$$G(\boldsymbol{\beta}) \equiv E\!\left[\frac{\partial\, z_i(y_i - x_i'\boldsymbol{\beta})}{\partial\boldsymbol{\beta}'}\right] = -E(z_i x_i'). \tag{11.44}$$

The general asymptotic theory for the GMM estimation implies that the IV estimator $\hat{\boldsymbol{\beta}}$ is consistent and

$$\sqrt{n}\,(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \xrightarrow{\ d\ } N\!\left(\mathbf{0},\ \sigma^2\left\{E(x_i z_i')\left[E(z_i z_i')\right]^{-1} E(z_i x_i')\right\}^{-1}\right). \tag{11.45}$$

Note that the asymptotic variance-covariance matrix of the IV estimator $\hat{\boldsymbol{\beta}}$ can be estimated by $s^2\left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}$, where $s^2$ is some consistent estimator of $\sigma^2$.
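The sketch below (Python/NumPy; the data-generating process, instrument strength, and dimensions are invented for illustration) generates an endogenous regressor, computes the IV estimator (11.43) directly from its formula, and compares it with OLS, which is inconsistent here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])    # m = 3 instruments (incl. constant)
v = rng.normal(size=n)
eps = 0.8 * v + rng.normal(size=n)                            # error correlated with the regressor below
x2 = Z[:, 1] + 0.5 * Z[:, 2] + v                              # endogenous regressor
X = np.column_stack([np.ones(n), x2])                         # k = 2 < m = 3: over-identified
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + eps

# IV / GMM estimator (11.43): beta_hat = [X'Z (Z'Z)^{-1} Z'X]^{-1} X'Z (Z'Z)^{-1} Z'y.
A = X.T @ Z @ np.linalg.inv(Z.T @ Z)
beta_iv = np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

print("IV estimate :", beta_iv)     # close to beta_true
print("OLS estimate:", beta_ols)    # biased because E(x_i eps_i) != 0
```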

11.7 The Restricted GMM Estimation

Suppose that, in addition to the over-identified moment condition, we have another set of conditions that we believe the true parameter $\boldsymbol{\theta}^*$ should satisfy. Let's also assume such extraneous conditions can be expressed as a $J$-vector of functions of the parameter $\boldsymbol{\theta}$:

$$h(\boldsymbol{\theta}) = \mathbf{0}. \tag{11.46}$$

We note that these conditions do not involve the random variable $x_i$, so they are fundamentally different from the moment condition. If these conditions are true, then we certainly want the GMM estimator to satisfy them. The way to impose these conditions on the GMM estimator is to consider the restricted minimization with the condition $h(\boldsymbol{\theta}) = \mathbf{0}$ imposed as a set of restrictions:

$$\min_{\boldsymbol{\theta}}\ \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]' \widetilde{\Omega}^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right] \qquad \text{subject to } h(\boldsymbol{\theta}) = \mathbf{0}, \tag{11.47}$$

or

$$\min_{\boldsymbol{\theta}}\ \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]' \widetilde{\Omega}^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right] + h(\boldsymbol{\theta})'\boldsymbol{\lambda}, \tag{11.48}$$

where $\boldsymbol{\lambda}$ is a $J$-vector of Lagrange multipliers. The solution to such a problem, denoted as $\hat{\boldsymbol{\theta}}_R$, is called the restricted GMM estimator, as opposed to the unrestricted GMM estimator $\hat{\boldsymbol{\theta}}$.

When we derive the GMM estimator based on the over-identified moment condition $E[g(x;\boldsymbol{\theta})] = \mathbf{0}$, the sample moment condition is never exactly satisfied by either the restricted or the unrestricted GMM estimator. It should be pointed out that the restriction $h(\boldsymbol{\theta}) = \mathbf{0}$, in contrast, is exactly satisfied by the restricted GMM estimator. So the moment condition and the restriction are not treated symmetrically, although both are conditions on the parameter value.


It can be proved that, just like the unrestricted GMM estimator, the restricted GMM estimator is consistent and has an asymptotic normal distribution. However, while both estimators are consistent, their asymptotic normal distributions are not the same. In particular, the asymptotic variance-covariance matrix of the restricted GMM estimator is always smaller than or equal to that of the unrestricted GMM estimator. This result simply reflects the fact that the restricted GMM estimator, by incorporating the additional information in the restriction $h(\boldsymbol{\theta}) = \mathbf{0}$, is (asymptotically) more efficient. The present discussion is very similar to the one we had on the relationship between the unrestricted MLE and the restricted MLE. As a matter of fact, the derivation of the asymptotic distribution of the restricted GMM estimator parallels that of the restricted MLE.

11.7.1 Comparing Restricted and Unrestricted GMM Estimators

To simplify the exposition, let's denote (half of) the objective function for the minimization defining the GMM estimator with the over-identified moment condition by

$$q(\boldsymbol{\theta}) \equiv \frac{1}{2}\left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]' \widetilde{\Omega}^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right], \tag{11.49}$$

and its first-order derivative by

$$s(\boldsymbol{\theta}) \equiv \left[\frac{1}{n}\sum_{i=1}^{n} \frac{\partial g(x_i;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}\right]' \widetilde{\Omega}^{-1} \left[\frac{1}{n}\sum_{i=1}^{n} g(x_i;\boldsymbol{\theta})\right]. \tag{11.50}$$

Because the restricted GMM estimator $\hat{\boldsymbol{\theta}}_R$ and the unrestricted GMM estimator $\hat{\boldsymbol{\theta}}$ differ, we observe the following inequalities:

$$h(\hat{\boldsymbol{\theta}}_R) = \mathbf{0} \neq h(\hat{\boldsymbol{\theta}}), \qquad q(\hat{\boldsymbol{\theta}}_R) \geq q(\hat{\boldsymbol{\theta}}), \qquad s(\hat{\boldsymbol{\theta}}_R) \neq \mathbf{0} = s(\hat{\boldsymbol{\theta}}). \tag{11.51}$$

The second inequality is due to the fact that the restriction $h(\boldsymbol{\theta}) = \mathbf{0}$ restricts the possible values of $\boldsymbol{\theta}$ over which we minimize. The third inequality results from the fact that the first-order condition for the restricted minimization is

$$s(\hat{\boldsymbol{\theta}}_R) + H(\hat{\boldsymbol{\theta}}_R)'\hat{\boldsymbol{\lambda}} = \mathbf{0}, \qquad \text{where } H(\boldsymbol{\theta}) = \frac{\partial h(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}, \tag{11.52}$$

while the first-order condition for the unrestricted minimization is

$$s(\hat{\boldsymbol{\theta}}) = \mathbf{0}. \tag{11.53}$$

These three sets of inequalities in $h$, $q$, and $s$ hold for any random sample of a finite sample size.

Let's first denote the probability limit of the restricted GMM estimator $\hat{\boldsymbol{\theta}}_R$ by $\boldsymbol{\theta}^\circ$; then, like $\hat{\boldsymbol{\theta}}_R$ for every sample size, $\boldsymbol{\theta}^\circ$ must also satisfy the restriction: $h(\boldsymbol{\theta}^\circ) = \mathbf{0}$. It is obvious that whether $\boldsymbol{\theta}^\circ$ is equal to $\boldsymbol{\theta}^*$, so that the restricted GMM estimator is consistent, depends on whether the restriction $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is correct or not. The theory for the restricted GMM estimation mentioned in the previous subsection is based on the implicit assumption that the restriction $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is correctly specified. We should also note that the unrestricted GMM estimator is always consistent, irrespective of whether the restriction $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is correct or not.

We can now conclude that if the restriction $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is correct, then

$$h(\boldsymbol{\theta}^\circ) = \mathbf{0} = h(\boldsymbol{\theta}^*), \qquad q(\boldsymbol{\theta}^\circ) = q(\boldsymbol{\theta}^*), \qquad s(\boldsymbol{\theta}^\circ) = \mathbf{0} = s(\boldsymbol{\theta}^*). \tag{11.54}$$

But if the restriction $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is incorrect, then

$$h(\boldsymbol{\theta}^\circ) = \mathbf{0} \neq h(\boldsymbol{\theta}^*), \qquad q(\boldsymbol{\theta}^\circ) > q(\boldsymbol{\theta}^*), \qquad s(\boldsymbol{\theta}^\circ) \neq \mathbf{0} = s(\boldsymbol{\theta}^*). \tag{11.55}$$

The direct implication of these relations is that, by checking whether $h(\hat{\boldsymbol{\theta}})$ is close to $\mathbf{0}$, or whether $q(\hat{\boldsymbol{\theta}}_R)$ is close to $q(\hat{\boldsymbol{\theta}})$, or whether $s(\hat{\boldsymbol{\theta}}_R)$ is close to $\mathbf{0}$, we can judge whether the restriction $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is correct or not. Therefore, even though for a finite sample size we have $h(\hat{\boldsymbol{\theta}}) \neq \mathbf{0}$, $q(\hat{\boldsymbol{\theta}}_R) \geq q(\hat{\boldsymbol{\theta}})$, and $s(\hat{\boldsymbol{\theta}}_R) \neq \mathbf{0}$, the differences are expected to become small as the sample size becomes large if, and only if, the restriction $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is correct. This conclusion is important because it helps us formulate three formal tests of the hypothesis that the restriction $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is true, as will be explained next.

11.8 Hypothesis Testing

Given the GMM estimator $\hat{\boldsymbol{\theta}}$ based on an over-identified moment condition, there are three asymptotically equivalent tests for testing

$$H_0:\ h(\boldsymbol{\theta}^*) = \mathbf{0} \qquad \text{against} \qquad H_1:\ h(\boldsymbol{\theta}^*) \neq \mathbf{0},$$

where $h$ is a $J$-vector of functions of the parameter $\boldsymbol{\theta}$. To motivate the tests, we need to think of the null hypothesis $h(\boldsymbol{\theta}^*) = \mathbf{0}$ as a set of restrictions on the true parameter value $\boldsymbol{\theta}^*$.

11.8.1 Wald Test:

The Wald test is based on the idea of using the difference between $h(\hat{\boldsymbol{\theta}})$ and $\mathbf{0}$ to decide whether the null hypothesis is true. To determine whether $h(\hat{\boldsymbol{\theta}})$ is significantly close to $\mathbf{0}$ or not, we need the following result, which can be proved easily:

$$h(\hat{\boldsymbol{\theta}}) \overset{A}{\sim} N\!\left(h(\boldsymbol{\theta}^*),\ \frac{1}{n}\,H(\boldsymbol{\theta}^*)\left[G(\boldsymbol{\theta}^*)'\,\Omega(\boldsymbol{\theta}^*)^{-1} G(\boldsymbol{\theta}^*)\right]^{-1} H(\boldsymbol{\theta}^*)'\right), \qquad \text{where } H(\boldsymbol{\theta}) = \frac{\partial h(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}'}. \tag{11.56}$$

When the null hypothesis is true, so that $h(\boldsymbol{\theta}^*) = \mathbf{0}$, we have the following distribution result for the quadratic form $W$:

$$W \equiv n\, h(\hat{\boldsymbol{\theta}})'\left\{H(\hat{\boldsymbol{\theta}})\left[G(\hat{\boldsymbol{\theta}})'\,\Omega(\hat{\boldsymbol{\theta}})^{-1} G(\hat{\boldsymbol{\theta}})\right]^{-1} H(\hat{\boldsymbol{\theta}})'\right\}^{-1} h(\hat{\boldsymbol{\theta}}) \overset{A}{\sim} \chi^2(J), \tag{11.57}$$

where $J$ is the number of restrictions, i.e., the number of rows of the vector $h$. This result forms the basis for the Wald test. Given the size of the test $\alpha$ and the corresponding critical value $c_\alpha$ from the $\chi^2(J)$ distribution, the null hypothesis is rejected if $h(\hat{\boldsymbol{\theta}})$ is significantly different from $\mathbf{0}$ or, equivalently, if the value of $W$ is greater than the critical value $c_\alpha$.

11.8.2 The Minimum χ² Test:

The minimum $\chi^2$ test is based on the idea of using the difference between $q(\hat{\boldsymbol{\theta}}_R)$ and $q(\hat{\boldsymbol{\theta}})$ to decide whether the null hypothesis is true. Specifically, we have the following asymptotic result: if the null hypothesis $h(\boldsymbol{\theta}^*) = \mathbf{0}$ is true, then

$$MC \equiv 2n\left[q(\hat{\boldsymbol{\theta}}_R) - q(\hat{\boldsymbol{\theta}})\right] \overset{A}{\sim} \chi^2(J). \tag{11.58}$$

Hence, the null hypothesis is rejected if the value of $MC$ is greater than the critical value $c_\alpha$.

11.8.3 The Lagrange Multiplier Test:

The Lagrange multiplier test is based on the idea of using the difference between $s(\hat{\boldsymbol{\theta}}_R)$ and $\mathbf{0}$ to decide whether the null hypothesis is true. It can be shown that the quadratic form

$$LM \equiv n\, s(\hat{\boldsymbol{\theta}}_R)'\left[G(\hat{\boldsymbol{\theta}}_R)'\,\Omega(\hat{\boldsymbol{\theta}}_R)^{-1} G(\hat{\boldsymbol{\theta}}_R)\right]^{-1} s(\hat{\boldsymbol{\theta}}_R) \overset{A}{\sim} \chi^2(J) \tag{11.59}$$

if the null hypothesis is true. Hence, we reject the null hypothesis if the value of $LM$ is greater than the critical value $c_\alpha$.¹⁰

The three test statistics $W$, $MC$, and $LM$ are asymptotically equivalent and have the same asymptotic distribution $\chi^2(J)$ when the null hypothesis is true. But in finite-sample applications, these three tests may give conflicting results, and there is no consensus about how to resolve such conflicts when they occur.

Finally, since the three tests are asymptotically equivalent, there is no need to compute all three test statistics all the time. We note that the Wald test statistic $W$ only requires the unrestricted GMM estimator $\hat{\boldsymbol{\theta}}$, the Lagrange multiplier test statistic $LM$ only requires the restricted GMM estimator $\hat{\boldsymbol{\theta}}_R$, while the minimum $\chi^2$ test statistic $MC$ requires both the restricted and the unrestricted GMM estimators.

¹⁰The test is named the Lagrange multiplier test because the first-order condition for the restricted GMM estimator implies $s(\hat{\boldsymbol{\theta}}_R) = -H(\hat{\boldsymbol{\theta}}_R)'\hat{\boldsymbol{\lambda}}$, so that $LM \equiv n\,\hat{\boldsymbol{\lambda}}'H(\hat{\boldsymbol{\theta}}_R)\left[G(\hat{\boldsymbol{\theta}}_R)'\,\Omega(\hat{\boldsymbol{\theta}}_R)^{-1} G(\hat{\boldsymbol{\theta}}_R)\right]^{-1} H(\hat{\boldsymbol{\theta}}_R)'\hat{\boldsymbol{\lambda}}$, which is a test statistic based on the Lagrange multiplier $\hat{\boldsymbol{\lambda}}$.
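As a rough illustration of the three tests (a Python sketch reusing the invented exponential example with moment conditions $E(x-\mu)=0$ and $E(x^2 - 2\mu^2)=0$; the restriction $h(\mu) = \mu - \mu_0$, the use of one common $\widetilde{\Omega}$ in all three statistics, and the analytic Jacobian are illustrative choices among asymptotically equivalent implementations): it computes $W$, $MC$, and $LM$ for a hypothesized value $\mu_0$, each of which is approximately $\chi^2(1)$ under the null.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
mu_true, n = 3.0, 2000
x = rng.exponential(scale=mu_true, size=n)

def gmat(mu):                            # n x 2 moment functions (x - mu, x^2 - 2 mu^2)
    return np.column_stack([x - mu, x ** 2 - 2.0 * mu ** 2])

gbar = lambda mu: gmat(mu).mean(axis=0)
Gbar = lambda mu: np.array([[-1.0], [-4.0 * mu]])    # 2 x 1 Jacobian of gbar w.r.t. mu

# First stage (W = I) gives a preliminary estimate and Omega_tilde, used throughout below.
mu_tilde = minimize_scalar(lambda m: gbar(m) @ gbar(m),
                           bounds=(0.1, 20.0), method="bounded").x
Om_inv = np.linalg.inv(gmat(mu_tilde).T @ gmat(mu_tilde) / n)

q = lambda mu: 0.5 * gbar(mu) @ Om_inv @ gbar(mu)             # objective (11.49)
s = lambda mu: Gbar(mu).T @ Om_inv @ gbar(mu)                 # derivative (11.50)
V = lambda mu: np.linalg.inv(Gbar(mu).T @ Om_inv @ Gbar(mu))  # [G' Omega^{-1} G]^{-1}

mu_hat = minimize_scalar(q, bounds=(0.1, 20.0), method="bounded").x   # unrestricted estimate
mu0 = 3.0                  # H0: h(mu) = mu - mu0 = 0; this restriction pins mu down, so
mu_r = mu0                 # the restricted estimate is mu0 itself (H is the 1 x 1 matrix [1])

W_stat  = n * (mu_hat - mu0) ** 2 / V(mu_hat).item()
MC_stat = 2.0 * n * (q(mu_r) - q(mu_hat))
LM_stat = n * (s(mu_r) @ V(mu_r) @ s(mu_r)).item()
print("Wald =", W_stat, " MC =", MC_stat, " LM =", LM_stat)   # each approximately chi2(1) under H0
```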


11.9 The GMM Interpretation of the Restricted OLS Estimation

As mentioned earlier in Section 11.3, the OLS estimation of the multiple linear regression model is based on the just-identified population moment condition (11.14) and its sample counterpart (11.15). The immediate consequence of imposing the linear restriction

$$R\boldsymbol{\beta} = \mathbf{q} \tag{11.60}$$

on the GMM estimation is that the restricted GMM estimator cannot be solved from the just-identified moment condition alone, since the linear restriction also needs to be satisfied. The restricted GMM estimation has thus changed from the just-identified case to an over-identified case.

Constructing the objective function for the GMM estimator in the over-identified case requires the sample counterpart of $\Omega(\boldsymbol{\beta})$ in (11.17), which is

$$s^2\,\frac{1}{n}X'X, \tag{11.61}$$

where $s^2$ is any consistent estimator of $\sigma^2$. The objective function for the over-identified GMM estimation is a quadratic function of the sample moments with the inverse of the above term as the weighting matrix:

$$q(\boldsymbol{\beta}) = \frac{1}{2ns^2}\,(y - X\boldsymbol{\beta})'X(X'X)^{-1}X'(y - X\boldsymbol{\beta}). \tag{11.62}$$

Recall that the objective function for the OLS estimation is $S(\boldsymbol{\beta}) = (y - X\boldsymbol{\beta})'(y - X\boldsymbol{\beta})$. It is easy to show that

$$2ns^2\, q(\boldsymbol{\beta}) = S(\boldsymbol{\beta}) - y'My, \tag{11.63}$$

where $M = I - X(X'X)^{-1}X'$. Because of the equivalence between $q(\boldsymbol{\beta})$ and $S(\boldsymbol{\beta})$ (both $2ns^2$ and $y'My$ do not involve $\boldsymbol{\beta}$), we conclude that the OLS estimator and the GMM estimator are identical. We should understand that in the present framework the OLS estimator is derived as the GMM estimator from the over-identified moment conditions. This approach is different from the one in Section 11.3, where the OLS estimator is derived as the GMM estimator from the just-identified moment conditions. It is interesting to note that if we plug the OLS estimator $\mathbf{b}$ into the quadratic objective function (11.62), we get $q(\mathbf{b}) = (y'My - y'My)/(2ns^2) = 0$, which is the smallest possible value of that quadratic function. This special result reflects the fact that the objective function (11.62) is actually built from just-identified, rather than over-identified, moment conditions.

We now turn to the restricted GMM estimator subject to the linear restriction (11.60), which is to be solved from

$$\min_{\boldsymbol{\beta}}\ q(\boldsymbol{\beta}) \qquad \text{s.t. } R\boldsymbol{\beta} = \mathbf{q}. \tag{11.64}$$

Because $q(\boldsymbol{\beta})$ and $S(\boldsymbol{\beta})$ are equivalent, the first-order condition for the restricted GMM estimation is also equivalent to the one for the restricted OLS estimation, so that, as in the unrestricted case, the restricted GMM estimator is the same as the restricted OLS estimator $\mathbf{b}_R$.


Three Asymptotic Tests  Given that the unrestricted and restricted OLS estimators both have GMM interpretations, they can be used to construct the three asymptotically equivalent tests of

$$H_0:\ R\boldsymbol{\beta} = \mathbf{q} \qquad \text{against} \qquad H_1:\ R\boldsymbol{\beta} \neq \mathbf{q}.$$

1. Wald Test: based on the asymptotic result that, under the null hypothesis $H_0$,

$$R\mathbf{b} \overset{A}{\sim} N\!\left(\mathbf{q},\ \sigma^2 R(X'X)^{-1}R'\right), \tag{11.65}$$

we can immediately get the Wald test statistic, which is

$$W = \frac{(R\mathbf{b} - \mathbf{q})'\left[R(X'X)^{-1}R'\right]^{-1}(R\mathbf{b} - \mathbf{q})}{s^2} \overset{A}{\sim} \chi^2(m) \tag{11.66}$$

under the null hypothesis $H_0$, where $s^2$ is any consistent estimator of $\sigma^2$.

2. The Minimum χ² Test: given the objective function (11.62) for the GMM estimation, the minimum $\chi^2$ test statistic is

$$MC = 2n\left[q(\mathbf{b}_R) - q(\mathbf{b})\right] = 2n\, q(\mathbf{b}_R) = \frac{(R\mathbf{b} - \mathbf{q})'\left[R(X'X)^{-1}R'\right]^{-1}(R\mathbf{b} - \mathbf{q})}{s^2} \overset{A}{\sim} \chi^2(m) \tag{11.67}$$

under the null hypothesis $H_0$, where $s^2$ is any consistent estimator of $\sigma^2$. Note that $q(\mathbf{b})$ is identically equal to 0.

3. The Lagrange Multiplier Test: given the score function

$$s(\boldsymbol{\beta}) = \frac{1}{ns^2}\,X'(y - X\boldsymbol{\beta}), \tag{11.68}$$

the Lagrange multiplier test statistic is¹¹

$$LM = \frac{(y - X\mathbf{b}_R)'X(X'X)^{-1}X'(y - X\mathbf{b}_R)}{s^2} = \frac{(R\mathbf{b} - \mathbf{q})'\left[R(X'X)^{-1}R'\right]^{-1}(R\mathbf{b} - \mathbf{q})}{s^2} \overset{A}{\sim} \chi^2(m) \tag{11.69}$$

under the null hypothesis $H_0$, where $s^2$ is any consistent estimator of $\sigma^2$.

It is interesting to see that these three asymptotic test statistics are identically equal, and that

$$W = MC = LM = m\cdot F, \tag{11.70}$$

where $F$ is the $F$ test statistic discussed in Chapter 6 and $m$ is the number of restrictions. It should also be pointed out that, in contrast to the $F$ test, the three asymptotic tests do not hinge on the normality assumption, but they are valid only when the sample size is sufficiently large.

¹¹The Lagrange multiplier test statistic can also be derived from the asymptotic distribution of the Lagrange multiplier estimator $\mathbf{c} = [R(X'X)^{-1}R']^{-1}(R\mathbf{b} - \mathbf{q})$, which is $\mathbf{c} \overset{A}{\sim} N\!\left(\mathbf{0},\ \sigma^2[R(X'X)^{-1}R']^{-1}\right)$ under the null hypothesis $H_0$. See (6.120) in Chapter 6.
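A numerical check of these identities (Python/NumPy sketch; the design matrix and the restriction $R\boldsymbol{\beta} = \mathbf{q}$ are made up, and $s^2 = e'e/(n-k)$ is used in all three statistics, which is the choice under which they also coincide exactly with $m\cdot F$): it computes the unrestricted and restricted OLS estimators, the three test statistics, and $m\cdot F$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 400, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)

# Restriction R beta = q: here beta_3 = 0 and beta_4 = 0 (m = 2 restrictions, true in this design).
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
q = np.zeros(2)
m = R.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                                   # unrestricted OLS
A = np.linalg.inv(R @ XtX_inv @ R.T)
b_r = b - XtX_inv @ R.T @ A @ (R @ b - q)               # restricted OLS
e = y - X @ b
e_r = y - X @ b_r
s2 = e @ e / (n - k)                                    # s^2 choice that makes W, MC, LM equal m*F

core = (R @ b - q) @ A @ (R @ b - q)
W  = core / s2                                          # Wald statistic (11.66)
MC = (e_r @ e_r - e @ e) / s2                           # minimum chi-square statistic (11.67)
u  = X.T @ e_r
LM = (u @ XtX_inv @ u) / s2                             # Lagrange multiplier statistic (11.69)
F  = ((e_r @ e_r - e @ e) / m) / (e @ e / (n - k))
print(W, MC, LM, m * F)                                 # all four coincide
```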
