Specification Testing and The Fallacy of Alignment

8 Econometric Practice

8.4 Specification Testing and The Fallacy of Alignment

The message of Sections 3 through 7 is that the choice of an estimator to evaluate a program requires making judgments about outcome equations, participation rules and the relationship between the two. All estimators, including social experiments, are based on identifying assumptions which are often difficult if not impossible to test on the available data. For example, the validity of social experiments depends on assumption (5.A.1) or assumptions (5.A.2a) and (5.A.2b), which state that randomization does not disrupt the program being evaluated. Testing for disruption effects turns out to be a difficult task (see Heckman, Khoo, Roselius and Smith, 1996). Testing whether a variable is a valid instrument is also difficult unless one has access to the true parameter via some other identifying assumption, such as another instrument, a valid social experiment or one of the other identifying restrictions discussed above or in Heckman and Robb (1985a, 1986a).

The inability to test maintained identifying assumptions on the available data is a source of frustration to many.

One widely-used practice in the evaluation literature apparently evades this problem by testing evaluation models on pre-program data and then using the models that pass the tests to evaluate the program. Papers by Ashenfelter (1978), Ashenfelter and Card (1985) and Heckman and Hotz (1989) exemplify this approach. The idea underlying this approach is that if a selection estimator correctly adjusts for differences in pre-program earnings levels (or some other outcome measure) between future participants and nonparticipants, it should also adjust correctly for post-program differences and therefore be a valid estimator for evaluating the program. This method could also be applied to the matching estimators defined in Section 7.4.1. According to this line of reasoning, a good match on pre-program outcome levels should produce a valid estimator for post-program levels.

The basic idea underlying this method is captured by the following testing framework. Write $A(Y_{1t'}, X_{t'})$ for the adjusted pre-program earnings of program participants and $A(Y_{0t'}, X_{t'})$ for the adjusted pre-program earnings of nonparticipants, where $t' < k$.

Then, for a common $X_{t'}$, test the hypothesis:

(8.10) $A(Y_{1t'}, X_{t'}) = A(Y_{0t'}, X_{t'}).$

Most commonly such tests are based on the model of (3.10). In that context, the test for a valid comparison group is a test of the hypothesis $H_0: \alpha = 0$ in the equation:

$Y_{t'} = X_{t'}\beta + D\alpha + U_{t'}, \quad t' < k,$

estimated using pre-program data on participants and comparison group members. Here $D = 1$ denotes that a person will be a participant in period $k$. If $H_0$ is not rejected, the comparison group is deemed to be adequate.
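As a minimal sketch of this pre-program test, the following Python code regresses simulated pre-program earnings on a covariate and the future-participation indicator $D$ and reports the t-statistic for $H_0: \alpha = 0$. The data-generating process, the variable names, the single covariate (schooling) and the homoskedastic standard errors are all illustrative assumptions and are not taken from the studies cited above.

```python
import numpy as np

def alignment_test(y_pre, x_pre, d):
    """OLS of pre-program earnings on [1, x_pre, d].

    Returns the coefficient on d (alpha-hat) and its t-statistic,
    using conventional homoskedastic standard errors (an assumption).
    """
    n = len(y_pre)
    Z = np.column_stack([np.ones(n), x_pre, d])
    coef, *_ = np.linalg.lstsq(Z, y_pre, rcond=None)
    resid = y_pre - Z @ coef
    sigma2 = resid @ resid / (n - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[-1], coef[-1] / np.sqrt(cov[-1, -1])

# Hypothetical data: one covariate and a future-participation indicator D.
rng = np.random.default_rng(0)
n = 5000
schooling = rng.integers(8, 17, size=n).astype(float)
d = (rng.random(n) < 0.3).astype(float)
noise = rng.normal(0.0, 3000.0, size=n)

# Case 1: comparison group aligned in the pre-program period (alpha = 0).
y_aligned = 2000.0 + 800.0 * schooling + noise
alpha_aligned, t_aligned = alignment_test(y_aligned, schooling, d)

# Case 2: future participants earn 2000 less before the program (alpha = -2000).
y_gap = y_aligned - 2000.0 * d
alpha_gap, t_gap = alignment_test(y_gap, schooling, d)

print(f"aligned: alpha_hat={alpha_aligned:.1f}, t={t_aligned:.2f}")
print(f"gap:     alpha_hat={alpha_gap:.1f}, t={t_gap:.2f}")
```

Under the testing strategy just described, the first simulated comparison group would be deemed adequate and the second would be rejected.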

This logic seems compelling, but is potentially misleading. The success of testing strategies based on the alignment of pre-program earnings depends on the serial correlation properties of the error term in the earnings equation. Suppose, for example, that program participants and nonparticipants have identical pre-program earnings histories but that participants experience a permanent loss in earnings at the time of enrollment in period $k$. In this case, finding that a particular estimator or comparison group correctly aligns earnings in periods prior to $k$ tells little about the validity of a post-program comparison. Even if the program had a strong positive impact on participant earnings compared to what they would have earned without the program, post-program comparisons between participants and nonparticipants based on estimators or comparison groups which correctly aligned pre-program earnings might still yield a negative impact estimate for the program because of the large negative shock experienced by participants.
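The example in the preceding paragraph can be made concrete with a small simulation. The sketch below assumes, purely for illustration, that both groups draw pre-program earnings from the same distribution, that participants suffer a permanent loss of 3000 at enrollment, and that the program raises their earnings by 1000. None of these numbers comes from the literature, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000
TRUE_IMPACT = 1000.0      # assumed positive program effect
PERMANENT_LOSS = 3000.0   # assumed permanent shock hitting participants at k

d = rng.random(n) < 0.3                      # future participants
mu = 12000.0 + rng.normal(0.0, 2500.0, n)    # same distribution for both groups

# Pre-program earnings: identical histories in distribution.
y_pre = mu + rng.normal(0.0, 1000.0, n)

# Post-program earnings without the program, then with it for participants.
y0_post = mu - PERMANENT_LOSS * d + rng.normal(0.0, 1000.0, n)
y_post = y0_post + TRUE_IMPACT * d

# Pre-program alignment check and naive post-program comparison.
pre_gap = y_pre[d].mean() - y_pre[~d].mean()
naive_estimate = y_post[d].mean() - y_post[~d].mean()

print(f"pre-program gap:          {pre_gap:8.1f}")
print(f"naive post-program impact:{naive_estimate:8.1f}")
print(f"true impact:              {TRUE_IMPACT:8.1f}")
```

In this setup the pre-program gap is close to zero, so the comparison group passes an alignment test, yet the post-program comparison recovers roughly the true impact minus the permanent loss, which is negative.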

Using tests based on the alignment of pre-program earnings or outcome levels to evaluate the validity of an estimator or comparison group or both is the alignment fallacy. The widely-used Heckman and Hotz (1989) tests of the validity of non-experimental selection estimators using pre-program earnings are based on the alignment fallacy. Its practical importance can be illustrated by re-examining an old controversy in the evaluation literature.

In the early 1980s, two major consulting firms (Westat, Inc. and SRI International) used matching to construct comparison groups to evaluate the U.S. CETA training program.

Both firms had access to the same large data sets and both hired expert statisticians who advocated matching as an evaluation estimator. They both chose their comparison groups to align the earnings of participants and comparison group members in the pre-program period.

As shown in Figure 4.3, Ashenfelter’s dip characterized the earnings of participants in the CETA program. SRI chose to match on earnings two periods prior to the enrollment period. It picked as comparison group members persons whose earnings were very similar to those of participants in period $k-2$. Westat aligned using earnings in period $k-1$. Using a simple matching estimator for post-program earnings, SRI reported a negative impact of CETA on participant earnings that was substantially lower than the impact reported by Westat. Figures 8.6 and 8.7 demonstrate how this would happen. Those figures are based on our adaptation of the empirical model of Ashenfelter and Card (1985) used to generate the simulations in Section 8.3. That model is rich enough to generate Ashenfelter’s dip. Figures 8.6 and 8.7 show the earnings of participants, matched participants, unmatched nonparticipants and all nonparticipants for comparison groups based on matching in periods $k-2$ and $k-1$, respectively.

Comparing Figures 8.6 and 8.7, when we match so that future participant and nonparticipant earnings are the same in period $k-2$, mean reversion causes the earnings after period $k$ of persons aligned in $k-2$ to be higher than those of persons aligned based on earnings in period $k-1$.⁸⁸ This implies that the matching estimator used by SRI should produce a lower estimate of program impact than the matching estimator used by Westat, which is exactly what was found. Neither matching estimator may be correct, but the ordering of the estimates obtained from them is predicted from our knowledge of the earnings dynamics of program participants.
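The ordering of the two estimates can be reproduced in a deliberately stylized simulation. The sketch below is not the adapted Ashenfelter and Card (1985) model behind Figures 8.6 and 8.7. It assumes that earnings equal a permanent component plus independent transitory shocks, that participants are otherwise like nonparticipants but suffer a purely transitory dip in period $k-1$, and that matching is single nearest-neighbor with replacement on one lagged earnings value. All function names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
SD_PERM, SD_TRANS = 2000.0, 1000.0   # assumed permanent / transitory std. dev.
DIP, TRUE_IMPACT = 1500.0, 500.0     # assumed dip in k-1 and program impact

def simulate(n, dip, impact):
    """Earnings in k-2, k-1 and one post-program period for n persons."""
    mu = 10000.0 + rng.normal(0.0, SD_PERM, n)
    y_km2 = mu + rng.normal(0.0, SD_TRANS, n)
    y_km1 = mu + rng.normal(0.0, SD_TRANS, n) - dip   # transitory dip only
    y_post = mu + rng.normal(0.0, SD_TRANS, n) + impact
    return y_km2, y_km1, y_post

def matched_mean(treated_key, pool_key, pool_outcome):
    """Mean outcome of nearest-neighbor matches (with replacement) on a scalar key."""
    order = np.argsort(pool_key)
    keys, outcomes = pool_key[order], pool_outcome[order]
    # Insertion position, clipped so that both neighbors exist.
    pos = np.clip(np.searchsorted(keys, treated_key), 1, len(keys) - 1)
    left_closer = (treated_key - keys[pos - 1]) <= (keys[pos] - treated_key)
    idx = np.where(left_closer, pos - 1, pos)
    return outcomes[idx].mean()

# Participants experience the dip and the program; the comparison pool does not.
p_km2, p_km1, p_post = simulate(2000, DIP, TRUE_IMPACT)
c_km2, c_km1, c_post = simulate(20000, 0.0, 0.0)

est_km2 = p_post.mean() - matched_mean(p_km2, c_km2, c_post)  # align on k-2
est_km1 = p_post.mean() - matched_mean(p_km1, c_km1, c_post)  # align on k-1

print(f"impact estimate, matching on k-2: {est_km2:8.1f}")
print(f"impact estimate, matching on k-1: {est_km1:8.1f}")
```

In this setup, comparison persons matched on period $k-1$ earnings have lower permanent earnings on average than the participants they are matched to, so their post-program earnings are lower and the implied impact estimate is higher than under matching on period $k-2$. The $k-2$ match is close to the assumed true impact here only because the dip is assumed to be confined to period $k-1$.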

Alignment on pre-program earnings is not guaranteed to produce valid estimators of the impact of a program using post-program earnings. It is thus interesting, but not by any means conclusive, that specification tests based on alignment of pre-program earnings developed by Heckman and Hotz (1989) have been found by them and by others such as Friedlander and Robins (1995) to eliminate from consideration the most biased estimators of training impact. Even in these studies, many estimators that survive the tests still exhibit substantial bias.

8.4.1 Testing Identifying Assumptions

As noted by Heckman and Robb (1985a, 1986a), most of the conventional econometric estimators impose strong overidentifying restrictions which can be tested. The fixed effects and inverse Mills’ ratio estimators are examples of evaluation models with strong overidentifying assumptions.⁸⁹ Heckman, Ichimura and Todd (1997) present tests of overidentifying assumptions for matching estimators for nonexperimental data.

Nonetheless, Heckman and Robb (1985a, 1986a) also note that all econometric evaluation models can be weakened to a just-identified form, and they present many examples of how this can be done. Just-identified models offer one interpretation of the available data but other just-identified models are equally good descriptions of the same data. The only way to test the validity of just-identified models is to get better data to eliminate the effects of unobservables on selection.

⁸⁸ There were other matching variables used by both groups but the use of earnings at different lags to form matched samples plays the main role in explaining the discrepancy between the two studies.

⁸⁹ Tests of the fixed effect model for panels of length greater than $T = 2$ are presented in Chamberlain (1984), Hsiao (1986) and Baltagi (1995). Tests for the normal selection model based on the properties of censored normal residuals are discussed in Amemiya (1985) among other sources. See also Bera, Jarque and Lee (1984).

9 Indirect Effects, Displacement and General
