An evaluation entails making some comparison between “treated” and “untreated” persons.
This section considers three widely used comparisons for estimating the impact of treatment on the treated, $E(Y_1 - Y_0 \mid X, D = 1)$. All use some form of comparison to construct the required counterfactual $E(Y_0 \mid X, D = 1)$. Data on $E(Y_1 \mid X, D = 1)$ are available from program participants. A person who has participated in a program is paired with an “otherwise comparable” person or set of persons who have not participated in it. The set may contain just one person. In most applications of the method, the paired partner is not literally assumed to be a replica of the treated person in the untreated state, although some panel data evaluation estimators make such an assumption. Thus, in general, $\Delta = Y_1 - Y_0$ is not estimated exactly. Instead, the outcome of the paired partners is treated as a proxy for $Y_0$ for the treated individual, and the population mean difference between treated and untreated persons is estimated by averaging over all pairs. The method can be applied symmetrically to nonparticipants to estimate what they would have earned if they had participated. For that problem the challenge is to find $E(Y_1 \mid X, D = 0)$, since the data on nonparticipants enable one to identify $E(Y_0 \mid X, D = 0)$.
A major difficulty with the application of this method is providing some objective way of demonstrating that a candidate partner or set of partners is “otherwise comparable.”
Many econometric and statistical methods are available for adjusting differences between persons receiving treatment and potential matching partners; we discuss them in Section 7.
4.1 The Before-After Estimator
In the empirical literature on program evaluation, the most commonly used evaluation strategy compares a person with himself/herself. This is a comparison strategy based on longitudinal data. It exploits the intuitively appealing idea that persons can be in both states at different times, and that outcomes measured in one state at one time are good proxies for outcomes in the same state at other times, at least for the no-treatment state.
This gives rise to the motivation for the simple “before-after” estimator, which is still widely used. Its econometric descendant is the fixed effect estimator without a comparison group.
The method assumes that there is access either (i) to longitudinal data on outcomes measured before and after a program for a person who participates in it, or (ii) to repeated cross-section data from the same population where at least one cross section is from a period prior to the program. To incorporate time into our analysis, we introduce “t” subscripts. Let $Y_{1t}$ be the post-program earnings of a person who participates in the program. When longitudinal data are available, $Y_{0t'}$ is the pre-program outcome of the person. For simplicity, assume that program participation occurs only at time period $k$, where $t > k > t'$. The “before-after” estimator uses pre-program earnings $Y_{0t'}$ to proxy the no-treatment state in the post-program period. In other words, the underlying identifying assumption is
(4.A.1) $E(Y_{0t} - Y_{0t'} \mid D = 1) = 0$.
If this assumption is valid, the “before-after” estimator is given by
(4.1) $(\bar{Y}_{1t} - \bar{Y}_{0t'})_1$,
where the subscript “1” denotes conditioning on $D = 1$, and the bar denotes sample means.
To see how this estimator works, observe that for each individual the gain from the program may be written as
$Y_{1t} - Y_{0t} = (Y_{1t} - Y_{0t'}) + (Y_{0t'} - Y_{0t})$.
The second term $(Y_{0t'} - Y_{0t})$ is the approximation error. If this term averages out to zero, we may estimate the impact of participation on those who participate in a program by subtracting participants’ mean pre-program earnings from the mean of their post-program earnings. These means also may be defined for different values of participants’ characteristics, $X$.
The before-after estimator does not literally require longitudinal data to identify the means (Heckman and Robb, 1985a,b). As long as the approximation error averages out, repeated cross-sectional data that sample the same population over time, but not necessarily the same persons, are sufficient to construct a before-after estimate. An advantage of this approach is that it only requires information on the participants and their pre-participation histories to evaluate the program.
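As a minimal sketch of the point about repeated cross sections, the before-after estimate can be computed from two participant samples that need not contain the same persons. The function name `before_after` and the earnings figures below are hypothetical and serve only as an illustration.

```python
from statistics import mean

def before_after(post_outcomes, pre_outcomes):
    """Before-after estimate: mean post-program outcome of participants
    minus mean pre-program outcome of participants.

    The two samples may come from repeated cross sections of the same
    population, so they need not have the same length or the same persons.
    """
    return mean(post_outcomes) - mean(pre_outcomes)

# Hypothetical earnings data (illustrative numbers, not from any study).
pre_program = [10.0, 12.0, 11.0, 9.0]          # participants sampled before the program
post_program = [13.0, 14.0, 12.0, 11.0, 15.0]  # participants sampled after the program

estimate = before_after(post_program, pre_program)
print(estimate)
```

The estimate equals the program effect on participants only if the approximation error averages out to zero; the code itself cannot check that assumption.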
The major drawback to this estimator is its reliance on the assumption that the approximation errors average out. This assumption requires that among participants, the mean outcome in the no-treatment state is the same in $t$ and $t'$. Changes in the overall state of the economy between $t$ and $t'$, or changes in the life cycle position of a cohort of participants, can violate this assumption.
A good example of a case in which assumption (4.A.1) is likely violated is provided in the work of Ashenfelter (1978). Ashenfelter observed that prior to enrollment in a training program, participants experience a decline in their earnings. Later research demonstrates that Ashenfelter’s “dip” is a common feature of the pre-program earnings of participants in government training programs. See Figures 4.1 to 4.6, which show the dip for a variety of programs in different countries. If this decline in earnings is transitory, and earnings is a mean-reverting process so that the dip is eventually restored even in the absence of participation in the program, and if period $t'$ falls in the period of transitorily low earnings, then the approximation error will not average out. In this example, the before-after estimator overstates the average effect of training on the trained and attributes mean reversion that would occur in any event to the effect of the program. On the other hand, if the decline is permanent, the before-after estimator is unbiased for the parameter of interest. In this case, any improvement in earnings is properly attributable to the program. Another potential defect of this estimator is that it attributes to the program any trend in earnings due to macro or lifecycle factors.
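The transitory-dip case can be illustrated with a small constructed example. Everything below is an assumption made for illustration: the permanent earnings levels, the size of the dip, and the true effect are invented, and the no-treatment outcome in the post-program period is known only because the data are simulated.

```python
from statistics import mean

# Hypothetical participants: permanent earnings, a transitory dip in the
# pre-program period, and a known (simulated) program effect.
permanent = [8.0, 10.0, 12.0]
dip = 2.0          # transitory shortfall in the pre-program period
true_effect = 1.0  # effect of the program on each participant

y0_pre = [p - dip for p in permanent]          # no-treatment outcome, pre-program
y0_post = list(permanent)                      # no-treatment outcome, post-program (dip restored)
y1_post = [y + true_effect for y in y0_post]   # treated outcome, post-program

before_after = mean(y1_post) - mean(y0_pre)
mean_reversion = mean(y0_post) - mean(y0_pre)  # nonzero, so (4.A.1) fails
bias = before_after - true_effect

print(before_after, mean_reversion, bias)
```

In this construction the bias of the before-after estimate equals the mean reversion in no-treatment earnings, which is the sense in which the estimator attributes mean reversion to the program.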
Two different approaches have been used to solve these problems with the before-after estimator. One controversial method generalizes the before-after estimator by making use of many periods of pre-program data and extrapolating from the period before $t'$ to generate the counterfactual state in period $t$. It assumes that $Y_{0t}$ and $Y_{0t'}$ can be adjusted to equality using data on the same person, or the same populations of persons, followed over time. As an example, suppose that $Y_{0t}$ is a function of $t$, or is a function of $t$-dated variables. If we have access to enough data on pre-program outcomes prior to date $t'$ to extrapolate post-program outcomes $Y_{0t}$, and if there are no errors of extrapolation, or if it is safe to assume that such errors average out to zero across persons in period $t$, one can replace the missing data, or at least averages of the missing data, using extrapolated values. This method is appropriate if population mean outcomes evolve as deterministic functions of time or macroeconomic variables like unemployment. This procedure is discussed further in Section 7.5.¹¹ The second approach, which we discuss next, is based on the before-after estimator.
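One way to picture the extrapolation approach is to fit a trend to pre-program mean outcomes and project it forward to the post-program period. The sketch below assumes a linear time trend, which is only one of many possible specifications and is not taken from the text; the helper `fit_line` and all numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical mean earnings of participants in three pre-program periods.
pre_periods = [-3, -2, -1]
pre_means = [10.0, 11.0, 12.0]

# Hypothetical mean earnings of participants in post-program period 2.
post_period = 2
post_mean = 17.0

intercept, slope = fit_line(pre_periods, pre_means)
extrapolated_y0 = intercept + slope * post_period   # projected no-treatment mean
trend_adjusted = post_mean - extrapolated_y0        # extrapolation-based estimate
simple_before_after = post_mean - pre_means[-1]     # uses only the last pre-period

print(extrapolated_y0, trend_adjusted, simple_before_after)
```

With these numbers the simple before-after comparison attributes the whole pre-existing trend to the program, while the trend-adjusted estimate removes it. The adjustment is only as good as the assumed trend and the assumption that extrapolation errors average out.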
4.2 The Difference-in-Differences Estimator
A more widely used approach to the evaluation problem assumes access either (i) to longitudinal data or (ii) to repeated cross-section data on nonparticipants in periods $t$ and $t'$. If the mean change in the no-program outcome measure is the same for participants and nonparticipants, i.e., if the following assumption is valid:
(4.A.2) $E(Y_{0t} - Y_{0t'} \mid D = 1) = E(Y_{0t} - Y_{0t'} \mid D = 0)$,
then the difference-in-differences estimator given by
(4.2) $(\bar{Y}_{1t} - \bar{Y}_{0t'})_1 - (\bar{Y}_{0t} - \bar{Y}_{0t'})_0$, $t > k > t'$,
is valid for $E(\Delta_t \mid D = 1) = E(Y_{1t} - Y_{0t} \mid D = 1)$, where $\Delta_t = Y_{1t} - Y_{0t}$, because $E[(\bar{Y}_{1t} - \bar{Y}_{0t'})_1 - (\bar{Y}_{0t} - \bar{Y}_{0t'})_0] = E(\Delta_t \mid D = 1)$.¹²
¹¹ See also Heckman and Robb (1985a), pp. 210-215.
¹² The proof is immediate. Make the following decomposition:
$(\bar{Y}_{1t} - \bar{Y}_{0t'})_1 = (\bar{Y}_{1t} - \bar{Y}_{0t})_1 + (\bar{Y}_{0t} - \bar{Y}_{0t'})_1$.
If assumption (4.A.2) is valid, the change in the outcome measure in the comparison group serves to benchmark common year or age effects among participants.
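The benchmarking role of the comparison group can be sketched in a few lines. The function name `diff_in_diff` and the earnings figures are hypothetical; the samples may be panels or repeated cross sections from the same populations.

```python
from statistics import mean

def diff_in_diff(part_post, part_pre, nonpart_post, nonpart_pre):
    """Difference-in-differences of sample means: the change among
    participants minus the change among nonparticipants."""
    change_participants = mean(part_post) - mean(part_pre)
    change_nonparticipants = mean(nonpart_post) - mean(nonpart_pre)
    return change_participants - change_nonparticipants

# Hypothetical earnings (illustrative numbers only).
part_pre = [8.0, 10.0, 12.0]     # participants, pre-program period
part_post = [13.0, 15.0, 17.0]   # participants, post-program period
nonpart_pre = [14.0, 16.0]       # nonparticipants, pre-program period
nonpart_post = [17.0, 19.0]      # nonparticipants, post-program period

did = diff_in_diff(part_post, part_pre, nonpart_post, nonpart_pre)
print(did)
```

Here participants' mean earnings rise by 5 and nonparticipants' by 3, so 3 of the 5 is treated as a common time effect and the remaining 2 is attributed to the program. That attribution is valid only under the common-change assumption (4.A.2).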
Because we cannot form the change in outcomes between the treated and untreated states, the expression
$(Y_{1t} - Y_{0t'})_1 - (Y_{0t} - Y_{0t'})_0$
cannot be formed for anyone, although we can form one or the other of these terms for everyone. Thus, we cannot use the difference-in-differences estimator to identify the distribution of gains without making further assumptions.¹³ Like the before-after estimator, we can implement the difference-in-differences estimator for means (4.2) on repeated cross sections. It is not necessary to sample the same persons in periods $t$ and $t'$, just persons from the same populations.
Ashenfelter’s dip provides an example of a case where assumption (4.A.2) is likely to be violated. If $Y$ is earnings, and $t'$ is measured at the time of a transitory earnings dip, and if nonparticipants do not experience the dip, then (4.A.2) will be violated, because the time path of no-program earnings between $t'$ and $t$ will be different between participants and nonparticipants. In this example, the difference-in-differences estimator overstates the average impact of training on the trainee.
4.3 The Cross-Section Estimator
A third estimator compares mean outcomes of participants and nonparticipants at time $t$. This estimator is sometimes called the cross-section estimator. It does not compare the same persons because by hypothesis a person cannot be in both states at the same time. Because of this fact, cross-section estimators cannot estimate the distribution of gains unless additional assumptions are invoked beyond those required to estimate mean impacts.
The key identifying assumption for the cross-section estimator of the mean is that
(4.A.3) $E(Y_{0t} \mid D = 1) = E(Y_{0t} \mid D = 0)$,
i.e., that on average persons who do not participate in the program have the same no-treatment outcome as those who do participate. If this assumption is valid, then the cross-section estimator is given by
(4.3) $(\bar{Y}_{1t})_1 - (\bar{Y}_{0t})_0$.
¹² (continued) The claim follows upon taking expectations.
¹³ One assumption that identifies the distribution of gains is to assume that $(Y_{1t} - Y_{0t})_1$ is independent of $(Y_{0t} - Y_{0t'})_1$ and that the distribution of $(Y_{1t} - Y_{0t})_1$ is the same as the distribution of $(Y_{0t} - Y_{0t'})_0$. Then the results on deconvolution in Heckman, Smith and Clements (1997) can be applied. See their paper for details.
This estimator is valid under assumption (4.A.3) because $E((\bar{Y}_{1t})_1 - (\bar{Y}_{0t})_0) = E(\Delta_t \mid D = 1)$.¹⁴
If persons go into the program based on outcome measures in the post-program state, then assumption (4.A.3) will be violated. The assumption would be satisfied if participation in the program is unrelated to outcomes in the no-program state in the post-program period.
Thus, it is possible for Ashenfelter’s dip to characterize the data on earnings in the pre-program period, and yet for (4.A.3) to be satisfied. Moreover, as long as the macro economy and aging process operate identically on participants and nonparticipants, the cross-section estimator is not vulnerable to the problems that plague the before-after estimator.
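The case just described, in which a pre-program dip coexists with a valid cross-section comparison, can be made concrete with constructed data. All numbers are hypothetical: participants have a transitory dip of 2 before the program, a true effect of 1, and the same post-program no-treatment earnings as nonparticipants by construction.

```python
from statistics import mean

# Hypothetical earnings (illustrative numbers only).
part_pre = [6.0, 8.0, 10.0]       # participants, pre-program (dipped by 2)
part_post = [9.0, 11.0, 13.0]     # participants, post-program (no-treatment level + 1)
nonpart_pre = [8.0, 10.0, 12.0]   # nonparticipants, pre-program (no dip)
nonpart_post = [8.0, 10.0, 12.0]  # nonparticipants, post-program

cross_section = mean(part_post) - mean(nonpart_post)
before_after = mean(part_post) - mean(part_pre)
diff_in_diff = before_after - (mean(nonpart_post) - mean(nonpart_pre))

print(cross_section, before_after, diff_in_diff)
```

In this construction the cross-section estimate recovers the true effect of 1, while the before-after and difference-in-differences estimates both return 3 because they count the recovery from the dip as a program effect. With a different data-generating process the ranking of the three estimators could be reversed.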
The cross-section estimator (4.3), the difference-in-differences estimator (4.2), and the before-after estimator (4.1) comprise the trilogy of conventional non-experimental evaluation estimators. All of these estimators can be defined conditional on observable characteristics $X$. Conditioning on $X$ or additional “instrumental” variables makes it more likely that modified versions of assumptions (4.A.3), (4.A.2), or (4.A.1) will be satisfied, but this is not guaranteed. If, for example, the distribution of $X$ characteristics is different between participants ($D = 1$) and nonparticipants ($D = 0$), conditioning on $X$ may eliminate systematic differences in outcomes between the two groups. Using modern nonparametric procedures, it is possible to exploit each of the identifying conditions to estimate nonparametric versions of all three estimators. On the other hand, if the difference between participants and nonparticipants is due to unobservables, conditioning may accentuate, and not eliminate, differences between participants and nonparticipants in the no-program state.¹⁵
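A toy numerical example shows how conditioning can widen rather than narrow the gap in no-program outcomes. The records below are entirely hypothetical, and the no-program outcome of participants is observable here only because the data are constructed.

```python
from statistics import mean

# Hypothetical records: (stratum X, participation D, no-program outcome Y0).
records = [
    ("a", 1, 12.0), ("a", 0, 8.0),
    ("b", 1, 8.0),  ("b", 0, 12.0),
]

def mean_y0(d, x=None):
    """Mean no-program outcome for participation status d,
    optionally restricted to stratum x."""
    return mean(y for (xi, di, y) in records
                if di == d and (x is None or xi == x))

# Gap without conditioning on X.
unconditional_gap = abs(mean_y0(1) - mean_y0(0))

# Gap within each stratum of X.
conditional_gaps = {x: abs(mean_y0(1, x) - mean_y0(0, x)) for x in ("a", "b")}

print(unconditional_gap, conditional_gaps)
```

The unconditional gap is zero because the differences in the two strata offset each other, yet within each stratum the gap is 4. Whether conditioning helps or hurts in practice depends on how unobserved determinants of participation relate to outcomes within values of $X$.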
The three estimators exploit three different principles, but all are based on making some comparison. The assumptions that justify one method will not, in general, justify any of the other methods. All of the estimators considered in this chapter exploit one of these three principles. They extend the simple mean differences just discussed by making a variety of adjustments to the means. Throughout the rest of the chapter, we organize our discussion of alternative estimators by discussing how they modify the simple mean differences used in the three intuitive estimators to account for nonstationary environments and different regressors in the different comparison groups. We first consider social experimentation and how it constructs the counterfactuals used in policy evaluations.
¹⁴ Proof: write
$(\bar{Y}_{1t})_1 - (\bar{Y}_{0t})_0 = (\bar{Y}_{1t})_1 - (\bar{Y}_{0t})_1 + (\bar{Y}_{0t})_1 - (\bar{Y}_{0t})_0$
and take expectations, invoking assumption (4.A.3).
¹⁵ Thus if $|E(Y_0 \mid D = 1) - E(Y_0 \mid D = 0)| = M$, there is no guarantee that $|E(Y_0 \mid D = 1, X) - E(Y_0 \mid D = 0, X)| < M$. For some values of $X$, the gap could widen.