

The Economics and Econometrics of Active Labor Market Programs

James J. Heckman, University of Chicago

Robert J. LaLonde, Michigan State University and

Jeffrey A. Smith, University of Western Ontario

Prepared for the Handbook of Labor Economics, Volume III, Orley Ashenfelter and David Card, editors. We thank Susanne Ackum Agell for her helpful comments on Scandinavian active labor market programs and Costas Meghir for very helpful comments on Sections 1-7.


Contents

1. Introduction

2. Public Job Training and Active Labor Market Policies

3. The Evaluation Problem and the Parameters of Interest in Evaluating Social Programs

3.1 The Evaluation Problem

3.2 The Counterfactuals of Interest

3.3 The Counterfactuals Most Commonly Estimated in the Literature

3.4 Is Treatment on the Treated an Interesting Economic Parameter?

4. The Prototypical Solutions to the Evaluation Problem

4.1 The Before-After Estimator

4.2 The Difference-in-Differences Estimator

4.3 The Cross Section Estimator

5. Social Experiments

5.1 How Social Experiments Solve the Evaluation Problem

5.2 Intention to Treat and Substitution Bias

5.3 Social Experiments in Practice

5.3.1 Two Important Social Experiments

5.3.2 The Practical Importance of Dropping Out and Substitution

5.3.3 Additional Problems Common to All Evaluations

6. Econometric Models of Outcomes and Program Participation

6.1 Uses of Economic Models

6.2 Prototypical Models of Earnings and Program Participation

6.3 Expected Present Value of Earnings Maximization

6.3.1 Common Treatment Effect

6.3.2 A Separable Representation

6.3.3 Variable Treatment Effect

6.3.4 Imperfect Credit Markets

6.3.5 Training as a Form of Job Search

6.4 The Role of Program Eligibility Rules in Determining Participation

6.5 Administrative Discretion and the Efficiency and Equity of Training Provision

6.6 The Conflict between the Economic Approach to Program Evaluation and the Modern Approach to Social Experiments

7. Non-experimental Evaluations

7.1 The Problem of Causal Inference in Non-experimental Evaluations


7.2 Constructing a Comparison Group

7.3 Econometric Evaluation Estimators

7.4 Identification Assumptions for Cross-Section Estimators

7.4.1 The Method of Matching

7.4.2 Index Sufficient Methods and the Classical Econometric Selection Model

7.4.3 The Method of Instrumental Variables

7.4.4 The Instrumental Variable Estimator as a Matching Estimator

7.4.5 IV Estimators and the Local Average Treatment Effect

7.4.6 Regression Discontinuity Estimators

7.5 Using Aggregate Time Series Data on Cohorts of Participants to Evaluate Programs

7.6 Panel Data Estimators

7.6.1 Analysis of the Common Coefficient Model

7.6.2 The Fixed Effects Method

7.6.3 U_t Follows a First-Order Autoregressive Process

7.6.4 U_t is Covariance Stationary

7.6.5 Repeated Cross-Section Analogs of Longitudinal Procedures

7.6.6 The Fixed Effect Model

7.6.7 The Error Process Follows a First-Order Autoregression

7.6.8 Covariance Stationary Errors

7.6.9 The Anomalous Properties of First Difference or Fixed Effect Models

7.6.10 Robustness of Panel Data Methods in the Presence of Heterogeneous Responses to Treatment

7.6.11 Panel Data Estimators as Matching Estimators

7.7 Robustness to Biased Sampling Plans

7.7.1 The IV Estimator and Choice-Based Sampling

7.7.2 The IV Estimator and Contamination Bias

7.7.3 Repeated Cross-Section Methods with Unknown Training Status and Choice-Based Sampling

7.8 Bounding and Sensitivity Analysis

8. Econometric Practice

8.1 Data Sources

8.1.1 Using Existing General Survey Data Sets

8.1.2 Using Administrative Data

8.1.3 Collecting New Survey Data

8.1.4 Combining Data Sources

8.2 Characterizing Selection Bias

8.3 A Simulation Study of the Sensitivity of Nonexperimental Methods

8.3.1 A Model of Earnings and Program Participation


8.3.2 The Data Generating Process

8.3.3 The Estimators We Examine

8.3.4 Results from the Simulations

8.4 Specification Testing and the Fallacy of Alignment

9. Indirect Effects, Displacement, and General Equilibrium Treatment Effects

9.1 Review of Traditional Approaches to Displacement and Substitution

9.2 General Equilibrium Approaches

9.2.1 Davidson and Woodbury

9.2.2 Heckman, Lochner, and Taber

9.3 Summary on General Equilibrium Approaches

10. A Survey of Empirical Findings

10.1 The Objectives of Program Evaluations

10.2 The Impact of Government Programs on Labor Market Outcomes

10.3 The Findings from U.S. Social Experiments

10.4 The Findings from Non-experimental Evaluations of U.S. Programs

10.5 The Findings from European Evaluations

11. Conclusions

1 Introduction

Public provision of job training, of wage subsidies and of job search assistance is a feature of the modern welfare state. These activities are cornerstones of European “active labor market policies,” and have been a feature of U.S. social welfare policy for more than three decades. Such policies also have been advocated as a way to soften the shocks administered to the labor markets of former East Bloc and Latin economies currently in transition to market-based systems.

A central characteristic of the modern welfare state is a demand for “objective” knowledge about the effects of various government tax and transfer programs. Different parties benefit and lose from such programs. Assessments of these benefits and losses often play critical roles in policy decision-making. Interest in evaluation has recently intensified as many economies with modern welfare states have floundered and as the costs of running welfare states have escalated.

This chapter examines the evidence on the effectiveness of active labor market policies such as training, job search assistance, and job subsidies, and the methods used to obtain that evidence. Our methodological discussion of alternative approaches to evaluating programs has more general interest. Few U.S. government programs have received such intensive scrutiny, or have been subject to so many different types of evaluation methodologies, as has government-supplied job training. In part, this is because short-run outcome measures for government training programs are more easily obtained and more readily accepted. Outcomes such as earnings, employment, and educational and occupational attainment are all more easily measured than the outcomes of health and public school education programs. In addition, the short-run outcomes of training programs are more closely linked to the “treatment” of training.

In public school and health programs, a variety of inputs over the life cycle often give rise to measured outcomes. For these programs, attribution of specific effects to specific causes is more problematic.

A major focus of this chapter is on the general lessons learned from over thirty years of experience in evaluating government training programs. Most of our lessons come from American studies because the U.S. government has been much more active in promoting evaluations than have other governments, and the results from the evaluations are often used to expand – or contract – government programs. We show that recent European studies indicate that the basic patterns and lessons from the American case apply more generally.

The two relevant empirical questions in this literature are (i) adjusting for their lower skills and abilities, do participants in government employment and training programs benefit from these programs? and (ii) are these programs worthwhile social investments? As currently constituted, these programs are often ineffective on both counts. For most groups of participants, the benefits are modest, and at worst participation in government programs is harmful. Moreover, many programs and initiatives cannot pass a cost-benefit test. Even when programs are cost effective, they are rarely associated with a large-scale improvement in skills. At the same time, however, there is substantial heterogeneity in the impacts of these programs. For some groups these programs appear to generate significant benefits both to the participants and to society.

We believe that there are two reasons why the private and social gains from these programs are generally small. First, the per-capita expenditures on participants are usually small relative to the deficits that these programs are being asked to address. For such interventions to generate large gains, they would have to be associated with very large internal rates of return. Moreover, these returns would have to be larger than those estimated for private sector training (Mincer, 1993). Second, these services are targeted toward relatively unskilled and less able individuals. Evidence on the complementarity between skill and the returns to training in the private sector suggests that the returns to training in the public sector should be relatively low.
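A back-of-the-envelope sketch of the first argument, assuming for simplicity that a one-time training expenditure yields a constant annual earnings gain in perpetuity. The $2,500 per-participant cost lies within the U.S. range for classroom training reported in Section 2; the 10 percent return and the $5,000 annual earnings deficit are hypothetical values chosen only for illustration, and the function names are our own.

```python
def annual_gain(cost: float, rate_of_return: float) -> float:
    """Annual earnings gain from a one-time investment that earns a constant
    perpetual rate of return (a simplifying assumption)."""
    return cost * rate_of_return


def required_return(cost: float, target_gain: float) -> float:
    """Perpetual annual rate of return needed for a one-time investment of
    `cost` to raise annual earnings by `target_gain`."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return target_gain / cost


cost = 2_500.0      # per-participant cost, within the U.S. CT range cited in Section 2
deficit = 5_000.0   # hypothetical annual earnings deficit the program is asked to close

print(annual_gain(cost, 0.10))         # about $250 per year at a hypothetical 10% return
print(required_return(cost, deficit))  # 2.0, i.e. a 200 percent annual return
```

Even a respectable assumed return produces an annual gain that is small relative to the assumed deficit, while closing the deficit would require a return far above any plausible value.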

We also survey the main methodological lessons learned from thirty years of evaluation activity conducted mainly in the United States. We have identified eight lessons from the evaluation literature that we believe should guide practice in the future. First, there are many parameters of interest in evaluating any program. This multiplicity of parameters results in part from the heterogeneous impacts of these programs. As a result of this heterogeneity, some popular estimators that are well suited for estimating one set of parameters are poorly suited for estimating others. An important insight of the modern literature, and one that challenges traditional approaches to program evaluation, is that responses to the same measured treatment are heterogeneous across people, that measured treatments themselves are heterogeneous, that in many cases people participate in programs based in part on this heterogeneity, and that econometric estimators should allow for this possibility. Because of this heterogeneity, many different parameters are required to answer the interesting evaluation questions.

Second, there is inherently no method of choice for conducting program evaluations. The choice of an appropriate estimator should be guided by the economics underlying the problem, the data that are available or that can be acquired, and the evaluation question being addressed.

A third lesson from the evaluation literature is that better data helps a lot. The data available to most analysts have been exceedingly crude as we document below. Too much has been asked of econometric methods to remedy the defects of the underlying data. When certain features of the data are improved, the evaluation problem becomes much easier. The best solution to the evaluation problem lies in improving the quality of the data on which evaluations are conducted and not in the development of formal econometric methods to circumvent inadequate data.

Fourth, it is important to compare comparable people. Many non-experimental evaluations identify the parameter of interest by comparing observationally different persons, using extrapolations based on inappropriate functional forms imposed to make incomparable people comparable. A major advantage of nonparametric methods for solving the problem of selection bias is that, rigorously applied, they force analysts to compare only comparable people.
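One simple nonparametric way to enforce this discipline is exact matching within cells of a discrete covariate, discarding participants who have no nonparticipant counterpart rather than extrapolating to them. The sketch below is our own illustration of that idea, not an estimator taken from the chapter; the function name and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def matched_tt(records):
    """Exact-cell matching estimate of the mean effect on participants.

    `records` is an iterable of (x, d, y) triples: a discrete covariate x,
    a participation indicator d (1 = participant, 0 = nonparticipant) and an
    outcome y. Participants in cells with no nonparticipant are off the common
    support and are dropped instead of being compared with unlike people.
    Returns (estimate, number_of_participants_dropped).
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in records:
        cells[x][d].append(y)

    weighted_sum, n_matched, n_dropped = 0.0, 0, 0
    for groups in cells.values():
        treated, untreated = groups[1], groups[0]
        if not treated:
            continue
        if not untreated:              # no comparable nonparticipant in this cell
            n_dropped += len(treated)
            continue
        weighted_sum += len(treated) * (fmean(treated) - fmean(untreated))
        n_matched += len(treated)

    if n_matched == 0:
        raise ValueError("no participants on the common support")
    return weighted_sum / n_matched, n_dropped


data = [("low", 1, 10), ("low", 0, 8), ("low", 0, 6),
        ("high", 1, 20), ("high", 0, 17),
        ("mid", 1, 15)]               # no nonparticipant with x == "mid"
print(matched_tt(data))               # (3.0, 1)
```

The participant in the "mid" cell is reported as dropped rather than silently compared with people from other cells, which is the sense in which the method compares only comparable people.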

Fifth, evidence that different non-experimental estimators produce different estimates of the same parameter does not indicate that non-experimental methods cannot address the underlying self-selection problem in the data. Instead, different estimates obtained from different estimators simply indicate that different estimators address the selection problem in different ways and that non-random participation in social programs is an important problem that deserves more attention in its own right. Different methods produce the same estimates only if there is no problem of selection bias.
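A small simulation makes this point concrete. Under the stylized assumptions below (a common treatment effect, normally distributed permanent and transitory earnings components, no common time trend, and three hypothetical participation rules), each estimator is biased under some rules and not under others, and all three agree only when assignment is random. The model, names, and parameter values are our own illustration, not the simulation design used later in the chapter.

```python
import random
from statistics import fmean

ALPHA = 1.0  # hypothetical common treatment effect


def three_estimators(selection: str, n: int = 100_000, seed: int = 1) -> dict:
    """Apply before-after, cross-section and difference-in-differences
    estimators to simulated data under a given participation rule.

    Earnings are u + e_t, with a permanent component u and independent
    transitory shocks e_t; participation adds ALPHA to post-program earnings.
    `selection` is "permanent" (low-u people enroll), "transitory" (people
    with a bad pre-program shock enroll) or "none" (random assignment).
    """
    rng = random.Random(seed)
    pre = {True: [], False: []}
    post = {True: [], False: []}
    for _ in range(n):
        u, e_pre, e_post = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        if selection == "permanent":
            d = u < 0
        elif selection == "transitory":
            d = e_pre < 0
        elif selection == "none":
            d = rng.random() < 0.5
        else:
            raise ValueError(f"unknown selection rule: {selection}")
        pre[d].append(u + e_pre)
        post[d].append(u + e_post + ALPHA * d)

    change_t = fmean(post[True]) - fmean(pre[True])
    change_c = fmean(post[False]) - fmean(pre[False])
    return {
        "before_after": change_t,
        "cross_section": fmean(post[True]) - fmean(post[False]),
        "diff_in_diff": change_t - change_c,
    }


for rule in ("none", "permanent", "transitory"):
    est = three_estimators(rule)
    print(rule, {k: round(v, 2) for k, v in est.items()})
```

In this sketch, selection on the permanent component biases the cross-section comparison but not the two estimators that difference out u, while selection on a pre-program transitory dip biases the two estimators that use pre-program earnings but not the cross-section comparison.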

Sixth, a corollary lesson, derived from lessons three, four and five, is that the message from LaLonde’s (1986) influential study of nonexperimental estimators has been misunderstood. Once analysts define bias clearly, compare comparable people, know a little about the unemployment histories of trainees and comparison group members, administer them the same questionnaire and place them in the same local labor market, much of the bias in using nonexperimental methods is attenuated. Variability in estimates across estimators arises from the fact that different nonexperimental estimators solve the selection problem under different assumptions, and these assumptions are often incompatible with each other. Only if there is no selection bias would all evaluation estimators identify the same parameter.

Seventh, three decades of experience with social experimentation have enhanced our understanding of the benefits and limitations of this approach to program evaluation. Like all evaluation methods, this method is based on implicit identifying assumptions. Experimental methods estimate the effect of the program compared to no program at all when they are used to evaluate the effect of a program for which there are few good substitutes. They are less effective when evaluating ongoing programs, in part because they appear to disrupt established bureaucratic procedures. The threat of disruption leads local bureaucrats to oppose their adoption. To the extent that programs are disrupted, the program evaluated by the method is not the ongoing program that one seeks to evaluate. The parameter estimated in experimental evaluations is often not the one of primary interest to policy makers and researchers, and in any event it has to be interpreted more carefully than is commonly done in most public policy discussions. However, if there is no disruption, and the other problems that plague experiments are absent, the evidence from social experiments provides a benchmark for learning about the performance of alternative non-experimental methods.

Eighth, and finally, programs implemented at a national or regional level affect both participants and nonparticipants. The current practice in the entire “treatment effect” literature is to ignore the indirect effects of programs on nonparticipants by assuming they are negligible. This practice can produce substantially misleading estimates of program impacts if indirect effects are substantial. To account for the impacts of programs on both participants and nonparticipants, general equilibrium frameworks are required when programs substantially affect the economy.

The remainder of the chapter is organized as follows. In Section 2, we distinguish among several types of active labor market policies and describe the types of employment and training services offered in both the U.S. and Europe, their approximate costs, and their intended effects. We introduce the evaluation problem in Section 3. We discuss the importance of heterogeneity in the response to treatment for defining counterfactuals of interest, and we consider what economic questions the most widely used counterfactuals answer. In Section 4, we present three prototypical solutions to the problem cast in terms of mean impacts. These prototypes are generalized throughout the rest of this chapter, but three basic principles introduced in that section underlie all approaches to program evaluation when the parameters of interest are means or conditional means. In Section 5, we present conditions under which social experiments solve the evaluation problem and assess the effectiveness of social experiments as a tool for evaluating employment and training programs. In Section 6, we outline two prototypical models of program participation and outcomes that represent the earliest and the latest thinking in the literature. We demonstrate the implications of these decision rules for the choice of an econometric evaluation estimator, and we discuss the empirical evidence on the determinants of participation in government training programs.

The econometric models used to evaluate the impact of training programs in nonexperimental settings are described in Section 7, where we stress the interplay between the economics of program participation and the choice of an appropriate evaluation estimator. In Section 8, we discuss some of the lessons learned from implementing various approaches to evaluation. This section includes a simulation analysis based on the empirical model of Ashenfelter and Card (1985), in which we demonstrate how sensitive the performance of alternative estimators is to assumptions about heterogeneity in impacts among persons and about other features of the data generating process of the underlying econometric model. We also reexamine LaLonde’s (1986) evidence on the performance of nonexperimental estimators and reinterpret the main lessons from his study.

Section 9 discusses the problems that arise in using microeconomic methods to evaluate programs with macroeconomic consequences. A striking example of the problems that can arise from this practice is provided. Two empirically operational general equilibrium frameworks are presented, and the lessons from applying them in practice are summarized.

Section 10 surveys the findings from the non-experimental literature, and contrasts them with those from experimental evaluations. We conclude in Section 11 by surveying the main methodological lessons learned from the program evaluation literature on job training.


2 Public Job Training and Active Labor Market Policies

Many government policies affect employment and wages. The “active labor market” policies we analyze have two important features that distinguish them from general policies, such as income taxes, that also affect the labor market. First, they are targeted toward the unemployed or toward those with low skills or little work experience who have completed (usually at a low level) their formal schooling. Second, the policies are aimed at promoting employment and/or wage growth among this population, rather than just providing income support.

Table 2.1 describes the set of policies we consider. This set includes: (a) classroom training (CT) consisting of basic education to remedy deficiencies in general skills or vocational training to provide the skills necessary for particular jobs; (b) subsidized employment with public or private employers (WE), which includes public service employment (wholly subsidized temporary government jobs) and work experience (subsidized entry-level jobs at public or non-profit employers designed to introduce young people to the world of work) as well as wage supplements and fixed payments to private firms for hiring new workers; (c) subsidies to private firms for the provision of on-the-job training (OJT); (d) training in how to obtain a job; and (e) in-kind subsidies to job search such as referrals to employers and free access to job listings. Policies (d) and (e) fall under the general heading of job search assistance (JSA), which also includes the job matching services provided by the U.S. Employment Service and similar agencies in other countries.

As we argue in more detail below, distinguishing the types of training provided is important for two reasons. First, different types of training often imply different economic models of training participation and impact and therefore different econometric estimation strategies. Second, because most existing training programs provide a mix of these services, heterogeneity in the impact of training becomes an important practical concern. As we show in Section 7, this heterogeneity has important implications for the choice of econometric methods for evaluating active labor market policies.

We do not analyze privately supplied job training despite its greater quantitative importance to modern economies (see Heckman, Lochner and Taber, 1998a, or Mincer, 1962, 1993). For example, in the United States, Jacob Mincer has estimated that such training amounts to approximately 4 to 5 percent of GDP annually. Despite the magnitude of this investment, there are surprisingly few publicly available studies of the returns to private job training, and many of those that are available do not control convincingly for the non-random allocation of training among private sector workers. Governments demand publicly justified evaluations of training programs, while private firms, to the extent that they formally evaluate their training programs, keep their findings to themselves. An emphasis on objective, publicly accessible evaluations is a distinctive feature of the modern welfare state, especially in an era of limited funds and public demands for accountability.

Table 2.2 presents the amount spent on active labor market policies by a number of OECD countries. Most OECD countries provide some mix of the employment and training services described in Table 2.1. Differences among countries include the relative emphasis on each type of service, the particular populations targeted for service, the total resources spent on the programs, how resources are allocated among programs and the extent to which employment and training services are integrated with other programs such as unemployment insurance or social assistance. In addition, although the programs we study are funded by governments, they are not always conducted by governments, especially in the U.S. and the U.K. In decentralized training systems, private firms and local organizations play an important role in providing employment and training services.

Table 2.2 reveals that many OECD countries spend substantial sums on active labor market policies. In nearly all countries, these expenditures amount to more than one-third of total expenditures on unemployment benefits, and in some countries expenditures on active labor market policies exceed those on unemployment benefits. Usually only a fraction of these expenditures is for CT. Further, even in countries that emphasize classroom training, governments spend substantial sums on other active labor market policies. Denmark spends 1 percent of its GDP on CT for adults, the most of any OECD country. However, this expenditure amounts to only 40 percent of its total spending on active labor market programs. Only in Canada is the fraction spent on CT larger. At the opposite extreme, Japan and the U.S. spend only 0.03 percent and 0.04 percent, respectively, of their GDP on CT. However, as the table shows, these two countries also spend the smallest share of GDP on active labor market policies.
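The Danish figures imply a total directly: if adult CT costs 1 percent of GDP and makes up 40 percent of active labor market spending, total spending is about 1/0.4 = 2.5 percent of GDP. The one-line helper below (a hypothetical name of our own) makes that inference explicit.

```python
def implied_total_share(component_share_of_gdp: float,
                        component_fraction_of_total: float) -> float:
    """Total active labor market spending as a percent of GDP, implied by one
    component's share of GDP and that component's fraction of the total."""
    if not 0 < component_fraction_of_total <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    return component_share_of_gdp / component_fraction_of_total


# Denmark, from the text: CT for adults = 1 percent of GDP = 40 percent of ALMP spending.
print(implied_total_share(1.0, 0.40))  # about 2.5 (percent of GDP)
```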

The low percentage of GDP spent on active labor market programs in the U.S. has led some researchers to comment on the irony that, despite these low expenditures, U.S. programs have been evaluated more extensively and over a longer period of time than programs elsewhere (Haveman and Saks, 1985; Björklund, 1993). Indeed, much of what is known about the impacts of these programs, and many of the methodological developments associated with evaluating them, come from U.S. evaluations.¹

¹ However, the level of total expenditure in the U.S. is still quite large. Relative total expenditures on active labor market policies can be inferred from Table 2.2 using the relative sizes of each economy compared with the U.S. For example, the German economy is somewhat less than one-fourth the size of the U.S. economy, and the French, Italian and British economies are approximately one-sixth the size of the U.S. economy. Accordingly, training expenditures are somewhat greater in Germany and France, about the same in Italy, and less in the United Kingdom than in the U.S. See OECD, Employment Outlook (1996), Table 1.1, p. 2.
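The inference in this footnote is a simple rescaling: a country's absolute spending relative to the U.S. equals its spending share of GDP times the size of its economy relative to the U.S., divided by the U.S. spending share of GDP. The sketch below uses hypothetical shares, not figures from Table 2.2, and the function name is our own.

```python
def spending_relative_to_us(share_of_gdp: float, us_share_of_gdp: float,
                            size_relative_to_us: float) -> float:
    """Ratio of a country's absolute spending to U.S. spending, given each
    country's spending as a share of its own GDP and the country's GDP
    relative to U.S. GDP."""
    return (share_of_gdp * size_relative_to_us) / us_share_of_gdp


# Hypothetical: an economy one-fourth the size of the U.S. that devotes four
# times the U.S. share of GDP to a program spends the same absolute amount.
print(spending_relative_to_us(2.0, 0.5, 0.25))  # 1.0
```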


We now consider in detail each type of employment and training service in Table 2.1.

This discussion motivates the consideration of alternative economic models of program participation and impact in Sections 6 and 7, and our focus on heterogeneity in program impacts. It also provides a context for the empirical literature on the impact of these programs that we review in Section 10.

The first category listed in Table 2.1 is classroom training. In many countries, CT represents the largest fraction of government expenditures on active labor market policy, and most of that expenditure is devoted to vocational training. Even in the U.S., where remedial programs aimed at high school dropouts and other low-skill individuals play a larger role than elsewhere, most CT programs provide vocational training. By design, most CT programs in the OECD are of limited duration. For example, in Denmark CT typically lasts 2 to 4 weeks (Jensen et al., 1993), while the typical duration is four months in Sweden and three months in the United Kingdom and the United States.

Per capita expenditures on such training vary substantially, with a training slot costing approximately $7,500 in Sweden and between $2,000 and $3,000 in the United States.² The Swedish figures include stipends for participants while the U.S. figures do not.

An important difference among OECD countries that provide CT is the extent to which the training is relatively standardized and therefore less tailored to the requirements of firms or the market in general. In the 1980s and early 1990s, the Nordic countries usually provided CT in government training centers that used standardized materials and teaching methods.

However, the emphasis has shifted recently, especially in Sweden, toward decentralized and firm-based training. In the United Kingdom and the U.S., the provision of CT is highly decentralized and its content depends on the choices made by local councils of business, political, and labor leaders. The local councils receive funding from the federal government and then subcontract for CT with private vocational and proprietary schools and local community colleges. Due to this highly decentralized structure, both participant characteristics and training content can vary substantially among locales, which suggests that the impact of training is likely to vary substantially across individuals in evaluations of such programs.

The second category of services listed in Table 2.1 is wage and employment subsidies. This category encompasses several different specific services, which we group together due to their analytic similarity. The simplest example of this type of policy provides subsidies to private firms for hiring workers in particular groups. These subsidies may take the form of a fixed amount for each new employee hired or some fraction of the employee’s wage for a period of time. In the U.S., the Targeted Jobs Tax Credit is an example of this type of program. Heckman, Lochner, Smith and Taber (1997) discuss the empirical evidence on the effectiveness of wage and employment subsidies in greater detail.

² Unless otherwise indicated, all monetary units are expressed in 1997 U.S. dollars.

Temporary work experience (WE) usually targets low skilled youth or adults with poor employment histories and provides them with a job lasting 3 to 12 months in the public or nonprofit sector. The idea of these programs is to ease the transition of these groups into regular jobs, by helping them learn about the world of work and develop good work habits. Such programs constitute a very small proportion of U.S. training initiatives, but substantial fractions of services provided to youth in countries such as France (TUC) and the United Kingdom (Community Programmes). In public sector employment (PSE) programs, governments create temporary public sector jobs. These jobs usually require some amount of skill and are aimed at unemployed adults with recent work experience rather than youth or the disadvantaged. Except for a brief period during the late 1970s, they have not been used in the United States since the Depression era. However, they have been and remain an important component of active labor market policy in several European countries.

The third category in Table 2.1 is subsidized on-the-job training at private firms. The goal of subsidized OJT programs is to induce employers to provide job-relevant skills, including firm-specific skills, to disadvantaged workers. In the U.S., employers receive a 50 percent wage subsidy for up to six months; in the U.K., employers receive a lump sum per week (O’Higgins, 1994). Although evidence is limited and firm training is difficult to measure, there is a widespread view that these programs in fact provide little training, even informal on-the-job training, and are better characterized as work experience or wage subsidy programs (e.g., Breen, 1988; Hutchinson and Church, 1989).³ Survey responses by employers who have hired or sponsored OJT trainees suggest that they value the program more for its help in reducing the costs associated with hiring and retaining suitable employees than for the opportunity to increase the skills of new workers (Begg, et al., 1991).

For purposes of evaluation, it is almost always impossible to distinguish OJT experiences in which new skills were acquired from those that amounted to work experience or a wage subsidy without a training component. In addition, because OJT is provided by individual employers, this indeterminacy is not simply a program-specific feature, but holds among individuals within the same program. Consequently, OJT programs will likely have heterogeneous effects, and the impact, if any, of these programs will result from some combination of learning by doing, the usual training provided by the firm to new workers and incremental training beyond that provided to unsubsidized workers.

³ The provision of subsidized OJT is particularly hard to monitor, both because on-the-job training has proven difficult to measure with survey methods (Barron, Berger and Black, 1997) and because trainees often do not perceive that they have been treated any differently than their co-workers who are not subsidized. In fact, both groups may have received substantial amounts of informal on-the-job training. For evidence of the importance of informal on-the-job training in the U.S., see Barron, Black and Lowenstein (1989).

The fourth category of services in Table 2.1 is job search assistance. The purpose of these services is to facilitate the matching process between workers and firms, both by reducing time unemployed and by increasing match quality. The programs are usually operated by the national or local employment service, but are sometimes subcontracted to third parties. Included under this category are direct placement in vacant jobs, employer referrals, in-kind subsidies to search such as free access to job listings and telephones for contacting employers, career counseling, and instruction in job search skills. The last of these, which often includes instruction in general social skills, was developed in the U.S., but is now used in the U.K., Sweden, and recently France (Björklund and Regner, 1996, p. 24). In recent years, JSA has become more popular due to its low cost, usually just a few hundred dollars per participant, and its relatively solid record of performance (which we discuss in detail in Section 10).

To conclude this section, we discuss five features of employment and training programs that should be kept in mind when evaluating them. First, as the operation of these programs has become more decentralized in OECD countries, differences have emerged between how these programs were designed and how they are implemented (Hollister and Freedman, 1988). Actual practice can deviate substantially from explicit written policy.⁴ Therefore, the evaluator must be careful to characterize the program as implemented when assessing its impacts.

Second, participants often receive services from more than one category in Table 2.1. For example, classroom training in vocational skills might be followed by job search assistance.

In the U.K., the Youth Training Scheme (now Youth Training) was explicitly designed to combine OJT with 13 weeks of CT. Some expensive programs combine several of the services listed in Table 2.1 into a single package. For example, in the U.S. the Job Corps program for youth combines classroom training with work experience and job search assistance in a residential setting at a current cost of around $19,000 per participant. Many available survey data sets do not identify all the services received by a participant. In this case, the practice of combining various types of training, particularly when combinations are tailored to the needs of individual trainees as in the U.S. JTPA program, constitutes another source of heterogeneity in the impact of training. Even when administrative data are available that identify the services received, isolating the impact of particular individual services often proves difficult or impossible in practice, due either to the small samples receiving particular combinations of services or to difficulties in determining the process by which individuals come to receive particular service combinations.

4 For example, see Breen (1988) and Hollister and Freedman (1990) describing the implementation of WEP in Ireland and Hollister and Freedman (1990) and Leigh (1995) describing the implementation of JTPA in the United States.

Third, certain features of active labor market programs affect individuals' decisions to participate in training. In some countries, such as Sweden and the United Kingdom, participation in training is a condition for receiving unemployment benefits rather than less generous social assistance payments. In the U.S., participation is sometimes required by a court order in lieu of alternative punishment.

Fourth, program administrators often have considerable discretion over whom they admit into government training programs. This discretion results from the fact that the number of applicants often exceeds the number of available training positions. It has long been a feature of U.S. programs, but has also characterized programs in Austria, Denmark, Germany, Norway, and the United Kingdom (Björklund and Regner, 1996; Westergard-Nielsen, 1993; Kraus et al., 1997). Consequently, when modeling participation in training, it may be important to account not only for individual incentives, but also for those of the program operators. In Section 6, we discuss the incentives facing program operators and how they affect the characteristics of participants in government training programs.

Finally, the different types of services require different economic models of program participation and impact. For example, the standard human capital model captures the essence of individual decisions to invest in vocational skills (CT). It provides little guidance about behavior regarding job search assistance or wage subsidies. In Section 6 we describe economic models of participation in alternative programs and discuss their implications for evaluation research.


3 The Evaluation Problem and the Parameters of Interest in Evaluating Social Programs

3.1 The Evaluation Problem

Constructing counterfactuals is the central problem in the literature on evaluating social programs. In the simplest form of the evaluation problem, persons are imagined as being able to occupy one of two mutually exclusive states: “0” for the untreated state and “1” for the treated state. Treatment is associated with participation in the program being evaluated.⁵ Associated with each state is an outcome, or set of outcomes. It is easiest to think of each state as consisting of only a single outcome measure, such as earnings, but we can just as easily use the framework to model vectors of outcomes such as earnings, employment and participation in welfare programs. In the models presented in Section 6, we study an entire vector of earnings or employment at each age that results from program participation.

We can express these outcomes as a function of conditioning variables, X. Denote the potential outcomes by Y0 and Y1, corresponding to the untreated and treated states.

Each person has a $(Y_0, Y_1)$ pair. Assuming that means exist, we may write the (vector of) outcomes in each state as

(3.1a) $Y_0 = \mu_0(X) + U_0$

(3.1b) $Y_1 = \mu_1(X) + U_1$

where $E(Y_0 \mid X) = \mu_0(X)$ and $E(Y_1 \mid X) = \mu_1(X)$. To simplify the notation, we keep the conditioning on X implicit unless making it explicit clarifies the exposition. The potential outcome actually realized depends on decisions made by individuals, firms, families or government bureaucrats. This model of potential outcomes is variously attributed to Fisher (1935), Neyman (1935), Roy (1951), Quandt (1972, 1988) or Rubin (1974).

To focus on main ideas, throughout most of this chapter we assume $E(U_1 \mid X) = E(U_0 \mid X) = 0$, although, as we note at several places in this chapter, this is not strictly required. For many of the estimators that we consider in this chapter we allow for the more general case

$Y_0 = g_0(X) + U_0$

$Y_1 = g_1(X) + U_1$

where $E(U_0 \mid X) \neq 0$ and $E(U_1 \mid X) \neq 0$. Then $\mu_0(X) = g_0(X) + E(U_0 \mid X)$ and $\mu_1(X) = g_1(X) + E(U_1 \mid X)$.⁶ Thus X is not necessarily exogenous in the ordinary econometric usage of that term. These conditions do not imply that $E(U_1 - U_0 \mid X, D = 1) = 0$. D may depend on $U_1$, $U_0$, or $U_1 - U_0$, and on X.

5 In this paper, we only consider a two potential state model in order to focus on the main ideas. Heckman (1998a) develops a multiple state model of potential outcomes for a large number of mutually exclusive states. The basic ideas in his work are captured in the two outcome models we present here.

6 For example, an exogeneity assumption is not required when using social experiments to identify

Note also that Y may be a vector of outcomes or a time series of potential outcomes, $(Y_{0t}, Y_{1t})$ for $t = 1, \ldots, T$, on the same type of variable. We will encounter the latter case when we analyze panel data on outcomes. In this case, there is usually a companion set of X variables, which we will sometimes assume to be strictly exogenous in the conventional econometric meaning of that term: $E(U_{0t} \mid X) = 0$ and $E(U_{1t} \mid X) = 0$, where $X = (X_1, \ldots, X_T)$. In defining a sequence of “treatment on the treated” parameters, $E(Y_{1t} - Y_{0t} \mid X, D = 1)$ for $t = 1, \ldots, T$, this assumption allows us to abstract from any dependence between $U_{1t}$, $U_{0t}$ and X. It excludes differences in $U_{1t}$ and $U_{0t}$ arising from X dependence and allows us to focus on differences in outcomes solely attributable to D.

While convenient, this assumption is overly strong.

However, we stress that the exogeneity assumption in either cross section or panel contexts is only a matter of convenience and is not strictly required. What is required for an interpretable definition of the “treatment on the treated” parameter is avoiding conditioning on X variables caused by D, even holding $Y^P = ((Y_{01}, Y_{11}), \ldots, (Y_{0T}, Y_{1T}))$ fixed, where $Y^P$ is the vector of potential outcomes. More precisely, we require that the conditional density of the data satisfy

$f(X \mid D, Y^P) = f(X \mid Y^P)$,

i.e., we require that the realization of D does not determine X given the vector of potential outcomes. Otherwise, the parameter $E(Y_1 - Y_0 \mid X, D = 1)$ does not capture the full effect of treatment on the treated as it operates through all channels, and certain other technical problems discussed in Heckman (1998a) arise. To obtain $E(Y_{1t} - Y_{0t} \mid X_c, D = 1)$ defined on a subset of X, say $X_c$, simply integrate $E(Y_{1t} - Y_{0t} \mid X, D = 1)$ against the density $f(X_{\bar{c}} \mid X_c, D = 1)$, where $X_{\bar{c}}$ is the portion of X not in $X_c$, so that $X = (X_c, X_{\bar{c}})$.
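The integration step can be illustrated with discrete X. The sketch below is hypothetical throughout: the cell-level treatment-on-the-treated effects, the participant counts, and the helper `tt_given_xc` are invented for illustration. It averages the cell effects over $X_{\bar{c}}$, weighting each cell by the share of participants with that value of $X_{\bar{c}}$ among participants sharing the same $X_c$, which is the empirical analogue of $f(X_{\bar{c}} \mid X_c, D = 1)$.

```python
# Hypothetical E(Y1 - Y0 | X, D = 1) for each cell X = (x_c, x_cbar).
tt_by_cell = {
    (0, 0): 100.0, (0, 1): 300.0,
    (1, 0): 200.0, (1, 1): 600.0,
}
# Hypothetical counts of participants (D = 1) in each cell.
participants = {(0, 0): 30, (0, 1): 10, (1, 0): 20, (1, 1): 40}


def tt_given_xc(xc):
    """Integrate out x_cbar using the empirical f(x_cbar | x_c, D = 1)."""
    cells = [k for k in tt_by_cell if k[0] == xc]
    total = sum(participants[k] for k in cells)
    return sum(tt_by_cell[k] * participants[k] / total for k in cells)


print(tt_given_xc(0))  # (100*30 + 300*10) / 40 = 150.0
print(tt_given_xc(1))  # (200*20 + 600*40) / 60, about 466.67
```

Weighting by participants only, rather than by the whole population, is what keeps the result a treatment-on-the-treated parameter.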

Note, finally, that the choice of a base state “0” is, in general, arbitrary. Clearly the roles of “0” and “1” can be reversed. In the case of human capital investments, there is a natural base state, but for many other evaluation problems the choice of a base is arbitrary.

Assumptions appropriate for one choice of “0” and “1” need not carry over to the opposite choice. With this cautionary note in mind, we proceed as if a well-defined base state exists.

In many problems it is convenient to think of “0” as a benchmark “no treatment” state. The gain to the individual of moving from “0” to “1” is given by

(3.2) $\Delta = Y_1 - Y_0$.

If one could observe both $Y_0$ and $Y_1$ for the same person at the same time, the gain $\Delta$ would be known for each person. The fundamental evaluation problem arises because we do not know both coordinates of $(Y_1, Y_0)$, and hence $\Delta$, for anybody. All approaches to solving this problem attempt to estimate the missing data. These attempts to solve the evaluation problem differ in the assumptions they make about how the missing data are related to the available data, and about what data are available. Most approaches to evaluation in the social sciences accept the impossibility of constructing $\Delta$ for anyone. Instead, the evaluation problem is redefined from the individual level to the population level, to estimate the mean of $\Delta$, or some other aspect of the distribution of $\Delta$, for various populations of interest. The question becomes: what features of the distribution of $\Delta$ should be of interest, and for what populations should they be defined?
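The missing-data structure can be made concrete with a small simulation. This is a sketch under assumed functional forms that do not come from the chapter: $\mu_0(X) = 1 + 0.5X$, $\mu_1(X) = 2 + 0.5X$, independent standard normal $U_0$ and $U_1$, and participation decided by a fair coin. The simulation knows every person's $\Delta$ from (3.2), but the analyst's records keep only the realized outcome from the switching rule (3.3).

```python
import random

# Hypothetical data-generating process (assumed for illustration only).
random.seed(0)
n = 10_000
people = []
for _ in range(n):
    x = random.uniform(0, 1)
    u0, u1 = random.gauss(0, 1), random.gauss(0, 1)
    y0 = 1.0 + 0.5 * x + u0          # potential outcome in state "0"
    y1 = 2.0 + 0.5 * x + u1          # potential outcome in state "1"
    d = random.random() < 0.5        # D decided by a coin flip in this sketch
    people.append((x, y0, y1, d))

# The analyst's data: only Y = D*Y1 + (1 - D)*Y0 is recorded for each person.
observed = [(x, y1 if d else y0, d) for x, y0, y1, d in people]

# Only the simulation can form each person's gain Delta = Y1 - Y0.
gains = [y1 - y0 for _, y0, y1, _ in people]
mean_gain = sum(gains) / n           # population mean of Delta (1.0 by design)

# A population-level estimate built from observed data alone.
treated = [y for _, y, d in observed if d]
untreated = [y for _, y, d in observed if not d]
naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)
```

Because participation is randomized here, the difference in group means happens to recover the mean of $\Delta$ even though no individual gain is observed; under self-selection it generally does not.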

3.2 The Counterfactuals of Interest

There are many possible counterfactuals of interest for evaluating a social program. One might like to compare the state of the world in the presence of the program to the state of the world if the program were operated in a different way, or to the state of the world if the program did not exist at all, or to the state of the world if alternative programs were used to replace the present program. A full evaluation entails an enumeration of all outcomes of interest for all persons, both in the current state of the world and in all the alternative states of interest, and a mechanism for valuing the outcomes in the different states.

Outcomes of interest in program evaluations include the direct benefits received, the level of behavioral variables for participants and nonparticipants, and the payments for the program, for both participants and nonparticipants, including taxes levied to finance a publicly provided program. These measures would be displayed for each individual in the economy to characterize each state of the world.

In a Robinson Crusoe economy, participation in a program is a well-defined event. In a modern economy, almost everyone participates in each social program either directly or indirectly. A training program affects more than the trainees. It also affects the persons with whom the trainees compete in the labor market, the firms that hire them and the taxpayers who finance the program. The impact of the program depends on the number and composition of the trainees. Participation in a program does not mean the same thing for all people.

The traditional evaluation literature usually defines the effect of participation to be the effect of the program on participants explicitly enrolled in the program. These are the “Direct Effects.” They exclude the effects of a program that do not flow from direct participation, known as the “Indirect Effects.” This distinction appears in the pioneering work of H. G. Lewis on measuring union relative wage effects (Lewis, 1963). His insights apply more generally to all evaluation problems in social settings.

There may be indirect effects for both direct participants and direct nonparticipants. Thus a direct participant may pay taxes to support the program, just as persons who do not directly participate may also pay taxes. A firm may be an indirect beneficiary of the lower wages resulting from an expansion of the trained workforce. The conventional econometric and statistical literature ignores the indirect effects of programs and equates “treatment” outcomes with the direct outcome $Y_1$ in the program state and “no treatment” with the direct outcome $Y_0$ in the no program state.

Determining all outcomes in all states is not enough to evaluate a program. Another aspect of the evaluation problem is the valuation of the outcomes. In a democratic society, aggregation of the evaluations and the outcomes in a form useful for social deliberations is also required. Different persons may value the same state of the world differently even if they experience the same “objective” outcomes and pay the same taxes. Preferences may be interdependent. Redistributive programs exist, in part, because of altruistic or paternalistic preferences. Persons may value the outcomes of other persons either positively or negatively. Only if one person's preferences are dominant (the idealized case of a social planner with a social welfare function) is there a unique evaluation of the outcomes associated with each possible state under each possible program.

The traditional program evaluation literature assumes that the valuation of the direct effects of the program boils down to the effect of the program on GDP. This assumption ignores the important point that different persons value the same outcomes differently and that the democratic political process often entails coalitions of persons who value outcomes in different ways. Both efficiency and equity considerations may receive different weights from different groups. Different mechanisms for aggregating evaluations and resolving social conflicts exist in different societies. Different types of information are required to evaluate a program under different modes of social decision making.

Both for pragmatic and political reasons, government social planners, statisticians or policy makers may value objective output measures differently than the persons or institutions being evaluated. The classic example is the value of nonmarket time (Greenberg, 1997). Traditional program evaluations exclude such valuations largely because of the difficulty of imputing the value and quantity of nonmarket time. By doing this, however, these evaluations value labor supply in the market sector at the market wage, but value labor supply in the nonmarket sector at a zero wage. By contrast, individuals value labor supply in the nonmarket sector at their reservation wage. In this example, two different sets of preferences value the same outcomes differently. In evaluating a social program in a society that places weight on individual preferences, it is appropriate to recognize personal evaluations and that the same outcome may be valued in different ways by different social actors.
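A worked arithmetic example shows how far apart the two valuations can be. All numbers are hypothetical: a participant shifts 500 hours a year from nonmarket time into market work at $8 per hour and has an assumed reservation wage of $5 per hour.

```python
# Hypothetical participant (all values assumed for illustration).
hours_shifted = 500        # hours moved from nonmarket time into market work
market_wage = 8.0          # dollars per hour
reservation_wage = 5.0     # participant's value of an hour of nonmarket time

# Conventional evaluation: market hours at the wage, nonmarket hours at zero.
gain_conventional = hours_shifted * market_wage
# Participant's own valuation: forgone nonmarket time priced at the reservation wage.
gain_private = hours_shifted * (market_wage - reservation_wage)
```

Here the conventional measure records a gain of $4,000 while the participant values the same change at $1,500; the gap is exactly the forgone nonmarket time valued at the reservation wage.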

Programs that embody redistributive objectives inherently involve different groups. Even if the taxpayers and the recipients of the benefits of a program have the same preferences, their valuations of a program will, in general, differ. Altruistic considerations often motivate such programs. These often entail private valuations of distributions of program impacts: how much recipients gain over what they would experience in the absence of the program. (See Heckman and Smith, 1993, 1995, 1998a and Heckman, Smith and Clements, 1997.)

Answers to many important evaluation questions require knowledge of the distribution of program gains, especially for programs that have a redistributive objective or for which altruistic motivations play a role in motivating the existence of the program. Let D = 1 denote direct participation in the program and D = 0 denote direct nonparticipation.

To simplify the argument in this section, ignore any indirect effects. From the standpoint of a detached observer of a social program who takes the base state values (denoted “0”) as those that would prevail in the absence of the program, it is of interest to know, among other things,

(A) the proportion of people taking the program who benefit from it:

$\Pr(Y_1 > Y_0 \mid D = 1) = \Pr(\Delta > 0 \mid D = 1)$;

(B) the proportion of the total population benefiting from the program:

$\Pr(Y_1 > Y_0 \mid D = 1) \cdot \Pr(D = 1) = \Pr(\Delta > 0 \mid D = 1) \cdot \Pr(D = 1)$;

(C) selected quantiles of the impact distribution:

$\inf_{\Delta} \{\Delta : F(\Delta \mid D = 1) > q\}$, where q (between 0 and 1) is the quantile level of interest and “inf” denotes the smallest attainable value of $\Delta$ that satisfies the condition stated in the braces;

(D) the distribution of gains at selected base state values:

$F(\Delta \mid D = 1, Y_0 = y_0)$;

(E) the increase in the level of outcomes above a certain threshold $\bar{y}$ due to a policy:

$\Pr(Y_1 > \bar{y} \mid D = 1) - \Pr(Y_0 > \bar{y} \mid D = 1)$.
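The measures (A) through (E) can be computed directly if the joint distribution of $(Y_0, Y_1)$ among participants were known. The sketch below uses ten hypothetical participants whose joint outcomes, the value of $\Pr(D = 1)$, and the threshold $\bar{y}$ are all assumed; real data never reveal both coordinates for the same person. For (D) it conditions on a lower-tail band of $Y_0$ rather than a single value, an approximation chosen because exact conditioning is rarely feasible with sample data.

```python
# Hypothetical joint outcomes (Y0, Y1) for ten participants (D = 1).
pairs = [(10, 14), (12, 11), (8, 15), (20, 19), (5, 9),
         (15, 18), (9, 8), (11, 16), (7, 7), (13, 20)]
P_D1 = 0.2     # assumed Pr(D = 1) in the whole population
Y_BAR = 12     # assumed outcome threshold for measure (E)

n = len(pairs)
gains = sorted(y1 - y0 for y0, y1 in pairs)    # Delta for each participant

# (A) share of participants who benefit, plus the share who are harmed.
share_gain = sum(g > 0 for g in gains) / n
share_harmed = sum(g < 0 for g in gains) / n
# (B) share of the whole population that benefits.
share_pop = share_gain * P_D1


def impact_quantile(q):
    """(C) smallest Delta whose empirical F(Delta | D = 1) exceeds q."""
    for g in gains:                            # gains are sorted ascending
        if sum(h <= g for h in gains) / n > q:
            return g
    return None                                # only reached if q >= 1


# (D) gains for participants in a lower-tail band of the base state, Y0 <= 10.
low_base_gains = sorted(y1 - y0 for y0, y1 in pairs if y0 <= 10)

# (E) change in the share of participants above the threshold.
share_above = (sum(y1 > Y_BAR for _, y1 in pairs)
               - sum(y0 > Y_BAR for y0, _ in pairs)) / n
```

With these invented numbers, 60 percent of participants gain, 30 percent are harmed, and the median impact is 4, so a positive mean impact coexists with a sizable harmed minority.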

Measure (A) is of interest in determining how widely program gains are distributed among participants. Participants in the political process with preferences over distributions of program outcomes would be unlikely to assign the same weight to two programs with the same mean outcome, one of which produced favorable outcomes for only a few persons while the other distributed gains more broadly. When considering a program, it is of interest to determine the proportion of participants who are harmed as a result of program participation, indicated by $\Pr(Y_1 < Y_0 \mid D = 1)$. Negative mean impact results might be acceptable if most participants gain from the program. These features of the outcome distribution are likely to be of interest to evaluators even if the persons studied do not know their $Y_0$ and $Y_1$ values in advance of participating in the program.

Measure (B) is the proportion of the entire population that benefits from the program, assuming that the costs of financing the program are broadly distributed and are not perceived to be related to the specific program being evaluated. If voters have correct expectations about the joint distribution of outcomes, it is of interest to politicians to determine how widely program benefits are distributed. At the same time, large program gains received by a few persons may make it easier to organize interest groups in support of a program than if the same gains are distributed more widely.

Evaluators interested in the distribution of program benefits would be interested in measure (C). Evaluators who take a special interest in the impact of a program on recipients in the lower tail of the base state distribution would find measure (D) of interest. It reveals how the distribution of gains depends on the base state for participants. Measure (E) answers the question “does the distribution of outcomes for participants dominate the distribution of outcomes they would have experienced had they not participated?” (See Heckman, Smith and Clements, 1997; and Heckman and Smith, 1998a.) Expanding the scope of the discussion to evaluate the indirect effects of the program makes it more likely that estimating distributional impacts is an important part of conducting program evaluations.

3.3 The Counterfactuals Most Commonly Estimated in the Literature

The evaluation problem in its most general form for distributions of outcomes is formidable and is not considered in depth either in this chapter or in the literature. (Heckman and Smith, 1998a, and Heckman, Smith and Clements, 1997, consider identification and estimation of counterfactual distributions.) Instead, in this chapter we focus on counterfactual means, and consider a form of the problem in which analysts have access to information on persons who are in one state or the other at any time, and for certain time periods there are some persons in both states, but there is no information on any single person who is in both states at the same time. As discussed in Heckman (1998a) and Heckman and Smith (1998a), a crucial assumption in the traditional evaluation literature is that the no treatment state approximates the no program state. This would be true if indirect effects are negligible.

Most of the empirical work in the literature on evaluating government training programs focuses on means, and in particular on one mean counterfactual: the mean direct effect of treatment on those who take treatment. The transition from the individual-level to the group-level counterfactual recognizes the inherent impossibility of observing the same person in both states at the same time. By dealing with aggregates rather than individuals, it is sometimes possible to estimate group impact measures even though it may be impossible to measure the impacts of a program on any particular individual. To see this point more formally, consider the switching regression model with two regimes denoted by “1” and “0” (Quandt, 1972). The observed outcome Y is given by

(3.3) $Y = DY_1 + (1 - D)Y_0$.

When D = 1 we observe $Y_1$; when D = 0 we observe $Y_0$.

To cast the foregoing model in a more familiar-looking form, and to distinguish it from conventional regression models, express the means in (3.1a) and (3.1b) in more familiar linear regression form:

$E(Y_j \mid X) = \mu_j(X) = X\beta_j$, $j = 0, 1$.

With these expressions, substitute from (3.1a) and (3.1b) into (3.3) to obtain

$Y = D(\mu_1(X) + U_1) + (1 - D)(\mu_0(X) + U_0)$.

Rewriting,

$Y = \mu_0(X) + D(\mu_1(X) - \mu_0(X) + U_1 - U_0) + U_0$.

Using the linear regression representation, we obtain

(3.4) $Y = X\beta_0 + D(X(\beta_1 - \beta_0) + U_1 - U_0) + U_0$.
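The algebra leading from (3.3) to (3.4) can be checked numerically. This sketch assumes hypothetical coefficient values and a scalar regressor with an intercept, $X = (1, x)$; it draws random $(x, U_0, U_1, D)$ and confirms that the switching form and the rewritten form give the same Y for every draw.

```python
import random

# Hypothetical coefficients for mu_j(X) = X beta_j with X = (1, x).
BETA0 = (1.0, 0.5)
BETA1 = (3.0, 0.8)


def mu(beta, x):
    """Linear regression mean X beta for X = (1, x)."""
    return beta[0] + beta[1] * x


random.seed(1)
max_gap = 0.0
for _ in range(1_000):
    x = random.uniform(-2.0, 2.0)
    u0, u1 = random.gauss(0, 1), random.gauss(0, 1)
    d = random.randint(0, 1)
    y0, y1 = mu(BETA0, x) + u0, mu(BETA1, x) + u1    # (3.1a), (3.1b)
    y_switch = d * y1 + (1 - d) * y0                  # (3.3)
    # (3.4): X beta_0 + D(X(beta_1 - beta_0) + U1 - U0) + U0
    y_rewritten = mu(BETA0, x) + d * (mu(BETA1, x) - mu(BETA0, x) + u1 - u0) + u0
    max_gap = max(max_gap, abs(y_switch - y_rewritten))
```

The check makes visible that the term $D(U_1 - U_0)$ rides along with the treatment dummy, which is why the error term in the regressions that follow switches on and off with D.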

Observe that, from the definition of a conditional mean, $E(U_0 \mid X) = 0$ and $E(U_1 \mid X) = 0$.

The parameter most commonly invoked in the program evaluation literature, although not the one actually estimated in social experiments or in most nonexperimental evaluations, is the effect of randomly picking a person with characteristics X and moving that person from “0” to “1”:

$E(Y_1 - Y_0 \mid X) = E(\Delta \mid X)$.

In terms of the switching regression model, this parameter is the coefficient on D in the “regression” (non-error) component of the following equation:

(3.5) $Y = \mu_0(X) + D(\mu_1(X) - \mu_0(X)) + \{U_0 + D(U_1 - U_0)\}$

$\quad = \mu_0(X) + D\,E(\Delta \mid X) + \{U_0 + D(U_1 - U_0)\}$

$\quad = X\beta_0 + DX(\beta_1 - \beta_0) + \{U_0 + D(U_1 - U_0)\}$,

where the term in braces is the “error.”

If the model is specialized so that there are K regressors plus an intercept, with $\beta_1 = (\beta_{10}, \ldots, \beta_{1K})$ and $\beta_0 = (\beta_{00}, \ldots, \beta_{0K})$, where the intercepts occupy the first position, and the slope coefficients are the same in both regimes,

$\beta_{1j} = \beta_{0j} = \beta_j$, $j = 1, \ldots, K$,

and $\beta_{00} = \beta_0$ and $\beta_{10} - \beta_{00} = \alpha$, the parameter under consideration reduces to $\alpha$:

(3.6) $E(Y_1 - Y_0 \mid X) = \beta_{10} - \beta_{00} = \alpha$.

The regression model for this special case may be written as

(3.7) $Y = X\beta + D\alpha + \{U_0 + D(U_1 - U_0)\}$.

It is nonstandard from the standpoint of elementary econometrics because the error term has a component that switches on or off with D. In general, its mean is not zero because $E[U_0 + D(U_1 - U_0)] = E(U_1 - U_0 \mid D = 1)\Pr(D = 1)$. If $U_1 - U_0$, or variables statistically dependent on it, help determine D, then $E(U_1 - U_0 \mid D = 1) \neq 0$. Intuitively, if persons who have high gains $(U_1 - U_0)$ are more likely to participate in the program, then this term is positive.
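The nonzero mean of the error in (3.7) is easy to reproduce by simulation. This is a sketch under assumed conditions, not the chapter's model: $U_0$ and $U_1$ are independent standard normals and each person participates exactly when the idiosyncratic gain $U_1 - U_0$ is positive. The code compares the sample mean of $U_0 + D(U_1 - U_0)$ with the sample analogue of $E(U_1 - U_0 \mid D = 1)\Pr(D = 1)$.

```python
import math
import random

# Assumed setup: U0, U1 independent N(0, 1); D = 1 exactly when U1 - U0 > 0.
random.seed(2)
n = 100_000
errors, participant_gains = [], []
for _ in range(n):
    u0, u1 = random.gauss(0, 1), random.gauss(0, 1)
    d = 1 if u1 - u0 > 0 else 0
    errors.append(u0 + d * (u1 - u0))              # braced error in (3.7)
    if d:
        participant_gains.append(u1 - u0)

mean_error = sum(errors) / n
# Sample analogue of E(U1 - U0 | D = 1) * Pr(D = 1).
rhs = (sum(participant_gains) / len(participant_gains)) * (len(participant_gains) / n)
# Under these assumptions the error equals max(U0, U1), whose mean is 1/sqrt(pi).
theory = 1 / math.sqrt(math.pi)
```

Both quantities come out near 0.56 rather than zero, so a regression of Y on X and D under this selection rule would not deliver an unbiased coefficient on D.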

In practice, most nonexperimental and experimental studies do not estimate $E(\Delta \mid X)$. Instead, most nonexperimental studies estimate the effect of treatment on the treated, $E(\Delta \mid X, D = 1)$. This parameter conditions on participation in the program as follows:

(3.8) $E(\Delta \mid X, D = 1) = E(Y_1 - Y_0 \mid X, D = 1) = X(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X, D = 1)$.

It is the coefficient on D in the non-error component of the following regression equation:

(3.9) $Y = \mu_0(X) + D\,E(\Delta \mid X, D = 1) + \{U_0 + D[(U_1 - U_0) - E(U_1 - U_0 \mid X, D = 1)]\}$

$\quad = X\beta_0 + D(X(\beta_1 - \beta_0) + E(U_1 - U_0 \mid X, D = 1)) + \{U_0 + D[(U_1 - U_0) - E(U_1 - U_0 \mid X, D = 1)]\}$.

$E(\Delta \mid X, D = 1)$ is a nonstandard parameter in conventional econometrics. It combines “structural” parameters $X(\beta_1 - \beta_0)$ with the means of the unobservables, $E(U_1 - U_0 \mid X, D = 1)$. It measures the average gain in the outcome for persons who choose to participate in a program compared to what they would have experienced in the base state. It computes the average gain in terms of both observables and unobservables. It is the latter that makes the parameter look nonstandard. Most econometric activity is devoted to separating $\beta_0$ and $\beta_1$ from the effects of the regressors on $U_1$ and $U_0$. Parameter (3.8) combines these effects.
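The gap between $E(\Delta \mid X)$ and $E(\Delta \mid X, D = 1)$ can be illustrated with a Roy-style selection rule inside a single X cell. Everything in this sketch is assumed for illustration: $\mu_1(X) - \mu_0(X) = 1$, independent standard normal $U_0$ and $U_1$, and participation whenever the gain exceeds a hypothetical cost of 1.5. The closed form used for comparison is the standard truncated-normal mean $E(V \mid V > a) = \sigma\,\phi(a/\sigma)/[1 - \Phi(a/\sigma)]$ for $V = U_1 - U_0 \sim N(0, 2)$.

```python
import math
import random

# Assumed selection on gains within one X cell (all numbers hypothetical).
random.seed(3)
n, cost = 100_000, 1.5
all_gains, treated_gains = [], []
for _ in range(n):
    delta = 1.0 + random.gauss(0, 1) - random.gauss(0, 1)   # Delta = Y1 - Y0
    all_gains.append(delta)
    if delta > cost:                                        # D = 1
        treated_gains.append(delta)

ate = sum(all_gains) / n                          # E(Delta | X)
tt = sum(treated_gains) / len(treated_gains)      # E(Delta | X, D = 1)

# Closed form for TT here: 1 + E(V | V > 0.5) with V ~ N(0, 2).
sigma = math.sqrt(2.0)
z = (cost - 1.0) / sigma
pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
survival = 0.5 * math.erfc(z / math.sqrt(2))      # 1 - Phi(z)
tt_theory = 1.0 + sigma * pdf / survival
```

Under these assumptions the average effect is about 1 while treatment on the treated is about 2.46: the unobservable component $E(U_1 - U_0 \mid X, D = 1)$ accounts for the difference.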

This parameter is implicitly defined conditional on the current levels of participation in the program in society at large. Thus it recognizes social interaction. But at any point in time the aggregate participation level is just a single number, and the composition of trainees is fixed. From a single cross section of data, it is not possible to estimate how variation in the levels and composition of participants in a program affects the parameter.

The two evaluation parameters we have just presented are the same if we assume that $U_1 - U_0 = 0$, so the unobservables are common across the two states. From (3.9) we now have $Y_1 - Y_0 = \mu_1(X) - \mu_0(X) = X(\beta_1 - \beta_0)$. The difference between potential outcomes in the two states is a function of X but not of unobservables. Further specializing the model to one of intercept differences (i.e., $Y_1 - Y_0 = \alpha$) requires that the difference between potential outcomes is a constant. The associated regression can be written as the familiar-looking dummy variable regression model:

(3.10) $Y = X\beta + D\alpha + U$, where $E(U) = 0$.

The parameter $\alpha$ is easy to interpret as a standard structural parameter, and the specification (3.10) looks conventional. In fact, model (3.10) dominates the conventional evaluation literature. The validity of many conventional instrumental variables methods and longitudinal estimation strategies is contingent on this specification, as we document below. The conventional econometric evaluation literature focuses on $\alpha$ or, more rarely, $X(\beta_1 - \beta_0)$, and the selection problem arises from the correlation between D and U.

While familiar, the framework of (3.10) is very special. Potential outcomes $(Y_1, Y_0)$ differ only by a constant ($Y_1 - Y_0 = \alpha$). The best $Y_1$ is the best $Y_0$: all people gain or lose the same amount in going from “0” to “1”. There is no heterogeneity in gains. Even in the more general case, with $\mu_1(X)$ and $\mu_0(X)$ distinct, or $\beta_1 \neq \beta_0$ in the linear regression representation, so long as $U_1 = U_0$ among people with the same X, there is no heterogeneity in the outcomes of moving from “0” to “1”. This assumed absence of heterogeneity in response to treatments is strong. When tested, it is almost always rejected (see Heckman, Smith and Clements, 1997 and the evidence presented below).

There is one case with $U_1 \neq U_0$ in which the two parameters of interest are still equal even though there is dispersion in the gain $\Delta$. This case occurs when

(3.11) $E(U_1 - U_0 \mid X, D = 1) = 0$.

Condition (3.11) arises when, conditional on X, D does not explain or predict $U_1 - U_0$. This condition could arise if agents who select into state “1” from “0” either do not know or do not act on $U_1 - U_0$, or on information dependent on $U_1 - U_0$, in making their decision to participate in the program. Ex post there is heterogeneity, but ex ante it is not acted on in determining participation in the program.
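A simulation can show condition (3.11) at work. In this sketch, which uses a hypothetical setup chosen only for illustration, gains are heterogeneous ($\Delta = 1 + V$ with $V$ standard normal), participation depends on $U_0$ plus independent noise but not on $V$, and $\mu_0(X)$ is normalized to zero in a single X cell. Treatment on the treated then equals the average effect, while a naive treated-versus-untreated comparison is still biased, here entirely through the dependence between $U_0$ and D.

```python
import random

# Assumed setup: U0 ~ N(0, 1), U1 = U0 + V with V ~ N(0, 1) independent of
# everything else; participation depends on U0 (plus noise) but not on V.
random.seed(5)
n = 100_000
y_treated, y_untreated, all_gains, treated_gains = [], [], [], []
for _ in range(n):
    u0, v, noise = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    y0 = u0                  # mu_0(X) normalized to 0 in this X cell
    y1 = 1.0 + u0 + v        # mu_1(X) - mu_0(X) = 1, heterogeneous gain 1 + V
    all_gains.append(y1 - y0)
    if u0 + noise < 0:       # D = 1: low-U0 persons tend to participate
        treated_gains.append(y1 - y0)
        y_treated.append(y1)
    else:
        y_untreated.append(y0)

ate = sum(all_gains) / n                          # E(Delta | X)
tt = sum(treated_gains) / len(treated_gains)      # E(Delta | X, D = 1)
# Treated-minus-untreated mean difference, biased through U0 only.
naive = sum(y_treated) / len(y_treated) - sum(y_untreated) / len(y_untreated)
```

Both `ate` and `tt` come out near 1, while `naive` is close to -0.13 under these assumptions, because low-$U_0$ persons select into the program.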

When the gain does not affect individuals' decisions to participate in the program, the error terms (the terms in braces in (3.7) and (3.9)) have conventional properties. The only bias in estimating the coefficients on D in the regression models arises from the dependence between $U_0$ and D, just as the only source of bias in the common coefficient model is the covariance between U and D when $E(U \mid X) = 0$. To see this point, take the expectation of the terms in braces in (3.7) and (3.9), respectively, to obtain the following:

$E(U_0 + D(U_1 - U_0) \mid X, D) = E(U_0 \mid X, D)$ and

$E(U_0 + D[(U_1 - U_0) - E(U_1 - U_0 \mid X, D = 1)] \mid X, D) = E(U_0 \mid X, D)$.


A problem that remains when condition (3.11) holds is that the D component in the error terms contributes a component of variance to the model and so makes the model heteroscedastic:

$\mathrm{Var}(U_0 + D(U_1 - U_0) \mid X, D) = \mathrm{Var}(U_0 \mid X, D) + 2\,\mathrm{Cov}(U_0, U_1 - U_0 \mid X, D)\,D + \mathrm{Var}(U_1 - U_0 \mid X, D)\,D$.
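The variance formula can be checked on simulated data. The setup is hypothetical: $U_0 \sim N(0, 1)$, $U_1 = 0.5\,U_0 + W$ with $W \sim N(0, 4)$, and D assigned by a fair coin so that (3.11) holds. The sketch computes the error variance separately for D = 0 and D = 1 and evaluates the right-hand side of the formula on the D = 1 group, using population (divide-by-n) moments so that the decomposition holds exactly in sample.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def pcov(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Assumed setup: U1 = 0.5*U0 + W, W ~ N(0, 4), D assigned by a fair coin.
random.seed(4)
err = {0: [], 1: []}
u0s, gains = [], []            # U0 and U1 - U0 within the D = 1 group
for _ in range(100_000):
    u0 = random.gauss(0, 1)
    u1 = 0.5 * u0 + random.gauss(0, 2)
    d = random.randint(0, 1)
    err[d].append(u0 + d * (u1 - u0))     # braced error term in (3.7)
    if d:
        u0s.append(u0)
        gains.append(u1 - u0)

var_d0 = pcov(err[0], err[0])     # Var(U0 | D = 0), about 1
var_d1 = pcov(err[1], err[1])     # Var(U1 | D = 1), about 0.25 + 4 = 4.25
# Right-hand side of the variance formula for the D = 1 group.
formula_d1 = pcov(u0s, u0s) + 2 * pcov(u0s, gains) + pcov(gains, gains)
```

Even though D is unrelated to every unobservable here, the error variance roughly quadruples when D switches from 0 to 1, which is why standard errors must allow for heteroscedasticity.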

The distinction between a model with $U_1 = U_0$ and one with $U_1 \neq U_0$ is fundamental to understanding modern developments in the program evaluation literature. When $U_1 = U_0$ and we condition on X, everyone with the same X has the same treatment effect. The evaluation problem greatly simplifies, and one parameter answers all of the conceptually distinct evaluation questions we have posed. “Treatment on the treated” is the same as the effect of taking a person at random and putting him or her into the program. The distributional questions (A)–(E) all have simple answers because everyone with the same X has the same $\Delta$. Equation (3.10) is amenable to analysis by conventional econometric methods. Eliminating the covariance between D and U is the central problem in this model.

When $U_1 \neq U_0$ but (3.11) characterizes the program being evaluated, most of the familiar econometric intuition remains valid. This is the “random coefficient” model, with the coefficient on D “random” (from the standpoint of the observing economist) but uncorrelated with D. The central problem in this model is the covariance between $U_0$ and D, and the only additional econometric problem arises in accounting for heteroscedasticity to obtain the right standard errors for the coefficients. In this case, the response to treatment varies among persons with the same X values. The mean effect of treatment on the treated and the effect of treatment on a randomly chosen person are the same.

In the general case, when $U_1 \neq U_0$ and (3.11) no longer holds, we enter a new world not covered in the traditional econometric evaluation literature. A variety of different treatment effects can be defined. Conventional econometric procedures often break down or require substantial modification. The error term for model (3.5) has a nonzero mean.⁷ Both error terms are heteroscedastic. The distinctions among these three models, in which (a) the coefficient on D is fixed (given X) for everyone; (b) the coefficient on D is variable (given X) but does not help determine program participation; and (c) the coefficient on D is variable (given X) and does help determine program participation, are fundamental to this chapter and to the entire literature on program evaluation.

7 $E[U_0 + D(U_1 - U_0) \mid X] = E(U_1 - U_0 \mid X, D = 1)\Pr(D = 1 \mid X) \neq 0$.


3.4 Is Treatment on the Treated an Interesting Economic Parameter?

What economic question does parameter (3.2) answer? How does it relate to the conventional parameter of interest in cost-benefit analysis, the effect of a program on GDP?

In order to relate parameter (3.2) to the parameters needed to perform traditional cost-benefit analysis, it is fruitful to consider a more general framework. Following our previous discussion, we consider two discrete states or sectors corresponding to direct participation and nonparticipation, and a vector of policy variables $\varphi$ that affect the outcomes in both states and the allocation of persons to states or sectors. The policy variables may be discrete or continuous. Our framework departs from the conventional treatment effect literature and allows for general equilibrium effects.

Assuming that costless lump-sum transfers are possible, that a single social welfare function governs the distribution of resources and that prices reflect true opportunity costs, traditional cost-benefit analysis (see, e.g., Harberger, 1971) seeks to determine the impact of programs on the total output of society. Efficiency becomes the paramount criterion in this framework, with the distributional aspects of policies assumed to be taken care of by lump-sum transfers and taxes engineered by an enlightened social planner. In this framework, impacts on total output are the only objects of interest in evaluating programs. The distribution of program impacts is assumed to be irrelevant. This framework is favorable to the use of mean outcomes to evaluate social programs.

Within the context of the simple framework discussed in Section 3.1, let $Y_1$ and $Y_0$ be individual output, which trades at a constant relative price of “1” set externally and not affected by the decisions of the agents we analyze. Alternatively, assume that the policies we consider do not alter relative prices. Let $\varphi$ be a vector of policy variables that operate on all persons. These generate indirect effects. $c(\varphi)$ is the social cost of $\varphi$ denominated in “0” units. We assume that $c(0) = 0$ and that c is convex and increasing in $\varphi$. Let $N_1(\varphi)$ be the number of persons in state “1” and $N_0(\varphi)$ be the number of persons in state “0”.

The total output of society is

$$N_1(\varphi)\,E(Y_1 \mid D = 1, \varphi) + N_0(\varphi)\,E(Y_0 \mid D = 0, \varphi) - c(\varphi),$$

where $N_1(\varphi) + N_0(\varphi) = \bar{N}$ is the total number of persons in society. For simplicity, we assume that all persons have the same person-specific characteristics $X$. Vector $\varphi$ is general enough to include financial incentive variables for participation in the program as well as mandates that assign persons to a particular state. A policy may benefit some and harm others.
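The total output expression can be evaluated directly once the sector sizes, sector means and social cost are known at a given policy value. The short Python sketch below does this arithmetic; the function name, the argument `n_bar` and all of the numbers are hypothetical illustrations, not quantities from the chapter.

```python
def total_output(n1: float, n0: float, mean_y1_treated: float,
                 mean_y0_untreated: float, cost: float, n_bar: float) -> float:
    """Total output N1*E(Y1|D=1) + N0*E(Y0|D=0) - c at one fixed policy value.

    All inputs are assumed to be evaluated at the same value of phi.
    """
    # The two sectors must exhaust the population: N1 + N0 = N_bar.
    if abs(n1 + n0 - n_bar) > 1e-9:
        raise ValueError("N1 + N0 must equal the total population N_bar")
    return n1 * mean_y1_treated + n0 * mean_y0_untreated - cost


# Hypothetical society: 100 persons, 30 in state 1 with mean output 12,
# 70 in state 0 with mean output 10, and a social cost of 50.
print(total_output(30, 70, 12.0, 10.0, 50.0, n_bar=100))  # 1010.0
```

The explicit check on $N_1 + N_0 = \bar{N}$ mirrors the adding-up constraint stated in the text.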


Assume for convenience that the treatment choice and mean outcome functions are differentiable, and for the sake of argument further assume that $\varphi$ is a scalar. Then the change in output in response to a marginal increase in $\varphi$ from any given position is:

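$$\Delta(\varphi) = \frac{\partial N_1(\varphi)}{\partial \varphi}\left[E(Y_1 \mid D = 1, \varphi) - E(Y_0 \mid D = 0, \varphi)\right] + \left\{ N_1(\varphi)\left[\frac{\partial E(Y_1 \mid D = 1, \varphi)}{\partial \varphi}\right] + N_0(\varphi)\left[\frac{\partial E(Y_0 \mid D = 0, \varphi)}{\partial \varphi}\right] \right\} - \frac{\partial c(\varphi)}{\partial \varphi}. \tag{3.12}$$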
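Because (3.12) is just the derivative of total output with respect to $\varphi$ (using $\partial N_0/\partial\varphi = -\partial N_1/\partial\varphi$), it can be checked numerically. The sketch below assumes made-up functional forms that do not come from the chapter: $\bar{N} = 1000$, $N_1(\varphi) = \bar{N}\varphi/(1+\varphi)$, $E(Y_1 \mid D=1,\varphi) = 12 - 0.5\varphi$, $E(Y_0 \mid D=0,\varphi) = 10 + 0.2\varphi$ and $c(\varphi) = 50\varphi^2$ (so $c(0)=0$ and $c$ is increasing and convex for $\varphi \ge 0$). It compares the three-term sum with a central finite difference of total output.

```python
N_BAR = 1000.0  # total population (hypothetical)


# Hypothetical primitives, each paired with its analytic derivative in phi.
def n1(phi):          # N1(phi), rising in phi
    return N_BAR * phi / (1.0 + phi)

def dn1(phi):
    return N_BAR / (1.0 + phi) ** 2

def n0(phi):          # N0(phi) = N_bar - N1(phi)
    return N_BAR - n1(phi)

def mean_y1(phi):     # E(Y1 | D = 1, phi), hypothetical, declining in phi
    return 12.0 - 0.5 * phi

def dmean_y1(phi):
    return -0.5

def mean_y0(phi):     # E(Y0 | D = 0, phi), hypothetical, rising in phi
    return 10.0 + 0.2 * phi

def dmean_y0(phi):
    return 0.2

def cost(phi):        # c(phi): c(0) = 0, increasing and convex for phi >= 0
    return 50.0 * phi ** 2

def dcost(phi):
    return 100.0 * phi


def total_output(phi):
    """N1 E(Y1|D=1) + N0 E(Y0|D=0) - c at policy value phi."""
    return n1(phi) * mean_y1(phi) + n0(phi) * mean_y0(phi) - cost(phi)


def delta(phi):
    """Eq. (3.12): sector-shift term + within-sector terms - marginal cost."""
    shift = dn1(phi) * (mean_y1(phi) - mean_y0(phi))
    within = n1(phi) * dmean_y1(phi) + n0(phi) * dmean_y0(phi)
    return shift + within - dcost(phi)


def central_difference(f, x, h=1e-5):
    """Numerical derivative of f at x, used only as a cross-check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


for phi in (0.5, 1.0, 2.0):
    print(phi, round(delta(phi), 6), round(central_difference(total_output, phi), 6))
```

In this toy model the two columns agree to several decimal places: $\Delta$ is about 650 at $\varphi = 0.5$, about 75 at $\varphi = 1$ and about $-400$ at $\varphi = 2$, so beyond some point a further marginal increase in $\varphi$ lowers total output because the sector-shift gain no longer covers the within-sector losses and the marginal cost.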

The first term arises from the transfer of persons across sectors that is induced by the policy change. The second term arises from changes in output within each sector induced by the policy change. The third term is the marginal social cost of the change.

In principle, this measure could be estimated from time-series data on the change in aggregate GDP occurring after the program parameter $\varphi$ is varied. Assuming a well-defined social welfare function and making the additional assumption that prices are constant at initial values, an increase in GDP evaluated at base period prices raises social welfare provided that feasible bundles can be constructed from the output after the social program parameter is varied so that all losers can be compensated. (See, e.g., Laffont, 1989, p. 155, or the comprehensive discussion in Chipman and Moore, 1976.)

If marginal policy changes have no effect on intra-sector mean output, the bracketed derivatives in the second set of terms inside the braces of (3.12) are zero, and only the first and third terms remain. In this case, the parameters of interest for evaluating the impact of the policy change on GDP are:

(i) $\partial N_1(\varphi)/\partial \varphi$, the number of people entering or leaving state 1.

(ii) $E(Y_1 \mid D = 1, \varphi) - E(Y_0 \mid D = 0, \varphi)$, the mean output difference between sectors.

(iii) $\partial c(\varphi)/\partial \varphi$, the social marginal cost of the policy.
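When the within-sector means do not move with $\varphi$, (3.12) collapses to (i) times (ii) minus (iii). The sketch below computes that reduced expression; the function name is hypothetical, and the example primitives ($N_1(\varphi) = 1000\varphi/(1+\varphi)$, sector means fixed at 12 and 10, $c(\varphi) = 50\varphi^2$) are made up for illustration.

```python
def marginal_output_change(dn1_dphi: float, mean_difference: float,
                           marginal_cost: float) -> float:
    """Eq. (3.12) when within-sector means do not respond to phi.

    dn1_dphi:        parameter (i), dN1/dphi
    mean_difference: parameter (ii), E(Y1|D=1,phi) - E(Y0|D=0,phi)
    marginal_cost:   parameter (iii), dc/dphi
    """
    return dn1_dphi * mean_difference - marginal_cost


# Hypothetical primitives: N1(phi) = 1000*phi/(1+phi), sector means fixed
# at 12 and 10, c(phi) = 50*phi**2, so dN1/dphi = 1000/(1+phi)**2 and
# dc/dphi = 100*phi.
for phi in (1.0, 3.0):
    change = marginal_output_change(1000.0 / (1.0 + phi) ** 2,
                                    12.0 - 10.0,
                                    100.0 * phi)
    print(phi, change)
```

Here the marginal change in output is 400.0 at $\varphi = 1$ and $-175.0$ at $\varphi = 3$: only the three listed parameters enter the calculation.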

It is revealing that nowhere on this list are the parameters that receive the most attention in the econometric policy evaluation literature. (See, e.g., Heckman and Robb, 1985a).

These are “the effect of treatment on the treated”:

(a) $E(Y_1 - Y_0 \mid D = 1, \varphi)$, or

(b) $E(Y_1 \mid \varphi = \bar{\varphi}) - E(Y_0 \mid \varphi = 0)$, where $\varphi = \bar{\varphi}$ sets $N_1(\bar{\varphi}) = \bar{N}$. Parameter (b) is the effect of universal coverage for the program.
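To see that parameter (ii) on the cost-benefit list, treatment on the treated (a), and the universal-coverage effect (b) are generally different numbers, consider a toy population. Everything in the sketch below is hypothetical: five persons with made-up $(Y_1, Y_0)$ pairs, a made-up selection rule at a fixed $\varphi$ (enter state 1 when the individual gain exceeds 2), and, for (b), the assumptions that there are no general equilibrium effects and that $\varphi = 0$ leaves everyone in state 0.

```python
from statistics import fmean

# Hypothetical population: (Y1, Y0) pairs for five persons.
PERSONS = [(15.0, 8.0), (13.0, 10.0), (12.0, 11.0), (11.0, 12.0), (9.0, 12.0)]


def participates(y1, y0, threshold=2.0):
    """Hypothetical selection rule at a fixed phi: enter state 1 if the gain exceeds the threshold."""
    return y1 - y0 > threshold


treated = [(y1, y0) for y1, y0 in PERSONS if participates(y1, y0)]
untreated = [(y1, y0) for y1, y0 in PERSONS if not participates(y1, y0)]

# Parameter (ii): mean output difference between sectors (different people).
sector_difference = fmean(y1 for y1, _ in treated) - fmean(y0 for _, y0 in untreated)

# Parameter (a): treatment on the treated, the mean gain among participants.
tt = fmean(y1 - y0 for y1, y0 in treated)

# Parameter (b): universal coverage, assuming no general equilibrium effects
# and that phi = 0 leaves everyone in state 0.
universal = fmean(y1 for y1, _ in PERSONS) - fmean(y0 for _, y0 in PERSONS)

print(round(sector_difference, 3), tt, round(universal, 3))
```

With selection on gains, the three quantities come out at roughly 2.33, 5.0 and 1.4. Knowing the treatment-on-the-treated or universal-coverage parameter therefore does not by itself deliver the between-sector difference that enters the cost-benefit calculation.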