Dottorato di ricerca in
Metodologia Statistica per la Ricerca Scientifica
XIX ciclo
Università di Bologna
Program evaluation with continuous treatment:
theoretical considerations and
empirical application
Valentina Adorno
coordinatore
prof. Daniela Cocchi
tutor
prof. Attilio Gardini
co-tutor
prof. Guido Pellegrini
Settore Disciplinare
SECS-S/03
Dipartimento di Scienze Statistiche “P. Fortunati”
Marzo 2007
Preface
The ambitious aim of this thesis is to develop a matching estimator approach to evaluate the causal effects of a policy intervention on some outcome variables in a continuous treatment framework.
The main motivation of the research project derives from the rising and strong demand of modern welfare states for objective knowledge about the effects of various government interventions and activities; some can benefit and others can lose from such programs. Assessment of these benefits and losses often plays an important role in policy decision-making. In particular, the evaluation problem of concern here is the ex-post measurement of the impact of a policy reform or intervention on a set of well-defined outcome variables.
Most of the relevant literature on program evaluation deals with the estimation of the causal effects of a binary treatment on one or more outcomes of interest in a non-experimental framework. In practice, however, treatment regimes need not be binary, and individuals might be exposed to different levels or doses of treatment. In these situations, studying the impact of such a treatment as if it were binary can mask some important features of it. Moreover, other parameters might be of interest: it could be interesting, for example, to learn about the form of the entire function of average treatment effects over all possible values of the treatment levels.
The idea of estimating the causal effects of a public intervention when the treatment variable is continuous comes from an economic problem we want to solve. It refers to the evaluation of the impacts of Law 488/92 in Italy, the most important policy intervention to subsidize private capital accumulation in the poorest Italian regions in the last decade. The questions we want to answer mainly refer to two aspects. Does Law 488 affect the performance of subsidized firms? And are the effects different with respect to the amount of the public subsidy? The idea, then, is to use the amount of the received subsidy as a continuous treatment variable in order to estimate causal effects as a function of the different treatment levels.
The empirical application has some particular characteristics that suggest using a matching estimation approach among the methods commonly adopted in the program evaluation literature. Some matching estimators will therefore be proposed, together with the assumptions they require in order to estimate causal effects.
The main objectives of this thesis are thus an analysis of program evaluation in a continuous treatment setting, the development of an appropriate matching estimation method and the analysis of the impact of differences in treatment level on policy outcomes.
As an empirical application, the case of Law 488/1992 for the manufacturing sector in Italy is proposed. The final results show that the impact of Law 488 on subsidized firms is positive and statistically significant: the firm growth outcome variables increase more in the subsidized firms than in the non-subsidized ones. Furthermore, we find that the higher the level of the incentive, the higher the policy effect, up to a certain point beyond which the marginal impact decreases.
The analyses were carried out using the statistical software packages Stata and SAS.
Structure of the thesis
Chapter 1 introduces the program evaluation problem in the economic-statistical context, together with some considerations on the parameters of interest to estimate. The aim is to frame, from a statistical viewpoint, the problems that must be solved in order to estimate the causal effects of an intervention.
Chapter 2 concerns the statistical tools used in program evaluation in a binary treatment regime. The aim is to give the reader an idea of the state of the literature in this field and, at the same time, to explain methods that are important building blocks for the further generalization to more complicated settings, such as the continuous treatment regime introduced in Chapter 3, which reviews the relevant literature on programme evaluation with continuous treatment. From this review, some considerations on the proposed approaches are discussed; they constitute the starting point of our research project (Chapter 4), where a matching estimation approach is proposed in order to estimate causal effects at particular values of the treatment levels.
Chapter 5 describes the dataset used in the application, while Chapter 6 contains the final results of the application. Some final comments and conclusions are contained in Chapter 7, with a brief introduction to further developments.
Acknowledgements
This thesis is the result of my three-year period of research at the Statistics Department
of the University of Bologna.
It is my great pleasure to thank all the people who helped me in various ways in the realization of this work. I would like to thank Prof. Guido Pellegrini for his supervision, precious suggestions and inspiration. He introduced and involved me in this interesting research project, giving me the chance to learn and grow a great deal, not only from a professional point of view. I am particularly grateful to Cristina Bernini for her suggestions and her constant, encouraging support. I wish to thank Prof. Attilio Gardini for the realization of this thesis.
My period of study at University College London was crucial for my research, so I would like to thank all the people who made it possible. A special thanks to Alessia Matano for being a kind roommate and still a special friend. I am very grateful to my PhD colleagues, Mariagiulia Matteucci and Laura Sardonini, who always unconditionally helped me. They are not only precious colleagues but very special friends. I would also like to thank Marta Disegna, Caterina Liberati and Giula Roli for the fun times and talks we had together.
Many thanks to all the people who were close to me in these years. A very special thanks goes to my family, who always trust and support me, and to my friend Alex for his patience. A particular thought goes to Giulia, who always helps me keep in mind how important a smile is. Last but not least, I would like to thank Furio for being close to me in this final period: most of my energy comes from him.
Contents

Preface
1 Program evaluation
  1.1 Introduction
  1.2 The Evaluation Problem
  1.3 Estimation of the counterfactual: problems to solve
  1.4 Which Parameter of Interest?
2 The binary treatment case
  2.1 Introduction
  2.2 The Parameters of Interest
    2.2.1 Homogeneous Treatment Effects
    2.2.2 Heterogeneous Treatment Effects
  2.3 Experimental Data
    2.3.1 Experimental Data in Practice
  2.4 Non-Experimental Data
  2.5 Instrumental Variable Estimator
    2.5.1 Homogeneous Treatment Effect
    2.5.2 Heterogeneous Treatment Effect
    2.5.3 Instrumental variable approach in practice
  2.6 Heckman Selection model
    2.6.1 Homogeneous Treatment Effect
    2.6.2 Heterogeneous Treatment Effect
    2.6.3 Selection models in practice
  2.7 Difference-in-Difference Estimators
    2.7.1 Trend Adjusted Diff-in-Diff
    2.7.2 Difference in difference in practice
  2.8 Matching Estimators
    2.8.1 The Propensity score
    2.8.2 Matching Diff-in-Diff Approach
    2.8.3 Matching approach in practice
  2.9 Regression Discontinuity Estimators
    2.9.1 Regression discontinuity Design in practice
3 The continuous treatment case
  3.1 Introduction
  3.2 From binary to continuous treatment
  3.3 An overview on the literature
  3.4 Generalized propensity score: parametric approach
  3.5 Some non-parametric approaches
    3.5.1 Subclassification on the propensity score
    3.5.2 Matching estimators
  3.6 Conclusions: our starting point
4 A new approach to empirical estimation
  4.1 Introduction
  4.2 The continuous treatment setting
    4.2.1 The selection process
  4.3 The parameters of interest
    4.3.1 Average treatment level effects
    4.3.2 Treatment dose function
  4.4 The random treatment level case
    4.4.1 Estimation of the effects: a matching approach
    4.4.2 Test for the independence assumption
  4.5 The non-random treatment level case
  4.6 A new approach: the use of a matching procedure
    4.6.1 Structured form of the selection process
    4.6.2 Non-structured form of the selection process
  4.7 Reduction of the multidimensionality
    4.7.1 Structured selection process
    4.7.2 Non-structured selection process
5 The law 488/1992 in Italy
  5.1 Introduction
  5.2 Capital subsidies in Italy
  5.3 The Law 488/92
    5.3.1 The higher applicable subsidy
    5.3.2 Why not a RDD approach?
  5.4 The data implementation
6 Application
  6.1 Introduction
  6.2 The treatment variable: amount of subsidies
  6.3 The outcome variables
  6.4 Structured form of the selection process
    6.4.1 Impact of the Law 488
  6.5 Non-structured form of the selection process
    6.5.1 Impact of the Law 488
7 Conclusions
Bibliography
Appendices
A Integration of datasets
B Outcome variable distributions

List of Tables

5.1 Intensity of the subsidies
5.2 Distribution of projects according to main characteristics (1996-2000)
5.3 Summary of main covariates in the final dataset (1996-2000)
6.1 Summary of the granted subsidies
6.2 Summary of the percent share of subsidies on the total investment
6.3 Summary of outcome variables
6.4 Structured form: Propensity score estimate
6.5 Structured form: Level of treatment estimate
6.6 Structured form: TTE estimates radius=fix
6.7 Structured form: TTE estimates radius=mean
6.8 Structured form: TTE estimates radius=std
6.9 Structured form: TTE estimates radius=mean
6.10 OLS estimates: impact of the amount of subsidies on the treatment level (structured case)
6.11 Quantile estimates: Employment (structured case)
6.12 Quantile estimates: Turnover and fixed assets (structured case)
6.13 Non-structured form: Propensity score estimate
6.14 Non-structured form: TTE estimates radius=fix
6.15 Non-structured form: TTE estimates radius=mean
6.16 Non-structured form: TTE estimates radius=std
6.17 Non-structured form: TTE estimates radius=mean
6.18 OLS estimates: impact of the amount of subsidies on the treatment level (non-structured case)
6.19 Quantile estimates: Employment (non-structured case)
6.20 Quantile estimates: Turnover and Fixed assets (non-structured case)
A.1 From the original to the matched dataset
A.2 Impact of data matching process on eligible projects distribution

List of Figures

6.1 Distribution of the treatment variable
6.2 Predicted OLS estimates for treatment impact on the amount of subsidy (structured case)
6.3 Kernel estimates for treatment impact on the amount of subsidy (structured case)
6.4 Kernel estimates for treatment impact on the amount of subsidy (structured case)
6.5 Predicted OLS estimates for treatment impact on the amount of subsidy (non-structured case)
6.6 Kernel estimates for treatment impact on the amount of subsidy (non-structured case)
6.7 Kernel estimates for treatment impact on the amount of subsidy (non-structured case)
B.1 Distribution of the outcome variable: turnover
B.2 Distribution of the outcome variable: employment
B.3 Distribution of the outcome variable: fixed assets
B.4 Distribution of the outcome variable: gross margin on turnover
B.5 Distribution of the outcome variable: per capita turnover
B.6 Distribution of the outcome variable: debt charges on debt stock
Chapter 1
Program evaluation
1.1 Introduction
A common characteristic of the modern welfare state is the demand for objective knowledge about the effects of various government interventions and activities; some can benefit and others can lose from such programs. Assessment of these benefits and losses often plays an important role in policy decision-making. In this introduction, we want to provide an accessible overview of the evaluation problem and to outline how, along with adequate economic and statistical analysis, program evaluation can play a fundamental role in modern welfare states.
1.2 The Evaluation Problem
The recent literature in economic statistics on the role of economic policies and their ability to produce effects in a desired direction is vast and continues to grow, as many economies with modern welfare states have floundered and the costs of running welfare states have increased. Even though the importance of these matters is well acknowledged, the subject is complex and difficult to define completely. In general, no clear and complete definition of the program evaluation concept exists, because of its different fields of application and the different aims it refers to; it can cover very different concepts depending on the views of the various authors.
The evaluation problem of concern here is the measurement of the impact of a policy reform or intervention on a set of well-defined outcome variables. Examples of such policies are childcare subsidies, training programs and business incentives; the corresponding outcome variables could be, respectively, children's exam results, individual employment earnings or duration, and job creation or business earnings. There are many references in the literature documenting the development of evaluation policy analysis in economics. In the labour market, some examples are Heckman and Robb (1985), Heckman et al. (1999) and Blundell et al. (2001), and Ichino et al. (2003) for Italy. Some examples of the evaluation of economic development programs aimed at influencing business are Bondonio (2000) and Bondonio (2004), Bartik and Bingham (1995), Boarnet and Bogart (1996), Dowall (1996), and Carlucci and Pellegrini (2003) for Italy.
In general, these kinds of studies can be classified by the level of analysis: macroeconomic evaluation aims to quantify effects on the territory and the economic system, together with spillover implications, while microeconomic evaluation focuses on the individual effects on the units the program is addressed to. However, in both cases, the main interest is the causal relationship between the program and the observed variation in the variables of interest, that is, the causal effect of a “treatment” on an “outcome”. This is not the only potential evaluation question, but it is certainly fundamental to understanding what kind of result has been obtained.
At this point, the main problem is to identify what an effect is and how it can be quantified. The first step toward a correct evaluation question is to clearly define what effect we are looking for; this can be done by specifying one or more variables of interest (outcomes) that identify the observable features of the population the policy is addressed to. Secondly, it is fundamental to know what we are looking for the effect of; it is necessary to identify a variable that denotes the treatment state. In this sense, the concept of effect denotes the variation caused by a policy intervention, and the evaluation analysis concerns the methods applied to empirically verify the causality of the intervention. This concept of causality plays a critical role in program evaluation; some references can be found in Ichino (2002).
The evaluation problem, therefore, is to measure the impact of the program on an outcome of interest (Y) for each individual; this is commonly defined as the difference between the value of Y observed after the unit has been exposed to the intervention and the value that Y would have taken if the same unit had not been exposed to the treatment. This latter value is defined as the counterfactual. This framework follows the potential outcome approach as described and developed by Rubin (1974); it means that, ex-post, only the outcome corresponding to the program in which the individual participates is observed. That is why it is called the evaluation problem: it can be regarded as a missing-data problem since, at a moment in time, each unit is either in the program under consideration or not, but not both. If we could observe the outcome variable for those in the program had they not participated, there would be no evaluation problem.
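In the potential outcome notation formalized in Chapter 2, the problem can be stated compactly (a standard formulation, consistent with equations (2.1)-(2.2) below): with $Y_i^1$ the outcome under treatment, $Y_i^0$ the outcome without treatment and $D_i$ the treatment indicator,

$$\alpha_i = Y_i^1 - Y_i^0, \qquad Y_i = D_i Y_i^1 + (1 - D_i) Y_i^0,$$

so for each unit only one of the two potential outcomes is ever observed, and the individual effect $\alpha_i$ is never observed directly.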
1.3 Estimation of the counterfactual: problems to solve
To solve the evaluation problem described above, the focus of all approaches presented in the literature is on the estimation of the missing counterfactual data; they mainly differ in the assumptions they make about how the missing data are related to the available information. In any case, two major problems arise in the estimation of the counterfactual: first, there could be some factors, independent of the implementation of the program being evaluated, that influence the outcome variable Y. The second problem refers to the non-random nature of the selection process that designates treated individuals.
The first problem derives from the fact that changes in the general conditions of treated units may occur during the program intervention for reasons independent of the program itself. For example, these types of variation may be caused by general economic growth among the target individuals, and one may be mistakenly induced to overestimate the real impact of the program. This type of problem is called omitted variable bias. It is typical of one-group design evaluation strategies, that is, when only data on treated units are considered. In this case the outcome variable is usually measured at different times, pre- and post-intervention, and mean impact estimates are obtained as pre-post intervention differences. Nevertheless, these are unbiased estimates of the effect only if there is no difference between the pre-intervention values of Y for the treated and the outcome values they would have obtained in the post-intervention period had they not been treated (the counterfactual). If there is some difference, it represents the degree of omitted variable bias: such a variation can arise when there is an impact on the outcome due to some economic or social factors, uncorrelated with the program intervention, that would have increased or decreased the value of Y even in the absence of the program. As a result, the main problem an evaluator has to deal with when a one-group design strategy is adopted is to control for any exogenous factors that may affect the outcome during the program intervention.
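Using the notation sketched above, a minimal formalization of the one-group pre-post estimator and its bias (our compact restatement, not a formula from the original text) is

$$\hat\alpha_{pre\text{-}post} = E[Y_{it} \mid D_i = 1] - E[Y_{it'} \mid D_i = 1], \qquad t' < k < t,$$

whose bias with respect to the true effect is $E[Y_{it}^0 - Y_{it'}^0 \mid D_i = 1]$: any change over time in the no-treatment outcome of participants, for instance general economic growth, is wrongly attributed to the program.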
The second difficulty in program evaluation arises because treatment assignment is typically based on pre-intervention economic and social conditions. At the beginning of the program, target units may therefore have a significantly different set of conditions from the excluded individuals. This is typically the case when impact estimates are obtained using a comparison group design strategy. A source of bias can arise because the effect estimates are obtained by comparing the outcomes of the treated and the excluded; these are unbiased only if the difference between the pre-intervention sets of conditions of the two groups is equal to zero. This type of variation measures the size of the bias, referred to as selection bias, affecting the impact estimates. Hence, when dealing with this strategy it is crucial to properly control for the pre-intervention differences between treated and untreated units.
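A standard decomposition makes this bias explicit (a textbook identity, restated here in the potential outcome notation):

$$E[Y \mid D = 1] - E[Y \mid D = 0] = E[Y^1 - Y^0 \mid D = 1] + \big( E[Y^0 \mid D = 1] - E[Y^0 \mid D = 0] \big),$$

where the first term is the effect on the treated and the second is the selection bias: the raw treated-untreated comparison identifies the effect only when the two groups would have had the same average outcome in the absence of the program.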
1.4 Which Parameter of Interest?
Another important issue that has to be taken into consideration to better understand
and fully describe the program evaluation problem regards the parameters and the
quantities that might be useful in the analysis of a modern welfare economy.
There are many possible counterfactuals of interest for evaluating a program; for
example it could be interesting to compare the state of the world in the presence of the
intervention to the state of the world if the program did not exist at all, or if alternative
programs were used. A full evaluation should consider all outcomes of interest for all
units, both in the current state and in all alternative states of interest, and a mechanism
for valuing these outcomes in the different states.
The fundamental elements of well-done evaluation research, once appropriate outcome measures have been identified, include the direct benefits received, the levels of behavioral variables and the payments for the program, for both participants and non-participants.
The conventional econometric and statistical evaluation literature usually defines the effect of participation to be the direct effect of the program on participants explicitly enrolled in the program. It ignores the effects of a program that do not flow from direct participation, known as the indirect effects, and equates “treatment” outcomes with the direct outcome variable in the program state and “no treatment” with the direct outcome variable in the no-program state. Among possible outcomes of interest, the traditional econometric literature focuses on counterfactual means, with the conventional assumption that the no-treatment state approximates the no-program state; this is true if indirect effects are negligible. The transition from the individual to the group-level counterfactual recognizes the impossibility of observing the same person in both states at the same time. This represents the statistical solution to the problem of causality: given that the causal effect for a single individual i cannot be observed, the statistical solution proposes to compute the average causal effect for the entire population or for some interesting sub-groups.
This is meant to be just a brief introduction to the issue of the quantities of interest to estimate. As may easily be understood, this issue represents the starting point of any study proposed in the evaluation literature, and it depends on the framework the analysis refers to. In particular, there is a fundamental distinction that will characterize this thesis: the program evaluation issue will be studied starting from the simpler case of a binary treatment intervention; the next step is to extend the analysis and to widen its applicability by allowing for continuous treatment regimes. Thus, in the next chapters the quantities of interest will be better defined, and the traditional parameters of interest will be presented and discussed, keeping these two treatment regime cases separate.
Chapter 2
The binary treatment case
2.1 Introduction
This chapter concerns statistical tools used in program evaluation in a binary treatment regime. The attention is focused on statistical techniques for solving the evaluation problem in order to estimate the impacts of an intervention on some well-defined outcomes of interest. The aim is to give the reader an idea of the state of the literature in this field and, at the same time, to explain methods that are important building blocks for the further generalization to more complicated settings, such as a continuous treatment regime.
The binary treatment case represents the starting point not only for our analysis, but for all studies in the program evaluation literature. It is the most intuitive and easiest case, and in some situations the only possible one. This is the case, for example, of an identical medical treatment given to some patients, or of a job training program offered to some individuals. In this situation, units can be classified into only two groups: the participants and the non-participants.
In the following sections, after a formal description of the binary treatment setting and of the conventional parameters of interest, an overview will be given of the statistical tools and traditional approaches proposed in the literature to estimate the effects of a program, following the paper by Blundell and Costa Dias (2002), which summarizes the alternative approaches to evaluation in empirical microeconometrics.
2.2 The Parameters of Interest
As mentioned in the previous chapter, there are many possible counterfactuals of interest for evaluating a program. Once it is established that the objective of the analysis is the ex-post evaluation of the impacts of an intervention, the statistical solution to the problem of causality is represented by the transition from the individual to the group-level counterfactual: given that the causal effect for a single individual cannot be observed, the solution is to compute the average causal effect for the entire population or for some interesting sub-groups. Following this part of the literature and using the now standard potential outcome notation introduced by Rubin (1974), we consider N units, indexed by i = 1, ..., N, viewed as drawn randomly from a large population, and a policy intervention introduced at time k, for which we want to measure the impact on some outcome variable Y. Let us consider the case where D is a dummy variable representing the treatment status, taking value 0 if the unit has not been treated and 1 otherwise. With this binary treatment variable, each unit is characterized by a pair of potential outcomes: $Y_{it}^1$ for the outcome under the active treatment, $Y_{it}^0$ otherwise. The outcome Y is assumed to depend on a set of covariates X through a particular relationship for each participation status in each period t. The outcome equations can be generically represented as:
$$Y_{it}^1 = f_t^1(X_i) + U_{it}^1, \qquad Y_{it}^0 = f_t^0(X_i) + U_{it}^0 \tag{2.1}$$
where the subscripts i and t identify the unit and the time period, while the superscript stands for the treatment status. The functions $f^0$ and $f^1$ represent the relationships between the set of covariates X and the potential outcomes $Y^0$ and $Y^1$, while the terms $U^0$ and $U^1$ are mean-zero error terms, assumed to be uncorrelated with X. This vector of variables is assumed to be known at the time of the participation decision and not influenced by treatment, which is why its time subscript is omitted. The observed outcome Y can then be written as:
$$Y_{it} = D_{it} Y_{it}^1 + (1 - D_{it}) Y_{it}^0. \tag{2.2}$$
When D = 1 we observe $Y^1$; when D = 0 we observe $Y^0$.
The definition of potential outcomes just given implicitly uses the assumption of no interference between units, the so-called stable unit treatment value assumption (SUTVA). It assumes that the potential outcomes $Y^0$ and $Y^1$ of individual i are not affected by the allocation of other individuals to the treatments. In other words, it is assumed that the observed outcome $Y_{it}$ depends only on the treatment to which individual i is assigned and not on the allocation of other individuals.
Moreover, we assume that the participation decision can be identified by the following rule:

$$I_i = h(W_i) + V_i$$

meaning that for each unit there is an index I, a function of the set of variables W, such that participation occurs when the index rises above zero. $V_i$ is the error term, and

$$D_{it} = 1 \text{ if } I_i > 0 \text{ and } t > k, \qquad D_{it} = 0 \text{ otherwise.}$$
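To fix ideas, the following sketch simulates this data-generating process under purely illustrative assumptions (linear $f^0$ and $f^1$, an effect that varies with X, and an index error V correlated with the outcome errors, so that participation is self-selected); none of the parameter values come from the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    x = rng.normal(size=n)                    # observed covariate X
    # draw (U0, U1, V) jointly: V is correlated with U0 and U1,
    # which is exactly the selection-on-unobservables problem
    cov = [[1.0, 0.0, 0.5],
           [0.0, 1.0, 0.5],
           [0.5, 0.5, 1.0]]
    u0, u1, v = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=n).T

    y0 = 1.0 + 0.5 * x + u0                   # Y0 = f0(X) + U0
    y1 = 3.0 + 1.5 * x + u1                   # Y1 = f1(X) + U1 (effect: 2 + X)
    d = (0.3 * x + v > 0).astype(int)         # D = 1{h(W) + V > 0}, with W = X here
    y = d * y1 + (1 - d) * y0                 # observed outcome, equation (2.2)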
When specific applications are considered, several fundamental decisions have to be taken into account. First, except for experimental data, participation in treatment is not random; for that reason the correlation between the outcome error terms $(U^0, U^1)$ and enrolment in the program, represented by $D_{it}$, may not be zero. This can happen because unobservable individual characteristics leading to the participation decision may also affect the outcome Y, so some correlation between the error term and the participation variable is expected. Any method that fails to address this problem cannot identify the correct parameter of interest. Secondly, an important decision is whether to assume heterogeneous or homogeneous treatment effects. Typically we do not expect the policy intervention to affect all individuals in exactly the same way; that is, there will be heterogeneity in the impact across units. Consequently, the questions that evaluation methods attempt to answer can differ. As previously mentioned, the most common in the evaluation literature are average effects on individuals of a certain type. More precisely, the different parameters of interest, measured in period t > k, can be expressed as:
• Average Treatment Effect:
$$\alpha_{ATE} = E[\alpha_{it} \mid X = X_i]$$
which represents the population average treatment effect;
• Average Treatment on the Treated Effect:
$$\alpha_{TTE} = E[\alpha_{it} \mid X = X_i, D_t = 1]$$
which is the effect of treatment on units that were assigned to treatment;
• Average Treatment on the Untreated Effect:
$$\alpha_{TU} = E[\alpha_{it} \mid X = X_i, D_t = 0]$$
which is commonly of interest for decisions about extending the treatment to a group formerly excluded from it.
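Continuing the simulated sketch above, the three parameters are straightforward to compute only because the simulation, unlike real data, keeps both potential outcomes; the naive treated-untreated comparison is reported for contrast (illustrative code, not from the thesis):

    ate = (y1 - y0).mean()                       # E[Y1 - Y0]
    tte = (y1 - y0)[d == 1].mean()               # E[Y1 - Y0 | D = 1]
    tu = (y1 - y0)[d == 0].mean()                # E[Y1 - Y0 | D = 0]
    naive = y[d == 1].mean() - y[d == 0].mean()  # contaminated by selection bias
    print(f"ATE={ate:.2f}  TTE={tte:.2f}  TU={tu:.2f}  naive={naive:.2f}")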
The parameter that has received most attention in the current literature is $\alpha_{TTE}$; Heckman and Robb (1985) and Heckman et al. (1997b) argued that the subpopulation of treated units is often of more interest than the overall population in the context of narrowly targeted programs. For example, if a program is meant to encourage business in particular depressed areas, there is often little interest in evaluating it on areas without this kind of disadvantage. Of course, the treatment D might have an effect not only on the mean of the outcome; it might influence the whole distribution, which in some situations may be exactly what we are interested in. Only under the assumption of homogeneous treatment effects are all these parameters identical; this is obviously not true when treatment effects vary across individuals. Let us consider these two situations separately.
2.2.1 Homogeneous Treatment Effects
This is the simplest case: the effect is assumed to be constant across units, so that

$$\alpha_t = \alpha_{it}(X) = f_t^1(X_i) - f_t^0(X_i), \qquad t > k, \ \forall i.$$
This means that all people gain or lose the same amount in going from state “0” to state “1”. So long as $U^0 = U^1$ among people with the same X, there is no heterogeneity in the outcomes when moving from “untreated” to “treated”. The outcome equation (2.1) can be re-written as
$$Y_{it} = f_t^0(X_i) + \alpha_t D_{it} + U_{it} \tag{2.3}$$
However, this assumption of absence of heterogeneity in the response to treatment is very strong and, when tested, it is almost always rejected (see Heckman et al. (1997a)).
2.2.2 Heterogeneous Treatment Effects
It seems reasonable to assume that the treatment impact varies across individuals; this variation could come systematically through the observable component, or be part of the unobservables. The outcome equation (2.2) can be written as:

$$Y_{it} = D_{it} Y_{it}^1 + (1 - D_{it}) Y_{it}^0 = f_t^0(X_i) + \alpha_t(X_i) D_{it} + [U_{it}^0 + D_{it}(U_{it}^1 - U_{it}^0)] \tag{2.4}$$

where

$$\alpha_t(X_i) = E[\alpha_{it}(X_i)] = f_t^1(X_i) - f_t^0(X_i)$$

is the expected treatment effect at time t among units characterized by $X_i$.
Obviously, the additional problems with this heterogeneous setting concern the observables and their role in the identification of the parameter of interest, and the form of the error term, which can differ across observations according to the treatment status of each unit. If there is selection on unobservables, the OLS estimator, after controlling for the covariates X, is inconsistent for $\alpha_t(X)$, identifying the following parameter:

$$E[\hat\alpha_t(X)] = \alpha_t(X) + E[U_t^1 \mid X, D_{it} = 1] - E[U_t^0 \mid X, D_{it} = 0], \qquad t > k.$$
There is one case in which, even if there is dispersion in the treatment effect and the unobservable components differ between the two groups ($U_{it}^1 \neq U_{it}^0$), the two parameters of interest $\alpha_{ATE}$ and $\alpha_{TTE}$ are still equal. This is the case when

$$E[U_{it}^1 - U_{it}^0 \mid X = X_i, D_{it} = 1] = 0,$$

which arises when, conditional on X, D does not explain or predict $U_{it}^1 - U_{it}^0$. This happens when individuals who select into state 1 or 0 do not know $U_{it}^1 - U_{it}^0$ when making their decision to participate in the program.
The distinction between a model with $U^1 = U^0$ and one with $U^1 \neq U^0$ is fundamental to understanding modern developments in the program evaluation literature. When we condition on X and $U^1 = U^0$, everyone with the same value of X has the same treatment effect, which is quite a strong assumption. Nevertheless, this setting greatly simplifies the evaluation problem: one parameter is able to answer all of the conceptually distinct evaluation questions we have presented, and the “treatment on the treated effect” can be viewed as the same as the effect of taking an individual at random and putting him into the program. Otherwise, when $U^1 \neq U^0$, there is a variety of different treatment effects that can be defined. In this case, conventional econometric procedures often break down or require substantial modifications.
2.3 Experimental Data
One solution to the evaluation problem could be the experimental data setting because, as mentioned previously, it provides the correct missing counterfactual. In recent years, the use of experimental designs, for example to evaluate North American employment and training programs, has rapidly increased. This approach has been less common in Europe, though a small number of experiments have been conducted in Britain, Norway and Sweden. The impact estimates of these experiments, often called social experiments, are easy for analysts to calculate and for policymakers to understand. They can also be used to study the properties of the other methodologies applied to program evaluation. That is, a comparison of results from non-experimental data with those obtained from experimental data can help to assess appropriate methods where experimental data are not available.
In many ways this method is the most convincing one, since it directly constructs a comparison, or control, group. In this way the missing data problem may be overcome and the problem of identifying causal effects avoided. The advantages of experimental data are discussed in Hausman and Wise (1985); these designs were based on earlier developments in statistical experimentation (see for example Fisher (1951)).
The contribution of this type of data is to rule out bias caused by self-selection, as individuals are randomly assigned to the program. By design, the treatment D will be independent of any kind of influence on the outcome Y, whether observed or unobserved. To see why, let us imagine an experiment in which a random sample from a group of eligible units is chosen to participate in a program. Within this group, assignment to treatment is completely independent of both the outcomes and the treatment effect. Under the assumption of no spillover effects, the two groups, treated and excluded, are statistically equivalent in all dimensions except treatment status.
In the case of homogeneous effects the treatment impact can be measured by a simple difference between the mean outcomes of the treated and the untreated:

$$\hat\alpha_{ATE} = \hat\alpha_{TTE} = \bar Y_t^1 - \bar Y_t^0 = E[Y_{it} \mid D_{it} = 1] - E[Y_{it} \mid D_{it} = 0], \qquad t > k \tag{2.5}$$

where $\bar Y_t^1$ and $\bar Y_t^0$ are, respectively, the means of the treated and non-treated outcomes at time t after the program. Notice that, within the context of the model of equation (2.1), the method of social experiments does not set either $E[U_{it}^1 \mid X, D = 1]$ or $E[U_{it}^0 \mid X, D = 1]$ equal to zero. It simply balances the selection bias between the treatment and control groups.
To obtain an estimator of the causal effect in (2.5), one might simply take the difference between the sample analogues of the two population moments. There is no particular problem in calculating the causal effect this way, but it can be useful to derive it via a regression model. Let us consider the model

$$Y_{it} = \beta_{0t} + \beta_{1t} D_{it} + \varepsilon_{it}, \qquad t > k.$$
The OLS estimate of $\beta_{1t}$ will be an estimate of the causal effect, since it is the difference in the mean of the outcome Y between the treatment and control groups. The regression model presented here has the advantage that it also immediately gives an estimate of the standard error of the coefficient of D, which can be used for inference purposes. Another advantage derives from the fact that it generalizes naturally to the case of continuous treatment, simply by running a regression of Y on D. Obviously, the assumption of a linear relationship between the two variables is quite strong and must be verified.
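As an illustration, the regression route can be sketched on simulated experimental data (all values invented for the example; the statsmodels package is assumed to be available):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 5_000
    d = rng.integers(0, 2, size=n)            # randomized assignment
    y = 1.0 + 0.7 * d + rng.normal(size=n)    # true effect 0.7

    fit = sm.OLS(y, sm.add_constant(d)).fit()
    print(fit.params[1], fit.bse[1])          # effect estimate and its standard error

Because assignment is randomized, the coefficient recovers the true effect, and the reported standard error can be used directly for inference.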
As previously mentioned, when we move to the case of heterogeneous effects we need to provide some summary of the treatment impact, because of the impossibility of observing the same person in both states at the same time. This raises the question of what kind of effect we are interested in. In the homogeneous structure this is not a meaningful question, because all the parameters of interest mentioned above are equal. It is easy to demonstrate that the average treatment effect, ATE, can be consistently estimated by the coefficient on $D_{it}$ of the regression of $Y_{it}$ on $D_{it}$ (see Stock and Watson (2003)). Simply rewrite the model as:

$$Y_{it} = \beta_{0t} + \bar\beta_{1t} D_{it} + (\beta_{1it} - \bar\beta_{1t}) D_{it} + \varepsilon_{it} = \beta_{0t} + \bar\beta_{1t} D_{it} + u_{it}$$

Note that the unobservable component $u_{it}$ is mean-independent of $D_{it}$, so the OLS estimate of the coefficient on D will be the ATE.
Of course, social experiments have their own drawbacks: they might not be feasible at all for political, ethical, logistic or financial reasons, and they are rare in economics and typically expensive to implement. Moreover, the results of this type of estimation can be invalidated by a number of disrupting factors associated with the experimental design: for example, there could be some dropouts, especially among the non-treatment group (see Heckman et al. (1999)). This process could be non-random, and the statistical equivalence would then no longer hold. To check random assignment, at least with respect to observables, the observable characteristics of the treatment and control groups can be compared. Moreover, further differentiating factors are introduced when some experimental controls search for alternative programs and are likely to succeed in finding them; on the other hand, some treated units can decide to leave the program. This is the so-called problem of partial compliance. Furthermore, individuals may also change their behavior as a consequence of the experiment itself. All these kinds of factors may invalidate the fundamental characteristic of the experimental approach and the consistency of the presented estimator. Finally, most practical applications suffer from the fact that samples are small. This usually implies the presence of observable factors X that are not completely balanced by the process of randomization.
2.3.1 Experimental Data in Practice
A detailed list of the different social experiments that took place during the 1960s, especially in the US, can be found in Greenberg and Shroder (1997). Among the several social experiments that have been carried out, the best known are the training and employment experiments of the United States: the National Supported Work Demonstration (NSWD) and the National JTPA Studies.
The NSWD, one of the first training and employment social experiments, was carried out in 10 different sites across the US. It was planned to help workers in disadvantaged conditions, in particular women in receipt of AFDC (Aid to Families with Dependent Children), ex-offenders, ex-drug addicts and, in general, economically disadvantaged youths. The program provided the applicants with a guaranteed job for 9 to 18 months. Qualified candidates were randomly assigned to treatment: one half of the individuals were assigned to the treatment group and the other half was precluded from receiving the treatment and was assigned to the control group. The impact of the intervention was measured on earnings and on the rate of employment.
This program is an example of the so-called pilot programs or demonstrations, in which the interventions are quite easy to implement and conduct. That is why they are stronger from the “internal validity” point of view. On the other hand, it is more difficult to generalize the conclusions drawn from this kind of experimental situation because, for example, the sample and the program might not be representative. In general, they have weak properties in terms of “external validity”.
The opposite is the case of the so-called ongoing programs, whose results are more robust but which are very difficult and complicated to implement. This is the case of the JTPA studies carried out between 1984 and 1996 in 16 sites in the US (see Orr et al. (1996)): the focus of the analysis is on the evaluation of the effects of different randomly assigned ongoing training and employment programs on the socio-economic situation of disadvantaged individuals.
Another application of an ongoing social experiment is the study of the Self-Sufficiency Project (see Card and Robins (1998)), which evaluates the labor supply responses of single mothers in British Columbia. Half of the candidates were enrolled, while the other half was randomly excluded from the program. The analysis shows significant evidence of the effectiveness of financial incentives in inducing welfare recipients into work.
Social experiments are also useful for understanding and studying the properties of other methodologies applied to program evaluation. For example, the main focus of the studies carried out by LaLonde (1986) is to highlight the difference between experimental and non-experimental data. This was done by comparing the results obtained by applying different types of non-experimental estimation methodologies to the same experimental dataset.
2.4 Non-Experimental Data
In program evaluation, randomized assignment is generally considered the gold standard (Orr (1999)). In the natural sciences, in particular in medical applications, this is especially true: interventions are randomized to different people on an individual basis. Policy interventions affecting the economic system in general are certainly different; often they do not lend themselves to controlled implementation, and even more often they are implemented before a controlled experiment can be designed and executed.
In contrast to experimental analysis, in non-experimental, or observational, studies the data are not generated by a process that is completely under the researcher's control. For example, a government authority might have offered the program to particular areas or specific individuals in order to improve their conditions, or because it was believed that they held favorable expectations regarding the impact of the program.
The main objective of any observational study is to use observable information in an appropriate way, in order to replace the comparability of treatment and control groups with an appropriate alternative identification condition. Analysts must replace the missing counterfactual data with data on the non-participants, together with some assumptions that can make the two groups comparable. The objective is to use the available information such that, within sub-populations defined by these observables, any remaining differences between treated and non-treated can be attributed to the program. Hence, in general, non-experimental data are more difficult to deal with and require special care. Another problem arises if one considers that, in general, there exists a set of assumptions necessary to identify the effect of a program that cannot be verified with the data. This also happens with social experiments, where the assumption of no randomization bias cannot be tested from experimental data. Both approaches require assumptions that might not be verifiable without a specific collection of data, properly designed to test these assumptions.
When the control group is drawn from the population at large, even if strict comparability rules based on observable information are satisfied, which is really quite hard to achieve, it cannot be ensured that there are no differences in unobservables related to program participation. This is the econometric selection problem as defined by Heckman (1979). In this case, using the estimator (2.5) results in a non-identification problem since, abstracting from other regressors in the outcome equation, it approximates:

$$E[\hat\alpha_{ATE}] = \alpha + \{E[U_{it} \mid d_i = 1] - E[U_{it} \mid d_i = 0]\}.$$

In the case where $E[U_{it} d_i] \neq 0$ (selection on unobservables), $E[\hat\alpha_{ATE}]$ is expected to differ from α, unless the two final right-hand-side terms cancel out. Different estimators are then needed.
The different methodologies for solving the evaluation problem with non-experimental data mainly depend on four factors: the type of information available to the researcher, the parameters of interest, the underlying model and how the counterfactual is constructed. In this regard, each class of estimators differs in the way it transforms or adjusts the data in order to estimate the counterfactual part. For the different approaches presented in this review, it will be considered how the various estimators construct this counterfactual and what kind of assumptions they require.
The following sections will present methods dealing with non-experimental data, characterizing the identification assumptions necessary to justify their application, possible reasons for their failure and the empirical studies existing in the literature. The focus is
on the problem of identification, while the issue of precision or sampling variability is
not discussed. Nevertheless, this is a fundamental issue for all practical applications; a
reported effect estimate is absolutely worthless if it is not accompanied by a measure
of variability.
Five distinct but related approaches existing in the program evaluation literature are considered. The next section starts by discussing the Instrumental Variable estimator, which is, together with the two-step Heckman selection estimator, closest to the structural econometric method, since it directly addresses the endogeneity problem. If longitudinal or repeated cross-section data are available, the diff-in-diff approach can be applied to obtain a more robust estimate of the treatment effects. The third approach, the so-called matching estimator, requires detailed individual information for both the participant and non-participant groups, before and after the program. The intuition behind this non-parametric approach is to simulate, with non-experimental data, the randomized control of the experimental setting through an independence assumption. When the participation rule is deterministically defined, the regression discontinuity design method provides a good way to deal with the selection bias between the two groups. In some sense the assignment rule here is the opposite of that in a social experiment, but it turns out that it is as good as random in a neighborhood of the critical value that defines the discontinuity point.
For each estimator I will discuss how the counterfactual part is estimated, together with the identification of the treatment impact in homogeneous and heterogeneous settings, as well as the required assumptions, advantages and disadvantages.
2.5 Instrumental Variable Estimator
The main idea of this econometric approach is that one or more observable characteristics of individuals may well induce people to participate in a program, but have no direct consequences on the outcome variables. In that way, a comparison of the average outcomes of participants and non-participants with different values of these characteristics can replace the comparison of participants with a randomized control group. The correspondence between randomized experiments and the instrumental variable approach can be found in Heckman (1996).
More formally, the IV method requires the existence of at least one regressor, Z, which satisfies the two following conditions:

IV1 The participation status defined by the decision rule, conditional on X, is a non-trivial function of Z.
This means that $\Pr(D = 1 \mid X, Z)$ does not depend in a constant way on both X and Z; that is, the instrument Z affects the decision rule independently of X, and in the specification of this rule the Z coefficients are non-zero. Thus, $E[D \mid X, Z] = P(D = 1 \mid X, Z) \neq P(D = 1 \mid X)$.

IV2 The regressor Z, conditional on X, is not correlated with the unobservable components $(U^0, V)$ and $(U^1, V)$.
This implies that $E(U^0 Z) = E(U^1 Z) = E(V Z) = 0$. This assumption means that the instrument Z has no influence on the outcome Y through the unobservable components, but affects the outcome only through the participation rule. In a homogeneous framework this means that only the level is affected by Z, while in a heterogeneous setting the particular values of X determine the extent of the influence of Z on Y.
These two assumptions together mean that the instrument Z provides a form of variation that is correlated with the decision rule but does not influence the potential outcomes from treatment directly. This approach seems very easy to understand and also to implement. However, in the treatment evaluation problem the choice of the instrument is not so easy: it is not trivial to find a variable that satisfies all the assumptions required to identify α. A possible solution, when longitudinal data are available, is to consider lagged values of some determinant variables.
2.5.1 Homogeneous Treatment Effect
Under the above conditions, the treatment effect α is identified by applying the standard IV procedure, which uses only the part of the variation in D that is associated with Z. Thus, the instrumental variable estimator can be written as

$$\hat\alpha_{IV} = \frac{\mathrm{cov}(y_i, Z_i)}{\mathrm{cov}(d_i, Z_i)}$$
A special case arises when the instrument Z is a discrete variable that takes only the value 1 for one group of observations and the value 0 for the remaining observations. In this case the instrumental variable estimator is equivalent to the following Wald estimator:

$$\hat\alpha_{IV}^{Wald} = \frac{\bar Y^1 - \bar Y^0}{\bar d^1 - \bar d^0}$$

where $\bar Y^1$ is the mean of Y across the observations with Z = 1, $\bar Y^0$ is the mean of Y across the observations with Z = 0, and analogously for d.
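A numerical sketch of both formulas, under illustrative assumptions (a binary instrument that shifts participation but is excluded from the outcome equation; all values are invented for the example):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200_000
    z = rng.integers(0, 2, size=n)               # binary instrument
    v = rng.normal(size=n)                       # participation error V
    u = 0.8 * v + rng.normal(size=n)             # outcome error, correlated with V
    d = (0.9 * z + v > 0.5).astype(int)          # participation depends on Z
    y = 1.0 + 0.7 * d + u                        # true homogeneous effect 0.7

    alpha_iv = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]
    alpha_wald = ((y[z == 1].mean() - y[z == 0].mean())
                  / (d[z == 1].mean() - d[z == 0].mean()))
    alpha_naive = y[d == 1].mean() - y[d == 0].mean()  # biased by self-selection
    print(alpha_iv, alpha_wald, alpha_naive)

With a binary Z the two estimates coincide exactly, and both stay close to 0.7, while the naive treated-untreated comparison is pulled away from it by the correlation between the participation and outcome errors.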
Alternatively, the effect of the homogeneous treatment might be found using both Z and X to predict D: a new variable $\hat D$ is built to be used in the regression instead of D. Another possibility derives directly from the first assumption: it states that there must be at least two values of Z, say Z' and Z'', such that, for any X, $P(D = 1 \mid X, Z') \neq P(D = 1 \mid X, Z'')$. Moreover, applying the second assumption and equation (2.3), the law of iterated expectations can be used to write:

$$E[Y \mid X, Z] = f^0(X) + \alpha\, P(D = 1 \mid X, Z)$$
Then, the IV estimator is:

$$\hat\alpha_{IV} = \frac{E[Y \mid X, Z'] - E[Y \mid X, Z'']}{P(D = 1 \mid Z') - P(D = 1 \mid Z'')}$$

2.5.2 Heterogeneous Treatment Effect
Some problems might arise when the impact of a program is evaluated in a heterogeneous framework. To understand why, note from equation (2.4) that the error term is given by $U_{it}^0 + D_{it}(U_{it}^1 - U_{it}^0)$. Even if the instrument Z is not correlated with $U_{it}$, the same need not be true with respect to $U_{it}^0 + D_{it}(U_{it}^1 - U_{it}^0)$, because Z determines $D_{it}$ by assumption. For this reason, conditions IV1 and IV2 are no longer enough to identify the ATE or the TTE, and some additional requirements on the data must be imposed in order to identify a treatment effect in the heterogeneous framework. To be clearer, consider an example in which the distance between a unit's residence and the center where the program is carried out is chosen as the instrument. The assumptions stated above tell us that this distance
influences outcomes only through the participation indicator in the outcome equation.
The problem in a heterogeneous setting arises because people who live far away from the treatment location are expected to participate in the program only when their expected gain from participation is relatively large compared with the lower expected gain of people living closer, who incur lower participation costs. As a result, the post-program gains of the participants also depend on how far away they live from the treatment location; thus the instrument is correlated with the unobservable component of the outcome. That is, knowing the distance between residences and the program location tells us something about expected outcomes, which means that such a distance is not a suitable instrument.
In the simplest case an additional assumption might be:
IV3 When deciding about participation, individuals do not use the information on the idiosyncratic component of the treatment effect αi(X) − α(X), where α(X) = E[αi(X)].
If this assumption is satisfied, potential participants have no idea about their future gains from program participation, and their decision is taken on the basis of the average treatment effect. Then, together with assumptions IV1 and IV2 above, the average treatment effect E[αi(X)] can be identified because

E[U¹i − U⁰i|X, Z, D] = E[Di[αi(Xi) − α(X)]|X, Z] = 0.
On the other hand, selection on unobservables is expected if units are aware of their idiosyncratic gains from treatment, in the sense that the individuals who gain more from participation are the most likely to participate within each X-group. This selection process generates correlation between αi(X) and Z.
Local Average Treatment Effect
A possible solution to the selection-on-unobservables problem described above is proposed by Imbens and Angrist (1994), who reinterpret the IV estimator as the effect of treatment arising from local changes of the instrument Z. They call it the Local Average Treatment Effect (LATE). It depends on variation in an instrumental variable that is external to the outcome equation. However, in contrast to the IV approach discussed so far, different instruments determine different parameters. To understand better what the
LATE parameter estimates, consider the example discussed before: if the distance to the nearest treatment location is the instrument, LATE estimates the effect of variation in this distance on the outcome variable of the units that are induced to change their participation status as a consequence of the different costs, due to a distance that varies within a specified interval.
The basic idea behind this estimator is that some local changes in Z can reproduce random assignment, because agents take different decisions as they face different conditions that are uncorrelated with potential outcomes. However, another assumption is needed to ensure that the two groups are comparable:
IV4 The participation status defined by the decision rule, conditional on X, is a non-trivial monotonic function of Z.
To define the LATE parameter more precisely, suppose D is an increasing function of Z. In a case where Z changes from Z = z to Z = z′, with z′ > z, the units that modify their participation status as a consequence of the variation in Z are those that participate under Z = z′ excluding the ones that already prefer to participate under Z = z or, equivalently, those that do not participate under Z = z excluding the ones that decide not to participate under Z = z′. Then, the expected outcome for treated and non-treated units, among those influenced by the variation in Z, can be estimated as:
E[Y¹i|Xi, Di(z) = 0, Di(z′) = 1] = {E[Y¹i|Xi, Di(z′) = 1] P[Di = 1|Xi, z′] − E[Y¹i|Xi, Di(z) = 1] P[Di = 1|Xi, z]} / {P[Di = 1|Xi, z′] − P[Di = 1|Xi, z]}

E[Y⁰i|Xi, Di(z) = 0, Di(z′) = 1] = {E[Y⁰i|Xi, Di(z) = 0] P[Di = 0|Xi, z] − E[Y⁰i|Xi, Di(z′) = 0] P[Di = 0|Xi, z′]} / {P[Di = 1|Xi, z′] − P[Di = 1|Xi, z]}
Then, the estimated local average treatment effect is

αLATE = E[Y¹i − Y⁰i|Xi, Di(z) = 0, Di(z′) = 1] = {E[Yi|Xi, z′] − E[Yi|Xi, z]} / {P[Di = 1|Xi, z′] − P[Di = 1|Xi, z]}
Because it depends on the particular values of Z used, the LATE parameter does not represent TTE or ATE, and it differs from the IV estimator discussed before. LATE
measures the impact of the treatment for units that move from Z = z to Z = z′. In general, these units represent neither the whole population nor the whole treated population. Thus, LATE represents a treatment effect on a sub-group of the treated units who are at the margin of participating for a given Z = z. In this sense, it can be viewed as the discrete approximation of the Marginal Treatment Effect (MTE), defined as the limit of the LATE estimator when z′ − z → 0, that is, for an infinitesimal variation in Z. The MTE represents the TTE for units that are indifferent between participating and not participating at Z = z and can be written as
α̂MTE(Xi, z) = ∂E[Y|Xi, Z] / ∂P[D = 1|Xi, Z] |Z=z
All three parameters (ATE, TTE and LATE) can be viewed as averages of MTE over different subsets of the Z support: ATE averages over the entire support of Z, TTE excludes the subset of the Z support over which the treatment does not occur, and LATE averages over an interval of Z whose size is defined by two different participation rates.
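A minimal numerical sketch of the LATE ratio may help fix ideas. The data-generating process below is hypothetical and is deliberately built so that individual gains are correlated with participation costs, which makes LATE differ from ATE.

    import numpy as np

    def late_estimate(y, d, z, z0, z1):
        # Change in mean outcome over change in participation
        # probability as the instrument moves from z0 to z1.
        dy = y[z == z1].mean() - y[z == z0].mean()
        dp = d[z == z1].mean() - d[z == z0].mean()
        return dy / dp

    rng = np.random.default_rng(1)
    n = 100_000
    z = rng.integers(0, 2, n)
    cost = rng.uniform(0.0, 1.0, n)
    gain = 2.0 - cost                          # heterogeneous effect, ATE = 1.5
    d = (0.3 + 0.6 * z > cost).astype(float)   # monotone participation rule
    y = gain * d + rng.normal(0.0, 1.0, n)
    # Compliers are units with cost in (0.3, 0.9), so the estimate
    # should be near 2 - 0.6 = 1.4, not the ATE of 1.5.
    print(late_estimate(y, d, z, 0, 1))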
2.5.3 Instrumental variable approach in practice
One example, among the several empirical studies that estimate a treatment effect with an IV approach in a program evaluation context, is the analysis of Angrist and Krueger (1991). The principal question the study addresses is how compulsory school attendance affects schooling and earnings. Treating education as the treatment and the length of education as an endogenous participation decision, the authors search for an instrumental variable that is correlated with an individual's participation decision but has no independent effect on the outcome of interest, the weekly wage.
Since most US school districts do not admit students to first grade unless they attain the age of 6 by January 1st of the academic year in which they enter school, students born early in the year are older when they start school than students born later in the year. As a consequence, under any compulsory school-leaving age, they can leave school earlier and get less education. The interaction of school entry requirements and compulsory schooling laws compels students born in certain months to attend school longer than students born in other months. For that reason the authors propose the season of birth as an instrumental variable, which generates
exogenous variation in education that can be used to estimate different effects on some
outcomes of interest, such as the impact of compulsory schooling on education and the
effect of education on earnings.
The first step of the analysis is to establish evidence of the impact of the quarter of birth on the level of education. The second step regards the estimation of the rate of return to education. The authors propose different models: the first one provides OLS and Wald estimates of the return to education, using as instrument a dummy variable equal to one for units born in the first quarter of the year. Another economic model is specified excluding the instrument from the wage equation and adding birth-year dummy variables. The two models provide very similar estimates, and the final interpretation of the results is that compulsory schooling laws are effective in compelling some students to attend school. Furthermore, those compelled to remain in school earn higher wages as a result of their extra schooling. Some final considerations are made about the problem that arises when instruments are only weakly correlated with the endogenous explanatory variables. This can lead to a large inconsistency in the IV estimates. That is why it is very important to examine the power of the instruments in the first-stage regression. Bound et al. (1995) present evidence that weak instruments are a potential concern in this work of Angrist and Krueger (1991).
Another application of the IV approach can be found in Levitt (1997). The focus
here is on the effects of police on city crime rates. Noting that increases in the size
of the police force in large cities are disproportionately concentrated in election years,
the proposed instrument is the timing of mayoral and gubernatorial elections. The identifying assumption is that the timing of elections affects the size of the police force but does not directly affect crime rates.
To estimate how much of a firm's profits are captured by workers, Van Reenen (1994) starts from the consideration that, under imperfect competition in the labor market (e.g. oligopoly and union bargaining), wages depend on the company's profits as well as on individual profitability. However, a regression of firm average wages on firm average profitability will be biased downwards because a wage shock will lead to lower profits. The solution he proposes is to use observed technological innovations as an instrument for profits. The underlying assumption is that this instrument increases profits but does not directly affect wages.
2.6 Heckman Selection Model
To understand how this method works, consider the model

Yit = Xitβ0 + β1Di + Uit   with t > k
Yit = Xitβ0 + Uit   with t ≤ k

and the index model of the individual's non-random participation decision,

Ii = Ziγ + Vi
Dit = 1 if Ii > 0 and t > k
Dit = 0 otherwise
The main objective of this approach is to remove the sample selection problem that remains even after controlling for other determinants in the outcome equation; when U and V are correlated (E[UV] ≠ 0), estimating the outcome equation by OLS yields biased and inconsistent estimates of the treatment effect. The baseline idea of the Heckman selection model, also called "Heckit" (Heckman (1979)), is to directly control for the part of the error term (U) in the outcome equation that is correlated with the participation dummy variable (D). To do so, it is necessary to impose additional structure on the model:
• Z is exogenous in the outcome equation, E(U|X, Z) = 0;
• X is a strict subset of Z. This implies that there is at least one additional regressor in the participation decision;
• the joint density of the errors Uit and Vi, h(Uit, Vi), is known or can be consistently estimated.
Because of these additional assumptions, this method is less robust than the IV estimator. As above, the homogeneous treatment effect is considered first.
2.6.1 Homogeneous Treatment Effect
To estimate the treatment effect the procedure uses two steps: first, the part of the error term Uit that is correlated with the dummy variable Di is estimated. This term is then included in the outcome equation, and in the second step the effect of the program is estimated. By construction, what remains of the error term in the final outcome equation is not correlated with the participation decision.
The last assumption, about the knowledge of the joint distribution of the error terms, is necessary to carry out the first-step estimation. To be clearer, take the example in which Uit and Vi are assumed to follow a joint normal distribution and, for simplicity, consider the standardization σV = 1. Taking expectations in the outcome equation and using the properties of the normal distribution, the conditional outcome expectation can be written as
E[Yit|D = 1] = Xitβ0 + β1 + ρ(U,V) φ(Ziγ)/Φ(Ziγ)
E[Yit|D = 0] = Xitβ0 − ρ(U,V) φ(Ziγ)/(1 − Φ(Ziγ))
where the final term on the right-hand side of each equation corresponds to the expected value of the error term (U), conditional on the participation variable (D). This new regressor is the part of the error term which is correlated with the decision process. Including it in the outcome equation controls for non-random selection, enabling us to identify the treatment effect, separating the true impact of treatment from the selection process. The term φ(Ziγ)/Φ(Ziγ) is the inverse Mills ratio, that is, the quotient between the standard normal pdf and the standard normal cdf evaluated at Ziγ. To obtain an estimate of β1, the Heckman selection estimator proceeds as follows: in the first step, the observations for treated and non-treated units are used to estimate a probit model of participation, regressing I on Z. In the second step the observations are used to estimate the outcome equation by OLS; in this way the estimated coefficients are consistent and approximately normally distributed.
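The following is a minimal sketch of the two-step procedure, assuming the design matrices X and Z already include intercept columns and that X is a strict subset of Z; the function name is hypothetical and the probit and OLS fits rely on the statsmodels package.

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import norm

    def heckit_two_step(y, d, X, Z):
        # Step 1: probit of the participation decision I = Z*gamma + V.
        zg = Z @ sm.Probit(d, Z).fit(disp=0).params
        # Expected error given participation status: phi/Phi when D = 1,
        # -phi/(1 - Phi) when D = 0 (the inverse-Mills-ratio terms).
        imr = np.where(d == 1,
                       norm.pdf(zg) / norm.cdf(zg),
                       -norm.pdf(zg) / (1.0 - norm.cdf(zg)))
        # Step 2: OLS of Y on [X, D, correction term]; the coefficient
        # on D is the selection-corrected treatment effect beta_1.
        W = np.column_stack([X, d, imr])
        return sm.OLS(y, W).fit().params[-2]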
One important advantage of this procedure in the homogeneous framework is that it is robust to choice-based sampling. This is the situation in which the non-randomness arises from the way the comparison group of non-treated units is drawn from the population (see Heckman and Robb (1985)). Even if, as usual, the sample proportion of treated units is not equal to the population one, so that the treatment group is likely to be over-represented in the sample, robustness is still achieved by controlling for the part of the error term Uit which is correlated with the participation variable Di, since the remaining error term is orthogonal to Di.
2.6.2 Heterogeneous Treatment Effect
When a heterogeneous framework is imposed, the outcome equation, abstracting from other regressors, takes the form

Yit = β0 + β1iDi + Uit   when t > k

where β1i is the treatment impact on the i-th individual. Let β̄1 be the population mean impact, εi the individual's deviation from the population mean, and β1T the mean impact of treatment on the treated. Thus

β1i = β̄1 + εi
β1T = β̄1 + E[εi|Di = 1]

where E[εi|Di = 1] stands for the mean deviation of the impact among participants. The outcome regression can now be rewritten as

Yit = β0 + β1TDi + {Uit + Di(εi − E[εi|Di = 1])} = β0 + β1TDi + ξit
Now, the two-step procedure necessary to estimate the treatment effect requires knowledge of the joint density of Uit, Vi and εi. As before, take the example of a joint normal distribution and, for simplicity, consider the standardization σV = 1. Thus,

E[ξit|Di = 1] = Corr(Uit + εi, Vi) Var(Uit + εi)^1/2 φ(Ziγ)/Φ(Ziγ) = ρ(U,V,ε) φ(Ziγ)/Φ(Ziγ)
E[ξit|Di = 0] = Corr(Uit, Vi) Var(Uit)^1/2 (−φ(Ziγ))/(1 − Φ(Ziγ)) = ρ(U,V) (−φ(Ziγ))/(1 − Φ(Ziγ))
Then, the outcome equation that provides a consistent estimate of the treatment effect becomes:

Yit = β0 + Di [β1T + ρ(U,V,ε) φ(Ziγ̂)/Φ(Ziγ̂)] + (1 − Di) ρ(U,V) (−φ(Ziγ̂))/(1 − Φ(Ziγ̂)) + λit
2.6.3 Selection models in practice
An example of an empirical study that applies the Heckman selection model to the estimation of treatment effects is the work of Willis and Rosen (1980); they measure the effect of college education on earnings. Taking college education as the treatment, the aim of the study is to find the relationship between individual characteristics and earnings, after controlling for selection bias. A second question they want to answer is whether alternative earning prospects, as opposed to family background and financial constraints, influence the decision to attend college.

The starting point of the work is the specification of an econometric model of the college education decision constituted of two equations: one for the outcome (earnings) and the other for the choice of the level of schooling. After specifying a model for the expected values of the earnings stream, they define a college selection equation. The estimation procedure starts from this last equation; after that, the inverse Mills ratios are evaluated and included when the structural earnings and growth equations are estimated. Finally, the predicted values generated from the structural earnings and growth equations are included in the college selection equation to estimate the structural relationship between earnings and schooling.
2.7 Difference-in-Difference Estimators
A widely used approach to the evaluation problem, when longitudinal or repeated cross-section data on nonparticipants in different periods are available, is the difference-in-difference approach, also called diff-in-diff. The main idea here is to use the additional time dimension to refine the counterfactual estimate and, thus, to reduce the selection bias in treatment effect estimation. This method is also called the natural experiment approach: the idea is to look for a naturally occurring comparison group that mimics the properties of the control group in a properly designed experiment.
To understand how this method works, suppose information is available for a pre- and a post-program period, denoted respectively by t0 and t1 (t0 < k < t1). In this case, the mean effect of the program on the treated units can be defined as:

α = E[Y¹it1 − Y⁰it0 | Di = 1] − E[Y⁰it1 − Y⁰it0 | Di = 1]
where the first expected value is that of the pre-post intervention growth of the outcome registered for the treated units, while the second expected value refers to the
counterfactual before-after program growth, again for the participants. The effect α cannot be directly estimated because the counterfactual growth is not observable. What can be calculated instead is the effect obtained as:
α* = E[Y¹it1 − Y⁰it0 | Di = 1] − E[Y⁰it1 − Y⁰it0 | Di = 0]
which yields unbiased estimates only if the expected value of pre-post program growth
of Y , for the nonparticipants, corresponds to the counterfactual growth of treated units.
This is the main assumption underlying the diff-in-diff estimator. To be more precise,
• E(Y⁰it1 − Y⁰it0 | Di = 1) = E(Y⁰it1 − Y⁰it0 | Di = 0)

It means that participants and nonparticipants have the same mean change in the no-program outcome measure.
Another way to represent this approach is to express it in regression terms, rewriting the model (2.4) as follows:

Yit = f⁰t(Xi) + αit(Xi)Dit + (φi + θt + εit)   (2.6)
where the error term U⁰it is decomposed into an individual-specific fixed effect φi, a common macroeconomic effect θt, and a temporary individual-specific effect εit. The assumption above can be rewritten as

E[U⁰it|Xi, Di] = E[φi|Xi, Di] + θt

that is, selection into treatment is independent of the temporary individual-specific effect εit. Then, it is easy to define the diff-in-diff estimator, which measures the excess outcome growth for the treated compared to the non-treated. It can be written as
αDID(X) = [Ȳ¹t1(X) − Ȳ¹t0(X)] − [Ȳ⁰t1(X) − Ȳ⁰t0(X)]
where Ȳ stands for the mean outcome among the specific group considered.
In the case of heterogeneous treatment it is easy to show that the diff-in-diff estimator recovers the TTE, since

E[α̂DID(X)] = E[αi(X)|Di = 1] = αTTE(X).
That is, the effect of treatment on the treated is identifiable, but not the population
impact.
In the homogeneous effect case, one may omit the covariates from the equation of
the diff-in-diff estimator and average over all the treated and non-treated units.
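In that homogeneous case the estimator reduces to a simple difference of mean growths, as in the following minimal sketch (hypothetical names; y_pre and y_post are outcomes measured at t0 and t1, and d is the participation dummy):

    import numpy as np

    def diff_in_diff(y_pre, y_post, d):
        # Excess outcome growth of the treated over the non-treated.
        growth = y_post - y_pre
        return growth[d == 1].mean() - growth[d == 0].mean()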
The principal drawbacks and weaknesses of the natural experiment approach arise from the assumptions the method relies on. The first one is that there are no systematic composition changes within each group. This is related to the fact that there is no control for the unobserved temporary individual-specific components, εit, that influence the participation decision. The second assumption is that time effects are common across groups. However, it is not unlikely that some macro effects have a differential impact across the two groups. Some characteristics distinguishing the treatment and the comparison group may differ and make the groups react differently to common macro shocks. That is why an extended version of the diff-in-diff estimator has been studied, which allows for different trends between the two groups.
2.7.1 Trend-Adjusted Diff-in-Diff
The basic principle on which this approach rests is that the availability of a series of observations Yi,t−k−1, Yi,t−k−2, . . . registered at times prior to the program intervention allows one to control for pre-intervention differences between the treated units and the excluded ones. In particular, if information at time t − k − 1 is available, the size of the selection bias can be reduced; this third temporal observation allows one to be more precise about the estimate of the counterfactual growth and about the difference between the pre-intervention growth rate recorded for the treated units and that registered for the excluded units. This difference is then used to correct the estimate of the counterfactual part that would be obtained with only two temporal observations.
To be more precise and to express the problem in regression terms, suppose that there are heterogeneous time effects between the treatment and control groups. Thus, the error term can be rewritten as

Uit = φi + ωᴰθt + εit

where ωᴰ specifies the differential macro effect across the two groups. Then, the new assumption takes this form:

• E[U⁰it|Di] = E[φi|Di] + ωᴰθt
Now, the diff-in-diff estimator identifies

E[α̂DID(X)] = αTTE(X) + (ω¹ − ω⁰)(θt1 − θt0)

It is easy to see that this estimator represents the true TTE only when ω¹ − ω⁰ = 0. To obtain a consistent estimate of the treatment impact, allowing for heterogeneous time effects, Bell et al. (1999) proposed the trend-adjusted diff-in-diff estimator, which takes the form
α̂TADID = [(Ȳᵀt1 − Ȳᵀt0) − (Ȳᶜt1 − Ȳᶜt0)] − [(Ȳᵀt″ − Ȳᵀt′) − (Ȳᶜt″ − Ȳᶜt′)]
where T and C refer to the treatment and control group and (t′, t″) is another time interval, with t′ < t″ < k, over which a similar macro trend occurred. To be more precise, the authors specify that they require a period over which the macro trend matches the term (ω¹ − ω⁰)(θt1 − θt0). It is plausible that the most recent cycle is the most appropriate, with the minimum number of differential effects across the target and comparison groups.
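A minimal sketch of the computation, under the assumption that group mean outcomes are already available at the four dates (t′, t″, t0, t1), might look as follows (hypothetical names):

    import numpy as np

    def trend_adjusted_did(ybar_T, ybar_C):
        # ybar_T, ybar_C: arrays of group mean outcomes at (t', t'', t0, t1),
        # where (t', t'') is a pre-program interval with a similar macro trend.
        did = (ybar_T[3] - ybar_T[2]) - (ybar_C[3] - ybar_C[2])        # program period
        pre_trend = (ybar_T[1] - ybar_T[0]) - (ybar_C[1] - ybar_C[0])  # correction
        return did - pre_trend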
Finally, it is important to note that the assumption behind this approach is significantly weaker than the one needed for the diff-in-diff estimator without trend adjustment; it is not required that the treated and the non-treated units have equal expected growth, over the pre-post program period, in the counterfactual case in which the treated individuals had not been treated. The trend-adjusted estimator imposes that these two expectations be equal only after being corrected by the pre-intervention difference in the growth trends.

In a similar way, this procedure can be extended to a larger number of temporal observations, improving the correction of the growth trends used to estimate the counterfactual part. As a consequence, the resulting estimators require increasingly weaker conditions to yield unbiased estimates.
2.7.2 Difference-in-difference in practice
An application of the diff-in-diff estimator can be found in the work of Duflo (2001), which refers to a school construction program launched in 1973 by the Indonesian government. To encourage schooling and education and to increase the average number of schools, more than 61,000 primary schools were constructed.
As regards the treatment variable, exposure of an individual to the program was
determined by the number of schools built in his region of birth and his age when the
program was launched. The question the work wants to answer is about the effect of
the Indonesian school construction program on education and earnings. The diff-in-diff
methodologies applied here refer to the differences between the region and the cohort
of birth.
The starting point of the study is the specification of a model for the return to education on earnings. The main assumptions the model relies on regard some considerations about the ages and regions of birth of the children. In particular, since Indonesian children normally attend primary school between ages 7 and 12, all children born in 1962 or earlier were 12 or older in 1974, when the first schools were constructed, and hence were not affected by the program. On the other hand, for younger children, exposure to the program is an increasing function of their date of birth.

The diff-in-diff approach is applied to find the causal effect of the program, distinguishing between children with little or no exposure to the program and children exposed to it for their entire time in primary school.
Another application of the diff-in-diff approach can be found in Eissa and Liebman (1995), who examine the impact of the U.S. Earned Income Tax Credit (EITC) for single women with children. The aim is to evaluate how the EITC reforms affected the labor supply of single mothers. The authors suggest using as a control group the population of single women without children, because this group was not eligible for the EITC. The final results are that labor force participation rose for the treatment group after the reform, but there was no significant response in annual hours or annual weeks worked.
2.8 Matching Estimators
The matching estimator tries to solve the problem of identifying the treatment effect on outcomes in a non-parametric way. Like the diff-in-diff method, it is a general approach; that is, it does not require any particular specification of the outcome equation or of the participation decision. Furthermore, it requires neither an additive specification of the error term nor any exclusion restrictions. The fact that it is not a parametric method makes it quite flexible, in the sense that it can be combined with other methods to obtain more precise estimates. On the other hand, it assumes that analysts have access to a set of conditioning variables X; that is, abundant, good-quality, meaningful data are needed.
The main idea of this method is to replicate the conditions of an experiment in a case where the data are non-experimental and no randomized control group is available. This is possible by constructing a correct sample counterpart for the missing information on the outcomes the treated would have experienced had they not been treated. Each participant is paired with one or more members of the non-treated group on the basis of the X values. The main purpose is thus to match treatment and comparison units that are similar in terms of their observable characteristics. It is then possible to directly compare treated and non-treated units to obtain an estimate of the impact of the intervention, because the only remaining difference between the two groups is program participation, as in the case of total random assignment. Thus, the assumption necessary for the matching estimators regards the existence of a set of variables such that, conditioning on them, the counterfactual outcome distribution of the participants is the same as the observed outcome distribution of the non-participants. More formally, the matching method is based on the following assumption:
M1: Y⁰, Y¹ ⊥ D | X.
It means that the non-treated and treated outcomes are independent of the participation status, conditional on the set of variables X. As a consequence, the outcome distributions satisfy F(Y⁰|D = 1, X) = F(Y⁰|D = 0, X) = F(Y⁰|X) and F(Y¹|D = 1, X) = F(Y¹|D = 0, X) = F(Y¹|X). In that sense, the available regressors "adjust" the differences between treated and non-treated units.

This assumption was first presented in this form by Rosenbaum and Rubin (1983), who called it "ignorable treatment assignment" or "unconfoundedness". Lechner (1999) refers to it as the "conditional independence assumption", while in the work by Barnow et al. (1980) it is referred to as "selection on observables" in a regression setting; this is because, given X, the non-treated outcomes are what the treated outcomes would have been had the treated not been treated, that is, selection occurs only on observables.
To see the link with standard exogeneity assumptions, consider the case of a constant treatment effect: α = (Y¹it − Y⁰it) for all i. Furthermore, suppose that the outcome of the control group is linear in Xi:

Y⁰it = β0 + Xi′β1 + εit   with εit ⊥ Xi
Then we can write

Yit = β0 + αDit + Xi′β1 + εit

Given the constant treatment effect, the unconfoundedness assumption is equivalent to independence of Dit and εit conditional on Xi; that is, Dit is exogenous. Note that, without the constant treatment effect assumption, unconfoundedness does not imply a linear relationship with mean-independent errors.
Given the assumption above, it is possible to find for each treated observation Y¹ a non-treated (set of) observation(s) Y⁰ with the same X realization. That is why the method is called matching: it tries to match each observation with one or more similar ones. If the assumption holds, the outcome of the control group is the required counterfactual. In that sense this approach tries to rebuild an experimental data set.
To ensure that this assumption has empirical content, it is also necessary to assume that there are participants and non-participants for each X for which a comparison is to be made. Then, another assumption must be imposed to guarantee the existence of the required counterfactual. To be more precise,

M2: 0 < P(D = 1|X) < 1.

It means that all treated units have a counterpart in the population of the non-treated and that anyone is a possible participant. In a finite sample, of any size, this condition is replaced by its empirical counterpart, so the assumption does not ensure that it holds in any given sample. It is a strong assumption, especially where programs are directed to specific groups. Rosenbaum and Rubin (1983) called it "strong ignorability". It has important practical consequences for program evaluation, in the sense that failure to satisfy it appears to be one of the most important reasons why matching methods produce biased estimates of the impact of a program. For some comments on the plausibility of these assumptions see Imbens (2003).
Under the two assumptions above, it is possible to create a comparison group that replaces an experimental control group in one key respect: conditional on X, the distribution of the counterfactual outcome Y⁰ for the participants is the same as the observed distribution of Y⁰ for the comparison group.

In the literature there has been some controversy about the plausibility of these two assumptions in economic settings. The main debate regards the possibility of dependence between the potential outcomes and the choices taken by the agents in
order to optimize their behavior, even conditional on covariates. For a more detailed description of these remarks see Imbens (2003).

However, it is important to note that the assumptions of the matching approach do not imply the absence of selection bias; indeed, as long as the means exist, they imply that

E[Y⁰|X, D = 1] = E[Y⁰|X, D = 0]

and that

E[Y¹|X, D = 1] = E[Y¹|X, D = 0]

This does not imply E[U⁰|X, D = 1] = 0, i.e. no selection bias. Instead, matching balances the bias, as experiments do:

E[U⁰|X, D = 1] = E[U⁰|X, D = 0] = E[U⁰|X].
To implement the matching method, outcomes in the treatment group, denoted Yᵀ, are matched with the outcomes of a sub-sample of persons in the comparison group, Yᶜ, to estimate a treatment effect. Given the common support, that is, the set of all possible values the vector of explanatory variables X may assume, individual gains from the program among the subset of participants who are sampled, and for whom one can find a comparable non-participant, must be integrated over the distribution of observables among treated units and re-scaled by the measure of the common support, called S*. Thus, the matching estimator for the ATT is the empirical counterpart of

∫S* E[Yᵀ − Yᶜ | X, D = 1] dF(X|D = 1) / ∫S* dF(X|D = 1).

This result represents the expected value of the program impact: it is the simple mean difference in outcomes over the common support S*, weighted by the distribution of participants.
It is worth noting that, to identify the TTE, the independence assumption need refer only to the non-treated outcomes; thus, it can be written Y⁰ ⊥ D | X. On the other hand, the assumption Y¹, Y⁰ ⊥ D | X is necessary for the identification of the ATE parameter.
In practice, to construct matches, a measure of distance between the units with respect to X is needed, in order to define the units in the comparison sample that are neighbors of each treated unit i. Heckman et al. (1997b) present different kinds
of alternative matching schemes proposed in the literature. Here, only the most common methods are introduced.
One simple algorithm to identify the most similar comparison units to be matched to the treated units is nearest neighbor matching, developed by Rubin (1973). In this procedure, for each treated unit i only one "most similar" unit j is selected, chosen from the group of non-participant units. To select the comparison unit some distance metric must be minimized,

min(j∈Nc) ||Xi − Xj||

where Nc indexes the subsample of comparison units and || · || is a metric measuring the distance in the space of the X characteristics. The most widely used one is the Mahalanobis distance, where the metric used to define the neighborhood of i is

||Xi − Xj||² = (Xi − Xj)′ Σc⁻¹ (Xi − Xj)

where Σc is the covariance matrix in the comparison sample. As a result, the two groups, the comparison one and the group of the treated units, have the same size.
Depending on the common support between the groups, two different versions of nearest neighbor matching can be considered: nearest available matching and matching with replacement. The main difference is that the latter allows many treated units to be matched with the same excluded unit. In this way, each participant will be matched even when only a few excluded units are comparable to the treated individuals, because of a small common support.
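A minimal sketch of nearest neighbor matching with replacement under the Mahalanobis metric is given below (hypothetical names; X is assumed to be a two-dimensional array of covariates, and the estimator returns the mean gap between each treated unit and its closest comparison unit):

    import numpy as np

    def nn_match_tte(y, d, X):
        # Nearest-neighbor matching with replacement; estimates the
        # effect on the treated as a mean of matched differences.
        Xt, Xc = X[d == 1], X[d == 0]
        yt, yc = y[d == 1], y[d == 0]
        # Inverse covariance of X in the comparison sample defines the metric.
        Sinv = np.linalg.inv(np.cov(Xc, rowvar=False))
        gaps = []
        for xi, yi in zip(Xt, yt):
            diff = Xc - xi
            dist = np.einsum('ij,jk,ik->i', diff, Sinv, diff)  # squared distances
            gaps.append(yi - yc[np.argmin(dist)])
        return float(np.mean(gaps))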
Another possible procedure is radius matching, which allows each treated unit to be matched with more than one excluded unit (see Dehejia and Wahba (1998a), also for a comparison between the different algorithms). In this procedure the matches are made only if

||Xi − Xj|| < δ

where δ is a tolerance level chosen by the evaluator. Otherwise unit i is bypassed and no match is made for this individual. Similarly to matching with replacement, this method allows a given excluded unit to be matched more than once.
If one wants to use the entire comparison sample, a possible solution is kernel matching, a smooth method that reuses and weights the comparison group observations differently for each treated unit i with a different Xi. Let W(i, j) be the weight placed on observation j in forming a comparison with observation i; with this algorithm the weights are

W(i, j) = K(Xj − Xi) / Σ(j=1..Nc) K(Xj − Xi)

where K is a kernel function.
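For instance, with a Gaussian kernel the weights for one treated unit could be computed as in the following sketch (hypothetical names; the bandwidth h is a tuning choice not discussed here):

    import numpy as np

    def kernel_weights(Xc, xi, h=1.0):
        # Gaussian-kernel weights W(i, j) over all comparison units j
        # for a treated unit with covariate vector xi.
        k = np.exp(-0.5 * np.sum(((Xc - xi) / h) ** 2, axis=1))
        return k / k.sum()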
In general, independently of the algorithm used, the matching estimator takes the form of a mean difference computed across the treated units i. Thus,

α̂M = Σ(i=1..NT) [ Yi − Σ(j=1..NC) W(i, j) Yj ] wi   (2.7)
where W(i, j) is the weight for individual i with respect to the comparison observations j, and wi accounts for the re-weighting that reconstructs the outcome distribution for the treated sample. To be clearer, when the nearest neighbor algorithm is used, the estimator is given by

α̂M = (1/NT) Σ(i=1..NT) (Yi − Yj)
where j is the nearest non-treated neighbor of i. More efficient estimators also use the variance to construct the weights of the observations (see Heckman et al. (1997b) and Heckman et al. (1998b)).
The main idea of the matching approach is to ensure that a suitable set of observable characteristics X is used to obtain the correct counterfactual. As in the specification of a conventional econometric model, there is the same uncertainty about which X to use. Heckman et al. (1997b) discuss some tests for choosing the appropriate X regressors. Furthermore, in practice, it is really hard to find a similar control unit when very detailed information is used, because the common support becomes more restricted. There is evidently a trade-off between the share of the common support and the amount of information to use. If, however, the correct amount of data is used, the only reason the treatment effect, conditional on X, might not be identified is selection on unobservables. It is important to note, however, that problems can arise in situations of non-overlapping support of X and of incorrect weighting over the common support. To be more precise, the bias term, due to the difference between treated and
non-treated units, can be decomposed into three different components:

E[Yᶜ|X, D = 1] − E[Yᶜ|X, D = 0] = B¹ + B² + B³,   (2.8)

where B¹ is the bias due to the non-overlapping support of X and B² represents the error related to misweighting over the common support of X. These two sources of bias can be corrected through the matching process of choosing and reweighting observations. The third term B³ is the econometric selection bias resulting from selection on unobservables, which is assumed to be zero.
Another way to use the matching method is the so-called regression-adjusted matching, proposed by Rubin (1979) and developed further in Heckman et al. (1997b) and Heckman et al. (1998b): the main idea is to compute a regression of the outcome on the X regressors and to use the regression-adjusted Yi, computed as R(Yi) = Yi − Xiβ, in place of Yi in the above calculations.
A further development of the matching method is to use it in a parametric approach; in general matching does not require functional form assumptions for the outcome equation, but, if a functional form is maintained, it is possible to implement the matching method using regression analysis (see Barnow et al. (1980)). To obtain an estimate of the effect of the treatment, the relationship between the outcome and the observables X is first estimated for the treatment and control groups. After that, the predicted outcomes are used to compare the two groups and obtain an estimate of the impact. One advantage of this method is that it does not require the common support condition for the distribution of X, which might be very different in the treated and comparison groups. Comparability is achieved by imposing the functional form.
2.8.1 The Propensity Score
The use of matching methods, like all non-parametric methods, is seriously limited if the dimensionality of the vector X is high. In practice, it can be difficult to find control units with similar values of X if X comprises a large number of variables. A solution is to match on a function of X. The most common and useful choice is the so-called propensity score, p(x), which is the probability of participation,

p(x) = P(Di = 1|Xi).
It can be interpreted as the probability that a unit i is selected for treatment, given its values of X. Thus, it summarizes in a single parameter the impact of all the observable pre-intervention characteristics that differentiate the treated units from the excluded ones.

The propensity score was proposed by Rosenbaum and Rubin (1983), who demonstrate that the conditional independence assumption remains valid when controlling for p(x) instead of X:

Y⁰, Y¹ ⊥ D | p(x).

Conditioning on p(x) reduces the dimensionality of the matching problem down to matching on the scalar p(x).
As in standard matching, when using the propensity score the comparison group for each treated unit is chosen with a pre-defined measure of distance; once the neighborhood of each unit is defined, the second choice regards the appropriate weights to associate each selected individual of the control group with the treated one. The proposed solutions are the same as those presented above: from a weight of one for the nearest observation and zero for the others, to equal weights for all, to kernel weights.
One of the most important considerations about the propensity score regards its estimation; an application of the matching method with the propensity score needs a demonstration that a suitable model for p(x) has been selected. Rosenbaum and Rubin assume that p(x) is known rather than estimated. For a comparison between the two situations see Heckman et al. (1998b): they present the asymptotic distribution theory for the kernel matching estimator both in the case where the propensity score is known and in the case in which it is estimated, parametrically or nonparametrically. On the other hand, a study by Hahn (1998) shows that p(x) is ancillary for the estimation of ATE, but its knowledge may improve the efficiency of the TTE estimation by reducing the dimensionality problem.
The propensity score can also be used as a control variable in an outcome regression (the conditioning-on-the-propensity-score method), or to stratify the data sample on the basis of similar pre-intervention characteristics (data stratification on the propensity score). In the first case, the predicted probability p̂(x) is added to the outcome equation; as argued in Rosenbaum and Rubin (1983), this is also a convenient way to deal with non-linearities in the relationship between the outcome variables and the pre-intervention characteristics of the units. The stratification method can be adopted to evaluate a program by separating the data into strata based on the units' propensity scores and
estimating the mean differences between treated and non-treated individuals within each stratum (see details in Dehejia and Wahba (1998a)).
Finally, in the weighting approach the propensity score is used as a weight to create balanced samples of treated and control observations; units are weighted by the inverse of the probability of receiving treatment, and an estimate of the effect of the program is simply

(1/N) Σ(i=1..N) [ DiYi/p(xi) − (1 − Di)Yi/(1 − p(xi)) ].

For more details, together with some combinations of these methods, see Imbens (2003).
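A compact sketch of this weighting estimator, in which the propensity score is estimated with a logit (a probit would serve equally well), is the following (hypothetical names, statsmodels assumed):

    import numpy as np
    import statsmodels.api as sm

    def ipw_effect(y, d, X):
        # Estimate the propensity score p(x), then weight each unit by
        # the inverse probability of its observed treatment status.
        Xc = sm.add_constant(X)
        p = sm.Logit(d, Xc).fit(disp=0).predict(Xc)
        return np.mean(d * y / p - (1 - d) * y / (1 - p))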
2.8.2 Matching Diff-in-Diff Approach
The assumption behind the matching approach is a really strong one if individuals can decide about participation according to their forecast outcomes. To overcome this drawback, matching is combined with the diff-in-diff approach, in order to control for unobservable determinants of participation (see Blundell et al. (2001)). However, it is worth noting that what follows is valid only as long as this unobservable component can be represented by separable individual- and/or time-specific components of the error term. To be more precise, consider the model (2.6) specified above in a matching framework. The independence assumption, conditional on the set X, takes the form

(εt1 − εt0) ⊥ D | X

where t0 and t1 stand for the before and after program periods (t0 < k < t1). The idea behind this method is that only the individual-specific changes require additional control, since the diff-in-diff controls for the other determinants. This assumption implies that control units have changed their outcomes from the pre- to the post-program period in the same way treated units would have done had they not been treated. Furthermore, this is true both for the observable component and for the unobservable time trend. The estimator of the treatment effect (2.7) can now be extended and rewritten as
α̂MDID = Σ(i∈T) { [Yit1 − Yit0] − Σ(j∈C) Wij [Yjt1 − Yjt0] } wi.
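With longitudinal data the computation is a weighted diff-in-diff over matched pairs, as in this minimal sketch (hypothetical names; W collects the matching weights, with each row summing to one, and equal weights wi = 1/NT are assumed):

    import numpy as np

    def matching_did(dy_treated, dy_control, W):
        # dy_*: pre-to-post outcome changes for each unit; W[i, j] is
        # the matching weight of control j for treated unit i.
        return float(np.mean(dy_treated - W @ dy_control))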
Longitudinal data are obviously required. However, when this type of data is not available, the matching diff-in-diff can be extended to the repeated cross-section case. One needs to implement matching three times for each treated unit observed after treatment: once to find the comparable treated units before the program, and a second and third time to find the controls before and after the intervention. If the same assumptions hold, the estimate of the treatment effect can now be computed as
α̂′MDID = Σ(i∈T1) { [Yit1 − Σ(j∈T0) Wijt0 Yjt0] − [Σ(j∈C1) Wijt1 Yjt1 − Σ(j∈C0) Wijt0 Yjt0] } wi

where T0, T1, C0 and C1 stand for the treatment and comparison groups pre- and post-intervention, and Wijt is the weight attributed to unit j of the respective group, treated or control, at time t when comparing with the treated individual i.
2.8.3 Matching approach in practice
One of the most important empirical studies dealing with the application of the matching method is the work of Blundell et al. (2001). The aim of the study is to estimate the effect of a mandatory job assistance program in the UK, the "New Deal for the Young Unemployed", which helps young unemployed people make their way into or back to work. The program is addressed to all young people aged 18-24 who have been claiming Jobseeker's Allowance for 6 months and thus have been unemployed for at least the previous 6 months. For 4 months they are intensively monitored and given job search assistance. If they still do not have a job, they can receive an employer wage subsidy or, alternatively, a period of training or full-time education.
The program started in January 1998 with a three-month experimental period, during which it was carried out in 12 regions, until April, when the program was launched in the whole UK. The aim was to use the experiment in the first 12 regions to obtain a counterfactual for the rest of the UK.
The authors' analysis deals with the impact of this program on employment in the first 18 months of the scheme. In particular, it measures the effect on the probability of moving into a job during the 4-month job search assistance period, conditional on 6 months of unemployment. Since the program was targeted at a specific age group, a natural comparison group consists of similar individuals with the same unemployment status but slightly too old to be eligible. Using the diff-in-diff estimator,
a before-after comparison can be made; to improve the estimates, a matching diff-in-diff approach was also implemented in the study. Thanks to the pilot phase, the diff-in-diff approach has two possible comparison dimensions: areas, because the New Deal was launched in some pilot areas for 3 months, and ages. One can thus compare 18-24 year old individuals with 6 months of unemployment in pilot areas with 18-24 year old people with 6 months of unemployment in non-pilot areas. Otherwise, the comparison can be carried out with reference to age: the 18-24 group in pilot areas is comparable with the 25-30 year old group in pilot areas.
Another important study that reveals the power of the matching approach is the analysis of Heckman et al. (1997b). It evaluates matching under different assumptions about the richness of the available information. Data collected from the Job Training Partnership Act (JTPA) were used to examine the empirical performance of matching methods by comparing the parameter estimates from randomization with those from non-experimental matching methods. The authors consider a variety of non-experimental control groups, such as eligible non-participants (resident in the same narrow geographical region) and no-shows, that is, experimental persons assigned to treatment who enrolled in JTPA but dropped out before receiving services.
A more recent study by Smith and Todd (2005a) is based on the data used by LaLonde (1986) to assess the reliability of non-experimental methods by comparing their results with the ones obtained using experimental data. The results reveal that matching may improve the estimates when only cross-section data are available. Obviously, the choice of the variables to use for the match plays a fundamental role. However, where longitudinal data are available, the quality and the precision of the estimates improve independently of the method chosen. The discussion on these themes has not yet ended: the study of LaLonde (1986) has encouraged a well-known debate on the effectiveness of matching estimators between Smith-Todd and Dehejia (see Dehejia (2005) and the reply of Smith and Todd (2005b)).
2.9 Regression Discontinuity Estimators
Regression discontinuity design (RDD) constitutes a special case of "selection on observables". In fact, the essential element of this model, originally introduced by Campbell and Stanley (1963), is that the probability of assignment to treatment depends in a discontinuous way on some observable variable S. That is, participants are assigned to the
program solely on the basis of an established cutoff score on a pre-intervention measure. To better understand this, consider the case in which a set of units willing to participate is divided into two groups, according to whether the pre-program measure is above or below a specified threshold. Those who score below the threshold are excluded from the intervention, while those who score above are exposed to it. Note that, in some sense, the assignment rule to treatment is here the opposite of random assignment: it is a deterministic function of some observable variables. But it turns out that assignment to treatment is as good as random in a neighborhood of the discontinuity.
What distinguishes it from randomized experiments, and from other quasi-experimental strategies, is its unique method of assignment. This cutoff criterion also implies the major advantages of RDD: first, it is appropriate when we wish to target a program to those who most need or deserve it. Second, it is certainly more attractive than a non-experimental design, in the sense that in a neighborhood of the threshold the RDD presents some features of a pure experiment. Moreover, other features distinguish this method from standard selection on observables and reveal the full power of this approach. First, there is a common support for participants and non-participants; thus RDD is an attractive procedure when there is selection on observables but the overlapping support condition required for matching breaks down. Second, the selection rule is deterministic and known by assumption.
On the other hand, the design has two main limitations. The first regards the fact that its feasibility is confined to those cases in which selection takes place on an observable pre-program measure. Secondly, even when it is feasible, it identifies the mean effect only at the discontinuity point for selection. In a case with heterogeneous treatment effects, it tells us nothing about units away from the threshold. In this sense, RDD is able to identify only a local mean impact (LATE).
To understand clearly how the method works, consider first the similarities between a randomized experiment and an RDD. As stated before, the most attractive property of the randomized experiment is that the impact of the program is simply the difference between the mean outcomes of treated and non-treated units. Although the RDD lacks random assignment of individuals, it shares some important features with the experimental approach. If S is the variable, or the set of variables, according to which units are selected into the program, and s̄ is the threshold for selection, the dummy variable D takes value 1 only if a unit's score is above s̄. Then,

D = I(S ≥ s̄)
where I is the indicator function. Thus, the probability of being treated, conditional on S, jumps from 0 to 1 as S crosses the threshold s̄. This is the so-called sharp RDD, introduced by Trochim (1984): the treatment D is known to depend in a deterministic way on some pre-program observable continuous variable S, D = f(S), and the point s̄, where the function f(S) is discontinuous, is assumed to be known.
An alternative case, more general than the sharp design, is the so-called fuzzy RDD, where D is a random variable given S, but the conditional probability of receiving treatment, Pr(D = 1|S), is known to be discontinuous at s̄. This design differs from the previous case in that treatment assignment is not a deterministic function of S; there are some variables, unobserved by the evaluator, that determine the assignment rule. For details see Hahn et al. (2001). One example of the fuzzy RDD arises when units do not comply with their mandated status, dropping out of the program or seeking alternative treatments.
The common feature of the two designs is that the probability of receiving treatment, Pr(D = 1|S), viewed as a function of S, is discontinuous at s̄. This constitutes the first assumption of the method. To be precise:

RDD1:
i) the limits D⁺ ≡ lim(S→s̄⁺) E[D|S = s] and D⁻ ≡ lim(S→s̄⁻) E[D|S = s] exist;
ii) D⁺ ≠ D⁻.
In both cases the main idea is that conditioning on S allows one to identify the average effect of the program in a neighborhood of the cutoff point s̄, that is, a local version of the mean impact of the intervention.

The mean treatment effect at the point s̄ is identified if the following assumption is satisfied:

RDD2: E[Y⁰|s̄⁺] = E[Y⁰|s̄⁻]

Then, the mean value of Y⁰ conditional on S is a continuous function of S at s̄. This identification condition requires that, in the counterfactual world, no discontinuity
takes place at the threshold for selection. This condition allows one to identify only the average impact for subjects in a right-neighborhood of s̄, that is, the effect of treatment on the treated (ATT). The identification of the effect of treatment on the non-treated requires a similar continuity condition on the conditional mean E[Y¹|S]. In practice, it is difficult to think of cases where condition RDD2 is satisfied and the same condition does not hold for Y¹.

From the second assumption it follows that

(Y⁰, Y¹) ⊥ D|S = s̄.
Because of this property the RDD is referred to as a quasi-experimental design.
An estimate of the impact of the program can be obtained under different assumptions regarding the heterogeneity of the effects among units. If a common treatment effect among individuals is assumed, together with the assumption that in the absence of treatment persons close to the cutoff point are similar, the effect of the treatment can be written as

αRDD = (Y⁺ − Y⁻) / (D⁺ − D⁻)

where Y⁺ ≡ lim(S→s̄⁺) E[Y|S = s] and Y⁻ ≡ lim(S→s̄⁻) E[Y|S = s].
In the sharp design, D⁺ = 1 and D⁻ = 0; hence the common treatment effect is identified by

αsRDD = Y⁺ − Y⁻

Then, in the simple case of a constant treatment effect, the jump of the regression line at the cutoff point represents the effect of the program. For example, under the hypothesis of linearity in the relationship between the set of variables S and the outcome, α can be estimated without bias by OLS estimation of:

Y = β0 + αD + β1S + U
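A minimal sketch of this sharp-design regression, restricted to a window around the cutoff so that the comparison stays local, is the following (hypothetical names; bandwidth choice is not addressed here):

    import numpy as np
    import statsmodels.api as sm

    def sharp_rdd(y, s, cutoff, bandwidth):
        # OLS of Y on D and the centered score within a window around
        # the cutoff; alpha is the coefficient on D, i.e. the jump in Y.
        keep = np.abs(s - cutoff) < bandwidth
        d = (s[keep] >= cutoff).astype(float)
        X = sm.add_constant(np.column_stack([d, s[keep] - cutoff]))
        return sm.OLS(y[keep], X).fit().params[1]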
On the other hand, if the case of variable treatment effects among units is considered, other assumptions have to be imposed in order to generalize the identification strategy followed above:

• E[αi|S = s], regarded as a function of S, is continuous at s̄;
• D is independent of αi conditional on S near s̄.
Then an expression for the mean treatment effect is:

E[αi|S = s̄] = (Y⁺ − Y⁻) / (D⁺ − D⁻).

As before, with a sharp design it is identified by

E[αi|S = s̄] = Y⁺ − Y⁻
It is worth noting that the second assumption, regarding conditional independence, implies that units do not select into treatment on the basis of their expected gains from exposure. This assumption, as seen before, is often invoked in the program evaluation literature, but it may be considered unrealistic in situations where individuals self-select into treatment. Furthermore, in this heterogeneous-treatment case, identification was obtained by comparing units close to the threshold s̄ who did and did not receive treatment. This means that the effect can only be identified at S = s̄.

For both designs, to obtain a consistent estimate of the parameter of interest it is sufficient to replace the terms of the ratio with consistent estimators Ŷ⁺, Ŷ⁻, D̂⁺, D̂⁻. They may be computed with different methods, adopting a parametric or a non-parametric approach. For details see Hahn et al. (2001).
2.9.1
Regression discontinuity design in practice
An example of an empirical study that applies the regression discontinuity design to
the estimation of treatment effects is the work of van der Klaauw (2002): it evaluates
the effect of financial aid offers of colleges and universities on students' enrollment decisions.
The work shows how discontinuities in an East Coast college's aid assignment rule can
be exploited to obtain credible estimates of the aid effect without having to rely on
arbitrary exclusion restrictions and functional form assumptions.
Following this work, another important study analyzes the effects of tuition fees
on graduation time adopting an RDD approach: the working paper by
Garibaldi et al. (2007). They base their empirical analysis on detailed administrative
data from Bocconi University in Milan, a private institution that, during the period for
which they have information (1992-2000), offered a 4-year college degree in economics.
This dataset is informative on the question under study not only because more than
80% of Bocconi graduates typically complete their degree in more than 4 years, but
also because it offers a unique quasi-experimental setting to analyze the effect of the
tuition profile on the probability of completing a degree beyond the normal time. Upon
enrollment in each academic year, Bocconi students in the sample are assigned to one of
12 tuition levels on the basis of their income. An RDD is used to compare students who, in
terms of family income, are immediately above or below each discontinuity threshold.
These two groups of students pay different tuitions to enroll, but should otherwise
be identical in terms of the observable and unobservable characteristics determining the
outcome of interest, which is the decision to complete the program on time.
Chapter 3
The continuous treatment case
3.1
Introduction
The objective of this chapter is to give an overview of the literature on program evaluation in a continuous treatment setting. It is not so rare to find in practice situations
where treatment regimes need not be binary and units might be exposed to different
levels or doses of treatment. This can be true both in economic and in medical applications. In these situations, studying the impact of such a treatment as if it were binary can
mask some important features of it.
Our intention is to build a basic statistical framework for our research, starting from
an evaluation of the current studies on this topic. After a brief introduction on the
continuous treatment setting, the focus will be on the analysis of the relevant literature
provided by different authors on this issue. The most important studies on program
evaluation with a continuous treatment are presented and analyzed in order to find the
common characteristics and the main advantages and drawbacks of all the approaches,
which will constitute the starting point of our analysis.
3.2
From binary to continuous treatment
Most of the relevant literature on program evaluation deals with the estimation
of causal effects of a binary treatment on one or more outcomes of interest in a non-experimental framework. In practice, however, treatment regimes need not be binary
and individuals might be exposed to different levels or doses of treatment. Then, in these
situations, it can be meaningful to use the information on the treatment level to estimate
different kinds of treatment effects as a function of the doses. In other words, studying
the impact of such a treatment as if it were binary can mask some important features of it.
Moreover, other parameters might be of interest. When a binary treatment is evaluated,
the main focus is on the estimation of an average treatment effect; in a continuous
setting many parameters might be important and meaningful. For example, it could be
interesting to learn about the form of the entire function of average treatment effects
over all possible values of the treatment levels. In other words, one might be interested
in studying how the effects change when the level of the treatment changes. Moreover,
another interesting parameter might be the "optimal" dose: optimal in the sense that
it is the treatment level that maximizes the average effects. In other cases one could
be interested in the derivative of the average effects, or in knowing whether there is a level at
which the curve of the effects has a discontinuity point or a "turning" point.
In the last ten years, interest in the generalization of the program evaluation
framework from a binary treatment setting to a more general structure for the treatments has increased rapidly. The most important reason is perhaps the fact that
more and more implementations of public policies or interventions have
a more complicated structure than the simple situation in which there are only treated and
non-treated units. The most relevant cases might be classified in two groups: multiple and continuous treatment programs. The first group includes all the cases in
which the policy consists of a variety of different programs. An example might be active labor market policies, which comprise job-search assistance, training programs,
public employment interventions, wage subsidies, etc. Another case that belongs to the
multiple treatment setting is when there are different discrete levels of treatment. An
example is the evaluation of the effects of the years of schooling on individual earnings.
The specified model might distinguish the impact of many different education levels,
thus allowing the attainment of different educational qualifications to have separate
effects on earnings. On the other hand, there are many applications of public policies
that include a strictly continuous treatment. It is the case, for example, of firm incentive programs, which consist of a series of subsidies given to firms in order to achieve
some employment or business growth goals. Or again, the case of a medical treatment
given to patients in different doses of a drug.
In general, the non-binary models would seem a more attractive framework, since a
wide range of treatment levels with potentially very different effects might be of interest.
However, even if cases of multiple or continuous treatment represent a generalization
of the binary treatment framework, they have some particular features that make them
very different. Here, the focus is on the continuous treatment setting: the idea is to
study the relation between the effects of a policy and the treatment levels, identifying
and estimating the possible parameters of interest.
The common approach followed by the evaluation literature when analyzing a
binary treatment is the potential outcome approach developed by Rubin (1974). It
might be easily extended to the continuous case: just consider a random sample of
units, indexed by i = 1, . . . , N, and for each unit i the existence of a set of potential
outcomes, y_i(T), for T ∈ [0, t_1] = T. To be clearer, consider an observed
value of the treatment level t: the potential outcome y_i(t) represents the outcome unit
i would receive if exposed to treatment level T = t, where T takes values in the
interval [0, t_1]. Each individual receives exactly one of these treatment levels; before
participation in the policy, each potential outcome is latent and could be observed if
the individual received the respective program. Ex-post, i.e. after the policy, only the
outcome corresponding to the dose the individual received is observed, that is y_i(t_i).
This extension of the potential outcome approach constitutes the basis of all the studies
on program evaluation in a more general treatment regime framework.
The following sections will present the most important works that deal with these
topics. For each contribution, the main ideas and the empirical applications will be briefly
described, followed by some personal remarks on the main advantages, limits
and possible developments. This analysis of the literature will constitute the starting
point for the next chapter, in which the real contribution on the topic of program
evaluation with continuous treatment will be presented.
3.3
An overview of the literature
Although the interest in generalizing the program evaluation framework from a
binary treatment to a more general setting for the treatments has been recognized,
this topic has not yet been deeply defined and studied. Although the literature is rapidly
growing, it represents a particular branch of program evaluation that should be
better analyzed and developed, because it could reveal some important results.
As stated above, the common approach of the few works that deal with this issue
follows the generalization of the potential outcome setting. Furthermore, there is another common characteristic that guides these analyses: it refers to the generalization of
the propensity score approach of the binary treatment case. In fact, in all the studies,
even if in different ways, the objective is to develop the propensity score in a continuous setting in order to remove any bias associated with differences in observable and
unobservable characteristics among units. Then, the few works presented in the literature
can be classified on the basis of the use of this propensity score or function. After the
definition of this quantity, some studies use it in a parametric structure, while others
follow a non-parametric approach, such as a matching estimator. This distinction will
guide and characterize this overview of the literature.
Another important classification might be made on the basis of the parameter of
interest. Some works focus on the potential outcome at each level of treatment, Y(T),
while others have as their objective the estimation of some treatment effects, such as the difference
of some potential outcomes at different treatment levels. This distinction will not guide
our presentation. However, the parameters of interest will have an important role
in the next chapter, where a new methodological approach will be proposed. For that
reason, the estimated quantities of each work will be well specified, in order to underline
differences and potential developments of each alternative analysis.
3.4
Generalized propensity score: parametric approach
One of the first studies that deals with the continuous treatment is the work of Imbens
(1999): an extension of the propensity score methodology is proposed that allows for
estimation of average causal effects with multi-valued treatments. This work represents
the starting point for the subsequent analysis (Hirano and Imbens (2004)), where the propensity
score method is extended to a setting with continuous treatment. The key assumption
is the generalization of the unconfoundedness hypothesis for binary treatment to the
multivalued case:

Y(t) ⊥ T | X    ∀ t ∈ [t_0, t_1].    (3.1)
Next they define the Generalized Propensity Score (GPS) as R = r(T, X), where

r(t, x) = f_{T|X}(t|x)

is the conditional density of the treatment given the covariates. Together with the
balancing property of the GPS, i.e. X ⊥ 1{T = t} | r(t, X), similar to that of the standard propensity score, the unconfoundedness assumption implies that assignment to
treatment is unconfounded given the generalized propensity score. Thus, for every t,

f_T(t | r(t, X), Y(t)) = f_T(t | r(t, X)).
They use this result in order to remove any bias associated with differences in the
covariates. The estimation of the parameter of interest µ_t = E[Y(t)] is obtained with a
two-step procedure:

(i)  β(t, r) = E[Y(t) | r(t, X) = r] = E[Y | T = t, R = r];
(ii) µ_t = E[Y(t)] = E[β(t, r(t, X))].
For a practical implementation of the proposed methodology, the authors discuss
estimation and inference in a parametric version of this procedure. In this sense this
work might be classified as one following a parametric approach: the estimation and
inference problems are handled with a parametric function, while the basic framework
is more general, and nothing prevents implementing it with more flexible approaches, as
stated by the authors.
In the first stage they propose to use a normal distribution for the treatment
given the covariates:

T_i | X_i ∼ N(β_0 + β_1′X_i, σ²)

The estimated GPS is:

R̂_i = (1/√(2πσ̂²)) exp( −(T_i − β̂_0 − β̂_1′X_i)² / (2σ̂²) )

where β_0, β_1 and σ² are estimated by maximum likelihood.
In the second stage they model the conditional expectation of Y given T and R
using a quadratic approximation:

E[Y_i | T_i, R_i] = α_0 + α_1 T_i + α_2 T_i² + α_3 R_i + α_4 R_i² + α_5 T_i R_i
These parameters are estimated by OLS using the estimated GPS R̂_i. Finally, the
estimated average potential outcome at treatment level t is obtained from:

Ê[Y(t)] = (1/N) Σ_{i=1}^{N} ( α̂_0 + α̂_1 t + α̂_2 t² + α̂_3 r̂(t, X_i) + α̂_4 r̂(t, X_i)² + α̂_5 t r̂(t, X_i) )
To obtain an estimate of the entire dose-response function, this expected mean can be
computed for each level of treatment one is interested in.
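As a compact numerical illustration, the following sketch reproduces the two-step procedure on simulated data; the data-generating process, the variable names and the use of plain least squares for the first stage (which coincides with maximum likelihood for the coefficients of a normal linear model) are assumptions made here for exposition, not the authors' own implementation.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: one covariate X, continuous treatment T, outcome Y
n = 2000
X = rng.normal(size=(n, 1))
T = 1.0 + 0.5 * X[:, 0] + rng.normal(size=n)
Y = T + T * X[:, 0] + rng.normal(size=n)

def ols(design, y):
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# First stage: T|X ~ N(b0 + b1'X, s2); coefficients by OLS, variance by ML
W = np.column_stack([np.ones(n), X])
b = ols(W, T)
resid = T - W @ b
s2 = np.mean(resid ** 2)
gps = np.exp(-resid ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)   # R_i = r(T_i, X_i)

# Second stage: quadratic approximation of E[Y|T, R]
Q = np.column_stack([np.ones(n), T, T ** 2, gps, gps ** 2, T * gps])
a = ols(Q, Y)

# Dose-response at level t: average the fitted surface over r_hat(t, X_i)
def dose_response(t):
    r_t = np.exp(-(t - W @ b) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    return np.mean(a[0] + a[1] * t + a[2] * t ** 2
                   + a[3] * r_t + a[4] * r_t ** 2 + a[5] * t * r_t)

for t in (0.0, 1.0, 2.0):
    print(t, dose_response(t))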
The last part of the paper presents an application of the proposed method. The data
set consists of individuals winning the Megabucks lottery in Massachusetts in the mid-1980s.
The interest is in the effect of the prize amount on subsequent labor earnings.
The estimated average effects of the prize are obtained by adjusting for the differences in
background characteristics using the propensity score. To see whether this specification
is adequate, the authors investigate how it affects the balance of the covariates. They
discretize both the level of the treatment and the GPS. First they divide the range of
the prize into intervals and compute the quintiles of the GPS evaluated at the median
of the prize in each group. Then, for each covariate, balance is investigated by testing
whether the mean of the observations in each prize interval whose GPS, evaluated at
the median prize, belongs to a given quintile differs from the mean of the observations
in the same GPS quintile but in the other treatment groups combined.
This work represents one of the most important theoretical contributions to
program evaluation with continuous treatment: it explicitly considers the effects of
different levels on the outcome, trying to deal with the selection-on-observables issue by
removing the bias associated with differences in the covariates. However, some
considerations might be noted. First of all, consider the parameter of interest:
in this work the focus is on the curve of the potential outcome at each level of treatment.
In the traditional literature on program evaluation, the parameter of interest is
the estimation of an effect of the treatment. The theoretical construct of the potential
outcomes is traditionally used in order to estimate an effect by comparing treated and
non-treated units. Instead, in this work, a proper estimation of this kind of effect
cannot be obtained. Following the potential outcome approach of Rubin (1974),
the information on the non-treated units is needed in order to obtain some value of their
potential outcomes. Instead, in this work no comparison between participants and
non-participants is made: this is easy to understand if one considers that only the data
of the treated units are used. Thus, what is possible to obtain following the approach
of Hirano and Imbens (2004) is an estimation of the effects between units at different
treatment levels, by comparing the values of the estimated potential outcome equation
for different levels, but not an estimation of the effects between treated and non-treated
units.
A direct consequence of this first consideration reveals another important matter:
the selection rule is not considered. The paper does not make any distinction between
a positive treatment level and treatment at level zero, that is, the non-treatment
status. Non-treated units are ignored, and no considerations are made on which
factors might influence the treatment status versus the opposite situation of
non-treatment.
Finally, the last personal consideration concerns the practical implementation
of the method, where a parametric approach is applied. This might cause the classical
problem of mis-specification of a model, related to the parametric assumption. Thus,
as suggested by the authors, the proposed method might be applied following a
non-parametric approach.
3.5
Some non-parametric approaches
In this section the focus is on the studies following a non-parametric approach: separate
paragraphs will discuss the use of matching and subclassification estimators.
A non-parametric approach for the evaluation of a public policy with a continuous
treatment is proposed in the working paper of Flores (2004). The main focus here is on
more parameters of interest than the common average treatment effects. In particular,
the author focuses on three objects:
• µ_t = E{Y(t)} for all t ∈ τ, the entire curve of average potential outcomes, or dose-response function;
• α_0 = arg max_t E{Y(t)}, the treatment dose at which the curve is maximized;
• µ(α_0) = E{Y(α_0)}, the maximum value achieved by the curve.
The author points out the advantages of a non-parametric approach: it overcomes
the problem of an arbitrary choice of a discretization of the treatment variable, as
pointed out by Royer (2003), and the sensitivity of the results to model specifications.
In the first part of the paper the three estimators are presented under the assumption
of randomization of the treatment doses. Then the author moves to the case of non-experimental data. In this setting the key assumption is again the unconfoundedness
assumption, but here a stronger version, with respect to equation (3.1) proposed by
Hirano and Imbens (2004), is used:

{Y(t)}_{t∈τ} ⊥ T | X
The stronger form of independence is maintained in this work because in practice,
as argued by the author, it can be difficult to find applications in which the weak
assumption is plausible but the stronger form is not. This hypothesis allows one
to control for systematic differences in the observed covariates across treatment doses,
because the dose-response function can be written as:

E[Y(t)] = E_X[E[Y(t)|X = x]] = E_X[E[Y(t)|T = t, X = x]] = E_X[E[Y|T = t, X = x]]

This suggests a regression approach to estimate E[Y(t)]. The author proposes non-parametric estimators based on a kernel function (the Nadaraya-Watson multiple regression estimator, or non-parametric mean regression estimator). However, for calculation
of the estimators under the independence assumption, one first needs to estimate the
non-parametric regression of the observed outcome on the treatment dose and the covariates. This may be a problem if the dimension of the covariates is large. To deal
with this problem of "dimensionality", the paper discusses the use of the GPS as
it is presented by Hirano and Imbens. However, the author proposes a non-parametric
estimation of the propensity score using non-parametric kernel estimators. The estimators for the three parameters of interest are two-step non-parametric estimators, where
the GPS is estimated non-parametrically in the first step. To be more specific, the three
estimators are:
Ê[Y(t)] = (1/n) Σ_{i=1}^{n} λ(r̂_i) ĝ_h(t, r̂_i)   for all t ∈ τ

α̂_0 = arg max_{t∈τ} (1/n) Σ_{i=1}^{n} λ(r̂_i) ĝ_{h1}(t, r̂_i)

Ê[Y(α_0)] = (1/n) Σ_{i=1}^{n} λ(r̂_i) ĝ_{h2}(α̂_0, r̂_i)
where ĝ(t, r̂) is the Nadaraya-Watson multiple regression estimator,

ĝ(t, r̂) = [ Σ_{j=1}^{n} Y_j K((t − t_j)/h, (r̂ − r̂_j)/h) ] / [ Σ_{j=1}^{n} K((t − t_j)/h, (r̂ − r̂_j)/h) ],

r̂(t, x) is the non-parametric estimator of the GPS and λ(·) is a trimming function used
to avoid the “denominator problem” by keeping a denominator bounded away from zero.
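A minimal sketch of such a two-step estimator is given below; the Gaussian product kernel, the common bandwidth h and the crude trimming cutoff are illustrative choices, not those of the paper.

import numpy as np

def nw_regression(t, r, T_obs, R_obs, Y_obs, h):
    # Bivariate Nadaraya-Watson estimate of E[Y | T = t, R = r]
    k = np.exp(-0.5 * ((t - T_obs) / h) ** 2) * np.exp(-0.5 * ((r - R_obs) / h) ** 2)
    return np.sum(k * Y_obs) / np.sum(k)

def trim(r_hat, cut=0.01):
    # lambda(.): a simple trimming indicator keeping the GPS away from zero
    return (r_hat > cut).astype(float)

def dose_response(t, r_hat, T_obs, Y_obs, h):
    # E_hat[Y(t)] = (1/n) * sum_i lambda(r_hat_i) * g_h(t, r_hat_i)
    lam = trim(r_hat)
    g = np.array([nw_regression(t, ri, T_obs, r_hat, Y_obs, h) for ri in r_hat])
    return np.mean(lam * g)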
Then, the author moves beyond the regression context and discusses other ways to use
the GPS, for example in a matching framework. This approach will be discussed in the
part of this section dedicated to matching estimators. Another proposed solution
is to adopt a weighting approach to estimate the dose-response function, as a natural
extension to the continuous treatment case of the weighting-by-the-propensity-score
approach used in the literature when the treatment is binary.
Finally, the author illustrates the techniques developed in the paper by presenting an
empirical application to estimate non-parametrically the turning point of the inverted
U-type relationship between some indicators of environmental degradation and income
per capita, known in this literature as the “Environmental Kuznets Curve”.
The main advantage of this work is the non-parametric structure, which allows one to
relax any parametric assumption and to avoid mis-specification problems. Moreover,
the new parameters of interest proposed by the author are truly meaningful and
interesting.
On the other hand, as in the work of Hirano and Imbens, a proper estimation of
an effect of the treatment is not obtained: the parameter of interest is always the curve
of the potential outcome and, again, no comparison between participants and
non-participants is made. The consideration about the selection rule is also the same as
before: there is no distinction between positive treatment levels and treatment at
level zero.
3.5.1
Subclassification on the propensity score
Another important contribution to program evaluation with a continuous treatment
that deals with the generalization of the propensity score approach is the work of Imai
and van Dyk (2004). The aim of the analysis is to develop theory and methods that
encompass all the techniques applied to the binary, ordinal or categorical treatment case
and widen their applicability by allowing for arbitrary treatment regimes: categorical,
ordinal, continuous, semi-continuous, or even multi-factored. This is done by developing
the theoretical properties of the propensity function, which is a generalization of the
propensity score of Rosenbaum and Rubin.
To evaluate the effect of the treatment, the authors rely on two standard assumptions: the SUTVA and the standard conditional independence assumption between
treatment and outcomes given the observed covariates. The parameter of interest concerns the average potential outcome Y(t) at each level t, that is, the average over the
population of the covariates of the distribution f(Y(t)|X). To overcome the mis-specification
problem of adopting a parametric approach to model the variable of interest, the
authors propose to use non-parametric techniques: matching and subclassification are
commonly used. However, as the dimensionality of X increases, these methodologies
become infeasible in practice. Thus, they propose a generalization of the propensity
score method. After defining the propensity function as the conditional probability of
the actual treatment given the observed covariates, f_ψ(T|X), where ψ parameterizes
this distribution, the set of parameters ψ must be estimated because it is unknown
in practice. This parametric model defines the propensity function, e_ψ(·|X) = f_ψ(·|X).
To simplify the representation of the propensity function, the authors make an
assumption: there exists a finite dimensional parameter θ that uniquely represents
e_ψ(·|X) = e[·|θ_ψ(X)]. This implies that the propensity function depends on X only
through θ_ψ(X), i.e. θ is sufficient for T. The main advantage of this approach is that
the parameter θ is typically of much lower dimension than X.
The methodological contribution of the paper follows from these definitions: the
authors show that, given the propensity function, the conditional distribution of the
actual treatment does not depend on the observed covariates (the balancing score
property), and that the strong ignorability assumption holds with X replaced by the
propensity function. Then, f(Y(t)|e(·|X)) can be averaged over the distribution
of the propensity function to obtain f(Y(t)) as a function of t. To accomplish this
average two solutions are suggested: subclassification or matching. In the latter case,
the authors refer to the work of Lu et al. (2001), which will be discussed in the following
paragraph. They argue that, although matching methods may be useful in particular
settings, subclassification is a more generally applicable strategy because it
allows for simpler implementation of more complex analysis models.
The subclassification solution implies that, once θ̂ has been estimated and
θ̂_i = θ_ψ̂(X_i) computed for each observation, the observations with the same or similar values of θ̂ are
subclassified into a moderate number of subclasses. Within each subclass, f(Y(t)|T = t)
is modelled and the relevant causal effect is obtained, e.g. the regression coefficient of
Y(t) on t. Then, the average causal effect can be computed as a weighted average of the
within-subclass effects. This approach describes how the full distribution of the
potential outcome can be approximated at a particular level of the treatment. Although
this full distribution is sometimes appropriate, in practice it is more often summarized
by its mean, E[Y(t)]: this is the approach the authors take in the example they present.
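A stylized sketch of this subclassification step, assuming the estimated propensity function reduces to a scalar theta_hat per unit, is the following; the number of subclasses and the within-subclass linear model for Y on t are illustrative simplifications.

import numpy as np

def subclass_effect(theta_hat, T, Y, n_classes=5):
    # Subclassify on quantiles of the scalar propensity function theta_hat
    edges = np.quantile(theta_hat, np.linspace(0, 1, n_classes + 1))
    labels = np.digitize(theta_hat, edges[1:-1])     # subclass index 0..n_classes-1
    effects, weights = [], []
    for c in range(n_classes):
        mask = labels == c
        if mask.sum() < 2:
            continue
        # Within-subclass regression coefficient of Y on t
        X = np.column_stack([np.ones(mask.sum()), T[mask]])
        coef, *_ = np.linalg.lstsq(X, Y[mask], rcond=None)
        effects.append(coef[1])
        weights.append(mask.sum())
    # Average causal effect as a weighted average of within-subclass effects
    return np.average(effects, weights=weights)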
After presenting two Monte Carlo experiments to illustrate how controlling for the
propensity function can improve the statistical properties of estimated causal effects,
the work ends with an application: the propensity function method is applied
to two datasets, estimating the effect of smoking on medical expenditure and the
effect of schooling on wages.
This work represents another important contribution to the literature on the propensity
score approach with continuous treatment. What might be pointed out is that
the continuous dimension is in some sense underexploited. The proposed generalized
propensity function might be used to estimate the full distribution of the effects, but
the authors suggest using its mean to summarize it; in this way the information
given by the continuous treatment is lost.
Another important consideration deals with the relation between the levels and the
effects: the authors propose to regress the outcome variable on the treatment dose
within each subclass. This implies a proportional structure of the levels, which might be
a strong assumption. Another specification of this relation might be adopted.
Moreover, as in the previous work, the selection rule is not considered. There is
no distinction among the levels, in particular between non-treated units and individuals
treated at any level. This indirectly explains the fact that only the data of treated units
are used.
3.5.2
Matching estimators
This section will discuss the part of the non-parametric methods adopting a matching
approach to solve the evaluation problem in a continuous treatment setting. This part
of the literature plays an important role in our analysis, where the empirical application
will be presented.
A contribution on the continuous treatment is the empirical work of Behrman et al.
(2004). The continuous dimension is given by the length of exposure to a treatment. The
authors carry out the analysis in the context of studying the effect of a preschool development
program, targeted toward disadvantaged children between the ages of 6 and 72 months
in Bolivia, on some child outcome measures related to health, psycho-social skills and
cognitive development. They mainly focus on the estimation of two parameters of interest:
the average treatment effect on the treated, as in the binary case, and what they call
marginal program impacts, which are the effects of increasing the duration in the program.
In the first part of the paper they develop a model of enrollment that gives an
economic interpretation for the average treatment effects that they estimate later. In
order to control for potential bias due to non-random selection into the program,
they propose to use matching estimators allowing for a continuous dose of treatment.
After identifying the main assumptions of this approach, which they specify as in the
binary case, keeping the group of the non-treated separated from the set of the treated
units without any consideration of the treatment level, they present the
"cumulative" and "marginal" matching estimators. In the first case the parameter of
interest is the average impact of treatment on the treated, given by:

E(∆_{T0}|t > 0) = E[Y(t) − Y(0)|t > 0]

where t ∈ τ denotes time spent in the program, with t = 0 for nonparticipants. The key
identifying assumption they use is:

E[Y(0)|t_i = t, X = x] = E[Y(0)|t_i = 0, X = x],   for all t ∈ τ.
A stronger version of this assumption is Y(0) ⊥ t | X for all t ∈ τ. The next step is the
estimation of the expected values for participants and nonparticipants, conditional on
X and on the level t of treatment, using local non-parametric regression methods. To
be more specific, they estimate:

Ê[Y(0)|x_i, t_i = 0] = Σ_{k∈{t_k=0}} Y_k(0) W_k(||X_k − X_i||)

Ê[Y(t_i)|x_i, t_i > 0] = Σ_{k∈{t_k>0}} Y_k(t_k) W_k(||t_k − t_i||, ||X_k − X_i||)
where W_k(||X_k − X_i||) and W_k(||t_k − t_i||, ||X_k − X_i||) are weights that add up to one
and come from the local non-parametric regression of Y_k(0) on X, and of Y_k(t_k) on
t and X respectively, and ||·|| is the Euclidean distance. In this way, as suggested
by the authors, the weights depend on the distance between t_k and t_i, allowing the
impact to depend on the duration of time in the program. They also point out that
an alternative approach would be to construct the weighted averages for the estimation
of the conditional expectation for participants over the set of observations that are
selected into the program and that receive a treatment level equal to t_i. Instead, they
do local averaging over durations t, because there may not be many observations at any
individual duration value. Thus, they emphasize one of the most relevant problems
in the estimation of effects with a continuous treatment: due to the continuity of the
treatment variable, it would be difficult in practice to find observations with a level value
exactly equal to t.
Then, for the estimation of the average treatment effect on the treated they propose:

Ê(∆_{T0}|t > 0) = (1/n) Σ_{i∈{t_i>0}∩{t_i∈S_p}} { Ê[Y(t_i)|x_i, t_i > 0] − Ê[Y(0)|x_i, t_i = 0] }

where S_p is the region of common overlapping support and n is the cardinality of the set
{t_i > 0} ∩ {t_i ∈ S_p}.
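In the spirit of these local regression estimators, the following sketch imputes the two conditional expectations with kernel weights and averages their difference; the Gaussian kernel, the bandwidths hx and ht and the omission of the common-support trimming are simplifying assumptions made here.

import numpy as np

def kern(u):
    return np.exp(-0.5 * u ** 2)

def att_duration(Y, t, X, hx, ht):
    # 'Cumulative' estimator: for each participant i, impute E[Y(0)|x_i] from
    # nonparticipants and E[Y(t_i)|x_i] from participants (weighting also by
    # the distance in duration), then average the differences.
    part, ctrl = t > 0, t == 0
    effects = []
    for i in np.where(part)[0]:
        w0 = kern(np.linalg.norm(X[ctrl] - X[i], axis=1) / hx)
        y0 = np.sum(w0 / w0.sum() * Y[ctrl])
        w1 = (kern(np.linalg.norm(X[part] - X[i], axis=1) / hx)
              * kern(np.abs(t[part] - t[i]) / ht))
        y1 = np.sum(w1 / w1.sum() * Y[part])
        effects.append(y1 - y0)
    return np.mean(effects)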
For the estimation of the second set of parameters, what they call the "marginal"
estimators, the focus is on the marginal treatment effect of increasing the duration in the
program from one level to another, from t_0 to t_1:

E(∆_{t0,t1}) = E[Y(t_1) − Y(t_0)|t > 0]
The way the authors propose to compute it is to use only the data on participants,
drawing comparisons between program participants who have taken part in the program for different lengths of time. An advantage of this approach is that it does not
require assumptions on the process governing selection into the program. On the other
hand, there is another potential source of non-random selection: the process governing
selection into alternative program durations. Again, matching methods can be used to
solve this selection problem, relating to the choice of program duration, under the
assumption that units who have taken part in the program for different lengths of time
can be made comparable by conditioning on units with similar observed characteristics.
The solution proposed is similar to the one presented before, and the conditional
expectations are estimated by the same local regression method described above.
The final consideration, before presenting some empirical results, is on the possibility of a dimensional reduction: they assume that the conditional mean independence
assumption holds with X replaced by the propensity score p(x) = P(T > 0|X = x),
where T = {t : t > 0}. Then, the conditional expectations can be estimated by three-
and two-dimensional non-parametric regression using the distance across the propensity
scores instead of the values of the covariates.
A final theoretical consideration regards the selection-on-unobservables issue: the
necessary assumptions for the matching estimators proposed are not likely to be satisfied
if unobservables that are related to outcomes are important determinants of program
selection. One option is to use a difference-in-differences matching strategy that allows
for time-invariant unobservable differences in the outcome between participants and
non-participants. However, the data used in the analysis do not allow the application of this
estimator, because program participants are only observed after they have already entered the
program. Only the marginal impact of short versus long durations might be estimated
using this estimator, allowing selection to be based on unobservables.
This work is one of the most important applications of matching estimators in a
continuous treatment setting: it explicitly considers the continuous dimension of the
treatment and tries to develop and modify the existing matching estimators in order to estimate causal effects and some relation between effects and treatment length.
Moreover, it gives some importance to the selection process. The first part of the
paper presents a theoretical model for the enrollment decision and recognizes a selection process composed of two elements: program participation and the
alternative program durations. However, the proposed theoretical model for the enrollment decision is not the real selection process, but rather a useful interpretation of
the treatment impact estimates. As before, there are no distinctions across the levels
and between non-treated and treated units. On the other hand, this is not relevant
in the marginal effects estimation case, as suggested by the authors. Instead, if the
parameter of interest is the estimation of the treatment effects when the counterfactual
is no treatment, the assignment rule has a central role that cannot be excluded from
the analysis.
As regards the dimensional reduction, it is important to note, as pointed out by
Flores (2004), that the independence assumption does not follow from the conditional
mean independence stated before. That is, it is not the case that Y(0) ⊥ t | X implies
Y(0) ⊥ t | p(X) for all t ∈ T. The assumption about p(X) made by the authors has no
relation to the unconfoundedness-given-X assumption discussed in the works of Hirano
and Imbens (2004) and Flores (2004).
Another important consideration has to be made with respect to the final estimates
presented in this work. In the first part of the analysis the focus is on the importance of
the continuous dimension of the treatment. However, what the authors propose for the
estimation of causal effects is an average treatment impact, that is, a weighted average
impact of participating in the program relative to not participating, for the treated units.
In this way all the information relative to the continuity of the treatment is lost and no
relation between treatment effects and doses is obtained.
Another relevant work that deals with the estimation of causal effects using a matching approach is the study of Lu et al. (2001). The focus is on the evaluation of the
effects of a media campaign launched in the United States intended to reduce illegal
drug use. Since the campaign was implemented throughout the country, there is no unexposed or control group available for use in evaluating the effect of the program. Hence,
in this case, only the data of the participants are available and, as a consequence,
only marginal effects might be estimated. The main idea of the authors is to compare
units who received different exposures to the campaign, but who were similar in terms
of baseline characteristics. For that reason, they propose a matching approach.
Multivariate matching with doses of treatment differs from the usual treatment-control matching mainly in two ways. First, pairs must not only balance covariates,
but must also differ markedly in dose, in such a way that the final high- and low-dose
groups have similar, balanced distributions of observed covariates. Second, any two
subjects may be paired, so that the matching is nonbipartite, that is, within a single
group; in this case the group is given by the treated units. Finally, a propensity score
with doses must be used in place of the conventional propensity score. Then, this
different approach affects three aspects of matching: the definition of the propensity
score, the definition of distance and the choice of the optimization algorithm.
After discussing the relationship between the authors' approach and that of Imbens
(1999), the paper presents the optimal matching algorithm used to minimize the total distance
and the definition of the propensity score used in order to take into account the continuous
dimension of the treatment variable. The key issue is to use a model that allows the
distribution of doses given covariates to depend on these regressors via a scalar function
of the covariates. This happens in McCullagh's ordinal logit model (McCullagh (1980)),
which is used in the work to obtain a balancing score for the matching. Finally, the
particular distance used is presented: the goal of matching with doses is not only
to balance the observed covariates, but also to produce pairs with very different doses.
The authors propose a distance measure that decreases both as the propensity scores
become similar and as the assigned treatments become dissimilar.
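The exact functional form used by Lu et al. (2001) has further refinements; the stylized function below only illustrates the qualitative requirement that the distance shrinks as the balancing scores get closer and the doses get farther apart.

def dose_matching_distance(score_i, score_j, dose_i, dose_j, eps=1e-8):
    # Small when scores are similar AND doses are markedly different;
    # eps avoids division by zero for (near-)equal doses.
    return (score_i - score_j) ** 2 / ((dose_i - dose_j) ** 2 + eps)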
This work is another example of an extension of the matching approach to a continuous treatment case: as before, only the data of treated units are used and the selection
process is not considered. In this case, however, there was no other choice, because of the overall coverage of the program. For that reason, the estimated parameter of
interest is not the impact of the campaign against the benchmark of no program, but
rather the marginal treatment effect of increasing exposure to the program from a low
to a high dose. In some sense, the continuous dimension of the treatment is reduced to
a comparison between two groups, as in the traditional program evaluation case
with a binary treatment, with the difference that the comparison is across units that
receive some treatment. Again, no relations between the doses and the treatment effects
are estimated.
The matching approach adopted by the authors of this work is different from the
other studies briefly summarized in this review. The relevant distinction is in the role
of the continuous treatment variable in the matching procedure: here the authors stress
the fact that matching has to be done between units with similar covariates but very
different treatment doses, as in the binary treatment case, where the matches are
computed between the group of treated and the group of non-treated individuals. On the other
hand, the works of Behrman et al. (2004) and Flores (2004) point out that
the matching has to be done between similar units, also in terms of the treatment level received.
They justify this choice arguing that matching is informative about the potential
outcomes, or the effects, only if it is done by comparing units that receive a dose sufficiently
close to the level one is interested in. However, it is important to note that the works
also differ in the estimated parameters of interest, and this might justify the
different approaches adopted.
As mentioned before, the work of Flores (2004), presented above, also discusses the
use of the generalized propensity score with a matching approach. He points out that,
given the continuity of the treatment, it would be difficult in practice to find observations
with a dose value exactly equal to t. Thus the matching has to be done not only on the GPS,
but also on the treatment level. A reasonable way in which this method could be implemented
is by matching observations on ||(t − T_j, r(t, X_i) − r(T_j, X_j))||, with ||·|| being a given
metric. A disadvantage is that one can end up predicting the dose-response function
at t using observations that received doses very far from t. As a consequence,
the author proposes another method to match the units. Consider a window of size
δ_n around t, where as usual δ_n is a sequence of positive real numbers tending to zero
as n → ∞. Then the observed outcomes with T_i ∈ [t − δ_n, t + δ_n] can be thought of as an
approximation to the potential outcomes of those observations at t. For observations
with T_i ∉ [t − δ_n, t + δ_n], one looks for matches based on the GPS to impute their missing
potential outcomes at the level t. Then the matching estimator can be written as
Ê[Y(t)] = (1/n) Σ_{i=1}^{n} Ŷ_i(t)

with

Ŷ_i(t) = Y_i                         if T_i ∈ [t − δ_n, t + δ_n]
Ŷ_i(t) = (1/M) Σ_{l∈S_M(i)} Y_l      if T_i ∉ [t − δ_n, t + δ_n]

where S_M(i) is the set of indices of the M closest matches for unit i in terms of
|r(t, X_i) − r(t, X_j)|, with i ≠ j and T_j ∈ [t − δ̃_n, t + δ̃_n] (δ̃_n is a sequence tending to zero as
n → ∞). Note that in this way the matching is done by comparing units that receive
a dose sufficiently close to t, in order for them to be informative about the potential
outcomes at t.
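A minimal sketch of this window-based matching estimator follows; the window size delta, the number of matches M and the assumption that the window around t is non-empty are illustrative choices.

import numpy as np

def flores_matching(t, T_obs, R_t, Y, delta, M=1):
    # R_t[i] = r(t, X_i): the GPS of unit i evaluated at the level t.
    inside = np.abs(T_obs - t) <= delta        # units with T_i in [t-delta, t+delta]
    donors = np.where(inside)[0]               # assumed non-empty
    y_hat = np.empty(len(Y))
    for i in range(len(Y)):
        if inside[i]:
            y_hat[i] = Y[i]                    # own outcome approximates Y_i(t)
        else:
            d = np.abs(R_t[donors] - R_t[i])   # |r(t, X_i) - r(t, X_j)|
            nearest = donors[np.argsort(d)[:M]]
            y_hat[i] = Y[nearest].mean()       # impute from the M closest matches
    return y_hat.mean()                        # E_hat[Y(t)]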
3.6
Conclusions: our starting point
The aim of this review was to understand the state of the literature on program evaluation with continuous treatment. From this, some important personal considerations
and comments might be made, in order to better understand the motivations that will
guide the following work. In particular, they deal with:
• the comparison between treated and non-treated units;
• the participation decision process;
• the parameters of interest.
The first issue is the most relevant, also because the other two depend on it. To be more
precise, it regards the type of comparison one is interested in. The studies presented
above mainly focus on the comparison among treated individuals. On the contrary,
what we are interested in is a comparison between units treated at different levels
and non-treated units. It follows that the selection process that identifies participants
versus non-participants becomes a fundamental source of non-random selection in the
identification of any policy effects, together with the process governing participation
in alternative program doses.
As regards the quantity to estimate, there are two important considerations that
might be pointed out. First, when the focus is on the estimation of the policy effects,
there is often an underestimation of the continuous dimension; the information given
by the continuous treatment variable is lost, because what is estimated is an average
effect among different levels of treatment. On the other hand, when the estimation is
a function of the doses, the parameter of interest is the potential outcome Y(t) rather
than some policy effect.
Starting from these considerations, the main question our analysis wants to answer is:
why not focus on the policy effects on the treated versus the non-treated units at different treatment levels? Thus, the idea is to compare participants with non-participants,
in order to estimate the effects of an intervention for each level of treatment one is interested in. Furthermore, why not consider how these effects are related to the
continuous treatment variable? The idea is to find some relation between treatment levels
and treatment effects.
The next chapter will deal with these issues. A new methodological approach will
be proposed: it refers to a development of matching estimators. The choice of this kind
of estimator, rather than some alternative approaches, comes from their well-known
good properties in the binary treatment case (see Heckman et al. (1998b) and Heckman
et al. (1997b)).
Chapter 4
A new approach to empirical estimation
4.1
Introduction
As mentioned before, the previous chapter, a review of the main contributions
on continuous treatment program evaluation, constitutes the starting point for this
part of the work. Following the main limits, drawbacks and considerations discussed
before, the aim is to present a different estimation approach to the topic of program
evaluation with continuous treatment.
The idea is to study the relation between the effects of a policy and the treatment
levels, identifying and estimating the possible parameters of interest and trying to solve
the main inconsistencies of the analyses discussed before. Starting from a specification of
the continuous treatment setting, a new specification for the selection
process will be presented, together with the parameters of interest and the new matching approach adopted for a new
empirical estimation of the treatment effects. The main differences and developments
with respect to the previous literature will be underlined in the course of the analysis.
4.2
The continuous treatment setting
As briefly mentioned before, the common approach followed by the evaluation literature is the potential outcome approach: y_i(T) represents the set of potential outcomes,
for each unit i, given a random sample indexed by i = 1, . . . , N, and T represents the
continuous variable indicating the treatment level. As in the binary case, the definition
of potential outcome already made implicitly uses the stable-unit-treatment-value assumption (SUTVA), that is, no interference between units. It assumes that the
potential outcomes y_i(T) of individual i are not affected by the allocation of other individuals to the treatments. Thus, it is assumed that the observed outcome y_i depends
only on the treatment level to which individual i is assigned and not on the allocation
of other individuals. For each unit i there is also a vector of covariates X_i and the level
of treatment received, t_i ∈ [0, t_1] = T. Thus, the observed information is the vector
X_i, the treatment received t_i and the corresponding potential outcome, y_i = y_i(t_i).
Then, with a continuous treatment variable, each unit is characterized by a set of
potential outcomes. Following what is traditionally done in the literature on binary
treatment, the set of potential outcomes may be divided in two groups: y_i(T) for all
the outcomes under an active treatment level T, with T ∈ ]0, t_1], and y_i(0) otherwise.
The difference with respect to the traditional binary case is that, in the continuous setting, the set
of active treatments y_i(T) includes an infinite number of potential outcomes, depending
on the treatment variable T.
The outcome Y is assumed to depend on a set of observable covariates X and on
the participation status. The equations of the potential outcomes for individual i can
be generically represented as:

y_i(t_i) = f^T(X_i, t_i) + u_i(t_i),   for T > 0
y_i(0) = f^0(X_i, t_i) + u_i(0),       for T = 0

The functions f^T(·) and f^0(·) represent the relationship between the set of covariates
X and the potential outcomes Y(0) and Y(T), while the terms U(0) and U(T) identify the
mean-zero error terms, assumed to be uncorrelated with X. This vector of variables
is assumed to be known at the participation decision time, and not influenced by the treatment.
The missing data problem, that is, the impossibility of observing units under all the
treatment statuses, is here more complicated, because the treatment statuses are no longer
only two, but infinite. The generic observed outcome Y can be written as:

y_i = d_i y_i(t_i) + (1 − d_i) y_i(0)

where D is a dummy variable indicating the treatment status: in particular, D = 1
if the individual has been treated and D = 0 otherwise, and y_i(t_i) is the particular
potential outcome at the observed level t_i. In practice, when D = 1 we observe y_i(t_i);
when D = 0 we observe y_i(0).
4.2.1
The selection process
Selection into treatment determines both the treatment status d_i and the treatment
level t_i; it may then be considered as composed of two processes. The participation
decision will determine the treatment status d_i, while the treatment level
process will determine the dose t_i. For simplicity, the participation assignment will
be stated as the first process. It is worth noting, however, that they occur at the
same time. Moreover, it is assumed they occur together at a fixed moment in time and
depend on the information available at that time. This information is summarized
by a set of observable variables X = {W, Z} and unobservables ε = {V, U}, where W
identifies the treatment status and Z the subsequent treatment level. Assignment to
treatment is then assumed to be made on the basis of
t_i = g(Z_i) + u_i   if d_i = 1,     t_i = 0   otherwise,

where

d_i = 1 if I_i > 0,   d_i = 0 otherwise,

and

I_i = h(W_i) + v_i
This means that for each unit there is an index I_i, a function of the set of variables W,
such that participation occurs when it rises above zero, and, only for treated units, an
index t_i, a function of the set of variables Z, identifying the level of treatment.
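A minimal simulated sketch of this two-part structure is the following, with a probit for the participation index and OLS for the level among participants; the data-generating process and the linear specifications chosen for h(·) and g(·) are purely illustrative assumptions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
W = sm.add_constant(rng.normal(size=(n, 2)))   # variables driving participation
Z = sm.add_constant(rng.normal(size=(n, 2)))   # variables driving the level

# Latent index I_i = h(W_i) + v_i determines the status d_i
d = (W @ np.array([0.2, 0.8, -0.5]) + rng.normal(size=n) > 0).astype(int)
# Level t_i = g(Z_i) + u_i observed only for participants, zero otherwise
t = np.where(d == 1, Z @ np.array([1.0, 0.6, 0.3]) + rng.normal(size=n), 0.0)

participation = sm.Probit(d, W).fit(disp=0)    # first process: treatment status
level = sm.OLS(t[d == 1], Z[d == 1]).fit()     # second process: treatment level
print(participation.params)
print(level.params)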
The reason for adopting this structure represents the basis of the approach that
will be presented further on: the selection process that determines program participation
is considered separately from the further process that identifies the level
of treatment. As seen in the previous chapter, this approach is not what is commonly
followed in the literature: the works that deal with the continuous treatment setting
generally focus on the specification of a model for the treatment level, given the set of
pre-treatment variables, without explicitly considering the selection rule that identifies the
non-treated and the treated units at any level. Thus, there is no distinction between
all the different doses of treatment and treatment at level zero, that is, no
treatment. Here, instead, the idea is to specify first an assignment rule for the selection
process that determines whether a unit is treated or not, and subsequently a model for the
specification of the level among the selected units. To justify this approach it might be
noted that it is reasonable for the selection rule and the treatment level assignment to be
influenced by different variables: adopting different specifications for the two processes
might be helpful for considering these different factors. Moreover, in this way, there
is a distinction between treatment at level zero and any strictly positive dose
of treatment. Then, the treatment interval T is split into two parts: T^0, which includes
only the treatment level equal to zero, and T^+, which regards the set of positive
treatment doses. In this way, the set of non-participants is kept separated from
the set of participants throughout the full analysis, and this allows the identification of the two
different selection processes.
Actually, from a theoretical point of view, this approach could be embedded in the
concept of a generalized propensity score, where the probability of a given output is
estimated by a set of covariates and by a level of treatment depending on the same
covariates. However, our proposal has some empirical advantages: it exploits the full
information set containing treated and non-treated units; it yields a more efficient estimation of the two processes (the selection and the level of treatment); and it can incorporate
some empirically recognized restrictions on the relation between the two processes.
It is worth noting, however, that this distinction becomes meaningful in two
particular situations. The first one regards the parameter of interest one wants
to estimate. In particular, when the parameter regards the effects of the levels of
treatment with respect to the no-treatment case, it might be important to consider
the two selection processes, as mentioned before. Instead, if one wants to estimate the
effects of a level of treatment with respect to another level, the first selection process,
which specifies the treatment status, might not be considered and the evaluation might
be carried out only among the treated units. It might be the case, for example, of
the estimation of the effects between units treated at a "high" level and units treated at
a "low" level. On the other hand, this distinction becomes meaningful when one of the
two components of the selection process, or both, is known (or partly known). In these
situations, the evaluator might decide to use this information in order to better model
and predict the selection assignment rules and the causal effects.
4.3
The parameters of interest
As in the binary treatment case, the possible counterfactuals of interest in a continuous
setting might be different: for example, one might be interested in comparing the
state of the world in the presence of an intervention at a particular level of treatment
with the state of the world if the program did not exist at all, or if alternative levels were
applied. A full evaluation should consider all outcomes of interest for all persons, both in
the current state and in all alternative states of interest. However, when the treatment
is continuous this analysis is more difficult, because the potential counterfactuals are
infinite and another source of variability is introduced by the continuous treatment
variable.
To be more specific, the treatment effects are now influenced by three components:
the treatment level, the heterogeneity among the units and the stochastic component:

α_i = f(T, i, ε)

where α_i represents the treatment effect for the i-th unit.
A complete evaluation analysis has to consider all these sources of variability. With
respect to the binary treatment case, there is an additional component, namely the one
induced by the treatment variable. Then, apart from the error term ε, the heterogeneity
issue can be interpreted in two ways. First, the effects might be different among the
levels, and that is what this work focuses on. Secondly, for each level, the effects
may vary among units. That is the traditional heterogeneity problem in the literature
on program evaluation with binary treatment. Recent developments on this topic
deal with the estimation of the distribution of the effects. Hence, the focus is no longer
on average treatment effects, but on other summaries of the distribution, such
as quantile estimates. This matter is not considered in this analysis, not because it is
irrelevant, but rather because the focus is on the estimation of the relation between effects
and treatment levels.
Then, the statistical solution to the causality problem might be, as in the binary
case, the transition from the individual to the group-level counterfactual. Given the
impossibility of observing the same person in different states at the same time, the focus
might be on counterfactual means. This does not mean that the parameter of interest
is a single average treatment effect, but rather an average effect among units evaluated at
different treatment levels. The idea is to use the information on the treatment level to
estimate different kinds of treatment effects as a function of the doses.
4.3.1
Average treatment level effects
In order to study the relation between effects and treatment doses, what is proposed
here is, first, an estimation of the average effect among units evaluated at different
treatment levels. Assume there is an infinite number of observations, or a
finite number of observations for each treatment dose; a natural development of the
treatment effect estimation in the continuous case would then be the difference between
the outcome of the units treated at each level and the outcome of the untreated units.
That is the average treatment effect at the t-th level,

α(T)_ATE = E[Y(T) − Y(0)]

for a person randomly drawn from the population, and the average treatment effect on
the treated at the t-th level,

α(T) = E[Y(T) − Y(0)|T = t]

for a person randomly drawn from the subpopulation of the participants at the level
t. In both cases, the expected value is taken over all the observations treated at the
same level. This latter parameter α(T) will be called the average treatment level effect
(ATLE) and it is the parameter this analysis will focus on. This may be justified
as in the binary case, where the average treatment effect on the treated
(TTE) represents the parameter that has received the most interest in the current literature
(Heckman and Robb (1985) and Heckman et al. (1997b)). In fact, it is reasonable to
argue that the subpopulation of treated units is often of more interest than the overall
population in the context of narrowly targeted programs.
As in the binary case, only under the assumption of a homogeneous treatment effect among units at each level t ∈ T are all these parameters identical; this is obviously not true when treatment effects vary among individuals. However, as mentioned before, this heterogeneity issue is left for further studies and the focus here is on the average treatment effect at each dose.
The framework presented above is based on the potential outcome approach developed by Rubin (1974): ex post, only the outcome corresponding to the program in which the individual participates is observed. However, in this setting another “missing data” problem arises: because of the continuous nature of the treatment, it is difficult to cover all the possible levels and even harder to observe several
units treated at the same level.
To overcome this problem and to obtain an appropriate estimation of the counterfactuals of interest, this work proposes to adopt a matching approach, with the aim of eliminating any biases associated with differences in the covariates by pairing similar units.
4.3.2 Treatment dose function
Once the average treatment level effects have been estimated for each observed treatment level, the next interesting object to estimate is the specification of the relation between effects and levels,

α = f(t, ε)

in order to estimate the entire function of average treatment effects over all possible values of the treatment doses. The idea is to study whether the treatment level differently influences the effects on the response variables.
To study this relation, different approaches might be chosen, also with respect to the different hypotheses and structures one wants to impose. What is proposed here is a parametric versus a non-parametric approach.

In order to investigate whether the treatment level differently affects the response variable, we propose to use an OLS estimator imposing a quadratic relation between effects and level of subsidies:

α = β0 + β1 t + β2 t² + ε
A non-linear specification instead of a simple linear regression model is preferable in order to better detect heterogeneity of the effects. A simple regression model would return the correlation between the two variables: this might be primary information, but it implies a proportional effect of the treatment level, that is, an effect curve that is a straight line. This might be a very strong assumption. Our idea is that the average impact can hide some relevant effects at some points of the subsidy distribution. Some considerations on this issue will also be discussed in section 4.3. For that reason we propose other specifications that can include quadratic or higher order relations between effects and levels.
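As a purely illustrative sketch of this parametric step (the names `t` and `alpha_hat` are hypothetical; `alpha_hat` would hold the ATLEs estimated in the previous step):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical inputs: estimated ATLEs and the subsidy levels they refer to.
t = np.array([10., 20., 30., 40., 50., 60., 70., 80.])
alpha_hat = np.array([0.5, 1.1, 1.9, 2.2, 2.1, 1.8, 1.2, 0.6])

# Quadratic specification: alpha = beta0 + beta1 * t + beta2 * t^2 + error.
X = sm.add_constant(np.column_stack([t, t**2]))
fit = sm.OLS(alpha_hat, X).fit()
print(fit.params)  # beta0, beta1, beta2
```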
To model the treatment level effects, we also adopt another parametric approach: quantile regression. The ability of quantile regression models (Koenker and Bassett (1978)) to characterize the heterogeneous impact of variables on different points of an outcome distribution makes them appealing for evaluating the effects of policy interventions. Quantile regression has recently been used in DID models for evaluating the effects of policy changes by Athey and Imbens (2006). The authors use this technique to estimate the entire counterfactual distribution of outcomes that would have been experienced by the treatment group in the absence of the treatment and by the untreated group in the presence of the treatment. Restating our problem of evaluating the effects of policy interventions in a quantile regression framework allows us to investigate whether treatment groups have benefited differently from the treatment and to provide an analytical description of the effects distribution with respect to the treatment doses.
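A minimal sketch of this quantile regression step, assuming the effects and levels are available in a data frame (simulated data; all names are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
# Hypothetical data: treatment levels and unit-level effects.
df = pd.DataFrame({"t": rng.uniform(10, 80, 200)})
df["alpha"] = 0.5 + 0.04 * df["t"] - 4e-4 * df["t"]**2 + rng.normal(0, 0.3, 200)

# The same quadratic specification fitted at several quantiles of the
# effect distribution (Koenker and Bassett (1978)).
for q in (0.25, 0.50, 0.75):
    fit = smf.quantreg("alpha ~ t + I(t**2)", df).fit(q=q)
    print(q, fit.params.values.round(4))
```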
The relation between average effects and treatment level may also be evaluated by adopting a non-parametric approach, which allows one to estimate this relation without imposing any functional form. What is proposed here is to use a non-parametric mean regression estimator, the Nadaraya-Watson kernel estimator (see Nadaraya (1964)):

E[α | T = t] = Σ_{i=1}^{m} α̂_i K((t − t_i)/h) / Σ_{i=1}^{m} K((t − t_i)/h)

where K is a kernel function, h a bandwidth, α̂_i is the ATLE for the i-th level and m is the number of estimated ATLEs.
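A minimal sketch of the Nadaraya-Watson step with a Gaussian kernel (the inputs and the bandwidth `h` are hypothetical):

```python
import numpy as np

def nadaraya_watson(t_grid, t_obs, alpha_hat, h):
    """Kernel estimate of E[alpha | T = t] at each point of t_grid."""
    u = (np.asarray(t_grid)[:, None] - np.asarray(t_obs)[None, :]) / h
    K = np.exp(-0.5 * u**2)                      # Gaussian kernel weights
    return (K * alpha_hat).sum(axis=1) / K.sum(axis=1)

# Hypothetical inputs: observed levels and the ATLEs estimated at them.
t_obs = np.array([10., 20., 30., 40., 50., 60.])
alpha_hat = np.array([0.4, 1.0, 1.8, 2.1, 1.7, 0.9])
print(nadaraya_watson(np.linspace(10, 60, 6), t_obs, alpha_hat, h=10.0))
```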
A graphical analysis of the estimated function might be useful to interpret the relation between levels and effects of treatment easily and to see whether there are relevant effects at some points of the subsidy distribution. However, it is worth keeping in mind that, in the analysis of the relation between impacts and treatment level, average causal effects are computed by comparing treated against non-treated units. That is, our method is able to estimate the causal effect due to participation, ruling out the differences between these two groups. This means that, in general, there could be some differences among treated units (at different levels): participants might differ not only with respect to the level of treatment. Then, in order to evaluate the impact of the amount of granted subsidy on treatment effects, one should be careful in the interpretation of this relation. With the proposed methods we are able to evaluate whether there is some heterogeneity with respect to the treatment level, but these different impacts can be interpreted as causal treatment effects only with respect to the non-treatment status.
Another important consideration, which cannot be underestimated when a program consists of a continuous treatment, regards the selection process for the level of treatment. In particular, there are two opposite alternatives: the case of a random versus a non-random treatment level assignment. Let us consider these two situations separately.
4.4 The random treatment level case
This situation represents the simplest case: the selection process that identifies the level of treatment is random. This means that, given the first selection process determining which units are treated and which are not, the subsequent level assignment rule is random. Thus, it is assumed that, as in the binary case, the participation decision can be identified by the following rule:

Ii = h(wi) + vi

which means that for each unit there is an index I, a function of the set of variables W, and participation occurs when this index rises above zero. V is the error term and

di = 1 if Ii > 0, 0 otherwise
where, in this case, D is a dummy variable indicating the treatment status: D = 1 if the individual has been treated, at any positive level, and D = 0 otherwise, that is, if the unit is not treated. The subsequent process identifying the treatment level is random: there is no longer a function identifying the treatment level, but only a function determining the treatment status, namely the index I. A continuous variable T indicating the treatment level is no longer needed, because the only distinction among units is between treated and non-treated, and all positive levels are equally likely. The treatment level assignment rule might simply be written as:

ti = RW if di = 1, 0 otherwise
where RW stands for a random walk process. This represents the fundamental assumption of this particular case:

Assumption: Random hypothesis
the treatment level variable T is allocated at random,

which means the doses are given to all sample members with equal probability. If this assumption is valid, it implies, by design, that T will be independent of any other influences, whether observed or unobserved. In particular, it implies independence between the levels and any observable variables. To be more specific, it can be written:

T ⊥ X    (4.1)
where X is the set of observable covariates.
The main advantages of this type of data come from the randomization process: controlling for the first selection assignment, treatment versus control, removes any biases associated with differences in the covariates determining this process. The further randomization rules out bias caused by self-selection among treated units, as individuals are randomly assigned to the treatment level of the program and there is no relation with other factors.
4.4.1 Estimation of the effects: a matching approach
Given the assumptions stated above, the estimation of the treatment effects need not differ much from the binary treatment case. In particular, if the parameter of interest were the average treatment effect, the methods used in traditional policy evaluation with a binary treatment could be applied. Because there are no significant differences among the levels, it is sufficient to eliminate the bias associated with differences in the observable and unobservable components between the groups of treated and untreated units.

However, here the focus is on the ATLE parameter and on the curve of the treatment effects as a function of the levels. For that reason, what should be done is to find a proper development of the traditional methods in order to estimate the new parameter of interest or, equivalently, to take into account the continuous dimension.

What is proposed here is an estimation of the curve using a matching approach: the choice of this particular method comes from its nice features. In fact, it is one of the most popular methods in recent program evaluation studies, because of its flexibility and its ability to simulate an experimental setting.

The idea is to use the properties of this method to eliminate the differences in the covariates between treated and untreated units by combining individuals of the two
groups. Then, the mean difference between treated and control units at each observed level represents an estimation of the average treatment effect for that particular dose, and an estimation of the entire curve of the effects might be obtained by modelling the estimated effects computed for each level.
In the random treatment level case, the assumptions the method relies on are quite similar to the hypotheses needed for the binary case. Only two relevant treatment statuses are recognized: the non-participants, that is D = 0, and the participants, that is D = 1, which means ti > 0. Then, the main assumptions of this approach are:

Assumption 1: Conditional independence

Y(0) ⊥ D | X    (4.2)

conditional on the set of observables X, the non-treated outcomes are independent of the participation status.

Assumption 2: Common support

0 < P(di = 1) < 1

all treated individuals have a counterpart in the non-treated population and anyone constitutes a possible participant.
As in the binary case, the first assumption implies that for each treated observation Yi(ti) one can look for one or more non-treated observations Yi(0) with the same X-realization and be certain that this Yi(0) constitutes the correct counterfactual. The second assumption ensures that each participant can be reproduced among the non-participants.
Let S represent the common support of X. Then, under Assumption 2, S is the whole domain of X. The matching estimator proposed for the average treatment effect at level t, the ATLE, is the empirical counterpart of

α(t) = E[Y(T) − Y(0) | X ∈ S, T = t]
     = ∫_S E[Y(T) − Y(0) | X, T = t] dF(X|D = 1) / ∫_S dF(X|D = 1)
This result represents the expected value of the program impact for the t-th level: it is the simple mean difference in outcomes over the common support S, weighted by the distribution of participants. We can consider all the treated units together because of the random treatment level hypothesis. As before, the expected value is taken over observations with the same treatment level, assuming such observations are available.
Then, the general form of the matching estimator is given by

α̂(ti) = E[ yi(ti) − Σ_{j∈C} aij yj(0) ]    (4.3)
where C represents the comparison group, aij is the weight placed on comparison observation j for individual i and ti stands for the observed level of treatment for the i-th
unit. That is, the estimator can be computed only at the observed treatment levels ti. The expected value is taken among units treated at the same level ti. Hence, it might happen that the average among treated units disappears: this is the case when it is impossible to empirically observe two individuals with the same treatment dose realization. This has some implications for the choice of the matching procedure: an accurate evaluation has to be carried out in order to properly weigh the precision and the robustness of the estimation. There is clearly a trade-off between these two aspects: considering each single treatment level ti separately increases the precision and the detail of the effect estimation, which is evaluated at each level t of interest. On the other hand, this might decrease the significance and the robustness of the estimation, because the treatment level effect is then estimated from the comparison of a single treated observation. That is why the choice of the matching procedure might become a very relevant issue for the analysis.
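A minimal sketch of the estimator in (4.3), under the simplest weighting scheme (one nearest neighbour on a scalar covariate; all names are hypothetical):

```python
import numpy as np

def atle_nearest_neighbour(y_t, t_t, x_t, y_c, x_c):
    """Estimator (4.3) with a_ij = 1 for the control nearest in X and 0
    otherwise; differences are averaged within each observed level."""
    diffs = {}
    for yi, ti, xi in zip(y_t, t_t, x_t):
        j = np.argmin(np.abs(x_c - xi))          # closest control on X
        diffs.setdefault(ti, []).append(yi - y_c[j])
    return {t: float(np.mean(d)) for t, d in sorted(diffs.items())}

# Hypothetical data: treated outcomes, levels, covariate; control pool.
y_t = np.array([5.0, 6.1, 7.3]); t_t = np.array([10., 10., 20.])
x_t = np.array([0.2, 0.5, 0.8])
y_c = np.array([4.1, 4.9, 5.6]); x_c = np.array([0.1, 0.4, 0.9])
print(atle_nearest_neighbour(y_t, t_t, x_t, y_c, x_c))
```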
Propensity score matching
In order to improve the implementation of matching and to solve the dimensionality problem, a feasible alternative is to match on a function of X. Given the random treatment level assumption, the conditional independence assumption (Assumption 1) remains valid when controlling for the propensity score p(x) = P(Di = 1|Xi), that is, the propensity to participate given the set of characteristics X, instead of X:

Y(0) ⊥ D | p(x).
As mentioned before, the relevant assignment rule is only the one that defines the treatment versus non-treatment status: once the treatment is obtained, the levels are independent of any variables and the level effects can be identified simply by the difference between participants and non-participants.

As in the binary case, when using p(x), the comparison group for each treated individual is chosen with a pre-defined criterion of proximity between the propensity scores of each treated unit and the controls. The next step is choosing the appropriate weights to associate with the selected set of non-treated observations. The possibilities are the same as in the binary case: nearest neighbor matching, kernel matching, radius matching, etc. Different weighting schemes define different estimators, but the form of the matching estimator is again given by α̂(ti) in (4.3).
The dimensionality problem and the choice of the distance measure and of the matching algorithm are discussed more deeply in section 4.7, also considering the more general case of non-random treatment level selection.
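A minimal sketch of the propensity score step, assuming a logistic specification for p(x) (simulated data; all names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Hypothetical sample: covariates X and participation dummy d.
X = rng.normal(size=(300, 4))
d = (X @ np.array([0.8, -0.5, 0.3, 0.0]) + rng.normal(size=300) > 0).astype(int)

# Estimate p(x) = P(D = 1 | X), then pair each treated unit with the
# control whose propensity score is closest (nearest neighbour).
p = LogisticRegression().fit(X, d).predict_proba(X)[:, 1]
treated, controls = np.where(d == 1)[0], np.where(d == 0)[0]
matches = {int(i): int(controls[np.argmin(np.abs(p[controls] - p[i]))])
           for i in treated}
```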
4.4.2 Test for the independence assumption
The estimator presented above relies on the random hypothesis for the treatment level variable. Then, a relevant issue regards a way to verify its validity. If the assumption is satisfied, it is implied that T will be independent of any observable variables X. That is, observations that receive any positive level of treatment must have the same distribution of observable (and unobservable) characteristics. This is what is commonly called covariate balance. In other words, treated units should be on average observationally identical.

There are many issues involved in choosing appropriate tests. However, the focus here is not on this issue, and what is proposed here does not add anything to this vast literature.

In order to verify the independence assumption, two solutions are proposed: the first tests that the means of each characteristic do not differ between treated units at different levels; the second tests the absence of correlation between each characteristic and the treatment variable.
The first solution might be implemented through the following steps:

1. split the sample into k equally spaced intervals of the treatment level, for t > 0;

2. for each characteristic, test that the means do not differ among the different intervals;

3. if the means of one or more characteristics differ, the independence assumption is not satisfied.
As regards the first point, the number of intervals k has to be determined by the user after an accurate evaluation of the size and the support of the available sample. A possible solution is to repeat the tests for different partitions of the full interval T⁺. If the treatment level assignment is really random, any partition should lead to not rejecting the hypothesis that the means of each characteristic are equal among the intervals.

To test the null hypothesis

H0: μ1 = μ2 = ··· = μj = ··· = μk

where μj is the mean of a characteristic X in the j-th interval, against the alternative hypothesis that at least one of the equalities does not hold, a simple analysis of variance can be computed in order to test for differences in means. This test must be repeated for each interval of the treatment level. To be more complete, since it may be important not simply to test for differences in means, some tests of the equality of distributions, or some graphical summaries such as quantile-quantile plots comparing the empirical distribution of each covariate, might be performed.

In any case, if the null hypothesis is rejected for one or more variables X, it follows that, for these characteristics, at least one mean is statistically different from the others computed for each interval. This implies that units treated at different levels are not on average observationally identical and that the independence assumption between the treatment variable T and the set of observable characteristics X is violated.
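A minimal sketch of this balance check via one-way ANOVA, assuming a sample of treated units only (names are hypothetical):

```python
import numpy as np
from scipy import stats

def balance_test(t, x, k=4):
    """One-way ANOVA of covariate x across k equally spaced intervals
    of the positive treatment level t (H0: equal means)."""
    edges = np.linspace(t.min(), t.max(), k + 1)
    idx = np.digitize(t, edges[1:-1])            # bin index 0..k-1
    groups = [x[idx == j] for j in range(k)]
    return stats.f_oneway(*[g for g in groups if len(g) > 1])

# Hypothetical treated sample: levels t > 0 and one covariate x,
# balanced by construction, so a large p-value is expected.
rng = np.random.default_rng(2)
t = rng.uniform(1, 100, 200)
x = rng.normal(size=200)
print(balance_test(t, x))
```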
Another solution to test the independence assumption is to specify a relation between the treatment variable T and the observable regressors X, in order to verify the absence of correlation between the variables. This solution might be implemented through the following steps:

1. for each X, regress the treatment variable T on the covariate X;

2. test the coefficient of the regression;

3. if at least one coefficient is different from zero, the independence assumption does not hold;

4. otherwise, specify more complicated models, such as non-linear models, and test again the coefficient of the covariate X.
As regards the first point, the linear regression between T and each X can be written as:

T = β0 + β1 X + ε

Then, the null hypothesis to test is H0: β1 = 0. A simple t-statistic for the coefficient β1 can be computed. If this hypothesis is rejected for at least one covariate, a linear relation exists between the treatment level and this regressor. This means the independence assumption does not hold and the treatment level assignment is not random. On the other side, if the null hypothesis is not rejected for all covariates, it does not necessarily mean that the independence assumption holds, but only that the treatment variable is not linearly dependent on the covariates X. That is why it is suggested to specify other models that contain higher order relations or interactions between different regressors X. For each specification, one just tests the statistical significance of the coefficients of the X.
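A minimal sketch of this second check, regressing T on each covariate in turn and reading the p-value of β1 (names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

def slope_pvalues(t, X):
    """For each covariate, fit T = beta0 + beta1 * X + error and return
    the p-value of beta1; a small value signals a linear relation."""
    return {j: sm.OLS(t, sm.add_constant(X[:, j])).fit().pvalues[1]
            for j in range(X.shape[1])}

# Hypothetical treated sample where T is independent of X by design.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
t = rng.uniform(1, 100, 200)
print(slope_pvalues(t, X))
```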
Whichever solution is chosen to verify the random hypothesis, if the results reveal that it might not hold, the estimator presented above is no longer appropriate, because it does not consider the differences among units treated at different levels. This means that an accurate evaluation of the random hypothesis on the treatment level assignment rule might constitute a starting point for a complete analysis of the treatment effects in a case of continuous doses. Once this assumption is found not to hold, it has to be taken into account that there may be some bias associated with differences in the treatment level. The next section deals with this issue and extends the identification and estimation of causal treatment effects to the case of non-random level assignment.
4.5 The non-random treatment level case
As mentioned before, in the case of non-random treatment level assignment, the selection process might be split in two parts. The first process, the participation decision, might be represented, as in the previous case, by the following rule:

Ii = h(wi) + vi
where W is again a set of variables, V is the error term and

di = 1 if Ii > 0, 0 otherwise
where D is the dummy variable indicating the treatment status. Given a positive value of the index function I, the next process deals with the identification of the treatment levels, which are no longer equally likely among treated units. Selection into the different levels of treatment, among participants, might be represented by:

ti = g(Zi) + ui

where T is the treatment level variable, Z is a set of variables and U is an error term. As specified before, the full selection process might be written as:

ti = g(Zi) + ui if di = 1, 0 otherwise

where

di = 1 if Ii > 0, 0 otherwise

and

Ii = h(wi) + vi
This framework follows the traditional literature on selection into treatment in the program evaluation setting: it implies a linear, additively separable relation between the sets of covariates W and Z and the error terms V and U respectively. The main advantage of this framework is that the sets W and Z can contain different variables: this allows different factors to influence the two processes, which might not be so rare in practice.

It is worth remembering that the first process involves all individuals, participants and not, while the level assignment rule is specified only for treated units, which is what the previous literature on this topic concentrates on.
4.6 A new approach: the use of a matching procedure
Among the methods used in the literature to assess public policy interventions, the one that has recently received the most attention is the matching method. As in the random level case, the main idea here is to develop this traditional method in order to take into account the treatment level variable and to estimate the new parameter of interest.

The proposed estimators will estimate the ATLE, considering the selection process issue stated above, in two different ways:

• structured form of the selection process;

• non-structured form of the selection process.
The first case represents a situation where the selection process is partly known. To be more precise, in this case the evaluator has some information on the selection process. It might refer to one of the two components of the selection rule (participation decision and treatment level assignment), or to both. It might be the case, for example, of a selection rule determined by some threshold of a ranking, or of a situation where some constraints on variables are imposed by some authority. This does not happen in the second situation, where the selection process is a function only of the individuals’ choices and the evaluator has no further information governing the assignment to treatment. If some aspects of the selection criteria are known, why not use them and try to estimate and predict the participation rule in a more detailed and efficient way?
In particular, this distinction might be relevant when a matching approach is used. Consider, for example, the case where some constraints on the treatment level assignment are imposed: using a unique function of the observable variables summarizing the selection processes, even if an ignorability assumption is satisfied, might lead to a situation where some treated units are paired with control units with different values of the variables governing the treatment level assignment, which might be impossible given the imposed restrictions. In this case, what might be done is to better specify the two selection processes in order to reduce these “unlikely” matchings.
These two cases will be kept separate throughout the chapter, and the results of their implementation will be compared in the last chapter, where the empirical application is presented.
4.6.1 Structured form of the selection process
The idea is to split the selection process in order to better predict it and to take into account the relations between these two processes, remembering also that some covariates might affect each process, but with a different weight. The idea is to consider these differences in order to avoid matching units with similar values of the variables that specify one process but different values in the other. The first step identifies the participation decision and units are matched on the basis of similar values of a set of covariates. Among the units matched in the first step, another matching procedure pairs units with similar values in the second process, which identifies the treatment level. Let us consider the two matching procedures separately.
The participation decision
The participation decision process and the subsequent matching procedure are not so different from the random level case: the aim is again to eliminate the bias associated with differences in the observable and unobservable components between the groups of treated and untreated units by combining individuals of the two groups.

The main assumptions this procedure relies on are equal to the hypotheses needed for the random level case. This comes from the fact that the selection process is split in two components and this first one is identical to the previous case. Only two relevant treatment statuses are considered: the non-participants, that is D = 0, and the participants, that is D = 1, which means ti > 0. Then, the main assumptions of this approach are the same as before:

Assumption 1: Conditional independence

Y(0) ⊥ D | W

Assumption 2: Common support

0 < P(di = 1) < 1

As before, the second assumption ensures each participant can be reproduced among the non-participants. The first assumption again implies that the non-treated observations Yi(0) paired with each treated observation Yi(ti), with the same W-realization, constitute the correct counterfactual. However, there is a fundamental
difference with respect to the previous case: for each treated observation Yi(ti), it is preferable to look for more than one non-treated observation Yi(0) with the same W-realization. This is necessary because the proposed matching solution is a two-step procedure. To be more precise, it is a sequential matching: for each treated observation Yi(ti), the set of non-treated observations Yi(0) paired in this first step will constitute the non-treated group to be used in the second step of the matching. This second step matches units with a similar Z-realization, where the set Z describes the treatment variable T. If a one-to-one method is chosen in the first matching, that is, if one looks only for one non-treated observation, the control group for the further step, for each treated unit, will consist of only one non-treated observation. Then, if there are no similar control units to match with, these treated units have to be excluded from the analysis. Instead, if one looks for more than one control observation for each treated unit, it will be possible to compute the second matching among the units chosen in the first step.
This obviously has some implications for the choice of the algorithm that assigns units to matched sets: what has to be used is a method that pairs more than one control unit to a single treated unit. However, this is not a strong limitation: in the literature there is a wide choice among this kind of procedures. The most popular ones are the nearest neighbor one-to-k algorithm, radius matching, full matching and kernel matching. However, each solution has to be properly evaluated with respect to the context the analysis refers to.
To be more specific, let us consider nearest neighbor one-to-k matching: with this algorithm, for each treated unit the k most similar units are selected among the group of non-participants. Imposing the number k of controls to be matched might lead to using control observations that are not so similar to the paired treated unit, especially if k is large with respect to the available sample size. A possible solution to this problem is to use a full matching algorithm: with this method, each matched set consists of either a treated unit with at least one control or a control unit with one or more treated units. Obviously, the first configuration is the one of interest in this work. The main advantage of this algorithm is that not every treated unit is required to have the same number of controls. On the other hand, there could be some treated units with few or even only one matched control unit. This constitutes the main drawback of the procedure: in order to apply the two-step matching algorithm proposed in this work, it is necessary that treated units matched in the first step have more than one paired control unit, otherwise these units have to be excluded from the analysis.
Radius matching might be a valid proposal: this algorithm allows each treated unit to be matched with more than one excluded unit. In particular, matches are made only if the distance between treated and control unit is lower than a predetermined level. The main disadvantage of this procedure is that a treated unit is bypassed and no match is made for this individual if there are no excluded units within the fixed radius. Moreover, there could again be few paired control units for each participant. However, different sizes of the radius might be chosen and compared.
The solution offered by kernel matching might in principle be the most suitable for this work: with this algorithm the entire comparison sample is used. It is a smooth method that reuses and weights the comparison group sample differently for each treated unit with different covariates. The weight given to a non-treated unit is proportional to the closeness between the excluded and the treated unit. The main advantage of this method applied to the proposed analysis is that no information is lost, because the entire control sample is used both in the first and in the second step of the matching. However, this algorithm might not be so suitable in this application: we do not want to pair all units, because some matchings would be impossible given the imposed restrictions of the selection process.
These brief considerations about the choice of the algorithm identifying the most similar comparison units reveal that an accurate evaluation has to be carried out in order to consider all the limits and advantages of each method, given the available information. It is clear that there are some trade-offs that should guide the choice. For example, in nearest-neighbor one-to-k matching there is a trade-off between precision and the number k of control units to match: the higher k is, the larger the size of the control sample for the second step of the matching. On the other hand, a high value of k might reduce the precision of the matching in terms of closeness between units. For radius matching, a serious problem might regard the choice of the width of the radius. Thus, all these aspects have to be considered and possibly different solutions may be computed and compared.
Independently of the algorithm, it is important to remember that the control observations matched with each participant constitute the control sample for this unit in the further step of the matching. This implies a different control sample for each treated individual i in the second-step procedure, and an accurate evaluation of the weights to attribute to each paired unit. The new control sample selected for the i-th unit will be denoted by Mi, while M = ∪Mi denotes the full sub-sample of selected non-participants.
The treatment level assignment
The treatment level assignment process in the case of non-random levels might induce some bias associated with differences in the observable and unobservable components between treated units at different levels. The focus of an accurate analysis is to rule out these sources of bias, given the available information. Even if there are some imposed restrictions, selection into different treatment levels is partly influenced and determined by the treated individuals. This kind of information is often not available to the researcher conducting the analysis, and this might introduce some bias in the estimation of the effects.

The aim is again to eliminate this source of bias following a matching approach, building on the matching procedure applied in the previous step between treated and non-treated units. The idea is again to construct the correct sample counterpart for the missing information on the treated outcome at different levels had they not been treated, by pairing each participant with members of the non-treated group. However, in this case, the control group is composed of the non-participants selected by the first selection process, M. Then, the key assumption is that, conditional on a set of observables Z, units treated at level t and non-treated individuals are comparable with respect to the outcome in the non-treatment case. This implies an independence assumption between the outcome of the selected non-participants and the level of treatment conditional on the set Z. To be more precise, the main assumption is:
Assumption 3: Conditional independence
Y(0)′ ⊥ T | Z
where Y(0)′ is the outcome of the non-treated units belonging to the sub-sample M, that is, those most similar in terms of the variables W identifying the first selection process. This means it is not necessary to specify again the independence assumption between the outcome variable and the treatment conditional on the set of variables W, because it follows directly from Assumption 1 (4.2). Moreover, independence is not required for the outcomes of the entire control sample C, but only for the selected sub-sample. In some sense, it may be considered a weak form of independence compared to the stronger assumption of independence of the full set of non-treated outcomes Y(0). However, if the matching algorithm chosen in the first step uses the entire comparison sample C, such as kernel matching, the two assumptions are equivalent. It is not necessary to assume independence between the treatment level and
the potential outcomes for the treated individuals Y (T ) if the parameter of interest is
the ATLE.
Assumption 3 implies that we can write:

E[Y(0)′] = E_Z[ E[Y(0)′ | Z] ]
         = E_Z[ E[Y(0)′ | T = 0, Z] ]    (4.4)
         = E_Z[ E[Y(0)′ | T = t > 0, Z] ]
where the independence assumption is used in the second and third equalities. Hence, it is possible to express the counterfactual of interest as a function of the observed data.

Although the conditional independence assumption identifies the conditional potential outcome E[Y(0)′ | T = t > 0, Z] through observation of the non-participants, this identification holds only for the z for which there is a positive probability that participants at level t are observed with characteristics z. That is, we need to ensure that each treated observation can be reproduced among the non-treated. The second matching assumption is therefore a common support assumption. In particular, in this case it is necessary that all participants treated at level t have a counterpart in the population of non-treated units Mi selected by the first matching algorithm. Then, the common support assumption can be written as

Assumption 4: Common support

0 < P(ti = t | Zi) < 1,   ∀ t ∈ T⁺

with i ∈ Mi, where P(ti = t | Zi) is the probability that an individual with characteristics z is treated at level t.
Assumptions 3 and 4 allow the application of the second step of the proposed matching procedure and the estimation of the ATLE: participants and non-participants are now comparable, in the sense that the only difference between the two groups is program participation.
The two-step matching estimator
Before proceeding with the definition of the matching estimator, we would like to state clearly how the parameter of interest, the ATLE, is identified given the assumptions discussed above.

The first set of assumptions, relative to the participation decision process, allows the identification and estimation of the first counterfactual of interest Y(0)′, that is, the set of
counterfactual outcomes of the treated units had they not been treated, reconstructed from the non-treated units. Let S represent the common support of W, that is, the subspace of the distribution of W that is represented both among the treated and the control groups; Assumption 2 implies that S is the whole domain of W. Together with Assumption 1, the set of potential conditional outcomes can be identified by:

(Y(0) | D = 1, W ∈ S) = (Y(0) | D = 0, W ∈ S) = (Y(0) | W ∈ S)
The second term of the equality can be estimated from observed data through the outcome values of the non-treated units, adjusted for the values of the W variables. For each treated unit i, the most similar set of non-treated units in terms of W can be found in order to estimate the correct counterfactual for each participant:

Ŷi(0) = Σ_{j∈C} wij Yj(0)

where C represents the comparison group and wij is the weight placed on comparison observation j for individual i. Then, the estimator for Y(0)′ is simply given by:

Ŷ(0)′ = ∪_{i∈T} Ŷi(0)
where T represents the treatment group.
The second set of assumptions, regarding the treatment level assignment, allows the identification of the parameter of interest, the ATLE. In particular, given Assumptions 3 and 4 together with equation (4.4), it can be written:
E[Y(T) − Y(0)′ | T = t] = E[Y(T) | T = t] − E[Y(0)′ | T = t]
 = E[Y(T) | T = t] − E_Z{ E[Y(0)′ | Z, T = t] | T = t }
 = E[Y(T) | T = t] − E_Z{ E[Y(0)′ | Z, T = 0] | T = t }
 = E[Y(T) | T = t] − ∫ E[Y(0)′ | Z, T = 0] f_{Z|T=t}(z) dz    (4.5)
where f_{Z|T=t}(z) denotes the density of Z among the participants at level t. The former term is identified by the sample mean outcome of the participants at dose t, and the latter term can be estimated by adjusting the average outcomes in the no-treatment group for the distribution of Z among participants at level t, and by replacing Y(0)′ with its estimator Ŷ(0)′ identified by the first matching step.
Although the conditional independence assumption identifies the conditional potential outcomes E[Y(T) | Z] through observations on participants at level t, this identification holds only for the z for which there is a positive probability that treated units at dose t are observed with characteristics z. The common support assumption deals with this issue: let St represent the common support of Z for the t-th level, that is, the subspace of the distribution of Z that is represented both among the units treated at t and the control group. Under Assumption 4, St is the whole domain of Z for all t in T.
After a proper identification strategy has been established, the parameter of interest can be estimated. A class of matching estimators of equation (4.5) is obtained by replacing the distribution f_{Z|T=t} with its empirical counterpart in the sample of treated units and estimating non-parametrically the conditional expectation function E[Y(0) | Z = z, T = 0] from the non-participants sample. Then, a general form for the proposed two-step matching estimator is given by:

α̂(ti) = E[ yi(ti) − Σ_{j∈C} mij yj(0) ]    (4.6)
where C again represents the comparison group, the expected value is taken among units at the same treatment level ti, and mij is the new weight placed on comparison observation j for individual i, resulting from the two matching procedures. In particular,

mij = w¹ij w²ij / Σ_{j∈C} w¹ij w²ij    (4.7)

where w¹ij and w²ij are the weights placed on comparison observation j for individual i in the first and second matching respectively. It is easy to see that, if one expresses the outcome variable as a first difference, before and after the program, this matching estimator may be extended into a matching diff-in-diff estimator, as will be done in the empirical application.
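A minimal sketch of the weight combination in (4.7), with two small hypothetical weight matrices (rows index treated units, columns the controls):

```python
import numpy as np

def combined_weights(w1, w2):
    """Equation (4.7): element-wise product of first- and second-step
    matching weights, renormalized to sum to one within each row."""
    m = w1 * w2
    return m / m.sum(axis=1, keepdims=True)

# Hypothetical first- and second-step weights (2 treated x 3 controls).
w1 = np.array([[0.6, 0.4, 0.0],
               [0.0, 0.5, 0.5]])
w2 = np.array([[0.5, 0.5, 0.0],
               [0.0, 0.2, 0.8]])
print(combined_weights(w1, w2).round(3))   # each row sums to 1
```

The estimator (4.6) would then subtract, for each treated unit, the m-weighted control outcome and average the differences within each observed treatment level.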
4.6.2 Non-structured form of the selection process
When there is no information about some aspects of the selection process, or no particular restrictions or constraints are imposed on some selection variables, what might be done is to follow the part of the literature that deals with continuous treatment by adopting the so-called generalized propensity score approach (Hirano and Imbens (2004) and Imai and van Dyk (2004)). The idea is to use a unique function of the observable covariates X = (W, Z). If an appropriate specification for the propensity score is found, such that the balancing property is satisfied, and given a strong ignorability assumption, it might be sufficient to pair units with respect to the values of this propensity function. The differences with the strategies proposed in the literature mainly refer to the parameters of interest. Here the concern is with the differences between treated and non-treated units, that is, with the comparison of some outcome variables in the presence of an intervention (at a particular level of treatment) to the same variables if the program did not exist at all, and not if alternative levels of treatment were applied.
To be more precise, the underlying assumptions and the matching estimator will be clearly defined. However, this case is not so different from a binary treatment setting; the main difference is that here the set of observable covariates is composed of a larger number of variables. It is worth noting that the sets W and Z might have some common variables. It is clear that the set X will contain all these common variables together with the specific covariates of each set.
Then, the main assumptions of this approach might be written as:

Assumption 1: Conditional independence

Y(0) ⊥ D, T | X

Assumption 2: Common support

0 < P(di = 1) < 1

These assumptions imply the existence of a set of variables such that, conditioning on them, the counterfactual outcome distribution of the participants is the same as the observed outcome distribution of the non-participants. Once this balancing property has been tested, it is possible to find for each treated observation Y¹ a non-treated (set of) observation(s) Y⁰ with the same X-realization.
As in a binary treatment framework, given the common support, to obtain a measure of the treatment effect on the treated, the individual gains from the program among the subset of participants who are sampled, and for whom one can find a comparable non-participant, must be integrated over the distribution of observables among treated units and re-scaled by the measure of the common support. The only difference with respect to the binary case is that the effects have to be identified and estimated at each treatment level value of interest. Thus, the matching estimator for the ATLE at the t-th level is the empirical counterpart of

α(t) = E[Y(T) − Y(0) | X ∈ S∗, T = t]
     = ∫_{S∗} E[Y(T) − Y(0) | X, T = t] dF(X|D = 1) / ∫_{S∗} dF(X|D = 1)
where S∗ is the common support. This result represents the expected value of the program impact at treatment level t: it is the simple mean difference in outcomes over the common support S∗, weighted by the distribution of participants.
The matching estimator
In practice, to construct matches, a measure of distance between units with respect to X is again needed, in order to define the units in the comparison sample that are neighbors of each treated unit i. Then, the general form of the matching estimator is given by

α̂(ti) = E[ yi(ti) − Σ_{j∈C} aij yj(0) ]    (4.8)

where C represents the comparison group, aij is the weight placed on comparison observation j for individual i and ti stands for the observed level of treatment for the i-th unit.
4.7 Reduction of the multidimensionality
No consideration has yet been given to the distance measure to use in the matching procedures. This is not an irrelevant issue, especially if the set of covariates has a large dimension. This is what is commonly known as the dimensionality problem. Moreover, it is worth noting that these considerations apply both in the case of a non-structured and of a structured form of the selection process. In particular, in the simpler first case the multidimensional set to consider regards the variables X determining the two selection processes together. In the case of a more structured specification of the selection process, the dimensionality problem regards the two sets of covariates W and Z.
The most common method of multivariate matching is based on the Mahalanobis distance: it simply involves the estimation of the variance-covariance matrix of the set of regressors, but it has serious limitations when the number of regressors is large. If the set of covariates consists of more than one continuous variable, multivariate matching estimates contain a bias term which does not go to zero asymptotically at rate √n (Abadie and Imbens (2006)). A more feasible alternative is to match on a function of the sets of regressors, to solve the problem of a high-dimensional vector of covariates.
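For reference, a minimal sketch of Mahalanobis-distance matching for one treated unit (hypothetical data):

```python
import numpy as np

def mahalanobis(x, y, VI):
    """Mahalanobis distance between two covariate vectors, given the
    inverse VI of the estimated variance-covariance matrix."""
    d = x - y
    return float(np.sqrt(d @ VI @ d))

# Hypothetical covariates: one treated unit and a pool of 50 controls.
rng = np.random.default_rng(6)
X_c = rng.normal(size=(50, 3))
x_t = rng.normal(size=3)
VI = np.linalg.inv(np.cov(X_c, rowvar=False))
dists = [mahalanobis(x_t, xc, VI) for xc in X_c]
print(int(np.argmin(dists)))     # index of the closest control
```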
4.7.1 Structured selection process
In this case, let us consider separately the two selection processes and therefore the two sets of regressors. For the first step of the proposed procedure, which deals with the participation decision process, Rosenbaum and Rubin (1983) show that the conditional independence assumption (Assumption 1) remains valid when controlling for the propensity score p(w) = P(Di = 1|Wi) instead of W:

Y(0) ⊥ D | p(W).
The matters regarding the choice of a pre-defined criterion of proximity between the propensity scores of each treated unit and the controls, and of the weights to associate with each participant, have been discussed in the previous section 4.6.1, relative to the participation decision. The main concern regards the choice of an algorithm that assigns more than one control unit to a single treated unit, because of the sequential structure of the proposed matching procedure.
As regards the second selection process, concerning the treatment level assignment, the idea is to extend and generalize the propensity score method, applying it to a continuous treatment regime in order to reduce the dimension of the covariate set Z on which the matching algorithm is carried out. Following the work of Imai and van Dyk (2004), what is proposed here is to find an appropriate (“propensity”) function of the variables Z to model the treatment variable T. As stated by Imai and van Dyk, the propensity function is the conditional probability of the treatment given the observed covariates Z. Letting θ(Z) = E(T|Z) be the parameter that uniquely represents the propensity function, they show that, given the propensity function, the conditional distribution of the actual treatment does not depend on the observed covariates, which is the balancing score property, and that the conditional independence assumption holds with Z replaced by the propensity function. The parameter θ completely determines the propensity function. Then, matching on the propensity function can easily be accomplished by matching on θ, regardless of the dimension of Z. That is:

Y(0)′ ⊥ T | θ(Z).
The difference with the matching strategy proposed by Lu et al. (2001) regards the groups on which the pairs are formed. In particular, while Lu et al. (2001) suggest matching pairs within the group of treated units on the basis of similar values of θ but dissimilar values of the treatment dose, here the pairs are formed between the group of treated units and the non-treated group. Hence, the distance measure used no longer needs to account for the levels. This is consistent with the idea of Lu et al. (2001): the pairs are computed among units with different treatment doses, that is, treatment versus no treatment.
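A minimal sketch of this second-step device, with θ(Z) = E(T|Z) estimated by a linear model (the data and the model choice are purely illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
# Hypothetical treated sample (levels t, covariates Z) and the pool of
# matched controls M from the first step, with the same covariates.
Z_t = rng.normal(size=(100, 5))
t = np.exp(0.1 * Z_t.sum(axis=1) + rng.normal(size=100))
Z_c = rng.normal(size=(150, 5))

# theta(Z) = E(T | Z): here a linear regression fitted on treated units.
model = LinearRegression().fit(Z_t, t)
theta_t, theta_c = model.predict(Z_t), model.predict(Z_c)

# Match each treated unit to the control closest in theta, so the
# pairing works on a scalar regardless of the dimension of Z.
matches = np.array([np.argmin(np.abs(theta_c - th)) for th in theta_t])
```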
As in the binary case, given a univariate function of the observable variables, the comparison group for each treated individual is chosen with a pre-defined criterion of proximity between the values of this function for each treated unit and the controls. This criterion will also define the appropriate weights to associate with the selected set of non-treated observations for each participant. The possibilities are the same as in the binary case: nearest neighbor matching, kernel matching, radius matching, etc., with the difference that here the matching is performed twice. More details are given above in section 4.6.1. Again, different weighting schemes define different estimators, but the form of the matching estimator is again given by α̂(ti) in (4.6).
Relation between the two processes
It is not so unlikely that the two selection processes are in some sense related. In particular, it is reasonable to argue that the treatment level assignment might depend on the first participation decision rule. Consider, for example, the case of capital subsidies to firms, which is the empirical application of the next chapters: given the auction mechanism, the higher the probability of receiving the subsidy, the lower the amount of requested subsidies. On the contrary, there could be cases of a positive relation between the two processes: the higher the probability of being treated, the higher the level.
To better identify and predict the selection process, these relations might be taken into consideration, and this can be done in different ways. What is proposed here is to condition the treatment level assignment on the participation decision rule. Thus, the function of the observable variables Z used to model the treatment variable T is conditioned on the propensity score. To be precise,

θ(Z) = E(T | Z, p(W))

becomes the new parameter that uniquely represents the propensity function modelling the treatment variable T.
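A minimal sketch of this conditioning, appending the estimated propensity score p(W) to the regressors of the treatment level model (simulated data; all names are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
# Hypothetical data: W drives participation, Z drives the level.
W = rng.normal(size=(300, 3)); Z = rng.normal(size=(300, 4))
d = (W @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=300) > 0).astype(int)
t = np.where(d == 1, np.abs(Z.sum(axis=1)) + 1.0, 0.0)

# First process: p(W); second process: theta(Z, p(W)) on treated units.
p = LogisticRegression().fit(W, d).predict_proba(W)[:, 1]
tr = d == 1
regressors = np.column_stack([Z[tr], p[tr]])     # condition on p(W)
theta = LinearRegression().fit(regressors, t[tr]).predict(regressors)
```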
4.7.2 Non-structured selection process
This case is similar to the binary treatment setting: to improve the implementation of matching and to solve the dimensionality problem, a feasible alternative is to match on a function of X. The conditional independence assumption remains valid when controlling for the propensity score p(x) = P(Di = 1|Xi).

As in standard matching, when using p(x), the comparison group for each treated individual is chosen with a pre-defined criterion of proximity between the propensity scores of each treated unit and the controls. Once the neighborhood for each unit is defined, the second choice regards the appropriate weights to associate with each selected individual of the control group matched to the treated one. The possible solutions are again the same as discussed above: from a weight equal to one for the nearest observation and zero for the others, to equal weights for all, or kernel weights.
Chapter 5
The law 488/1992 in Italy
5.1 Introduction
The purpose of this chapter and the next one is to provide a statistically robust evaluation of the impact of the subsidies allocated by law 488/92 in Italy on subsidized firms, using the proposed non-parametric approach. We evaluate how the receipt of different levels of financial assistance from public funds actually makes a difference to firm performance in terms of investment, new employment, profit and labor productivity.

In particular, in the first part of the chapter a detailed description of the law and of its selection mechanism will be presented, while the rest of the chapter deals with the description of the data.

There are few studies concerning the ex-post evaluation of the impact of L488. Furthermore, in the estimation of the counterfactual all the previous papers consider L488 as a case of binary treatment, and do not exploit the richness of the data set, which includes information on the level of the subsidy by firm. We use the paper by Pellegrini and Centra (2006) to compare the results of the binary treatment approach with those of the continuous treatment approach.
5.2 Capital subsidies in Italy
State aid to the manufacturing and service sectors, in the form of grants and subsidies, has been for many years a key component of regional policy in less developed Italian areas, such as the Mezzogiorno. The use of such policy instruments has been aimed at influencing the regional allocation of investments and employment, in order to increase competitiveness, self-sustaining growth, and new employment in low-income regions.
Since the post-war period, several policy instruments have been implemented to overcome the gap between the southern regions and the most developed areas located in the North-Center of Italy. Among the different policy instruments, capital subsidies have been the most employed, either in the form of grants or of reductions in the cost of borrowing. In the last two decades they became the core of the regional policy for the South. The rationale for such interventions lies in the need to fill the gap in the availability of private capital and, consequently, to increase productivity and production capacity. Subsidies were perceived as a “compensation” for the productivity gap of the disadvantaged areas, due to the limited availability of public infrastructure and negative territorial externalities. As a consequence, the subsidized projects have never been selected on the basis of economic parameters and criteria: all investing firms could participate and benefit, the only constraint being the availability of financial resources. This approach has maximized the number of subsidized investment projects, but it has also had negative effects on policy additionality and efficiency (Pellegrini (1999)).
The end of the “extraordinary intervention”, the reduction in financial resources, and the new approach of regional policy, more oriented to efficiency, competitiveness and self-sustaining growth, imposed a radical change in the policy instruments offering financial support to private investments in the Mezzogiorno. This long phase of uncertainty ended with the design and implementation of a new regional policy, named “new programming” (Barca and Pellegrini (2002)). In this policy framework, a new instrument to subsidize investment in the so-called disadvantaged areas was implemented: L488/1992, funding private capital accumulation through project-related capital grants. The law allocates subsidies through a “rationing” system based on an auction mechanism which guarantees the compatibility of demand and supply of incentives.
L488/92 has been in the last ten years the main policy instrument to encourage private investments. From 1996 (the first operative year) through 2005, this law has sustained about 40,000 projects with over 20 billion euros of subsidies, while investments have added up to over 70 billion euros, 70% of which in the South. The expected additional employment from these investments amounts to about 560,000 new units. After ten operative years, and in view of the extent of spending on L488, it is reasonable to investigate whether the law made a difference (or not) to the industrial structure of the Mezzogiorno in terms of growth, employment and productive efficiency.
The literature aiming at an evaluation of the impact of subsidies on firm behavior is now extensive. It is generally accepted that regional capital incentives induce additional investment (Faini and Schiantarelli (1987), Harris (1991), Daly et al. (1993) and Schalk and Untied (2002)), even if they can have unpleasant effects on income inequality across different areas (Dupont and Martin (2003)). Besides, they have some effect in attracting plants to low income areas (Faini and Schiantarelli (1987) and Midelfart-Knarvik and Overman (2002)). The employment impact of capital subsidies is more doubtful: the question is whether the size of the substitution effect, associated with the reduction in the user cost of capital relative to the labor cost, is larger or smaller than the output effect, related to the increase in production (and therefore in local labor demand) due to the reduction in total costs and to the attraction of new investment in the area (Schalk and Untied (2002)). Several studies found that the substitution effect outweighs the output effect (Driehuis and van den Noord (1998), Harris (1991) and Gabe and Kraybill (2002)), while others found the opposite (Wren and Waterson (1991), Daly et al. (1993), Schalk and Untied (2002) and Rooper and Hewit-Dundas (2001)). Few studies evaluate whether the receipt of financial assistance from public funds actually makes a difference to firm performance in terms of improved plant efficiency or productivity. Increases in investment, both in additional productive capacity and in replacement investment, modernize the firm's stock of equipment and result in higher efficiency and productivity. On the other hand, capital subsidies can also have potential negative effects on productivity, by increasing allocation inefficiencies if lower relative capital costs lead firms to overinvest in capital, and by encouraging rent-seeking behavior by firms (Harris and Trainor (2005)). These studies show that the effects of subsidies on efficiency and productivity are negligible or negative (Lee (1996), Bergstrom (1998) and Harris and Trainor (2005)).
There are few studies concerning the ex-post evaluation of the impact of L488. A positive effect of L488 on investment is found in Ministero dell’Industria (2000) and in Bronzini and de Blasio (2006). Carlucci and Pellegrini (2003) and Carlucci and Pellegrini (2005) present empirical evidence of a positive employment effect; Pellegrini and Centra (2006) also find a positive effect on turnover, but not on productivity. Bronzini and de Blasio (2006) indicate the presence of (moderate) inter-temporal substitution: financed firms significantly slow down their investment activity in the years following the program. In the estimation of the counterfactual all the previous papers exploit the auction mechanism that the 488 uses to allocate the subsidies across firms. The group of subsidized firms is compared with the group of firms that applied for the incentives but were not financed, since they scored too low in the ranking. These non financed firms are especially eligible to be part of a control group, as they show a propensity for investment and a need to invest which is very similar to that of the subsidized ones. As suggested by different authors (Brown M. A. and Elliott (1995), Carlucci and Pellegrini (2003) and Bronzini and de Blasio (2006)), the rejected application group is very similar to the treatment group in terms of its characteristics, and allows the effects of the policy intervention to be isolated. Moreover, all these papers consider the 488 as a case of binary treatment, and do not exploit the richness of the data set, which includes information on the level of the subsidy by firm. Therefore, in this work, we also check whether the previous results are confirmed using a continuous treatment estimator. We will use the paper of Pellegrini and Centra (2006) to compare the results with those of the binary treatment approach. Moreover, the application includes the estimation of the entire function of average treatment effects over all possible values of the treatment levels. This function is estimated by adopting parametric and non parametric methods, using a flexible functional form. Therefore we can determine the fraction of treatment effect heterogeneity that can be attributed to different levels of treatment.
5.3 The Law 488/92
The law operates in the Mezzogiorno and in all the so-called disadvantaged areas; these areas are either designated as Objective 1, 2 or 5b for the purpose of EU Structural Funds or subject to exemptions from the ban on state subsidies. The Law 488 auctions are run on a yearly basis and take the form of project-related capital grants. Eligible for assistance are manufacturing and extractive firms; starting from 2001, the L488 scheme has been extended through separate auctions to the tourism and transport sectors. The investments qualifying for intervention under the L488 are: setting-up, extension, modernization, restructuring, reactivation and relocation.
Three main features of L488 are important for the evaluation analysis:
• the L488 makes clear the targets of the policy intervention;
• the selection mechanism of L488 identifies projects that are viable but cannot be
subsidized due to funds shortage;
• L488 operates at a regional level.
First of all, L488 is basically a national tender for incentives where the automatic allocation is based on general criteria expressing the policy preferences. For a detailed description of the L488 see, among others, Bronzini and de Blasio (2006), Carlucci and Pellegrini (2003) and Carlucci and Pellegrini (2005).
Incentives are allocated on the basis of regional competitive auctions. In each auction the investment projects are ranked on the basis of these five pre-determined criteria:
1. quota of owner capital invested in the project;
2. number of new employees per unit of investment;
3. ratio between the subsidy requested by the firm and the highest subsidy applicable, given the rules determined area by area by the EU Commission;
4. a score related to the priorities of the region in relation to location, project type
and sector;
5. a score related to the environmental impact of the project.
Criteria 4 and 5 were introduced starting from 1998 (3rd auction). The five criteria carry equal weight: the values related to each criterion are normalized, standardized and added up to produce a single score that determines the position of the project in the regional ranking. The rankings are drawn up in decreasing order of the score awarded to each project, and the subsidies are allocated to projects until the funding granted to each region is exhausted, as sketched below. There are also special rankings for large projects and reserved lists for small and medium-sized firms.
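The scoring and allocation step can be summarized by a minimal sketch (Python, pandas), assuming one row per application; all column names here are hypothetical placeholders, not the names used in the L488 administrative data.

```python
import pandas as pd

def rank_and_fund(projects: pd.DataFrame, budget: float) -> pd.DataFrame:
    """Toy version of the L488 scoring: equal-weight standardized indicators."""
    indicators = ["own_capital_share", "jobs_per_investment",
                  "subsidy_discount", "regional_priority", "env_score"]
    scored = projects.copy()
    # Standardize each indicator and add them up with equal weight.
    z = (scored[indicators] - scored[indicators].mean()) / scored[indicators].std()
    scored["score"] = z.sum(axis=1)
    # Rank projects in decreasing order of the single score.
    scored = scored.sort_values("score", ascending=False)
    # Subsidies are allocated until the regional budget is exhausted.
    scored["funded"] = scored["requested_subsidy"].cumsum() <= budget
    return scored
```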
The five indicators are a clear expression of the policymakers’ preferences. The share of own funds invested in the project can be considered an (imperfect) proxy of the entrepreneur’s assessment of the project’s viability and success: the higher the share, the greater the commitment of the owner to the project (Chiri and Pellegrini (1995)). Moreover, the share is highly correlated with the economic and financial situation of the firm: the more profitable firms choose to assign a higher share of own funds to the project (Parascandolo and Pellegrini (2001)). Hence, the subsidized firms tend to be more profitable (and more efficient) than the non subsidized ones. The number of new jobs per unit of total investment is a central indicator, used to re-equilibrate the negative substitution effect of the capital subsidy on the firm’s labor demand. The policy makers express a preference for new projects and for labor-intensive investments. In order to increase the probability of receiving the subsidy, firms can choose to overshoot the optimal (i.e. the efficient) number of people to employ in the project.
5.3.1 The higher applicable subsidy
The amount of aid requested by firms, relative to the ceilings established by the European Union, is the key indicator that transforms the allocation procedure into an auction mechanism. The maximum ratio of subsidies to total investment that a firm may request depends on geographic area and firm size. It can be easily summarized by Table 5.1, where S-M stands for small and medium size firms, while L stands for large firms. The quantity ESL stands for Equivalente Sovvenzione Lorda: it represents the equivalent gross subsidy, that is the nominal subsidy expressed as the ratio between the total financed amount and the admitted investment value, apart from the tax regime. ESN stands for Equivalente Sovvenzione Netta, that is the net benefit to enterprises: with respect to the ESL, it also considers the tax regime imposed on the firms.

Table 5.1. Intensity of the subsidies

The indicator aims to “reveal” the minimum amount of subsidy regarded by the firm as indispensable for the realization of the project. Therefore the firm can influence the likelihood of obtaining the incentive by self-reducing the “rent” granted by the subsidy, and the policy makers maximize the number of subsidized investments given the available financial resources, reducing the welfare losses due to a unique subsidy rate.
It is easy to understand that these thresholds affect the selection process, in particular the part of the selection process relative to the received level of treatment. To be clearer, consider a small firm in Campania: it can never request more than 35% of the total investment. This means that, in the matching procedure, it should not be paired with a control unit belonging, for example, to another area. If an exact matching algorithm is chosen, such a pairing is impossible. But if a propensity score approach is implemented, with a propensity function containing all the variables that affect the two selection processes, it might happen, even if the balancing property is satisfied. We are then in a case where some restrictions are imposed on the treatment level assignment rule. As mentioned in the previous chapter, it might be appropriate to split the selection process in order to consider the participation decision and the treatment level assignment separately. In the following chapter these two approaches will be compared: a “full” propensity score function versus a selection process composed of two related models. In other words, a constrained versus an unconstrained treatment level framework.
5.3.2 Why not a RDD approach?
Another interesting feature of the auction mechanism is the presence of a set of firms willing to invest and having a valid investment project, as checked by a preliminary screening carried out by a set of appointed banks. They are admitted into the ranking, but they did not receive any subsidies because their scores were too low. Consequently, every auction produces its own regional ranking and threshold. The presence of thresholds might suggest using a regression discontinuity design approach: near the threshold an experimental setting might be reproduced and treated and non-treated units might be directly compared. But the presence of multiple rankings, depending on auctions and regions, implies thresholds that are not known in advance and reduces the applicability of the method because of the low number of observations near each cut-off point. However, this characteristic of L488 implies that firms with the same propensity to be financed can be subsidized or rejected from the auction on the basis of their regional ranking.
The mentioned aspects reveal themselves to be important features supporting the policy evaluation analysis. The procedure uses the indicators as selection variables: therefore the indicators can explain most of the differences between the group of subsidized and the group of non subsidized firms. This is of paramount importance for the construction of the counterfactual scenario in the evaluation analysis. Since the indicators are observable, we can reconstruct the selection process, estimating the selection effect in the control group. Moreover, the non subsidized firms are eligible to be part of a control group, as they show a propensity for investment and a need to invest which is very similar to that of the subsidized firms (Brown M. A. and Elliott (1995), Carlucci and Pellegrini (2003) and Bronzini and de Blasio (2006)). The law regulation also imposes that financing under L488 cannot be combined with other sources of public financing. Firms applying for the L488 subsidies have to give up other public subsidies, reducing the possibility of double subsidization. The regional ranking produced by L488 is another important feature for implementing the evaluation analysis. Firms with the same levels of the selection variables can be financed or not depending on the regional threshold. As a consequence, different groups of financed and non financed firms with respect to different regional thresholds are available and, at the national level, an overlapping area of firms with the same propensity to be subsidized or not is obtained. This makes it possible to compare firms with the same characteristics but with different selection results, as the matching evaluation technique requires. Therefore the L488 mechanism allows the treatment group to be reproduced among the non treated, re-establishing the experimental conditions in a non-experimental setting, and the correct counterfactual for the evaluation analysis to be constructed by the matching method.
5.4 The data implementation
The data used in the analysis come from two different sources: the L. 488 administrative dataset and AIDA, which contains the budgets delivered by a subset of Italian firms to the Chambers of Commerce. The integration of the different sets of data has required a complex process of cleaning and merging.

First of all, the identification of the eligible projects is carried out. The financed projects group (treated group) consists of all the “winning” (i.e. funded) projects in the Mezzogiorno (Objective 1 regions, excluding Abruzzo) according to the rankings of all regional auctions. Projects are eligible for the control group if they are in the manufacturing sector and were admitted to evaluation in the regional auctions but not financed (“losers”). Projects that were funded in other auctions (special actions dedicated to Northern regions, to areas devastated by an earthquake, or to the tourism and retail sectors) or via special regional rankings were discarded from the control group. Projects that applied to more than one L. 488 auction and were financed once have also been excluded from the control group.
Particular attention was dedicated to the selection of the projects that did not present anomalies and irregularities for the analysis: anomalous projects were intentionally discarded, in order not to affect the results through programs inherited from previous instruments or through projects still in progress. Financed projects whose investment program had not yet concluded were discarded. Both the treated and the control group were subsequently cleaned of the projects whose year of conclusion (actual and scheduled, respectively) preceded the year of the auction.^1 Another group of discarded projects is represented by the programs started (or scheduled to start) before the year preceding the publication of the auction. Since their activation cannot be directly linked to L. 488, these projects must be regarded as anomalous. Finally, all the projects started (or scheduled to start) after 1999 have been discarded: the choice is motivated by the impossibility of evaluating these projects, since project information after a sufficient temporal lag following their conclusion would be missing.
The third step regards the integration of the L. 488 dataset with the AIDA budgetary data. A more detailed description of this dataset integration can be found in Appendix A. After verifying that the cleaning and integration procedures do not have a different impact on the financed projects and on the control group, the attention focused on the final dataset on which the evaluation model has been implemented. It consists of 665 financed projects and 1,228 non financed projects for the years 1996-2000.

For the validation of the control group, a comparison of the main characteristics of the projects and of the ranking indicators in the samples of subsidized and non subsidized firms is presented. Similarities between the two samples in the budgetary data, both for the year before the start of the project (year 0) and for the year following the conclusion of the investment program (year 1), are analyzed. The results are presented in Tables 5.2 and 5.3.^2
As far as the distribution of projects is concerned, a substantial homogeneity between the two groups is found, according to region, firm dimension, economic activity sector and investment typology. Some differences are found for large firms, which represent 11.2% of the control group and 18.5% of the treatment group; similarly, the share of small enterprises turns out to be smaller in the financed projects group (54.1% vs. 66.8%). Regarding the economic activity sector, the largest difference regards the food industry, whose share is more than double in the control group (8.6% vs. 19.4%). The distribution according to project typology does not show remarkable differences.
^1 The incongruity derives from the fact that in the first two auctions (years 1996 and 1997) the L. 488 inherited applications from the previous incentive instrument (law 64/1986), which was closed in 1992.
^2 Source: elaboration of Pellegrini and Centra (2006) on L488 and AIDA data. We have deflated the profitability and leverage variables by the investment deflator (base=2000).
The levels of the indicators, crucial parameters for the application of the model, show substantial homogeneity between the two groups. The analysis of the median values of the indicators does not indicate strong differences between the two samples in the budgetary data. The subsidized firms are slightly larger, more profitable and more capital intensive, as expected, than the non subsidized firms.
As regards the different time spans of the auctions, the dataset contains information on the years of the effective beginning and end of the investment project. Exploiting this information, we can estimate the impact of the subsidy by comparing the balance sheet of the year before the project is actually started with that of the year after the investment is actually concluded.^3 The information is relevant, because the L. 488 procedure requires neither that the investment project be actually started by the time of the first subsidy instalment, nor that it be over within two years of the beginning (in this case, however, the payments of the following instalments are also lagged). In the analysis, the time span can therefore differ for each project, depending on the beginning and the end of the investment. A correct matching procedure requires imputing an ending date for the non subsidized firms as well. The adopted hypothesis is that the ending date equals the scheduled starting date of the investment augmented by the average investment period by auction, calculated on the subsidized firms sample.

^3 This is the date corresponding to the end of the inspection carried out by the appointed bank. Therefore, it can overestimate the length of the investment period by 6-12 months.
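The imputation rule just described lends itself to a short sketch. The following Python fragment assumes hypothetical columns ('auction', 'start_year', 'end_year', 'treated'); it illustrates the rule, and is not the code actually used on the dataset.

```python
import pandas as pd

def impute_control_end_year(df: pd.DataFrame) -> pd.DataFrame:
    """Impute an ending year for non subsidized projects (sketch)."""
    out = df.copy()
    treated = out[out["treated"] == 1]
    # Average investment length by auction on the subsidized firms sample.
    avg_length = (treated["end_year"] - treated["start_year"]).groupby(
        treated["auction"]).mean()
    # Controls: scheduled start plus the auction-specific average length.
    ctrl = out["treated"] == 0
    out.loc[ctrl, "end_year"] = (out.loc[ctrl, "start_year"]
                                 + out.loc[ctrl, "auction"].map(avg_length).round())
    return out
```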
Table 5.2. Distribution of projects according to main characteristics (1996-2000)
Table 5.3. Summary of main covariates in the final dataset (1996-2000)
Chapter 6
Application

6.1 Introduction
This part of the thesis shows the results of the empirical application: the impact of the subsidies allocated by law 488/92 in Italy on subsidized firms is estimated using the proposed nonparametric approach. The continuous treatment setting is appropriate in the case of firm incentive programs, where firms receive different amounts of subsidies. Even more, in this situation it is not unlikely to find non random treatment level assignment. That is, we are far from an experimental data framework because there is a non random selection process not only in the participation decision but also with respect to the treatment level assignment. This means that law 488 determines a deliberate selection process: the resulting selection bias problem has been tackled using the proposed estimation method. The empirical findings are the core of this chapter.
6.2 The treatment variable: amount of subsidies
As repeatedly noted, the treatment variable plays a fundamental role in program evaluation in a continuous treatment context. The concern of this section is therefore the continuous treatment variable of the empirical application. As mentioned before, law 488 subsidizes private capital accumulation: a natural continuous treatment variable for the study is then the amount of these subsidies received by the treated firms. However, a simple descriptive analysis of this variable (Table 6.1) reveals some limitations in adopting it as the treatment variable. By construction, it depends on the investment the subsidies refer to: this implies a heterogeneous distribution of its values, depending mainly on the size of the firms.
Granted Subsidies (Euros)
Percentiles              Smallest
 1%       44340.23        20965.89
 5%      103767.5         24106.71
10%      167532.8         35717.29
25%      309356.7         38134.29      Obs.            665
50%      587821.9                       Sum of wgt.     665
                         Largest        Mean        1440201
75%      1187916         1.46e+07       Std. Dev.   5896441
90%      2599559         1.47e+07       Variance   3.48e+13
95%      4428788         1.62e+07       Skewness   21.27131
99%      1.37e+07        1.44e+08       Kurtosis   509.3261
Table 6.1. Summary of the granted subsidies
In order to reduce this source of variability and to limit the range of the treatment variable, what is proposed here is to use the ratio of the subsidies to the investment. In this way the treatment variable takes values in the interval [0,1]. Throughout the analysis this indicator will be called quota. Table 6.2 and Figure 6.1 report a descriptive and a graphical analysis of this variable.
Percent share of subsidies on the investment
Percentiles              Smallest
 1%      14.018            8.595
 5%      21.932            9.825
10%      27.289            9.827
25%      37.538           11.462        Obs.            665
50%      45.921                         Sum of wgt.     665
                         Largest        Mean         45.478
75%      54.013           77.308        Std. Dev.    13.132
90%      61.941           78.241        Variance    172.448
95%      66.690           78.513        Skewness     -0.086
99%      75.731           94.637        Kurtosis      3.164
Table 6.2. Summary of the percent share of subsidies on the total investment
The distributional graph of the treatment variable shows that the share of requested subsidies on the total investment has a roughly symmetric distribution around its mean value, equal to 46%.

Figure 6.1. Distribution of the treatment variable
6.3 The outcome variables
As mentioned before, the aim of this application is to provide a statistically robust evaluation of the impact of subsidies to capital accumulation on subsidized firms. In particular, we want to evaluate whether the receipt of financial assistance from public funds actually makes a difference in firm performance in terms of investment, new employment, profit and labor productivity. As outcome variables, on which we will estimate the treatment effects, four sets of variables are considered: firm growth (turnover, number of employees, fixed assets), profitability (gross margin/turnover), productivity (per capita turnover) and leverage (debt charges/turnover). For turnover, employment, fixed assets and per capita turnover, estimates of treatment effects are computed as the difference in (weighted average) growth rates between subsidized and non subsidized firms; for the other variables, as the difference in levels. Results are based on the non subsidized firms sample where the ending year is imputed on the basis of the average investment length by auction in the subsidized firms sample.
                              Obs.                        Mean
                        Financed  Not Financed    Financed  Not Financed
Turnover                    493        780           37.95        34.32
Employment                  517        742           49.94        42.51
Fixed Assets                521        776          116.40        97.35
Gr. Margin/Turnover         422        675        0.000036     0.000582
Per Capita Turnover         501        753            5.00        11.38
Debt Charges/Turnover       503        759        -0.00559     -0.00533
Table 6.3. Summary of outcome variables
The presence of several anomalous data points (as signalled by the large difference between median and mean across indicators) suggests selecting only firms with non-negative values for turnover, employment and assets, and trimming the subsidized and non subsidized firm samples at the 5th and 95th percentiles.^1 Table 6.3 and Figures B.1 to B.6 in Appendix B report, for this selected sample, the mean values of the outcome variables by treatment status. The selected sample of firms has a different distribution of the outcome variables with respect to treatment status. In particular, financed firms seem to have higher values in terms of firm growth, substantially equal values for profitability and leverage, and lower values for productivity. By applying our methods, we will provide a statistically robust evaluation of this heterogeneity between the two groups: we will be able to measure how much of this difference is caused by program participation.

^1 A similar procedure has been applied by Bronzini and de Blasio (2006).
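A compact sketch of the outcome construction and trimming described above may help fix ideas; the column names and the exact trimming rule are assumptions for illustration.

```python
import pandas as pd

def build_outcomes(df: pd.DataFrame) -> pd.DataFrame:
    """Outcomes between year 0 (before start) and year 1 (after conclusion)."""
    out = pd.DataFrame(index=df.index)
    # Growth-rate outcomes (percent growth between year 0 and year 1).
    for var in ["turnover", "employment", "fixed_assets", "pc_turnover"]:
        out[var] = 100 * (df[var + "_y1"] - df[var + "_y0"]) / df[var + "_y0"]
    # Ratio outcomes enter as differences in levels.
    for var in ["gross_margin_turnover", "debt_charges_turnover"]:
        out[var] = df[var + "_y1"] - df[var + "_y0"]
    # Trim at the 5th and 95th percentiles to curb anomalous values.
    lo, hi = out.quantile(0.05), out.quantile(0.95)
    return out[(out.ge(lo) & out.le(hi)).all(axis=1)]
```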
6.4 Structured form of the selection process
The first step in the evaluation procedure is the specification of the propensity score model identifying the participation decision process. We adopted a logit specification for the treatment variable. For the identification of the covariates, we take advantage of the selection mechanism used to allocate the incentives under the L. 488 by including the selection indicators in the propensity score equation. The main ranking indicators are introduced in levels, squared and cubed. Dummy variables relative to the different auctions are considered because they reflect some specific characteristics, especially relative to the admission of projects already concluded. Moreover, the interaction between the main indicators and dimension (large dimension by the European Union Commission definition) is introduced in the model specification. Regional and sectorial indicators are also considered. The first set of indicators allows us to control for differences in regional rankings and thresholds; the sectorial dummies take into account the productive heterogeneity of the firms attending the auctions. The final specification of the logit model for the propensity score and the parameter estimates are in Table 6.4. To test the "balancing hypothesis" we have followed the procedure proposed in Ichino and Becker (2002). Splitting the sample by propensity score into 7 blocks, we verified that the balancing hypothesis is satisfied.

Table 6.4. Structured form: Propensity score estimate
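A minimal sketch of this first step follows. The block construction here uses simple propensity score quantiles as a stand-in for the iterative splitting of Ichino and Becker (2002), and the covariate names are placeholders.

```python
import pandas as pd
import statsmodels.api as sm
from scipy import stats

def propensity_and_blocks(X: pd.DataFrame, treated: pd.Series, n_blocks: int = 7):
    """Logit propensity score plus a simple balancing check by blocks."""
    logit = sm.Logit(treated, sm.add_constant(X)).fit(disp=0)
    pscore = pd.Series(logit.predict(sm.add_constant(X)), index=X.index)
    # Split the sample into blocks of the estimated propensity score.
    blocks = pd.qcut(pscore, n_blocks, labels=False)
    # Within each block, t-test every covariate between treated and controls.
    for b in range(n_blocks):
        in_b = blocks == b
        for col in X.columns:
            t, p = stats.ttest_ind(X.loc[in_b & (treated == 1), col],
                                   X.loc[in_b & (treated == 0), col])
            if p < 0.05:
                print(f"block {b}: covariate {col} looks unbalanced (p={p:.3f})")
    return pscore, blocks
```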
The second step of the evaluation procedure is the specification of the treatment level model. As mentioned before, to account for the relation between the participation decision and the treatment level assignment, different specifications for the share of subsidies are estimated at different points of the propensity score distribution. In particular, we split the sample into blocks with equal average propensity score between treated and non treated units and estimate a linear regression model for the share of subsidies within each chosen block of the propensity score. In other words, we use the 7 blocks identified by the balancing hypothesis procedure for the propensity score estimation. In order to select which covariates to include in the analysis, the standard goodness of fit statistics together with the common model specification tests have been carried out. Some covariates are common to each model: they mainly refer to the variables reflecting the subsidy limitations imposed by the law (firm size and areas). Other variables may be found in some specifications but not in others: they are the sectorial dummies, the kind of investment qualifying for the intervention and the amount of debt charges on the debt stock. Table 6.5 reports the linear regression estimates for the 2nd block of the propensity score. Given the estimated coefficients of the models and the values of the covariates, the predicted values were also computed for the group of non-participants, in order to use this variable in the second step of the proposed matching procedure. To test the "balancing hypothesis" we have followed the same procedure proposed in Ichino and Becker (2002), properly adapted to the specific case. We verified that the balancing hypothesis is satisfied for each model.

Table 6.5. Structured form: Level of treatment estimate
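A minimal sketch of this second step, under the same caveats (hypothetical column names, statsmodels as an assumed toolchain), is the following.

```python
import pandas as pd
import statsmodels.api as sm

def predict_treatment_level(df: pd.DataFrame, covariates: list,
                            blocks: pd.Series) -> pd.Series:
    """Block-wise OLS for the share of subsidies ('quota'), fitted on
    treated units and used to predict a level for every unit."""
    predicted = pd.Series(index=df.index, dtype=float)
    for b in sorted(blocks.unique()):
        in_b = blocks == b
        train = df[in_b & (df["treated"] == 1)]
        ols = sm.OLS(train["quota"], sm.add_constant(train[covariates])).fit()
        # Predict the treatment level for all units in the block,
        # participants and non-participants alike.
        X_all = sm.add_constant(df.loc[in_b, covariates], has_constant="add")
        predicted.loc[in_b] = ols.predict(X_all)
    return predicted
```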
6.4.1 Impact of the Law 488
Once the two models have been estimated, we compute the two-step matching (diff-in-diff) procedure. Among the matching with replacement methods proposed in the literature we have chosen, for each step, radius matching, with four different sizes for the radius:

• a predetermined radius equal to 10% of the propensity score (or function) range (that is, equal to 0.1);
• a radius equal to the mean of all distances between treated and non treated units;
• a radius equal to the maximum of the minimum of all distances;
• a radius equal to the standard deviation of all distances between treated and non treated units.

Then average treatment level effects (ATLE) are estimated by equation (4.6). In order to have a general overview of these effects, Tables 6.6 to 6.9 report the “total” average treatment effect estimates for the different radius sizes. They have the same interpretation as average treatment effects (TTE) in the traditional binary case; a minimal sketch of the radius matching step is given after the tables below.

Outcomes                    Subs.   Not Subs.      TTE   Std. Er.   T-test
Turnover                      410         592   16.811      9.412    1.786
Employment                    430         555   24.796     11.017    2.251
Fixed asset                   442         591   39.898     26.318    1.516
Gr. Margin/Turnover           342         503   -0.009      0.008   -1.118
Per capita Turnover           416         568   -6.373      7.840   -0.813
Debt Charges/Debt Stock       435         573    0.004      0.003    1.567
*radius equal to 10% of the propensity score range
Table 6.6. Structured form: TTE estimates (radius = fix*)

Outcomes                    Subs.   Not Subs.      TTE   Std. Er.   T-test
Turnover                      490         592    8.893      3.728    2.385
Employment                    514         555   10.995      4.503    2.442
Fixed asset                   518         591   27.470     10.128    2.712
Gr. Margin/Turnover           417         503   0.0002      0.004    0.047
Per capita Turnover           500         568   -5.312      3.091   -1.718
Debt Charges/Debt Stock       501         573   0.0005      0.001    0.385
*radius equal to the mean distance
Table 6.7. Structured form: TTE estimates (radius = mean*)
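The radius matching step referred to above can be sketched as follows for a scalar matching variable (the propensity score in the first step, the predicted treatment level in the second); the four radius choices are reproduced in the comments. This is an illustration under simplified assumptions, not the exact estimator implementation.

```python
import numpy as np

def radius_matching_tte(s_treated, s_control, y_treated, y_control, radius):
    """Radius matching with replacement on a scalar score (sketch).

    For each treated unit the counterfactual is the mean outcome of all
    controls whose score lies within `radius`; treated units with no
    control inside the radius are dropped from the average."""
    effects = []
    for s_i, y_i in zip(s_treated, y_treated):
        inside = np.abs(s_control - s_i) <= radius
        if inside.any():
            effects.append(y_i - y_control[inside].mean())
    return float(np.mean(effects))

# The four radius choices used in the text:
# dist = np.abs(s_treated[:, None] - s_control[None, :])
# r_fix    = 0.10 * (scores.max() - scores.min())  # 10% of the score range
# r_mean   = dist.mean()                           # mean of all distances
# r_maxmin = dist.min(axis=1).max()                # max of the minimum distances
# r_std    = dist.std()                            # std of all distances
```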
Outcomes                    Subs.   Not Subs.      TTE   Std. Er.   T-test
Turnover                      438         592   20.185     11.978    1.685
Employment                    465         555   27.663     14.198    1.948
Fixed asset                   473         591   39.321     33.747    1.165
Gr. Margin/Turnover           374         503   -0.011      0.011   -0.980
Per capita Turnover           460         568   -8.759     10.286   -0.852
Debt Charges/Debt Stock       451         573    0.005      0.003    1.393
*radius equal to the maximum of the minimum of all distances
Table 6.8. Structured form: TTE estimates (radius = maxmin*)

Outcomes                    Subs.   Not Subs.      TTE   Std. Er.   T-test
Turnover                      471         592   12.195      5.080    2.401
Employment                    497         555   16.496      6.283    2.625
Fixed asset                   501         591   26.065     14.401    1.810
Gr. Margin/Turnover           400         503   -0.002      0.005   -0.359
Per capita Turnover           480         568   -5.615      4.267   -1.316
Debt Charges/Debt Stock       183         573    0.001      0.002    0.916
*radius equal to the standard deviation of all distances
Table 6.9. Structured form: TTE estimates (radius = std*)

As expected, the growth impact of the 488 on subsidized firms is positive and statistically significant: turnover growth is from 9 to 12 points higher in the subsidized firms than in the non subsidized ones, depending on the radius of the matching algorithm; the number of employees is from 11 to 25 percentage points higher and fixed assets increase by up to 27 percentage points. These results are all statistically significant. The average time span being equal to around 3.5 years, the additional annual employment growth rate imputed to L. 488 fluctuates from 3.1 to 7.1 percentage points. The impact on growth in fixed assets is also very high (an annual average of more than 7.7 points). In general, the results confirm the findings reported in Carlucci and Pellegrini (2003) using a parametric approach and those of Bernini et al. (2006) using a matching approach in a binary case. Moreover, they do not necessarily contradict the presence of the investment intertemporal substitution shown by Bronzini and de Blasio (2006), even if the amount of additional capital installed in subsidized firms over the period is large. However, for fixed assets the statistical significance of the additional effect of the incentive is lower.
The effects of the 488 are in line with its (explicit or less explicit) targets: the subsidized firms have invested more than the non subsidized ones, achieving more turnover, employment and fixed assets. The question is whether the subsidized firms have also increased
their efficiency, measured by productivity and profitability, in order to maintain a positive long run growth rate. Profitability is measured by the gross margin on turnover. The results show a slightly negative difference for the subsidized firms, but the difference is not statistically significant. However, the total amount of profits, like turnover, has increased faster in subsidized firms. Productivity is proxied by per capita turnover. The impact of L. 488 is negative but not statistically significant. This negative average effect also emerges in the work of Bernini et al. (2006), where a matching approach is followed. There, however, the result is statistically significant: labor productivity growth is 4-8% higher in non subsidized firms. The authors give some explanations for this productivity gap. They argue that if the investment productivity curve is decreasing, the reduction in the investment cost generated by the subsidy drives the subsidized firm to invest in projects with a lower than average productivity. Furthermore, if the option for a given sector is between investing and restructuring, the non subsidized firms may have chosen to restructure, increasing productivity, whereas the subsidized firms may have chosen to invest, increasing production and employment. The fact that our method returns a non significant result can be attributed to the two step matching procedure: when only one matching is computed, that is in the traditional binary treatment case, pairs may be formed among units which are more distant in terms of their propensity score values than in the double matching case. This can reduce the significance of the average treatment effect in the latter case.
As expected, the debt cost on turnover is slightly higher in the subsidized than in the non subsidized firms, but the average effect is not statistically significant. The reason can be traced to the increase in debt that the subsidized firms had to face in order to finance the new investment, whereas the non subsidized firms could have refrained from investing. However, the subsidy has not radically changed the financial state of the firms.
These results show only an average tendency of the treatment effects, but it is easy to understand that the effects of L. 488 can be non homogeneous across firms. On the basis of the estimated ATLE, our method allows us to compute the impact of the amount of subsidy on the treatment effect. In order to investigate whether the treatment level affects the response variable differently, we use an OLS estimator imposing a quadratic relation between effects and level of subsidies. We restrict the analysis to the outcome variables with a significant average treatment effect (see Tables 6.6 to 6.9). Estimates are reported in Table 6.10. The results show a significant quadratic relation between effects and treatment levels for turnover and employment. In particular, given the sign of the coefficients for the level variable (quota), the analysis evidences an increasing positive impact of the amount of incentives for low values of the treatment level with respect to non treated firms. After reaching a maximum level, the effect of the subsidies on the outcomes is decreasing. This tendency can be better viewed by graphing the predicted values of these OLS estimates (Figure 6.2). Apart from fixed assets, which returns non significant coefficients, the analysis evidences an increasing positive impact of the amount of incentives until the grant is equal to 40-50% of the total investment. After this peak, the effect of the subsidies on the outcomes is decreasing: firms investing less than half of their own capital in the project achieve less favorable performance. The results therefore evidence that the additional effect of large grants, relative to the level of the firm's investment, is low.

                      Employment                           Turnover            Fixed Assets
           radius=fix  radius=mean  radius=std   radius=mean  radius=std     radius=mean
quota          2.998*       3.439*      3.769*        1.159       3.348*          0.108
quota2        -0.035*      -0.034*     -0.039*       -0.012      -0.040*         -0.009
const         -31.91       -68.55*     -65.86*       -15.38      -49.39*          40.18
Obs.              58           65          61           65           58              64
Ad. R2         0.089        0.201       0.127       -0.003        0.187           0.008
Prob. F        0.029        0.000       0.007        0.405        0.001           0.290
*p<0.05
Table 6.10. OLS estimates: impact of the amount of subsidies on the treatment effect (structured case)

Figure 6.2. Predicted OLS estimates for treatment impact on the amount of subsidy (structured case): Turnover (radius=std, mean), Employment (radius=10%, mean, std), Fixed Assets (radius=mean)
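The quadratic fit is straightforward; a sketch (with statsmodels, hypothetical inputs) is given below. With a positive linear and a negative quadratic coefficient, the implied peak is at quota = -b1/(2 b2): for instance, coefficients of about 3.4 and -0.034 put the peak near 50%, consistent with the 40-50% range reported above.

```python
import numpy as np
import statsmodels.api as sm

def quadratic_effect_curve(quota: np.ndarray, effect: np.ndarray):
    """OLS of the matched effects on quota and quota squared (sketch)."""
    X = sm.add_constant(np.column_stack([quota, quota ** 2]))
    fit = sm.OLS(effect, X).fit()
    _, b1, b2 = fit.params
    # With b1 > 0 and b2 < 0 the fitted parabola peaks at -b1 / (2 * b2).
    return fit, -b1 / (2.0 * b2)
```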
In order to better detect heterogeneity of the effects with respect to the level of treatment, consistently with the recent literature on the heterogeneity of effects in the program evaluation context (see, among others, Athey and Imbens (2006)), we also estimate this relation by quantile regression. Estimates of the quantile regressions are reported in Tables 6.11 and 6.12.

The ability of quantile regression models to characterize the heterogeneous impact of variables at different points of an outcome distribution makes them appealing for evaluating the effects of policy interventions. Restating our problem of evaluating the effects of policy interventions in a quantile regression framework allows us to investigate whether treatment groups have benefited differently from the treatment and to provide an analytical description of the distribution of the effects with respect to the amount of granted subsidies. The results confirm a significant quadratic relation between average effects and levels for low percentile values of the dependent variable, especially for employment.
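The quantile regressions can be reproduced with the quantreg routine in statsmodels; note that the tables above use bootstrapped standard errors, whereas the sketch below relies on the package defaults, and the variable names are assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

def quantile_effect_curves(df: pd.DataFrame,
                           quantiles=(0.05, 0.25, 0.50, 0.75, 0.95)):
    """Quantile regressions of the effect on quota and quota squared.

    `df` is assumed to hold the estimated effects ('effect') and the
    treatment level ('quota') for the matched treated units."""
    model = smf.quantreg("effect ~ quota + I(quota ** 2)", df)
    return {q: model.fit(q=q) for q in quantiles}
```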
This tendency is also confirmed by adopting a non parametric estimator for the relation between effects and treatment levels. Figures 6.3 and 6.4 show the results of the non parametric mean regression estimator (using the Nadaraya-Watson kernel estimator, Nadaraya (1964)) for the outcome variables.
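For completeness, a self-contained Nadaraya-Watson estimator with a Gaussian kernel, of the kind underlying Figures 6.3 and 6.4, can be written in a few lines; the bandwidth choice is left to the user.

```python
import numpy as np

def nadaraya_watson(x, y, grid, bandwidth):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (sketch)."""
    x, y, grid = map(np.asarray, (x, y, grid))
    # Kernel weight between every grid point and every observation.
    u = (grid[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2)
    # Locally weighted average of the outcomes at each grid point.
    return (k * y[None, :]).sum(axis=1) / k.sum(axis=1)
```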
Again, the results confirm heterogeneity of the treatment outcome with respect to different levels of treatment, especially for the outcome variable relative to employment growth. In particular, an increasing relation is detected for low values of subsidies; after a peak is reached, a decreasing relation follows. This profile is robust to the radius size adopted in the matching procedure. For turnover growth, this kind of relation is detected only when choosing a radius equal to the mean distance. The value at which the maximum occurs differs for the two outcome variables: for turnover it is between 25 and 35% of the share of subsidies on the investment; for employment the impact is increasing until the grant equals 50-60% of the total investment. For fixed assets the relation between average effects and treatment level seems to be non significant, as confirmed by applying the parametric approaches above.

              radius=fix  Pseudo R2   radius=mean  Pseudo R2   radius=std  Pseudo R2
q5   quota      8.898*      0.500        6.051*      0.373        8.543*     0.536
     quota2    -0.104*                  -0.068*                  -0.093*
     const    -174.5*                  -141.7*                  -191.9*
q25  quota      5.436*      0.252        5.990*      0.255        4.759      0.197
     quota2    -0.060*                  -0.063*                  -0.052*
     const    -104.9*                  -133.9*                   -96.19
q50  quota      2.341       0.086        3.858*      0.119        2.377      0.031
     quota2    -0.033                   -0.038*                  -0.028
     const      -9.5                    -81.7*                   -31.76
q75  quota     -3.773       0.018        1.268       0.069       -0.052      0.009
     quota2     0.002                   -0.008                    0.002
     const     47.81                    -22.9                     25.7
q95  quota     -3.174       0.090        2.667       0.047        2.345      0.011
     quota2     0.025                   -0.023                   -0.023
     const     158.3                    -17.3                      2.6
*p<0.05, bootstrapped standard errors
Table 6.11. Quantile estimates: Employment (structured case)
                 Turnover                                        Fixed assets
              radius=mean  Pseudo R2   radius=std  Pseudo R2   radius=mean  Pseudo R2
q5   quota      4.415*      0.288        8.008*      0.428        7.058*      0.108
     quota2    -0.049*                  -0.098*                  -0.075*
     const    -104.5*                  -161.5*                  -190.9*
q25  quota      3.084*      0.159        3.063       0.177        3.041       0.049
     quota2    -0.035*                  -0.039                   -0.036
     const     -64.3*                   -51.5                    -65.7
q50  quota      1.507       0.030        2.770       0.107        1.152       0.010
     quota2    -0.016                   -0.037                   -0.014
     const     -23.2                    -33.1                      1.7
q75  quota     -1.377       0.015       -0.647       0.033       -3.879       0.030
     quota2     0.017                    0.003                    0.038
     const      41.5                     47.71                   143.5
q95  quota     -2.497       0.147       -0.889       0.101      -12.500       0.145
     quota2     0.033                    0.007                    0.108
     const      83.5                     68.5*                   443.7
*p<0.05, bootstrapped standard errors
Table 6.12. Quantile estimates: Turnover and fixed assets (structured case)
Figure 6.3. Kernel estimates for treatment impact on the amount of subsidy (structured case): Employment (radius=fix, mean, std)

Figure 6.4. Kernel estimates for treatment impact on the amount of subsidy (structured case): Turnover (radius=mean, std), Fixed assets (radius=mean)
6.5 Non-structured form of the selection process
By adopting a unique specification for the selection process, we include in the model all the covariates entered in the two steps of the previous section (6.4). Again, for the propensity score model we adopted a logit specification of the binary treatment variable. The parameter estimates are in Table 6.13.

Table 6.13. Non structured form: Propensity score estimate

To test the "balancing hypothesis" we have followed the procedure proposed in Ichino and Becker (2002). Splitting the sample by propensity score into 10 blocks, we verified that the balancing hypothesis is satisfied.
6.5.1 Impact of the Law 488
Once the propensity score model has been estimated, we compute the matching (diff-in-diff) procedure. Again, among the matching with replacement methods proposed in the literature we have chosen radius matching, with the four different radius sizes specified above. Then average treatment level effects (ATLE) are estimated by equation (4.8). In order to have a general overview of these effects, Tables 6.14 to 6.17 report the “total” average treatment effect estimates for the different radius sizes. They have the same interpretation as average treatment effects (TTE) in the traditional binary case.
Outcomes                    Subs.   Not Subs.      TTE   Std. Er.   T-test
Turnover                      493         614   22.175     11.492    1.930
Employment                    517         576   29.242     13.161    2.222
Fixed asset                   521         610   37.923     29.964    1.266
Gr. Margin/Turnover           422         525   -0.009      0.010   -0.871
Per capita Turnover           501         594   -6.286      9.149   -0.687
Debt Charges/Debt Stock       503         596    0.005      0.003    1.430
*radius equal to 10% of the propensity score range
Table 6.14. Non-structured form: TTE estimates (radius = fix*)
Outcomes                    Subs.   Not Subs.      TTE   Std. Er.   T-test
Turnover                      493         614    8.598      3.555    2.419
Employment                    517         576   11.410      4.269    2.673
Fixed asset                   521         610   24.180      9.368    2.581
Gr. Margin/Turnover           422         525    0.001      0.003    0.344
Per capita Turnover           501         594   -5.220      2.892   -1.805
Debt Charges/Debt Stock       503         596    0.001      0.001    0.771
*radius equal to the mean distance
Table 6.15. Non-structured form: TTE estimates (radius = mean*)
Outcomes                    Subs.   Not Subs.      TTE   Std. Er.   T-test
Turnover                      493         614   23.869     14.707    1.623
Employment                    517         576   32.484     17.057    1.904
Fixed asset                   521         610   46.157     38.573    1.197
Gr. Margin/Turnover           422         525   -0.013      0.014   -0.959
Per capita Turnover           501         594   -8.333     11.914   -0.699
Debt Charges/Debt Stock       503         596    0.006      0.004    1.412
*radius equal to the maximum of the minimum of all distances
Table 6.16. Non-structured form: TTE estimates (radius = maxmin*)
Outcomes                    Subs.   Not Subs.      TTE   Std. Er.   T-test
Turnover                      493         614   11.291      4.767    2.396
Employment                    517         576   15.894      5.801    2.732
Fixed asset                   521         610   20.905     12.689    1.647
Gr. Margin/Turnover           422         525   -0.000      0.005    0.106
Per capita Turnover           501         594   -6.160      3.957   -1.557
Debt Charges/Debt Stock       503         596    0.001      0.001    0.843
*radius equal to the standard deviation of all distances
Table 6.17. Non-structured form: TTE estimates (radius = std*)
As above, the growth impact of the 488 on subsidized firms is positive and statistically significant: turnover growth is from 9 to 11 points higher in the subsidized firms than in the non subsidized ones, depending on the radius of the matching algorithm; the number of employees is from 11 to 29 percentage points higher and fixed assets increase by up to 24 percentage points. These results are all statistically significant. Also for productivity, profitability and financial state the results are quite similar to the structured case: there are no significant differences between financed and non financed firms.
Some differences between the two procedures (structured vs. non structured) emerge when analyzing the relation between treatment levels and response variables. Again, we restrict the analysis to the outcome variables with a significant average treatment effect. Table 6.18 reports the OLS estimates, together with the graphical representation of the predicted values (Figure 6.5). The results show a significant quadratic relation between effects and treatment levels only for employment growth. The profile is the same as before, with an increasing positive impact of the amount of incentives for low values of the treatment level with respect to non treated firms. After reaching a maximum level, the effect of the subsidies on the outcomes is decreasing. For the turnover and fixed asset variables the regression returns non significant coefficients. In some sense, the structured case is better able to detect the heterogeneity of the effects with respect to different doses; this can be explained by a more efficient estimation of the selection process, which exploits more information. This finding is also confirmed by the goodness of fit values: the structured case returns higher values of the adjusted R2 (see Table 6.10 and Table 6.18).
                      Employment                           Turnover            Fixed Assets
           radius=fix  radius=mean  radius=std   radius=mean  radius=std     radius=mean
quota          0.931        1.822*      1.592*       -0.284      -0.221           0.942
quota2        -0.016       -0.019*     -0.018*        0.001      -0.000          -0.006
const          21.72       -30.00      -16.91        18.62       20.99           -9.53
Obs.              66           66          66           66          66              65
Ad. R2         0.131        0.060       0.051       -0.020      -0.015          -0.016
Prob. F        0.005        0.053       0.071        0.699       0.594           0.605
*p<0.05
Table 6.18. OLS estimates: impact of the amount of subsidies on the treatment effect (non structured case)

As regards the quantile regression estimates (Tables 6.19 and 6.20), the comparison of the structured with the non structured form of the selection process reveals no
differences between the two procedures in the coefficient estimates: again, the results confirm a significant quadratic relation between average effects and treatment levels for low percentile values of the dependent variable, especially for employment. However, the share of the impact variance explained by differences in the subsidies is higher in the case of the more structured form of the selection process (see the Pseudo R2 values in Tables 6.11 and 6.12 and in Tables 6.19 and 6.20).
Finally, Figures 6.6 and 6.7 report the results of the non parametric mean regression estimator. The graphical analysis confirms heterogeneity of the treatment effect with respect to different levels of treatment for the employment outcome variable. The tendency is quite similar to the structured case, with an increasing relation for low values of subsidies. Again, this kind of relation is robust to the radius size adopted in the matching procedure, and the value at which the maximum occurs is between 50 and 60% of the share of subsidies on the investment. For the turnover and fixed asset outcomes the relation between average effects and treatment level seems to be non significant, as confirmed by applying the parametric approaches above. Again, the structured case, with a more efficient estimation of the selection process, is better able to detect a significant relation and to predict the heterogeneity of the impact with respect to the amount of granted subsidy.
Figure 6.5. Predicted OLS estimates for treatment impact on the amount of subsidy (non structured case): Turnover (radius=mean, std), Employment (radius=10%, mean, std), Fixed Assets (radius=mean)

Finally, independently of the chosen form for the selection process (structured or non structured), it is worth keeping in mind that in the analysis of the relation between impacts and treatment levels, average causal effects are computed by comparing treated
against non treated units. That is, our method is able to estimate the causal effect due to participation, ruling out the differences between these two groups. This means that, in general, there could be some differences among treated units (at different levels): participants might differ not only with respect to the level of treatment. Then, in order to evaluate the impact of the amount of granted subsidy on the treatment effects, one should be careful in the interpretation of this relation. In the case of the Law 488, the proposed methods can explain heterogeneity but cannot suggest that, in the lower (higher) part of the curve, an increase in the amount of subsidy would increase (decrease) the impact. In fact, the characteristics of firms at different levels of treatment can be different, and differences in the level of treatment can be imputed to this heterogeneity. In particular, this is true in this application: by construction the amount of granted subsidies is determined by the allocation procedure of Law 488, which imposes some constraints as a function of geographic area and firm size. Only if we use the same sample (the same firms' mix) at the different levels of treatment is this comparison meaningful. We leave this analysis for future research.

              radius=fix  Pseudo R2   radius=mean  Pseudo R2   radius=std  Pseudo R2
q5   quota      8.092*      0.223        6.838*      0.152        7.850*     0.158
     quota2    -0.093*                  -0.077*                  -0.088*
     const    -166.9*                  -154.1*                  -171.0*
q25  quota      3.260*      0.205        2.718*      0.133        2.156      0.132
     quota2    -0.042*                  -0.030*                  -0.026
     const     -39.8                    -58.3*                   -36.8
q50  quota      0.400       0.087        1.969       0.079        1.521      0.067
     quota2    -0.011                   -0.019                   -0.016
     const      32.7                    -39.1                    -20.9
q75  quota     -0.453       0.047        0.471       0.015        0.530      0.004
     quota2     0.001                   -0.004                   -0.006
     const      63.4                      9.0                     15.9
q95  quota      1.198       0.068        1.857       0.035        1.628      0.027
     quota2    -0.020                   -0.018                   -0.018
     const      69.2                     11.0                     27.4
*p<0.05, bootstrapped standard errors
Table 6.19. Quantile estimates: Employment (non structured case)
                 Turnover                                        Fixed assets
              radius=mean  Pseudo R2   radius=std  Pseudo R2   radius=mean  Pseudo R2
q5   quota      4.315*      0.219        4.656*      0.245        9.673*      0.226
     quota2    -0.049*                  -0.053*                  -0.100*
     const    -101.6*                  -105.8*                  -252.8*
q25  quota      1.903       0.046        1.999       0.048        5.833*      0.088
     quota2    -0.023*                  -0.024*                  -0.059
     const     -40.1                    -38.3                   -145.7*
q50  quota      0.692       0.019        0.684       0.026        4.104       0.029
     quota2    -0.008                   -0.008                   -0.035
     const      -7.0                     -2.9                    -88.8
q75  quota     -0.816       0.017       -0.593       0.009       -0.993       0.070
     quota2     0.010                    0.007                    0.019
     const      32.4                     32.3                     46.2
q95  quota     -6.839*      0.343       -6.778*      0.344       -5.372       0.097
     quota2     0.072*                   0.072*                   0.039
     const     200.9*                   203.6*                   272.8
*p<0.05, bootstrapped standard errors
Table 6.20. Quantile estimates: Turnover and Fixed assets (non structured case)
Figure 6.6. Kernel estimates for treatment impact on the amount of subsidy (non structured case): Employment (radius=fix, mean, std)

Figure 6.7. Kernel estimates for treatment impact on the amount of subsidy (non structured case): Turnover (radius=mean, std), Fixed assets (radius=mean)
Chapter 7
Conclusions
The ambitious aim of the thesis is to develop a matching estimator approach to evaluate causal effects of a policy intervention on some outcome variables in a continuous treatment framework. Recently, matching estimators have been included among the new frontier tools for solving causal inference problems, such as program evaluation.

In order to solve an economic problem, the evaluation of the impacts of the Law 488/92 in Italy, we develop a novel double matching estimator, easy to apply and computationally light. The proposed method allows the estimation of average treatment effects at different levels of treatment and, subsequently, the exploration of the impact of differences in treatment dose on policy outcomes. Our results basically support the conclusions derived from methods based on the binary treatment (Pellegrini and Centra (2006) and Bernini et al. (2006)). Using the double matching method, the impact of L. 488 on subsidized firms is positive and statistically significant: turnover growth is from 9 to 12 points higher in the subsidized firms than in the non subsidized ones, depending on the matching algorithm; the number of employees is from 11 to 25 percentage points higher and fixed assets increase by up to 27 percentage points. The effects of the L. 488 are in line with its (explicit or less explicit) targets: the subsidized firms have invested more (in percentage terms) than the non subsidized ones, achieving more turnover, more employment and more fixed assets. However, our methods show strong heterogeneity of the treatment outcome with respect to different levels of treatment. The share of the impact variance explained by differences in the subsidies is about 20% using the double matching method. We find that the higher the level of the incentive, the higher the policy effect, up to a certain point, beyond which the marginal impact is decreasing.
Several economic policies use continuous policy variables. Therefore, this method can have a wide field of application. With respect to the methods proposed in the literature in a continuous treatment setting, the two step matching method we introduce seems in some sense superior, because it offers more information: the impact at all the different treatment levels of the treated firms can be derived. Moreover, it evaluates the effects at each level of the subsidies by comparing treated versus non treated units. Furthermore, with the double matching procedure the selection process is estimated in a more efficient way, by splitting it into two components: the participation decision and the treatment level assignment. This "structured" form of the selection process is better able to detect the heterogeneity of the effects on the outcome variables with respect to different treatment doses than a single selection process specification.

However, the method can explain heterogeneity but cannot suggest that, in the lower (higher) part of the curve, an increase in the amount of subsidy would increase (decrease) the impact. In fact, the characteristics of firms at different levels of treatment can be different, and dissimilarities in the level of treatment can be imputed to this heterogeneity. Only if we use the same sample (the same firms' mix) at the different levels of treatment is this comparison meaningful. We leave this analysis for future research. Moreover, as a further development, we would like to indicate a robustness analysis of the matching algorithm in terms of the distance measure among units.
Bibliography
Abadie, A. and Imbens, G. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1), 235–267. Available at http://ideas.repec.org/a/ecm/emetrp/v74y2006i1p235-267.html.
Angrist, J. (1990). Lifetime earnings and the Vietnam era draft lottery: Evidence from Social Security administrative records. American Economic Review, 80(3), 313–336. Available at http://ideas.repec.org/a/aea/aecrev/v80y1990i3p313-36.html.
Angrist, J. and Krueger, A. (1991). Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, 106(4), 979–1014. Available at http://ideas.repec.org/a/tpr/qjecon/v106y1991i4p979-1014.html.
Athey, S. and Imbens, G. (2006). Identification and inference in nonlinear difference-in-differences models. Econometrica, 74(2), 431–497.
Barca, F. and Pellegrini, G. (2002). Policy for Territorial Competitiveness in Europe:
Notes on the 2000-2006 Plan for the Italian Mezzogiorno. Real Effects of Regional
Integration in the European Union and the Mercosur: Inter-continental views on
intra-continental experiences. Buenos Aires.
Barnow, B., Cain, G., and Goldberger, A. (1980). Issues in the analysis of selectivity bias. In E. Stromsdorfer and G. Farkas (eds.), Evaluation Studies, Vol. 5. San Francisco: Sage.
Bartik, T. and Bingham, R. (1995). Can economic development programs be evaluated? W.E. Upjohn Institute for Employment Research.
Battistin, E. and Rettore, E. (2003). Another look at the regression discontinuity design. CeMMAP working papers CWP01/03, Centre for Microdata Methods and Practice, Institute for Fiscal Studies. Available at http://ideas.repec.org/p/ifs/cemmap/0103.html.
Behrman, J., Cheng, Y., and Todd, P. (2004). Evaluating preschool programs when length of exposure to the program varies: A nonparametric approach. Review of Economics and Statistics, 86(1), 108–132.
Bell, B., Blundell, R., and Van Reenen, J. (1999). Getting the unemployed back to work: An evaluation of the New Deal proposals. International Tax and Public Finance, 6, 339–360.
Bergstrom, F. (1998). Capital subsidies and the performance of firms. Working Paper No. 285, SSE/EFI Series in Economics and Finance, Department of Economics,
University of Stockholm.
Bernini, C., Centra, M., and Pellegrini, G. (2006). Growth and efficiency in subsidized
firms. Mimeo.
Blundell, R. and Costa Dias, M. (2002). Alternative approaches to evaluation in empirical microeconomics. CeMMAP working papers CWP10/02, Centre for Microdata Methods and Practice, Institute for Fiscal Studies. Available at http://ideas.repec.org/p/ifs/cemmap/10-02.html.
Blundell, R., Costa-Dias, M., Meghir, C., and Van Reenen, J. (2001). Evaluating the employment impact of a mandatory job search assistance program. IFS Working Papers W01/20, Institute for Fiscal Studies. Available at http://ideas.repec.org/p/ifs/ifsewp/01-20.html.
Boarnet, M. and Bogart, W. (1996). Enterprise zones and employment: What lessons can be learned? ICER Torino Working Paper Series 98.
Bondonio, D. (2000). Statistical methods to evaluate geographically-targeted economic development programs. Statistica Applicata, 12(2), 177–204.
Bondonio, D. (2004). The employment impact of business investment incentives in declining areas: An evaluation of the EU Objective 2 area programs. Università del Piemonte Orientale.
Bound, J., Jaeger, D., and Baker, R. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90(430), 443–450.
Bronzini, R. and de Blasio, G. (2006). Evaluating the impact of investment incentives: The case of the Italian Law 488. Journal of Urban Economics, 6(2).
Brown, M. A., Curlee, R. T., and Elliott, S. R. (1995). Evaluating technology innovation programs: The use of comparison groups to identify impacts. Research Policy, 24, 669–684.
Campbell, D. and Stanley, J. (1963). Experimental and Quasi-Experimental Designs.
Chicago: Rand McNally.
Card, D. and Robins, P. (1998). Do financial incentives encourage welfare recipients to
work? Research in Labour Economics, Vol. 17, pp. 1–56.
Carlucci, C. and Pellegrini, G. (2003). Gli effetti della legge 488/92: una valutazione
dell’impatto occupazionale sulle imprese agevolate. Rivista italiana degli Economisti,
Vol. 8(n. 2), pp. 267–286.
Carlucci, C. and Pellegrini, G. (2005). Nonparametric analysis of the effects on employment of public subsidies to capital accumulation: The case of Law 488/92 in Italy. Presented at the AIEL Congress 2004, Modena.
Chiri, S. and Pellegrini, G. (1995). Gli aiuti alle imprese nelle aree depresse. Rivista
economica del Mezzogiorno, n. 3.
Daly, M., Gorman, I., Lenjosek, G., MacNevin, A., and Phiriyapreunt, W. (1993). The
impact of regional investment incentives on employment and productivity. Regional
Science and Urban Economics, (23), pp. 559–575.
Dehejia, R. (2005). Practical propensity score matching: A reply to Smith and Todd. Journal of Econometrics, 125(1-2), 355–364.
Dehejia, R. and Wahba, S. (1998a). Causal effects in non-experimental studies: Re-evaluating the evaluation of training programs. NBER Working Papers 6586, National Bureau of Economic Research, Inc. Available at http://ideas.repec.org/p/nbr/nberwo/6586.html.
Dehejia, R. and Wahba, S. (1998b). Propensity score matching methods for non-experimental causal studies. Technical report.
Dowall, D. (1996). An evaluation of California's enterprise zone programs. Economic Development Quarterly, 10(4), 352–368.
Driehuis, W. and van den Noord, P. (1998). The effects of investment subsidies on
employment. Economic Modelling, 5(1), pp. 32–40.
Duflo, E. (2001). Schooling and labor market consequences of school construction in Indonesia: Evidence from an unusual policy experiment. American Economic Review, 91(4), 795–813. Available at http://ideas.repec.org/a/aea/aecrev/v91y2001i4p795-813.html.
Dupont, V. and Martin, P. (2003). Subsidies to poor regions and inequalities: Some
unpleasant arithmetic. Technical Report No. 4107, CEPR Discussion Paper.
Eissa, N. and Liebman, J. (1995). Labor supply response to the earned income tax
credit. NBER Working Papers 5158, National Bureau of Economic Research, Inc.
available at http://ideas.repec.org/p/nbr/nberwo/5158.html.
Faini, R. and Schiantarelli, F. (1987). Incentives and investment decisions: the effectiveness of regional policy. Oxford Economic Papers, 39, 516–533.
Fisher, R. (1951). The Design of Experiments. Edinburgh: Oliver and Boyd, 6th edition.
Florens, J., Heckman, J., Meghir, C., and Vytlacil, E. (????). Instrumental variables, local instrumental variables and control functions. CeMMAP working papers CWP15/02, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Flores, C. A. (2004). Estimation of dose response functions and optimal doses with continuous treatment. Working paper, University of Miami.
Gabe, T. and Kraybill, D. (2002). The effects of state economic development incentives
on employment growth of establishments. Journal of Regional Science, 42, pp. 703–
730.
Garibaldi, P., Giavazzi, F., Ichino, A., and Rettore, E. (2007). College cost and time
to complete a degree: Evidence from tuition discontinuities. working paper.
Greenberg, D. and Shroder, M. (1997). Digest of Social Experiments. Urban Institute
Press.
Hahn, J. (1998). On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica, 66(2), 315–332. available at
http://ideas.repec.org/a/ecm/emetrp/v66y1998i2p315-332.html.
Hahn, J., Todd, P., and Van der Klaauw, W. (2001). Identification and estimation of
treatment effects with a regression-discontinuity design. Econometrica, 69(1), 201–09.
available at http://ideas.repec.org/a/ecm/emetrp/v69y2001i1p201-09.html.
Harris, R. (1991). The employment creation effects of factor subsidies: Some estimates for Northern Ireland manufacturing, 1955–83. Journal of Regional Science, 31, 49–64.
Harris, R. and Trainor, M. (2005). Capital subsidies and their impact on total factor productivity: Firm-level evidence from Northern Ireland. Journal of Regional Science, 45(1), 49–74.
Hausman, J. and Wise, D. (1985). Social experimentation. National Bureau of Economic
Research.
Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153–161.
Heckman, J. (1996). Randomization as an instrumental variable. The Review of Economics and Statistics, Vol. 78(N. 2), pp. 336–341.
Heckman, J. and Robb, R. (1985). Alternative methods for evaluating the impact of
interventions. Longitudinal Analysis of Labour Market Program.
Heckman, J., Smith, J., and Clements, N. (1997a). Making the most out of programme
evaluation and social experiment: accounting for heterogeneity in programme impacts. Review of Economic Studies, Vol. 64(n. 4), pp. 487–535.
Heckman, J., Ichimura, H., and Todd, P. (1997b). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. Review of Economic Studies, 64(4), 605–654. Available at http://ideas.repec.org/a/bla/restud/v64y1997i4p605-54.html.
Heckman, J., Ichimura, H., Smith, J., and Todd, P. (1998a). Characterizing selection bias using experimental data. NBER Working Papers 6699, National Bureau of Economic Research, Inc. Available at http://ideas.repec.org/p/nbr/nberwo/6699.html.
Heckman, J., Ichimura, H., and Todd, P. (1998b). Matching as an econometric evaluation estimator. Review of Economic Studies, 65(2), 261–294. Available at http://ideas.repec.org/a/bla/restud/v65y1998i2p261-94.html.
Heckman, J., LaLonde, R., and Smith, J. (1999). The economics and econometrics of active labour market programs. Handbook of Labor Economics, 3.
Hirano, K. and Imbens, G. (2004). The propensity score with continuous treatment. Draft of a chapter for Missing Data and Bayesian Methods in Practice: Contributions by Donald Rubin's Statistical Family. Forthcoming from Wiley.
Ichino, A. (2002). The problem of causality in the analysis of educational choices and labour market outcomes. Lecture Notes. http://www.iue.it/Personal/Ichino/.
Ichino, A. and Becker, S. (2002). Estimation of average treatment effects based on
propensity scores. The Stata Journal, 2(4), 358–377.
Ichino, A., Mealli, F., and Nannicini, T. (2003). Il lavoro interinale in italia: trappola
del precariato o trampolino verso un impegno stabile? Rapporto di Ricerca IUE.
Imai, K. and van Dyk, D. (2004). Causal inference with general treatment regimes:
Generalizing the propensity score. Journal of the American Statistical Association,
Vol. 99(No. 467), pp. 854–866.
Imbens, G. (1999). The role of the propensity score in estimating dose-response functions. NBER Technical Working Papers 0237, National Bureau of Economic Research, Inc. Available at http://ideas.repec.org/p/nbr/nberte/0237.html.
Imbens, G. (2003). Nonparametric estimation of average treatment effects under exogeneity: A review. Technical Working Paper 294, National Bureau of Economic Research.
Imbens, G. and Angrist, J. (1994). Identification and estimation of local average treatment effects. Econometrica, 62(2), 467–475. Available at http://ideas.repec.org/a/ecm/emetrp/v62y1994i2p467-75.html.
Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.
LaLonde, R. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76, 604–620.
Lechner, M. (1999). Earnings and employment effects of continuous off-the-job training in East Germany after unification. Journal of Business and Economic Statistics, 17(1), 74–90.
Lee, J. (1996). Government intervention and productivity growth. Journal of Economic
Growth, 1, pp. 391–414.
Leuven, E. and Sianesi, B. (2003). Psmatch2: Stata module to perform full mahalanobis
and propensity score matching, common support graphing, and covariate imbalance
testing. Statistical Software Components, Boston College Department of Economics.
available at http://ideas.repec.org/c/boc/bocode/s432001.html.
Levitt, S. (1997). Using electoral cycles in police hiring to estimate the effect of police
on crime. American Economic Review, Vol. 87(n. 3), pp. 270–90. available at
http://ideas.repec.org/a/aea/aecrev/v87y1997i3p270-90.html.
Lu, B., Zanutto, E., Hornik, R., and Rosenbaum, P. (2001). Matching with doses in an
observational study of a media campaign against drug abuse. Journal of the American
Statistical Association, Vol. 96(n. 456). Application and Case Studies.
McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society, 42, 109–142.
Midelfart-Knarvik, K. H. and Overman, G. (2002). Delocation and European integration: Is structural spending justified? Economic Policy, 17(35), 321–359.
Ministero dell'Industria, del Commercio e dell'Artigianato (2000). Relazione sulle leggi e sui provvedimenti di sostegno alle attività economiche e produttive. Technical report, Roma. Various years.
Nadaraya, E. (1964). On estimating regression. Theory of Probability and Its Applications, 9, 141–142.
Orr, L. (1999). Social Experiments: Evaluating Public Programs with Experimental Methods. Thousand Oaks, CA: Sage Publications.
Orr, L., Bloom, H., Bell, S., Doolittle, F., and Lin, W. (1996). Does Training for the
Disadvantaged Work? Evidence from the National JTPA Study. The Urban Institute
Press.
Parascandolo, P. and Pellegrini, G. (2001). Sistema d’asta ed efficienza nella valutazione
del metodo di selezione delle imprese agevolate attraverso la legge 488/92. Atti del
Convegno SIEP, Università di Pavia.
Pellegrini, G. (1999). L’efficacia degli aiuti alle imprese nel Mezzogiorno. Il vecchio e il
nuovo intervento. Il Mulino.
Pellegrini, G. and Centra, M. (2006). Growth and efficiency in subsidized firms. Paper
prepared for the Workshop "The Evaluation of Labour Market, Welfare and Firms
Incentives Programmes", Istituto Veneto di Scienze, Lettere ed Arti - Venezia.
Roper, S. and Hewitt-Dundas, N. (2001). Grant assistance and small firm development in Northern Ireland and the Republic of Ireland. Scottish Journal of Political Economy, 48(1), 99–117.
Rosenbaum, P. and Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
Royer, H. (2003). What all women (and some men) want to know: Does maternal age affect infant health? Available at http://sitemaker.umich.edu/hroyer.
Rubin, D. (1973). Matching to remove bias in observational studies. Biometrics, Vol.
29, pp. 159–183.
Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, (n. 66), pp. 666–701.
Rubin, D. (1979). Using multivariate matched sampling and regression adjustment to
control bias in observational studies. Journal of the American Statistical Association,
Vol. 74(n. 366), pp. 318–328. available at http://links.jstor.org/sici?sici=0162-1459R
Schalk, H. and Untiedt, G. (2002). Regional investment incentives in Germany: Impacts on factor demand and growth. Annals of Regional Science, 34, 173–195.
Smith, J. and Todd, P. (2005a). Does matching overcome LaLonde's critique of nonexperimental estimators? Journal of Econometrics, 125(1-2), 305–353. Available at http://ideas.repec.org/p/uwo/hcuwoc/20035.html.
Smith, J. and Todd, P. (2005b). Rejoinder. Journal of Econometrics, vol. 125(issues
1-2), pp. 365–375.
Stock, J. and Watson, M. (2003). Introduction to Econometrics. Boston, MA ; London:
Addison Wesley.
Trochim, W. (1984). Research Design for Program Evaluation: The Regression-Discontinuity Approach. Beverly Hills, CA: Sage Publications.
van der Klaauw, W. (2002). Estimating the effect of financial aid offers on college enrollment: A regression-discontinuity approach. International Economic Review, 43(4), 1249–1287. Available at http://ideas.repec.org/a/ier/iecrev/v43y2002i4p1249-1287.html.
Van Reenen, J. (1994). The creation and capture of rents: Wages and innovation in a panel of UK companies. CEPR Discussion Papers n. 1071, C.E.P.R. Discussion Papers. Available at http://ideas.repec.org/p/cpr/ceprdp/1071.html.
Willis, R. and Rosen, S. (1980). Education and self-selection. NBER Working Papers 0249, National Bureau of Economic Research, Inc. Available at http://ideas.repec.org/p/nbr/nberwo/0249.html.
Wren, C. and Waterson, M. (1991). The direct employment effect of financial assistance to industry. Oxford Economic Papers, 43(1), 116–138.
Appendix A
Integration of datasets
Integration of L. 488 dataset with AIDA budgetary data
AIDA contains the budgets of firms whose turnover exceeds 1 million euros, and therefore cannot be representative of the population of Italian firms. The budget data imputation procedure has produced an unavoidable reduction of the share of small firms in the sample, introducing a strong risk of selection bias in the composition of the treated and control groups. However, the analysis can still be consistent if the selection bias is similar between the two groups: the under-representation of small firms in the sample does not affect the estimation of the policy impact if the variation in the small firm share is the same in the financed projects group and in the control group. The data suggest that the impact of the imputation procedure is basically the same in the two groups. The results show that the corrected sample, after the reduction of the data set due to the identification of the eligible projects and the elimination of anomalies, is equal to 34.9% of the original sample for subsidized firms and to 34.7% for non-subsidized firms (table A.1).

The procedure of matching between administrative and budgetary data further reduces the sample: the matched sample used in the analysis is equal to 12.6% of the corrected sample for subsidized firms and to 11.9% for non-subsidized firms. The final dataset consists of 665 financed projects and 1,228 non-financed projects.
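As a schematic illustration of this integration step only (Python/pandas; the file names, the 'tax_code' key, and the 'financed' column are hypothetical, since the actual record-linkage keys are not reported here), the merge of the L. 488 administrative records with the AIDA budget data can be sketched as follows.

```python
import pandas as pd

# Hypothetical inputs: administrative records of L.488 projects and
# AIDA budget data; 'tax_code' stands in for whatever firm identifier
# is actually used to link the two sources.
l488 = pd.read_csv("l488_projects.csv")   # one row per project
aida = pd.read_csv("aida_budgets.csv")    # one row per firm

# Keep only projects with budget data: an inner join, the step that
# shrinks the sample (to 12.6% / 11.9% of the corrected groups in the
# thesis data).
matched = l488.merge(aida, on="tax_code", how="inner")

# Check how the reduction affects the two groups separately.
for name, df in (("corrected", l488), ("matched", matched)):
    share = df["financed"].value_counts(normalize=True)
    print(name, share.to_dict())
```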
Table A.1. From the original to the matched dataset
1 Source: elaboration of Pellegrini and Centra (2006) on L488 and AIDA data.

Several careful checks of the matched data also have to be carried out in order to measure the impact of the matching procedure. Our results critically depend on the absence of a selection effect in the construction of the data set: the impact of the selection criteria and of the missing data imputation procedure on the financed projects and the control group has to be investigated in depth. The analysis is presented in table A.2.
The reduction of the dataset due to the exclusion of firms missing from AIDA, even if substantial in absolute value, has only a slight impact on the regional distribution of firms. The variations systematically maintain the same sign for both the financed projects and the control group. Only in Puglia is a larger reduction detected for financed projects than for non-financed ones; the opposite result is found in Basilicata. The impact on the firm size distribution is analogous: the variations maintain the same sign for the three size classes considered, and there is no evidence of major differences.
The analysis of the distribution of projects according to the firms' economic activity shows a less favourable scenario. The matching process with AIDA reduces the incidence of firms operating in the extractive sector within the financed projects group, while it increases their share in the control group; the same happens for some sectors of the mechanical industry. However, considering the whole sample of projects, this disparity in the sign of the variation of the economic activity shares between the treated and control groups concerns only 16% of the cases. For the remaining 84%, the distribution across economic activity sectors does not vary in a significant way. The impact of the matching on the distributions of the two groups, subsidized and non-subsidized, therefore appears similar: the sign is the same in all distributions and the differences are not appreciably significant. These results support a high level of confidence in the representativeness of the data set and, consequently, in the robustness of the analysis. It is worth noting that activity sector and size are key variables in the evaluation procedure, since they are highly correlated with the budgetary data: a strong bias in these characteristics would undermine the reliability of the results.
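A check of this kind can be scripted compactly. The sketch below (Python/pandas, on simulated data; the 'financed' and 'sector' column names are assumptions) computes, for each sector, whether the share variation between the corrected and matched samples has the same sign in the financed and non-financed groups, mirroring the 84%/16% split above.

```python
import numpy as np
import pandas as pd

# Hypothetical samples before and after the AIDA merge (simulated).
rng = np.random.default_rng(2)
def fake_sample(n):
    return pd.DataFrame({
        "financed": rng.integers(0, 2, n),
        "sector": rng.choice(list("ABCDEF"), n),
    })
corrected, matched = fake_sample(5000), fake_sample(1800)

def sector_shares(df):
    # Sector shares computed within each group (financed = 0/1)
    return df.groupby("financed")["sector"].value_counts(normalize=True).unstack(0)

delta = sector_shares(matched) - sector_shares(corrected)  # share variations
same_sign = delta[0] * delta[1] > 0   # both groups move in the same direction
print(f"sectors moving together: {same_sign.mean():.0%}")
```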
The matching with the AIDA dataset necessarily generates an asymmetry in the projects sample towards larger firms, for which budgets are available, and, indirectly, in the distribution of the indicators ranking the projects and in the level of investment. As a consequence, a further step in the consistency analysis consists in evaluating the impact of the AIDA integration on the selection indicators. The analysis shows a quite homogeneous impact on the indicators' median values (which are less sensitive to outliers), indicating an appreciable level of homogeneity between the financed projects and the control group.
Table A.2. Impact of the data matching process on the eligible projects distribution
2 Source: elaboration of Pellegrini and Centra (2006) on L488 and AIDA data.
Appendix B
Outcome variable distributions
Figure B.1. Distribution of the outcome variable: turnover. Panels: Financed, Not Financed; x-axis: Turnover growth rate; y-axis: Growth rate distribution.
Figure B.2. Distribution of the outcome variable: employment. Panels: Financed, Not Financed; x-axis: Employment growth rate; y-axis: Growth rate distribution.
Figure B.3. Distribution of the outcome variable: fixed assets. Panels: Financed, Not Financed; x-axis: Fixed assets growth rate; y-axis: Growth rate distribution.
Figure B.4. Distribution of the outcome variable: gross margin on turnover. Panels: Financed, Not Financed; x-axis: Gross margin on turnover growth; y-axis: Growth distribution.
Figure B.5. Distribution of the outcome variable: per capita turnover. Panels: Financed, Not Financed; x-axis: Per capita turnover growth rate; y-axis: Growth rate distribution.
Figure B.6. Distribution of the outcome variable: debt charges on debt stock. Panels: Financed, Not Financed; x-axis: Debt charges on turnover growth; y-axis: Growth distribution.