UNCORRECTED PROOF

Research article

Instrumental variables methods in experimental criminological research: what, why and how

JOSHUA D. ANGRIST*
MIT Department of Economics, 50 Memorial Drive, Cambridge, MA 02142-1347, USA
NBER, USA. E-mail: angrist@mit.edu

Abstract. Quantitative criminology focuses on straightforward causal questions that are ideally addressed with randomized experiments. In practice, however, traditional randomized trials are difficult to implement in the untidy world of criminal justice. Even when randomized trials are implemented, not everyone is treated as intended and some control subjects may obtain experimental services. Treatments may also be more complicated than a simple yes/no coding can capture. This paper argues that the instrumental variables (IV) methods used by economists to solve omitted variables bias problems in observational studies also solve the major statistical problems that arise in imperfect criminological experiments. In general, IV methods estimate causal effects on subjects who comply with a randomly assigned treatment. The use of IV in criminology is illustrated through a re-analysis of the Minneapolis domestic violence experiment. The results point to substantial selection bias in estimates using treatment delivered as the causal variable, and IV estimation generates deterrent effects of arrest that are about one-third larger than the corresponding intention-to-treat effects.

Background

I'm not a criminologist, but I've long admired criminology from afar. As an applied economist who puts the task of convincingly answering causal questions at the top of my agenda, I've been impressed with the no-nonsense outcome-oriented approach taken by many quantitative criminologists. Does capital punishment deter? Do drug courts reduce recidivism? Does arrest for domestic assault reduce the likelihood of a repeat offense?
These are the sort of straightforward and practical causal questions that I can imagine studying myself.

I also appreciate the focus on credible research designs reflected in much of the criminological research agenda. Especially noteworthy is the fact that, in marked contrast with an unfortunate trend in education research, criminologists do not appear to have been afflicted with what psychologist Tom Cook (2001) calls 'sciencephobia.' This is a tendency to eschew rigorous quantitative research designs in favor of a softer approach that emphasizes process over outcomes. In fact, of the disciplines tracked in a survey of social science research methods by Boruch et al. (2002), criminology is the only one to show a marked increase in the use of randomized trials since the mid-sixties.

Journal of Experimental Criminology (2005) 2: 1–22. © Springer 2005.

The use of randomized trials in criminology is clearly increasing and, by now, criminological experiments have been used to study interventions in policing, prevention, corrections, and courtrooms (Farrington and Welsh 2005). Randomized trials are increasingly seen as the gold standard for scientific evidence in the crime field, as they are in medicine (Weisburd et al. 2001). At the same time, a number of considerations appear to limit the applicability of randomized research designs to criminology.

A major concern in the criminological literature is the possibility of a failed research design (see, e.g., Farrington 1983; Rezmovic et al. 1981; Gartin 1995). Gartin (1995) notes that two sorts of design failure seem especially likely. The first, treatment dilution, is when subjects or units assigned to the treatment group do not get treated. The second, treatment migration, is when subjects or units in the control group nevertheless obtain the experimental treatment.
These scenarios are indeed potential threats to the validity of a randomized trial. For one thing, with non-random crossovers, the group that ends up receiving treatment may no longer be comparable to the remaining pool of untreated controls. In addition, if intended treatment is only an imperfect proxy for treatment received, it seems clear that an analysis based on the original intention-to-treat probably understates the causal effect of treatment per se. Although not unique to criminology, these problems most often arise when neither subjects nor those delivering treatment can be blinded and must, in any case, be given some discretion for both practical and ethical reasons.1

The purpose of this paper is to show how the instrumental variables (IV) methods widely used in economics solve both the treatment dilution and treatment migration problems. As a by-product, the IV framework also opens up the possibility of a wide range of flexible experimental research designs. These designs are unlikely to raise the sort of ethical questions that are seen as limiting the applicability of traditional experimental designs in crime and justice (see, e.g., Weisburd 2003, for a discussion). Finally, the logic of IV suggests a number of promising quasi-experimental research designs that may provide a reasonably credible (and inexpensive) substitute for an investigator's own random assignment.2

Motivation: the Minneapolis domestic violence experiment

Treatment migration and treatment dilution are features of one of the most influential randomized trials in criminological research, the Minneapolis domestic violence experiment (MDVE), discussed in Sherman and Berk (1984) and Berk and Sherman (1988). The MDVE was motivated by debate over the importance of deterrence effects in the police response to domestic violence.
Police are often reluctant to make arrests for domestic violence unless the victim demands an arrest, or the suspect does something that warrants arrest (besides the assault itself). As noted by Berk and Sherman (1988), this attitude has many sources: a general reluctance to intervene in family disputes, the fact that domestic violence cases may not be prosecuted, genuine uncertainty as to what the best course of action is, and an incorrect perception that domestic assault cases are especially dangerous for arresting officers.

In response to a politically charged policy debate as to the wisdom of making arrests in response to domestic violence, the MDVE was conceived as a social experiment that might provide a resolution. The research design incorporated three treatments: arrest, ordering the offender off the premises for 8 hours, and some form of advice that might include mediation. The research design called for one of these three treatments to be randomly selected each time participating Minneapolis police officers encountered a situation meeting the experimental criteria (some kind of apparent misdemeanor domestic assault where there was probable cause to believe that a cohabitant or spouse had committed an assault against the other party in the past 4 hours). Cases of life-threatening or severe injury, i.e., felony assault, were excluded. Both suspect and victim had to be present upon the intervening officers' arrival.

The randomization device was a pad of report forms that were randomly color-coded for each of the three possible responses. Officers who encountered a situation that met the experimental criteria were to act according to the color of the form on top of the pad. The police officers who participated in the experiment had volunteered to take part, and were therefore expected to comply with the research design.
On the other hand, strict adherence to the randomization protocol was understood by the experimenters to be both unrealistic and inappropriate.

In practice, officers often deviated from the responses called for by the color of the report form drawn at the time of an incident. In some cases, suspects were arrested when random assignment called for separation or advice. Most arrests in these cases came about when a suspect attempted to assault an officer, a victim persistently demanded an arrest, or both parties were injured. In one case where random assignment called for arrest, officers separated instead. In a few cases, advice was swapped for separation and vice versa. Although most deviations from the intended treatment reflected purposeful action on the part of the officers involved, sometimes deviations arose when officers simply forgot to bring their report forms.

As noted above, non-compliance with random assignment is not unique to the MDVE or criminological research. Any experimental intervention where ethical or practical considerations lead to a deviation from the original research protocol is likely to have this feature. It seems fair to say that non-compliance is usually unavoidable in research using human subjects. Gartin (1995) discusses a number of criminological examples with compliance problems, and non-compliance has long been recognized as a feature of randomized medical trials (see, e.g., Efron and Feldman 1991).

In the MDVE, the most common deviation from random assignment was the failure to separate or advise when random assignment called for this. This can be seen in Table 1, taken from Sherman and Berk (1984), which reports a cross-tabulation of treatment assigned and treatment delivered. Of the 92 suspects randomly assigned to be arrested, 91 were arrested.
In contrast, of the 108 suspects randomly assigned to receive advice, 19 were arrested and five were separated. The compliance rate with the advice treatment was 78%. Likewise, of the 114 suspects randomly assigned to be separated, 26 were arrested and five were advised. The compliance rate with the separation treatment was 73%.

Importantly, the random assignment of intended treatments in the MDVE does not appear to have been subverted (Berk and Sherman 1988). At the same time, it is clear that delivered treatments had a substantial behavioral component. The variable 'treatment delivered' is, in the language of econometrics, endogenous. In other words, delivered treatments were determined in part by unobserved features of the situation that were very likely correlated with outcome variables such as re-offense. For example, some of the suspects who were arrested in spite of having been randomly assigned to receive advice or be separated were especially violent. An analysis that contrasts outcomes according to the treatment delivered is therefore likely to be misleading, generating an over-estimate of the power of advice or separation to deter violence. I show below that this is indeed the case.3

A simple, commonly used approach to the analysis of randomized clinical trials with imperfect compliance is to compare subjects according to original random assignment, ignoring compliance entirely. This is known as an intention-to-treat (ITT) analysis. Because ITT comparisons use only the original random assignment, and ignore information on treatments actually delivered, they indeed provide unbiased estimates of the causal effect of researchers' intention to treat. This is valuable information which undoubtedly should be reported in any randomized trial.
The ITT effect predicts the effects of an intervention in circumstances where compliance rates are expected to be similar to those in the study used to estimate the ITT effect. At the same time, ITT estimates are almost always too small relative to the effect of treatment itself. It is the latter that tells us the 'theoretical effectiveness' of an intervention, i.e., what happens to those who were actually exposed to it.

An easy way to see why ITT is typically too small is to consider the ITT effect generated by an experiment where the likelihood of treatment turns out to be the same in both the intended-treatment and intended-control groups. In this case, there is essentially 'no experiment,' i.e., the treatment-intended group gets treated, on average, just like the control group. The resulting ITT effect is therefore zero, even though the causal effect of treatment on individuals may be positive or negative. More generally, the ITT effect is, except under very unusual circumstances, diluted by non-compliance. This dilution diminishes as compliance rates go up. Thus, ITT provides a poor predictor of the average causal effect of similar interventions in the future, should future compliance rates differ. For example, if compliance rates go up because the intervention of interest has been shown to be effective (as, for example, arresting domestic abusers was shown to be in the MDVE), the ITT from an earlier randomized trial will be misleading.

Table 1. Assigned and delivered treatments in spousal assault cases.

                      Delivered treatment
                                  Coddled
Assigned treatment    Arrest      Advise      Separate    Total
Arrest                98.9 (91)    0.0 (0)     1.1 (1)     29.3 (92)
Advise                17.6 (19)   77.8 (84)    4.6 (5)     34.4 (108)
Separate              22.8 (26)    4.4 (5)    72.8 (83)    36.3 (114)
Total                 43.4 (136)  28.3 (89)   28.3 (89)   100.0 (314)

The table shows statistics from Sherman and Berk (1984), Table 1. Entries are percentages, with counts in parentheses.

Before turning to a detailed discussion of the manner in which IV solves the compliance problem, I'll briefly describe an alternative approach that was once favored in economics but has now largely been supplanted by simpler 2SLS methods. This approach attempts to model the compliance (or treatment) decision directly, and then to integrate the compliance model into the analysis of experimental data. For example, we might imagine modeling compliance as the result of a comparison of latent (i.e., unobserved) costs and benefits, and try to explicitly model the relationship between these unobserved variables and potential outcomes, usually using a combination of functional form and distributional assumptions such as joint normality. Berk et al. (1988) tried such a strategy in their analysis of the MDVE. In practice, however, this 'structural modeling' approach requires strong assumptions, which are likely to be unattractive in the study of treatment effects (Angrist 2001). One way to see this is to note that if compliance problems could be solved simply by better econometric modeling, then we wouldn't need random assignment in the first place. Luckily, however, elaborate latent-variable models of the compliance process are unnecessary.

The instrumental-variables framework

The simplest and most robust solution to the treatment-dilution and treatment-migration problems is instrumental variables. This can be seen most easily using a conceptual framework that postulates a set of potential outcomes that could be observed in alternative states of the world.
Originally introduced by statisticians Fisher and Neyman in the 1920s as a way to discuss treatment effects in randomized agricultural experiments, the potential-outcomes framework has become the conceptual workhorse for non-experimental as well as experimental studies in medicine and social science (see Holland 1986, for a survey and Rubin 1974, 1977, for influential contributions). The intellectual history of instrumental variables begins with an unrelated effort by the father and son team of Philip and Sewall Wright (an economist and a geneticist, respectively) to solve the problem of statistical inference for a system of simultaneous equations. Their work can also be understood as an attempt to describe potential outcomes, though this link was not made explicit until much later. See Angrist and Krueger (2001) for an introduction to this fascinating story.

In an agricultural experiment, the potential outcomes notion is reasonably straightforward. Potential outcomes in this context describe what a particular plot of land would yield under alternative applications of fertilizer. Although we only get to see the plot fertilized in one particular way at one particular time, we can imagine what the plot would have yielded had it been treated otherwise. In social science, potential outcomes usually require a bit more imagination. To link the abstract discussion of potential outcomes to the MDVE example, I'll start with an interpretation of the MDVE as randomly assigning and delivering a single alternative to arrest, instead of two, as actually occurred. Because the policy discussion in the domestic assault context focuses primarily on the decision to arrest and possible alternatives, I define a binary (dummy) treatment variable for not arresting, which I'll call coddling.
A suspect was randomly assigned to be coddled if the officer on the scene was instructed by the random assignment protocol (i.e., the color-coded report forms) to advise or separate. A subject received the coddling treatment if the treatment delivered was advice or separation. Later, I'll outline an IV setup for the MDVE that allows for multiple treatments.

The most important outcome variable in the MDVE was recidivism, i.e., the occurrence of post-treatment domestic assault by the same suspect. Let $Y_i$ denote the observed re-offense status of suspect $i$. The potential outcomes in the binary-treatment version of the MDVE are the re-offense status of suspect $i$ if he were coddled, denoted $Y_{1i}$, and the re-offense status of suspect $i$ if he were not coddled, denoted $Y_{0i}$. Both of these potential outcomes are assumed to be well-defined for each suspect even though only one is ever observed. Let $D_i$ denote the treatment delivered to subject $i$. Then we can write the observed recidivism outcome as

$$Y_i = Y_{0i}(1 - D_i) + Y_{1i} D_i.$$

In words, this means we get to see $Y_{1i}$ for any subject who was coddled, but we don't know whether he would have re-offended if he had been arrested. Likewise, we get to see $Y_{0i}$ for any subject who was arrested, but we don't know whether he would have re-offended had he been coddled.

A natural place to start any empirical analysis is by comparing outcomes on the basis of treatment delivered. Because of the non-random nature of treatment delivery, however, such naive comparisons are likely to be misleading. This can be seen formally by writing

$$\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] &= E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
&= E[Y_{1i} - Y_{0i} \mid D_i = 1] + \{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]\}.
\end{aligned}$$

The first term in this decomposition is the average causal effect of treatment on the treated (ATET), a parameter of primary interest in evaluation research.
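The decomposition above is an accounting identity, so it can be checked on any data set in which both potential outcomes are written down. The sketch below does this for a small made-up population; the counts are hypothetical and are not MDVE data, and the names `population`, `naive`, `atet`, and `selection_bias` are illustrative only.

```python
from statistics import mean

# Hypothetical records (y0, y1, d): re-offense if arrested, re-offense if coddled,
# and whether coddling was actually delivered. These counts are invented for illustration.
population = (
    [(0, 0, 1)] * 50 + [(0, 1, 1)] * 10 + [(1, 1, 1)] * 5    # coddled suspects
    + [(0, 0, 0)] * 20 + [(1, 1, 0)] * 10 + [(0, 1, 0)] * 5  # arrested suspects
)

def observed(y0, y1, d):
    """Observed outcome: Y = Y0 * (1 - D) + Y1 * D."""
    return y0 * (1 - d) + y1 * d

treated = [u for u in population if u[2] == 1]
untreated = [u for u in population if u[2] == 0]

# Naive comparison by treatment delivered.
naive = mean(observed(*u) for u in treated) - mean(observed(*u) for u in untreated)

# ATET: average of Y1 - Y0 among those actually coddled.
atet = mean(y1 - y0 for y0, y1, _ in treated)

# Selection bias: difference in Y0 between the coddled and the arrested.
selection_bias = mean(y0 for y0, _, _ in treated) - mean(y0 for y0, _, _ in untreated)

print(round(naive, 4), round(atet, 4), round(selection_bias, 4))
```

In this invented population the coddled are less prone to re-offend even when arrested, so the selection-bias term is negative and the naive contrast understates the (positive) effect of coddling on recidivism.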
ATET tells us the difference between average outcomes for the treated, $E[Y_{1i} \mid D_i = 1]$, and what would have happened to treated subjects if they had not been treated, $E[Y_{0i} \mid D_i = 1]$. The second term is the selection bias induced by the fact that treatment delivered was not randomly assigned. In the MDVE, those coddled were probably less likely to re-offend even in the absence of treatment (that is, had they been arrested instead). Hence, $E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$ is probably negative.

Selection bias disappears when delivered treatment is determined in a manner independent of potential outcomes, as in a randomized trial with perfect compliance. We then have

$$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = E[Y_{1i} - Y_{0i} \mid D_i = 1] = E[Y_{1i} - Y_{0i}].$$

With perfect compliance, the simple treatment-control comparison recovers ATET. Moreover, because $\{Y_{1i}, Y_{0i}\}$ is assumed to be independent of $D_i$ in this case, ATET is also the population average treatment effect, $E[Y_{1i} - Y_{0i}]$.

The most important consequence of non-compliance is the likelihood of a relation between potential outcomes and delivered treatments. This relation confounds analyses based on delivered treatments because of the resulting selection bias. But we have an ace in the hole: the compliance problem does not compromise the independence of potential outcomes and randomly assigned intended treatments. The IV framework provides a set of simple strategies to convert comparisons using intended random assignment, i.e., ITT effects, into consistent estimates of the causal effect of treatments delivered.

The easiest way to see how IV solves the compliance problem is in the context of a model with constant treatment effects, i.e., $Y_{1i} - Y_{0i} = \alpha$ for some constant $\alpha$. Also, let $Y_{0i} = \beta + \varepsilon_i$, where $\beta = E[Y_{0i}]$. The potential outcomes model can now be written

$$Y_i = \beta + \alpha D_i + \varepsilon_i, \tag{1}$$

where $\alpha$ is the treatment effect of interest.
Note that because $D_i$ is a dummy variable, the regression of $Y_i$ on $D_i$ is just the difference in mean outcomes by delivered treatment status. As noted above, this difference does not consistently estimate $\alpha$ because $Y_{0i}$ and $D_i$ are correlated (equivalently, $\varepsilon_i$ and $D_i$ are correlated).

The random assignment of intended treatment status, which I'll call $Z_i$, provides the key to untangling causal effects in the face of treatment dilution and migration. By virtue of random assignment, and the assumption that assigned treatments have no direct effect on potential outcomes other than through delivered treatments, $Y_{0i}$ and $Z_i$ are independent. It therefore follows that

$$E[\varepsilon_i \mid Z_i] = 0, \tag{2}$$

though $\varepsilon_i$ is not similarly independent of $D_i$. Taking conditional expectations of Equation (1) with $Z_i$ switched off and on gives $E[Y_i \mid Z_i = z] = \beta + \alpha E[D_i \mid Z_i = z]$ for $z = 0, 1$. Differencing these two expressions, we obtain a simple formula for the treatment effect of interest:

$$\frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} = \alpha. \tag{3}$$

Thus, the causal effect of delivered treatments is given by the causal effect of assigned treatments (the ITT effect) divided by $E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]$.

Note that in experiments where there is complete compliance in the comparison group (i.e., no controls get treated), Formula (3) is just the ITT effect divided by the compliance rate in the originally assigned treatment group. More generally, the denominator in Equation (3) is the difference in compliance rates by assignment status. In the MDVE, $E[D_i \mid Z_i = 1] = P[D_i = 1 \mid Z_i = 1] = .80$, that is, a little over three-fourths of those assigned to be coddled were coddled. On the other hand, almost no one assigned to be arrested was coddled:

$$E[D_i \mid Z_i = 0] = P[D_i = 1 \mid Z_i = 0] = .01.$$

Hence, the denominator of Equation (3) is estimated to be about .79.
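The denominator of Equation (3) can be reproduced directly from the counts in Table 1. The following is a minimal sketch, assuming the cross-tabulation is typed in by hand; the names `counts`, `coddled_share`, and `wald` are illustrative and do not come from the original analysis.

```python
# Counts from Table 1 (Sherman and Berk 1984): assigned treatment -> delivered treatment.
counts = {
    "arrest":   {"arrest": 91, "advise": 0,  "separate": 1},
    "advise":   {"arrest": 19, "advise": 84, "separate": 5},
    "separate": {"arrest": 26, "advise": 5,  "separate": 83},
}

def coddled_share(assigned_arms):
    """Share of suspects in the given assignment arms who were advised or separated."""
    total = sum(sum(counts[arm].values()) for arm in assigned_arms)
    coddled = sum(counts[arm]["advise"] + counts[arm]["separate"] for arm in assigned_arms)
    return coddled / total

p_d_z1 = coddled_share(["advise", "separate"])  # E[D | Z = 1]: assigned to be coddled
p_d_z0 = coddled_share(["arrest"])              # E[D | Z = 0]: assigned to be arrested
first_stage = p_d_z1 - p_d_z0                   # denominator of Equation (3)

def wald(itt_effect, compliance_difference):
    """Sample analog of Equation (3): ITT effect divided by the difference in compliance rates."""
    return itt_effect / compliance_difference

print(round(p_d_z1, 3), round(p_d_z0, 3), round(first_stage, 3))
```

With these counts the two shares are 177/222 and 1/92, so the difference is a little under 0.79. Passing an ITT estimate and `first_stage` to `wald` then rescales the ITT effect into an estimate of the effect of coddling itself.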
The sample analog of Formula (3) is called a Wald estimator, since this formula first appeared in a paper by Wald (1940) on errors-in-variables problems. The law of large numbers, which says that sample means converge in probability to population means, ensures that the Wald estimator of $\alpha$ is consistent (i.e., converges in probability to $\alpha$).4

The constant-effects assumption is clearly unrealistic. We'd like to allow for the fact that some men change their behavior in response to coddling, while others are affected little or not at all. There is also important heterogeneity in treatment delivery. Some suspects would have been coddled with or without the experimental manipulation, while others were coddled only because the police were instructed to treat them this way. The MDVE is informative about causal effects only on this latter group.

Imbens and Angrist (1994) showed that in a world of heterogeneous treatment effects, IV methods capture the average causal effect of delivered treatments on the subset of treated men whose delivered treatment status can be changed by the random assignment of intended treatment status. The men in this group are called compliers, a term introduced in the IV context by Angrist et al. (1996). In a randomized drug trial, for example, compliers are those who 'take their medicine' when randomly assigned to do so, but not otherwise. In the MDVE, compliers were coddled when randomly assigned to be coddled but would not have been coddled otherwise.

The average causal effect for compliers is called a local average treatment effect (LATE). Formal description of LATE requires one more bit of notation. Define potential treatment assignments $D_{0i}$ and $D_{1i}$ to be individual $i$'s treatment status when $Z_i$ equals 0 or 1.
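To make the potential-treatment notation concrete, the sketch below builds a small hypothetical population in which every unit has both potential treatments and both potential outcomes, and in which assignment is exactly balanced (each unit appears once with $Z_i = 0$ and once with $Z_i = 1$). All counts are invented for illustration; none come from the MDVE. Units with `d0 = 0, d1 = 1` are the compliers described above, and the remaining units are treated (or not) regardless of assignment.

```python
from statistics import mean

# Hypothetical population, one tuple per group: (d0, d1, y0, y1, copies).
# d0, d1: coddled if assigned arrest / if assigned coddling.
# y0, y1: re-offense if arrested / if coddled.
types = [
    (0, 1, 1, 1, 6), (0, 1, 0, 1, 9), (0, 1, 0, 0, 45),  # compliers
    (0, 0, 1, 1, 9), (0, 0, 0, 1, 12), (0, 0, 0, 0, 9),  # never coddled, whatever the assignment
    (1, 1, 1, 1, 1), (1, 1, 0, 0, 9),                    # always coddled, whatever the assignment
]

rows = []  # observed data only: (z, d, y)
for d0, d1, y0, y1, copies in types:
    for z in (0, 1):                   # balanced assignment: Z independent of everything else
        d = d0 + z * (d1 - d0)         # delivered treatment switched on by assignment
        y = y0 * (1 - d) + y1 * d      # observed outcome
        rows.extend([(z, d, y)] * copies)

def mean_by_z(index, z_value):
    """Mean of column `index` among rows with the given assignment."""
    return mean(r[index] for r in rows if r[0] == z_value)

itt = mean_by_z(2, 1) - mean_by_z(2, 0)          # reduced-form (ITT) effect on re-offense
first_stage = mean_by_z(1, 1) - mean_by_z(1, 0)  # difference in coddling rates by assignment
wald_estimate = itt / first_stage

# Benchmarks that use the normally unobservable potential outcomes.
late = mean(y1 - y0 for d0, d1, y0, y1, c in types for _ in range(c) if d1 > d0)
ate = mean(y1 - y0 for d0, d1, y0, y1, c in types for _ in range(c))

print(round(wald_estimate, 4), round(late, 4), round(ate, 4))
```

Because assignment is balanced by construction, the Wald ratio here reproduces the average effect among compliers and the first stage equals the complier share, while the population average effect is different. In a real sample the match would hold only approximately.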
Note that one of $D_{0i}$ or $D_{1i}$ is necessarily counterfactual since observed treatment status is

$$D_i = D_{0i} + Z_i (D_{1i} - D_{0i}). \tag{4}$$

In this setup, the key assumptions supporting causal inference are: (1) conditional independence, i.e., that the joint distribution of $\{Y_{1i}, Y_{0i}, D_{1i}, D_{0i}\}$ is independent of $Z_i$; and (2) monotonicity, which requires that either $D_{1i} \geq D_{0i}$ for all $i$ or vice versa. Assume without loss of generality that monotonicity holds with $D_{1i} \geq D_{0i}$. Monotonicity requires that, while the instrument might have no effect on some individuals, all of those affected are affected in the same way. Monotonicity in the MDVE amounts to assuming that random assignment to be coddled can only make coddling more likely, an assumption that seems plausible. Given these two identifying assumptions, the Wald estimator consistently estimates LATE, which is written formally as $E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}]$.5

Compliers are those with $D_{1i} > D_{0i}$, i.e., they have $D_{1i} = 1$ and $D_{0i} = 0$. The monotonicity assumption partitions the world of experimental subjects into three groups: compliers who are affected by random assignment and two unaffected groups. The first unaffected group consists of always-takers, i.e., subjects with $D_{1i} = D_{0i} = 1$. The second unaffected group consists of never-takers, i.e., subjects with $D_{1i} = D_{0i} = 0$. Because the treatment status of always-takers and never-takers is invariant to random assignment, IV estimates are uninformative about treatment effects for subjects in these groups.

In general, LATE is not the same as ATET, the average causal effect on all treated individuals. Note from Equation (4) that the treated can be divided into two groups: the set of subjects with $D_{0i} = 1$, and the set of subjects with $D_{0i} = 0$, $D_{1i} = 1$, and $Z_i = 1$. Subjects in the first set, with $D_{0i} = 1$, are always-takers since $D_{0i} = 1$ implies $D_{1i} = 1$ by monotonicity.
The second set consists of compliers with $Z_i = 1$. By virtue of the random assignment of $Z_i$, the average causal effect on compliers with $Z_i = 1$ is the same as the average causal effect for all compliers. In general, therefore, ATET differs from LATE because it is a weighted average of two effects: those on always-takers as well as those on compliers.

An important special case when LATE equals ATET is when $D_{0i}$ equals zero for everybody, i.e., there are no always-takers. This occurs in randomized trials with one-sided non-compliance, a scenario that typically arises because no one in the control group receives treatment. If no one in the control group receives treatment, then by definition there can be no always-takers. Hence, all treated subjects must be compliers. The MDVE is (approximately) this sort of experiment. Since we have defined treatment as coddling, and (almost) no one in the group assigned to be arrested was coddled, there are (almost) no always-takers. LATE is therefore ATET, the effect of coddling on the population coddled.6

The language of 2SLS

Applied economists typically discuss IV using the language of two-stage least squares (2SLS), a generalized IV estimator introduced by Theil (1953) in the context of simultaneous equation models. In models without covariates, the 2SLS estimator using a dummy instrument is the same as the Wald estimator. In models with exogenous covariates, 2SLS provides a simple and easily implemented generalization that also allows for multiple instruments and multiple treatments.

Suppose the setup is the same as before, with the modification that we'd now like to control for a vector of covariates, $X_i$.
In particular, suppose that if $D_i$ had been randomly assigned as intended, we'd be interested in a regression-adjusted treatment effect computed by ordinary least squares (OLS) estimation of the equation:

$$Y_i = X_i'\beta + \alpha D_i + \varepsilon_i. \tag{5}$$

In 2SLS language, Equation (5) is the structural equation of interest. Note that the causal effect in this model is the effect of being coddled on recidivism, relative to the baseline recidivism rate when arrested.

The two most likely rationales for including covariates in an equation like Equation (5) are: (1) that treatment was randomly assigned conditional on these covariates, and (2) a possible statistical efficiency gain (i.e., reduced sampling variance). In the MDVE, for example, the coddling treatment might have been randomly assigned with higher probability to suspects with no prior history of assault. We'd then need to control for assault history in the IV analysis. Efficiency gains are a consequence of the fact that regression standard errors, whether 2SLS or OLS, are proportional to the standard deviation of the residual, $\varepsilon_i$. The residual variance is typically reduced by the covariates, as long as the covariates have some power to predict outcomes.7

In principle, we can construct 2SLS estimates in two steps, each involving an OLS regression. In the first stage, the endogenous right-hand side variable (treatment delivered in the MDVE) is regressed on the 'exogenous' covariates plus the instrument (or instruments). This regression can be written

$$D_i = X_i'\pi_0 + \pi_1 Z_i + \xi_i. \tag{6}$$

The coefficient on the instrument in this equation, $\pi_1$, is called the 'first-stage effect' of the instrument.
Note that the first-stage equation must include exactly the same exogenous covariates as appear in the structural equation.8 The size of the first-stage effect is a major determinant of the statistical precision of IV estimates. Moreover, in a model with dummy endogenous variables like the treatment dummy analyzed here, the first-stage effect measures the proportion of the population that are compliers.9

In the second stage, fitted values from the first stage are plugged directly into the structural equation in place of the endogenous regressor. Although the term 2SLS arises from the fact that 2SLS estimates can be constructed from two OLS regressions, we don't usually compute them this way. This is because the resulting standard errors are incorrect. Best practice therefore is to use a packaged 2SLS routine such as may be found in software like SAS or Stata.

In addition to the first stage, an important auxiliary equation that is often discussed in the context of 2SLS is the reduced form. The reduced form for $Y_i$ is the regression obtained by substituting the first stage into the causal model for $Y_i$, in this case, Equation (5). In the MDVE, we can write the reduced form as

$$Y_i = X_i'\beta + \alpha\,[X_i'\pi_0 + \pi_1 Z_i + \xi_i] + \varepsilon_i = X_i'\delta_0 + \delta_1 Z_i + \nu_i. \tag{7}$$

The coefficient $\delta_1$ is said to be the 'reduced-form effect' of the instrument. Like the first stage, the reduced form parameters can be estimated by OLS, i.e., by simply running a regression.

Note that with a single endogenous variable and a single instrument, Equation (7) implies $\delta_1 = \alpha\pi_1$, so the causal effect of $D_i$ in the causal model is the ratio of reduced-form to first-stage effects:

$$\alpha = \delta_1 / \pi_1.$$

In a randomized trial with imperfect compliance, the reduced-form effect is also the ITT effect. More generally, 2SLS second-stage estimates can be understood as a re-scaling of the reduced form.
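The re-scaling can be illustrated with a small simulation. The sketch below is not the MDVE analysis: the data-generating process, the seed, and names such as `ols`, `ratio`, and `two_step` are assumptions made for illustration, and there are no covariates. It compares the ratio of reduced-form to first-stage coefficients with the coefficient from a manual second-stage regression on first-stage fitted values.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data (not the MDVE): z = assigned to coddling, d = coddling delivered, y = re-offense.
z_vals, d_vals, y_vals = [], [], []
for _ in range(2000):
    z = rng.randint(0, 1)
    risk = rng.random()                      # unobserved volatility of the suspect
    d = 1 if (z == 1 and risk < 0.8) else 0  # the most volatile suspects are arrested regardless
    p_fail = 0.05 + 0.15 * d + 0.20 * risk   # assumed re-offense probability; assumed effect is 0.15
    y = 1 if rng.random() < p_fail else 0
    z_vals.append(z)
    d_vals.append(d)
    y_vals.append(y)

def ols(x, y):
    """Intercept and slope from a bivariate least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

pi0, pi1 = ols(z_vals, d_vals)      # first stage: d on z
_, delta1 = ols(z_vals, y_vals)     # reduced form (ITT): y on z
ratio = delta1 / pi1                # reduced form divided by first stage

d_hat = [pi0 + pi1 * z for z in z_vals]
_, two_step = ols(d_hat, y_vals)    # manual second stage on first-stage fitted values
_, naive = ols(d_vals, y_vals)      # OLS on delivered treatment, for comparison

print(round(pi1, 3), round(delta1, 3), round(ratio, 3), round(two_step, 3), round(naive, 3))
```

The ratio and the manual second-stage coefficient agree up to floating-point error. As the text notes, only the point estimate carries over from the manual second stage; its standard errors are not the correct 2SLS standard errors, which is the reason to prefer a packaged routine.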
It can also be shown that the significance levels for the reduced form and the second stage are asymptotically the same under the null hypothesis of no treatment effect. Hence, the workingman's IV motto: "If you can't see your causal effect in the reduced form, it ain't there."

One final reason for looking at the reduced form is that, in contrast with the 2SLS estimates themselves, the reduced form estimates have all the attractive statistical properties of any ordinary least squares regression estimates. In particular, estimates of reduced form regression coefficients are unbiased (i.e., centered on the population parameter in repeated samples), and the statistical theory that justifies statistical inference for these coefficients (i.e., confidence intervals and hypothesis testing) does not require large samples. 2SLS estimates, on the other hand, are not unbiased, although they are consistent. This means that in large samples, the sample estimates can be expected to be close to the target population parameter. Moreover, the statistical theory that justifies confidence intervals and hypothesis testing for 2SLS requires that samples be large enough for a reasonably good asymptotic approximation (in particular, for application of central limit theorems).

How large a sample is large enough for asymptotic statistical theory to work? Unfortunately, there is no general answer to this question. Various theoretical arguments and simulation studies have shown, however, that the asymptotic approximations used for 2SLS inference are usually reasonably accurate in models where the number of instruments is equal to (or not much more than) the number of endogenous variables (as would be the case in studies using randomly assigned intention to treat as an instrument for treatment delivered). In addition, the key to valid inference is a strong first stage, say a t-statistic for the coefficient on the instrumental variable in the first-stage equation of at least 3. For further discussion of statistical inference with 2SLS, see Angrist and Krueger (2001).

2SLS estimates for MDVE with one endogenous variable

The first-stage effect of being assigned to the coddling treatment is .79 in a model without covariates and .77 in a model that controls for a few covariates.10 These first-stage effects can be seen in the first two columns of Table 2, which report estimates of Equation (6) for the MDVE. The reduced form effects of random assignment to the coddling treatment, reported in columns 3 and 4, are about .11, and significantly different from zero with standard errors of .041–.047. The first-stage and reduced-form estimates change little when covariates are added to the model, as expected since $Z_i$ was randomly assigned. The 2SLS results derived from these first-stage and reduced form estimates are reported in Table 3.

Table 2. First stage and reduced forms for Model 1. Endogenous variable is coddled

| | First stage (1) | First stage (2)* | Reduced form (ITT) (3) | Reduced form (ITT) (4)* |
|---|---|---|---|---|
| Coddled-assigned | 0.786 (0.043) | 0.773 (0.043) | 0.114 (0.047) | 0.108 (0.041) |
| Weapon | | -0.064 (0.045) | | -0.004 (0.042) |
| Chem. influence | | -0.088 (0.040) | | 0.052 (0.038) |

Dep. var. mean: 0.567 (Coddled-delivered) for the first stage; 0.178 (Failed) for the reduced form.

The table reports OLS estimates of the first stage and reduced form for Model 1 in the text. Standard errors are in parentheses. *Other covariates include year and quarter dummies, and dummies for non-white and mixed race.

Before turning to a detailed discussion of the 2SLS results, one caveat is in order: for simplicity, I discuss these estimates as if they were constructed in the usual way, i.e., by estimating Equations (5), (6), and (7) using micro-data.
In reality, however, I was unable to locate or construct the original recidivism variable from the MDVE public-use data sets (Berk and Sherman, 1993). I therefore generated my own micro-data on recidivism from the Logit coefficients reported in Berk and Sherman (1988, Tables 4 and 6). Note that the Logistic regression of, say, $Y_i$ on $D_i$ implicitly determines the conditional mean of $Y_i$ given $D_i$ (by inverting the logistic transformation of fitted values, a simple mathematical operation). Because $Y_i$ in this case is a dummy variable, this conditional mean is also the conditional distribution of $Y_i$ given $D_i$. It is therefore straightforward to construct, by sampling from this distribution, a sample with the same joint distribution of $Y_i$ and $D_i$ (or $Y_i$ and $Z_i$) as must have appeared in Berk and Sherman's original data set.

Table 3. OLS and 2SLS estimates for Model 1. Endogenous variable is coddled

| | OLS (1) | OLS (2)* | IV/2SLS (3) | IV/2SLS (4)* |
|---|---|---|---|---|
| Coddled-delivered | 0.087 (0.044) | 0.070 (0.038) | 0.145 (0.060) | 0.140 (0.053) |
| Weapon | | 0.010 (0.043) | | 0.005 (0.043) |
| Chem. influence | | 0.057 (0.039) | | 0.064 (0.039) |

The table reports OLS and 2SLS estimates of the structural equation in Model 1. Standard errors are in parentheses. *Other covariates include year and quarter dummies, and dummies for non-white and mixed race.

By virtue of this re-sampling scheme, my data set indeed has the same joint distributions of $\{Y_i, D_i\}$ and $\{Y_i, Z_i\}$ as the original Berk and Sherman (1988) data. My data set also has the same distribution of $\{D_i, X_i\}$ and $\{Z_i, X_i\}$ as in the original data, since the observations I use on $\{D_i, Z_i, X_i\}$ are taken directly from the original data set, available from the ICPSR web site.
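
The re-sampling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually used for the paper: the coefficients `b0` and `b1` are hypothetical placeholders, not the values reported by Berk and Sherman (1988), and `inv_logit` and `draw_outcomes` are names introduced here.

```python
import math
import random


def inv_logit(x):
    """Invert the logistic transformation: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_outcomes(d, b0, b1, seed=0):
    """Draw Y_i ~ Bernoulli(inv_logit(b0 + b1 * D_i)) for each observed D_i."""
    rng = random.Random(seed)
    return [int(rng.random() < inv_logit(b0 + b1 * di)) for di in d]


if __name__ == "__main__":
    b0, b1 = -2.0, 0.6            # hypothetical logit coefficients
    d = [0] * 150 + [1] * 150     # hypothetical treatment indicators
    y = draw_outcomes(d, b0, b1)
    for arm in (0, 1):
        ys = [yi for yi, di in zip(y, d) if di == arm]
        # Implied probability next to the share of failures actually drawn.
        print(arm, round(inv_logit(b0 + b1 * arm), 3), round(sum(ys) / len(ys), 3))
```

Because the outcome is binary, the fitted probability for each value of the treatment dummy fully describes the conditional distribution, which is why a Bernoulli draw per observation is all that is needed.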
Importantly, because the first stage does not involve $Y_i$, my first-stage estimates are unaffected by the use of outcome data that I had to construct by sampling from the probability distributions implied by Berk and Sherman's models. The only information lost in my reconstruction of the Berk and Sherman outcomes data is a consequence of the fact that I must assume that the conditional distributions of $Y_i$ given $\{D_i, X_i\}$ and of $Y_i$ given $\{Z_i, X_i\}$ do not depend on the covariates, $X_i$. Thus, for models without covariates, estimates using my data should be identical to those that would have been generated by the original data set. Given the random assignment of $Z_i$, however, the estimates using my data should also be similar even for models with covariates.

The 2SLS estimates associated with the first stage and reduced form estimates in Table 2 are .14–.145. The 2SLS estimates, reported in columns 3–4 of Table 3, are about double the size of the corresponding OLS estimates of the effects of delivered treatments, reported in columns 1–2 of the same table. Recall that the 2SLS estimates in columns 3 and 4 of Table 3 are essentially a rescaling of the reduced form estimates reported in columns 3 and 4 of Table 2. The 2SLS estimates are implicitly calculated by dividing the reduced form (or ITT) estimates by the first-stage estimates (or difference in compliance rates between the original treatment and control groups). Without covariates, for example, .114/.786 is approximately .145.

The OLS estimates are almost certainly too low, probably because delivered treatments were contaminated by selection bias. The reduced form effect of coddling is also too small, relative to the causal effect of coddling per se, because non-compliance dilutes ITT effects. As noted above, the 2SLS estimates in this case capture the causal effect of coddling on the coddled, undiluted by non-compliance and unaffected by selection bias. The 2SLS estimates point to a dramatic increase in re-offense rates due to coddling (the mean re-offense rate was .18). The magnitude of this effect is clearly understated by alternative estimation strategies.11

Table 4. First stage and reduced forms for Model 2. Two endogenous variables: Advise, separate

| | First stage, Advised (1) | First stage, Advised (2) | First stage, Separated (3) | First stage, Separated (4) | Reduced form (ITT) (5) | Reduced form (ITT) (6) |
|---|---|---|---|---|---|---|
| Advise-assigned | 0.778 (0.039) | 0.766 (0.039) | 0.035 (0.043) | 0.035 (0.043) | 0.097 (0.054) | 0.088 (0.046) |
| Separate-assigned | 0.044 (0.038) | 0.031 (0.039) | 0.717 (0.042) | 0.715 (0.043) | 0.130 (0.053) | 0.127 (0.046) |
| Weapon | | -0.038 (0.036) | | -0.031 (0.039) | | -0.001 (0.042) |
| Chem. influence | | -0.068 (0.032) | | -0.018 (0.035) | | 0.051 (0.038) |

Dep. var. mean: 0.283 (Adv.-deliver); 0.283 (Sep.-deliver); 0.178 (Failed).

The table reports OLS estimates of the first stage and reduced form for Model 2 in the text. Standard errors are in parentheses. In addition to the covariates reported in the table, these models include year and quarter dummies, and dummies for non-white and mixed race.

At this point, it bears emphasizing that even though treatments and outcomes are dummy variables, I used linear models for every step of the analysis underlying Tables 2 and 3 (and Table 4, discussed below). To see why, it helps to bear in mind that the purpose of causal inference is the estimation of average treatment effects and not prediction of individual outcomes per se.
First, whenever you have a complete set of dummy variables on the right-hand side of a regression equation (a scenario known as a saturated model), linear probability models estimate the underlying conditional mean function perfectly. A model for the effect of a single dummy treatment or a set of mutually exclusive dummy treatments is the simplest sort of saturated model. Hence there is no point to the use of more complex nonlinear models. You cannot improve on perfection.

Another way to see why linear models are appropriate in this context is to suppose that instead of an OLS regression of $Y_i$ on $D_i$, we were to estimate (for example) the corresponding Probit regression. The Probit conditional mean function in this case is $E[Y_i \mid D_i] = \Phi[\beta_0 + \beta_1 D_i]$, where $\Phi[\cdot]$ is the standard Normal distribution function. But since $D_i$ is a dummy variable, this conditional mean function can be rewritten as a linear model:

$$E[Y_i \mid D_i] = \Phi[\beta_0] + \left(\Phi[\beta_0 + \beta_1] - \Phi[\beta_0]\right) D_i.$$

Thus, the Probit estimate of the treatment effect of $D_i$ is $\Phi[\beta_0 + \beta_1] - \Phi[\beta_0]$. But since the conditional mean function is linear in $D_i$, this is exactly what the OLS regression of $Y_i$ on $D_i$ will produce. In other words, the slope coefficient in the OLS regression will equal $\Phi[\beta_0 + \beta_1] - \Phi[\beta_0]$. In fact, all models will generate the same marginal effect of $D_i$.

In more complicated models, with additional covariates, some of which are not dummy variables, or when the model is not fully saturated, it is no longer the case that Probit and OLS will produce exactly the same treatment effects (again, it's worth emphasizing that it is these effects that are of interest; the Probit coefficients themselves mean little). But in practice, the treatment effects generated by nonlinear models are likely to be indistinguishable from OLS regression coefficients. See, for example, the comparison of Probit and regression estimates in Angrist (2001).
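
The equivalence in the saturated single-dummy case can be checked numerically. The sketch below uses a small made-up data set and Python's standard library. It relies on the fact that, with a single dummy regressor, the Probit maximum likelihood fit reproduces the two cell means, so the Probit coefficients have a closed form. The helper names and the data are assumptions introduced for the example.

```python
from statistics import NormalDist, mean


def ols_slope(x, y):
    """OLS slope from a regression of y on x and a constant."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def probit_effect_saturated(d, y):
    """Marginal effect Phi(b0 + b1) - Phi(b0) from a Probit of y on a dummy d.

    With one dummy regressor the Probit MLE matches the two cell means, so
    b0 and b1 have the closed form used below. Both cell means must lie
    strictly between 0 and 1.
    """
    phi = NormalDist()
    p0 = mean(yi for yi, di in zip(y, d) if di == 0)
    p1 = mean(yi for yi, di in zip(y, d) if di == 1)
    b0 = phi.inv_cdf(p0)
    b1 = phi.inv_cdf(p1) - b0
    return phi.cdf(b0 + b1) - phi.cdf(b0)


if __name__ == "__main__":
    # Hypothetical data: 2 of 10 controls fail, 5 of 10 treated fail.
    d = [0] * 10 + [1] * 10
    y = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5
    print(ols_slope(d, y), probit_effect_saturated(d, y))
```

Both numbers equal the difference in failure rates between the two groups, up to floating-point error.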
The close relation between nonlinear marginal effects and OLS coefficients is a consequence of a very general regression property: no matter what the shape of the conditional mean function you are trying to estimate, OLS regression always provides the minimum mean square error linear approximation to it (see, e.g., Goldberger, 1991).

The case for using 2SLS to estimate linear probability models with dummy endogenous variables is slightly more involved than the case for using OLS regression to estimate models without endogenous variables. Nevertheless, the argument is essentially similar: even with binary outcomes like recidivism, linear 2SLS estimates have a robust causal interpretation that is insensitive to the possible nonlinearity induced by dummy dependent variables. For example, the interpretation of IV as estimating LATE is unaffected by the fact that the outcome is a dummy. Likewise, consistency of 2SLS estimates is unaffected by the possible nonlinearity of the first-stage conditional expectation function, $E[D_i \mid X_i, Z_i]$. For details, see Angrist (2001), which also offers some simple nonlinear alternatives for those who insist.

2SLS estimates with two endogenous variables

The analysis so far looks at the MDVE as if it involved a single treatment. I now turn to a 2SLS model that more realistically allows for distinct causal effects for the two types of coddling that were randomly assigned, separation and advice. A natural generalization of Equation (5) incorporating distinct causal effects for these two interventions is

$$Y_i = X_i'\beta + \alpha_a D_{ai} + \alpha_s D_{si} + \varepsilon_i, \qquad (8)$$

where $D_{ai}$ and $D_{si}$ are dummies that indicate delivery of advice and separation. As before, because of the endogeneity of delivered treatments, OLS estimates of Equation (8) are likely to be misleading.
Again, the causal effects of interest are the effects of advice and separation relative to the baseline recidivism rate when arrested. The potential outcomes that motivate Equation (8) as a causal model describe each suspect's recidivism status had he been assigned to one of three possible treatments (arrest, advise, separate).

Equation (8) is a structural model with two endogenous regressors, $D_{ai}$ and $D_{si}$. We also have two possible instruments, $Z_{ai}$ and $Z_{si}$, dummy variables indicating random assignment to advice and separation as intended treatments. The corresponding first-stage equations are

$$D_{ai} = X_i'\pi_{0a} + \pi_{aa} Z_{ai} + \pi_{as} Z_{si} + \xi_{ai} \qquad (9a)$$

$$D_{si} = X_i'\pi_{0s} + \pi_{sa} Z_{ai} + \pi_{ss} Z_{si} + \xi_{si}, \qquad (9b)$$

where $\pi_{aa}$ and $\pi_{as}$ are the first-stage effects of the two instruments on delivered advice, $D_{ai}$, and $\pi_{sa}$ and $\pi_{ss}$ are the first-stage effects of the two instruments on delivered separation, $D_{si}$.

The reduced form equation for this two-endogenous-variables setup is obtained by substituting Equations (9a) and (9b) into Equation (8). Similarly, the second stage is obtained by substituting fitted values from the first stages into the structural equation.12 Note that in a model with two endogenous variables we must have at least two instruments for the second stage estimates to exist.13 Assuming the second stage estimates exist, which is equivalent to saying that the structural equation is identified, the 2SLS estimates in this case can be interpreted as capturing the covariate-adjusted causal effects of each delivered treatment on those who comply with random assignment.

Random assignment to receive advice increased the likelihood of actually receiving this treatment by .78. Assignment to the separation treatment also increased the likelihood of receiving advice, but this effect is small and not significantly different from zero.
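
The two-endogenous-variable setup in Equations (8), (9a), and (9b) can be sketched with NumPy on simulated data. This is a minimal illustration, not the MDVE analysis: the three-arm data-generating process, the assumed constant effects of 0.10 and 0.15, and the function names `tsls` and `simulate` are all hypothetical. The rank check mirrors the requirement that there be at least as many instruments as endogenous variables.

```python
import numpy as np


def tsls(y, D, Z, X):
    """2SLS point estimates for the columns of D (endogenous regressors).

    Z holds the instruments and X the exogenous covariates (here a constant).
    Standard errors are not computed; use packaged 2SLS software for those.
    """
    W = np.column_stack([X, Z])                        # first-stage regressors
    D_hat = W @ np.linalg.lstsq(W, D, rcond=None)[0]   # fitted values per treatment
    R = np.column_stack([X, D_hat])                    # second-stage design matrix
    if np.linalg.matrix_rank(R) < R.shape[1]:
        raise ValueError("not identified: too few instruments")
    coef = np.linalg.lstsq(R, y, rcond=None)[0]
    return coef[X.shape[1]:]


def simulate(n=40000, seed=0):
    """Hypothetical three-arm trial: 0 = arrest, 1 = advise, 2 = separate."""
    rng = np.random.default_rng(seed)
    assigned = rng.integers(0, 3, n)
    risk = rng.uniform(size=n)                     # unobserved, drives selection
    override = (rng.uniform(size=n) < 0.2) & (risk > 0.5)
    delivered = np.where(override, 0, assigned)    # some high-risk suspects arrested
    Z = np.column_stack([assigned == 1, assigned == 2]).astype(float)
    D = np.column_stack([delivered == 1, delivered == 2]).astype(float)
    # Assumed constant effects: 0.10 for advice, 0.15 for separation.
    y = 0.1 + 0.2 * risk + D @ np.array([0.10, 0.15]) + rng.normal(0, 0.1, n)
    X = np.ones((n, 1))
    return y, D, Z, X


if __name__ == "__main__":
    y, D, Z, X = simulate()
    print("2SLS:", tsls(y, D, Z, X))
    ols = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0][1:]
    print("OLS: ", ols)
```

Passing only one instrument column, for example `Z[:, :1]`, leaves the second-stage design matrix rank deficient, and the function raises an error instead of returning estimates.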
The first-stage estimates for delivered advice can be seen in columns 1–2 of Table 4, which report the estimates of first-stage effects from Equation (9a). The corresponding estimates of Equation (9b), reported in columns 3–4 of the table, show that assignment to the separation treatment increased delivered separation rates by about .72, while assignment to advice had almost no effect on the likelihood of receiving the separation treatment. The reduced form effects of random assignment to receive advice range from .088 to .097, while the reduced form estimates of random assignment to be separated are about .13. The reduced form estimates are reported in columns 5–6 of the table.

OLS and 2SLS estimates of the two-endogenous-variables model are reported in Table 5. Interestingly, the OLS estimates of the effect of delivered advice on re-offense rates are small and not significantly different from zero. The OLS estimates of the effect of being separated are more than twice as large and significant. Both of these results are reported in columns 1–2 of the table. In contrast with the OLS effects, the 2SLS estimates of the effects of both types of treatment are substantial and at least marginally significant. For example, the 2SLS estimate of the impact of the advice intervention is .107 (SE = .059) in a model with covariates. The 2SLS estimate of the impact of separation is even larger, at around .17.

Table 5. OLS and 2SLS estimates for Model 2. Two endogenous variables: Advise, separate

| | OLS (1) | OLS (2) | IV/2SLS (3) | IV/2SLS (4) |
|---|---|---|---|---|
| Advise-assigned | 0.047 (0.052) | 0.019 (0.046) | 0.116 (0.068) | 0.107 (0.059) |
| Separate-assigned | 0.126 (0.052) | 0.120 (0.046) | 0.174 (0.073) | 0.174 (0.063) |
| Weapon | | 0.015 (0.043) | | 0.008 (0.043) |
| Chem. influence | | 0.052 (0.039) | | 0.061 (0.039) |
| Test: Advise = separate | F = 1.87, p = .170 | F = 4.14, p = .043 | F = .64, p = .420 | F = 1.14, p = .290 |

The table reports OLS and 2SLS estimates of the structural equation in Model 2. Standard errors are in parentheses. In addition to the covariates reported in the table, these models include year and quarter dummies, and dummies for non-white and mixed race.

As in the model with a single endogenous variable, the reduced-form estimates of intended treatment effects are larger than the corresponding OLS estimates of delivered treatment effects, and the 2SLS estimates are larger than the corresponding reduced forms. The gap between OLS and 2SLS is especially large for the advice effects, suggesting that the OLS estimates of the effect of receiving advice are more highly contaminated by selection bias than the OLS estimates of the effect of separation. Moreover, the difference between the separation and advice treatment effects is much larger when estimated by 2SLS than in the reduced form.

Does anything new come out of this IV analysis of the MDVE? Two findings seem important. First, a comparison of 2SLS estimates with estimates that ignore the endogeneity of treatment delivered indicates considerable selection bias in the latter. In particular, the 2SLS estimates of the effect of coddling are about twice as large as the corresponding OLS estimates, largely due to the fact that the suspects who were coddled were those least likely to re-offend anyway. The IV framework corrects for this important source of bias. A related point is that the ITT effects (equivalently, the 2SLS reduced form estimates) are not a fair comparison for gauging selection bias. Although ITT effects have a valid causal interpretation (i.e., they preserve random assignment), they are diluted by non-compliance.
OLS estimates of the effect of treatment delivered, while contaminated by selection bias, are not similarly diluted. The second major finding, and one clearly related to the first, is that non-compliance was important enough to matter; in some cases, the 2SLS estimates are as much as one-third larger than the corresponding ITT effects. Based on these results, the evidence for a deterrent effect of arrest is even stronger than previously believed.

Models with variable treatment intensity and observational studies

In closing, it bears emphasizing that IV methods are not limited to the estimation of the effects of binary, on-or-off treatments like coddling, separation, or advice in the MDVE. Many experimental evaluations are concerned with the effects of interventions with variable treatment intensity, i.e., the effects of an endogenous variable that takes on ordered integer values. Applications of IV to these sorts of interventions include Krueger's (1999) analysis of experimental estimates of the effects of class size, the Permutt and Hebel (1989) study of an experiment to reduce the number of cigarettes smoked by pregnant women, and the Powers and Swinton (1984) randomized study of the effect of hours of preparation for the GRE test.

The studies mentioned above use 2SLS or related IV methods to analyze data from randomized trials where the treatment of interest takes on values like 0, 1, 2, ... (cigarettes, hours of study) or 15, 16, 17, ... (class size). Although these papers interpret IV estimates using traditional constant-effects models, the 2SLS estimates they report also have a more general LATE interpretation. In particular, 2SLS estimates of models with variable treatment intensity give the average causal response for compliers along the length of the underlying causal response function. See Angrist and Imbens (1995) for details.
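
The same reduced-form over first-stage logic carries over when the treatment is an intensity rather than a dummy. The following Python sketch uses an invented encouragement experiment: the data-generating process, the assumed effect of 2 points per hour, and the names `wald` and `simulate_hours` are assumptions for illustration and are not taken from any of the studies cited above.

```python
import random
from statistics import mean


def wald(z, d, y):
    """IV estimate: difference in mean outcomes over difference in mean intensity."""
    y1 = mean(yi for yi, zi in zip(y, z) if zi == 1)
    y0 = mean(yi for yi, zi in zip(y, z) if zi == 0)
    d1 = mean(di for di, zi in zip(d, z) if zi == 1)
    d0 = mean(di for di, zi in zip(d, z) if zi == 0)
    return (y1 - y0) / (d1 - d0)


def simulate_hours(n=20000, seed=2):
    """Hypothetical experiment encouraging extra hours of test preparation."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = rng.randint(0, 1)                        # random encouragement
        ability = rng.gauss(0, 1)                     # unobserved confounder
        base = min(3, max(0, round(1.5 + ability)))   # hours without encouragement
        extra = zi * int(rng.random() < 0.7)          # 70% add one hour if encouraged
        di = base + extra
        # Assumed causal response: 2 points per hour of preparation.
        yi = 2.0 * di + 3.0 * ability + rng.gauss(0, 1)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


if __name__ == "__main__":
    z, d, y = simulate_hours()
    print("IV estimate per hour:", round(wald(z, d, y), 2))
```

In this constructed example the causal response is linear, so the IV estimate targets the single slope of 2. With a nonlinear response it would instead be a weighted average of unit-by-unit increments, as described in the text.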

The IV framework also goes beyond randomized trials and can be used to exploit quasi-experimental variation in observational studies. An example from my own work is Angrist (1990), which uses the draft lottery numbers that were randomly assigned in the early 1970s as instrumental variables for the effect of Vietnam-era veteran status on post-service earnings. Draft lottery numbers are highly correlated with veteran status among men born in the early 1950s, and probably unrelated to earnings for any other reason.

A second example from my portfolio illustrates the fact that instrumental variables need not be randomly assigned to be useful.14 Angrist and Lavy (1999) used something called Maimonides' rule to construct instrumental variables for the effects of class size on test scores. The instrument in this case is the class size predicted using Maimonides' rule, a mathematical formula derived from the practice in Israeli elementary schools of dividing grade cohorts by integer multiples of 40, the maximum class size (the same rule proposed by Maimonides in his Mishneh Torah biblical commentary). This study can be seen as an application of Campbell's (1969) celebrated regression-discontinuity design for quasi-experimental research, but also as a type of IV. The extension of IV methods to quasi-experimental criminological research designs seems an especially promising avenue for further work.

Acknowledgements

Special thanks to Richard Berk, Howard Bloom, David Weisburd, and the participants in the May 2005 Jerry Lee Conference for the stimulating discussions that led to this paper. Thanks also to the editor and three anonymous referees for helpful comments.

Notes

1 Social experiments in labor economics, which are never double or even single-blind, often allow those selected for treatment to opt out (an example is the Illinois unemployment insurance bonuses experiment; see Woodbury and Spiegelman 1987). And even in double-blind clinical trials, clinicians sometimes decipher and change treatment assignments (Schultz 1995).

2 The brief discussion in this paper glosses over a number of technical details. For a more comprehensive introduction to IV see Angrist and Krueger (2001, 1999), or the chapters on IV in Wooldridge (2003).

3 The fact that those who comply with randomly assigned treatments are special can be seen in medical trials, where those who comply with protocol by taking a randomly assigned experimental treatment with no clinical effects (i.e., a placebo) are often healthier than those who don't (as in the study analyzed by Efron and Feldman 1991). Efron and Feldman use the placebo sample in an attempt to characterize those who comply with treatment assignment directly, but placebo-controlled trials are unusual in social science. Luckily, however, at least as far as solving the compliance problem goes, they are unnecessary.

4 An estimator is said to be consistent when the probability that it is close to the population parameter being estimated tends to 1 as the sample size grows. In other words, a consistent estimate can be taken to be close to the parameter of interest in large samples. Note that consistency is not the same as unbiasedness; an unbiased estimator has a sampling distribution centered on the parameter of interest in a sample of any size. I briefly discuss this point further below.

5 In econometrics, a parameter is said to be "identified" when it can be constructed from the joint distribution of observed random variables.
Assumptions that allow a parameter to be identified are called "identifying assumptions." The identifying assumptions for IV, independence and monotonicity, allow us to construct LATE from the joint distribution of $\{Y_i, D_i, Z_i\}$.

6 The fact that a randomized trial with one-sided non-compliance can be used to estimate the effect of treatment on the treated was first noted by Bloom (1984).

7 The causal (LATE) interpretation of IV estimates is similar in models with and without covariates. See Angrist and Imbens (1995) or Abadie (2003) for details.

8 If the first stage includes covariates omitted from the second stage, then the covariates are, in fact, playing the role of instruments. If, on the other hand, any covariates included in the second stage are omitted from the first stage, then the first stage residuals, which necessarily end up in the second stage error term, are correlated with covariates, biasing all second-stage estimates. See, e.g., Wooldridge (2003).

9 Formally, this is because without covariates, $E[D_{1i} - D_{0i}] = \pi_1$. With covariates, $E[D_{1i} - D_{0i} \mid X_i] = \pi_1$ if the first stage is linear and additive in covariates, and, more generally, $E\{E[D_{1i} - D_{0i} \mid X_i]\} \approx \pi_1$.

10 The covariates are dummies for the presence of a weapon and whether the suspect was under chemical influence, year and quarter dummies for time of follow-up, and dummies for suspects' race (non-white and mixed).

11 Rossi et al. (1980) present an IV-type analysis of a stipend program for ex-offenders. Their analysis deviates from an orthodox 2SLS procedure in a number of respects, however. First, they include potentially endogenous outcome variables on the right-hand side as if these were covariates. Second, they use nonlinear models (e.g., Tobit) to which IV methods do not easily transfer and which are, in any case, not well-suited to the sort of question they are addressing.

12 With multiple endogenous variables, the second stage estimates can no longer be obtained as the ratio of reduced form to first-stage coefficients, but rather solve a matrix equation. Again, the best strategy for real empirical work is to use packaged 2SLS software.

13 The second stage has a regression design matrix with number of columns equal to dim($X_i$) + 2. This matrix must be of full column rank for the second stage to exist. The rank of the design matrix is equal to the number of linearly independent columns in the matrix. This can be no more than dim($X_i$) plus the number of instruments, since the fitted values used in the second step are linear combinations of $X_i$ and the instruments. Hence the need for at least K instruments when there are K endogenous variables.

14 A pioneering illustration of this point from criminology is Levitt's (1997) study of the effects of extra policing using municipal election cycles to create instruments for numbers of police. See also McCrary (2002), who discusses a technical problem with Levitt's original analysis. Recent applications of IV in criminology include Snow Jones and Gondolf (2002), Gottfredson (2005), and White (2005).

References

Abadie, A. (2003). Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics 113(2), 231–263.

Angrist, J. D. (1990). Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records. American Economic Review 80(3), 313–335.

Angrist, J. D. (2001). Estimation of limited dependent variable models with dummy endogenous regressors: Simple strategies for empirical practice. Journal of Business and Economic Statistics 19(1), 2–16.

Angrist, J. D. & Imbens, G. W. (1995).
Two-stage least squares estimates of average causal effects in models with variable treatment intensity. Journal of the American Statistical Association 90(430), 431–442.

Angrist, J. D. & Krueger, A. B. (1999). Empirical strategies in labor economics. In O. Ashenfelter & D. Card (Eds.), Handbook of labor economics, Volume IIIA (pp. 1277–1366). Amsterdam: North-Holland.

Angrist, J. D. & Krueger, A. B. (2001). Instrumental variables and the search for identification. Journal of Economic Perspectives 15(4), 69–86.

Angrist, J. D. & Lavy, V. C. (1999). Using Maimonides' rule to estimate the effect of class size on student achievement. Quarterly Journal of Economics 114(2), 533–575.

Angrist, J. D. & Lavy, V. C. (2002). The effect of high school matriculation awards: Evidence from randomized trials. NBER Working Paper 9389, December.

Angrist, J. D., Imbens, G. W. & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91(434), 444–455.

Berk, R. A. & Sherman, L. W. (1988). Police response to family violence incidents: An analysis of an experimental design with incomplete randomization. Journal of the American Statistical Association 83(401), 70–76.

Berk, R. A. & Sherman, L. W. (1993). Specific Deterrent Effects Of Arrest For Domestic Assault: Minneapolis, 1981–1982 [Computer file]. Conducted by the Police Foundation. 2nd ICPSR ed. Ann Arbor, Michigan: Inter-university Consortium for Political and Social Research [producer and distributor].

Berk, R. A., Smyth, G. K. & Sherman, L. W. (1988). When random assignment fails: Some lessons from the Minneapolis spouse abuse experiment. Journal of Quantitative Criminology 4(3), 209–223.

Bloom, H. S. (1984). Accounting for no-shows in experimental evaluation designs. Evaluation Review 8(2), 225–246.

Boruch, R., De Moya, D. & Snyder, B. (2002).
The importance of randomized field trials in education and related areas. In F. Mosteller & R. Boruch (Eds.), Evidence matters: Randomized trials in education research. Washington, DC: Brookings Institution.

Campbell, D. T. (1969). Reforms as experiments. American Psychologist 24, 409–429.

Cook, T. D. (2001). Sciencephobia: Why education researchers reject randomized experiments. Education Next (http://www.educationnext.org), Fall, 63–68.

Efron, B. & Feldman, D. (1991). Compliance as an explanatory variable in clinical trials. Journal of the American Statistical Association 86(413), 9–17.

Farrington, D. P. (1983). Randomized experiments on crime and justice. In M. H. Tonry & N. Morris (Eds.), Crime and justice. Chicago: University of Chicago Press.

Farrington, D. P. & Welsh, B. C. (2005). Randomized experiments in criminology: What have we learned in the last two decades? Journal of Experimental Criminology 1, 9–38.

Gartin, P. R. (1995). Dealing with design failures in randomized field experiments: Analytic issues regarding the evaluation of treatment effects. Journal of Research in Crime and Delinquency 32(4), 425–445.

Goldberger, A. S. (1991). A course in econometrics. Cambridge, MA: Harvard University Press.

Gottfredson, D. C. (2005). Long-term effects of participation in the Baltimore City Drug Treatment Court: Results from an experimental study. University of Maryland, Department of Criminology and Criminal Justice, mimeo, October 2005.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association 81(396), 945–970.

Imbens, G. W. & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica 62(2), 467–475.

Krueger, A. B. (1999). Experimental estimates of education production functions. Quarterly Journal of Economics 114(2), 497–532.

Levitt, S. D.
(1997). Using electoral cycles in police hiring to estimate the effects of police on crime. American Economic Review 87(3), 270–290.
McCrary, J. (2002). Using electoral cycles in police hiring to estimate the effects of police on crime: Comment. American Economic Review 92(4), 1236–1243.
Permutt, T. & Hebel, J. R. (1989). Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics 45(2), 619–622.
Powers, D. E. & Swinton, S. S. (1984). Effects of self-study for coachable test item types. Journal of Educational Psychology 76(2), 266–278.
Rezmovic, E. L., Cook, T. J. & Dobson, L. D. (1981). Beyond random assignment: Factors affecting evaluation integrity. Evaluation Review 5(1), 51–67.
Rossi, P. H., Berk, R. A. & Lenihan, K. J. (1980). Money, work, and crime: Experimental evidence. New York: Academic.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology 66, 688–701.
Rubin, D. B. (1977). Assignment to a treatment group on the basis of a covariate. Journal of Educational Statistics 2, 1–26.
Sherman, L. W. & Berk, R. A. (1984). The specific deterrent effects of arrest for domestic assault. American Sociological Review 49(2), 261–272.
Snow Jones, A. & Gondolf, E. (2002). Assessing the effect of batterer program completion on reassault: An instrumental variables analysis. Journal of Quantitative Criminology 18, 71–98.
Theil, H. (1953). Repeated least squares applied to complete equation systems. The Hague: Central Planning Bureau.
Wald, A. (1940). The fitting of straight lines if both variables are subject to error. Annals of Mathematical Statistics 11, 284–300.
Weisburd, D. L. (2003). Ethical practice and evaluation of interventions in crime and justice: The moral imperative for randomized trials. Evaluation Review 27(3), 336–354.
Weisburd, D.
L., Lum, C. & Petrosino, A. (2001). Does research design affect study outcomes in criminal justice? Annals of the American Academy of Political and Social Science 578(6), 50–70.
White, M. J. (2005). Acupuncture in drug treatment: Exploring its role and impact on participant behavior in the drug court setting, John Jay College of Criminal Justice, City University of New York, mimeo.
Woodbury, S. A. & Spiegelman, R. G. (1987). Bonuses to workers and employers to reduce unemployment: Randomized trials in Illinois. American Economic Review 77(4), 513–530.
Wooldridge, J. (2003). Introductory econometrics: A modern approach. Cincinnati, OH: Thomson South-Western.

About the author

Joshua Angrist is a Professor of Economics at MIT and a Research Associate in the NBER’s programs on Children, Education, and Labor Studies. A dual U.S. and Israeli citizen, he taught at the Hebrew University of Jerusalem before coming to MIT. He holds a B.A. from Oberlin College and also spent time as an undergraduate studying at the London School of Economics and as a Master’s student at Hebrew University. He completed his PhD in Economics at Princeton in 1989, and his first academic job was as an Assistant Professor at Harvard from 1989 to 1991. Angrist’s research interests include the effects of school inputs and organization on student achievement, the impact of education and social programs on the labor market, immigration, labor market regulation and institutions, and econometric methods for program and policy evaluation. Although many of his papers use data from other countries, he does not especially like to travel and prefers to get data in the mail. He is also a Fellow of the Econometric Society and a Co-editor of the Journal of Labor Economics. Angrist has a long-standing interest in public policy. In addition to his academic work, he has worked as a consultant to the U.S.
Social Security Administration, The Manpower Demonstration Research Corporation, and for the Israeli government after the Oslo peace negotiations in 1994. He lives in Brookline with his wife Mira, and their two children, Adie and Noam. The Angrist family enjoys activities like hiking, skiing, skating, sailing, and eating.