Education and Self-Selection

Robert J. Willis
State University of New York at Stony Brook and National Bureau of Economic Research

Sherwin Rosen
University of Chicago and National Bureau of Economic Research

A structural model of the demand for college attendance is derived from the theory of comparative advantage and recent statistical models of self-selection and unobserved components. Estimates from NBER-Thorndike data strongly support the theory. First, expected lifetime earnings gains influence the decision to attend college. Second, those who did not attend college would have earned less than measurably similar people who did attend, while those who attended college would have earned less as high school graduates than measurably similar people who stopped after high school. Positive selection in both groups implies no "ability bias" in these data.

I. Introduction

In this paper we specify and estimate a model of the demand for college education derived from its effect on expected lifetime earnings compared with its cost. Attention is focused on specifying the role of earnings expectations in the derived demand for schooling; these are found to be empirically important determinants of the decision to attend college. In addition to including financial incentives, the model allows for a host of selectivity or sorting effects in the data that are related to "ability bias," family effects, and tastes that have occupied other researchers. Background and motivation are presented in Section II. The structure of the model, a variant of a simultaneous-equations problem involving discrete choices, is presented in Section III. The estimates, based on data from the NBER-Thorndike sample, appear in Section IV. Some implications and conclusions are found in Section V.

Thanks are due to Sean Becketti for excellent research assistance, to Lung Fei Lee for advice on statistical issues, and to Richard Layard and W. M. Gorman for criticism of an initial draft. This research was supported by NSF and the National Bureau of Economic Research, but this is not an official NBER publication. The order of the authors' names was selected by a random device.
[Journal of Political Economy, 1979, vol. 87, no. 5, pt. 2] © 1979 by The University of Chicago. 0022-3808/79/8752-0002$03.35

II. Nature of the Problem

Estimates of rates of return to education have been controversial because they are based on ex post realizations and need not reflect structural parameters necessary for correct predictions. For example, it is well understood that college and high school graduates may have different abilities so that income forgone during college by the former is not necessarily equal to observed earnings of the latter. Our objective here is twofold. One is to estimate life earnings conditioned on actual school choices that are purged of selection bias. The other is to determine the extent to which alternative earnings prospects, as distinct from family background and financial constraints, influence the decision to attend college.

One would need to go no further than straightforward comparisons of earnings outcomes among school classes for structural rate of return estimates if educational wage differentials were everywhere equalizing on the direct, opportunity, and interest costs of schooling. For then the supplies of graduates (or "demands" for each level of education) would be nearly elastic at the equalizing wage differentials, and the distribution of human wealth would be approximately independent of the distribution of schooling.1 However, recent evidence on the structure of life earnings based on panel data strongly rejects this as a serious possibility.
Total variance of earnings among people of the same sex, race, education, and market experience is very large, and more than two-thirds of it is attributable to unobserved components or person-specific effects that probably persist over much of the life cycle.2 The panel evidence therefore suggests that supply elasticities are substantially less than completely elastic at unique wage differentials and that there are inframarginal "ability rents." Put in another way, observed rates of return are not wholly supply determined and depend on interactions with relative demands for graduates as well.

1 The equalizing difference model originates with Friedman and Kuznets (1945). Jacob Mincer (1974) has developed it most completely in recent years.

2 See Lillard and Willis (1978) for additional detail and confirmation of these remarks. Related studies have reached similar conclusions, e.g., Weiss and Lillard (1978). Of course, it is conceivable, but unlikely, that educational wage differentials are exactly equalizing for each individual, although considerable lifetime income inequality exists among individuals. This possibility is rejected in the empirical findings presented below.

A natural approach has been to incorporate measures of ability into the statistical analysis, either directly or as indicators of unobserved factors, in order to, in effect, impute ability rent. But merely partitioning observed earnings into schooling and ability components does not use any of the restrictions imposed on the data by a school-stopping rule, and that decision embodies all the economic content of the problem. Some of that additional structure is incorporated here.

Economic theories of education, be they of the human-capital or signaling varieties, are based on the principle of maximum capital value: schooling is pursued to the point where its marginal (private) internal rate of return equals the rate of interest.
It is easy to show that this leads to a recursive econometric model in which (i) schooling is related to a person's ability and family background, and (ii) earnings are related to "prior" school decisions and ability. Earnings gains attributable to education do not appear explicitly in the schooling equation. Instead, the cost-benefit basis of the decision is embedded in cross-equation restrictions on the overall model, because the earnings equation is a constraint for the maximum problem that determines education attainment.3 There are many estimates of recursive models in the literature, but very few have tested the economic (wealth-maximizing) hypothesis.4

3 The basic model is discussed in Becker (1975). See Rosen (1977) for an elaboration of this argument and a survey of the relevant literature. Blaug (1976) also stresses the need for estimating structural demand for schooling relationships, and Griliches (1977) discusses the difficulty of doing so in conventional models. Part of Griliches's discussion is pursued in Griliches, Hall, and Hausman (1977). The model elaborated here is conceptually distinct from that work, though some of the statistical techniques are similar. A similar remark applies to the work of Kenny, Lee, Maddala, and Trost (in press).

4 There is aggregate-time-series evidence that earnings are important determinants of professional school enrollment (see Freeman [1971] and numerous subsequent studies by the same author); but there is virtually no micro evidence even though such data have been most often studied in the human-capital and signaling frameworks.

We begin with the assumption of marked heterogeneity and diversity in the population, as in the unobserved-component approach to panel data. Costs and benefits of alternative school-completion levels are assumed to be randomly distributed among people according to their capacities to finance education, tastes, perceptions, expectations, and an array of talents that affect performance in work activities associated with differing levels of schooling. Some of these things are observed, while others are unobserved. Individuals are sorted into educational classes according to the interaction of a selection criterion (such as maximum present value) and the underlying joint distribution of tastes, talents, expectations, and parental wealth. The selection rule partitions the underlying joint density into a corresponding realized educational distribution. The supply function of graduates at any level of schooling is "swept out" of the joint taste, talent, parental wealth distribution as increased wage differentials enlarge the subset of the partition relevant for that class.

Let $Y_{ij}$ represent the potential lifetime earnings of person $i$ if schooling level $j$ is chosen, $X_i$ a vector of observed talent or ability indicators of person $i$, and $\tau_i$ an unobserved talent component relevant for person $i$. Similarly, split family-background and taste effects into an observed vector $Z_i$ and an unobserved component $\omega_i$. Let $V_{ij}$ denote the value of choosing school level $j$ for person $i$. Then a general school-selection model is:

$$Y_{ij} = y_j(X_i, \tau_i), \quad j = 1, \ldots, n; \tag{1}$$

$$V_{ij} = g(y_j, Z_i, \omega_i); \tag{2}$$

$$i \text{ belongs to } j \text{ if } V_{ij} = \max(V_{i1}, \ldots, V_{in}); \tag{3}$$

and

$$(\tau, \omega) \sim F(\tau, \omega). \tag{4}$$

Equation (1) shows how potential earnings in any given classification vary with talent and ability.5 The earnings function differs among school classes because work activities associated with alternative levels of education make use of different combinations of talent. Equation (2) translates the earnings stream from choice $j$ into a scalar such as present value and is conditioned on family background to reflect tastes and financial barriers to extending schooling. Equation (3) is the selection rule: the person chooses the classification that maximizes value and is observed in one and only one of the $n$ possibilities open to him.
Equation (4) closes the model with a specification of the distribution of unobservables. Since observed assignments of individuals to schooling classes are selected on $(X, Z, \tau, \omega)$, earnings observed in each class may be nonrandom samples of population potential earnings, because those with larger net benefits in the class have a higher probability of being observed in it.

5 Actually, expository convenience dictates a more restrictive formulation than is necessary. The X and Z need not be orthogonal. They may have some elements in common, but identification requires that they not have all elements in common (see below).

This formulation is suggested by the theory of comparative advantage.6 It allows for a rather eclectic view of the role of talent in determining observed outcomes, since the X's may affect earning capacity differently at different levels of schooling (see eq. [1]) and covariances among the unobservables are unrestricted. Indeed, there may be negative covariance among talent components. For example, plumbers (high school graduates) may have very limited potential as highly schooled lawyers, but by the same token lawyers may have much lower potential as plumbers than those who actually end up choosing that kind of work. This contrasts with the one-factor ability-as-IQ specifications in the literature which assume that the best lawyers would also be the best plumbers and would imply strictly hierarchical sorting in the absence of financial constraints. In effect an IQ-ability model constrains the unobserved ability components to have large positive covariances, an assumption that is probably erroneous and is not necessary for our methods.

6 Roy (1951) gives a surprisingly modern and rigorous treatment of a selection problem based on the theory of comparative advantage. See Rosen (1978) for extensions and elaboration on this class of problems. Heckman (1976), Lee (1976), and Maddala (1977) develop the appropriate estimation theory.

Note also that population mean "rates of return" among alternative schooling levels have no significance as guides to the social or private profitability of investments in schooling. For example, a random member of the population might achieve a negative return from an engineering degree, yet those with appropriate talents who choose engineering will obtain a return on the time and money costs of their training which is at least equal to the rate of interest.

There are difficult estimation problems associated with selectivity models. In brief, the unobservables impose distinct limits on the amount of structural information that can be inferred from realized assignments in the data. For example, it would be very desirable to know the marginal distribution of talents in (4), since it would then be possible to construct the socially efficient assignment of individuals to school classes, defined as the one that maximizes overall human wealth. Then the deadweight losses due to capital market imperfections could be computed by comparing optimal with observed assignments. However, the marginal density is not itself identified, since unobserved financial constraints and talent jointly determine observed outcomes. These issues will be made precise shortly, but, roughly speaking, we do not necessarily know if a person chose college education because he had talent for it or because he was wealthy. What can and will be done is to map out the joint effects of the unobservables embedded in the actual demand curve for college attendance, which embodies all constraints inherent in the actual market but which nevertheless is a valid structural basis for prediction.
Selectivity or ability bias in unadjusted rate of return computations that do not take account of the sorting by talent inherent in observed assignments can also be computed.

A few limitations to these methods must be noted at the outset. It is crucial to the spirit of the model, based as it is on human diversity, that few covariance restrictions be placed on the distribution of unobservables. This practically mandates the assumption of joint normality, since no other nonindependent multivariate distribution offers anything close to similar computational advantages. While the general selection rule specified below is likely to emerge from a broad class of economic models of school choice, it is not known how sensitive the results are to the normality assumptions. In addition, nonindependence forces some aggregation in the number of choices considered for computational feasibility, even though the statistical theory can be worked out for any finite number.7 This rules out of consideration other selection aspects of the problem that should be considered, such as choice of school quality.8 All people in our sample have at least a high school education, and we have chosen a dichotomous split between choice of high school and more than high school (college attendance). Some internal diagnostic tests help check on the validity of this aggregation. Experiments with a college completion or more classification, compared with a high school graduation or some college classification, yielded results very similar to those reported below.

III. The Model

Specification of the econometric model is tailored to the data at our disposal. More details will be given below, but at this point the important feature is that earnings are observed at two points in the life cycle for each person, one point soon after entrance into the labor market and another point some 20 years later.
The earnings stream is parameterized into a simple geometric growth process to motivate the decision rule. This is a reasonable approximation to actual life earnings patterns for the period spanned by the data. Two levels of schooling are considered, labeled level A (for more than high school) and level B (for high school). If person $i$ chooses A, the expected earnings stream is

$$y_{ai}(t) = 0, \quad 0 < t \le S; \qquad y_{ai}(t) = y_{ai} \exp[g_{ai}(t - S)], \quad S \le t < \infty, \tag{5}$$

where $S$ is the incremental schooling period associated with A over B and $t - S$ is market experience. If alternative B is chosen, the expected earnings stream is

$$y_{bi}(t) = y_{bi} \exp(g_{bi} t), \quad 0 \le t < \infty. \tag{6}$$

Thus earnings prospects of each person in the sample are characterized by four parameters: initial earnings and rates of growth in each of the two alternatives. Diversity is represented by a random distribution of the vector $(y_a, g_a, y_b, g_b)$ among the population.9

7 The problem is that the aggregates are sums of distributions that are themselves truncated and selected. Therefore the distributions underlying the aggregate assignments are not necessarily normal. We are unaware of any systematic analysis of this kind of aggregation problem.

8 Methods such as conditional logit have been designed to handle high-dimensioned classifications (McFadden 1973) but require independence and other (homogeneity) restrictions that are not tenable for this problem. Hausman and Wise (1978) have worked out computational methods on general normal assumptions for three choices. Note also that maximum-likelihood methods are available, but are extremely expensive because multiple integrals must be evaluated. Hence we follow the literature in using consistent estimators.

Equations (5) and (6) yield convenient expressions for present values. Assume an infinite horizon, a constant rate of discount for each person, $r_i$, with $r_i > g_{ai}, g_{bi}$, and ignore direct costs of school.
Then the present value of earnings is

$$V_{ai} = \int_S^\infty y_{ai}(t) \exp(-r_i t)\, dt = [y_{ai}/(r_i - g_{ai})] \exp(-r_i S) \tag{7}$$

if A is chosen and

$$V_{bi} = \int_0^\infty y_{bi}(t) \exp(-r_i t)\, dt = y_{bi}/(r_i - g_{bi}) \tag{8}$$

if B is chosen. These are likely to be good approximations, since the consequences of ignoring finite life discount corrections and nonlinearities in earnings paths toward the end of the life cycle are lightly weighted for nonnegligible values of $r$.

Selection Rule

Assume that person $i$ chooses A if $V_{ai} > V_{bi}$ and chooses B if $V_{ai} \le V_{bi}$. Define $I_i = \ln(V_{ai}/V_{bi})$. Substitution from (5) to (8) yields $I_i = \ln y_{ai} - \ln y_{bi} - r_i S - \ln(r_i - g_{ai}) + \ln(r_i - g_{bi})$. A Taylor series approximation to the nonlinear terms around their population mean values $(\bar g_a, \bar g_b, \bar r)$ yields

$$I_i = \alpha_0 + \alpha_1(\ln y_{ai} - \ln y_{bi}) + \alpha_2 g_{ai} + \alpha_3 g_{bi} + \alpha_4 r_i, \tag{9}$$

with

$$\alpha_1 = 1, \quad \alpha_2 = \partial I/\partial g_a = 1/(\bar r - \bar g_a) > 0, \quad \alpha_3 = \partial I/\partial g_b = -1/(\bar r - \bar g_b) < 0, \quad \alpha_4 = -[S + (\bar g_a - \bar g_b)/((\bar r - \bar g_a)(\bar r - \bar g_b))]. \tag{10}$$

Hence the selection criteria are

$$\Pr(\text{choose A}) = \Pr(V_a > V_b) = \Pr(I > 0), \qquad \Pr(\text{choose B}) = \Pr(V_a \le V_b) = \Pr(I \le 0). \tag{11}$$

9 Wise (1975), Lazear (1976), and Zabalza (1977) have used initial earnings and growth of earnings to study life earnings patterns. The distribution of potential earnings and growth is not constrained in our model, thus, e.g., allowing the possibility that $y_a$ and $g_a$ are negatively correlated (and similarly for $y_b$ and $g_b$), as in Mincer (1974). On this see Hause (1977).

Earnings and Discount Functions

Let $X_i$ represent a set of measured characteristics that influence a person's lifetime earnings potential, and let $u_{1i}, \ldots, u_{4i}$ denote permanent person-specific unobserved components reflecting unmeasured factors influencing earnings potential.10 Specify structural (in the sense of population) earnings equations of the form

$$\ln y_{ai} = X_i\beta_a + u_{1i}, \qquad g_{ai} = X_i\gamma_a + u_{2i} \tag{12}$$

if A is chosen and

$$\ln y_{bi} = X_i\beta_b + u_{3i}, \qquad g_{bi} = X_i\gamma_b + u_{4i} \tag{13}$$

if B is chosen.
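The present-value expressions (7) and (8) and the linearized index (9)-(10) can be checked numerically. The following is a minimal sketch in Python, not part of the original paper. The function names and all parameter values (initial earnings, growth rates, discount rate, schooling period) are illustrative assumptions, not estimates from the NBER-Thorndike data. It compares the closed form (7) with a brute-force integral and evaluates the Taylor coefficients in (10).

```python
import math


def pv_college(y_a, g_a, r, S):
    """Closed form (7): present value at t = 0 of earnings that start at S."""
    return y_a / (r - g_a) * math.exp(-r * S)


def pv_high_school(y_b, g_b, r):
    """Closed form (8): present value at t = 0 of earnings that start at 0."""
    return y_b / (r - g_b)


def selection_index(y_a, g_a, y_b, g_b, r, S):
    """Exact index I = ln(V_a / V_b); the person chooses A when I > 0."""
    return math.log(pv_college(y_a, g_a, r, S) / pv_high_school(y_b, g_b, r))


def taylor_coefficients(g_a_bar, g_b_bar, r_bar, S):
    """Coefficients (alpha_1, ..., alpha_4) of equation (10)."""
    a1 = 1.0
    a2 = 1.0 / (r_bar - g_a_bar)
    a3 = -1.0 / (r_bar - g_b_bar)
    a4 = -(S + (g_a_bar - g_b_bar) / ((r_bar - g_a_bar) * (r_bar - g_b_bar)))
    return a1, a2, a3, a4


def numeric_pv(stream, r, start, horizon=400.0, steps=200_000):
    """Midpoint-rule approximation of the discounted integral of stream(t)."""
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        t = start + (k + 0.5) * dt
        total += stream(t) * math.exp(-r * t) * dt
    return total


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    y_a, g_a, y_b, g_b, r, S = 6000.0, 0.04, 4000.0, 0.03, 0.08, 4.0
    closed = pv_college(y_a, g_a, r, S)
    brute = numeric_pv(lambda t: y_a * math.exp(g_a * (t - S)), r, start=S)
    print("V_a closed form:", closed, "numeric:", brute)
    print("I =", selection_index(y_a, g_a, y_b, g_b, r, S))
    print("alphas:", taylor_coefficients(g_a, g_b, r, S))
```

With these assumed values the growth-rate coefficients have the signs stated in (10), and the derivative of the exact index with respect to the discount rate coincides with the fourth coefficient at the expansion point.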
The variables on the left-hand sides of (12) and (13) are to be interpreted as the individual's expectation of initial earnings and growth rates at the time the choice is made. In order to obtain consistent estimates of $(\beta_a, \gamma_a, \beta_b, \gamma_b)$ from data on realizations it is assumed that expectations were unbiased. Hence forecast errors are assumed to be independently normally distributed, with zero means.

Let $Z_i$ denote another vector of observed variables that influence the schooling decision through their effect on the discount rate. Then

$$r_i = Z_i\delta + u_{5i}, \tag{14}$$

where $u_5$ is a permanent unobserved component influencing financial barriers to school choice. The vector $(u_j)$ is assumed to be jointly normal, with zero means and variance-covariance matrix $\Sigma = [\sigma_{ij}]$. The $\Sigma$ is unrestricted.

10 The $\tau$'s of Section II are related to $(u_1, \ldots, u_5)$ by a set of implicit prices that vary across school classifications, as in Mandelbrot (1960). See Rosen (1978) for the logic of why these differences in valuation can be sustained indefinitely and cannot be arbitraged.

Reduced Form

The structural model is (9), (12), (13), and (14). A reduced form of the selection rule is obtained by substituting (12)-(14) into (9):

$$I = \alpha_0 + X[\alpha_1(\beta_a - \beta_b) + \alpha_2\gamma_a + \alpha_3\gamma_b] + \alpha_4 Z\delta + \alpha_1(u_1 - u_3) + \alpha_2 u_2 + \alpha_3 u_4 + \alpha_4 u_5 = W\pi - \epsilon, \tag{15}$$

with $W = [X, Z]$ and $-\epsilon = \alpha_1(u_1 - u_3) + \alpha_2 u_2 + \alpha_3 u_4 + \alpha_4 u_5$. Thus, an observationally equivalent statement to (9) and (11) is

$$\Pr(\text{A is observed}) = \Pr(W\pi > \epsilon) = F(W\pi/\sigma_\epsilon), \tag{16}$$

where $F(\cdot)$ is the standard normal c.d.f. Equation (16) is a probit function determining sample selection into categories A or B, to be estimated from observed data.11

Selection Bias and Earnings Functions

The decision rule selects people into observed classes according to largest expected present value. Hence the earnings actually observed in each group are not random samples of the population, but are truncated nonrandom samples instead.
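The truncation described here can be illustrated with a small simulation. The sketch below is not from the paper. It assumes a single unobserved earnings component u that is jointly normal with the selection error of the reduced form, and it uses the standard truncated-normal mean formulas to predict the average of u within each observed group. The function names, the correlation, the standard deviations, and the value of the index are all hypothetical.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_mean_below(c):
    """E(z | z < c) for standard normal z; always nonpositive."""
    return -normal_pdf(c) / normal_cdf(c)


def truncated_mean_above(c):
    """E(z | z >= c) for standard normal z; always nonnegative."""
    return normal_pdf(c) / (1.0 - normal_cdf(c))


def simulate_group_means(index, sigma_u, sigma_e, rho, n=200_000, seed=0):
    """Mean of u among A (index > e) and among B (index <= e)."""
    rng = random.Random(seed)
    sum_a = sum_b = 0.0
    n_a = n_b = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        e = sigma_e * z1
        # u has standard deviation sigma_u and correlation rho with e.
        u = sigma_u * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        if index > e:
            sum_a += u
            n_a += 1
        else:
            sum_b += u
            n_b += 1
    return sum_a / n_a, sum_b / n_b


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    index, sigma_u, sigma_e, rho = 0.3, 0.5, 1.2, -0.4
    c = index / sigma_e
    print("simulated:", simulate_group_means(index, sigma_u, sigma_e, rho))
    print("theory   :", (sigma_u * rho * truncated_mean_below(c),
                         sigma_u * rho * truncated_mean_above(c)))
```

The sign of the bias in each group depends only on the sign of the covariance between u and the selection error, which is why the covariance terms govern whether observed group means lie above or below the population mean.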
The resulting bias in observed means may be calculated as follows. Note that $\Pr[\text{observing } y_a(t)] = \Pr(I > 0) = \Pr(W\pi > \epsilon)$. Therefore, from (12), $E(\ln y_a \mid I > 0) = X\beta_a + E(u_1 \mid W\pi > \epsilon)$. Define $\rho_1 = \rho(u_1/\sigma_1, \epsilon/\sigma_\epsilon)$ and

$$\lambda_a = -f(W\pi/\sigma_\epsilon)/F(W\pi/\sigma_\epsilon), \tag{17}$$

where $f(\cdot)$ is the standard normal density. Then

$$E(\ln y_a \mid I > 0) = X\beta_a + \sigma_1\rho_1 E(\epsilon/\sigma_\epsilon \mid \epsilon/\sigma_\epsilon < W\pi/\sigma_\epsilon) = X\beta_a + \frac{\sigma_{1\epsilon}}{\sigma_\epsilon}\lambda_a. \tag{18}$$

A parallel argument for $g_a$, $y_b$, and $g_b$ yields

$$E(g_a \mid I > 0) = X\gamma_a + \frac{\sigma_{2\epsilon}}{\sigma_\epsilon}\lambda_a, \tag{19}$$

$$E(\ln y_b \mid I \le 0) = X\beta_b + \frac{\sigma_{3\epsilon}}{\sigma_\epsilon}\lambda_b, \tag{20}$$

and

$$E(g_b \mid I \le 0) = X\gamma_b + \frac{\sigma_{4\epsilon}}{\sigma_\epsilon}\lambda_b, \tag{21}$$

with

$$\lambda_b = f(W\pi/\sigma_\epsilon)/[1 - F(W\pi/\sigma_\epsilon)] \tag{22}$$

and

$$\sigma_{k\epsilon} = -[\alpha_1(\sigma_{1k} - \sigma_{3k}) + \alpha_2\sigma_{2k} + \alpha_3\sigma_{4k} + \alpha_4\sigma_{5k}], \quad k = 1, \ldots, 4. \tag{23}$$

11 For completeness, $-\epsilon$ should be redefined to take account of deviations between realizations and expectations at the time school decisions were made. Thus, let $\ln Y_{ai} = \ln y_{ai} + v_{1i}$, where $Y_{ai}$ is realized initial earnings, $y_{ai}$ is expected initial earnings, and $v_{1i}$ is normally distributed forecast error. Similarly, forecast errors $v_{2i}$, $v_{3i}$, and $v_{4i}$ are defined for $g_{ai}$, $\ln y_{bi}$, and $g_{bi}$. Then the complete definition of $-\epsilon$ is obtained from replacing $u_{ji}$ with $(u_{ji} + v_{ji})$, $j = 1, \ldots, 4$, in (15). Clearly this has no operational significance for the model, given the assumption of unbiased expectations.

Note from (17) that $\lambda_a \le 0$. Therefore the observed (conditional) means of initial earnings and rates of growth among persons in A are greater or less than their population means as $\sigma_{1\epsilon}$ and [...]

[...] (29)

Since $\ln y_a(t) - \ln y_b(t) = \ln y_a - \ln y_b + (g_a - g_b)t - g_a S$, the following restrictions are implied:

$$(t - S)\theta_1 + \theta_2 = \alpha_2, \qquad -t\theta_1 + \theta_3 = \alpha_3. \tag{30}$$

Hence we have a check on the validity of the model. Of course, its main validation is the power to predict behavior and assignments on independent data.

Identification

Two natural questions regarding identification arise in this model.

1. Estimation of the selection rule or structural probit equation is possible only if the vectors X and Z have elements that are not in common.
If X and Z are identical, the predicted values of $\ln y_a - \ln y_b$, $g_a$, and $g_b$ are collinear with the other explanatory variables in (26), and its estimation is precluded. Note, however, that even if X and Z are identical, the reduced-form probit (16) is estimable, and it still may be possible to estimate initial earnings and growth-rate equations and selection bias. The reason is that, although the $\lambda$ corrections in (24) are functions of the same variables that enter the $X\beta$ or $X\gamma$ parts of these equations, they are nonlinear functions of the measured variables. Structural earnings equations might be identified off the nonlinearity, though in any particular application there may be insufficient nonlinearity if the range of variation in $W\pi$ (see [15]) is not large enough.15

15 Heckman (1979) raises some subtle issues regarding specification error in selection models. Elements of Z may be incorrectly specified in X and can be statistically significant in least-squares regressions because of truncation. Conversely, coefficients on selection-bias variables $\lambda_a$ and $\lambda_b$ can be significant because variables are incorrectly attributed to selection when they more properly belong directly in X. E.g., some might argue that family background belongs in structural earnings equations and our selectivity effects work (see below) because family background comes in the back door through its indirect effect on $\lambda$. However, a reversal of the argument suggests that family-background variables might have significant estimated direct effects on earnings merely because they work through selection and resulting truncation. There is no statistically satisfactory way of resolving this problem. In any event, we cannot be "agnostic" about specification because both the economic and statistical theories require certain nontestable zero identifying restrictions. The problem is even more complicated [...]

In the general discussion of Section II, X was tentatively associated with measured abilities and Z with measured financial constraints (and tastes), corresponding to the Beckerian distinction between factors that shift the marginal rate of return to investment schedule and those that shift the marginal supply of funds schedule. Evidently, if one takes a sufficiently broad view of human investment and in particular of the role of child care in the new home economics, easy distinctions between the content of X and of Z become increasingly difficult, if not impossible, to make. If X and Z are indistinguishable, the economic theory of school choice has no empirical content.

In the empirical work below a very strong dichotomy with no commonalities is maintained: X is specified as a vector of ability indicators and Z as a vector of family-background variables. This hypothesis is maintained for two reasons. First, it provides a test of the theory in its strongest form. Certainly if the theory is rejected in this form there is little hope for it. Second, there have been no systematic attempts to find empirical counterparts for the things that shift marginal rate of return and marginal cost of fund schedules that cause different people to choose different amounts of schooling. The validity of the theory rests on the possibility of actually being able to find an operational set of indicators, and this distinction is the most straightforward possibility.

2. Given resolution of problem 1, not all parameters in the model can be estimated. Some are overidentified and some are underidentified.
The selectivity-bias-corrected structural earnings equations (24) directly estimate $\beta_a$, $\beta_b$, $\gamma_a$, $\gamma_b$, and the structural probit (26) provides estimates of $(\alpha_1/\sigma_\epsilon, \alpha_2/\sigma_\epsilon, \alpha_3/\sigma_\epsilon)$ [...]

TABLE 2 (fragment; only the final rows survive, and the row labels of the coefficient entries are lost)

                              Reduced form (16)   Structural (26)   Structural (29)
Observations                  3611                3611              3611
Limit observations            791                 791               791
Nonlimit observations         2820                2820              2820
-2 ln (likelihood ratio)      579.5               568.8             576.6
χ² degrees of freedom         28                  23                23

Structural probit (26), predicted earnings variables: ln (y_a/y_b) 5.1486 (t = 2.25); g_a 138.3850 (t = 1.83); g_b -44.2697 (t = -1.28).

Remaining entries in source order, without row labels:
First probit, coefficients: .0655 .2898 .2709 .1980 -.4411 .0047 -.2575 -.0070 -3.0236 .0244 -.7539 .0019 2.2797; t: .63 2.29 2.20 1.91 -6.14 1.67 -1.41 -4.29 -1.04 12.34 -5.75 .72 .47
Second probit, coefficients: 0677 2884 2768 1990 4397; t: .65 2.30 2.02 1.92 -3.74
Third probit, coefficients: 0661 2888 2693 1966 4379 7 71 5 6632 8981 1501

Note: t is asymptotic t-statistic; DK: Don't know, dummy variable; NR: No response, dummy variable; other variables are defined in Appendix A.

TABLE 3
Structural Earnings Estimates: Equations (24) and (28), OLS

                                    Dependent Variable
Regressor    ln y_a     ln y_b     g_a        g_b        ln y_a(t)  ln y_b(t)
             (1)        (2)        (3)        (4)        (5)        (6)
Constant     8.7124     2.8901     .1261      .2517      10.3370    7.5328
             (16.51)    (1.37)     (3.90)     (2.11)     (5.52)     (2.08)
Read         .0009      -.0019     .0001      .0003      .0027      .0057
             (1.21)     (-1.17)    (1.11)     (3.20)     (2.80)     (3.28)
NR read      .0791      .0506      -.0034     -.0046     .0033      -.0402
             (1.24)     (.58)      (-.76)     (-.89)     (.04)      (-.42)
Mech         -.0002     -.0005     -.0001     -.0001     -.0021     -.0017
             (-.48)     (-.54)     (-2.16)    (-1.13)    (-3.59)    (-1.73)
Math         .0015      -.0013     .0001      -.0000     .0030      -.0019
             (2.02)     (.74)      (1.18)     (-.20)     (3.31)     (-1.00)
NR math      -.1087     .0562      .0015      .0006      -.0877     .0712
             (-1.94)    (.83)      (.38)      (.15)      (-1.24)    (.96)
Dext         .0008      -.0019     -.0000     .0003      .0002      .0036
             (1.03)     (-1.21)    (-.78)     (2.77)     (.16)      (2.19)
Exp          -.0523     .4260      -.0028     -.0154     -.0129     .0776
             (-1.49)    (3.10)     (-1.11)    (-1.93)    (-.29)     (.53)
Exp2         .0015      -.0067     .0000      .0002      -.0000     -.0012
             (2.22)     (-2.95)    (.21)      (1.82)     (-.01)     (-.49)
S13-15       .1288                 -.0062                .0168
             (5.15)                (-3.49)               (.52)
S16          .0760                 .0026                 .1095
             (3.82)                (1.79)                (4.26)
S20          .1318                 .0049                 .2560
             (4.10)                (2.13)                (6.15)
λ_a          -.1069                .0058                 .0206
             (-3.21)               (2.45)                (.49)
λ_b                     -.0558                .0118                 .2267
                        (-.66)                (2.39)                (2.48)
R²           .0750      .0439      .1578      .0513      .0740      .0358

Rows whose column placement did not survive extraction (values in source order, t-values in parentheses):
NR mech: .1969 (.69), .0002 (.01), .2196 (.68)
NR dext: .0751 (.28), -.0004 (-.02), .1466 (.43)
Year 48: -.0020 (-.48), -.0156 (-1.72)
Year 69: -.0067 (-.26), .0039 (.09)

Note: NR: No response, dummy variable; other variables are defined in Appendix A; t-values are shown in parentheses.

[...] initial earnings is math score for college attendees. Ability indicators are more important for earnings growth (cols. 3 and 4) and later earnings (cols. 5 and 6). Dexterity and reading scores have positive effects on $g_b$ and $y_b(t)$, while math and reading scores have positive effects on $\ln y_a(t)$ but exhibit much weaker effects on earnings growth. Interestingly enough, the effect of mechanical score is negative in all cases, raising obvious questions about what it is that this test supposedly measures (recall, however, the sample truncation on high-ability military personnel). Even so, it seems to have a more important negative effect for members of group A. This, along with the results for dexterity and math scores, lends support to the comparative-advantage hypothesis.

Selectivity biases are particularly interesting in that regard. The coefficients of $\lambda_b$ show no selectivity bias for initial earnings of high school graduates, but positive bias for growth rates. Therefore, observed earnings patterns of high school graduates show higher rates of growth compared with the pattern that would have been observed for the average member of this sample had he chosen not to continue school. On the other hand, the coefficients of $\lambda_a$ show positive selection bias for initial earnings of college attendees and negative bias for earnings growth. The latter is due to the fact that there are no selection effects for late earnings. Thus, the observed earnings pattern among members of group A is everywhere higher than the population mean pattern would have been and converges toward the population mean late earnings level.
Positive selection among both A and B also lends support to comparative advantage.

The most novel empirical results are the structural probit estimates in table 2, which show how anticipated earnings gains affect the decision to attend college. The predicted earnings variables are statistically significant except for $g_b$ in (26) and $g_a$ in (29).20 More striking, however, is the agreement of the sign patterns predicted by the theory (see eq. [10] and recall that the structural probit coefficients are normalized by $\sigma_\epsilon$, from [26] and [29]).

The model passes two internal consistency checks. The first is restriction (30). Working backward to normalized $\alpha$ estimates from directly estimated $\theta$'s in column 5 of table 2 yields21 a predicted $(\alpha/\sigma_\epsilon)$ vector of (5.15, 155.90, -52.68), which is similar to the direct estimates in column 3 of (5.15, 138.39, -44.27). Working forward from actual estimates of normalized $\alpha$ to predicted estimates of $\theta$ gives prediction (5.15, 37.04, 80.31), compared with actual (5.15, 7.66, 71.90). These comparisons probably would not be so close if the two-parameter approximation to earnings patterns in (5) and (6) was not reasonably good. Second, equations (15) and (26) indicate that estimated coefficients on the Z variables in structural and reduced-form probits should be the same. Direct comparison of coefficients of Z in table 2 shows extremely close similarity of $\alpha_4\delta$ in all three equations.

20 Recall (n. 14) that the t-statistics for the structural probit in table 2 are based on consistent estimates of the standard errors, as suggested by Lee (1977). The t-statistics on background variables are not very different from the biased values computed by a standard probit algorithm. However, the t-statistics on the predicted earnings and growth variables are substantially reduced when corrected for bias; e.g., the standard probit estimates of t-values for $\ln(y_a/y_b)$, $g_a$, and $g_b$ in (26) are (10.8, 8.15, -4.81), compared with the unbiased values of (2.25, 1.83, -1.28) in table 2.

21 There are two ways of estimating $t$ and $(t - S)$ for these computations. First, a direct estimate of $t - S$ is obtained as the difference between average year of 1969 job and average year of initial job for members of group A in table 1. A direct estimate of $t$ is the average difference between 1969 job and initial job for members of group B. However, an independent estimate of $S$ is the average years of schooling among members of group A minus 12.0. Hence another estimate of $t$ is the direct estimate of $(t - S)$ plus the direct estimate of $S$; and another estimate of $(t - S)$ is the direct estimate of $t$ minus the direct estimate of $S$. The two estimates for each parameter were averaged for purposes of these checks. They are 24.19 for $t$ and 19.68 for $(t - S)$.

In sum, the results give direct, internally consistent evidence on the validity of the economic theory of the demand for schooling derived from its (private) investment value. The economic hypothesis cannot be rejected.

V. Conclusions

The structural probit estimates of table 2 support the economic hypothesis that expected gains in life earnings influence the decision to attend college. They also show important effects of financial constraints and tastes working through family-background indicators, a finding in common with most other studies of school choice.22 Availability of the GI Bill might well be expected to dull the observed monetary effects, but they remain strong enough to persist for a significant fraction of the sample. The estimates also show positive sorting or positive selection bias in observed earnings of both high school graduates and college attendees.

To be clear about the implications of these results it is necessary to distinguish between the effects of measured abilities and unmeasured components on earnings prospects in A or B.
The selection results refer to unmeasured components of variance. If we examine a subpopulation of persons with given measured abilities (i.e., with the same values of X in [12] and [13]), the empirical results on selectivity imply that those persons who stopped schooling after high school had better prospects as high school graduates than the average member of that subpopulation and that those who continued on to college also had better prospects there than the average member of the subpopulation. That is, the average earnings at most points in the life cycle of persons with given measured characteristics who actually chose B exceeded what earnings in B would have been for those persons (with the same characteristics) who chose A instead. Conversely, average earnings for those who actually chose A were greater than what earnings would have been for measurably similar people who actually chose B had they continued their schooling instead. This is a much different picture than emerges from the usual discussions of ability bias in the literature, based on hierarchical or one-factor ability considerations. The one-factor model implies that persons who would do better than average in A would also do better than average in B. That is, positive selectivity bias in B cannot occur in the strict hierarchical model.²³ The most attractive and simplest interpretation is the theory of comparative advantage, because hierarchical assignments are not observed.

²² See Radner and Miller (1970) and Kohn, Manski, and Mundel (1976) for logit models of college choice. These models contain more detail in personal and college attributes but do not make any attempt to assess the effects of anticipated earnings in college attendance decisions. See Abowd (1977) for another approach to the selection problem focusing on school quality.
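The contrast between the one-factor (hierarchical) model and comparative advantage can be made concrete with a small simulation. The sketch below is a stylized illustration, not the paper's estimated model: the unmeasured earnings components u_a and u_b are drawn as mean-zero normals, each person picks the alternative with the larger component, and the factor loadings (2.0 and 1.0), the decision rule, and the function names are all assumptions made for the example.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simulate(n, one_factor, seed=0):
    """Return (mean u_a among those choosing A, mean u_b among those choosing B).

    u_a and u_b are unmeasured earnings components in A (college) and
    B (high school), measured as deviations from the population mean.
    Assumed decision rule: choose A when u_a > u_b (costs are ignored).
    """
    rng = random.Random(seed)
    chose_a, chose_b = [], []
    for _ in range(n):
        if one_factor:
            # Hierarchical case: a single ability drives both components.
            theta = rng.gauss(0.0, 1.0)
            u_a, u_b = 2.0 * theta, 1.0 * theta
        else:
            # Comparative-advantage case: two independent talents.
            u_a, u_b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if u_a > u_b:
            chose_a.append(u_a)
        else:
            chose_b.append(u_b)
    return mean(chose_a), mean(chose_b)

if __name__ == "__main__":
    for label, flag in (("one-factor", True), ("two-factor", False)):
        sel_a, sel_b = simulate(20000, one_factor=flag)
        print(label, "selection in A:", round(sel_a, 2),
              "selection in B:", round(sel_b, 2))
```

In the one-factor run those who choose B are drawn from the bottom of the single ability distribution, so their mean u_b is below the population mean; with two independent talents both groups are positively selected in the alternative they chose, which is the pattern the text attributes to comparative advantage.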
While the results are consistent with comparative advantage, they do not prove the case because life-persistent luck and random extraneous opportunities could have played just as important roles in the observed assignments as differential talents did. For all we know, those who decided to stop school after high school may have married the boss's daughter instead, or made better career connections in the military, and so forth. The important point is that their prospects in B were higher than average.

As noted above, the population average rate of discount, r, is an identifiable statistic in the model. Estimates are obtained by applying restriction (10) to the estimates in table 2. Maintain the hypothesis that α_1 = 1. Then the estimated coefficient of ln (y_a/y_b) in table 2 estimates (1/σ_ε), from equation (26). Since all the equations of the structural probit are normed by

4.8837
Observations                 952       952       952
Limit observations           321       321       321
Nonlimit observations        631       631       631
-2 ln (likelihood ratio)     184.446   179.419   184.446
χ² degrees of freedom
Note: t is asymptotic t-statistic; DK: Don't know, dummy variable; NR: No response, dummy variable; variables are defined in Appendix A.

References

Abowd, John M. "An Econometric Model of the U.S. Market for Higher Education." Ph.D. dissertation, Univ. Chicago, 1977.
Becker, Gary S. Human Capital. 2d ed. New York: Nat. Bur. Econ. Res., 1975.
Blaug, Mark. "The Empirical Status of Human Capital Theory: A Slightly Jaundiced Survey." J. Econ. Literature 14 (September 1976): 827-55.
Freeman, Richard. The Market for College Trained Manpower: A Study in the Economics of Career Choice. Cambridge, Mass.: Harvard Univ. Press, 1971.
Friedman, Milton, and Kuznets, Simon. Income from Independent Professional Practice. New York: Nat. Bur. Econ. Res., 1945.
Griliches, Zvi. "Estimating the Returns to Schooling: Some Econometric Problems." Econometrica 45 (January 1977): 1-22.
Griliches, Zvi; Hall, B.; and Hausman, Jerry. "Missing Data and Self-Selection in Large Panels." Discussion Paper no. 573, Harvard Inst. Econ. Res., 1977.
Hanoch, Giora. "An Economic Analysis of Earnings and Schooling." J. Human Resources 2, no. 3 (1967): 310-29.
Hause, J. "The Fine Structure of Earnings and the On-the-job-Training Hypothesis." Mimeographed. Univ. Minnesota, 1977.
Hausman, Jerry, and Wise, D. "A Conditional Probit Model for Qualitative Choice." Econometrica 46 (March 1978): 403-26.
Heckman, James J. "The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models." Ann. Econ. and Soc. Measurement 5 (Fall 1976): 475-92.
Heckman, James J. "Sample Selection Bias as a Specification Error." In Female Labor Supply: Theory and Estimation, edited by J. P. Smith. Princeton, N.J.: Princeton Univ. Press, 1979.
Kenny, L.; Lee, L.; Maddala, G. S.; and Trost, R. "Returns to College Education: An Investigation of Self-Selection Bias in Project Talent Data." Internat. Econ. Rev. (in press).
Kohn, M. G.; Manski, C. F.; and Mundel, D. S. "An Empirical Investigation of Factors Which Influence College-going Behavior." Ann. Econ. and Soc. Measurement 5 (Fall 1976): 391-420.
Lazear, Edward. "Age, Experience and Wage Growth." A.E.R. 66 (September 1976): 548-58.
Lee, Lung Fei. "Estimation of Limited Dependent Variables Models by Two-Stage Methods." Ph.D. dissertation, Univ. Rochester, 1976.
Lee, Lung Fei. "On the Asymptotic Distributions of Some Two-Stage Consistent Estimators: Unionism and Wage Rates Revisited." Mimeographed. Univ. Minnesota, 1977.
Lee, Lung Fei. "Unionism and Wage Rates: A Simultaneous Equations Model with Qualitative and Limited Dependent Variables." Internat. Econ. Rev. 19 (June 1978): 415-33.
Lillard, L., and Willis, Robert. "Dynamic Aspects of Earnings Mobility." Econometrica 46, no. 5 (1978): 985-1012.
McFadden, D. "Conditional Logit Analysis of Qualitative Choice Behavior." In Frontiers in Econometrics, edited by P. Zarembka. New York: Academic Press, 1973.
Maddala, G. S. "Self-Selectivity Problems in Econometric Models." In Applications in Statistics, edited by P. R. Krishnaiah. Amsterdam: North-Holland, 1977.
Mandelbrot, Benoit. "Paretian Distributions and Income Maximization." Q.J.E. 76 (February 1960): 57-85.
Mincer, Jacob. Schooling, Experience and Earnings. New York: Nat. Bur. Econ. Res., 1974.
National Bureau of Economic Research. "The Comprehensive NBER-TH Tape Documentation." Mimeographed. March 1973.
Radner, Roy, and Miller, L. S. "Demand and Supply in U.S. Higher Education." A.E.R. 60 (May 1970): 326-34.
Rosen, Sherwin. "Human Capital: Relations between Education and Earnings." In Frontiers of Quantitative Economics, edited by Michael D. Intriligator. Vol. 3A. Amsterdam: North-Holland, 1977.
Rosen, Sherwin. "Substitution and Division of Labor." Economica 45 (August 1978): 235-50.
Roy, Andrew D. "Some Thoughts on the Distribution of Earnings." Oxford Econ. Papers, n.s. 3 (June 1951): 135-46.
Taubman, Paul. Sources of Inequality of Earnings. Amsterdam: North-Holland, 1975.
Weiss, Yoram, and Lillard, Lee A. "Experience, Vintage, and Time Effects in the Growth of Earnings: American Scientists, 1960-1970." J.P.E. 86, no. 3 (June 1978): 427-47.
Wise, D. "Academic Achievement and Job Performance." A.E.R. 65 (June 1975): 350-66.
Zabalza, A. "The Determinants of Teacher Supply." Mimeographed. London School Econ., 1977.