NBER Working Paper Series

SAMPLE SELECTION BIAS AS A SPECIFICATION ERROR
(with an application to the estimation of labor supply functions)

James J. Heckman

Working Paper No. 172

CENTER FOR ECONOMIC ANALYSIS OF HUMAN BEHAVIOR AND SOCIAL INSTITUTIONS
National Bureau of Economic Research, Inc.
204 Junipero Serra Boulevard, Stanford, CA 94305

March 1977 (revised)

Preliminary; not for quotation.

NBER working papers are distributed informally and in limited number for comments only. They should not be quoted without written permission of the author.

This report has not undergone the review accorded official NBER publications; in particular, it has not yet been submitted for approval by the Board of Directors.

The research reported in this paper was supported by an HEW grant to the Rand Corporation and a U.S. Department of Labor grant to the National Bureau of Economic Research. A previous version of this paper circulated under the title "Shadow Prices, Market Wages, and Labor Supply Revisited: Some Computational Simplifications and Revised Estimates," June 1975. I have greatly benefited from conversations with Paul Schultz and James Smith, and from detailed written comments of Mark Killingsworth on the first draft of this paper. Takeshi Amemiya, John Cogan, Reuben Gronau, Ed Leamer, H. G. Lewis, and Bill Rodgers all made valuable comments on the second draft. Gary Becker, Tom MaCurdy and Arnold Zellner made valuable comments on this draft. Ralph Shnelvar performed the computations. I assume all responsibility for any remaining errors.

In this paper, the bias that results from using nonrandomly selected samples to estimate behavioral relationships is shown to arise because of a missing data problem. In contrast with the standard omitted variable problem in econometrics, in which certain explanatory variables of a regression model are missing, the problem of sample selection bias arises because data are missing on the dependent variable of an analysis.
Regressions estimated on the data available from the nonrandom sample will not, in general, enable the analyst to estimate parameters of direct interest to economists. Instead, such regression coefficients confound meaningful structural parameters with the parameters of the function determining the probability that an observation makes its way into the nonrandom sample.

Sample selection bias may arise for two distinct reasons. First, there may be self selection by the individuals being investigated. One observes market wages for certain women because their productivity in the market exceeds their productivity in the home.¹ Similarly, one observes wages for union members who found their nonunion alternative less desirable. Finally, the wages of migrants (or manpower trainees) do not, in general, afford an estimate of what nonmigrants (nontrainees) would have earned had they migrated (participated in training). In each of these cases, wage functions fit on the available data do not estimate the wage function that characterizes a randomly selected member of the general population subject to the "treatment" of work, unionism, migration, and manpower training programs, respectively. Moreover, in each case, if it were possible to obtain the missing wage data for either the treatment or nontreatment population, it would be possible to utilize simple regression techniques to estimate the parameters of population functions. Simple comparisons between pre and post treatment wages would yield unbiased estimates of the economic benefits of the treatment.

¹ Note that this does not imply that the more market productive women are the ones observed working.

Sample selection bias may also arise as a direct consequence of actions taken by the analyst. In studies of panel data, it is common to require that "intact" observations be employed.
For example, in analyses of the time series of the labor supply of married women, stability of the family unit is often required for an observation sequence to be analyzed. The effect of such criteria operates in precisely the same fashion as self selection: fitted functions confound behavioral functions with sample selection functions.

It is fair to say that most competent analysts have been aware of the possibility of both sources of selection bias. It is also fair to say that the accepted econometric practice has been to ignore the problem in making parameter estimates but to verbally qualify the estimates in light of possible selection biases. Recent work in econometrics has attempted to improve on previous work by making specific assumptions about the source of selection bias. In particular, this work assumes that both the missing data and the available data are drawn from a common probability distribution, typically assumed to be a normal law. Except for work by Amemiya (1973) and Gronau (1974), the authors of these studies rely on maximum likelihood estimators to produce parameter estimates free of selection bias.

In this paper, I present a simple characterization of the sample selection bias problem that is also applicable to the conceptually distinct econometric problems that arise from truncated samples and from models with limited dependent variables.² The problem of sample selection bias is fit within the conventional specification error framework of Griliches and Theil. A simple estimator is discussed that enables analysts to utilize ordinary regression methods to estimate models free of selection bias. The techniques discussed here are applied to reestimate and test a model of female labor supply developed by the author (1974).
Besides providing an illustration of the methodology, this application is of interest in its own right for three reasons: (a) an important variable utilized in the author's previous analysis, the labor market experience of women, was incorrectly coded by the primary data source; (b) the simple estimators discussed here allow for much more extensive testing of the maintained hypotheses of the previous paper; and (c) the method discussed here produces an initial consistent estimator for the likelihood equations of the previous paper. This last issue is important because the likelihood function proposed in the 1974 paper is not globally concave, and hence the selection of an initial starting value is an important issue, since local optima will not yield consistent estimators.

² This relationship is spelled out in greater detail in a companion paper (Heckman, 1976).

Four conclusions emerge from the empirical analysis of female labor supply that is conducted on the 1967 National Longitudinal Survey for women age 30-44. First, estimated coefficients of labor supply and wage functions are quite sensitive to alternative treatment of the labor market experience of the wife. Recent work (Heckman, 1977) suggests that unmeasured factors that determine participation also determine past work behavior. Treating the wife's labor force experience as an endogenous variable in participation probabilities, using standard instrumental variable estimation techniques, significantly alters the coefficients of estimated labor supply and wage functions. Second, in a model that treats the labor market experience of the wife as endogenous, there is evidence that selection bias is an important phenomenon in the estimates of labor supply functions, but there is little evidence of selection bias in estimates of the hourly wage function. Third, the empirical analysis casts some doubt on the validity of the simple model assumed in the 1974 paper.
With a minor modification the basic structure of the model remains intact and concordant with the data. Fourth, conventional measures of labor supply overstate the amount of measured work, create the statistical illusion of a standard work week and work year, and considerably understate the true sample variation in labor supply.

This paper is in three parts. In the first section, selection bias is presented within the specification error framework. In this section, general distributional assumptions are maintained. In section two, specific results are presented for the case of normal regression disturbances. Simple estimators are proposed and discussed. In the third section, empirical results are presented.

I. Sample Selection Bias as a Specification Error

To simplify the exposition, consider a two equation model. Few new points arise in the multiple equation case, and the two equation case has considerable pedagogical merit.

Consider a random sample of T observations. The equations for individual i may be written as

$$Y_{1i} = X_{1i}\beta_1 + U_{1i} \tag{1a}$$
$$Y_{2i} = X_{2i}\beta_2 + U_{2i} \tag{1b}$$

where $X_{ji}$ is a $1 \times K_j$ vector of exogenous regressors, $\beta_j$ is a $K_j \times 1$ vector of parameters, and

$$E(U_{ji}) = 0, \qquad E(U_{ji}U_{j'i}) = \sigma_{jj'}, \qquad E(U_{ji}U_{j'i'}) = 0 \ \text{ for } i \neq i', \qquad j, j' = 1, 2. \tag{2}$$

The final assumption is an implication of a random sampling scheme. Denote the joint distribution of $U_{1i}, U_{2i}$ by $h(U_{1i}, U_{2i})$, which may be a singular distribution. The regressor matrix is assumed to be of full rank so that if all data were available, each equation could be estimated by least squares, and all parameters would be identified.

Suppose that one seeks estimates of equation (1a) but that data are missing on $Y_{1i}$ for certain observations. The crucial question is "why are data missing for certain observations?"
No matter what the answer to this question, one can write the population regression function for equation (1a) as

$$E(Y_{1i} \mid X_{1i}) = X_{1i}\beta_1, \qquad i = 1, \ldots, T,$$

while the regression function for the subsample of available data is

$$E(Y_{1i} \mid X_{1i}, \text{sample selection rule}) = X_{1i}\beta_1 + E(U_{1i} \mid \text{sample selection rule}), \qquad i = 1, \ldots, T_1,$$

where, for convenience, the i subscripts are labeled so that the first $T_1 < T$ observations have all data available.

If the conditional expectation of $U_{1i}$ is zero, the selected sample regression function is the same as the population regression function. In this case, least squares may be applied to the subsample of available data to estimate the population regression function. The only cost of having an incomplete sample is a loss in efficiency.

In the general case, the sample selection rule that determines the available data has more serious consequences. Consider the following selection rule: data are available on $Y_{1i}$ if

$$Y_{2i} \geq 0, \tag{3}$$

while if $Y_{2i} < 0$ we do not obtain observations on $Y_{1i}$. Clearly the choice of zero as a threshold is an inessential normalization. Also, one could define a dummy variable $d_i$ with the properties

$$d_i = 1 \ \text{ iff } \ Y_{2i} \geq 0, \qquad d_i = 0 \ \text{ iff } \ Y_{2i} < 0, \tag{4}$$

so that one could analyze the joint distribution of $Y_{1i}, d_i$, dispensing with $Y_{2i}$ altogether. The advantage in using the selection rule representation (3) is that it permits a unified summary of the existing literature.

Utilizing this representation, one may write

$$E(U_{1i} \mid X_{1i}, \text{sample selection rule}) = E(U_{1i} \mid X_{1i}, Y_{2i} \geq 0) = E(U_{1i} \mid U_{2i} \geq -X_{2i}\beta_2).$$

In the case of independence between $U_{1i}$ and $U_{2i}$, so that the selection rule is independent of the behavioral function being estimated, the conditional mean of $U_{1i}$ is zero. In general, the conditional mean of the disturbance does not vanish. Accordingly, the selected sample regression function may be written

$$E(Y_{1i} \mid X_{1i}, Y_{2i} \geq 0) = X_{1i}\beta_1 + E(U_{1i} \mid U_{2i} \geq -X_{2i}\beta_2). \tag{5}$$

The selected sample regression function depends on $X_{1i}$ and $X_{2i}$. Regression estimates of equation (1a) fit on the selected sample omit the final term of equation (5).
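The consequence of omitting the final term of equation (5) can be checked numerically. The sketch below is a hypothetical simulation, not part of the original analysis. It assumes a single regressor $x$ shared by both equations, $Y_{1i} = 1 + x_i + U_{1i}$ and $Y_{2i} = x_i + U_{2i}$, with unit-variance bivariate normal disturbances whose correlation is `rho`, together with selection rule (3). It compares the least squares slope on the full sample with the slope on the selected sample.

```python
import math
import random


def ols_line(x, y):
    """Intercept and slope of a least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def selection_bias_demo(rho, n=20000, seed=1):
    """Return (full-sample slope, selected-sample slope) for a hypothetical design.

    Y1 = 1 + x + U1 and Y2 = x + U2, with Var(U1) = Var(U2) = 1 and
    Corr(U1, U2) = rho; Y1 is kept only when Y2 >= 0 (selection rule (3)).
    """
    rng = random.Random(seed)
    xs, y1s, sel_x, sel_y1 = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u2 = rng.gauss(0.0, 1.0)
        u1 = rho * u2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        y1, y2 = 1.0 + x + u1, x + u2
        xs.append(x)
        y1s.append(y1)
        if y2 >= 0.0:  # Y1 is observed only for selected observations
            sel_x.append(x)
            sel_y1.append(y1)
    return ols_line(xs, y1s)[1], ols_line(sel_x, sel_y1)[1]


if __name__ == "__main__":
    for rho in (0.0, 0.8):
        full, selected = selection_bias_demo(rho)
        print(f"rho={rho}: full-sample slope {full:.3f}, selected-sample slope {selected:.3f}")
```

With `rho = 0`, both slopes should be close to the true value of 1. With `rho = 0.8`, the selected-sample slope falls well below 1. The reason is that $E(U_{1i} \mid U_{2i} \geq -x_i)$ decreases in $x_i$ in this design, and the omitted term is absorbed into the slope.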
Thus the problem of sample selection bias, initially viewed as a missing dependent variable problem, may be reformulated as an ordinary omitted explanatory variable problem.

Several special cases of this model are of interest. First, assume that the only variable in the regressor vector $X_{2i}$ is the constant "1". In this case, the probability that an observation is included in the sample is the same for all observations and is not a function of any explanatory variables. The conditional mean of $U_{1i}$ is a constant. Ordinary least squares estimators of equation (1a) yield unbiased estimators for the slope coefficients but biased estimators for the intercept and for the population variance $\sigma_{11}$. The same analysis applies to a more general model with $X_{2i}$ regressors as long as the set of $X_{1i}$ variables is uncorrelated with the conditional mean of $U_{1i}$. In particular, if $X_{1i}$ and $X_{2i}$ are independent random variables, this analysis continues to hold.

In the general case with nontrivial regressors included among the $X_{2i}$ variables, it is unreasonable to expect that the regressors of equation (1a) (i.e., $X_{1i}$) are uncorrelated with the conditional mean of $U_{1i}$. Accordingly, least squares estimators of the slope coefficients ($\beta_1$) are biased. Without further assumptions about the distribution of $U_{1i}, U_{2i}$, it is not possible to sign the bias. If the conditional mean of the disturbance is well approximated by the linear terms of a Taylor's series expansion, this approximation may be substituted in equation (5) and an ordinary specification error analysis may be performed.

From equation (5), it is evident that a symptom of selection bias is that variables that do not belong in the true structural equation (e.g., elements of $X_{2i}$ not in $X_{1i}$) may appear to be statistically significant determinants of $Y_{1i}$ when regressions are fit on selected samples.
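A small simulation illustrates both the constant-only special case and the symptom of extraneous regressors. The designs are hypothetical and assume unit-variance normal disturbances with correlation 0.8, so they anticipate the normal model of Section II.

In `constant_only_case`, $X_{2i}$ contains only the constant. The selected-sample slope should therefore stay near its true value of 1, while the intercept absorbs $E(U_{1i} \mid U_{2i} \geq -0.3) = 0.8\,\phi(0.3)/\Phi(0.3)$.

In `extraneous_variable_case`, a binary variable `k` (loosely, "children") lowers the chance of selection but does not enter $Y_{1i}$. On the selected sample, `k` still receives a clearly nonzero coefficient, and in this design its sign follows the sign of $\sigma_{12}$.

```python
import math
import random


def ols_line(x, y):
    """Intercept and slope of a least squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def inverse_mills(z):
    """lambda(z) = phi(z) / (1 - Phi(z)), using erfc for the upper tail area."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(z / math.sqrt(2.0)))


def draw_pair(rng, rho):
    """Unit-variance normal (U1, U2) with correlation rho."""
    u2 = rng.gauss(0.0, 1.0)
    u1 = rho * u2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return u1, u2


def constant_only_case(rho=0.8, n=20000, seed=2):
    """X2 is only the constant: Y2 = 0.3 + U2, while Y1 = 1 + x + U1."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u1, u2 = draw_pair(rng, rho)
        if 0.3 + u2 >= 0.0:  # selection does not depend on x
            xs.append(x)
            ys.append(1.0 + x + u1)
    return ols_line(xs, ys)  # (intercept, slope)


def extraneous_variable_case(rho=0.8, n=20000, seed=3):
    """k enters selection (Y2 = 0.5 - k + U2) but not Y1 = 1 + U1."""
    rng = random.Random(seed)
    ks, ys = [], []
    for _ in range(n):
        k = 1.0 if rng.random() < 0.5 else 0.0
        u1, u2 = draw_pair(rng, rho)
        if 0.5 - k + u2 >= 0.0:
            ks.append(k)
            ys.append(1.0 + u1)
    return ols_line(ks, ys)[1]  # coefficient on k; zero in the population


if __name__ == "__main__":
    intercept, slope = constant_only_case()
    print(f"constant-only selection: intercept {intercept:.3f}, slope {slope:.3f}")
    print(f"predicted intercept: {1.0 + 0.8 * inverse_mills(-0.3):.3f}")
    print(f"coefficient on extraneous k: {extraneous_variable_case():.3f}")
```

Because `k` is binary, its least squares coefficient is simply the difference in mean $Y_{1i}$ between selected observations with and without `k`. Under normality, that difference is $0.8\,[\lambda(0.5) - \lambda(-0.5)]$.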
For example, in Gronau's analysis of the selection bias that arises in using the wages of working women to estimate the potential wage of nonworking women, variables that affect the probability that a woman works, such as the presence of children, may appear to affect market wages when, in fact, no causal association exists. Thus regression evidence that women with children earn lower wages is not necessarily evidence that there is discrimination against such women, or that women with lower market experience (as proxied by children) earn lower wages. Evidence that such extraneous variables "explain" wage rates may be interpreted as evidence in support of the selection hypothesis. However, even if no such extraneous variables appear in the selected sample regressions, estimates of the intercept and the population variance may be biased.

If one knew the conditional mean of $U_{1i}$, or could estimate it, one could enter it as a regressor in equation (5) and use ordinary least squares to estimate the $\beta_1$ parameters. In the next section, I discuss a method for estimating the conditional mean for the case of jointly normal disturbances.

Before turning to this discussion, it is helpful to relate the simple model presented here to previous work in the literature. The justly celebrated model of Tobin (1958) may be fit within this framework. (See also Amemiya, 1973.) In Tobin's model, data are missing on $Y_{1i}$ if $Y_{1i} < 0$. Setting $Y_{1i} = Y_{2i}$, $\beta_1 = \beta_2$, $X_{1i} = X_{2i}$, and $U_{1i} = U_{2i}$, the "Tobit" model arises.³ The bivariate density $h(U_{1i}, U_{2i})$ becomes degenerate since $U_{1i} = U_{2i}$. Since $X_{1i}$ and $X_{2i}$ are identical, the conditional mean of $U_{1i}$ is not orthogonal to $X_{1i}$, and bias is guaranteed for the least squares estimators of equation (1a) applied to selected samples.

³ Tobin assumed a normal density for $U_{1i}$. The conceptual logic of his model does not rely on normality.

Tobin's model was a major stimulus to later work. Its simplicity and elegance mask two important ideas that have been confused.
Most economists have interpreted his model as a prototype of a limited dependent variable model: the range of observed values of the random variable $Y_{1i}$ cannot fall short of zero. Put this way, Tobin's model is less interesting. Most economists are willing to live with this type of truncation of the range of a variable, and simple transformations (e.g., use of logs) can eliminate it. The important feature of Tobin's model is that a selection rule ($Y_{1i} \geq 0$) generates the sample of observed data.

Both Cragg (1971) and Nelson (1975) note that the selection rule generating observations on $Y_{1i}$ need not be as closely related to the population regression function as Tobin assumed. Their models may be fit within the schema of equations (1) and (3). For example, consider Nelson's model. $Y_{1i}$ is observed if $Y_{1i} \geq Z_{2i}$, where $Z_{2i}$ is a random variable. In terms of the notation of equation (1), his model becomes $Y_{2i} = Y_{1i} - Z_{2i}$, $\beta_2 = 0$. If $Y_{2i} \geq 0$, $Y_{1i}$ is observed, while if $Y_{2i} < 0$, $Y_{1i}$ is observed to be zero.⁴

Elsewhere, I present a model that can be fit within the sample selection framework (Heckman, 1974). This model will be elaborated in Section III along with the closely related models of Gronau (1974) and Lewis (1974).

⁴ Note, however, that $Y_{1i}$ is not, strictly speaking, a limited dependent variable, since nothing prevents $Y_{1i}$ from becoming negative.

I note, in passing, that multivariate extensions of the preceding models, while mathematically straightforward, may be of considerable substantive interest. Two examples are offered. One concerns migrants choosing among K prospective regions. Each person can be viewed as possessing K distinct wage functions, one for each region. If the self selection rule is to choose the region with the highest income, both the selection rule and the subsample regression functions can be simply characterized by an obvious K + 1 variable extension of the previous analysis. The second example concerns the measurement of union-nonunion wage differentials.
Each person in a hypothetical population can be viewed as possessing both a union and a nonunion wage function. One self selection rule, based on the assumption of freedom of entry into unionism, is to select the unionism status with the highest wage. Estimators of wage functions based on pooled union and nonunion samples yield biased estimates of the economic return to unionism if selection into unionism status is nonrandom.

Before concluding this section, it is useful to clarify three concepts that are frequently confused in the literature. The first is the concept of a truncated variable. The second is the concept of a truncated sample. The third is the concept of a censored sample. A sample is said to be censored when it is possible to use sample evidence to estimate the probability that a hypothetical observation will be observed. This is the situation assumed in the model of equations (1) and (3). A truncated sample differs from a censored sample because the probability of sample selection cannot be estimated from observed data. A random variable is said to be truncated when its range is limited. Clearly, random variables can be truncated in either censored or truncated samples. Also, quite clearly, the operational distinction between a censored and a truncated sample vanishes if there is a priori information about the probability of sample selection for a hypothetical observation. These categories often overlap. For example, in Tobin's model the sample is censored but the random variable is truncated.

II. Simple Estimators for the Case of Normally Distributed $U_{1i}$ and $U_{2i}$

In this section, the model of equations (1), (3) and (4) is derived for the specific case of joint normality for $U_{1i}$ and $U_{2i}$. The normality assumption is used in the models surveyed in Section I and is a natural starting point for any analysis. A simple estimator for this normal model is derived and discussed.
The joint distribution of $U_{1i}, U_{2i}$, $h(U_{1i}, U_{2i})$, is a bivariate normal density fully characterized by the assumptions stated in equation (2). It is permitted to be singular, as in Tobin's model. Using well known results in the literature (see Johnson and Kotz (1972), pp. 112-113, or Gronau (1974) and Lewis (1974)),

$$E(U_{1i} \mid U_{2i} \geq -X_{2i}\beta_2) = \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i,$$
$$E(U_{2i} \mid U_{2i} \geq -X_{2i}\beta_2) = \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i,$$

where

$$\lambda_i = \frac{\phi(Z_i)}{1 - \Phi(Z_i)} = \frac{\phi(Z_i)}{\Phi(-Z_i)}, \qquad Z_i = -\frac{X_{2i}\beta_2}{(\sigma_{22})^{1/2}},$$

and $\phi$ and $\Phi$ are, respectively, the density and distribution function for a standard normal random variable. $\lambda_i$ is the inverse of Mills' ratio: the ratio of the ordinate of a standard normal to the tail area of the distribution.

There are several important features of $\lambda_i$. (1) Its denominator is the probability that a population observation with characteristics $X_{2i}$ is selected into the observed sample. (2) $\lambda(Z)$ is a monotone increasing function of $Z$, and hence is a monotone decreasing function of the probability of sample selection, $\Phi(-Z) = 1 - \Phi(Z)$. In particular, $\lim_{Z_i \to -\infty} \lambda_i = 0$, $\lim_{Z_i \to \infty} \lambda_i = \infty$, and $\partial\lambda_i/\partial Z_i > 0$. Figure 1 displays the relationship between $\lambda$ and the probability of sample inclusion. In samples in which the sample selection rule guarantees that every population observation is sampled (a selection probability of one), $\lambda(Z)$ is zero and the least squares estimator of equation (1a) has optimal properties.

[Fig. 1. Probability of sample inclusion.]

Using these results, equation (5) becomes

$$E(Y_{1i} \mid X_{1i}, Y_{2i} \geq 0) = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i, \tag{6a}$$

while the comparable expression for $Y_{2i}$ becomes

$$E(Y_{2i} \mid X_{2i}, Y_{2i} \geq 0) = X_{2i}\beta_2 + \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i. \tag{6b}$$

If one could estimate $Z_i$ and hence $\lambda_i$, one could enter the latter variable as a regressor in equation (6a) and estimate $\beta_1$ and $\sigma_{12}/(\sigma_{22})^{1/2}$ by least squares. Similarly, if one could measure $Y_{2i}$ when $Y_{2i} \geq 0$, as in Tobin's model, knowledge of $Y_{2i}$ and $\lambda_i$ would permit direct estimation of $\beta_2$ and $(\sigma_{22})^{1/2}$.
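These quantities are easy to compute directly. The sketch below uses only the Python standard library and evaluates $\lambda(Z) = \phi(Z)/(1 - \Phi(Z))$, using `math.erfc` for the tail area. It then checks the two conditional means by Monte Carlo for one hypothetical parameter choice: $X_{2i}\beta_2 = 0.4$, $\sigma_{11} = 2$, $\sigma_{22} = 1.5$, $\sigma_{12} = 0.9$.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_sf(z):
    """Upper tail area 1 - Phi(z), via erfc to avoid cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def inverse_mills(z):
    """lambda(z) = phi(z) / (1 - Phi(z))."""
    return norm_pdf(z) / norm_sf(z)


def conditional_means(xb2, s22, s12):
    """Closed forms for E(U1 | U2 >= -xb2) and E(U2 | U2 >= -xb2)."""
    z = -xb2 / math.sqrt(s22)
    lam = inverse_mills(z)
    return s12 / math.sqrt(s22) * lam, s22 / math.sqrt(s22) * lam


def conditional_means_mc(xb2, s11, s22, s12, n=200000, seed=0):
    """Monte Carlo check: draw (U1, U2) and average over draws with U2 >= -xb2."""
    rng = random.Random(seed)
    b = s12 / s22                          # population regression of U1 on U2
    sd_e = math.sqrt(s11 - s12 * s12 / s22)
    tot1 = tot2 = 0.0
    kept = 0
    for _ in range(n):
        u2 = math.sqrt(s22) * rng.gauss(0.0, 1.0)
        u1 = b * u2 + sd_e * rng.gauss(0.0, 1.0)
        if u2 >= -xb2:
            tot1 += u1
            tot2 += u2
            kept += 1
    return tot1 / kept, tot2 / kept


if __name__ == "__main__":
    print("lambda(0) =", inverse_mills(0.0))  # equals sqrt(2/pi)
    print("closed form:", conditional_means(0.4, 1.5, 0.9))
    print("Monte Carlo:", conditional_means_mc(0.4, 2.0, 1.5, 0.9))
```

On a grid of $Z$ values, `inverse_mills` should also rise from nearly 0 at $Z = -6$ to about 6 at $Z = 6$. This matches the limits and monotonicity stated above.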
Representation (6a) reveals that if $\sigma_{12} = 0$, so that the disturbances that affect sample selection are independent of the disturbances affecting the behavioral functions of interest, $\lambda_i$ may be omitted as a regressor. Thus, if either $\lambda_i$ or $\sigma_{12}$ is zero, or both, least squares estimators of $\beta_1$ are unbiased.

The full statistical model of which equations (6a) and (6b) are expectations is now developed. One may write the model as

$$Y_{1i} = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\lambda_i + V_{1i}, \tag{7a}$$
$$Y_{2i} = X_{2i}\beta_2 + \frac{\sigma_{22}}{(\sigma_{22})^{1/2}}\lambda_i + V_{2i}, \tag{7b}$$

where

$$E(V_{1i} \mid X_{1i}, \lambda_i, U_{2i} \geq -X_{2i}\beta_2) = 0, \qquad E(V_{2i} \mid X_{2i}, \lambda_i, U_{2i} \geq -X_{2i}\beta_2) = 0,$$

and $E(V_{ji}V_{j'i'} \mid X_{1i}, X_{2i}, \lambda_i, U_{2i} \geq -X_{2i}\beta_2) = 0$ for $i \neq i'$. It is straightforward to demonstrate that

$$E(V_{1i}^2 \mid X_{1i}, \lambda_i, U_{2i} \geq -X_{2i}\beta_2) = \sigma_{11}\left[(1 - \rho^2) + \rho^2(1 + Z_i\lambda_i - \lambda_i^2)\right], \tag{8a}$$
$$E(V_{1i}V_{2i} \mid X_{1i}, X_{2i}, \lambda_i, U_{2i} \geq -X_{2i}\beta_2) = \sigma_{12}(1 + Z_i\lambda_i - \lambda_i^2), \tag{8b}$$
$$E(V_{2i}^2 \mid X_{2i}, \lambda_i, U_{2i} \geq -X_{2i}\beta_2) = \sigma_{22}(1 + Z_i\lambda_i - \lambda_i^2), \tag{8c}$$

where $\rho^2 = \sigma_{12}^2/(\sigma_{11}\sigma_{22})$. Moreover, one can easily establish that

$$0 < 1 + \lambda_i Z_i - \lambda_i^2 < 1. \tag{9}$$

There are several important consequences of this inequality for the covariance structure of the disturbances of equations (7a, b). Suppose that one knows $Z_i$ and $\lambda_i$, and enters $\lambda_i$ as a regressor in equations (7). Standard least squares estimators of the population variances $\sigma_{11}$ and $\sigma_{22}$ are downward biased estimators of the appropriate parameters. Also, the standard estimator of the interequation covariance is downward biased in absolute value. Note further that if $Z_i$ contains regressors (apart from "1"), the variances of the disturbances of equations (7) are heteroskedastic. Least squares estimators are not GLS estimators.

The GLS estimators have an interesting interpretation. Unlikely observations (those with low probability of sample inclusion) receive greater weight than likely observations. This follows because the middle term in inequality (9) is a monotonically increasing function of the probability of sample inclusion, $\Phi(-Z_i)$.⁵ Accordingly, less likely observations receive greater weight, and observations with zero probability of sample inclusion receive the greatest weight.
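Inequality (9) and the GLS weighting argument can be checked numerically. The sketch below assumes hypothetical values $\sigma_{11} = 1$ and $\rho = 0.6$ and does three things:

- It evaluates the middle term of (9), $\delta(Z) = 1 + Z\lambda(Z) - \lambda(Z)^2$, on the grid $-6 \leq Z \leq 6$.
- It forms the GLS weights $1/E(V_{1i}^2)$ implied by (8a).
- It compares $\delta(0.5)$ with a Monte Carlo estimate of the variance of a standard normal truncated below at 0.5.

```python
import math
import random


def inverse_mills(z):
    """lambda(z) = phi(z) / (1 - Phi(z))."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(z / math.sqrt(2.0)))


def variance_factor(z):
    """Middle term of inequality (9): 1 + Z*lambda - lambda**2."""
    lam = inverse_mills(z)
    return 1.0 + z * lam - lam * lam


def var_v1(z, s11, rho):
    """Conditional variance of V1 from equation (8a)."""
    return s11 * ((1.0 - rho * rho) + rho * rho * variance_factor(z))


def gls_weight(z, s11, rho):
    """GLS weight for an observation with index Z: the inverse of (8a)."""
    return 1.0 / var_v1(z, s11, rho)


def truncated_variance_mc(z, n=200000, seed=0):
    """Monte Carlo variance of a standard normal truncated below at z."""
    rng = random.Random(seed)
    kept = [u for u in (rng.gauss(0.0, 1.0) for _ in range(n)) if u >= z]
    m = sum(kept) / len(kept)
    return sum((u - m) ** 2 for u in kept) / len(kept)


if __name__ == "__main__":
    for z in (-2.0, 0.0, 2.0):
        # Higher Z means a lower probability of inclusion, hence a larger weight.
        print(z, variance_factor(z), gls_weight(z, s11=1.0, rho=0.6))
    print("MC check at Z = 0.5:", truncated_variance_mc(0.5), variance_factor(0.5))
```

As $Z$ rises, the probability of inclusion $1 - \Phi(Z)$ falls. Over the same range, $\delta$ decreases toward 0, so the variance in (8a) approaches $\sigma_{11}(1 - \rho^2)$ and the weight rises toward its reciprocal.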
The GLS estimators based on known $\lambda_i$ possess unusual properties, not fully developed here. In contrast with the usual case for GLS estimators, parameters of the regression function enter the disturbance variance. This is seen most clearly in equations (7a) and (8a). Using the definition of $\rho$ presented below equations (8a)-(8c), the coefficient on the $\lambda_i$ variable in equation (7a) may be rewritten as $\rho(\sigma_{11})^{1/2}$, so that the dependence is explicit.

⁵ An elementary application of L'Hospital's rule reveals that in the limit, as $Z_i \to \infty$, $\lambda_i \to \infty$ and $\lim E(V_{2i}^2) = 0$. Similarly, $\lim E(V_{1i}V_{2i}) = 0$ and $\lim E(V_{1i}^2)/\sigma_{11} = 1 - \rho^2$.

The consequences of this dependence are interesting, although their full development is peripheral to this paper. Nonetheless, a brief outline may be of some interest. With known $\lambda_i$, one may use least squares to produce unbiased estimates of the regression parameters of equation (7a), $\beta_1$ and $\rho(\sigma_{11})^{1/2}$. Using the least squares residuals from equation (7a), one may form a consistent estimator of $\sigma_{11}$.⁶ Then, an approximate GLS estimator that converges in distribution to the true GLS estimator may be found by estimating equation (7a) by weighted least squares, with the estimated weights obtained from equation (8a). An important feature of this problem is that one round GLS estimators are not asymptotically efficient compared to the appropriate likelihood function estimates, which are based on a truncated bivariate normal distribution with known points of truncation.⁷

The preceding analysis appears to be somewhat beside the point since, as a practical matter, one does not know $Z_i$ and $\lambda_i$, and hence one cannot directly estimate equation (7a). But in the case of a censored sample, it is possible to compute the probability that an observation has missing data, so that it is possible to use probit analysis to estimate $Z_i$ and hence $\lambda_i$. In the case of a truncated sample this is not so.
However, if prior information is available on the probability that $Y_{1i}$ is observed, it is possible to estimate $Z_i$ and $\lambda_i$, so that prior information on the probability of sample inclusion eliminates the distinction between censored and truncated samples.

⁶ Denote the residuals by $\hat V_{1i}$. Since $\lambda_i$ and $Z_i$ are known, and since $\rho(\sigma_{11})^{1/2}$ is estimated, one can use (8a) to estimate

$$\hat\sigma_{11} = \frac{1}{T_1}\sum_{i=1}^{T_1}\hat V_{1i}^2 + \left(\widehat{\rho\sigma_{11}^{1/2}}\right)^2 \frac{1}{T_1}\sum_{i=1}^{T_1}\left(\lambda_i^2 - Z_i\lambda_i\right).$$

This yields a consistent estimate of the variance that is guaranteed to be positive. Note, however, that nothing in the procedure guarantees a value of $|\hat\rho|$ inside the unit interval, although in large samples it must lie in that interval.

⁷ This is so because $\rho$ and $\sigma_{11}$ appear both in the regression coefficients and in the variance, so the information matrix is not block diagonal. An iterative estimator based on the initial consistent estimates previously discussed is asymptotically efficient.

In the censored case, the probit likelihood function is

$$\mathcal{L} = \prod_{i=1}^{T} [\Phi(-Z_i)]^{d_i}\,[\Phi(Z_i)]^{1-d_i},$$

where $d_i = 1$ denotes the event "$Y_{1i}$ is observed." Under the standard conditions for identification in probit analysis (see Nerlove and Press, 1976), one may consistently estimate $\beta_2/(\sigma_{22})^{1/2}$, and hence $Z_i$ and $\lambda_i$. The estimated $\lambda_i$ may be substituted for the actual $\lambda_i$ in the preceding analysis.

In Appendix A, the asymptotic distribution is derived for the least squares estimator based on an estimated $\lambda_i$ instead of the actual $\lambda_i$. The least squares estimators are consistent and asymptotically normally distributed. Moreover, in the important special case of the null hypothesis of no selection bias (i.e., $\sigma_{12} = 0$ in equation (7a)), the standard least squares estimator of the variance-covariance matrix of the regression coefficients is the appropriate estimator. However, if $\sigma_{12} \neq 0$, the standard estimator is inappropriate, and formula (A4) in Appendix A should be used instead.
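The two-step procedure described above has three stages:

1. A probit estimates $\beta_2/(\sigma_{22})^{1/2}$.
2. Least squares is run on $X_{1i}$ and the estimated $\lambda_i$.
3. $\sigma_{11}$ is estimated by $\overline{\hat V_{1i}^2} + \hat b^2\,\overline{(\lambda_i^2 - Z_i\lambda_i)}$, which follows from (8a).

The sketch below is a minimal illustration under stated assumptions, not the computations reported in this paper. The data are simulated from a hypothetical design with $\sigma_{11} = \sigma_{22} = 1$ and $\rho = 0.6$, and an extra variable `z` enters only $X_{2i}$. The probit likelihood is maximized by Newton-Raphson, an optimizer the text does not specify. Standard errors are not computed; when $\sigma_{12} \neq 0$ they would require formula (A4) of Appendix A.

```python
import math
import random

def pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def cdf(u):
    return 0.5 * math.erfc(-u / math.sqrt(2.0))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - dot(m[r][r + 1:n], x[r + 1:n])) / m[r][r]
    return x

def ols(rows, y):
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def probit(rows, d, iters=50):
    """Newton-Raphson on the probit log likelihood; estimates beta2 / sigma22**0.5."""
    k = len(rows[0])
    g = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # minus the Hessian
        for w, di in zip(rows, d):
            q = 2.0 * di - 1.0
            u = q * dot(w, g)
            lam = pdf(u) / cdf(u)
            for i in range(k):
                grad[i] += q * lam * w[i]
                for j in range(k):
                    info[i][j] += lam * (lam + u) * w[i] * w[j]
        step = solve(info, grad)
        g = [gi + si for gi, si in zip(g, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return g

def simulate(n=20000, rho=0.6, seed=1):
    """Hypothetical design: X1 = (1, x), X2 = (1, x, z), sigma11 = sigma22 = 1."""
    rng = random.Random(seed)
    beta1, beta2 = [1.0, 0.5], [0.2, 0.5, 1.0]
    data = []
    for _ in range(n):
        x, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u2 = rng.gauss(0.0, 1.0)
        u1 = rho * u2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        x1, x2 = [1.0, x], [1.0, x, z]
        d = 1 if dot(x2, beta2) + u2 >= 0.0 else 0  # selection rule (3)
        data.append((x1, x2, dot(x1, beta1) + u1 if d else None, d))
    return data

def two_step(data):
    g = probit([r[1] for r in data], [r[3] for r in data])  # step 1: probit
    rows, ys, zs, lams = [], [], [], []
    for x1, x2, y1, d in data:
        if d:
            zi = -dot(x2, g)               # estimated Z_i
            lam = pdf(zi) / cdf(-zi)       # estimated lambda_i = phi(Z) / (1 - Phi(Z))
            rows.append(x1 + [lam])
            ys.append(y1)
            zs.append(zi)
            lams.append(lam)
    coef = ols(rows, ys)                   # step 2: beta1 and rho * sigma11**0.5
    b_lam, t1 = coef[-1], len(ys)
    resid = [y - dot(r, coef) for r, y in zip(rows, ys)]
    s11 = (sum(v * v for v in resid) / t1
           + b_lam ** 2 * sum(l * l - z * l for l, z in zip(lams, zs)) / t1)
    return g, coef, s11, b_lam / math.sqrt(s11)

if __name__ == "__main__":
    g, coef, s11, rho_hat = two_step(simulate())
    print("probit:", [round(v, 3) for v in g], "second step:", [round(v, 3) for v in coef])
    print("sigma11:", round(s11, 3), "rho:", round(rho_hat, 3))
```

With this design, the estimates should land near the true values: (0.2, 0.5, 1.0) for the probit, (1.0, 0.5) for $\beta_1$, 0.6 for the coefficient on $\lambda_i$, and 1 for $\sigma_{11}$. Removing the $\lambda_i$ column from `rows` would reproduce the misspecified regression that omits the final term of equation (5).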
As in the case of exact GLS estimators based on known values of $\lambda_i$, approximate GLS estimators are not asymptotically efficient. Nor do they converge in distribution to GLS estimators based on known $\lambda_i$, except in the important special case of a null hypothesis of no selection bias. To achieve asymptotically efficient estimators, maximum likelihood estimators must be employed. The estimators suggested here provide initial consistent estimators for the likelihood equations, so that a one step iteration (Rothenberg and Leenders, 1964) yields estimates that are asymptotically efficient. Thus the task of computing efficient estimates is simplified, and the problem of locating a starting value for likelihood function iterations is resolved. Elsewhere (Heckman, 1976) it is shown that for one problem the initial consistent estimators discussed here closely approximate the likelihood maximizing parameter estimates.

III. New Estimates of Female Labor Supply Functions and Wage Functions Free of Selection Bias: New Tests of an Old Model

A. The Model

In this section, the techniques of Section II are applied to estimate the labor supply and wage functions of married women. In the absence of fixed costs of entering and exiting the labor market, and under the assumption that workers are free to choose their hours of work, two functions fully characterize the labor supply decision. The first function is the market wage function for the woman, defined by equation (1a). The second function is the reservation wage function, which records the value that a woman places on her time if she does not work ($W_i^*$). If the market wage exceeds the reservation wage ($Y_{1i} > W_i^*$), a woman works, and her hours of work adjust so that in equilibrium the marginal value of her time equals her market wage rate. Under certain simplifying assumptions elaborated more fully elsewhere (Heckman, 1974a), hours of work, $h_i$, are proportional to the gap between market wages and reservation wages.
Denoting this proportionality factor by $1/\gamma$, and letting $Y_{2i} = Y_{1i} - W_i^*$ be the gap, one is led to the following model:

$$E(Y_{1i} \mid X_{1i}, X_{2i}, Y_{2i} \geq 0) = X_{1i}\beta_1 + E(U_{1i} \mid U_{2i} \geq -X_{2i}\beta_2), \tag{11a}$$
$$E(h_i \mid X_{1i}, X_{2i}, Y_{2i} \geq 0) = E\left(\frac{Y_{2i}}{\gamma} \,\middle|\, X_{2i}, Y_{2i} \geq 0\right) = \frac{X_{2i}\beta_2}{\gamma} + \frac{1}{\gamma}E(U_{2i} \mid U_{2i} \geq -X_{2i}\beta_2), \tag{11b}$$
$$\gamma > 0. \tag{11c}$$

This model differs from the sample selection model of equations (1a) and (1b) in one important respect. Unlike the case in equation (1b), there is information about $Y_{2i}$, up to a factor of proportionality ($1/\gamma$), if a woman works ($Y_{2i} > 0$). The decision function that characterizes labor force entry, which is the sample selection rule for this model, is closely related to the hours of work equation. The model of Lewis (1974) and Gronau (1974) is exactly the model of equations (1a) and (1b), and it does not utilize the potential source of information that closely links the participation decision and the labor supply function.

From inspection of equations (11a) and (11b), it is clear that both wage and hours of work functions may be subject to selection bias. Least squares estimators of the wage and hours functions fit for working women confound the parameters of the sample selection function with the parameters of the behavioral functions of interest.

This is not to say that estimates of wage or labor supply functions fit on subsamples of working women are of no interest. A regression model that deletes the conditional expectation of the error terms approximates a function with a well defined interpretation. Consider equation (11b). The same set of variables ($X_{2i}$) appears in the regression function and in the conditional mean of $U_{2i}$. If one deletes the conditional mean, then to a first order approximation a regression equation estimates the vector

$$\frac{\beta_2}{\gamma} + \frac{1}{\gamma}\,\frac{\partial E(U_{2i} \mid U_{2i} \geq -X_{2i}\beta_2)}{\partial X_{2i}}.$$

Thus ordinary least squares coefficients estimate both the effect of a variable moving along the behavioral function (the first term) and the effect of the variable in sorting people out in the taste distribution (the second term).
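Under the normality assumption of Section II, the second term has a closed form. We have $E(U_{2i} \mid U_{2i} \geq -X_{2i}\beta_2) = (\sigma_{22})^{1/2}\lambda(Z_i)$ with $Z_i = -X_{2i}\beta_2/(\sigma_{22})^{1/2}$, and $\lambda'(Z) = \lambda(Z)[\lambda(Z) - Z]$. It follows that the vector estimated to first order is $(\beta_2/\gamma)(1 + Z_i\lambda_i - \lambda_i^2)$. By inequality (9), when $\beta_2/\gamma > 0$ this lies strictly between 0 and $\beta_2/\gamma$.

The sketch below checks the formula by finite differences for a single regressor with no intercept. The values of $\beta_2$, $\sigma_{22}$ and $\gamma$ are hypothetical.

```python
import math


def inverse_mills(z):
    """lambda(z) = phi(z) / (1 - Phi(z))."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(z / math.sqrt(2.0)))


def expected_hours(x, b2, s22, gamma):
    """E(h | x, Y2 >= 0) from (11b) with one regressor x and normal U2."""
    s = math.sqrt(s22)
    z = -x * b2 / s
    return (x * b2 + s * inverse_mills(z)) / gamma


def closed_form_slope(x, b2, s22, gamma):
    """(b2 / gamma) * (1 + Z*lambda - lambda**2): the along-the-function effect
    plus the sorting effect, which is negative when b2 / gamma > 0."""
    z = -x * b2 / math.sqrt(s22)
    lam = inverse_mills(z)
    return b2 / gamma * (1.0 + z * lam - lam * lam)


def numeric_slope(x, b2, s22, gamma, h=1e-5):
    """Central finite difference of expected_hours with respect to x."""
    return (expected_hours(x + h, b2, s22, gamma)
            - expected_hours(x - h, b2, s22, gamma)) / (2.0 * h)


if __name__ == "__main__":
    b2, s22, gamma = 0.8, 1.5, 2.0  # hypothetical values
    for x in (-2.0, 0.0, 2.0):
        print(x, b2 / gamma, closed_form_slope(x, b2, s22, gamma),
              numeric_slope(x, b2, s22, gamma))
```

The gap between `b2 / gamma` and `closed_form_slope` is the sorting effect. It is largest where the probability of working is smallest (large $Z_i$), because the attenuation factor $1 + Z_i\lambda_i - \lambda_i^2$ is smallest there.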
To clarify this decomposition, a concrete example may be helpful. Let the $X_{2i}$ vector consist of one variable, say, ability to perform market tasks. Ability is expected to increase the supply of hours to the market for a working woman ($\beta_2/\gamma > 0$). Moreover, ability is expected to increase the probability that a woman works. But this means that as one samples across working women with greater ability, one is sampling women with progressively lower average tastes for work:

$$\frac{\partial E(U_{2i} \mid U_{2i} \geq -X_{2i}\beta_2)}{\partial X_{2i}} < 0.$$

Thus the regression coefficient on the ability variable is a downward biased estimate of $\beta_2/\gamma$.

Estimates of $\beta_2/\gamma$ answer the question "what is the average effect of an additional unit of ability on the labor supply of women already working?" Economic theory provides a guide to the sign and magnitude of this coefficient. Estimates of the regression coefficient $\partial E(h_i \mid X_{2i}, Y_{2i} \geq 0)/\partial X_{2i}$ answer the question "what is the change in the average labor supply of women when one moves across ability groups?" These estimates give the basic ingredients required to estimate the aggregate labor supply curve. Given a distribution of ability in the population, one can add up the average labor supply at each ability class to compute aggregate labor supply.⁸ Typically, economic theory does not directly yield predictions about this parameter, which combines parameters describing movements along a given labor supply function with the parameters determining the entry of workers into the labor force.

⁸ More precisely, equation (11b) multiplied by the probability that a woman works yields an estimate of the average hours supplied to the market by women with traits $X_{1i}$, $X_{2i}$.

The parameter $1/\gamma$ plays a crucial role in this analysis and may be interpreted as the uncompensated effect of a change in wage rates on labor supply. From equation (11b) it is not clear how this parameter may be estimated. Recall that $Y_{2i}$ is defined as the difference between market wages and reservation wages ($Y_{1i} - W_i^*$).
To demonstrate how $\gamma$ can be estimated, one may introduce an explicit function for the reservation wage, $W_i^* = N_i\delta + \varepsilon_i$, and note that

$$h_i = \frac{Y_{2i}}{\gamma} = \frac{W_i - W_i^*}{\gamma} = \frac{X_{1i}\beta_1}{\gamma} - \frac{N_i\delta}{\gamma} + \frac{1}{\gamma}(U_{1i} - \varepsilon_i). \tag{12}$$

Then it is clear that if one variable appears in $X_{1i}$ that does not appear in $N_i$, such as the market human capital of the wife, one can estimate both $\gamma$ and $\delta$ given estimates of equation (11a). Note, too, that one can follow conventions in simultaneous equation theory to avoid the multiplicity of estimates of $\gamma$ that arises in the overidentified case: insert estimates of $\beta_1$ (obtained from (11a)) to generate a predicted value of wage rates in (12). Denoting such estimates by $\hat\beta_1$, one estimates

$$h_i = \frac{X_{1i}\hat\beta_1}{\gamma} - \frac{N_i\delta}{\gamma} + \frac{1}{\gamma}(U_{1i} - \varepsilon_i) + \frac{1}{\gamma}X_{1i}(\beta_1 - \hat\beta_1),$$

where the final term vanishes in large samples.

The crucial feature of labor supply function (12) is that the supply of labor is assumed to be a function of the gap between market wages and reservation wages. This gap, in turn, is a measure of the probability that a woman works. Thus a strong assumption of this formulation is that a woman more likely to work is also more likely to supply more labor when she works. Over the empirically relevant range, the labor supply curve may become backward bending ($\gamma < 0$). Moreover, as noted by Cogan (1975) and Hanoch (1976), fixed costs of entry and exit may alter the simple relationship of equations (11b) and (12), and may even result in opposite signs for the effect of certain variables on labor supply and participation. As an example, consider the effect of money costs of child care. Holding everything else the same, the greater the number of preschool children, the greater the cost of child care and hence the less likely it is that a woman works.

[Footnote: More precisely, equation (11b) multiplied by the probability that a woman works yields an estimate of the average hours supplied to the market by women with traits $X_{1i}$, $X_{2i}$.]
However, given that a woman works, greater expenditure on child care results in a reduction of income and hence an expansion in hours worked if leisure is a normal good. Time indivisibilities in the availability of child care and commuting costs tend to reinforce the work-increasing effect of child care costs.

[Footnote: Obviously, time costs decrease leisure consumed but need not decrease hours of work. Writing the leisure demand function in terms of wage rates $W$ and full income $WT + A$, where $A$ is asset income and $T$ is total time available, $L = F(W, WT + A)$, so that $\partial L/\partial T = F_2 W$ and $\partial h/\partial T = 1 - F_2 W$. Since $F_2$ is positive, the sign of $\partial h/\partial T$ is ambiguous. Ceteris paribus, the higher the market wage rate, the more likely it is that time and money costs operate in the same direction.]

It is straightforward to extend the model of equations (11a)-(11c) to allow for these effects. A more general model is the three-equation system

$$Y_{1i} = X_{1i}\beta_1 + U_{1i} \tag{13a}$$

$$Y_{2i} = X_{2i}\beta_2 + U_{2i} \tag{13b}$$

$$h_i = Y_{3i} = X_{2i}\beta_3 + U_{3i} \tag{13c}$$

$$E(U_{ji}) = 0, \qquad E(U_{ji}U_{j'i'}) = \begin{cases}\sigma_{jj'} & \text{if } i = i', \\ 0 & \text{otherwise.}\end{cases}$$

As before, a woman works, and her hours are positive, if and only if $Y_{2i} > 0$. A noteworthy feature of equations (13b) and (13c) is that the same set of variables determines the participation decision and the quantity of hours supplied. Under the null hypothesis that equation (11b) is correct, $\beta_2$ and $\beta_3$ are equal up to a constant of proportionality ($\beta_3 = \beta_2/\gamma$), and the joint distribution of the $U_{ji}$ is a singular trivariate density. Assuming normality for the $U_{ji}$, one can write

$$E(Y_{1i} \mid X_{1i}, X_{2i}, Y_{2i} > 0) = X_{1i}\beta_1 + \frac{\sigma_{12}}{\sigma_{22}^{1/2}}\lambda_i \tag{14a}$$

$$E(h_i \mid X_{2i}, Y_{2i} > 0) = X_{2i}\beta_3 + \frac{\sigma_{23}}{\sigma_{22}^{1/2}}\lambda_i \tag{14b}$$

using the same definition of $\lambda_i$ given before.

B. Main Empirical Results

The data utilized in the empirical analysis are a sample of 1735 women taken from the 1967 National Longitudinal Survey of Work Experience of Women Age 30-44 (the "Parnes" data) who are white, married with spouse present, and whose husbands worked in the previous year (1966).
A woman is classified as working if she worked for pay in 1966 and satisfied the other sample selection criteria. Using this definition, 812 of the women work in 1966. The primary data source is described elsewhere in detail (Shea et al., 1970). A more complete description of the means of the data used here and the sources of sample attrition is provided in Appendix A.

Given current professional ignorance about the appropriate dimension of labor supply, a variety of measures could be analyzed. Instead, a careful examination of the available data suggests that only one reliable measure is available: annual labor supply defined by dividing annual earnings in 1966 by a questionnaire wage asked in early 1967. A superficial inspection of the data source suggests that a direct measure of labor supply is available by taking the product of weeks worked in 1966 and "usual hours worked." However, these data are not usable, since "weeks worked" includes vacation time and sick leave, two important margins of adjustment. Inspection of the histograms of both weeks worked and annual hours (defined as the product of weeks worked and average weekly hours) suggests too much bunching of hours in standard reporting intervals. Appendix C presents the histograms for the measure used as well as for the standard measures. An important point to note in these histograms is that the distribution of annual hours of work, properly measured, shows much less of the bunching away from zero hours of work that is manifest in conventional measures of labor supply.

[Footnote: Note that equation (13c) is cast in terms of hours of work. Technically speaking, hours should be treated as a limited dependent variable. In the empirical analysis presented below, I ignore this complication, since hours of work distributions are concentrated far away from zero, and since it is possible to use log hours rather than hours in a more general formulation.]
It is precisely this artificial bunching that has stimulated recent work that introduces fixed costs into the analysis of labor supply behavior (Rosen, 1974). Accordingly, it is not surprising to find the result reported below, that with a proper measure of labor supply there is much less evidence in favor of models with fixed costs of work.

[Footnote: One disadvantage of the labor supply measure chosen in the text is that some women who worked in 1966 did not supply a questionnaire wage. Only five percent of the sample is lost for this reason.]

The specification of the economic relationships is conventional and requires little comment. Following Mincer (1974), the logarithm of wage rates is assumed to depend on schooling and market experience. Experience is defined as the number of years since leaving school in which a woman has worked six months or longer. Following much previous research, female labor supply is postulated to depend on wage rates of the head and wife, the presence of children, family assets and wife's education.

Recent work by the author (Heckman, 1977) presents evidence that the labor market experience of the wife cannot be treated as an exogenous variable in the participation decision (evidence on this is offered below). This variable records the wife's previous work history and is highly correlated with unmeasured determinants of current labor force participation. The empirical analysis discussed below explicitly deals with the endogeneity issue: considerable evidence is found for endogeneity of experience in labor supply and participation equations, but little evidence for endogeneity in wage functions. Estimates of labor supply functions that purge "experience" of its endogenous component produce more plausible labor supply estimates.

The structure of the discussion of the empirical results is as follows. First, estimates of equation (13b) are discussed. Then estimates of the labor supply and wage functions are presented.
Finally, some tests of the simple model of equations (11a) and (11b) are performed in a separate section.

Table 1 records the estimates of the $\sigma_{22}^{1/2}$-normalized coefficients of equation (13b), which generates the probability that a woman works. The first column presents estimates based on the assumption that experience is exogenous. The second column presents estimates based on predicted experience. The instrumental variables used to predict experience are reported below the table.

TABLE 1
PROBIT ESTIMATES OF THE PARAMETERS DETERMINING THE PROBABILITY THAT A WOMAN WORKS (EQUATION (13b))
Coefficients $\beta_2/\sigma_{22}^{1/2}$ (asymptotic normal statistics in parentheses)

| Variable | (1) Estimates that treat "experience" as exogenous | (2) Estimates that treat "experience" as endogenous |
|---|---|---|
| Intercept | -.817 (4.7) | -.412 (1.56) |
| Nbr. of children less than 6 | -.504 (10) | -.493 (9.11) |
| Assets | $.436\times10^{-7}$ (.25) | $.619\times10^{-6}$ (.29) |
| Husband's hourly wage rate ($/hr.) | -.177 (8.0) | -.167 (7.81) |
| Wife's labor market experience (yrs.) | .098 (15.0) | .046 (1.81) |
| Wife's education | .080 (15.3) | .074 (5.3) |
| Log likelihood | -920.9 | -1073.1 |
| Observations | 1735 | 1735 |

The probability that woman $i$ works is

$$\Pr(Y_{2i} > 0) = \int_{-\infty}^{X_{2i}\beta_2/\sigma_{22}^{1/2}} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt.$$

The instrumental variables used to predict experience are linear and squared terms for children less than six, 1967 assets, husband's age, husband's education, husband's hourly wage, wife's education, and interactions of all linear terms.

As expected from a reading of the literature, the presence of small children and a higher husband's wage rate lower the probability that a randomly selected woman works. Women with greater education are more likely to work. For both sets of estimates, greater work experience raises the probability of participation, although both the size of the effect and its statistical significance are diminished when predicted experience is used in place of the actual variable in the estimation of the probit coefficients.
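To see how the probit estimates are used, the sketch below evaluates the participation probability for one woman from Table 1, column 2 and, following (14b), her expected hours given that she works, using the column 4 estimates of Table 3 (including the coefficient 2401 on $\lambda$). The covariate values are hypothetical, the experience value stands in for predicted experience, and pairing these two columns is our illustrative choice, not a calculation reported in the text.

```python
import math


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


# Table 1, column 2: beta2 / sigma22^(1/2), experience treated as endogenous.
PROBIT = {"const": -0.412, "kids_lt6": -0.493, "assets": 0.619e-6,
          "husband_wage": -0.167, "experience": 0.046, "education": 0.074}
# Table 3, column 4: beta3, plus the coefficient on lambda (sigma23 / sigma22^(1/2)).
HOURS = {"const": -1755.0, "kids_lt6": -925.1, "assets": 3.1e-3,
         "husband_wage": -275.4, "experience": 128.9, "education": 119.5}
LAMBDA_COEF = 2401.0


def linear_index(coefs, x):
    return sum(coefs[name] * x[name] for name in coefs)


def participation_and_hours(x):
    c = linear_index(PROBIT, x)                          # X2 beta2 / sigma22^(1/2) = -Z
    p_work = norm_cdf(c)                                 # Pr(Y2 > 0)
    lam = norm_pdf(-c) / norm_cdf(c)                     # lambda(Z) with Z = -c
    hours = linear_index(HOURS, x) + LAMBDA_COEF * lam   # equation (14b)
    return p_work, lam, hours


# Hypothetical woman: one child under 6, $10,000 assets, husband earns $3/hr,
# 8 years of (predicted) experience, 12 years of schooling.
woman = {"const": 1.0, "kids_lt6": 1.0, "assets": 10_000.0,
         "husband_wage": 3.0, "experience": 8.0, "education": 12.0}
p, lam, h = participation_and_hours(woman)
print(f"Pr(work) = {p:.3f}, lambda = {lam:.3f}, E(hours | works) = {h:.0f}")
```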
A straightforward application of the Wu (1973) test rejects the null hypothesis that "experience" is uncorrelated with the error term in (13b).

[Footnote: The Wu test as used here consists of entering both "experience" and the residual of "experience" from predicted "experience" in the probit function. If the coefficient on the residual is significantly different from zero, one rejects the null hypothesis of uncorrelatedness of experience with the error term. The test statistic for this model gave a "t" of 2.1.]

Following the methodology outlined in Section II, the probit coefficient estimates may be used to consistently estimate $\beta_2/\sigma_{22}^{1/2}$, $Z_i$, and hence $\lambda(Z_i)$. Hourly wage regressions with and without these estimated regressors are presented in Table 2, which also presents some evidence on the endogeneity of experience in the wage function. Column 1 presents the estimates of the traditional wage function. The estimates of the traditional equation corrected for censoring, but assuming experience to be exogenous, are presented in column 2. There is some indication of sample censoring, but it is not overwhelming: the test statistic on $\lambda$ in column 2 is only marginal, and the wage coefficient estimates are essentially unchanged from column 1. Columns 3 and 4 record the results of an analysis that predicts experience and tests whether or not regression specifications based on predicted experience differ significantly from regression specifications with actual experience. Inspection of the Wu statistic on the bottom line of columns 3 and 4 suggests that the endogeneity of experience is not an important issue in the estimation of the coefficients of the wage function. In my judgment, the best wage function is the traditional one recorded in column 1, but there is little to choose between the estimates presented in column 1 and those presented in column 2.

TABLE 2
RESULTS FOR HOURLY WAGE RATES (EQUATION (13a))
(asymptotic normal statistics in parentheses)

| Variables | (1) Traditional regression | (2) Traditional regression corrected for censoring | (3) Traditional regression corrected for endogeneity* | (4) Traditional regression corrected for endogeneity and censoring* |
|---|---|---|---|---|
| Wife's labor market experience (yrs.) | .0167 (7.85) | .0207 (5.99) | .003 (.33) | .0151 (1.2) |
| Wife's education (yrs.) | .0763 (13.6) | .0779 (13.7) | .074 (12.7) | .076 (12.8) |
| $\lambda$ | ... | .0878 (1.48) | ... | .1002 (1.4) |
| Intercept | -.401 | -.515 | -.226 | -.419 |
| $R^2$ | .230 | .235 | .168 | .1709 |
| Wu statistic** | ... | ... | .34 (t) | 1.17 (F) |

*The instrumental variables for experience are listed in Table 1.
**The Wu statistic is the "t" score on the residual of predicted experience from actual experience in column 3, and is the "F" score on this residual and the residual between predicted $\lambda$ and actual $\lambda$ (based on measured experience) in column 4. Predicted $\lambda$ is obtained from a regression of $\lambda$ based on measured experience on polynomials in the instrumental variables.

The story with respect to the estimates of the labor supply functions is different. There is strong evidence for both sample censoring and endogeneity of experience. The estimates of the traditional regression specification are displayed in column 1 of Table 3. These estimates are in agreement with those in previous studies and require little comment. The regression estimates recorded in column 2 are unreasonable. There is little evidence of sample censoring, but the coefficients of the equation are not fit with much precision. Column 3 differs from column 1 in that experience is treated as an endogenous variable.
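A linear-regression version of the Wu (1973) procedure described in the footnote above can be sketched on simulated data: regress the suspect regressor on the instruments, add the first-stage residual to the equation of interest, and examine the t-ratio on that residual. Everything here (data-generating process, parameter values, function names) is hypothetical, and a small pure-Python least squares routine stands in for a statistics package. Conventional second-stage standard errors suffice for the test because they are valid under the null of exogeneity.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients, conventional standard errors, residuals."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)
    # Diagonal of (X'X)^{-1}, one unit vector at a time.
    se = [math.sqrt(s2 * solve(xtx, [float(i == j) for i in range(k)])[j])
          for j in range(k)]
    return beta, se, resid


def wu_test(n, rho, seed=0):
    """Simulate y = 1 + 0.5 x + e with x = z + v and e = rho * v + noise."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, vi = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ei = rho * vi + rng.gauss(0.0, 1.0)   # rho != 0 makes x endogenous
        z.append(zi)
        x.append(zi + vi)
        y.append(1.0 + 0.5 * (zi + vi) + ei)
    # First stage: x on (1, z); keep the residual.
    _, _, v_hat = ols([[1.0, zi] for zi in z], x)
    # Second stage: y on (1, x, v_hat); the t-ratio on v_hat is the test statistic.
    beta, se, _ = ols([[1.0, xi, vi] for xi, vi in zip(x, v_hat)], y)
    return beta, beta[2] / se[2]


beta, t_endog = wu_test(5000, rho=0.8, seed=1)
_, t_exog = wu_test(5000, rho=0.0, seed=2)
print(f"slope on x = {beta[1]:.3f}, t(endogenous) = {t_endog:.1f}, t(exogenous) = {t_exog:.1f}")
```

With the residual included, the coefficient on the endogenous regressor is also purged of its correlation with the error, which is the sense in which the column 3 and 4 specifications "correct for endogeneity."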
The result of the Wu test applied to the column 3 equation strongly rejects the null hypothesis that experience is an exogenous or predetermined variable. Column 4 displays the estimates of the labor supply function accounting for sample censoring and endogeneity of experience. The null hypothesis that experience is predetermined and the null hypothesis of no censoring are both rejected. Accordingly, the estimates in column 4 are offered as the best in this table.

A comparison of columns 3 and 4 reveals important differences. Except for insignificant coefficients, all of the slope coefficients in the labor supply equation presented in column 4 are larger in absolute value than the coefficients in column 3. The elasticity of labor supply in the column 3 estimates is high but not far outside the range of estimates presented by Schultz (1975, p. 31). The elasticity of 4.5 derived from the specification in column 4 seems unduly large and requires some comment.

TABLE 3
ANNUAL HOURS WORKED DEFINED BY DIVIDING EARNINGS BY WAGE RATES (EQUATION (13c))
(asymptotic normal statistics in parentheses)

| Variables | (1) Traditional regression | (2) Corrected for censoring | (3) Corrected for endogeneity | (4) Corrected for endogeneity and censoring | (5)** Combined effect from column 4 |
|---|---|---|---|---|---|
| Nbr. of children less than 6 | -155.8 (3.73) | -94.1 (.93) | -141.04 (2.61) | -925.1 (2.75) | -207.2 |
| Assets ($) | $3.1\times10^{-3}$ (.18) | $3.2\times10^{-3}$ (.191) | $3.2\times10^{-3}$ (.17) | $3.1\times10^{-3}$ (.17) | $+3.1\times10^{-3}$ |
| Husband's hourly wage rate ($/hr.) | -33.1 (1.9) | -12.4 (.35) | -16.9 (.9) | -275.4 (2.5) | -32.9 |
| Wife's experience (yrs.) | 48.6 (12.1) | 39.1 (2.66) | 57.2 (2.55) | 128.9 (3.4) | 60.0 |
| Wife's education (yrs.) | 21.1 (1.89) | 11.7 (.66) | 10.9 (.89) | 119.5 (2.5) | 11.70 |
| $\lambda$ | ... | -201.8 (.67) | ... | 2401 (2.36) | ... |
| Intercept | 664.2 | 912.1 | 765.1 | -1755 | ... |
| $R^2$ | .17 | .176 | .115 | .121 | ... |
| Wu statistic* | ... | ... | 3.1 (t) | 9.8 (F) | ... |
| Implied labor supply elasticity with respect to hourly wage rates*** | 1.68 | 1.35 | 1.99 | 4.47 | 2.08 |

*The Wu statistic is explained in Table 2.
**These estimates are obtained from the coefficients in column 4 added to the effect of the variable on the conditional mean of the disturbance, $\partial\lambda/\partial X_{2i}$, multiplied by the coefficient on $\lambda$ (2401).
***These estimates are obtained by dividing the experience coefficient in the wage function into the experience coefficient in the labor supply function and dividing the result by average labor supply.

The important point to note is that traditional estimates of the coefficients of labor supply functions of working women confound two effects: movement along a given labor supply function for working women and movement across taste distributions. Thus, for example, the presence of an additional child under six has a dramatically negative effect on hours of work for a working woman (a reduction in supply of 925 hours). But working women with an additional child have a greater average taste for market work, since only the most work-prone women remain at work after the imposition of a child. The two separate effects for each variable can be combined and evaluated at the sample mean. The result of such combination is displayed in column 5. By and large there is close agreement between the coefficients of column 3, which are estimates of the combined effect, and the coefficients in column 5. In particular, the estimates of the wage elasticities are in very close agreement.

The conclusion to be drawn from the labor supply analysis is that traditional methods of estimating labor supply functions give a downward biased (in absolute value) estimate of the true effect of economic variables. The estimates presented here reveal a strong behavioral response to wage change which is not discordant with previous estimates, but which casts a new light on their interpretation.
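The arithmetic behind the Table 3 footnotes can be checked from the published numbers. The sketch below is our own consistency check, not a calculation reported in the text. It inverts footnote *** to recover the average annual hours implied by each reported elasticity, assuming every column uses the Table 2, column 1 experience coefficient (.0167); the implied averages all fall near 1,730 hours. It then inverts footnote ** to back out the factor $\lambda(\lambda - Z)$ implicitly used to form column 5, using $\partial\lambda/\partial X_{2i} = -\lambda(\lambda - Z)\,\beta_2/\sigma_{22}^{1/2}$; the values cluster near 0.61 and, as they must under normality, lie between 0 and 1.

```python
# Table 2, column 1: experience coefficient in the log wage equation.
WAGE_EXPERIENCE = 0.0167

# Table 3: experience coefficient and reported wage elasticity, by column.
HOURS_EXPERIENCE = {1: 48.6, 2: 39.1, 3: 57.2, 4: 128.9, 5: 60.0}
ELASTICITY = {1: 1.68, 2: 1.35, 3: 1.99, 4: 4.47, 5: 2.08}


def implied_mean_hours(col: int) -> float:
    # Footnote ***: elasticity = (hours coef / wage coef) / average hours.
    return HOURS_EXPERIENCE[col] / WAGE_EXPERIENCE / ELASTICITY[col]


# Table 1 column 2 probit coefficients; Table 3 columns 4 and 5; coefficient on lambda.
PROBIT = {"kids_lt6": -0.493, "husband_wage": -0.167, "experience": 0.046, "education": 0.074}
COL4 = {"kids_lt6": -925.1, "husband_wage": -275.4, "experience": 128.9, "education": 119.5}
COL5 = {"kids_lt6": -207.2, "husband_wage": -32.9, "experience": 60.0, "education": 11.70}
LAMBDA_COEF = 2401.0


def implied_mills_slope(var: str) -> float:
    # Footnote **: col5 = col4 + LAMBDA_COEF * dlambda/dX, where
    # dlambda/dX = -lambda * (lambda - Z) * (probit coefficient). Solve for lambda*(lambda - Z).
    return (COL4[var] - COL5[var]) / (LAMBDA_COEF * PROBIT[var])


for col in sorted(ELASTICITY):
    print(col, round(implied_mean_hours(col)))
for var in PROBIT:
    print(var, round(implied_mills_slope(var), 3))
```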
The estimated labor supply elasticity reported in column 3 (1.99) is quite similar to an estimate of 2.3 reported by Harvey Rosen, who uses the same data set (see Rosen, 1976). Rosen's estimated elasticity combines the effect of a wage change on the labor supply of already working women with the effect of a wage change in altering the composition of the sample of working women. Thus his estimate is comparable with the estimate in column 3, and is an understatement of the effect of a wage change on the labor supply of women already working.

A final feature of the estimates presented in column 4 is worth noting. The coefficient on $\lambda$ is large and positive. This suggests that unmeasured factors that raise the probability of participation also tend to increase the volume of labor supplied to the market. The sign of the correlation is in accord with that predicted by the simple model of equations (11a) and (11b).

C. Tests of the Simple Model of Equations (11a)-(11b) and a Revised Model

In this section, some informal tests of the simple model of equations (11a) and (11b) are conducted. Most, but not all, of the restrictions predicted by the model are in accord with the data. An expanded version of the simple model is offered that allows for the effect of variation in the availability of informal day care arrangements that is documented elsewhere (Heckman, 1974), as well as variation in the fixed costs of work examined by Cogan (1975) and Hanoch (1975). The structure of this section is as follows. First, informal tests are discussed. Then, a revised model is offered.

As previously noted in the discussion of equations (13a)-(13c), one implication of the simple model of equations (11a) and (11b) is proportionality between the estimates of $\beta_2/\sigma_{22}^{1/2}$ from the probit function and the parameters of the labor supply function $\beta_3$, i.e.,

$$\beta_3 = \frac{\sigma_{22}^{1/2}}{\gamma}\cdot\frac{\beta_2}{\sigma_{22}^{1/2}}.$$

The constant of proportionality is predicted to be positive.
The ratios of the probit coefficient estimates (taken from column 2 of Table 1) to the hours of work coefficient estimates (presented in column 4 of Table 3) are displayed in Table 4. Given the sampling error in estimating these coefficients, the ratios are remarkably close to each other. The agreement is closer yet if one examines only the ratios of coefficients that are statistically significant in both equations. These ratios are denoted by an asterisk.

A second test of the simple model is available. From equations (7b) and (11b), one can write the hours of work function as

$$h_i = \frac{\sigma_{22}^{1/2}}{\gamma}(-Z_i + \lambda_i) + \frac{1}{\gamma}V_{2i}.$$

The variables in parentheses can be estimated from the probit coefficients. From equation (8c), the variance of the residual in the second term is given by

$$\operatorname{Var}\!\left(\frac{V_{2i}}{\gamma}\right) = \frac{\sigma_{22}}{\gamma^2}\left(1 + \lambda_i Z_i - \lambda_i^2\right).$$

Thus another test of the simple model can be conducted. Run the weighted interceptless regression

$$h_i w_i = k(-Z_i + \lambda_i)\,w_i + \epsilon_i, \tag{15}$$

where $w_i = (1 + \lambda_i Z_i - \lambda_i^2)^{-1/2}$.

TABLE 4
RATIO OF PROBIT COEFFICIENTS ($\beta_2/\sigma_{22}^{1/2}$) TO LABOR SUPPLY FUNCTION COEFFICIENTS ($\beta_3$)

| Variable | Ratio |
|---|---|
| Intercept | $.23\times10^{-3}$ |
| Nbr. of children less than 6 | $.53\times10^{-3}$* |
| Assets | $.19\times10^{-3}$ |
| Husband's hourly wage rate | $.61\times10^{-3}$* |
| Wife's labor market experience | $.36\times10^{-3}$ |
| Wife's education | $.62\times10^{-3}$* |

*Denotes a ratio of coefficients that are statistically significant at conventional levels in both relationships.

TABLE 5
ESTIMATES OF EQUATION (15)

| $k$ (from regression coefficient) | Standard error of estimate (from regression residuals) |
|---|---|
| 1424 (t stat. is 46) | 1221 |

TABLE 6
LABOR SUPPLY COMPARISONS WITH THE "TOBIT" MODEL

| Variable | "Tobit" estimates | Estimates from Table 3, column 4 |
|---|---|---|
| Nbr. of children less than 6 | -658.5 (9.5) | -925.1 (2.78) |
| Assets ($) | $.24\times10^{-2}$ (.9) | $.3\times10^{-2}$ (.2) |
| Husband's hourly wage rate ($/hr.) | -201.4 (7.6) | -275.4 (2.5) |
| Wife's experience* (yrs.) | 87.02 (5.1) | 128.9 (3.4) |
| Wife's education (yrs.) | 88.0 (5.1) | 119.5 (2.5) |
| Intercept | -669 (2.0) | -1755 |
| Estimated standard error of regression ($\sigma_{22}^{1/2}/\gamma$) | 1409 | ... |
| Ln likelihood | -7595.53 | ... |

*Predicted experience as defined in Table 1.
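Regression (15) can be sketched on data simulated from the simple model itself, assuming the true probit index is known (in the paper it is estimated from the probit). If (11b) holds, both the interceptless slope $k$ and the residual standard error of the weighted regression estimate $\sigma_{22}^{1/2}/\gamma$, which is the comparison reported in Table 5. The distribution of the index and the values $\sigma_{22}^{1/2} = 1.5$, $\gamma = 0.001$ (so $\sigma_{22}^{1/2}/\gamma = 1500$) are hypothetical.

```python
import math
import random


def inverse_mills(z: float) -> float:
    """lambda(Z) = phi(Z) / (1 - Phi(Z)), computed with erfc for the upper tail."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf / (0.5 * math.erfc(z / math.sqrt(2.0)))


def fit_equation_15(n: int, sigma: float, gamma: float, seed: int = 0):
    """Simulate the simple model and run the weighted, interceptless regression (15)."""
    rng = random.Random(seed)
    pairs = []  # (h_i * w_i, (-Z_i + lambda_i) * w_i) for women who work
    for _ in range(n):
        c = rng.gauss(0.2, 1.0)          # X_2i beta_2 / sigma_22^(1/2): hypothetical
        u = rng.gauss(0.0, sigma)        # U_2i
        y2 = sigma * c + u               # Y_2i
        if y2 <= 0.0:                    # she does not work; hours unobserved
            continue
        z = -c
        lam = inverse_mills(z)
        w = (1.0 + lam * z - lam * lam) ** -0.5
        pairs.append(((y2 / gamma) * w, (-z + lam) * w))
    k = sum(a * b for a, b in pairs) / sum(b * b for _, b in pairs)
    resid_var = sum((a - k * b) ** 2 for a, b in pairs) / (len(pairs) - 1)
    return k, math.sqrt(resid_var), len(pairs)


k, resid_sd, n_work = fit_equation_15(20_000, sigma=1.5, gamma=0.001, seed=3)
print(f"k = {k:.0f}, residual s.e. = {resid_sd:.0f}, workers = {n_work}")
```

The weights $w_i$ undo the heteroskedasticity of the truncated disturbance, which is why the residual standard error, and not only the slope, should match $\sigma_{22}^{1/2}/\gamma$ when the simple model is correct.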
The regression coefficient is a consistent estimator of $\sigma_{22}^{1/2}/\gamma$, which is also the square root of the residual variance in the estimated equation. A test of the simple model is to compare the square root of the estimated residual variance with the regression coefficient. This comparison is made in Table 5. The agreement between the two estimates is remarkably close. Moreover, as previously discussed in the first paragraph of this section, the constant of proportionality estimated in Table 4 is the inverse of $\sigma_{22}^{1/2}/\gamma$. Using $.6\times10^{-3}$ as an estimate of the average ratio, another estimate of $\sigma_{22}^{1/2}/\gamma$ is 1,666, again a number close to the regression coefficient estimate.

Other informal tests of the simple model are possible. If equation (11b) is the labor supply function, the model of Tobin discussed in Section I ("Tobit") is an appropriate description of the labor supply function. Estimated Tobit coefficients are displayed in Table 6. "Tobit" underestimates (in absolute value) the coefficients of the labor supply function. Except for the intercept term, each "Tobit" coefficient is about seven-tenths of the corresponding coefficient of the unrestricted labor supply estimates, reproduced in the second column of Table 6. Note, however, that the estimated standard error of the Tobit regression (1409) is remarkably close to the previous estimate (1424) obtained from equation (15).

[Footnote: This understatement of Tobit coefficients suggests that the Schultz (1975) estimates of female labor supply elasticities, based on Tobit, are downward biased.]

At this point it may be helpful to take stock of what has been learned. The simple model is almost right. The "Tobit" estimates of the slope coefficients are smaller (in absolute value) than the initial consistent estimates, but by a constant factor of proportionality (.7). The only discordance in this pattern comes in the estimates of the intercept terms.
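The ratios in Tables 4 and 6 follow directly from the published coefficients, and the sketch below recomputes them (the variable names are ours). Under the simple model each Table 4 ratio estimates $\gamma/\sigma_{22}^{1/2}$, so its reciprocal estimates $\sigma_{22}^{1/2}/\gamma$: $1/(.6\times10^{-3}) \approx 1{,}667$, which the text truncates to 1,666. The insignificant assets coefficient is left out of the Tobit comparison.

```python
PROBIT_COL2 = {"intercept": -0.412, "kids_lt6": -0.493, "assets": 0.619e-6,
               "husband_wage": -0.167, "experience": 0.046, "education": 0.074}
HOURS_COL4 = {"intercept": -1755.0, "kids_lt6": -925.1, "assets": 3.1e-3,
              "husband_wage": -275.4, "experience": 128.9, "education": 119.5}
TOBIT = {"intercept": -669.0, "kids_lt6": -658.5, "husband_wage": -201.4,
         "experience": 87.02, "education": 88.0}

# Table 4: probit coefficient / hours coefficient, expressed in units of 1e-3.
table4 = {k: 1e3 * PROBIT_COL2[k] / HOURS_COL4[k] for k in PROBIT_COL2}

# Reciprocal of the average ratio used in the text (.6e-3).
sigma_over_gamma = 1.0 / 0.6e-3

# Table 6: Tobit coefficient / unrestricted (Table 3, column 4) coefficient.
tobit_ratio = {k: TOBIT[k] / HOURS_COL4[k] for k in TOBIT}

for k, v in table4.items():
    print(f"{k:13s} {v:.2f}e-3")
print(f"sigma22^(1/2)/gamma from Table 4: {sigma_over_gamma:.1f}")
for k, v in tobit_ratio.items():
    print(f"{k:13s} Tobit/unrestricted = {v:.2f}")
```

The slope ratios sit between about .68 and .74, while the intercept ratio is about .38, which is the intercept discordance discussed next.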
Relative to the unrestricted labor supply intercept, the Tobit intercept is larger than the ratio of slope coefficients would suggest is appropriate. The model of equations (11a)-(11c) and (12) may be modified slightly to rationalize this pattern.

First review the economics of the simple model. To focus ideas, suppose the wage function (corresponding to equation (11a)) is

$$W = \alpha_0 + \alpha_1 E + U \tag{16a}$$

while the reservation wage function (i.e., the value of time) is

$$W^* = \gamma_0 + \gamma_1 h + \gamma_2 A + \varepsilon \tag{16b}$$

where $\varepsilon$ and $U$ are disturbances, $E$ is (exogenous) experience, $A$ is asset income, and $h$ is time not spent at home. The simple theory assumes that $\alpha_1 > 0$, $\gamma_1 > 0$ and $\gamma_2 > 0$. A woman works if $W > W^*$ at zero hours of work, i.e.,

$$\alpha_0 + \alpha_1 E + U > \gamma_0 + \gamma_2 A + \varepsilon. \tag{17}$$

Her labor supply function is obtained by equating (16a) and (16b) when inequality (17) is met, i.e.,

$$h = \frac{1}{\gamma_1}(\alpha_0 + \alpha_1 E - \gamma_0 - \gamma_2 A) + \frac{1}{\gamma_1}(U - \varepsilon). \tag{18}$$

Suppose that there are work-related costs such as day care and other household expenses. Recent evidence (Heckman, 1974b) suggests that some women have access to limited quantities of low-cost day care and other household services from friends and neighbors. An analytically simple way to characterize such limited availability of low-cost substitutes is to view it as an augmentation of the woman's time budget that expands available time by less than one hour for each hour worked, up to some given number of working hours. A consequence of the limited availability of low-cost substitutes is a discontinuity in the labor supply function at the given number of hours.

Figure 2 illustrates this case. The solid line AA' is the labor supply curve for a woman of given characteristics who uses market substitutes for her time. Market substitutes for the wife's household input are assumed to be available at fixed marginal prices.

[Figure 2: labor supply curves AA' and BEFG and the fitted line B'G', plotted against the wage.]
The reservation wage for such women is given by A. The line BEFG illustrates a labor supply curve for a woman who has access to informal sources that (imperfectly) replace her time at home up to h* hours. Note that this woman has a lower reservation wage, B, than the other woman, but that beyond h* hours it takes a greater wage rate to induce her to work more hours. This is so because of the wealth effect that arises from her access to low-cost sources, and from the assumption that leisure is a normal good. The population of all women contains a mixture of women with the two types of labor supply functions. In a general model it is plausible that both slopes and intercepts of labor supply functions are affected by the limited availability of low-cost substitutes for the woman's time.

[Footnote: As recently discussed by Brumm (1976), no unique reservation wage is defined if there are such work-related costs. Nonetheless, the extension of the labor supply curve to A defines a wage that plays the same role as the reservation wage in a model with no work-related costs.]

Before more elaborate models are explored, it is useful to examine more fully the implications of the simple model depicted in Figure 2. Given a distribution of the two types of labor supply functions in the population, and given that some women with a "broken" labor supply function have hours of work in excess of h*, the average reservation wage in the population is less than the average of the intercepts of the labor supply functions. In a model that ignores interpersonal variation in the cost of household substitutes, the two measures coincide, so that the average of the intercepts is the average of the reservation wages. An important consequence of the inequality of reservation wages and intercepts is that any empirical procedure that constrains the intercept of the labor supply equation to be the reservation wage understates the effect of wages (and other variables) on labor supply. This insight is important because the model of equations (11a)-(11b) and the Tobit model both impose this constraint on the data.

To establish this result intuitively, note that, using equations (16a) and (16b), equation (18) can be written as

$$h = \frac{1}{\gamma_1}\left[W - (\gamma_0 + \gamma_2 A + \varepsilon)\right],$$

so that the wage that just induces a woman to work a positive number of hours is the reservation wage $W^*$. In terms of the notation of Figure 2, A = $W^*$. If one constrains the intercept of the labor supply curve to be the reservation wage when the "broken line" function BEFG describes some or all of the data, one underestimates the response of hours to wage rates as well as the intercept in the labor supply equation (see the dashed line B'G' in Figure 2). One can prove that "Tobit" and the model of equation (11b) impose this constraint on the fitted function.

By way of contrast with these results, it is helpful to consider a model with fixed costs of work. For simplicity, consider money costs of work. As both Cogan (1975) and Hanoch (1975) have shown, the effect of fixed costs of work on labor supply is that women who work at all must work a minimum number of hours, say $\underline{h}$, to recoup the fixed costs. The reservation wage is raised over a case without fixed costs. This model is depicted in Figure 3. Here the standard labor supply function for a woman of given characteristics is indicated by a solid line, while the modified labor supply function is indicated by a dashed line. As before, "A" denotes the reservation wage in the standard case, while "C" denotes the reservation wage in the case of fixed costs of work.

[Figure 3: hours h plotted against the wage, with reservation wages A (standard case) and C (fixed costs of work).]

In the presence of fixed costs, the reservation wage is greater and the level of the supply function is higher, reflecting the assumption that leisure is a normal good and that fixed costs subtract from income. Unlike the situation in the previous case, the average of the reservation wages exceeds the average of the intercepts.
Thus any model that constrains the intercept of the labor supply function to equal the average of the reservation wages overstates the effect of wages (and other variables) on labor supply.

[Footnote: This argument is made by both Cogan (1975) and Hanoch (1975). These effects would also arise if employers offered "tied" packages of wages and hours and if each individual had his own "best" minimum hours offer in the market. In this case, the labor supply function for the tied case would coincide over the relevant range with the standard labor supply function, but the average of the intercepts would understate the reservation wage.]

The empirical results reported in Tables 4 and 6 favor the model of differential access to low-cost substitutes for time in the home over a model with fixed costs in a dominant role. "Tobit" underestimates the response of labor supply to a change in economic variables, and the Tobit intercept is higher than the intercept of the unrestricted labor supply function, an implication of a model in which the true reservation wage is less than the intercept of the labor supply equation. Finally, note that the only modification required to make equations (11a)-(11b) consistent with the data is relaxation of the interequation proportionality of intercepts. The corresponding interequation slope coefficients are related by a common factor of proportionality.

[Footnote: Another implication of the modified model is that the correlation between the disturbances of the labor supply function and the participation equation need not be unity, because there is a source of variation in the labor supply function that does not affect the participation equation (the maximum availability of close substitutes, h*).]

[Footnote: The choice of the dependent variable crucially affects the outcome of such tests. In results not reported here, use of the conventional measure of labor supply, defined as the product of "usual hours per week" and "usual weeks," leads to precisely opposite implications, i.e., one would accept a model of fixed costs. But as previously noted, the standard measures induce the illusion of fixed costs via reporting error that overstates the extent of labor supply and the frequency of occurrence of standard reporting intervals, so that empirical analyses based on this measure of labor supply yield misleading conclusions.]

Summary and Conclusions

In this paper, the bias that results from using nonrandomly selected data is discussed within the specification error framework of Griliches and Theil. A computationally tractable technique is discussed that enables economists to utilize simple regression techniques to estimate behavioral functions free of selection bias. Asymptotic properties of the estimator are developed. A model of female labor supply and wage rates is estimated with this technique. The empirical results suggest that selection bias is an important problem in estimating labor supply functions but is less important in estimating wage functions. Very high estimates of the elasticity of female labor supply are derived, but these are shown to be consistent with conventional estimates that ignore selection bias. The labor force experience of the wife is shown to be an endogenous variable in labor supply equations but not in wage functions. Some informal tests of the model of Heckman (1974a) are presented. Many implications of the model appear to be in accord with the data, but an expanded model that introduces the notion of limited household access to low-cost substitutes for the wife's time appears to fit the data better. With a proper measure of labor supply, the implications of a model with fixed costs of work in a dominant role are rejected by the data.
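APPENDIX A: The Asymptotic Distribution of Estimators Based on an Estimated $\lambda_i$

For notational convenience, rewrite equation (7a) in the text as

$$Y_{1i} = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\,\lambda_i + V_{1i}, \qquad i = 1, \ldots, T_1, \quad T_1 < T. \qquad \text{(A1)}$$

When an estimated value $\hat\lambda_i$ is used in place of the true $\lambda_i$, equation (7a) is modified to read

$$Y_{1i} = X_{1i}\beta_1 + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\,\hat\lambda_i + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\,(\lambda_i - \hat\lambda_i) + V_{1i}. \qquad \text{(A2)}$$

Estimates of $\beta_2/(\sigma_{22})^{1/2}$ are taken from probit functions fit on the full sample of $T$ observations. Thus, we know

$$\sqrt{T}\left[\widehat{\frac{\beta_2}{(\sigma_{22})^{1/2}}} - \frac{\beta_2}{(\sigma_{22})^{1/2}}\right] \xrightarrow{d} N(0, \Sigma).$$

Since $\hat\lambda_i = \lambda(\hat Z_i)$ with $\hat Z_i = -X_{2i}\,\widehat{\beta_2/(\sigma_{22})^{1/2}}$, and since $\lambda$ is a twice continuously differentiable function of $Z_i$,

$$\operatorname{plim}\hat\lambda_i = \operatorname{plim}\lambda(\hat Z_i) = \lambda\big(\operatorname{plim}\hat Z_i\big) = \lambda(Z_i) = \lambda_i .$$

A first-order expansion around the true value of $\beta_2/(\sigma_{22})^{1/2}$ gives

$$\sqrt{T}\,(\hat\lambda_i - \lambda_i) = -\frac{\partial\lambda_i}{\partial Z_i}\,X_{2i}\,\sqrt{T}\left[\widehat{\frac{\beta_2}{(\sigma_{22})^{1/2}}} - \frac{\beta_2}{(\sigma_{22})^{1/2}}\right] + o_p(1), \qquad \frac{\partial\lambda_i}{\partial Z_i} = \lambda_i(\lambda_i - Z_i).$$

Let $w_i = (X_{1i}, \lambda_i)$ and let

$$\eta_{ii'} = \frac{1}{T}\,\frac{\partial\lambda_i}{\partial Z_i}\,\frac{\partial\lambda_{i'}}{\partial Z_{i'}}\,X_{2i}\,\Sigma\,X_{2i'}'$$

be the leading term of the covariance of $\hat\lambda_i$ and $\hat\lambda_{i'}$. Then it is the case that

$$\frac{1}{\sqrt{T_1}}\sum_{i=1}^{T_1} w_i'\,\frac{\sigma_{12}}{(\sigma_{22})^{1/2}}\,(\lambda_i - \hat\lambda_i) \xrightarrow{d} N\!\left(0,\ \frac{\sigma_{12}^2}{\sigma_{22}}\,\operatorname{plim}\frac{1}{T_1}\sum_{i=1}^{T_1}\sum_{i'=1}^{T_1}\eta_{ii'}\,w_i'w_{i'}\right), \qquad \text{(A3)}$$

where the variance does not converge to zero. This term and $T_1^{-1/2}\sum_{i} w_i'V_{1i}$, the two components of the disturbance in (A2), are asymptotically independent. (The proof is trivial and hence is omitted.)

Now we derive the distribution of the least squares estimators based on the estimated $\hat\lambda_i$. Define

$$B = \operatorname{plim}\left[\frac{1}{T_1}\sum_{i=1}^{T_1}\begin{pmatrix} X_{1i}'X_{1i} & X_{1i}'\lambda(Z_i) \\ \lambda(Z_i)X_{1i} & \lambda(Z_i)^2 \end{pmatrix}\right]^{-1}.$$

Under the previous assumptions, this matrix exists and is of full rank $(K_1 + 1)$. Then

$$\sqrt{T_1}\begin{pmatrix}\hat\beta_1 - \beta_1 \\ \widehat{\sigma_{12}/(\sigma_{22})^{1/2}} - \sigma_{12}/(\sigma_{22})^{1/2}\end{pmatrix} = B\,\frac{1}{\sqrt{T_1}}\sum_{i=1}^{T_1} w_i'\left[V_{1i} + \frac{\sigma_{12}}{(\sigma_{22})^{1/2}}(\lambda_i - \hat\lambda_i)\right] + o_p(1),$$

so

$$\sqrt{T_1}\begin{pmatrix}\hat\beta_1 - \beta_1 \\ \widehat{\sigma_{12}/(\sigma_{22})^{1/2}} - \sigma_{12}/(\sigma_{22})^{1/2}\end{pmatrix} \xrightarrow{d} N(0,\ B\Phi B'), \qquad \text{(A4)}$$

where

$$\Phi = \operatorname{plim}\frac{1}{T_1}\left[\sum_{i=1}^{T_1}\Omega_{ii}\,w_i'w_i + \frac{\sigma_{12}^2}{\sigma_{22}}\sum_{i=1}^{T_1}\sum_{i'=1}^{T_1}\eta_{ii'}\,w_i'w_{i'}\right]$$

and $\Omega_{ii}$, the variance of $V_{1i}$, has been previously defined.

Under the null hypothesis of no sample selection bias, $\sigma_{12}^2/\sigma_{22} = 0$, so $\Omega_{ii} = \sigma_{11}$ and $B\Phi B' = \sigma_{11}B$, the standard least squares variance-covariance matrix of the regression coefficients. In the general case, the parameters of $\Phi$ are estimable, so $B\Phi B'$ is estimable. One may also derive approximate GLS estimators that do not converge in distribution to the GLS estimators with known $\lambda_i$.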
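The asymptotic argument in this appendix linearizes $\hat\lambda_i$ around $\lambda_i$, which requires the derivative of $\lambda(Z) = \phi(Z)/(1 - \Phi(Z))$. Because $\phi'(Z) = -Z\phi(Z)$, differentiating gives $\partial\lambda/\partial Z = \lambda(\lambda - Z)$. The short Python sketch below (function names are illustrative, not from the paper) checks this formula against a central finite difference and illustrates that the derivative lies strictly between 0 and 1.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_sf(z):
    """Upper tail 1 - Phi(z), computed with erfc to stay accurate for large z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def mills(z):
    """lambda(Z) = phi(Z) / (1 - Phi(Z))."""
    return norm_pdf(z) / norm_sf(z)

def mills_deriv(z):
    """Analytic derivative: d lambda / dZ = lambda * (lambda - Z)."""
    lam = mills(z)
    return lam * (lam - z)

def central_diff(f, z, h=1e-5):
    """Central finite-difference approximation to f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

for z in (-2.0, -0.5, 0.0, 1.0, 2.5):
    print(f"Z = {z:5.2f}  analytic = {mills_deriv(z):.8f}  numeric = {central_diff(mills, z):.8f}")
```

Computing $1 - \Phi(Z)$ through `math.erfc` rather than as `1 - Phi(Z)` avoids cancellation when $Z$ is large, which is where $\lambda$ and its derivative are largest.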
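The essential ingredients of the derivation of the approximate GLS estimators are available from previous results. Let $T_\ell$ denote the number of observations on configuration $\ell$, $\ell = 1, \ldots, L$, that appear in the selected sample. Array the observations so that those on configuration 1 come first, followed by those on configuration 2, and so on. Then the covariance matrix of the disturbance in (A2) for the selected sample may be written as

$$K = \begin{pmatrix}\Omega_{11}I_{T_1} & & 0\\ & \ddots & \\ 0 & & \Omega_{LL}I_{T_L}\end{pmatrix} + \frac{\sigma_{12}^2}{\sigma_{22}}\begin{pmatrix}\eta_{11}\,\iota_{T_1}\iota_{T_1}' & \cdots & \eta_{1L}\,\iota_{T_1}\iota_{T_L}'\\ \vdots & \ddots & \vdots\\ \eta_{L1}\,\iota_{T_L}\iota_{T_1}' & \cdots & \eta_{LL}\,\iota_{T_L}\iota_{T_L}'\end{pmatrix},$$

where

$$\eta_{\ell\ell'} = \frac{1}{T}\,\frac{\partial\lambda_\ell}{\partial Z_\ell}\,\frac{\partial\lambda_{\ell'}}{\partial Z_{\ell'}}\,X_{2\ell}\,\Sigma\,X_{2\ell'}'$$

and $\iota_{T_\ell}$ is a $T_\ell \times 1$ vector of ones. Now $\Sigma$ is a positive definite matrix, so $\Sigma = PP'$ for some nonsingular matrix $P$. Define

$$\psi_\ell = \frac{1}{\sqrt{T}}\,\frac{\partial\lambda_\ell}{\partial Z_\ell}\,X_{2\ell}P,$$

so that $\eta_{\ell\ell'} = \psi_\ell\psi_{\ell'}'$ and $K$ may be written as

$$K = D + \gamma\,\Psi\Psi', \qquad D = \operatorname{diag}\left(\Omega_{11}I_{T_1}, \ldots, \Omega_{LL}I_{T_L}\right), \qquad \Psi = \begin{pmatrix}\iota_{T_1}\otimes\psi_1\\ \vdots\\ \iota_{T_L}\otimes\psi_L\end{pmatrix}, \qquad \gamma = \frac{\sigma_{12}^2}{\sigma_{22}}.$$

Then $K^{-1}$ may be written as

$$K^{-1} = D^{-1} - D^{-1}\Psi\left(\gamma^{-1}I + \sum_{\ell=1}^{L} T_\ell\,\Omega_{\ell\ell}^{-1}\,\psi_\ell'\psi_\ell\right)^{-1}\Psi'D^{-1}$$

(see, e.g., Graybill (1969), Thm. 8.3.3). Let the $1 \times (K_1 + 1)$ data vector for observation $i$, $(X_{1i}, \hat\lambda_i)$, be denoted by $r_i$, and let $R$ be the observation matrix with rows $r_i$ and $K_1 + 1$ columns. Then the appropriate GLS estimator is

$$\hat\beta_{1\,\mathrm{GLS}} = (R'K^{-1}R)^{-1}R'K^{-1}Y_1,$$

which is approximately normal with mean $\beta_1$ and covariance matrix $(R'K^{-1}R)^{-1}$. Note that the elements of $K$ are estimable.

APPENDIX B.1

In the 1967 National Longitudinal Survey of the work experience of women aged 30-44, 5083 observations are available. The following sample selection criteria were imposed to reach a usable sample of 1735 women, 812 of whom worked in 1966. The number of observations failing to meet each criterion is given in the column to the right of the rejection criterion. Observations may be rejected for any of the reasons listed, and a given observation may be rejected for several reasons.

(1) Nonwhite                                      1477
(2) Married, spouse present                       1019
(3) Farmers                                        252
(4) Missing husband's income                       421
(5) Missing annual hours of husband                336
    (including no-work group)
(6) Missing wife's experience                      301
(7) Missing wage data on wife                      126

Assets were assigned in 165 cases. An equation is fit on the available 1570 observations.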
The equation is

Assets (1967) = -6891 + 73 (wife's experience) + 1647 (wife's education) + 466.4 (number of children less than six) + 806.8 (husband's education) + 2040 (husband's age) - 17.475 (husband's age squared).

APPENDIX B.2

Sample Means of the Data Used in the Analysis
(From the 1967 National Longitudinal Survey of the Work Experience of Women 30-44)

                                          Workers    Total Sample
Number of observations                        812            1735
Number of children less than 6               .312            .565
Assets ($)                                 11,711          11,974
Husband's 1966 hourly wage rate ($/hr)       3.45            3.73
Wife's education (yrs.)                     11.42           11.29
Labor force experience (yrs.)               10.63            7.80
Wife's annual hours worked                   1289             ...
Wife's hourly wage rate ($/hr)               2.12             ...
λ                                           .6412            1.12

APPENDIX C.1

The histograms for reported weeks worked in 1966, reported annual hours worked, and estimated hours worked based on a division of 1966 earnings by a questionnaire wage rate are displayed in that order. The ordinate gives the number of observations for the value of the variable displayed on the abscissa.

[Figure: histograms of reported weeks worked in 1966, reported annual hours worked, and annual hours estimated as 1966 earnings divided by the questionnaire wage rate.]