Psychological Bulletin 1968, Vol. 70, No. 6, 426-443 MULTIPLE REGRESSION AS A GENERAL DATA-ANALYTIC SYSTEM 1 JACOB COHEN New York University Techniques for using multiple regression (MR) as a general variance-accounting procedure of great flexibility, power, and fidelity to research aims in both manipulative and observational psychological research are presented. As a prelude, the identity of MR and fixed-model analysis of variance/covariance (AV/ACV) is sketched. This requires an exposition of meansof expressing nominal scale (qualitative) data as independent variables in MR. Attention is given to methods for handling interactions, curvilinearity, missing data, and covariates, for either uncorrelated or correlated independent variables in MR. Finally, the relative roles of AV/ACV and MR in data analysis are described, and the practical advantagesof the latter are set forth. If you should say to a mathematical statistician that you have discovered that linear multiple regression analysis and the analysis of variance (and covariance) are identical systems, he would mutter something like, "Of course—general linear model," and you might have trouble maintaining his attention. If you should say this to a typical psychologist, you would be met with incredulity, or worse. Yet it is true, and in its truth lie possibilities for more relevant and therefore more powerful exploitation of research data. That psychologists would find strange the claimed equivalence of multiple regression (MR) and the fixed-model analysis of variance (AV) and covariance (ACV) is readily understandable. The textbooks in "psychological" statistics treat these matters quite separately, with wholly different algorithms, nomenclature, output, and examples. MR is generally illustrated by examples drawn from the psychotechnology of educational or personnel selection, usually the prediction of some criterion (e.g., freshman grade point average) from predictors (e.g., verbal 1 This work was supported by Grant No. MH 06137 from the National Institute of Mental Health of the United States Public Health Service, and by an open computing grant from Abacus Associates, Inc., New York, N. Y., to whom grateful acknowledgementis accorded. The author is also grateful to the members of the Society of Multivariate Experimental Psychology for their constructive response when this material was presented at their annual meeting in Atlanta, Georgia, November 1966. This work profited greatly from detailed critiques supplied by Robert A. Bottenberg and Joe H. Ward, Jr., but since not all their suggestions were followed, they share no responsibility for any defects in the result. and quantitative score, high school rank). The yield is a multiple correlation (R) and a regression equation with weights which can be used for optimal prediction. The multiple R and the weights are subjected to significance testing, and conclusions are drawn about the effectiveness of the prediction, and which predictors do and do not contribute significantly to the prediction. By way of contrast, AV and ACV are generally illustrated by pure research, manipulative experiments with groups subjected to different treatments or treatment combinations. Means and variances are found and main effect, interaction, and error mean squares computed and compared. Conclusions are drawn in terms of the significance of differencesin sets or pairsof means or mean differences. 
More analytic yield of one or both of these systems is sometimes presented, but the above is a fair description of the respective thrusts of the two methods, and they are clearly different. The differences are quite understandable, but the basis for this understanding comes primarily from the history and sociology of behavioral science research method and not from the essential mathematics. MR began to be exploited in the biological and behavioral sciences around the turn of the century in the course of the study of natural variation (Galton, Pearson, Yule). A couple of decades later, AV and ACV came out of the structure of (agronomic) experimentation, that is, of artificial or experimentally manipulated variation, where the treatments were carefully varied over the experimental material in efficient and logically esthetic experimental designs. The 426 MULTIPLE REGRESSION IN DATA ANALYSIS 427 chief architect here was R. A. Fisher. These historical differences resulted in differences in tradition associated with substantively different areas and value systems in the psychological spectrum (cf. Cattell, 1966). Yet the systems are, in the most meaningful sense, the same. One ofthe purposes ofthis article is to sketch the equivalence of the two systems. In order to do so, it is necessary to show how nominal scales ("treatment," religion) can be used as "independent" variables in MR; the same is shown for "interactions." It is also necessary to demonstrate how multiple R* (and related statistics) can be computed from fixed-model AV and ACV output. Oncethe case is made for the theoretical equivalence of the two systems, the practical advantages of MR will be presented, which, given the foregoing, will be seen to constitute a very flexible general system for the analysis of data in the most frequently arising circumstance, namely, where an interval scaled or dichotomous (dependent) variable is to be "understood" in terms of other (independent) variables, however scaled. A word about originality. Most of the material which follows was "discovered" by the author, only to find, after some painstaking library research, that much ofit had been anticipated in published but not widely known works (chiefly Bottenberg & Ward, 1963; Li, 1964). Thus, no large claim for originality is being made, except for some of the heuristic concepts and their synthesis in a general dataanalytic system realized by means of MR. THE EQUIVALENCE OFTHESYSTEMS :NOMINAL SCALES AS INDEPENDENT VARIABLES IN MR Some of the apparent differences in MR and AC/ACV lie in their respective terminologies. The variable being analyzed (from AV and ACV) and the criterion variable (from MR) are the same, and will be called the dependent variable and symbolized as Y. The variables bearing on Y, variously called main effect, interaction, or covariate in AV and ACV (depending on their definition and design function), and predictor variables in MR will be called independent variables, and symbolized as Xi (i = 1, 2,- •-k). Each X< consumes one degree of freedom (df). In complex problems (e.g., factorial design,curvilinear analysis), it is convenient to define sets of the Xi, each such set representing a single research variable or factor. In the conventional use of MR, the X< are ordered quantitative variables, treated as equal interval scales. 
Thus, in a study of the prediction of freshman grade point average (Y), one might have X1 = verbal aptitude score, X2 = quantitative aptitude score, X3 = percentile rank in high school graduating class, and X4 = Hollingshead socio-economic status index. Thus, k = 4, and the question of sets need not arise (or, they may be thought of as four sets, each of a single variable). But what if one wanted to include religion among the Xi? Or, alternatively, if the entering class were to be assigned randomly to four different experimental teaching systems, how would experimental group assignment be represented? More generally, how does one accommodate a purely nominal or qualitative variable as an independent variable in MR?

Imagine a simple situation in which a dependent variable Y is to be studied as a function of a nominal scale variable G, which has four "levels": groups G1, G2, G3, and G4. For concreteness, Y and G may be taken as having the following alternative meanings:

Social Psychology: Y = attitude toward the United Nations; the G set = religion (G1 Protestant, G2 Catholic, G3 Jewish, G4 Other).
Clinical Psychology: Y = suggestibility; the G set = diagnosis (G1 Paranoid Schizophrenia, G2 Nonparanoid Schizophrenia, G3 Compulsive Neurosis, G4 Hysterical Neurosis).
Physiological Psychology: Y = retention; the G set = treatment (G1 Drug and Frontal Lesion, G2 Drug and Control Lesion, G3 No Drug and Frontal Lesion, G4 No Drug and Control Lesion).

Formally, what is being posited is the assignment, not necessarily equally, of each of n cases into (four) mutually exclusive and exhaustive groups, no matter whether G is an organismic, naturally occurring variable or one created by the experimenter's manipulative efforts on randomly assigned subjects. The expression of group membership as independent variables in MR can be accomplished in several ways, all equivalent in a sense to be described later. The intuitively simplest of these is "dummy" variable coding (Bottenberg & Ward, 1963; Suits, 1957).

Dummy Variable Coding

Table 1 presents various coding alternatives for the rendition of membership in one of four groups. Columns 1, 2, and 3 represent a dummy variable coding scheme. It involves merely successively dichotomizing so that each of 3 (= g - 1) of the 4 (= g) groups is distinguished from the remainder as one aspect of G. For example, on X1 all subjects in G1 are scored 1 and all others, without differentiation, are scored 0. Thus, this variable by itself carries only some of the information in the G variable as a whole, for example, Protestant versus all other, or Paranoid Schizophrenia versus all other. However, the three variables coded as in Columns 1, 2, and 3 together exhaust the information of the G variable. One might think that a fourth independent variable, one which distinguishes G4 from all others, would be necessary, but such a variable would be redundant. In the usual MR system, which uses a constant term in the regression equation, no more than g - 1 independent variables (no matter how coded) are required to represent the g groups of a G nominal scale. A fourth Xi here is not only unnecessary, but its inclusion would result in indeterminacy in the computation of the MR constants. This is an instance of a more general demand on the set of independent variables in any MR system: no independent variable in the set may yield a multiple R of 1.00 with the remaining independent variables.
This constraint on the independent variables (in matrix algebraic terms, the demand that their data matrix be nonsingular, or of full rank) would be violated if we introduced a fourth variable, since, in that case, any of the four Xi would yield R = 1.00 when treated as a dependent variable regressed on the other three. In terms that are intuitively compelling, one can see that members of G4 are identified uniquely on the X1, X2, X3 vector as 0, 0, 0, that is, as not G1, not G2, and not G3, thus not requiring a fourth dichotomous Xi. G4 is not being slighted; on the contrary, as will be shown below, it serves as a reference group. Any group may be designated for this role, but if one is functionally a control or reference group, so much the better.²

² It is of interest to note that information about the "omitted" group, here G4 (more generally, Gg), is readily recovered. The correlation of the dichotomy for that group with any variable Z (rZg) is a simple function of the r's of the other dummy variables with Z (rZi) and the standard deviations of the Xi.

Before we turn to a consideration of X1, X2, and X3 as a set of variables, let us consider them separately. Each can be correlated with the dependent variable Y. A set of artificial data was constructed to provide a concrete illustration. For n = 36 cases, a set of three-digit Y scores was written, the cases assigned to four groups and coded for the Xi as described. The resulting product-moment r's (point-biserial) were rY1 = -.5863, rY2 = .0391, and rY3 = .4965. When squared, the resulting values indicate the proportion of the Y variance each distinction accounts for: r²Y1 = .3437, r²Y2 = .0015, and r²Y3 = .2465. Thus, for example, the Protestant versus non-Protestant variable accounts for .3437 of the Y variance.

The AV null hypothesis is that the population means are equal:

H0: m1 = m2 = m3 = m4

If the AV H0 is true, then knowledge of group membership and the use of group means leads to the same least squares prediction of the Y value of a given case as no knowledge, namely, the grand mean; thus one can account for none of the variance in Y by such knowledge, hence R²Y.123 = 0, and conversely. A full MR analysis also yields the regression coefficients and constant for the regression equation:

Ŷ = B1X1 + B2X2 + ... + BkXk + A    [6]

where Ŷ is the least-squares estimated ("predicted") value of Y, the Bi are raw-score partial regression coefficients attached to each Xi, and A is the regression constant or Y intercept, that is, the estimated value of Y when all Xi are set at zero. (Its computation is accomplished by including a "unit vector" with the Xi; see Draper & Smith, 1967.) In any MR problem, a Bi coefficient gives the amount of the effect on Y, expressed in Y units, which is yielded by a unit increase in Xi. But since, as dummy variables, the Xi are coded 0-1, a unit increase means 1, membership in the group, rather than 0, nonmembership in the group. Solving the general regression Equation 6 for the artificial data, using dummy variables, we obtain:

Ŷ = -30.34X1 - .56X2 + 21.22X3 + 84.12

Since group membership is all-or-none, the Bi values give the net consequence of membership in Gi relative to G4 for groups G1, G2, and G3.
Thus,

Ŷ1 = Ȳ1 = -30.34(1) - .56(0) + 21.22(0) + 84.12 = 53.78
Ŷ2 = Ȳ2 = -30.34(0) - .56(1) + 21.22(0) + 84.12 = 83.56
Ŷ3 = Ȳ3 = -30.34(0) - .56(0) + 21.22(1) + 84.12 = 105.34

And G4 has not been slighted, since, substituting its scores on X1, X2, and X3, we find:

Ŷ4 = Ȳ4 = -30.34(0) - .56(0) + 21.22(0) + 84.12 = 84.12

Thus, one can understand that "B4," the "missing" reference group's weight, is always zero, and that therefore Ȳ4 = A. The exact values of the Bi will vary, depending on which group is taken as the reference group (i.e., is coded 0, ..., 0), but the differences among the Bi will always be the same, since they are the same as the differences between the group Y means. That is, whichever the reference group, the separation of the Bi in the example will be the same as that among the values -30.34, -.56, +21.22, and 0. (For example, if G1 is taken as the reference group, the new Bi are 0, 29.78, 51.56, and 30.33, and the regression constant A = Ȳ1 = 53.78.)

Not only are the Bi meaningful, but so also are the multiple-partial correlations with the criterion, that is, the correlation of Y with Xi, partialing out (holding constant) all the other independent variables, which, for the sake of notational simplicity, we designate pi. With dummy-variable-coded Xi, pi can be more specifically interpreted as the correlation between Y and the dichotomy made up of membership in Gi versus membership in G4, the reference group. The pi thus give, in correlational terms, the relevance to Y of the distinction between each Gi and the reference group. Furthermore, the pi, Bi, and βi (the standardized partial regression coefficients) can be tested for significance by means of t (or, equivalently, F with numerator df = 1). Indeed, the null hypothesis is the same for all three: the respective population parameter equals zero. For a given Xi, if any one of the three is zero, all are zero, and the value of t is identical for all three tests. For the artificial data, the results are:

        Bi       βi      pi       t
X1   -30.34   -.478   -.464   -2.96
X2     -.56   -.009   -.010    -.05
X3   +21.22    .334    .344    2.07

Thus, the G1-G4 distinction and also the G3-G4 distinction with regard to Y are significant (two-tailed .01 and .05, with 32 df), while the G2-G4 distinction is not. These are identically the results one would obtain for t tests between the respective Y means, using the within-group mean square (with 32 df) as the variance estimate.

The reader, having been shown the MR-AV identities, may nevertheless react, "O.K., that's interesting, but so what?" Other than the provision of correlational (or regression) values, no advantage of MR over AV is claimed for this problem. But if there were other independent variables of interest (main effects, whether nominal, ordinal, or interval; interactions; covariates; nonlinear components; etc., whether or not correlated with G or with each other), their addition to the G variable could proceed easily by means of MR, and not at all easily in an AV/ACV framework. This possibility is the single most important advantage of the MR procedure, and will receive further attention below. To summarize, dummy variable coding of nominal scale data yields not only the multiple R²
and 432 JACOB COHEN F test (proportion of variance accounted for by group membership and an overall significance test) and the group F means, but also information on the degree of relevance to F of membership in any given group, G,-, relative to the remainder (ry,-), and to a referencegroup in terms of either regression weights (£,• or ft) or correlation (pi), as well as specific significance tests on the relevant null hypotheses. The importance of dummy variable (or other nominal scale) coding lies not so much in its use when only a single nominal scale constitutes the independent variables, but rather in its ready inclusion with other independent variables in MR. CONTRAST CODING Another system for representing nominal data can be thought of as contrast or "issues" coding. Here, each independent variable carries a contrast (in the AV/ACV sense) among group means. Each subject is characterized for each contrast according to the role he plays in it, which depends upon his group membership. With all contrasts so represented, the MR analysis can proceed. As an example, reconsider the representation of the G variable. We can contrast membership neither GI or G2 versus membership in either G3 or Gn. This could be substantively interpreted as, for example, majority versus minority religions, schizophrenic versus neurotic, or drug versus no-drug treatment condition. The coding or scoring of this issue may be rendered as in Column 4 in Table 1: the value 1 is assigned the subjects in Gi and G2 and the value —1 to those in Ga and G±, as is done in the computation of orthogonal contrasts in AV (e.g., Edwards, 1960). Actually, any two different numbers can be used to render this issue by itself, but there are advantages for some purposes in using values which sum to zero. The simple correlation between the dependent variable and this Xi is a point-biserial correlation (as were the dummy variable correlations) whose square gives directly the proportion of F variance attributable to the GI, G2 versus G3, Gt distinction. For the artificial data, the rVi = .2246 (TYI = — .4739). This is a meaningful value which gives the size of the relationship in the sample. This ryi can be tested for significance, and confidence limits for it (or for rVi) can be computed by conventional procedures. Other issues or contrasts can be rendered as independent variables. For example, a second issue which may be rendered is the effect on F of the Gi versus G2 distinction, ignoring G% and G4. A third issue may be the analogous Gs versus G4 distinction, ignoring Gi and G2. These are rendered, respectively, in Columns 5 and 6 in Table 1. Each yields an r and r2 with the criterion which is interpretable, testable for significance, and confidence boundable. Beyond the separate correlations of these three contrast variables, there is the further question of what their combined, effect is on F. We compute the .RV-m and F and obtain exactly the same values as when the arbitrary or dummy variable coding was used, .4458 and 8.580 (forthe artificial data). This follows from the fact that the three independent variables satisfy the nonsingularity condition, that is, no one of them gives a multiple R with the other two of unity. This is a necessary and sufficient condition for any coding of g — 1 independent variables to represent G (see next section). As before, the partial statistics, that is, the pi, Bi and ft and the common / test of their significance are also meaningful. 
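To make the equivalence concrete, the following is a minimal sketch in Python using synthetic data (the values are not those of the artificial example above; the helper r2 and the generated scores are purely illustrative). It codes the same four-group G variable by dummy variables and by the contrasts of Columns 4-6, recovers the group means from the dummy-coding constants, and shows that both codings yield the same multiple R²; with equal n, the contrasts' squared point-biserial r's sum to that R².

```python
import numpy as np

def r2(X, y):
    """R-squared of Y on the columns of X (a constant term is appended)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(X1, y, rcond=None)[0]
    return 1 - ((y - X1 @ b) ** 2).sum() / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(0)
group = np.repeat([1, 2, 3, 4], 9)                       # 36 cases, 9 per group
y = np.array([54., 84., 105., 84.])[group - 1] + rng.normal(0, 12, 36)

# Dummy coding (Columns 1-3): G4 is the reference group.
dummy = np.column_stack([(group == g).astype(float) for g in (1, 2, 3)])
A, B1, B2, B3 = np.linalg.lstsq(np.column_stack([np.ones(36), dummy]), y, rcond=None)[0]
print(A, y[group == 4].mean())        # the constant A is the reference group's mean
print(A + B1, y[group == 1].mean())   # A + B1 reproduces the G1 mean, and so on

# Contrast coding (Columns 4-6): G1,G2 vs. G3,G4; G1 vs. G2; G3 vs. G4.
contrast = np.column_stack([
    np.where(np.isin(group, [1, 2]), 1., -1.),
    np.select([group == 1, group == 2], [1., -1.], 0.),
    np.select([group == 3, group == 4], [1., -1.], 0.),
])

# Any nonsingular g - 1 coding of the four groups yields the same multiple R2.
print(round(r2(dummy, y), 4), round(r2(contrast, y), 4))
# With equal n the contrasts are uncorrelated, so their squared r's sum to that R2.
print(round(sum(np.corrcoef(contrast[:, j], y)[0, 1] ** 2 for j in range(3)), 4))
```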
If the independent variables all correlate zero with each other, the ft will equal their respective rn. That this must be the case can be seen from the fact that each rV« represents a different portion of the F variance whose sum is the multiple .RV.m and thus the relationship -RV.123 = Srr,-ft = 2>V, must hold. The X,as presented in Columns 4, 5, and 6 will be mutually uncorrelated if and only if the group sample sizes are equal. If they are not equal, the correlations among the Xi will be nonzero, which means that the contrasts or issues posed to the data are not independent. Such would be the case, in general, in the example if it were religion or diagnosis which formed the basis for group membership, and the actual natural population randomly sampled. Given unequal Hi for the four samples, although it is possible to make the three contrasts described above mutually uncorrelated, the coding of Columns 4, 5, and 6 does not do so. The scope of this article precludes discussion of the procedures whereby contrasts are coded so as to be uncorrelated. We note here merely that although it is MULTIPLE REGRESSION IN DATA ANALYSIS 433 always possible to do so, it is not necessarily desirable (see below). Since, in AV terms, the between-groups SS can be (orthogonally) partitioned in various ways, there are sets of contrasts other than the set above which can be represented in the coding. A particularly popular set is that automatically provided by the AV factorial design. If the four groups of this example are looked upon as occupying the cells of a 2 X 2 design (an interpretation to which the physiological example of drug versus no drug, frontal lesion versus control lesion particularly lends itself), each of the usual AV effects can be represented as Xi by the proper coding. The first is the same as before, and contrasts G\ and G2 with Gi and G^ for example, the drug-no-drug main effect, reproduced as Column 7 of Table 1. The second main effect, for example, frontalcontrol lesion, contrasts Gj. and Ga with G2 and G4 and is given by the coding in Column 8. This latter X^ gives ry?, the (point-biserial) r for (e.g.) site of lesion with the dependent variable (e.g.) retention, and rVa is the proportion of F variance accounted for by this variable. The remaining df is, as the AV has taught us, the interaction of the two main effects, for example, Drug-No-Drug X Frontal-Control Lesion. It can always be rendered as a multiplicative function of the two single df aspects of the main effects. Here, it is simply coded as the product of each group's "scores" on Xi and X2 (given as Column 9 in Table 1): 1X1 = 1, 1X-1=-1, -1X1=-1, and —1 X — 1 = 1. Rendering the interaction as X3, one can interpret it as carrying the information of that aspect of group membership which represents the joint (note, not additive) effect of the drug and frontal lesion conditions. Its (point-biserial) ryz is an expression in correlational terms of the degreeof relationship between Y and the joint operation of drug and lesion site. rVs gives the proportion of F variance accounted for by this joint effect. In the example, these three issues are conceptually independent, thus it would be desirable that the Xi be uncorrelated, that is, r12 = r13 = r23 = 0. The coding values given in Columns 7, 8, and 9 of Table 1 will satisfy this condition if (and only if) the sample sizes of the four cells are equal. (If not, other coding, not discussed here, would be necessary.) 
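A minimal sketch of the 2 x 2 coding just described, assuming hypothetical equal cell sizes of 9: the two main-effect codes and their product are mutually uncorrelated, so each accounts for a separate, additive portion of the Y variance.

```python
import numpy as np

# 2 x 2 layout (e.g., drug vs. no drug by frontal vs. control lesion), equal cell n.
drug   = np.repeat([1, 1, -1, -1], 9)      # Column 7: first main effect
lesion = np.repeat([1, -1, 1, -1], 9)      # Column 8: second main effect
inter  = drug * lesion                     # Column 9: their interaction, coded as a product

# With equal cell frequencies the three coded variables are mutually uncorrelated.
codes = np.column_stack([drug, lesion, inter])
print(np.corrcoef(codes, rowvar=False).round(10))   # off-diagonal entries are 0
```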
The conceptual independence of the issues arises from the consideration that they are both manipulated variables. When this is the case, it is clearly desirablefor them to be represented as mutually uncorrelated, since then the /3y,= rri and the .RV-m is simply a sum of the separate rV,. Thus, the total variance of F accounted for by group membership is unambiguously partitioned into the three separate sources. Further, the factorial AVF test values of each of the separate (one df) effects is identical with the t2 of the analogous MR partial coefficients (&, Bt, or pi). However, whether one wishes to represent the issues as uncorrelated depends on whether they are conceptually independent and the differing «,• are a consequence of animals randomly dying or test tubes being randomly dropped on the onehand, or whether they carry valid sampling information about a natural population state of affairs. Assume F is a measure of liberalism-conservatism and reconsider the problem with the groups reinterpreted as d: low education, lowincome (n\ = 160), G2: low education, high income (w2 = 20), Ga: high education, low income (w3 = 80), and G.»: high education, high income («4 = 100). These unequal and disproportional «,• carry valid sampling information about the univariate and bivariate distributions of education and income as defined here, the product moment fu (phi) between them (coded as in Columns 7 and 8) equalling .4714. They may also be correlated with their interaction. One would ordinarily not wish to render these effects as uncorrelated, since the resulting Xi would be quite artificial, but rather by the coding given in Columns 7, 8, and 9, where, again, X3 is simply the XiX* product. Note that whether the X< are correlated or uncorrelated, or whether the m are equal or unequal, all of these coding systems yield the same .RV.m and associated F. Two systems of rendering nominal scale (group membership) information into independent variables have been described: dummy variable coding and contrast coding. They result in identically the same multiple B? (and associated F) but different per independent variable partial statistics which are differently interpreted. Either involves expressing the nominal scale of g levels (groups) into g — 1 434 JACOB COHEN independent variables, each carrying a distinct aspect of group membership whose degree of association and statistical significance can be determined. Nonsense Coding It turns out, quite contraintuitively, that if one's purpose is merely to represent G so that its R*Y and/or its associated F test value can be determined, it hardly matters how one codes Xi, X2)- • •, Xe_i. Any real numbers, positive or negative, whole or fractional, can be used in the coding subject only to the nonsingularity constraint, that is, no Xi may have a multiple R of 1.00 with the other independent variables. Consider, for example, the values of Columns 10-12 of Table 1. The numbers for Xi in Column 10 were obtained by random entry into a random number table and their signs by coin flipping. Column 11 for X2 was constructed by squaring the entries in Column 10, and Column 12 for Xa by cubing them. Powering the X\ values assures the satisfaction of the nonsingularity constraint. Now, using these nonsense "scores" to code G and the same F values of the artificial example, we find the same UV-ias of .4458 with associated F = 8.580! 
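The invariance claimed here is easy to verify by simulation. The sketch below uses synthetic data and arbitrary random codes rather than the particular Columns 10-12 values, and shows that any nonsingular coding of the four groups reproduces the R² obtained under dummy coding.

```python
import numpy as np

def r2(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(X1, y, rcond=None)[0]
    return 1 - ((y - X1 @ b) ** 2).sum() / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(2)
group = np.repeat([1, 2, 3, 4], [10, 8, 9, 9])           # 36 cases, unequal n
y = np.array([54., 84., 105., 84.])[group - 1] + rng.normal(0, 15, 36)

dummy = np.column_stack([(group == g).astype(float) for g in (1, 2, 3)])
# "Nonsense" coding: one random value per group, with its square and cube,
# which guarantees nonsingularity as long as the four values are distinct.
z = rng.uniform(1, 9, 4) * rng.choice([-1, 1], 4)
nonsense = np.column_stack([z[group - 1], z[group - 1] ** 2, z[group - 1] ** 3])

# Any nonsingular coding of the g = 4 groups into g - 1 = 3 variables gives the same R2.
print(round(r2(dummy, y), 6), round(r2(nonsense, y), 6))
```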
Or, alternatively, the coding values of Columns 13, 14, and 15 were obtained by haphazard free association with a quick eyeball check to assure nonsingularity. They, too, yield #V.m = .4458 and F = 8.580. Why these, or any other values satisfying nonsingularity will "work" would require too much space to explain nontechnically. Ultimately, it is a generalization of the same principle which makes it possible to score a dichotomy with any two different values (not only the conventional 0 and 1) and obtain the same point-biserial r2 against another variable. Of course, the statistics per X*, that is, rYi, pi, Bi, fti, are as nonsensical as the X»-. But the regression equation will yield the correct group means on F, and, as noted, J?2 and its F remain invariant. Thus, with the aid of an MR computer program and a table of random numbers (or a nonsingular imagination), one can duplicate the yield of an AV. Apart from its status as a statistical curiosity, of what value is the demonstration that one can simulate an AV by means of an arbitrarily coded MR analysis? Not much, taken by itself. However, despite this disclaimer, it should be pointed out that for most investigators, the yield sought from the AV of such data is the significance status of the F test on the means, which the MR provides; the latter also "naturally" yields, in R", a statement of proportion of variance accounted for. True, this is identically available from the AVin if, but this is not generally understood and computed. The MR approach has the virtue of calling to the attention of the investigator the existence of a rho (relationship) value and its distinction from a tau (significance test) value (Cohen, 1965, pp. 101-106), an issue usually lost sight of in AV contexts (but, see Hays, 1963, pp. 325-333). But if it hardly matters how we score G and still get the same .RV.m and F ratio, we can score it in some meaningful way, one which provides analytically useful intermediate results, that is, by dummy variable or contrast coding. For other approaches to nominal scale coding, see Bottenberg and Ward (1963) and Jennings (1967). ASPECTS OF QUANTITATIVE SCALES AS INDEPENDENT VARIABLES As noted in the introduction, psychologists are familiar with the use of quantitative variables as independent variables in MR. This, indeed, is the only use of MR illustrated in the standard textbooks. Thus, given duration of first psychiatric hospitalization as the dependent variable Y, and as independent variables: age (Xi), Hollingshead SES Index (X2), and MMPI Schizophrenia (Sc) score (X3), the psychologist knows how to proceed. But MR provides opportunities for the analysis of quantitative independent variables which transcend this very limited approach. Curmlinecur Regression From the enlarged conceptual frameworkof the present treatment of MR, we would say that this analysis is concerned with the linear aspects of age, SES, and Sc. There are other functions or aspects of these variables which can be represented as independent variables. It has long been recognized that curvilinear relationships can be represented in linear MR by means of a polynomial form in powered terms. The standard Equation 6 Y = B,Xi + -B2X2 + • • -BkXk + A MULTIPLE REGRESSION IN DATA ANALYSIS 435 is linear in the X{. If the X,- are Xi = Z, X2 = Z2 , X3 = Z8 ,- •-Xk = Zk , the equation is still linear in the Xi, even though not linear in Z. 
The result of this stratagem is that nonlinear regression of Y on Z can nevertheless be represented within the linear multiple regression framework, the "multiplicity" being used to represent various aspects of nonlinearity: the quadratic, cubic, etc. The provision of any given power u of Z, that is, Z^u, allows for u - 1 bends in the regression curve of Y on Z. Thus Z¹ (or Z) provides for 1 - 1 = 0 bends, hence a straight line; Z² provides for 2 - 1 = 1 bend; Z³ for 2 bends; etc. In most psychological research, provision for more than one or two bends will rarely be necessary. It is the same stratagem of polynomial representation, further refined to make these aspects orthogonal to each other, which is utilized in the AV, also a linear model, in trend analysis designs.

A note of caution must be injected here. Such variables as Z, Z², and Z³ are in general correlated, indeed, for score-like data, usually highly so. Table 2 presents some illustrative data. In this example, the correlations are .9479, .8840, and .9846.

TABLE 2
Illustrative Data on Polynomial Multiple Regression

                 Correlations (r)            R²Y
Variable        Y       Z       Z²      Cumulative   Increment      pi
Z  (= X1)     .7638                        .5834        .5834      .1399
Z² (= X2)     .7582   .9479                .5949        .0115     -.0116
Z³ (= X3)     .7268   .8840   .9846        .5956        .0007      .0419

For reasons of ordinary scientific parsimony, unless one is working with a strong hypothesis, we normally think of them as a hierarchy: how much Y variance does Z account for? (.5834) If Z² is added to Z as a second variable, how much do both together account for? (.5949) The difference represents the increment in variance accounted for by making allowance for quadratic (parabolic) curvature. In the example, it is a very small amount, .0115. If to Z and Z² we add Z³, the multiple R²Y.123 becomes .5956, an increment over R²Y.12 of only .0007. Each of these separate increments, or the two combined, can be tested for significance. In general, any increment to R²Y.A due to the addition of a set B can be tested by the F ratio:

F = [(R²Y.A,B - R²Y.A) / b] / [(1 - R²Y.A,B) / (n - a - b - 1)]    [7]

with df = b and (n - a - b - 1), where R²Y.A,B is the incremented R² based on a + b independent variables, that is, predicted from the combined sets of A and B variables; R²Y.A is the smaller R² based on only a independent variables, that is, predicted from only the A set; and a and b are the numbers of original (a) and added (b) independent variables, hence the number of df each "takes up."

This F test of an increment to R² is much more general in its applicability than the present narrow context, and its symbols have accordingly been given quite general interpretation. It is used several times later in the exposition, in other circumstances where, because of correlation among the Xi, it provides a basis for judging how much a set of independent variables contributes additionally to Y variance accounting. Since what is added is independent of what is already provided for, this is a general device for partitioning R² into orthogonal portions. Since the size of such portions depends on the order in which sets are included, the hierarchy of sets is an important part of the investigator's hypothesis statement. The generality of Formula 7 is further seen in that Formula 4 is actually a special case of Formula 7, where R²Y.A is zero because no Xi are used (hence a = 0) and R²Y.B is the R² based on b (= k) df which is being tested, that is, an increment of R² from zero. Either set may have one or more independent variables.
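In code, Formula 7 might be sketched as follows (a minimal version; the function name increment_F is illustrative, and scipy's F distribution is used only to attach a p value):

```python
from scipy.stats import f as f_dist

def increment_F(r2_ab, r2_a, n, a, b):
    """Formula 7: F for the increment of set B (b variables) over set A (a variables)."""
    df1, df2 = b, n - a - b - 1
    F = ((r2_ab - r2_a) / df1) / ((1 - r2_ab) / df2)
    return F, df1, df2, f_dist.sf(F, df1, df2)   # F, numerator df, denominator df, p
```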
Thus, to test the increment due to Z² over Z alone, assuming total n = 36,

F = [(.5949 - .5834)/1] / [(1 - .5949)/(36 - 1 - 1 - 1)] = .0115 / (.4051/33) = .934

with df = 1 and 33 (a chance departure). To test the pooled addition of both Z² and Z³ to Z,

F = [(.5956 - .5834)/2] / [(1 - .5956)/(36 - 1 - 2 - 1)] = (.0122/2) / (.4044/32) = .483

with df = 2 and 32 (also a chance result).

The need for caution arises in that if one studies the results of the regression analysis which uses Z, Z², and Z³, where the solution of the partial (regression or correlation) coefficients is simultaneous, not successive, the three variables are treated quite democratically. Each is partialed from the others without favor or hierarchy. Since such variables are highly correlated, when one partials Z² and Z³ from Z, one is robbing Z of Y variance which we think of as rightfully belonging to it. Table 2 gives the pi of the three predictors when one treats them as a set. The values are smaller (reflecting the mutual partialing), and may be negative (reflecting "suppression" effects). Because the pi are so small, they may well be nonsignificant (as they are here), even though rYZ is significant and any of the other variables may yield a significant increment. Thus, the significance interpretation of a set of polynomial terms solved simultaneously may be quite misleading when the usual hierarchical notions prevail. On the other hand, if the analyst's purpose is to portray a polynomial regression fit to an observed set of data, he can solve for the set simultaneously and use the resulting MR equation. For the data used for Table 2, the regression Equation 6 is:

Ŷ = 11.70X1 - .50X2 + .25X3 + 55.90

the values being the Bi regression coefficients and constant, and the Xi successively Z, Z², and Z³. One can substitute over the range of interest of Z and obtain fitted values of Y for purposes of prediction or of graphing the function.

There are other means whereby curvilinear relationships can be handled in an MR framework. Briefly, one can organize an independent variable Z into g class intervals (ordinarily, but not necessarily, equal in range) and treat the resulting classes as groups, coding them by the dummy variable technique described above. This results in g - 1 independent variables, each a segment of the Z range. The resulting R²Y.G is the amount of Y variance accounted for by Z (curvilinearly, if such is the case), and the Y means for the g intervals, computable from the resulting raw-score regression equation, can be plotted against the midpoints of the class intervals of Z to portray the function. A more elegant method is the transformation (coding) of the Z values to orthogonal polynomials. This has the advantage that the resulting Xi terms representing the linear, quadratic, cubic, etc., components of the polynomial regression are uncorrelated with each other; thus each contributes a separate portion of the Y variance capable of being tested for significance. Unfortunately, this method becomes computationally quite cumbersome unless the Z values are equally spaced with equal ni per interval. The latter is the usual case when Z is an experimentally manipulated variable, where the standard trend analysis designs of the AV can be used (Edwards, 1960). Finally, although the first few powers of a polynomial provide a good general fitting function, in some circumstances such transformations of Z as log Z, 1/Z, or √Z may provide a better fit.
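The two tests just computed can be checked directly from the Table 2 values (a minimal arithmetic check; small discrepancies from the values above reflect rounding of the R² figures):

```python
n = 36
r2_z, r2_zq, r2_zqc = 0.5834, 0.5949, 0.5956   # cumulative R2 for Z; Z, Z2; Z, Z2, Z3

# Quadratic increment over the linear term (df = 1 and 33).
F1 = ((r2_zq - r2_z) / 1) / ((1 - r2_zq) / (n - 1 - 1 - 1))
# Quadratic and cubic increments pooled (df = 2 and 32).
F2 = ((r2_zqc - r2_z) / 2) / ((1 - r2_zqc) / (n - 1 - 2 - 1))
print(round(F1, 2), round(F2, 2))   # approximately 0.94 and 0.48, both nonsignificant
```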
Draper and Smith (1967) provide a useful general reference for handling curvilinearity (and other MR problems). Joint Aspects of Interactions Given two independent variables, Xi = Z and Xi = W, one may be interested in not only their separate effects on F, but also on their joint effect, over and above their separate effects. As noted above (Contrast Coding), where this was discussed in the narrow context of a 2 X 2 design, this joint effect is carried by a third independent variable, a score defined for each subject by the product of his Z and W scores, that is,Xz = ZW. This variable contains this joint effect, which is identically the (first-order) interaction effect of AV, or the "moderator" effect of Saunders (1956). This identity is quite general, so that a triple interaction is carried by a triple product, say ZWV, etc. Furthermore, the above are all interactions or joint effects of linear aspects of the variables. The more complex interactions of nonlinear aspects, such as the linear by quad- MULTIPLE REGRESSION IN DATA ANALYSIS 437 ratic, or quadratic by cubic, made familiar by advanced treatments of AV trend analysis (Winer, 1962, pp. 273-278), would be represented by products of powered variables, for example, ZW1 , ZW3 , each a single independent variable. The presentation of joint effects as simple products in MR requires the same caution as in the polynomial representation of a single variable. (Indeed, a powered variable can be properly understood as a special case of an interaction, for example, Z2 contains the Z by Z interaction.) If one uses simultaneously as independent variables X\ = Z, Xz = W, X3 = ZW, the correlations of Z with ZW, and W with ZW will ordinarily not be zero, may indeed be large, and the partial coefficients for Z and W (ft, B, p) will have lost to ZW some F variance which properly is theirs (just as Z would be robbed of some of its F variance by Z2 and Z3 ). The problem is solved as in the polynomial regression analysis: Find JRV.iss, the variance proportion accounted for by all three variables ;then find JfJV.iz, the amount accounted for without the interaction. The increment is tested for significance by the F ratio of Formula 7. This, too, generalizes. In more complex systems, involving either more variables and higher order interactions or interactions among polynomial aspects (or both), one forms a hierarchy of sets of independent variables and tests for the significance of increments to J22 by means of the same F ratio (Formula 7). For example, if one has three variables Z, W, and V, represented both linearly and quadratically with all their interactions, one possible way of organizing the variables is by means of the following sets: A:Z,W,V B:ZW,ZV, WV C:ZWV D:Z\ W\ V2 E: ZW, ZW\ ZW, ZV\ WW, WV1 One would then test K>Y.AB — R*Y.A, R — R2 Y-AB, etc., each by the F ratio for increments. When a set containing more than one variable is significant, one can "break out" each variable in it and test its increment for significance by the same procedure. Of course, one can elect to make all sets contain only one variable, but the number of resulting tests (in the example there would be 20) brings with it an increased risk of spuriously significant results over the complete analysis. This strategy parallels that of the AV, where the avoidance of this risk is implicit. In a 4 X 5 factorial design AV, for example, the interaction involves a single mean square based on 3 X 4 = 12 df which is tested by a single F test. 
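A minimal sketch of this set-wise strategy, using synthetic data (the variable names Z, W, V follow the example above; the effect sizes and the helper r2 are purely illustrative): interaction variables are formed as products, the sets are entered in order, and each set's increment to R² is what Formula 7 tests.

```python
import numpy as np

def r2(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(X1, y, rcond=None)[0]
    return 1 - ((y - X1 @ b) ** 2).sum() / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(3)
n = 120
Z, W, V = rng.normal(size=(3, n))
y = Z + 0.5 * W + 0.4 * Z * W + rng.normal(0, 1, n)   # hypothetical Y with a Z x W joint effect

A = np.column_stack([Z, W, V])                # Set A: linear aspects
B = np.column_stack([Z * W, Z * V, W * V])    # Set B: first-order interactions as products
C = (Z * W * V).reshape(-1, 1)                # Set C: the triple interaction

r2_A = r2(A, y)
r2_AB = r2(np.hstack([A, B]), y)
r2_ABC = r2(np.hstack([A, B, C]), y)
# Each increment (e.g., r2_AB - r2_A, on b = 3 and n - a - b - 1 df) is tested by Formula 7;
# because the sets enter in the investigator's a priori order, the increments partition R2
# hierarchically even though the product terms correlate with Z, W, and V.
print(round(r2_A, 3), round(r2_AB - r2_A, 3), round(r2_ABC - r2_AB, 3))
```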
One ordinarily does not test each of these 12 effects separately unless the set as a whole is significant. The principle, of course, obtains even for the main effects, involving sets of 3 and 4 df, where each set normally is tested "whole- sale." Other combinations and priorities of the X, Y, and Z variables are, of course, possible. This operation involves formulating hypotheses about what constitutes a relevant class of independent variables and the priorities of these classes. It depends not only on mechanical variance-stealing considerations, but also on substantive issues in the research and the judgment of the investigator. Although the discussion in this section has been concerned with interactions among quantitative variables, the principles of forming interaction variables hold also for nominal variables, and for mixtures of variables. Let an "aspect" of a research variable such as religion or IQ be one of the Xt- of the set which represent it. Then, for example, if the interaction of u aspects of one variable U and t> aspects of another variable V are desired, one may form a total of uv interaction Xt, by multiplying each of the « aspects by each of the v aspects. Each of the resulting uv independent variables is a single (one df) variable which represents a specific aspect by aspect joint or interaction effect. Either U or V may be nominal or quantitative. Where nominal, their aspects may be dummy variables or contrasts; where quantitative, the aspects may be powered polynomial terms or missing data dichotomies (see below). One can thus generate such single interaction X,- as "majority-minority religious group by authoritarianism," "experimental group D versus control group by 438 JACOB COHEN quadratic of stimulus intensity," etc. It is both convenient and enlightening to have each such joint aspect separately and unambiguously (but not necessarily orthogonally) represented in the set of independent variables. Their individual increment to B? and significance can then be determined. Perhaps as important as being able to represent the interaction Xf in specific detail is the availability of the option not to represent some or all of them. The textbook paradigms for factorial design AV lead data analysts to dutifully harvest all possible interactions of all possible orders up to the highest, whether or not they are meaningful or interpretable or, if interpretable, communicable. There emanate from psychology departments many silent prayers to the spirit of R. A. Fisher that highorder interactions will not prove significant! Obviously, one need not (indeed cannot) analyze for all possible aspects including joint aspects of variables if for no other reason than the rapid loss of df for estimating error. The need to "specify the model," that is, the set of Xt to be studied in MR has the salutary effect of requiring an incisive prior conceptual analysis of the research problem. This goes hand in hand with the flexibility of the MR system, which makes readily possible the representation of the research issues posed by the investigator (i.e., multiple regression in the service of the ego!), rather than the canned issuesmandated by AVcomputational routines. Missing Data In nonexperimental, particularly survey, research, it frequently occurs that some subjects are missing data on one or more (but not all) of the independent variables under study. 
Typically, the data are not missing randomly, but for reasons frequently related to values for other independent variables, and particularly to values for the dependent variable under study. For example, in a study of factors associated with the rehabilitation of drug addicts, reported weekly wages on last job is used as an independent variable, among others. Some respondents claim they do not recall or refuse to respond. As another example, consider a retrospective study of the school records of adult mental retardates where the recorded IQ is abstracted for use as an independent variable but found missing in some cases. In neitherof these cases can one prudently assume that the mean of these cases on the X, in question, other Xi, and, particularly, Y is the same as that for the cases with data present. The practice of excluding cases lacking some of the data has the undesirable properties of analyzing a residual sample which is unrepresentative to an unknown degree of the population originally sampled, as well as the loss of information (viz., the fact of data being missing) which may be criterion relevant. MR provides a simple method for coping with this problem. Each such variable has two aspects, its value (where present) and whether or not the value is present. Accordingly, two independent variables are constructed: Xi is the value itself, with the mean of X\ for those cases where it is present entered for the cases where it is missing, and Xz is the missing data aspect, a dummy variable dichotomy coded 0-1 for absent-present. These two aspects contain all the information available in the variable. Moreover, as scored, ri2 = 0, hence Xi and Xi are each contributing an independent portion of the Y variance. Actually, any value entered for the missing data in X\ will "work" in the sense ofaccounting for Y variance, that is, the J?V-i2 will be the same. The use of the mean will uniquely result in r\i = 0, which may be advantageous interpretively. For some purposes, this advantage may be offset by using some (or any) other value, obviating the necessity of a prior computation of the mean. The researcher, normally sensitive about tampering with data, may find the prospect of "plugging" empty spaces in his data sheet with means singularly unappealing. He may even correctly point out that this will have the effect of reducing rYi from what ry% is for the subsample having X values present. In rebuttal, it must be pointed out that the subsample is not representative of the originally defined population, and the method proposed can be thought of as reflecting the fact that the population studied contains missing data, and fully incorporates this fact as positive information. ANALYSIS OF COVAEIANCE Viewed from the perspective of the MR system, thefixed-modelACVturns out to be a MULTIPLE REGRESSION IN DATA ANALYSIS 439 rather minor wrinkle, and not the imposing parallel edifice it constitutes in the AV/ACV framework. A covariate is, after all, nothing but an independent variable, which, because of the logic dictated by the substantive issues of the research, assumes priority among the set of independent variables as a basis for accounting for F variance. Consider a research in educational psychology in which the F variable is some performancemeasure in children, Xi is midparental education, X2 is family income, and G, carried by the set Xz, Xt, X6 represents some differential learning experience in four intact classes. 
This situation is a "natural" for ACV (assuming its assumptions are reasonably well met). Onewould think of it as studying the effect of learning experience or class membership on F, using Xi and Xz as covariates. Thus considered, we are asking how much variance in F (and its significance) the variables Xs, Xi, and Xt account for, after the variance due to Xi and X2 is allowed for, or held constant, or "partialed out" (the terms being equivalent). The form of the MR analysis to accomplish this purpose is directly suggested. Find UV.12346, the proportion of F variance all independent variables account for. Then find R2 Y-ii, the proportion of F variance attributable to the covariates education and income. Their difference is the increment due to group membership, which is tested for significance by the F test of Formula 7 used in a different design context above. Note that no problem arises if the four groups are defined by a 2 X 2 factorial. If X3, X4, X& are coded as in Columns 7, 8, 9 in Table 1 to represent the two main effects and their interaction, the respective ACV significance tests are performed by (Formula 7) F ratio tests of the increments •RV-12346 — ^2^.1245 (for the main effect represented by X3), P*Y.mu —K*Y.IM> (for the main effect represented by X\) and rVi2846 — rVm4 (for the interaction or joint effect). Note that Xi and X2 are always included hi the debited R2 , because of their priority in the issues as defined. This principle is readily generalized to designs of greater complexity. That a covariate is nothing but another independent variable except for priority due to substantive considerations is evident when one considers a study formally almost identical to the above, now, however, done by a social psychologist. Since there are four different classes and four different teachers, the classes ipso facto have had different learning experiences. But this research is concerned with the effects of parental education and income on the performance criterion, with group membership now the contaminant which must be removed, hence the covariate. Using the same set up and data, he would find .RV.12846 —R^y.ut as the combined effect of education and income, •RV.12845 — -RV.2846 as the net effect of education (i.e., over and above that ofincome aswell as the covariates of class membership), and •RV.12845 — R*Y-IW for the net effect of income, each F-testable as before. Thus, one man's main effect is another man's covariate. The MR approach to ACV-like problems opens up possibilities for statistical control not dreamed of in ACV. We have just seen how purely nominal or qualitative variables (class membership) can serve as covariates. Beyond this, we can apply other principles which have been adduced above: (a) Any aspects of data can, by appropriate means, be represented as independent variables. (&) Any (sets of) independent variables can serve as covariates by priority assignment in variance accounting. Thus, for example, one can make provision for a covariate being nonlinearly related to F (and/ or to other independent variables) by writing a polynomial set of independent variables and giving the set priority; or, one can carry two variables and their interaction as a covariate set; or, one can even carry as a covariate a variable for which there are missing data by representing the two aspects of such a datum as two independent variables and giving them priority. 
Finally, one can combine the priority principle with those of contrast coding to achieve analytic modes of high fidelity to substantive research aims. The ACV assumption that the regression lines (more generally, surfaces) of the covariate ([/) on F have the same slopes (more generally, regression parameters) between groups (F) is equivalent to the hypothesis of no significance for the set of wo interaction independent variables. This hypothesis can beF-tested as a Set B following the inclusion of U as Set A, using Formula 7. 440 JACOB COHEN DISCUSSION In the introduction it was argued that MR and AV/ACV are essentially identical systems, and so they are, at least in their theory. In the actual practice of the data-analytic art, many differences emerge, differences which generally favor the MR system as outlined above. Before turning to these differences, a closer look at their similarity in regard to statistical assumptions is warranted. This article has concerned itself only with the fixed-model AV/ACV, wherein it is assumed that inference to the population about the independent variables is for just those variables represented (and not those variables considered as samples) and that values on these variables are measured without error. This means that in a MR whose set of X{ include quantitative variates (e.g., scores), the population to which one generalizes, strictly speaking, is made up of cases having just those X{ values, only the F values for any given combinations of values for the Xi varying; moreover, the F distribution (and only this distribution) is assumed normal and of equal variance for all the observed combinations of Xi values. These seem, indeed, to be a constraining set of assumptions. However, the practical effect on the validity of the generalizations which one might wish to draw is likely to be vanishingly small. It seemslikely that the substantive generalizations made strictly for the particular vectors of Xi values in the rows of the basic data matrix of the sample would hold for the slightly differing values which the population would contain if the sampling is random. As for the normality and variance homogeneity assumptions for F, the robustness of the F test under conditions of such assumption failure is well attested to (for a summary, see Cohen, 1965, pp. 114-116). Particularly when reasonably large samples are used, itself desirable to assure adequate statistical power, no special inhibition need surround the drawing of inferences from the usual hypothesis testing, certainly no more so than in AV. A discussion of the practical differences between MR and AV is best begun with a consideration of the nature of classical fixed-model AV. Its natural use is in the analysis of data generated by experimental manipulations along one or more dimensions (main effects), resulting in subgroups of observations in multifactor cells, treatment combinations. Each main effect is paradigmatically a set of qualitative distinctions along some dimension. These dimensions are conceptually independent of each other, and since they are under the control of the designer of the experiment, the data can, in principle, be gathered in such a way that the dimensions are actually mutually orthogonal in their representation in the data. (This condition is met by the proportionality of cell frequencies in all two dimensional subtables.) This also results in interactions being orthogonal to each other and to the main effects. 
Thus, the paradigm is of a set of batches (one batch per AV main effect or interaction) of qualitative independent variables, all batches mutually orthogonal. Now, under such conditions, one can, as illustrated above, analyze the data by MR, but there is no advantage in so doing. The AV can be seen as a computational shortcut to an analysis by the linear model which analyzes by batches and capitalizes on the fact that batches are orthogonal. Thus, the classical fixed factorial AV is a special simplified case of MR analysis particularly suited to neat experimental layouts, where qualitative treatments are manipulated in appropriate orthogonal relationships. Later refinements allows for quantitative independent variables being exploited by trend analysis designs, but these, too, demand manipulative control in the form of equally spaced intervals in the dimension and equal sized samples per level if the computational simplicity is to be retained. These designs are quite attractive, not only in their efficiency and relative computational simplicity, but also in the conceptual power they introduced to the data analyst, for example, interactions, trend components. They were presented in excellent applied statistics textbooks. Inevitably, they attracted investigators working in quite different modes, who proceeded to a Procrustean imposition of such designs on their research. A simple example (not too much of a caricature) may help illustrate the point. Dr. Doe is investigating the effects of Authoritarianism (California F scale) and IQ on a cognitive style score (F), using high school students as subjects. He is particularly interested in the F X IQ interaction, that is, in the possibility MULTIPLE REGRESSION IN DATA ANALYSIS 441 that fYF differs as a function of IQ level. He gives the three tests, and proceeds to set up the data for analysis. He dichotomizes the F and IQ distributions as closely as possible to their medians into high-low groups and proceeds to assign the Y scores into the four cells of the resulting 2 X 2 fixed factorial design. He then discovers that the number of cases in the highlow and low-high cells distinctly exceed those in the other two, an expression of the fact that F and IQ are correlated. He must somehow cope with this disproportionality (nonorthogonality). He may (a) throw out cases randomly to achieve proportionality or equality; (6) use an "unweighted means" or other approximate solution (Snedecor, 1956, pp. 385- 387); or (c) "fit constants by least squares" (Snedecor, 1956, pp. 388-391; Winer, 1962, pp. 224-227), which is, incidentally, an MR procedure. Clearly, this is a far cry from experimentally manipulated qualitative variables. These are, in fact, naturally varying correlated quantitative variables. This analysis does violence to the problem in one or both of the following ways: 1. By reducing the F scale and IQ to dichotomies, it has taken reliable variables which provide graduated distinctions between subjects over a wide range, and reduced them to two-point (high-low) scales, squandering much information in the process. For example, assuming bivariate normality, when a variable is so dichotomized, there is a reduction in rVx, the criterion variance it accounts for, and hence in the value of F in the test of its significance, of 36%. This wilful degradation of available measurement information has a direct consequence in the loss of statistical power (Cohen, 1965, pp. 95-101,118). 2. 
Other approximations suffer from these and/or other statistical deficiencies or distortions. If Dr. Doe uses the MR-equivalent exact fitting-constants procedure, he has still given up computational simplicity and, of course, the measurement information lost to dichotomization. If he seeks to reduce the latter, and also to allow for the possibility of nonlinearity of the Y on Xi regressions, by breakdown of IQ and/or the F scale into smaller segments, say quartiles, his needs for equality of intervals and of cases will be frustrated, and he will not be able to find a computational paradigm, which, in any case, would be very complicated. It seems quite clear that, however considered, the conventional AV mode is the wrong way to analyze the data.

On the other hand, the data can be completely, powerfully, and relevantly analyzed by MR. A simple analysis would involve setting X1 = IQ, X2 = F scale score, and X3 = (IQ)(F). By finding R²Y.123 − R²Y.12 and testing it for significance (or, equivalently, by testing the significance of the partial regression coefficient of X3), he learns how much the interaction contributes to the accounting of Y variance, and whether that contribution is significant. Determining the values of r²Y1, r²Y2, R²Y.12, R²Y.12 − r²Y1, and R²Y.12 − r²Y2, and testing each for significance, fully exploits the information in the data at this level. If he believes it warranted, he can add polynomial terms for IQ and F score and their interaction in order to provide for nonlinearity in any of the relationships involved.
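A minimal sketch of such an MR analysis follows (Python, using the numpy and scipy libraries; the simulated sample size, coefficients, and variable names are hypothetical and merely stand in for Dr. Doe's actual measures). It computes the zero-order and multiple R² values named above and F-tests the interaction's increment to R² by the usual test for an added set of independent variables, which is the role Formula 7 serves in this article.

    import numpy as np
    from scipy import stats

    def r2(y, X):
        # R-squared of y regressed (with intercept) on the columns of X.
        Xc = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        resid = y - Xc @ beta
        return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

    # Simulated stand-ins for Dr. Doe's measures; all values are hypothetical.
    rng = np.random.default_rng(2)
    n = 300
    iq = rng.normal(100.0, 15.0, n)
    f_scale = 120.0 - 0.3 * iq + rng.normal(0.0, 10.0, n)  # F scale, correlated with IQ
    y = 0.05 * iq - 0.10 * f_scale + 0.002 * iq * f_scale + rng.normal(0.0, 2.0, n)

    x1, x2 = iq, f_scale
    x3 = x1 * x2                                       # carrier of the IQ x F interaction

    r2_y1 = r2(y, x1)                                  # zero-order r-squared for IQ
    r2_y2 = r2(y, x2)                                  # zero-order r-squared for F scale
    R2_12 = r2(y, np.column_stack([x1, x2]))           # R-squared for Y on X1, X2
    R2_123 = r2(y, np.column_stack([x1, x2, x3]))      # ... and on X1, X2, X3

    # F test of the interaction's increment to R-squared (one added variable).
    df2 = n - 3 - 1
    F = (R2_123 - R2_12) / ((1.0 - R2_123) / df2)
    p = stats.f.sf(F, 1, df2)
    print("r2(Y,X1) =", round(r2_y1, 3), " r2(Y,X2) =", round(r2_y2, 3))
    print("R2(Y.12) =", round(R2_12, 3), " R2(Y.123) =", round(R2_123, 3))
    print("interaction increment: F(1, %d) = %.2f, p = %.4f" % (df2, F, p))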
Another practical difference between MR and AV/ACV is with regard to computation. The MR procedure, in general, requires the computation and inversion of the matrix of correlations (or of sums of squares and products) among the independent variables, a considerable amount of computation for even a modest number of independent variables. It is true that classical AV, whose main effects, interactions, polynomial trend components, etc., are mutually orthogonal, capitalizes on this orthogonality to reduce substantially the computation required. But whatever computational reduction there is in AV or MR depends directly on the orthogonality of the independent variables, which, as we have seen, is restricted to manipulative experiments, and is by no means an invariant feature even of such experiments. However, given the widespread availability of electronic computer facilities, the issue of the amount of computation required in the analysis of data from psychological research dwindles to the vanishing point, and is replaced by problems of programming.

The typical statistical user of a typical computer facility requires that a computer program which will analyze his data be available in the program library. Such programs will have been either prepared or adapted for the particular computer configuration of that facility. Unfortunately, it is frequently the case that the available AV program or programs will not analyze the particular fixed AV design which the investigator brings. Some AV programs are wanting in capacity in number of factors or levels per factor, some will handle only orthogonal designs, some will handle only equal cases per cell, some will do AV but not ACV, and some of those that do handle ACV can handle only one or two covariates. Many will not handle special forms of AV, for example, Latin squares.

On the other hand, even the most poorly programmed scientific computer facility will have at least one good MR program, if for no other reason than its wide use in various technologies, particularly engineering. All the standard statistical program packages contain at least one MR program. Although these vary in convenience, efficiency, and degree of informativeness of output, all of them can be used to accomplish the analyses discussed in this article. In contrast to the constraints of AV programs, the very general MR program can be particularized for any given design by representing (coding) those aspects of the independent variables of interest to the investigator according to the principles which have been described.

A note of caution: as we have seen, given even a few factors (main effects of nominal variables or linear aspects of quantitative variables), one can generate very large numbers of distinct independent variables (interactions of any order, polynomials, interactions of polynomials, etc.). The temptation to represent many such features of the data in an analysis must be resisted, for sound research-philosophical and statistical reasons. Even in researches using a relatively large number of subjects (n), a small number of factors (nominal and quantitative scales) can generate a number of independent variables which exceeds n. Each esoteric issue posed to the data costs a df which is lost from the error estimate, thus enfeebling the statistical power of the analysis. This, ultimately, is the reason that it is desirable, in research that is to lead to conclusions, to state hypotheses which are relatively few in number. This formulation is not intended to indict exploratory studies, which may be invaluable; but, by definition, such studies do not result in conclusions, but in hypotheses, which then need to be tested (or, depending on the research context, cross-validated).

If one analyzes the data of a research involving 100 subjects by means of MR, and utilizes 40 independent variables, what does one conclude about the 4 or 5 of them which prove to have partial regression weights "significant" at the .05 level? Certainly not that all of them are real effects, when one realizes that the overall null hypothesis leads to the expectation that 5% of 40, or 2, will be "significant" by chance. But which two?

A reasonable strategy depends upon organizing a hierarchy of sets of independent variables, ordered, by sets, according to a priori judgments. Set A represents the independent variables which the investigator most expects to be relevant to Y (perhaps all or some of the main effects and/or linear aspects of continuous variables). These may be thought of as the hypotheses of the research, and the fewer the better. Set B consists of next-order possibilities (perhaps lower-order interactions and/or some quadratic aspects). These are variables which are to be viewed less as hypotheses and more as exploratory issues. If there is a Set C (perhaps some higher-order interactions and/or higher-degree polynomials), it should be thought of as unqualifiedly exploratory. (If there are covariates in the design, they, of course, take precedence over all these sets and would enter first.) The "perhaps" in the parenthetical phrases of this paragraph is included because it is not a mechanical ordering that is intended. In any given research, a central issue may be carried by an interaction or polynomial aspect while some main effect may be quite secondary. In most research, however, it is the simplest aspects of the factors which are most likely to occupy the focus of the investigator's attention. In any case, the decision as to what constitutes an appropriate set depends both on research-strategic issues that go to the heart of the substantive nature of the research and on subtle statistical issues beyond the scope of this article; the latter are discussed by Miller (1966, pp. 30-35).

The independent variables so organized, one first does an MR analysis for Set A, then for Sets A + B, then for Sets A + B + C. Each additional set is tested for its increment to R² by means of the F test of Formula 7. A prudent procedure would then be to test for significance the contribution of any single independent variable in a set only if the set yields a significant increment to R². A riskier procedure would be to dispense with the latter condition, but then the results would clearly require cross-validation.
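The mechanics of this set-wise procedure are easily sketched. The illustration below (Python with the numpy and scipy libraries; the data, the composition of Sets A, B, and C, and all names are hypothetical) enters the sets cumulatively and tests each increment to R² with the usual F test for an added set of independent variables, the function served here by Formula 7.

    import numpy as np
    from scipy import stats

    def r2(y, X):
        # R-squared of y regressed (with intercept) on the columns of X.
        Xc = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        resid = y - Xc @ beta
        return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

    # Hypothetical data: two Set A variables (the hypotheses), two Set B variables
    # (a quadratic aspect and a low-order interaction), two exploratory Set C variables.
    rng = np.random.default_rng(3)
    n = 150
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    y = 0.6 * x1 + 0.4 * x2 + 0.3 * x1 * x2 + rng.standard_normal(n)

    sets = [("A", np.column_stack([x1, x2])),
            ("B", np.column_stack([x1 ** 2, x1 * x2])),
            ("C", np.column_stack([x2 ** 2, x1 ** 2 * x2]))]

    R2_prev, k_prev = 0.0, 0
    X_so_far = np.empty((n, 0))
    for label, new_set in sets:
        X_so_far = np.column_stack([X_so_far, new_set])
        k_now = X_so_far.shape[1]
        R2_now = r2(y, X_so_far)
        k_added, df2 = k_now - k_prev, n - k_now - 1
        # Increment F test: is the gain in R-squared from the newly added set significant?
        F = ((R2_now - R2_prev) / k_added) / ((1.0 - R2_now) / df2)
        p = stats.f.sf(F, k_added, df2)
        print("through Set %s: R2 = %.3f, increment F(%d, %d) = %.2f, p = %.4f"
              % (label, R2_now, k_added, df2, F, p))
        R2_prev, k_prev = R2_now, k_now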
REFERENCES

BOTTENBERG, R. A., & WARD, J. H., JR. Applied multiple linear regression. (PRL-TDR-63-6) Lackland Air Force Base, Texas, 1963.
CATTELL, R. B. Psychological theory and scientific method. In R. B. Cattell (Ed.), Handbook of multivariate experimental psychology. Chicago: Rand McNally, 1966.
COHEN, J. Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology. New York: McGraw-Hill, 1965.
CURETON, E. E. On correlation coefficients. Psychometrika, 1966, 31, 605-607.
DRAPER, N., & SMITH, H. Applied regression analysis. New York: Wiley, 1967.
EDWARDS, A. L. Experimental design in psychological research. (Rev. ed.) New York: Rinehart, 1960.
HAYS, W. L. Statistics for psychologists. New York: Holt, Rinehart & Winston, 1963.
JENNINGS, E. Fixed effects analysis of variance by regression analysis. Multivariate Behavioral Research, 1967, 2, 95-108.
LI, J. C. R. Statistical inference. Vol. 2. The multiple regression and its ramifications. Ann Arbor, Mich.: Edwards Bros., 1964.
McNEMAR, Q. Psychological statistics. (3rd ed.) New York: Wiley, 1962.
MILLER, R. G., JR. Simultaneous statistical inference. New York: McGraw-Hill, 1966.
PETERS, C. C., & VAN VOORHIS, W. R. Statistical procedures and their mathematical bases. New York: McGraw-Hill, 1940.
SAUNDERS, D. R. Moderator variables in prediction. Educational and Psychological Measurement, 1956, 16, 209-222.
SNEDECOR, G. W. Statistical methods. (5th ed.) Ames: Iowa State College Press, 1956.
SUITS, D. B. Use of dummy variables in regression equations. Journal of the American Statistical Association, 1957, 52, 548-551.
WINER, B. J. Statistical principles in experimental design. New York: McGraw-Hill, 1962.

(Received November 13, 1967)