Statistics in Political Science
By H. T. Reynolds

1 Introduction

Political science is universally recognized as a quantitative discipline that uses statistical and mathematical techniques extensively [1]. The widespread application of quantitative techniques, however, is a relatively recent phenomenon. Indeed, apart from a few notable exceptions, statistical analyses did not appear in the literature much before 1950 and did not become common until the following decade. Ironically, this remains true even though the science of statistics itself originated in the study of politics [23]. As might be expected, the initial works were rather rudimentary, relying for the most part on descriptive methods rather than classical inference. The situation had changed dramatically by the 1960s, when most subfields of political science were using statistical procedures widely and with considerable sophistication. Although political scientists made few original contributions to formal statistics and borrowed heavily from related fields such as sociology and economics (See Uncertainty in Economics), they nevertheless produced innumerable novel and fruitful applications of a vast array of quantitative methods. Indeed, the topic is so integral to the profession that its study has become virtually a subfield in its own right. Statistics, it is safe to say, is the main analytic tool employed in political science.

Yet its coming was not altogether peaceful. The rapid rise of quantification sparked considerable controversy. For years, particularly between 1955 and 1970, the discipline found itself divided into two camps, behavioral and traditional. Among other things, behavioralists emphasized systematic observation, rigorous and precise concept formation, empirical verification of propositions, and theory building. Statistics seemed ideal for accomplishing these ends, especially since voluminous census and survey data were becoming available. Statistics also seemed appropriate for dealing with the indeterminacies that characterize much of human behavior. On the other hand, critics charged that quantification often leads to gross oversimplifications and that the subtle but essential vagaries of politics get lost too easily in mathematical models. The behavioralists' findings, these critics held, may be statistically correct, but they are usually trivial or irrelevant or both. Furthermore, the traditionalists despaired that so much time was spent on "technique" that the equally important task of judgment was ignored. This debate has now become quieter, as each side has conceded points to the other without entirely abandoning its original position. Political scientists generally see the advantages but also the very real limitations of statistical analysis. Despite the considerable progress in understanding made possible by statistics, the mysteries of political phenomena seem as deep as ever.

2 Statistical Analysis in Political Research

The role that statistics plays is perhaps nowhere better illustrated than in the area of public opinion and elections (See Election Projections).
To be sure, the methods are found in virtually all subfields: public administration, policy analysis, international relations, legislative and judicial behavior, and comparative politics. But in the study of attitudes and voting, one of the first topics to be approached in this fashion, one sees the diversity and sophistication with which statistics is applied to political science. One can also appreciate some of the achievements and failures, the hopes and disappointments, and the future prospects of a humanistic discipline trying to become scientific.

The earliest attempts to study voting and public opinion, like so many other parts of political science, relied heavily on intuition, subjective observation, and untested generalizations. There were, of course, pathbreakers in the 1930s and 1940s whose work encompassed multivariate methods. Gosnell and Gill [11], for example, studied voting patterns in Chicago with the help of simple and partial correlation coefficients and a new "method called multiple factor analysis, which is now being perfected by Professor L. L. Thurstone" (p. 979). Similarly, V. O. Key's seminal work, Southern Politics [16], gave political scientists fresh ideas about how to approach public opinion data. What really encouraged the development of statistical applications, however, was the advent of the sample survey, a technique that soon produced an abundance of quantifiable data.

The first uses of this information were rather elementary. Most emphasis fell on descriptive statistics and frequency distributions. The early voting studies, for example, relied primarily on complex bar graphs rather than, say, regression coefficients to present multivariate results. (See, among others, Lazarsfeld et al. [17], Berelson et al. [2], and Campbell et al. [4].) Then, and even to a large extent now, there was little classical hypothesis testing. Except for the commonly calculated chi-square statistic, used to test for an association between two variables, political analysts treated hypothesis testing as secondary to parameter estimation. Indeed, they were taught early on that tests such as the chi-square statistic could be misleading, since large samples, which were increasingly prevalent, inflated the numeric values of the results. Consequently, political scientists have not spent much effort on test theory. One seldom sees simultaneous inference or Bayesian techniques in political analysis. Furthermore, there is relatively little experimentation and hence only modest use of experimental designs and analysis-of-variance methods.

Instead of stressing test or decision theory, political scientists in the early period turned to measures of central tendency, variation, frequency distributions, cross-classifications, and especially measures of association between two variables. When truly quantitative (interval-level) data were available, they calculated the Pearsonian (product-moment) correlation coefficient; when they had only categorical data, they calculated nominal and ordinal measures of association developed by, among others, Goodman and Kruskal [10], Kendall [15], and Stuart [26]. These indices showed how strongly two variables—party affiliation and occupation, for example—were related. Knowing the strength of an association was considered more important than knowing only that the variables had a statistically significant relationship.
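As a minimal illustration of this kind of measure, the following Python sketch computes Goodman and Kruskal's gamma for a small ordered cross-classification; the table is invented for illustration and is not drawn from any of the studies cited above.

```python
import numpy as np

def gamma(table):
    """Goodman-Kruskal gamma for an ordered r x c contingency table."""
    t = np.asarray(table, dtype=float)
    concordant = discordant = 0.0
    rows, cols = t.shape
    for i in range(rows):
        for j in range(cols):
            concordant += t[i, j] * t[i + 1:, j + 1:].sum()  # pairs ordered alike
            discordant += t[i, j] * t[i + 1:, :j].sum()      # pairs ordered oppositely
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical cross-classification: rows are occupational status (low to high),
# columns are strength of party attachment (weak, moderate, strong).
table = [[30, 25, 10],
         [20, 30, 20],
         [10, 25, 30]]
print(round(gamma(table), 3))
```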
Soon, regression analysis became popular because it allowed one to study a single dependent variable (e.g., whether people voted for a Democratic or Republican candidate) as a function of a set of independent variables (e.g., partisanship, socioeconomic status, and attitudes). But even simple one-equation regression was not entirely satisfactory, because there was also an interest in explaining the relationships among the predictors. Partly for these reasons, political scientists in the early 1960s began to turn to causal and path analysis. Based on the work of Wright [28], Simon [24], and Blalock [3], causal analysis requires that the investigator hypothesize causal dependencies among a set of variables. If certain assumptions and conditions hold, one can derive predictions and test them against observed data. The predictions usually involve partial correlation coefficients that are readily calculated from a basic set of two-variable correlations. Causal models, which are graphical representations of the interrelationships among the variables, have both analytic and heuristic value, since they require one to make explicit assumptions that are all too often left implicit. The procedure provides a method for translating a verbal theory into a mathematical one.

An important example of causal analysis is Goldberg's [8] study of American voting behavior. Although Goldberg believed that a person's vote depended causally on various socioeconomic and political characteristics, he recognized that the antecedent factors also had a causal ordering among themselves. Figure 1 (an example of this work) shows the model that seemed to fit his data best. Arrows, representing direct causal linkages, point in one direction because this model, like most of the original models developed by political scientists, allowed only for one-way, as opposed to reciprocal, causation. The model makes very stringent assumptions about the error or disturbance terms (here represented by e's), such as that they are not directly interrelated among themselves. If these assumptions and other conditions hold, then one can derive predictions about the magnitude of certain partial correlation or path coefficients. By comparing the observed and predicted values, one can decide whether the model is tenable. If it is not, then arrows representing direct causal linkages are added or deleted and new predictions derived.

Figure 1. Goldberg's causal model of voting behavior. Key: 1, FSC, father's sociological characteristics; 2, FPI, father's party identification; 3, RSC, respondent's sociological characteristics; 4, RPI, respondent's party identification; 5, RPA, respondent's partisan attitude; 6, RV, respondent's vote for president in 1956.

Predicted equation   Observed value
r41.23 = 0           −0.017
r51.234 = 0           0.083
r61.2345 = 0         −0.019
r52.134 = 0           0.032
r62.1345 = 0          0.053
r53.124 = 0          −0.073
r63.1245 = 0         −0.022
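Each predicted equation in the table states that a particular partial correlation vanishes; r41.23, for instance, is the correlation between variables 4 and 1 after controlling for variables 2 and 3. A minimal Python sketch of how such a partial correlation can be computed from a matrix of zero-order correlations (the matrix below is a stand-in built from random data, not Goldberg's survey):

```python
import numpy as np

def partial_corr(R, i, j, controls):
    """Partial correlation of variables i and j given the control set,
    computed from a zero-order correlation matrix R (0-based indices)."""
    idx = [i, j] + list(controls)
    P = np.linalg.inv(np.asarray(R)[np.ix_(idx, idx)])  # inverse of the submatrix
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

# Stand-in correlation matrix among six variables (placeholder data only).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
R = np.corrcoef(X, rowvar=False)

# Analogue of the prediction r41.23 = 0 (variables numbered 1-6 as in the key).
print(round(partial_corr(R, 3, 0, controls=[1, 2]), 3))
```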
Causal and path analyses of these sorts have appeared frequently in the social sciences since their original development in the early 1960s. Besides being helpful analytic devices, they have heuristic purposes as well. Causal models have been employed in the investigation of panel data (i.e., surveys of the same respondents at different times), measurement error, and unmeasured or unobserved variables. As useful as the method is, however, it has several drawbacks. Concentrating on partial correlation coefficients can obscure the importance of estimating the underlying parameters. After all, since a causal model represents a system of equations, one wants to know more than how well the data fit a particular model; one also needs good estimates of its structural coefficients. Others complained that the necessary assumptions were unrealistic. For instance, is a person's partisan attitude only an effect of his party identification? Could there not exist a reciprocal relationship in which the variables are both causes and effects of each other? Finally, these models were analyzed by ordinary least-squares (OLS) regression, which gives inefficient or inexact results when the assumptions about the error terms are not satisfied. For these and other reasons, political scientists, along with other social scientists, turned to general structural equation models [6]. This more general approach contains causal and path analysis as special cases.

Once again, the motivation and limitations of structural equation modeling are best illustrated by an example. Page and Jones [22], in what may become another methodological milestone in political research, wanted to explain voting behavior using explanatory variables similar to Goldberg's. But troubled by what they considered an "incorrect" assumption of one-way causation, they developed a model that allowed reciprocal causal effects among the main variables. (See Figure 2, in which each arrow stands for a causal path and represents a structural coefficient. The sets of exogenous variables are assumed to be prior to the endogenous variables and orthogonal to one another.) Since the structural equations contained too many endogenous unknowns to be estimated from the observed data, the authors had to add exogenous variables—that is, variables that were presumably causally unaffected by the endogenous factors—and use two-stage and three-stage least squares to obtain the estimates.

Figure 2. Page and Jones' nonrecursive voting model. Key: CPD, comparative policy distances; CPA, current party attachment; CCE, comparative candidate evaluation; V, vote; X, Y, Z, exogenous variables.

What is noteworthy about this work is not its substantive conclusions, which will surely be debated, but its demonstration of how far the study of politics has progressed. Starting with simple cross tabulations, political scientists then advanced to the analysis of single-equation models, then to multiequation recursive (one-way causal) systems, and now to dynamic simultaneous equation models.
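A minimal sketch of the two-stage least-squares idea behind such estimates, using invented data rather than Page and Jones' survey: the endogenous explanatory variable is first regressed on the exogenous instruments, and its fitted values then replace it in the equation of interest.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Invented illustration: z1 and z2 are exogenous instruments; u is an
# unobserved disturbance that makes x endogenous (correlated with y's error).
z1, z2, u = rng.standard_normal((3, n))
x = 0.8 * z1 - 0.5 * z2 + u + rng.standard_normal(n)
y = 1.0 + 2.0 * x - 1.5 * u + rng.standard_normal(n)   # true effect of x is 2.0

def ols(X, y):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Stage 1: project the endogenous regressor onto the exogenous variables.
Z = np.column_stack([ones, z1, z2])
x_hat = Z @ ols(Z, x)

# Stage 2: use the fitted values in place of the endogenous regressor.
b_2sls = ols(np.column_stack([ones, x_hat]), y)
b_ols = ols(np.column_stack([ones, x]), y)
print("OLS slope:", round(b_ols[1], 2), " 2SLS slope:", round(b_2sls[1], 2))
```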
Along the way, they recognized the limitations of OLS regression and are now using advanced estimation techniques: multistage least squares, full-information maximum likelihood, instrumental variables, and the like. They are also increasingly using time-series analysis, because many variables of interest have an ordering based on time. Time-series data appear regularly in the study of defense expenditures, the amount of violence in the international system, economic conditions and election outcomes, and public policy analysis.

Although the growing sophistication in the use of regression analysis has yielded many results, it has also raised troublesome questions. To estimate the causal processes, for example, Page and Jones [22] introduced a number of presumably exogenous variables. But in doing so, they introduced additional assumptions about how these variables are related to those already in the system and to each other. This is a perennial problem in political research: the phenomena of most interest are quite complex, and any effort to describe them mathematically inevitably leads to restrictive and sometimes unrealistic assumptions. One difficulty may be overcome at the cost of introducing new ones. The authors criticized previous investigators for assuming one-way causation while themselves making a host of other assumptions. Not surprisingly, then, their results have not found universal acceptance, and the definitive explanation of voting seems as elusive as ever.

Paralleling the application of regression techniques has been the widespread use of factor analysis. Its acceptance has been motivated by two related considerations. First, there is a belief that although political behavior manifests itself in numerous ways, it can still be explained by reference to a much smaller number of underlying factors; the researcher's objective is to use factor analysis to identify and name those factors. A second, perhaps more practical motivation is the utility of factor analysis as a data-reduction technique. A typical public opinion survey contains scores of questions. If they are all intercorrelated, the correlation matrix may be reduced to a smaller number of factors, thereby simplifying the interpretation of the data. Typical is Finifter's [7] study of political alienation. Starting with 26 items that appear to measure the concept, she used factor analysis to isolate two factors, labeled "powerlessness" and "perceived normlessness."

Factor analysis and related techniques—discriminant, cluster, and canonical analysis—are becoming quite popular for a variety of purposes. They are used in legislative and judicial studies to identify voting factions or clusters of issues that define policy dimensions, in public opinion research to find ideological structures among individuals or groups, and in comparative politics to compare the attributes of nations on various political and socioeconomic variables. Factor analysis is also used as a tool for improving political measurement and exploring the consequences of measurement error.
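A minimal sketch of factor analysis as a data-reduction device, in the spirit of such studies but with simulated item responses in place of real survey data; scikit-learn's FactorAnalysis is assumed to be available and is only one of several routines that could be used.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n = 400

# Simulated stand-in for a battery of attitude items: two latent factors each
# drive half of ten observed items, plus item-specific noise.
latent = rng.standard_normal((n, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 0.8            # items 1-5 load on the first factor
loadings[1, 5:] = 0.8            # items 6-10 load on the second factor
items = latent @ loadings + 0.5 * rng.standard_normal((n, 10))

# Reduce the ten intercorrelated items to two factors and inspect the loadings.
fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_, 2))   # two clear clusters of items should appear
```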
Another relatively recent development is the emergence of multivariate procedures for the analysis of categorical (as opposed to quantitative) data. A common problem in political research takes this form: suppose that one wants to analyze candidate preference (Democrat or Republican), partisanship (Democratic, Republican, independent), income (high, medium, low), region (South and non-South), and race (white, nonwhite). If a sufficiently large number of cases are available, then one can form a 2 × 3 × 3 × 2 × 2 cross-classification or contingency table. But what is the best way to analyze it? Two general approaches have been proposed. The first, based on log-linear models (i.e., models for the logarithm of the cell frequencies or some function of them), uses maximum likelihood estimation [9]. It permits one to estimate and test the significance of "main" effects and various types of interaction. One might want to know, for example, whether the relationship between preference and partisanship is the same for different combinations of the demographic factors. The second approach also answers this type of question, and others as well, but uses weighted least squares as the estimating procedure [12].

At first sight, the multivariate analysis of categorical data appears to be a godsend to social scientists, who have an abundance of ordinal and nominal scales to contend with. The methods are also useful in analyzing two-way cross-classifications such as mobility tables and panel studies. Neither approach has displaced regression and factor analysis, however. Perhaps they have arrived on the scene too recently. Furthermore, both require large sample sizes and lead to the estimation of innumerable parameters. Political scientists may prefer the relative simplicity of least squares. Nevertheless, it seems certain that weighted least squares and maximum likelihood analysis of categorical data will increase substantially in the future.
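A minimal sketch of the log-linear (maximum likelihood) approach, assuming statsmodels is available: the cell counts of a cross-classification are modeled by a Poisson regression on the classifying factors, and the deviance of the independence model serves as a likelihood-ratio test of association. The counts below are invented and cover only a 2 × 2 slice (preference by region) of the larger table described above.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Invented counts for a 2 x 2 preference-by-region cross-classification.
cells = pd.DataFrame({
    "pref":   ["Dem", "Dem", "Rep", "Rep"],
    "region": ["South", "NonSouth", "South", "NonSouth"],
    "count":  [120, 200, 180, 150],
})

# Log-linear model of independence: log(expected count) = main effects only.
indep = smf.glm("count ~ pref + region", data=cells,
                family=sm.families.Poisson()).fit()

# Its deviance is a likelihood-ratio test of the preference-by-region
# association on one degree of freedom; a large value rejects independence.
print(round(indep.deviance, 2))
```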
3 Statistics and Measurement in Political Science

In attempting to achieve the rigor and precision of the natural sciences, political science faces a formidable obstacle. The problems of greatest interest usually involve considerable complexity and subtlety. Indeed, there is often very little agreement about what certain concepts mean, much less about how to measure them. How, for instance, does one conceptualize and measure "power," "equality," or "democracy"? Nearly everyone agrees, then, that empirical measures of political concepts are fallible indicators subject to at least three types of error.

Errors may arise in the first place from an inappropriate level of measurement. Most statistical tools assume that the variables are measured quantitatively on interval or ratio scales. But political phenomena are not so easily quantified, and the best one may achieve is a classification of the subjects on a nominal or ordinal scale. There has been considerable to-do about whether such scales should be treated as interval level or analyzed by methods designed explicitly for categorical data. Many investigators simply assign numbers to categories, especially if they have dichotomous data, and use the usual statistical formulas. Others prefer to rely strictly on categorical statistical procedures. A particularly troublesome question remains, however. Suppose that an attitude or behavior is actually continuous, rather than discrete, but is measured as though it consisted of only two categories. Such a case might arise in a sample survey in which people's preferences are classified as "pro" or "con," when, in fact, innumerable shades of opinion may exist. Is it legitimate to make inferences about substantive phenomena on the basis of such data—no matter what statistic is used? This is still an unresolved issue.

A second source of error is random and nonrandom measurement error. Sample surveys and census materials, the source of much data, frequently reflect selective retention, biases, disinterest, incomplete record keeping, or other mistakes. One must assume that empirical data at best imperfectly represent the underlying properties and that error variance will be a sizable portion of a variable's total variance. Thus, it is not surprising that reliability and validity have become important issues in the scientific study of politics. They have been dealt with in a variety of ways. There is perhaps less concern in the literature with "classical measurement theory" than one would imagine for a discipline that relies so heavily on questionnaires. In fact, many early investigators took the reliability of their measures for granted; they seldom bothered to compute or report standard reliability checks such as test-retest, split halves, alternative forms, or reliability coefficients (e.g., Cronbach's alpha) [18,20,21]. Fortunately, the 1970s witnessed a reaction against this laissez-faire attitude toward measurement, and findings that had for years been accepted as dogma were challenged on the grounds that the data were faulty.

Another source of measurement error is more troublesome. Empirical measures often serve as indirect or substitute indicators of the true concepts. Political scientists find themselves in the place of a Boy Scout who must measure the height of a tree from its shadow, only they frequently have only a vague notion of how the shadow's dimensions relate to the tree's size. Not surprisingly, then, elaborate statistical manipulations cover but do not hide the profundities of politics.

Confronted with these sources of error, political scientists have followed several paths. Many have tried to overcome them by using multiple indicators of a single theoretical concept. The earliest efforts in this direction were the construction of measurement scales from Likert (agree-disagree) items and scalogram analysis [25]. In the latter procedure, questions are related to one another in such a way that, ideally, an individual who replies favorably to item 2 also responds favorably to item 1; an individual who favors item 3 would also favor items 1 and 2. A Guttman scale is thus both cumulative and presumably unidimensional in that it purports to measure a single underlying continuum. Guttman scales were used to measure individual behavior (e.g., attitudes), policy dimensions in Congress and the judiciary, the attributes of nations, and a host of other concepts. Although scalogram analysis was popular in the formative years of quantitative political research, it is seen less widely now, because it offers little guidance for selecting items that are likely to form a scale and because the criteria for assessing the adequacy of a scale sometimes give very misleading results. More important reasons for its decline are the assumption of unidimensionality and the emergence of multidimensional scaling techniques.
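A minimal sketch of these two measurement ideas, using invented dichotomous responses: a simple coefficient of reproducibility summarizes how closely a battery of items follows the cumulative (Guttman) pattern, and Cronbach's alpha estimates the internal-consistency reliability of the summed scale.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented 0/1 responses: 200 respondents, four items ordered from easiest
# (most widely endorsed) to hardest, constructed to be roughly cumulative.
position = rng.standard_normal(200)
difficulty = np.array([-1.0, -0.3, 0.3, 1.0])
X = (position[:, None] + 0.4 * rng.standard_normal((200, 4)) > difficulty).astype(int)

# Coefficient of reproducibility: compare each response pattern with the ideal
# cumulative pattern implied by the respondent's total score.
scores = X.sum(axis=1)
ideal = (np.arange(X.shape[1]) < scores[:, None]).astype(int)
reproducibility = 1 - np.abs(X - ideal).sum() / X.size

# Cronbach's alpha: internal-consistency reliability of the summed scale.
k = X.shape[1]
alpha = k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / scores.var(ddof=1))

print(round(reproducibility, 2), round(alpha, 2))
```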
Multidimensional scaling, a product of psychometric research, has found a natural audience in political science. It permits one to locate individuals on several attitudinal dimensions instead of just one. Weisberg and Rusk [27], as an example, wanted to know which factors (partisanship, ideology, issues, personality) affected people's evaluations of 12 candidates. They also wanted to know how many dimensions people use to evaluate candidates. Employing a nonmetric multidimensional scaling routine, the investigators found that candidate evaluations, as well as certain attitude preferences, required a two-dimensional space, with one axis representing traditional left-right politics and the other a newer left-right division.

Although multidimensional scaling, along with factor analysis, is currently among the most widely employed techniques for the construction of measurement scales, causal and path analysis have become a main means of assessing the reliability of measures and of exploring the causes, effects, and possible remedies of various kinds of measurement error. Using path analysis, Costner [5] provided criteria for identifying biases due to certain types of nonrandom measurement error through the use of multiple indicators. Similarly, Jöreskog's [13,14] analysis of covariance structures permits one to identify errors and assess their statistical significance. Variations of both approaches are found throughout political science.
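A minimal sketch of nonmetric multidimensional scaling in this spirit, assuming scikit-learn is available: the pairwise dissimilarities among five candidates are invented (in practice they might be derived from differences in feeling-thermometer ratings), and only their rank order is used to recover a two-dimensional configuration.

```python
import numpy as np
from sklearn.manifold import MDS

# Invented symmetric dissimilarities among five candidates (zero diagonal).
D = np.array([
    [0.0, 1.0, 3.0, 4.0, 4.5],
    [1.0, 0.0, 2.5, 3.5, 4.0],
    [3.0, 2.5, 0.0, 1.5, 2.0],
    [4.0, 3.5, 1.5, 0.0, 1.0],
    [4.5, 4.0, 2.0, 1.0, 0.0],
])

# Nonmetric MDS: find a two-dimensional configuration whose interpoint distances
# preserve, as far as possible, the rank order of the dissimilarities.
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)
print(np.round(coords, 2))   # one row of coordinates per candidate
```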
4 The Future of Statistics in Political Science

Statistics in political science, it seems safe to say, will flourish. Quantitative methods have shown their value, and researchers steadily expand the areas to which they are applied. Obviously, this survey barely touches the breadth of applications. At the same time, political scientists have become more knowledgeable and careful users. A factor that encouraged the proliferation of statistical applications in political science was the availability of computers and especially of prewritten program packages. These systems, increasing in power and sophistication, relieved social scientists of the burden of learning the computational and theoretical underpinnings of many procedures. By relying on these preprogrammed instructions, one could produce reams of computer output without extensive training in statistics. Textbooks that supplied computing formulas but few proofs and little theory also fueled the growth. But the discipline discovered that although "results" come early and profusely, it is often hard to make theoretical sense of them. Today, political scientists, having a more solid background in mathematics, statistics, and computer science, tend to be more self-conscious and responsible about their methods and methodology. They have learned to be thoughtful about the assumptions, theory, and interpretation of the techniques they use, and when problems arise, they are better able to consult with specialists in the statistical sciences. Being more confident, they have also become more eclectic and adaptable, borrowing from whatever field promises the best solutions to particular problems. The result is more rigorous and precise research.

At the same time, quantitative research is becoming less dogmatic. In the field's infancy, empirical political scientists tended to dismiss nonquantitative studies as too impressionistic, unsystematic, and parochial. Even worse, technique at times dominated content; findings were taken as proven not because the evidence necessarily supported them but because they flowed from esoteric multivariate procedures involving seemingly endless equations and matrix operations. Statistical significance passed for substantive significance, while intuition and common sense fell by the wayside. Fortunately, by now everyone seems to realize the very real limits to which the statistical sciences can be pushed in human affairs.

The heart of the matter is of course the multifarious nature of the topic. It is not simply that politics is difficult to conceptualize and measure, although that is certainly true; measurement continues to be the Achilles' heel of the discipline, and without improvement in this area progress will be slow indeed. But just as important, statistical generalizations often obscure the nuances, the idiosyncrasies, the special cases—in short, the very things that make politics so interesting and significant. Many investigators have found that, however elaborate the research design, their results are seldom convincing unless they are embedded in a contextual grasp of the problem, and that one simply cannot analyze disembodied numbers without grossly misreading the phenomena they measure. In the future, therefore, the discipline will continue to borrow heavily from the statistical sciences, but it will apply this wherewithal with greater care, imagination, and, most important, appreciation of the limits of its valid application.

References

[1] Alker, H. R., Jr. (1975). In Handbook of Political Science, Vol. 7: Strategies of Inquiry, F. I. Greenstein and N. W. Polsby, eds. Addison-Wesley, Reading, MA, pp. 139–210. (A good history and survey of statistical applications in political science.)
[2] Berelson, B., Lazarsfeld, P. F., and McPhee, W. N. (1954). Voting. University of Chicago Press, Chicago, IL.
[3] Blalock, H. M., Jr. (1964). Causal Inferences in Nonexperimental Research. University of North Carolina Press, Chapel Hill, NC. (A landmark in the social sciences, this work introduced political scientists to causal analysis.)
[4] Campbell, A., Gurin, G., and Miller, W. E. (1954). The Voter Decides. Row and Peterson, Evanston, IL.
[5] Costner, H. L. (1969). Amer. J. Sociol., 75, 245–263.
[6] Duncan, O. D. (1975). Introduction to Structural Equation Models. Academic Press, New York.
[7] Finifter, A. W. (1970). Amer. Polit. Sci. Rev., 64, 389–410.
[8] Goldberg, A. S. (1966). Amer. Polit. Sci. Rev., 60, 913–922.
[9] Goodman, L. A. (1970). J. Amer. Statist. Ass., 65, 226–256.
[10] Goodman, L. A. and Kruskal, W. H. (1954). J. Amer. Statist. Ass., 49, 732–764.
[11] Gosnell, H. F. and Gill, N. N. (1935). Amer. Polit. Sci. Rev., 29, 967–984.
[12] Grizzle, J. E., Starmer, C. F., and Koch, G. G. (1969). Biometrics, 25, 489–504.
[13] Jöreskog, K. G. (1969). Psychometrika, 34, 183–202.
[14] Jöreskog, K. G. (1970). Biometrika, 57, 239–251.
[15] Kendall, M. G. (1955). Rank Correlation Methods, 2nd ed. Charles Griffin, London.
[16] Key, V. O., Jr. (1949). Southern Politics. Alfred A. Knopf, New York.
[17] Lazarsfeld, P., Berelson, B., and Gaudet, H. (1944). The People's Choice. Columbia University Press, New York. (One of the first works to apply quantitative analysis in a systematic way to the study of electoral politics.)
[18] Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, MA.
[19] Miller, R. G., Jr. (1966). Simultaneous Statistical Inference. McGraw-Hill, New York.
[20] Nunnally, J. C. (1964). Educational Measurement and Evaluation. McGraw-Hill, New York.
[21] Nunnally, J. C. (1978). Psychometric Theory. McGraw-Hill, New York.
[22] Page, B. I. and Jones, C. C. (1979). Amer. Polit. Sci. Rev., 73, 1071–1089.
[23] Pearson, K. (1978). The History of Statistics in the 17th and 18th Centuries, E. S. Pearson, ed. Macmillan, New York.
[24] Simon, H. (1957). Models of Man: Social and Rational. Wiley, New York.
[25] Stouffer, S. A., et al. (1949). Measurement and Prediction. Studies in Social Psychology during World War II, Vol. 4. Princeton University Press, Princeton, NJ.
[26] Stuart, A. (1953). Biometrika, 40, 106–108.
[27] Weisberg, H. F. and Rusk, J. G. (1970). Amer. Polit. Sci. Rev., 64, 1167–1185.
[28] Wright, S. (1934). Ann. Math. Statist., 5, 161–215.