Political Analysis (2007) 15:365-386
doi:10.1093/pan/mpm007
Advance Access publication April 2, 2007

Improving Data Quality: Actors, Incentives, and Capabilities

Yoshiko M. Herrera
Department of Government, Harvard University, Davis Center, #S301, 1730 Cambridge Street, Cambridge, MA 02138
e-mail: herrera@fas.harvard.edu (corresponding author)

Devesh Kapur
Centre for Advanced Study of India, University of Pennsylvania, 3600 Market Street, Suite 560, Philadelphia, PA 19104
e-mail: dkapur@sas.upenn.edu

This paper examines the construction and use of data sets in political science. We focus on three interrelated questions: How might we assess data quality? What factors shape data quality? And how can these factors be addressed to improve data quality? We first outline some problems with existing data set quality, including issues of validity, coverage, and accuracy, and we discuss some ways of identifying problems as well as some consequences of data quality problems. The core of the paper addresses the second question by analyzing the incentives and capabilities facing four key actors in a data supply chain: respondents, data collection agencies (including state bureaucracies and private organizations), international organizations, and finally, academic scholars. We conclude by making some suggestions for improving the use and construction of data sets.

"It is a capital mistake, Watson, to theorise before you have all the evidence. It biases the judgment." (Sherlock Holmes in "A Study in Scarlet")

"Statistics make officials, and officials make statistics." (Chinese proverb)

Authors' note: For generous comments at many stages in the paper, the authors would like to thank Dawn Brancati, Bear Braumoeller, Kanchan Chandra, Jorge Dominguez, Errol D'Souza, Richard Grossman, Anna Grzymala-Busse, Andrew Kydd, David Laitin, Daniel Posner, Jasjeet Sekhon, Hillel Soifer, Jessica Wallack, and Steven Wilkinson, as well as the Comparative Politics Research Workshop at Harvard University and the anonymous reviewers from Political Analysis. The authors take full responsibility for any errors.
An earlier version of this paper was presented at the American Political Science Association Annual Meetings, Boston, MA, August 2002.

© The Author 2007. Published by Oxford University Press on behalf of the Society for Political Methodology. All rights reserved.

1 Introduction

Modern capital markets and political science have at least one thing in common: a dependence on data. But the resemblance stops there. When data quality declines in capital markets, or when investors and analysts become insufficiently critical about price/earnings ratios and revenues, debacles like Enron and WorldCom can happen. In cases like these, executives felt greater incentives to meet short-term targets for earnings growth than they did to produce accurate data. The consequences: shareholder lawsuits, regulatory and accounting reforms, jail sentences for executives, and investors losing their shirts. When data quality slips in political science, or when political scientists are insufficiently critical about the way their data were created or how they should be used, very little happens. Inattentiveness to data quality is, unfortunately, business as usual in political science.

We propose a heightened critical attention to data construction and a new way of looking at it: as an operation performed by data actors in a data supply chain. We know that data do not "grow on trees," yet we must occasionally remind ourselves that data are produced by people and entities according to their own incentives and capabilities. Despite strong disciplinary consensus about the behavioral effects of incentives, their effect on data actors has been woefully understudied by political scientists. Like all organizations, those that produce data are prone to problems of agency, bureaucratic incentives, shirking, and multiple principals and goals, all of which are likely to shape their output, that is, data. By turning our critical gaze inward, to the creation of the everyday data we take for granted, we hope to show the necessity of focusing on data quality, discipline-wide. Ideally, we would like to make routine in the discipline such questions about data quality as: Who produced the data? Why? What were the producers' incentives and capabilities? Did they work as an independent agency, or were they influenced by external actors? Did the producers have incentives to shape the data rather than just report it? Such critical questioning is long overdue.

Although we advocate greater critical attention to the construction of data sets, we want to emphasize that our aim is not to question the utility of "large-N studies," where the large number of observations is critical for reliably addressing problems related to bias and measurement error. However, we do believe that there are serious weaknesses in many data sets used in the cross-country regressions currently in vogue in political science. Therefore, addressing the strategic construction and use of data speaks directly to the validity of results.

The paper is divided into two sections. We first outline some problems with existing data set quality, including issues of validity, coverage, and accuracy, and we discuss some ways of identifying problems as well as some consequences of data quality problems. Subsequently, we examine how the incentives and capabilities facing four key actors in a data supply chain affect data quality: respondents, states (including bureaucracies and politicians), international organizations (IOs), and finally, academics. We conclude by making some suggestions for improving the use and construction of data sets.
2 Problems with Data Sets and Why They Matter

Problems of data quality are manifest and significant in a wide range of settings, from information collected by IOs and governments to the data sets compiled by individual scholars. They affect all sorts of indicators, from those more difficult to measure, like identity variables, to the more objective indicators, such as economic variables. The measurement of data quality, however, has barely begun. Our framework for measuring it has three elements: validity, coverage, and accuracy. Validity refers to the relationship between theoretical concepts and collected information; coverage refers to the completeness of data sets; and accuracy refers to the correctness or avoidance of errors in data sets. We end this section of the paper by covering some ways to recognize quality problems and a brief discussion of consequences.

2.1 Validity

Validity is at the heart of data quality because the objective of information collection in social science research is to enable one to draw inferences and test theories. If the connection between what is actually measured and what is purported to have been measured is tenuous (or absent altogether, in some cases), then the empirical enterprise breaks down. Gary Goertz (2005) has outlined three levels of social science research that provide a useful framework for thinking about validity: concepts, dimensions, and data. We can consider validity in terms of the relationship between each of these levels. For example, take democracy as the concept of interest to us. Depending on our definition of the concept, dimensions might include fairness of elections or civil liberties; data for the first dimension might include the incumbency win rate or the margin of victory, whereas rights enumerated in the constitution, such as universal suffrage, or the number of protests might serve as data for the second dimension. Scholars might disagree on the definition of the concept itself and subsequently on which dimensions should be used to measure it. They also might disagree on the data to be used for any particular dimension. This framework suggests that assessing the validity of data sets must begin with the definition of concepts.

Unfortunately, many important concepts in political science remain undertheorized. There is still little theoretical agreement on basic definitions of concepts such as "rule of law," "corruption," and "identity." Consider "caste," for instance, a concept that many people believe plays an important role in social, political, and economic outcomes in India. Is caste a self-understanding or a socially ascribed category? An ethnic distinction or a class distinction? The answers to these definitional questions indicate different dimensions and types of data that would be needed to assess the real-world presence or absence of castes. Even "objective" variables such as gross national product (GNP) are not immune to such conceptual complexities, although decades of standardization of the System of National Accounts have led us largely to forget the tremendous amount of coordinated effort that went into defining GNP.

Despite the fundamental importance of concept-appropriate choices for measurement, too little attention has been paid to the construction of some of the most widely used indices and data sets.
Some authors, notably Munck and Verkuilen (2002a), have suggested general standards for assessment of data sets and outlined a framework for evaluation that specifically draws attention to issues of conceptualization, measurement, and aggregation. And the issue of measurement validity has been addressed by Adcock and Collier (2001) in the APSR. Unfortunately, however, much more attention to these methodological issues is needed in practice.

The Polity data series, one of the most widely used indices of democracy and authoritarianism in political science, offers a typical case of concept validity problems accompanied by a widespread absence of scrutiny by users. The analysis by Gleditsch and Ward (1997) of the third edition of Polity warned that "the analytical composition of the well known democracy and autocracy scores is not upheld by an empirical analysis of the component measurements." Moreover, they argued that "democracy, as measured by the Polity indicators, is fundamentally a reflection of decisional constraints on the chief executive. The recruitment and participation dimensions are shown to be empirically extraneous despite their centrality in democratic theory" (Gleditsch and Ward 1997, 361). Our intention is not to single out Polity.1 Although this finding about a data set that many of us take for granted is important, it is hardly unique.

1. The data sets we use as examples in this paper were chosen not because they are particularly error prone, but rather because they are among the most widely used in political science. Discussion of their shortcomings is thus both relevant and illustrative for the entire field.

Another case of troubled concept validity was covered in a symposium on identity in the American Political Science Association (APSA) Comparative Politics Newsletter (Symposium 2001). The authors pointed out that although identity researchers predominantly rely on the constructivist paradigm, quantitative indices such as the Ethno-Linguistic Fragmentation index (ELF) remain primordialist.2 The same can be said for the continued use of the very limited race and ethnicity categories on the U.S. census to measure "diversity." There appears to be a frustrating disconnect between conceptual and methodological advancements on the one hand and the continued use of theoretically outdated dimensions on the other.

Measurement validity addresses the next level: the relationship between dimensions and collected data. Despite the fact that measurement validity is a basic lesson in any introductory data analysis course in political science, the use of imprecise or concept-inappropriate indicators remains widespread in the field. This is evident in overt cases where data simply do not match a dimension. But there are many more subtle cases, such as level-of-analysis problems where, for example, national data may be substituted for regional data, or where recent annual data are not available and thus old data are used repeatedly. For example, caste data were last collected in India on the 1931 census, but, as the most current data available, these 1931 data continue to be used to explain contemporary phenomena.

A related issue in measurement validity is the problem of consistency, comparability, or reliability across countries.
In brief, what is measured in one country, although it may go by the same name, may not be what is being measured in another country. For example, data purporting to measure "human capital" mainly depend on measures of education. However, the most frequently used measure, "years of schooling," cannot distinguish between years spent in a madrasa in Pakistan and years spent in a magnet school in the United States. Moreover, the production of precise numbers to code survey responses masks the incomparability that occurs when identical questions are interpreted differently by respondents.3

2. Efforts are underway to address this problem. For example, alternatives to the ELF include the politically relevant ethnic group data set by Posner (2004); a constructivist data set on ethnic voting by Chandra et al. (2005); attempts to measure identity more generally (Abdelal et al. 2006); and an index of ethnonationalist mobilization (Cederman and Girardin 2005).

3. There have been important recent attempts to address the problem of cross-cultural comparability of survey questions. See King and Wand (2004) and King et al. (2004).

2.2 Coverage

A second major component of data quality is issue coverage, that is, the presence or absence of the data needed for a given research question. In many cases data on key variables of interest to scholars and governments are either incomplete or simply not collected at all, especially for certain types of countries. In the worst cases, meaningful work on many important questions cannot be done at all.

For most countries in the world, variation within countries cannot be analyzed, since key political indicators, such as substate or regional measures of democracy, rule of law, and corruption, are not available. Similarly, beyond macroeconomic data, we lack information on several important economic indicators. We all recognize that a significant part of production and trade in less developed countries (LDCs) is carried out in the informal sector, yet there is a dearth of data on this vital part of the economy.

Some endemic coverage gaps are specific to certain parts of the world. Demographic data older than 20 years, such as the size and growth rate of the population, cannot be unambiguously determined in more than a few African countries, with the margin of error often near 20%. The same is true of social statistics, such as those relating to literacy, school enrollment ratios, and poverty levels (Chander 1988). Closed societies also limit the availability of information. And finally, with the increasing use of online statistics and the prominence of the English language among Western social scientists, statistics that are not in English are more likely to be ignored than those that have been translated into English.

2.3 Accuracy

The final consideration of data quality is accuracy, or the avoidance of outright errors at the level of data collection and presentation. Some errors are the result of methodological reforms whose new measurements indicate changes despite real-world constancy, and others are the result of biased data due to the subjectivity of respondents.

Apparent changes in data are sometimes due to changes in methodology. Measured infant mortality in the Soviet Union rose in the 1970s. According to Velkoff and Miller, however, Soviet infant mortality in all probability remained flat; what changed was the way in which it was measured (Velkoff and Miller 1995).
Similarly, one reason why the growth of services may be a statistical artifact is the increased level of outsourcing by manufacturing firms. For instance, if General Motors spins off its design unit, the data will show a decline in manufacturing and an increase in services, even though little has changed in the real economy. And since many transactions in services are in the (unreported) informal sector, an economy that sees a shift from the informal to the formal sector will show faster growth in measured services than the actual change warrants.

The subjectivity of respondents has been amply documented in survey research and poses obvious problems for data quality. Though underacknowledged, such bias is no less rife among the population of "experts" whose responses underpin widely used data sets like the Freedom House democracy ratings and Transparency International's corruption index. The generous Freedom House scores toward certain Central American countries in the 1980s may have reflected Cold War, that is, anti-communist, understandings of democracy among experts; similarly, Transparency International largely measures bureaucratic corruption, rather than overall corruption, due to the types of people who give assessments. Close examination of these indices reveals that measures that rely on expert opinion can be biased by factors that affect the population of experts.

This criticism is not directed against using expert respondents to construct indices. Our intention, rather, is to emphasize the need to be circumspect and explicit about the subjective construction of such quantitative data sets, and thereby to better understand underlying biases and ultimately improve the construction and use of such data. Ostensibly objective data sets that quantify complex concepts such as "democracy," "governance," and "rule of law" are often based either on subjective surveys or on indexes whose weights are also subjective. That analysis is subjective is not a problem per se, but that it is often taken or imagined to be objective obscures the challenges of using data wisely to apprehend real-world phenomena.

2.4 Recognizing Quality Problems

How then does one identify problems with data quality? The two likeliest ways are by looking for discrepancies among sources or inconsistencies within publication series, and by looking into external citation of problems.

Often one need only be a careful reader to uncover discrepancies, either within the data produced by a single organization or between different organizations claiming to measure the same thing. The International Monetary Fund (IMF)'s primary statistical publication, International Financial Statistics, provides many instances where data for the same year do not match across books from different years. Similarly, there are sometimes unexplained discrepancies between the print and electronic versions. This problem is by no means unique to the IMF. The World Bank offers data on GNP per capita growth rates for countries where underlying GNP data do not exist; it also reports the share of agriculture in gross domestic product (GDP) for countries with nonexistent GDP estimates (Kapur, Lewis, and Webb 1997).
Moreover, there is no evidence that these anomalies have ever been corrected. Another way to spot quality problems is to look for discrepancies between organizations: between 1981 and 1986, the IMF's GDP estimates for Zaire were about 60% of those of the World Bank.

Unfortunately, many government statistical offices do not fare much better than the IMF and World Bank, and there is no indication that the quality of statistics is improving over time. In India, the Central Statistical Organization (CSO) produces data on GNP and other macromeasures of the economy, whereas the National Sample Survey Organization (NSSO) provides micromeasures of the economy through surveys on consumption, education, and so on. In principle, the consumption data estimated by the macroapproach of the CSO and the microdata aggregated from household surveys conducted by the NSSO should be equal, although some variation is inevitable. A few decades ago that was the case. More recently, the discrepancy between NSSO and CSO data has grown increasingly substantial: 1999/2000 NSSO figures showed consumption at just half the level of the CSO estimates. The weaknesses of India's national accounts data are also evident in the growing discrepancy between the expenditure and production estimates of GDP. A recent World Bank report points out that choosing between these estimates is not easy and that "the only conclusion that can be made confidently is that [India's] statistical architecture, once a model for other developing countries, needs more consistency checks" (World Bank 2000, para. 1.19). Whether or not India's people are getting poorer, "its statistics unquestionably are" (Aiyar 2001).

A second way to recognize quality problems is to review the data's external citation by scholars. Reviews and analyses of existing data sets are on the rise, a trend we strongly encourage. Munck and Verkuilen (2002a), for example, have evaluated nine data sets on democracy.4 Some analyses have been cautionary. Assessing the latest, fourth edition of the Polity series, Treier and Jackman (2006) concluded that "skepticism as to the precision of the Polity democracy scale is well-founded, and that many researchers have been overly sanguine about the properties of the Polity democracy scale in applied statistical work." Others have been more forceful in their criticism. In assessing the Bretton Woods institutions, T. N. Srinivasan (1994, 4) stated bluntly: "publications of international agencies, such as the Human Development Report [of the United Nations Development Programme (UNDP)] and World Development Indicators of the World Bank, give a misleading, if not altogether false, impression of the reliability, comprehensiveness of coverage, comparability and recency of the data, and fail to warn the unwary users of the serious deficiencies in the data."

4. Munck and Verkuilen (2002a) was followed by three discussion pieces as well as a response by the authors: see Coppedge (2002), Marshall et al. (2002), Munck and Verkuilen (2002b), and Ward (2002). For another evaluation of democracy measures, see Collier and Adcock (1999). For a painstaking analysis of trade statistics, see Yeats (1990) and Rozanski and Yeats (1994); on comparisons of governance indices, see Kaufmann, Kraay, and Zoido-Lobaton (1999a, 1999b) and Kaufmann, Kraay, and Mastruzzi (2002); on rule of law, see Berkowitz, Pistor, and Richard (2003); and on ethnicity, see Laitin and Posner (2001) and Wilkinson (2002).
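As a concrete illustration of the first strategy, the sketch below is ours rather than the authors': it flags country-years where two sources that claim to measure the same quantity diverge by more than a chosen tolerance. The source names, series, and numbers are hypothetical stand-ins, not real IMF or World Bank figures.

```python
# Illustrative sketch of a cross-source discrepancy check; all values are hypothetical.
def flag_discrepancies(source_a, source_b, tolerance=0.10):
    """Return (key, a, b) triples where the two sources differ by more than
    `tolerance`, expressed as a share of their average."""
    flags = []
    for key in sorted(source_a.keys() & source_b.keys()):
        a, b = source_a[key], source_b[key]
        if abs(a - b) / ((a + b) / 2) > tolerance:
            flags.append((key, a, b))
    return flags

# Hypothetical GDP estimates (billions of current dollars) from two sources.
gdp_source_a = {("Zaire", 1984): 5.1, ("Zaire", 1985): 5.4, ("Ecuador", 1985): 17.9}
gdp_source_b = {("Zaire", 1984): 8.3, ("Zaire", 1985): 8.9, ("Ecuador", 1985): 18.1}

print(flag_discrepancies(gdp_source_a, gdp_source_b))
# [(('Zaire', 1984), 5.1, 8.3), (('Zaire', 1985), 5.4, 8.9)]
```

Even a simple check of this kind, run routinely before analysis, surfaces the sort of cross-source gaps described above; it does not say which source is right, only that the two cannot both be.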
2.5 Consequences

Problems of low data quality, that is, problems with validity, coverage, and errors, will affect the quality of political science research. Where concepts are not clearly defined, we should expect a lot of variance in choices of dimensions, as well as inconsistencies in the measurement of data across time and space. These quality problems will also affect the analysis and conclusions that can be drawn from the data. And when data sets are used in quantitative analysis, there are also technical consequences.

In terms of research results, several technical issues are relevant to the construction of data sets: measurement bias, measurement error and correlation of errors, and pooling or aggregation of measures. Measurement bias is conceptually separate from measurement error. Where the measures themselves are biased, there is a host of complex issues, and the consequences depend on how the measures are biased and how the models are parameterized.5

5. For a more general discussion of measurement bias, see White (1994).

The consequences of measurement error depend on where the errors are located and with what they are correlated. It is worth briefly considering the following types of errors.

1. Measurement error in the dependent variable: In this case the regression coefficients will have larger variances, leading to greater uncertainty regarding inference validity.

2. Measurement error in uncorrelated independent variables: As long as the independent variable is not correlated with any other independent variable, measurement error will result in a biased coefficient for that variable, and the coefficient will be attenuated toward zero. In other words, if one is certain that the independent variables are not correlated, measurement error in one such variable will make the estimate of that variable's effect biased downward, but the estimates of the other variables will be unaffected.

3. Measurement error among correlated independent variables: If the independent variables are correlated, then even random, unbiased measurement error in one single variable will lead to biased coefficients, and the direction of the bias is difficult to determine; in some cases the coefficients may even have the wrong sign (see Achen 1985). In other words, if independent variables are correlated, and they almost always are in nonexperimental settings, then measurement error in only one variable can make the estimates of that variable's effect, as well as other variables' effects, inconsistent.

4. Measurement error in independent variables correlated with measurement error in the dependent variable: If this occurs, then the correct specification assumption is violated and, in general, all the coefficients are biased.

Given these issues, the cross-country pooling of data, and in particular the combination of data from Organisation for Economic Co-operation and Development (OECD) countries with LDC data, may be problematic if it entails correlated measurement error or bias. If measures associated with LDCs have greater measurement error than the data from OECD countries (for reasons outlined below), and if the measurement error is correlated with other variables of interest, and perhaps with the dependent variable, then the results may be biased and inconsistent. And it is worth repeating that this is the case even if the measurement error itself is not biased.6
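Points 2 and 3 above can be seen in a short simulation. The sketch below is our own illustration, not part of the original article, and uses entirely invented data: classical measurement error is added to one regressor, which attenuates its own estimated coefficient and, because the regressors are correlated, also distorts the coefficient on the error-free regressor.

```python
# Illustrative sketch of attenuation and spillover bias from measurement error
# in one independent variable; all data are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x1 = rng.normal(size=n)                                   # "true" regressor 1
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)             # regressor 2, correlated with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)              # both true coefficients equal 1.0

x1_observed = x1 + rng.normal(size=n)                     # x1 measured with random (unbiased) error

def ols(y, regressors):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + regressors)
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols(y, [x1, x2]))           # roughly [0, 1.0, 1.0]: error-free data recover the truth
print(ols(y, [x1_observed, x2]))  # coefficient on x1 shrinks toward zero; coefficient on x2 is inflated
```

The second regression shows that random error in a single variable is enough to bias more than that variable's own estimate once regressors are correlated, which is the usual situation in cross-country work.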
6. We hasten to add that the discussion of the consequences of correlated measurement error is in regard to ordinary least squares- and maximum likelihood-type estimators, two very commonly used models in political science.

3 Data Actors and the Data Supply Chain: Incentives, Capabilities, and Consequences

Problems with data quality have not gone entirely unnoticed. Methodologists and statisticians are working to devise technical fixes for various problems in large data sets.7 And a variety of scholars have individually endeavored to improve upon existing data sets8 or to suggest novel indicators and measures.9 These painstaking efforts at evaluation and correction have so far received too little attention. The uncritical use of problematic data sets, without regard to these attempts at improvement, continues relatively unabated. Despite well-known problems, high-profile data sets like the Polity series retain, in the words of Treier and Jackman (2006, 22), "near-canonical status." All of which leads to a big question: why do these problems with data set quality persist?

Our answer to this question focuses on two factors: the incentives and capabilities of data actors. Data collection is of course costly, a factor which alone could explain some of the quality problems. But resources and budgets are not the only problem. Incentive structures facing both producers and users of data sets are an important part of the explanation as well: the incentives and capabilities of actors and institutions in the data supply chain have significant yet underacknowledged consequences for data quality.

Figure 1 schematically represents the supply chain of data production. It begins with original respondents: individuals, households, firms, and government agencies. Data collection agencies, both state statistical institutions and private firms, are the next links in the data chain. State agencies can be both respondents and suppliers of data. As we move upstream, these data are supplied to IOs and nongovernmental organizations (NGOs), which have emerged as critical repositories of comparable cross-national data sets. Academic scholars receive and share data with IOs, but sometimes also receive data directly from either state statistical offices or private data collection firms. Although academics also collect data directly from respondents, the substantial costs of putting together large data sets mean that their involvement is usually indirect, by way of technical advice and assistance to IOs, NGOs, and data collection agencies. Similarly, IOs and NGOs also assist and therefore influence data collection agencies. This explains the dotted lines going back toward data collection agencies. Below we discuss each of these data actors in terms of incentives, capabilities, and consequences, summarized in Table 1.

3.1 Respondents: Incentives

The incentives for respondents include opportunity costs, fear of punishment, political support, and material gain. Opportunity costs come into play when the incentives to respond at all are weak. This is often the case when respondents see no direct benefit in participating, as when households are asked to complete census forms, or firms are surveyed, without statutory provisions mandating participation. Census participation is encouraged by the threat of formal punishment in countries where answering the questionnaire is mandated by law.
7. See, for example, the preceding discussion, as well as Treier and Jackman (2006) on adjustments to the Polity IV series. For attempts to address contextually specific effects across contexts, see Wong and Mason (1991), King and Wand (2004), and King et al. (2004).

8. Examples of works attempting to update and amend the Correlates of War data set include Bueno de Mesquita (1981, 21) and Slantchev (2004).

9. There are far too many works to name here, but, for an example, see Mishler and Rose (2001) on measurement of political support in transitional regimes or Rose (2002/2003) on measurement of the informal economy in transitional regimes.

Fig. 1 Supply chain of data production (original respondents → data collection agencies → IOs and NGOs → scholars).

Ironically, economic deregulation and political liberalization can reduce incentives if deregulation removes the legal obligation to respond. This was the case with the 1989 USSR census compared to the 2002 Russian census. Participation was mandatory in the former, voluntary and, not surprisingly, lower in the latter. Before liberalization in India in 1991, licensing requirements mandated that firms fill out surveys. With delicensing and the abolition of the government agency formerly responsible for the surveys, the response rate fell, as the new agency lacked any statutory powers to compel responses (Nagraj 1999).

Mistrust of surveyors or fear of punishment for participation can be at work in both liberal and authoritarian regimes. In an environment where respondents do not trust surveyors or the state, they may be reluctant to respond openly to questions if they fear that the information might be used against them. Although this lack of trust is more likely in authoritarian regimes, it can also be a problem in democracies, where privacy concerns may be primary.

Pressure to comply with state directives or the need to secure political support may provide incentives for respondents to deliberately misreport data. The same logic that motivates households in China to underreport their number of children for fear of prosecution also moved firms in the USSR to overestimate production in order to fulfill planning targets. Similarly, in China, an audit probe of 100 state-owned enterprises in 2003 found that 81 had falsified their accounts, 69 of which reported nonexistent profits. Even allowing for selection bias in the firms audited, can we trust the data reported by the 300,000-odd firms in the state sector and, in turn, China's overall economic statistics (Kynge 1999)?

Material gain is another incentive that affects respondents. In many countries, especially where the boundary between the tax authorities and the statistical office appears fluid, private entrepreneurs will understate earnings and output to avoid taxes. This is not only the case in places like China or Russia, and tax avoidance is not the only possible material incentive. In countries with capital controls and exchange-rate distortions, trade data are especially likely to be manipulated by firms, through underinvoicing of exports and overinvoicing of imports. And the spate of corporate accounting scandals in the United States testifies to the power of incentives on data integrity, in this case the linkage between reported profit earnings and fat annual bonuses.
Beyond economics, data on identity groups are also subject to material incentives: for example, the wide array of compensatory (affirmative action) measures in India has moved many to strategically misrepresent their caste origin in order to exploit state benefits.

When incentives pull actors in different directions in different countries, cross-national data sets are susceptible to particularly skewed results. Data on global fishery catches collected by the Food and Agricultural Organization (FAO) are a good example of this. Most fishermen tend to underreport their catches, and consequently, most countries can be presumed to underreport their catches to the FAO. Yet the catch statistics reported by China to the FAO continued to climb from the mid-1980s until 1998. Watson and Pauly (2001) found that the difference had less to do with fish than with the structure of domestic incentives in China, especially the link between the promotion of fisheries officials and reported production increases. Statistics can thus be fishy in different ways depending on the different incentives for reporting across multiple countries.

Table 1 Actors, incentives, capabilities, and consequences in the data supply chain

Respondents (households, firms, and state employees)
  Incentives: opportunity costs; fear of punishment (mistrust of surveyors); political support; material gain
  Capabilities: time; knowledge; level of education/literacy; access to surveys; level of health
  Data quality problems (validity, coverage, and accuracy): lack of response; intentional misreporting; selection bias in responses

Data collection agencies (state bureaucracies or private firms)
  Incentives: internal organizational/professional norms; material gain; external pressure (from governments, society, and IOs)
  Capabilities: human capital; financial resources from governments, IOs, or researchers
  Data quality problems: lack of data collection or incomplete collection; unintentional errors; intentional misreporting/manipulation of data; selection bias in responses

IOs and NGOs
  Incentives: internal organizational/professional norms; support of donor states; cooperation of respondent states
  Capabilities: human capital; financial resources from donor states
  Data quality problems: lack of data collection; selection bias in responses

Academia
  Incentives: rewards for publication quantity; rewards for theoretical contribution; costs of data collection/improvement; (for junior scholars) support of tenured scholars
  Capabilities: time; research funding; existing data sets; skills and technology for quantitative analysis
  Data quality problems: lack of new data sets; continued use of low-quality data sets; misuse of data that do not match dimensions or concepts

3.2 Respondents: Capabilities

Respondents' resources and capabilities primarily consist of time, knowledge, level of education, access to surveys, and level of health. Respondents who work or are otherwise busy may have less time to answer surveys; this is true across countries and may be a problem for the sample if certain types of people respond less frequently. Knowledge is another resource that varies, leading not only to variance in accuracy of responses but also to variance in response rates, if less knowledgeable people are less willing to participate.
And knowledge may be related to level of education: illiterate people, for example, would be less able to fill out written surveys. Access to surveys might also vary, insofar as surveyors tend to be concentrated in larger urban areas rather than remote or rural locations. As respondents' capabilities vary, so will their responses, and if capabilities are not evenly distributed in populations of interest, there may be selection bias in the responses.

3.3 Respondents: Consequences

The incentives and capabilities of respondents can result in nonresponses, intentional misreporting, and selection bias. Overcoming these factors, where possible, will depend on giving respondents more resources and positive incentives for participation. Unfortunately, changing incentives and capabilities is likely to involve expensive structural and institutional change, and is therefore a complicated, long-term problem. Selection bias can at least be compensated for by a range of statistical techniques and technical solutions, such as targeting samples, but one has to be able to identify it first.

3.4 Data Collection Agencies: Incentives

Data collection agencies include state statistical offices as well as private firms and NGOs charged with producing statistics. The bureaucrats who staff these agencies may face internal organizational incentives, or external political and economic incentives, such as support of IOs or material gain.

Internal organizational incentives may include factors as basic as professionalism. Agencies where both workers and management care about professionalism and reputation will tend to uphold international statistical norms. The quality of work will be higher when statisticians want to be recognized for meeting international professional standards. These professional norms are not insignificant considering the generally low status and low pay of public-sector statisticians around the world, and may explain high-quality state statistics in relatively poor countries such as Ecuador.

Such high professional standards are, alas, rarely the case. Since many governments are inept, corrupt, and venal, especially in nondemocratic or poor countries, why would we expect their statistics departments to be substantially different? In other words, if the public sector in most LDCs is dysfunctional, in large part because of the inability or unwillingness to discipline shirking, we ought to expect similar behavior in those parts of the public-sector bureaucracy responsible for collecting data. Such situations, where even the principals are engaged in shirking, may lead to unintentional errors or incomplete data at best, or intentional misreporting at worst.

The integrity of a national statistical agency's data is also affected by the independence of the agency from its government, usually the executive. Compared to the large literature on central bank independence, little analysis has been done on the relative independence of national statistical agencies. Historically, state statistics developed to meet the specific needs of governments and hence were biased toward serving government goals. This problem of government pressure continues in many countries, especially nondemocratic ones. In China, for instance, it is still quite difficult for public organizations to exist independently of the Communist Party. Consequently, local party leaders are the direct superiors of local National Bureau of Statistics functionaries, making it difficult for statisticians to act independently of the Party's wishes.
Even in democracies, state statistics may be subject to political pressure. In the United States, recent scandals over the manipulation of the costs of a prescription drug plan or intelligence on Iraq have called into question the independence of politically sensitive data. In federal states generally, subnational governments may have incentives to misreport or manipulate data submitted to federal or national governments in order to maximize transfers from the federal government.10 Censuses may be particularly prone to such pressures because in many countries the allocation of state largesse, as well as political representation, is based on census data.

10. On incentives for revenue forecasts among U.S. states, see Wallack (2006).

In some cases, the political implications of certain data may simply render data collection impossible. Many countries omit census questions regarding ethnicity or religion due to potential political fallout over the results: for example, France does not ask the race or ethnicity of its citizens, and entire censuses have been stopped in countries such as Lebanon, Nigeria, and Pakistan because of fears that the results would favor certain groups.

IOs can also offer incentives to skew data. Central banks and finance ministries of countries undergoing an IMF program have an incentive to minimize their fiscal deficit data to meet IMF program targets, whereas European Union members have a similar incentive to meet the Maastricht criteria.

3.5 Data Collection Agencies: Capabilities

The capabilities of data collection agencies primarily consist of human capital and financial resources from governments, IOs, and scholarly researchers. Human capital is critical to the production of high-quality data. However, attracting high-quality individuals to work in government statistical agencies is a difficult task. Few would rank positions in state statistical agencies at the top of prestige hierarchies. In Russia, for example, the best statisticians (who have not gone to work for IOs) go to the Ministry of Finance or the Central Bank rather than the State Statistical Committee (Goskomstat). The latter's staff is overwhelmingly (90%) female, underscoring the well-known link between gender and occupational status. Russia is not alone on this issue: Rawski (2000) cites the Chinese case, where "the country's statistical agencies complain that firms assign often untrained staff to compile statistics, look for chances to cut positions assigned to statisticians, and refuse to submit standard reports." And China is much better able to compel compliance than most other countries.

In India's case, statisticians in the federal bureaucracy are recruited through an exam and interview conducted by a statutory autonomous body, the Union Public Service Commission. By any yardstick, the number of applicants taking exams for jobs in the federal government is extremely high (Table 2). However, as Table 2 indicates, in the case of the Indian Statistical Service, the number of applicants was the lowest and the application-to-post ratio the second lowest. Furthermore, it was the only service where the recommendation-to-post ratio was less than one, implying that qualified candidates were unavailable.
If a country of a billion people, which otherwise does not lack qualified professionals, cannot find fifty qualified statisticians annually to staff its statistical bureaucracy, what does that say about the statistical capabilities of other poor countries that are much less well endowed?

Table 2 Statistical capabilities in the government of India

Service/exam                 No. of posts   No. of applicants   Application-to-post ratio   Recommendation-to-post ratio
Civil Services*              411            309,507             753                         1
Indian Forest Service        32             44,098              1378                        1
Engineering Services         557            61,625              110                         1
Indian Statistical Service   50             1,370               27                          0.54
Geologist                    148            3,647               25                          1
Combined Medical Services    327            31,374              96                          1

Source: Union Public Service Commission, 51st Annual Report 2000-01, table following para. 2.7, p. 12. Note: we have omitted data for less important services. *The Civil Service Exam recruits India's elite federal bureaucracy, including the Indian Administrative Service, Indian Foreign Service, Indian Revenue Service, Indian Accounts and Audit Service, etc.

In addition to human capital, data collection agencies, and especially state statistical offices, compete for financial resources from governments, IOs, and researchers. More often than not, statistical offices are underfunded. We know that over the last two decades virtually all developing countries have undergone major financial and fiscal crises. When fiscally strapped countries have to cut their budgets, what are they likelier to cut: politically sensitive subsidies or support for institutional infrastructure, such as statistics departments? Indeed, when cast in such stark terms, this seems like a rhetorical question. Consider this comment on the state of support for the statistical system of a country whose "statistical agencies were having to make do with antiquated equipment, uncompetitive pay packages, and the elimination of less important (but still valuable) data series .... It was apparently easier [for that country] to subsidize [its] mohair industry, which cost more than the additional funding requested by the statistical agencies, than to ensure adequate data" (Swonk 2000). The comment was made of the political support for statistical offices in the United States. What then can we expect of poorer countries?

3.6 Data Collection Agencies: Consequences

When we consider the incentives facing data collection agencies as well as the generally weak capabilities of such agencies in terms of human and financial resources, there are several potentially negative consequences: lack of data collection or incomplete collection, unintentional errors, intentional misreporting or manipulation of data, and selection bias in responses. Lack of data collection or incomplete collection can be the result of a lack of resources, but these problems can also result from external pressure, as a way to hide embarrassing information about a state. Unintentional errors in the collection or processing of data are most likely to be the result of human or financial resource problems. Intentional misreporting and manipulation of data, however, are probably a result of external pressure.

Incentives that result in manipulation of data are especially manifest in those cases where the data are both a measure and a target. In pursuing the target, the measure, and with it the data, is invariably contaminated.
Hoskin (1996) writes that measures that are targets "precisely and systematically embody a conflation of the 'is' and the 'ought'; for their nature is simultaneously to describe and prescribe ... measures as targets also prescribe what ought to be." Consequently, when a measure becomes a target, it often ceases to be the appropriate measure.

This insight largely comes from Charles Goodhart's analysis of Margaret Thatcher's efforts to control inflation in Britain in the late 1970s by targeting the money supply. Goodhart argued that, although there was a stable link between money supply and inflation, it might not persist if the government were to try to control the money supply. Goodhart's Law states that "as soon as a particular instrument or asset is publicly defined as money in order to impose monetary control, it will cease to be used as money and replaced by substitutes which will enable evasion of that control" (Goodhart 1989). In other words, when the measure (money supply) became a target, it ceased to be a good measure (of inflation), breaking down the relationship between money supply and inflation.

In China, local bureaucracies are often charged with collecting data as well as meeting targets set by their political principals, thereby increasing the likelihood that the data are subject to Goodhart's Law. When Beijing established the objective of 8% annual growth as a "great political responsibility," targeting the measure (GDP growth) vitiated that measure, resulting in the "winds of falsification" that affected the country's statistical reporting system (Rawski 2000). For example, in 1997-1998 the average growth rate reported by all 32 of China's provinces, main cities, and regions was 9.3%, even while the state statistics bureau's GDP growth rate was 7.8%!

China is hardly an exception. Under IMF programs, fiscal deficits are a critical target and are therefore becoming less meaningful as a measure, as governments learn to game the target. In 1999 the new government in Pakistan discovered that the previous regime had fudged budget figures between 1997 and 1999 to meet IMF program targets, because budget deficits are a measure of the fiscal health of a country. In the EU, the rules of the Stability and Growth Pact were designed to ensure that countries had sustainable public finances. Any Euro-zone country reporting a deficit above 3% of GDP risks a large fine. However, "since countries collect their own numbers and report them to the EU, given the penalties of transgression, there is a clear incentive to cheat" (The Economist 2002), or to use such statistical sleight-of-hand as off-budget transactions, deferring liabilities, and so on. The point is that these actions may become more pronounced when there are targets, thereby undermining the validity of the measures.

Some of the data commonly used in political science are in fact such data-skewing targets of governments and data collection agencies. Taking into account the incentives on data quality when data are both a measure and a target gives us insight into the direction of the biases that are likely to occur in such cases. When targets are ceilings (such as fiscal deficits), the data are likely to have a downward bias. When targets are floors (such as social sector indicators), the data are likely to be biased upward.
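This directional claim is easy to see in a stylized simulation. The sketch below is our illustration, not part of the original article, and every number in it is invented: it models a ceiling-type target such as a deficit limit, where reporters whose true value exceeds the ceiling shade their reports toward it, so the reported average understates the true average.

```python
# Illustrative sketch only: how a ceiling-type target can bias reported data downward.
# All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
true_deficit = rng.normal(loc=3.2, scale=1.0, size=10_000)  # hypothetical true deficits (% of GDP)
ceiling = 3.0                                                # a Maastricht-style 3% ceiling

# Reporters at or below the ceiling report truthfully; those above it shift part of the
# excess off the books (off-budget transactions, deferred liabilities, and so on).
excess = np.clip(true_deficit - ceiling, 0.0, None)
reported_deficit = true_deficit - 0.7 * excess

print(round(true_deficit.mean(), 2), round(reported_deficit.mean(), 2))
# The reported mean falls below the true mean, with the gap concentrated exactly where
# the target binds; a floor-type target would produce the mirror-image upward bias.
```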
The quality of data can even be an indicator of the variable under investigation. Given that many governments, especially those in LDCs, suffer from limited capacities and weak institutions, we would a priori expect data-collecting institutions in LDCs also to be weaker. The quality of data produced by such states' statistical institutions might suffer from the same limited institutional capacity as the states themselves. The weak capacity of statistical agencies raises problems of endogeneity. Far too frequently, data are treated as exogenous to the problem being studied: in their work on governance indicators and institutional quality, Kaufmann, Kraay, and Zoido-Lobaton (1999a, 1999b) do not consider that where governance and institutional quality are weak, the quality of data is also likely to be weak, hence affecting their results.

3.7 IOs/NGOs: Incentives

IOs and NGOs play an important role in the collection and distribution of data sets across countries. Internal organizational incentives, such as professional norms, are as important for such entities as for state agencies, but IOs and NGOs are also subject to pressure from their several donor states. Although they are unlikely to be pressured to meet targets by governments, they do need the cooperation of states in order to receive state-collected data.

Sometimes the data collection work of IOs is biased toward supporting the concerns of their donor states, as is the case with government debt data. The World Bank's "Global Development Finance" data set (formerly the World Debt Tables) is an exhaustive resource for the external debt of developing countries, but it reflects in part the interests of creditor countries, which exercise greater influence on the institution. By contrast, internal debt data are still much less easily available.11 Similarly, there is simply no comparison in the data quality regarding the two principal cross-border traffic flows, capital and labor, the former reflecting the endowments of the capital-rich North and the latter those of the labor-rich South. It is therefore hardly surprising that data on international migration (labor) reflect many weaknesses in data quality.12

Additionally, IOs and NGOs must secure the cooperation of states that supply data. Poor states that produce less data, states that are at war or facing other kinds of devastation (drought, HIV/AIDS, etc.), and closed societies in general are all less likely to cooperate with IOs and NGOs by providing data or allowing them to work inside the country.

3.8 IOs/NGOs: Capabilities

Like state data collection agencies, the capabilities of IOs and NGOs consist primarily of human and financial resources. IOs such as the United Nations (UN), IMF, and World Bank tend to have more resources than NGOs, such as Human Rights Watch or Greenpeace. But there is of course variation across these organizations in terms of both human and financial resources.

3.9 IOs/NGOs: Consequences

The chief quality consequence for IOs and NGOs as data actors is a likely lack of data collection on topics not supported by donor states and for poor or inaccessible countries. This can lead to selection bias in responses across countries, as UN development and poverty data show. In 2000, the largest ever gathering of heads of state adopted the UN Millennium Declaration, aimed at advancing development and reducing poverty.
It soon became apparent, however, that many member countries lacked data on development and poverty, and IOs did not have the capabilities to compensate for this glaring lacuna. A recent UN analysis of the relevant indicators found that "not only are there significant gaps for every indicator, there are also extensive problems in relevance, accuracy, consistency and reliability" (UNDP 2003, 35). The sheer number of countries where this is the case is starkly illustrated in Table 3.

11. Evidence of this problem can be found in a recent paper by Brown and Hunter (1999), which uses the debt service ratio as a variable but ignores internal debt because those data are not as easily available as external debt data.

12. For a fuller discussion of data on international migration, see United Nations (2004).

Table 3 Data gaps in basic human development indicators, 1990-2001

Indicator                                                         Countries lacking trend data   Countries lacking any data
Children underweight for age                                      100                            46
Net primary enrollment ratio                                      96                             22
Children reaching grade five                                      17                             46
Births attended by skilled health personnel                       18                             55
Female share of nonagricultural wage employment                   100                            19
Urban HIV prevalence among pregnant women ages 15-24              51                             41
Population with sustainable access to an improved water source    100                            91
Population living on less than $1 a day                           62                             100

Note: A country is defined as having trend data if at least two data points are available, one in each half of the decade, and the two points are at least 3 years apart. Source: UNDP 2003, Box 2.1.

3.10 Academics: Incentives

Finally, let us turn inward and look at political scientists as data actors susceptible to the same range of incentives and capabilities as other actors. All sorts of actors and situations have been studied with regard to the role of incentives, but rarely have we taken a critical gaze to the effect of incentives on academic research, particularly with regard to our use and construction of data sets. The relevant incentives for academic scholars consist primarily of the following: rewards for publication quantity, rewards for theoretical innovation, rewards (or costs) for data collection and improvement, and support of other academics. This last incentive applies particularly to junior (untenured) scholars, who need the support of senior faculty.

It almost goes without saying that scholars at research institutions are under intense pressure to publish their work. Getting tenure, remaining employed, and receiving pay raises at a research institution depend largely on the number and quality of a scholar's publications. Quality of publications matters, but that quality is not judged on the basis of the underlying data quality used in a publication. Instead, publication quality largely depends on the reputation of the journal or publisher and the theoretical contribution of the work, rather than the empirical contribution per se. As long as publication quantity and quality are judged primarily on the basis of outlet reputation or theoretical contribution, there is little incentive to improve data quality.
The incentives for new data collection or improving data quality are unfortunately rather limited. The costs of being attentive to quality in data are not trivial. Data collection and improvement are costly in time, skills, and financial resources. Moreover, the effort required to determine whether comparative data are truly comparable, or whether individual elements do represent what they purport to, is substantial, and there is limited credit in tenure or review processes for those considered to be merely data collectors or correctors. The payoffs for data quality improvement are high only if the new and/or improved data set is used in some kind of innovative theoretical analysis. This means that in order to be recognized, those who work to improve data quality still have to do just as much theoretical or analytical work as those who do not bother with minding data quality.

Finally, academia is a community, and as such the support of other scholars constitutes an important incentive in individual work. Scholars' need for support varies according to career stage. Junior (untenured) scholars face more pressure to publish and are also more dependent on community support than senior scholars. Therefore, junior scholars have even less incentive to devote time to the improvement of data quality, and they also have fewer incentives to be critical of existing data sets, especially if criticism would put them at odds with senior faculty. The tenure process might be defended as a response to this incentive problem, in that it eventually gives scholars the freedom both to work longer on improving data and to criticize each other's work. However, junior scholars who have most recently done fieldwork are the likeliest to have fresh empirical knowledge, yet they are the least likely to engage in debates over data quality. The people most qualified are thus the least likely to devote time to data quality improvements.

3.11 Academics—Capabilities

Time, research funding, quantitative skills and technology, and existing data sets are the capabilities most in play for scholars. Because of their enormous expense, only a limited number of data set construction proposals will be funded. Fixing existing data sets—a less flashy task than coming up with something new—would be substantially less likely to find financial support. Unfortunately, although scholars may discover errors in existing work, there are not many low-cost options for correcting data errors.

Today scholars can access more off-the-shelf and downloadable data sets than ever before. Such resources afford researchers access to information about many places in the world about which they may not have specific area training or expertise. But the costs of in-depth fieldwork have not similarly declined, meaning that fieldwork remains quite expensive relative to off-the-shelf data sets. Given limited time and funding, freely available data sets can, and often do, substitute for new and/or improved data sets based on detailed fieldwork. And data sourced from reputable institutions (like the IMF, World Bank, OECD, the UN family, Polity, Freedom House, the Minorities at Risk project, or the American National Election Studies) are all the more attractive because an institution's reputation gives the data sets a badge of credibility. Finally, a researcher's skill level affects the type of data and analysis that he or she is capable of.
In recent years, exogenous technological trends have led to a steep drop in the price of tools for quantitative analysis, such as better and cheaper software and hardware. These user-friendly advances require minimal statistical and mathematical training. The combination of new technology and greater availability of data sets may be driving down the cost of quantitative analysis. Such trends, though welcome, can also drive down the incentives and opportunities for improving data quality, since researchers may be at too great a remove from the nitty-gritty of the data's construction to scrutinize it effectively.

3.12 Academics—Consequences

For academics, the worst consequence of our incentive and capability structures is the ongoing recycling of low-quality data and the failure to produce new data of high quality. Obviously, political science research would be more valuable if data quality improved; this would require individual scholars to devote more of their limited time and resources to improving data quality rather than producing more publications from existing flawed data sets. Because the resources, including time and money, that go into a publication are limited, trade-offs must be made. Work devoted to theoretical and model formulation and hypothesis testing using off-the-shelf data has to be weighed against the time it would take to improve the quality of a data set or to better match measurable indicators to concepts and dimensions.

In order for researchers to focus on data quality, their incentives and capabilities would have to change: the use of high-quality data in publications ought to be rewarded, or at least it ought to be rewarded more highly than the use of lower quality data. One problem with the current system of incentives is that the penalties for using low-quality data are small, and the costs of pointing out errors in data usage are high. If a researcher devotes his or her time to refuting the findings of a published article by using better data, the chance of publication (or benefit) is relatively high, but so too is the cost, because it takes a lot of time and effort to replicate and/or disprove results. Moreover, it is hardly a disgrace to be challenged empirically by future work; indeed, it is a sign of interest in one's research agenda. Thus, the downside (or sanctions) for using low-quality data is rather limited.

In addition, some incentives for low-quality data use seem to be self-reinforcing. The more scholars use existing flawed data sets, the more likely such data sets are to be used by others. In other words, data are used because they are used—and the data sets, problematic or not, become acceptable by repetition. Using reputable institutions only shifts the locus of the problem. The reputation of a prestigious data collection organization, such as those cited above, may actually reduce the incentives for scrutinizing the data: should there be any problems with the data, the data-collecting institutions, rather than individual researchers, would bear the brunt of the criticism.

4 Conclusion

Modern political science is data driven.
If political scientists and institutional data actors were not trying to explain real outcomes, then data quality might not be so important. But to the extent that we are trying to develop and test theories about outcomes, data are the fundamental basis for our enterprise. We should expect that fundamental changes in the quality of information produced by political scientists, governments, and IOs would have substantial effects on public policy.

Some have asked, are bad data better than no data? We reject this either/or choice. "No data" or "bad data" are not the only choices because scholars need not be complacent with the status quo, and improvement of data sets is a continuous task. And thus, the best is not the enemy of the good. There were, are, and always will be shortcomings and limitations in data sets, and the costs of poor data must be traded off against the opportunity costs of the effort required to improve the data. However, a focus on lowering the costs of data quality and changing the incentives for improving data quality will make higher quality data a likelier norm for the future.

Our conclusion is by no means that quantitative analysis based on large-N data sets should be limited or that data sets are inherently or irreconcilably flawed. Indeed, quantitative and statistical research is necessary for testing and improving data as well as testing theories.13 We have pointed out problems in data quality and studied data actors' incentives and capabilities in order to suggest mechanisms for improvement of data sets, while at the same time discouraging continued use of overly troubled data sets.

In summary, we offer five broad suggestions: (1) encourage the production and dissemination of the growing literature on data quality and methods for improvement; (2) consider better ways to use data sets known to be flawed; (3) consider incentives as an instrument for improving data quality; (4) consider ways to lower the costs of producing high-quality data; and (5) consider institutional solutions to solve certain collective action problems related to data quality.

13 One example is the discussion that followed the 1996 publication of the Deininger and Squire data set on income inequality by the World Bank. When the theorized relationship between economic growth and inequality using this data set did not hold up, scholars scrutinized the data set itself, calling into question certain measures. This in turn prompted further refinements of the data, as well as allowing for further testing of the theoretical relationship between inequality and other outcomes.

There is a certain irony in the fact that methodology is a high-prestige area of political science and that a lot of work is devoted to improving methods, but that work on methods does not necessarily translate into improved everyday use of data. We believe that greater attention to the existing literature that evaluates data sets and to methodological issues concerning the use of data sets would be a step in the right direction.

As a first step, researchers should examine data sets. Researchers can subject data sets to some simple "smell-tests" by asking a number of questions: Who created the data? What incentives and capabilities were they subject to? Were they an independent agency? Were they governed by an external actor with a stake in the data?
Subjecting the data to these questions will make the user more aware of possible quality problems with the data.14 When data sets do have problems in their construction, we can at least be more circumspect about how we use them.

A second, related recommendation is that we need to consider ways around bad data, that is, ways to better use data sets that are known to be flawed.15 For example, if a data set for a particular dependent variable is flawed (due to validity issues, coverage, or errors), researchers should look for other observable implications of the argument and test those relationships. Triangulating by way of multiple tests of an argument using different data sets would be better and would make a more convincing case than relying on one test using poor data.

In addition, as we pointed out in the section on validity above, selected data need to be appropriate for a given dimension of a concept under study. However, a positive implication of this obligation to match data with appropriate dimensions and concepts is that although a particular data set might be inappropriate for measuring a particular dimension or concept, it may well be appropriate for another dimension or concept. Thus, just because a data set is flawed for one purpose, it is not necessarily useless for other purposes.

The same logic applies to data sets with known biases. Researchers can make use of these biases, if they are acknowledged. If a significant relationship that works against the bias is found, we might be more confident of the results because the bias would move the results toward the null hypothesis (an illustrative simulation follows below). Rather than treating biased data sets as unusable, in the absence of improved data (which we discuss below), researchers should think about how to make use of flaws or biases in ways that might strengthen the validity of results. In other words, because measurement error is so endemic, part of the solution to dealing with measurement error is learning how to deal with it more sensibly. Although we do not have space to review the growing literature on statistical fixes for flawed data sets (many of which have appeared in the pages of this journal), those technical solutions are another obvious place to look for ways to deal with flawed data sets.

A third recommendation is that, as a scholarly community, we must pay closer attention to incentives. Rather than treating data quality problems as an unfortunate result of ignorance or incompetence, we need to consider the incentives facing respondents, statistical offices, IOs, and scholars when they produce data. Given the degree to which researchers analyze the effects of incentives, their own supply elasticity of effort with respect to the incentives they themselves face might be presumed to be fairly high.16 We suggest that the focus should be on ways to change these incentives to improve data quality.

14 For example, since Becker's seminal article on crime (Becker 1968), researchers using officially reported crime statistics have had to be attentive to a number of quality issues. Errors due to underreporting by victims and underrecording by police may or may not be normally distributed, and an attentive researcher should check to see whether errors are systematically related to explanatory variables. Similarly, rather than relying only on one data source, researchers could compare data from a number of sources and consider the competency and independence of those sources.
15 We thank one of the anonymous reviewers for bringing this point to our attention.
16 The issue has been emphasized by Cheibub (1999) and Widner (1999).
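The logic of a bias that works toward the null hypothesis can be made concrete with the textbook case of classical measurement error, which attenuates an estimated association toward zero. The following is a minimal simulation sketch in Python using numpy; the true slope of 0.5 and the noise scales are arbitrary illustrative values, not figures drawn from this paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # True relationship: y depends on x with slope 0.5 (an arbitrary illustrative value).
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(scale=1.0, size=n)

    # The analyst observes x only with classical (purely random) measurement error.
    x_observed = x + rng.normal(scale=1.0, size=n)

    def ols_slope(regressor, outcome):
        # Bivariate OLS slope: cov(regressor, outcome) / var(regressor).
        return np.cov(regressor, outcome)[0, 1] / np.var(regressor, ddof=1)

    print(ols_slope(x, y))           # close to the true slope of 0.5
    print(ols_slope(x_observed, y))  # pulled toward zero, roughly 0.25 here

In this sketch the slope recovered from the noisy measure is roughly half the true value, so a relationship that nonetheless survives such noise is, if anything, understated. Systematic error offers no such reassurance, which is why footnote 14 above urges checking whether errors are related to the explanatory variables.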
The academic community as a whole needs to consider ways of lowering the costs of data quality. Increasing the transparency and availability of the details of data sets, including coding, is a way to at least enable users to engage the data critically.17 With more people able to recognize a data set's problems, the costs of improving the data set can be reduced. A few journals now mandate that authors make their data sets available upon request to readers. This is a positive development, but there are only minimal enforcement mechanisms for such rules. If authors fail to provide data, or provide it in a form that is not very usable, the burden falls on the reader to pursue action. If journals, on the other hand, made the data sets available on their Web sites, then it would be less costly for individual researchers to check and, hopefully, improve the quality of data sets.

Additionally, a relatively low-cost error-revelation mechanism such as a "letters to the editor" section could be adopted by journals. International Security, for example, already has this in place. The proliferation of such mechanisms would have two effects: they would increase incentives for authors to attend to data quality by increasing the likelihood of being publicly criticized, and they would provide other scholars with important information regarding data errors, thus improving quality in future work with the same data sets.

Institutions also have roles to play in changing incentives. Small-scale institutional changes would include supporting more forums for error discussion and greater transparency. On a larger scale, major research funding agencies such as the NSF, the World Bank, and the UN need to make data quality a priority. Data quality in large grants could be improved if there were funding specifically earmarked for cleaning up existing or newly collected data sets and making them more widely accessible. Although the NSF does have an archiving requirement, it is not systematically enforced. Rather than the archiving component constituting a separate part of the grant, scholars have to take funds from some other part of their grant to work on fulfilling the archiving task, meaning they have less incentive to do so.

The APSA needs to take a leading role in advocating and perhaps codifying higher data quality norms. APSA as an institution might be able to overcome collective action problems among field and subfield sections, as well as among individual scholars. Given the importance of cross-country data sets, and the considerable scope for improving data comparisons across countries, we believe that debates regarding the merits of area studies versus cross-national large-N studies need to shift toward the collaborative possibilities between the two rather than the focus on competition. Joint work between area specialists and methodologists can considerably enhance the quality of cross-national data sets. However, there are considerable collective action problems inherent in organizing such efforts. APSA or other umbrella institutions may be able to play a leadership role by supporting partnerships between area specialists and methodologists to improve existing data sets.
Finally, and on a more positive note, we wish to draw attention to some promising developments in recent years with regard to changing the incentive structures for researchers in constructing data sets. The Comparative Politics section of APSA, for example, now offers an award for data sets, and the Comparative Politics newsletter reviews new data sets. In addition, a relatively new section of APSA, the Qualitative Methods section, is largely oriented toward taking empirical work, including the content of data sets, more seriously. And there have recently been a rising number of panels at professional meetings devoted to the consideration of the quality of data sets on a range of topics including ethnicity, democracy, and war. There are growing signs that institutional mechanisms for changing scholars' incentives—that is, reducing costs for producing high-quality data, and increasing rewards for using high-quality data—are underway.

There are many more ways that data quality can be improved, which we have not had space to discuss here. We have endeavored to outline some problems with data quality and also to develop an explanation for the persistence of this problem, focused in particular on the incentives and capabilities of data producers and users. Our goal has been to encourage further debate and serious consideration of the quality of political science data.

17 The State Department analysis of terrorism provides a textbook case of how transparency of coding rules and availability of data can improve data quality. In April 2004, the State Department issued a report entitled "Patterns of Global Terrorism," claiming terrorist attacks had declined in recent years. Using the State Department's own guidelines, which accompanied the report, Alan Krueger and David Laitin reviewed these data and found that "significant" terrorist attacks had actually risen between 2002 and 2003. They published this review of the data in an op-ed piece in the Washington Post and in an article in Foreign Affairs. In response, the State Department admitted that the report was wrong. For additional analysis of the State Department report, as well as recommendations for improving U.S. government data, see Krueger and Laitin (2004).

References

Abdelal, Rawi, Yoshiko Herrera, Alastair I. Johnston, and Rose McDermott. 2006. Identity as a variable. Perspectives on Politics 4:695-711.
Achen, Christopher. 1985. Proxy variables and incorrect signs on regression coefficients. Political Methodology 11:288-316.
Adcock, Robert, and David Collier. 2001. Measurement validity: A shared standard for qualitative and quantitative research. American Political Science Review 95:529-46.
Aiyar, Swaminathan Anklesaria. 2001. Poverty-stricken statistics. Economic Times, September 1, 2001.
Becker, Gary. 1968. Crime and punishment: An economic approach. Journal of Political Economy 76:169-217.
Berkowitz, Daniel, Katharina Pistor, and Jean-Francois Richard. 2003. Economic development, legality, and the transplant effect. European Economic Review 47:165-95.
Brown, David, and Wendy Hunter. 1999. Democracy and social spending in Latin America. American Political Science Review 93:779-90.
Bueno de Mesquita, Bruce. 1981. The war trap. New Haven, CT: Yale University Press.
Cederman, Lars-Erik, and Luc Girardin. 2005. Beyond fractionalization: Mapping ethnicity onto nationalist insurgencies. American Political Science Review 101:173-85.
Chander, Ramesh. 1988. Strengthening information systems in SSA. Washington, DC: World Bank.
Chandra, Kanchan, ed. 2001. Symposium: Cumulative findings in the study of ethnic politics. APSA-CP Newsletter 12(1):7-25.
Chandra, Kanchan, Rachel Giffelquist, Daniel Metz, Chris Wendt, and Adam Ziegfeld. 2005. A constructivist dataset on ethnicity and institutions. In Identity as a variable, ed. Rawi Abdelal, Yoshiko Herrera, Alastair Ian Johnston, and Rose McDermott. New York: New York University.
Cheibub, Jose Antonio. 1999. Data optimism in comparative politics: The importance of being earnest. APSA-CP 10(2):21-5.
Collier, David, and Robert Adcock. 1999. Democracy and dichotomies: A pragmatic approach to choices about concepts. Annual Review of Political Science 2:537-65.
Coppedge, Michael. 2002. Democracy and dimensions: Comments on Munck and Verkuilen. Comparative Political Studies 35:35-9.
Gleditsch, Kristian S., and Michael D. Ward. 1997. Double take: A reexamination of democracy and autocracy in modern polities. The Journal of Conflict Resolution 41:361-83.
Goertz, Gary. 2005. Social science concepts: A user's guide. Princeton, NJ: Princeton University Press.
Goodhart, Charles. 1989. Money, information and uncertainty. 2nd ed. Cambridge, MA: MIT Press.
Hoskin, Keith. 1996. The 'awful idea of accountability': Inscribing people into the measurement of objects. In Accountability: Power, ethos, and the technologies of managing, ed. Rolland Munro and Jan Mouritsen, 265-82. London: Thomson International.
Kapur, Devesh, John P. Lewis, and Richard Webb. 1997. The World Bank: Its first half century. Washington, DC: Brookings Institution.
Kaufmann, D., A. Kraay, and M. Mastruzzi. 2002. Governance matters III: Governance indicators for 1996-2002. World Bank Policy Research Working Paper 3106.
Kaufmann, D., A. Kraay, and P. Zoido-Lobaton. 1999a. Aggregating governance indicators. World Bank Working Paper 2195.
Kaufmann, D., A. Kraay, and P. Zoido-Lobaton. 1999b. Governance matters. World Bank Working Paper 2196.
King, Gary, Christopher J. L. Murray, Joshua A. Salomon, and Ajay Tandon. 2004. Enhancing the validity and cross-cultural comparability of measurement in survey research. American Political Science Review 98:191-207.
King, Gary, and Jonathan Wand. 2007. Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis 15:46-66.
Krueger, Alan, and David Laitin. 2004. Misunderestimating terrorism. Foreign Affairs 83(5):8-13.
Kynge, James. 1999. China uncovers falsified accounts at state groups. Financial Times, December 24, 1999.
Laitin, David, and Daniel Posner. 2001. The implications of constructivism for constructing ethnic fractionalization indices. APSA-CP Newsletter 12(1):13-17.
Marshall, Monty G., Ted Robert Gurr, Christian Davenport, and Keith Jaggers. 2002. Polity IV, 1800-1999: Comments on Munck and Verkuilen. Comparative Political Studies 35:40-5.
Mishler, William, and Richard Rose. 2001. Political support for incomplete democracies: Realist vs. idealist theories and measures. International Political Science Review 22:303-20.
Munck, Gerardo L., and Jay Verkuilen. 2002a. Conceptualizing and measuring democracy: Evaluating alternative indices. Comparative Political Studies 35:5-34.
Munck, Gerardo L., and Jay Verkuilen. 2002b. Generating better data: A response to discussants. Comparative Political Studies 35:52-7.
Nagraj, R. 1999. How good are India's industrial statistics? An exploratory note. Economic and Political Weekly 34:350-5.
Posner, Daniel N. 2004. Measuring ethnic fractionalization in Africa. American Journal of Political Science 48:849-63.
Rawski, Thomas G. 2000. China by the numbers: How reform affected Chinese economic statistics. http://www.pitt.edu/~tgrawski/parjere2()00/REVI^.HTM (accessed July 26, 2005).
Rose, Richard. 2002/2003. Economies in transition: A multidimensional approach to a cross-cultural problem. East European Constitutional Review 11/12(4/1):62-70.
Rozanski, J., and A. Yeats. 1994. On the (in)accuracy of economic observations: An assessment of trends in the reliability of international trade statistics. Journal of Development Economics 44:103-30.
Slantchev, Branislav L. 2004. How initiators end their wars. American Journal of Political Science 48:813-29.
Srinivasan, T. N. 1994. Data base for development analysis: An overview. Journal of Development Economics 44:3-27.
Swonk, Diane. 2000. The value of good data. Financial Times, September 27, 2000.
The Economist. 2002. Roll over, Enron. August 3, p. 44.
Treier, Shawn, and Simon Jackman. 2006. Democracy as a latent variable. http://www.tc.umn.edu/~satreier/DemocracyAsLatentVariable_041906.pdf.
United Nations. 2004. Current status of the collection of international migration statistics. World Economic and Social Survey. New York: United Nations, 211-7.
United Nations Development Programme (UNDP). 2003. Human development report 2003. New York: Oxford University Press.
Velkoff, Victoria A., and Jane E. Miller. 1995. Trends and differentials in infant mortality in the Soviet Union, 1970-90: How much is due to misreporting? Population Studies 49:241-58.
Wallack, Jessica. 2006. The highs and lows of revenue estimating: Explaining bias and inaccuracy. San Diego, CA: University of California. http://irpshome.ucsd.edu/faculty/jwallack/revest_7_2006.pdf.
Ward, Michael D. 2002. Green binders in cyberspace: A modest proposal. Comparative Political Studies 35:46-51.
Watson, Reg, and Daniel Pauly. 2001. Systematic distortions in world fisheries catch trends. Nature 414:534-6.
White, Halbert. 1994. Estimation, inference, and specification analysis. New York: Cambridge University Press.
Widner, Jennifer. 1999. Maintaining our knowledge base. APSA-CP 10(2):17-21.
Wilkinson, Steve. 2002. Memo on developing better indicators of ethnic and non-ethnic identities. http://www.duke.edu/web/licep/5/wilkinson/wilkinson.pdf (accessed July 26, 2005).
Wong, George Y., and William M. Mason. 1991. Contextually specific effects and other generalizations of the hierarchical linear model for comparative analysis. Journal of the American Statistical Association 86:487-503.
World Bank. 2000. India: Policies to reduce poverty and accelerate sustainable development. Report No. 19471-IN. Washington, DC: World Bank.
Yeats, Alexander. 1990. On the accuracy of African observations: Do Sub-Saharan trade statistics mean anything? World Bank Economic Review 2:135-56.