CHAPTER 7 The Logic of Sampling

Now you'll see how social scientists can select a few people for study—and discover things that apply to hundreds of millions of people not studied.

Introduction
A Brief History of Sampling
President Alf Landon
President Thomas E. Dewey
Two Types of Sampling Methods
Nonprobability Sampling
Reliance on Available Subjects
Purposive or Judgmental Sampling
Snowball Sampling
Quota Sampling
Selecting Informants
The Logic and Techniques of Probability Sampling
Conscious and Subconscious Sampling Bias
Representativeness and Probability of Selection
Random Selection
Probability Theory, Sampling Distributions, and Estimates of Sampling Error
Populations and Sampling Frames
Review of Populations and Sampling Frames
Types of Sampling Designs
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Implicit Stratification in Systematic Sampling
Illustration: Sampling University Students
Sample Modification
Multistage Cluster Sampling
Multistage Designs and Sampling Error
Stratification in Multistage Cluster Sampling
Probability Proportionate to Size (PPS) Sampling
Disproportionate Sampling and Weighting
Probability Sampling in Review
The Ethics of Sampling

Copyright 2021 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

Learning Objectives

After studying this chapter, you will be able to . . .
• Highlight some of the key events in the development of sampling in social research.
• Describe what is meant by "nonprobability sampling" and identify several techniques.
• Identify and explain the key elements in probability sampling.
• Explain the relationship between populations and sampling frames in social research. • Identify and describe several types of probability sampling designs. • Describe the steps involved in selecting a multistage cluster sample. • Discuss the key advantages of probability sampling. • Explain how the sampling design of a study could have ethical implications. Introduction One of the most visible uses of survey sampling lies in the political polling that is subsequently tested by election results. Whereas some people doubt the accuracy of sample surveys, others complain that political polls take all the suspense out of campaigns by foretelling the result. Going into the 2008 presidential elections, pollsters were in agreement as to who would win, in contrast to their experiences in 2000 and 2004, which were closely contested races. Table 7-1 reports polls conducted during the few days preceding the election. Despite some variations, the overall picture they present is amazingly consistent and pretty well matches the election results. Now, how many interviews do you suppose it took each of these pollsters to come within a couple of percentage points in estimating the behavior of more than 131 million voters? Often fewer than 2,000! In this chapter, we're going to find out how social researchers can achieve such wizardry. In the 2016 presidential election, the preelection polls again clustered closely around the actual popular votes for Hillary Clinton and Donald Trump. Most correctly predicted that Secretary Clinton would win the popular vote by 2 or 3 percentage points. Of course, the president is not elected by the nation's overall popular vote, but by the electoral college, determined by how the votes go in the individual states. 
Relatively small victories totaling 107,000 votes in three swing states—Michigan, Pennsylvania, Wisconsin—gave Trump all those states' electoral votes, and the presidency, while Clinton won the popular vote by 2.8 million (Washington Post 2016). FiveThirtyEight.com offers a useful analysis and rating of the many polling companies active in forecasting political outcomes.

TABLE 7-1 Election-Eve Polls Reporting Presidential Voting Plans, 2008

Poll                           Date Ended   Obama   McCain
Fox                            Nov 2        54      46
NBC/WSJ                        Nov 2        54      46
Marist College                 Nov 2        55      45
Harris Interactive             Nov 3        54      46
Reuters/C-SPAN/Zogby           Nov 3        56      44
ARG                            Nov 3        54      46
Rasmussen                      Nov 3        53      47
IBD/TIPP                       Nov 3        54      46
DailyKos.com/Research 2000     Nov 3        53      47
GWU                            Nov 3        53      47
Marist College                 Nov 3        55      45
Actual vote                    Nov 4        54      46

Note: For simplicity, since there were no "undecideds" in the official results and each of the third-party candidates received less than 1 percent of the vote, I've apportioned the undecided and other votes according to the percentages saying they were voting for Obama or McCain.

Source: Poll data are adapted from http://www.pollster.com/polls/us/08-us-pres-ge-mvo.php. The official election results are from the Federal Election Commission, http://www.fec.gov/pubrec/fe2008/2008presgeresults.pdf.

What do you think? In 1936, the Literary Digest collected the voting intentions of 2 million voters in order to predict whether Franklin D. Roosevelt or Alf Landon would be elected president of the United States.
During more-recent election campaigns, with many more voters going to the polls, national polling firms have typically sampled around 2,000 voters across the country. Which technique do you think is the most effective? Why? See the What do you think? . . . Revisited box toward the end of the chapter.

For another powerful illustration of the potency of sampling, look at Figure 7-1 for a graph of then-president George W. Bush's approval ratings prior to and following the September 11, 2001, terrorist attacks on the United States. The data reported by several different polling agencies describe the same pattern.

Political polling, like other forms of social research, rests on observations. But neither pollsters nor other social researchers can observe everything that might be relevant to their interests. A critical part of social research, then, is deciding what to observe and what not to observe. If you want to study voters, for example, which voters should you study?

[Figure 7-1 plots approval percentages (40 to 90) over 2001–2002, before and after the September 11th attack, with data points keyed to ABC/Post, CBS, Harris, Ipsos-Reid, Pew, Bloomberg, Fox, IBD/CSM, NBC/WSJ, AmResGp, CNN/Time, Gallup, Zogby, and Newsweek.]

FIGURE 7-1 Bush Approval: Raw Poll Data. This graph demonstrates how independent polls produce the same picture of reality. It also shows the impact of a national crisis on the president's popularity: in this case, the 9/11 terrorist attack and then-president George W. Bush's popularity. Source: drlimerick.com
The process of selecting observations is called sampling. Although sampling can mean any procedure for selecting units of observation—for example, interviewing every tenth passerby on a busy street—the key to generalizing from a sample to a larger population is probability sampling, which involves the important idea of random selection.

Much of this chapter is devoted to the logic and skills of probability sampling. This topic is more rigorous and precise than some of the other topics in this book. Whereas social research as a whole is both art and science, sampling leans toward science. Although this subject is somewhat technical, the basic logic of sampling is not difficult to understand. In fact, the logical neatness of this topic can make it easier to comprehend than, say, conceptualization.

Although probability sampling is central to social research today, we'll also examine a variety of nonprobability methods. These methods have their own logic and can provide useful samples for social inquiry. Before we discuss the two major types of sampling, I'll introduce you to some basic ideas by way of a brief history of sampling. As you'll see, the pollsters who correctly predicted recent elections have done so in part because researchers had learned to avoid some pitfalls that earlier pollsters had discovered "the hard way."

A Brief History of Sampling

Sampling in social research has developed hand in hand with political polling. This is the case, no doubt, because political polling is one of the few opportunities social researchers have to discover the accuracy of their estimates. On election day, they find out how well or how poorly they did.

President Alf Landon

President Alf Landon? Who's he?
Did you sleep through an entire presidency in your U.S. history class? No—but Alf Landon would have been president if a famous poll conducted by the Literary Digest had proved to be accurate. The Literary Digest was a popular newsmagazine published between 1890 and 1938. In 1916, Digest editors mailed postcards to people in six states, asking them whom they were planning to vote for in the presidential campaign between Woodrow Wilson and Charles Evans Hughes. Names were selected for the poll from telephone directories and automobile registration lists. Based on the postcards sent back, the Digest correctly predicted that Wilson would be elected. In the elections that followed, the Literary Digest expanded the size of its poll and made correct predictions in 1920, 1924, 1928, and 1932.

In 1936 the Digest conducted its most ambitious poll: 10 million ballots were sent to people listed in telephone directories and on lists of automobile owners. Over 2 million people responded, giving the Republican contender, Alf Landon, a stunning 57 to 43 percent landslide over the incumbent, President Franklin Roosevelt. The editors modestly cautioned,

We make no claim to infallibility. We did not coin the phrase "uncanny accuracy" which has been so freely applied to our Polls. We know only too well the limitations of every straw vote, however enormous the sample gathered, however scientific the method. It would be a miracle if every State of the forty-eight behaved on Election Day exactly as forecast by the Poll. (Literary Digest 1936a: 6)

Two weeks later, the Digest editors knew the limitations of straw polls even better: The voters gave Roosevelt a second term in office by the largest landslide in history, with 61 percent of the vote.
Landon won only 8 electoral votes to Roosevelt's 523. The editors were puzzled by their unfortunate turn of luck. Part of the problem surely lay in the 22 percent return rate garnered by the poll. The editors asked,

Why did only one in five voters in Chicago to whom the Digest sent ballots take the trouble to reply? And why was there a preponderance of Republicans in the one-fifth that did reply? . . . We were getting better cooperation in what we have always regarded as a public service from Republicans than we were getting from Democrats. Do Republicans live nearer to mailboxes? Do Democrats generally disapprove of straw polls? (Literary Digest 1936b: 7)

Actually, there was a better explanation—and it lay in what is technically called the sampling frame used by the Digest. In this case the sampling frame consisted of telephone subscribers and automobile owners. In the context of 1936, this design selected a disproportionately wealthy sample of the voting population, especially coming on the tail end of the worst economic depression in the nation's history. The sample effectively excluded poor people, and the poor voted predominantly for Roosevelt's New Deal recovery program. The Digest's poll may or may not have correctly represented the voting intentions of telephone subscribers and automobile owners. Unfortunately for the editors, it decidedly did not represent the voting intentions of the population as a whole.

President Thomas E. Dewey

The 1936 election also saw the emergence of a young pollster whose name would become synonymous with public opinion. In contrast to the Literary Digest, George Gallup correctly predicted that Roosevelt would beat Landon.
Gallup's success in 1936 hinged on his use of something called quota sampling, which we'll examine later in the chapter. For now, it's enough to know that quota sampling is based on a knowledge of the characteristics of the population being sampled: the proportion of men, the proportion of women, the proportions of various incomes, ages, and so on. Quota sampling selects people to match a set of these characteristics: the right number of poor, white, rural men; the right number of rich, African American, urban women; and so on. The quotas are based on those variables most relevant to the study. In the case of Gallup's poll, the sample selection was based on levels of income; the selection procedure ensured the right proportion of respondents at each income level.

Gallup and his American Institute of Public Opinion used quota sampling to good effect in 1936, 1940, and 1944—correctly picking the presidential winner each time. Then, in 1948, Gallup and most political pollsters suffered the embarrassment of picking Governor Thomas Dewey of New York over the incumbent, President Harry Truman. The pollsters' miscue continued right up to election night. A famous photograph shows a jubilant Truman—whose followers' battle cry was "Give 'em hell, Harry!"—holding aloft a newspaper with the banner headline "Dewey Defeats Truman." (Basing its decision on early political polls that showed Dewey leading Truman, the Chicago Tribune sought to scoop the competition with this unfortunate headline.)

Several factors accounted for the pollsters' failure in 1948. First, most pollsters stopped polling in early October, despite a steady trend toward Truman toward the end of the campaign. In addition, many voters were undecided throughout the campaign, and they went disproportionately for Truman when they stepped into the voting booth. More important, Gallup's failure rested on the unrepresentativeness of his samples.
Quota sampling—which had been effective in earlier years—was Gallup's undoing in 1948. This technique requires that the researcher know something about the total population (of voters, in this instance). For national political polls, such information came primarily from census data. By 1948, however, World War II had produced a massive movement from the country to cities, radically changing the character of the U.S. population from what the 1940 census showed, and Gallup relied on 1940 census data. City dwellers, moreover, tended to vote Democratic; hence, the overrepresentation of rural voters in his poll had the effect of underestimating the number of Democratic votes.

Two Types of Sampling Methods

By 1948 some academic researchers had already been experimenting with a form of sampling based on probability theory. This technique involves the selection of a "random sample" from a list containing the names of everyone in the population being sampled. By and large, the probability-sampling methods used in 1948 were far more accurate than quota-sampling techniques. Today, probability sampling remains the primary method of selecting large, representative samples for social research, including national political polls. At the same time, probability sampling can be impossible or inappropriate in many research situations.
Accordingly, before turning to the logic and techniques of probability sampling, we'll first take a look at techniques for nonprobability sampling and how they're used in social research.

Nonprobability Sampling

nonprobability sampling: Any technique in which samples are selected in some way not suggested by probability theory. Examples include reliance on available subjects as well as purposive (judgmental), snowball, and quota sampling.

Social research is often conducted in situations that do not permit the kinds of probability samples used in large-scale social surveys. Suppose you wanted to study homelessness: There is no list of all homeless individuals, nor are you likely to create such a list. Moreover, as you'll see, there are times when probability sampling would not be appropriate even if it were possible. Many such situations call for nonprobability sampling. In this section, we'll examine four types of nonprobability sampling: reliance on available subjects, purposive or judgmental sampling, snowball sampling, and quota sampling. We'll conclude with a brief discussion of techniques for obtaining information about social groups through the use of informants.

Reliance on Available Subjects

Relying on available subjects, such as stopping people at a street corner or some other location, is sometimes called "convenience" or "haphazard" sampling. This is a common method for journalists in their "person-on-the-street" interviews, but it is an extremely risky sampling method for social research. Clearly, this method does not permit any control over the representativeness of a sample. It's justified only if the researcher wants to study the characteristics of people passing the sampling point at specified times or if less risky sampling methods are not feasible. Even when this method is justified on grounds of feasibility, researchers must exercise great caution in generalizing from their data.
Also, they should alert readers to the risks associated with this method.

University researchers frequently conduct surveys among the students enrolled in large lecture classes. The ease and frugality of this method explains its popularity, but it seldom produces data of any general value. It may be useful for pretesting a questionnaire, but such a sampling method should not be used for a study purportedly describing the student body as a whole. Consider this report on the sampling design in an examination of knowledge and opinions about nutrition and cancer among medical students and family physicians:

The fourth-year medical students of the University of Minnesota Medical School in Minneapolis comprised the student population in this study. The physician population consisted of all physicians attending a "Family Practice Review and Update" course sponsored by the University of Minnesota Department of Continuing Medical Education. (Cooper-Stephenson and Theologides 1981: 472)

After all is said and done, what will the results of this study represent? They do not provide a meaningful comparison of medical students and family physicians in the United States or even in Minnesota. Who were the physicians who attended the course? We can guess that they were probably more concerned about their continuing education than were other physicians, but we can't say for sure. Although such studies can provide useful insights, we must take care not to overgeneralize from them.
Purposive or Judgmental Sampling

Sometimes it's appropriate to select a sample on the basis of knowledge of a population, its elements, and the purpose of the study. This type of sampling is called purposive sampling (or judgmental sampling). In the initial design of a questionnaire, for example, you might wish to select the widest variety of respondents to test the broad applicability of questions. Although the study findings would not represent any meaningful population, the test run might effectively uncover any peculiar defects in your questionnaire. This situation would be considered a pretest, however, rather than a final study.

In some instances, you may wish to study a small subset of a larger population in which many members of the subset are easily identified, but the enumeration of them all would be nearly impossible. For example, you might want to study the leadership of a student-protest movement; many of the leaders are visible, but it would not be feasible to define and sample all leaders. In studying all or a sample of the most visible leaders, you may collect data sufficient for your purposes.

Or let's say you want to compare left-wing and right-wing students. Because you may not be able to enumerate and sample from all such students, you might decide to sample the memberships of left- and right-leaning groups, such as the Green Party and the Young Americans for Freedom. Although such a sample design would not provide a good description of either left-wing or right-wing students as a whole, it might suffice for general comparative purposes.

Field researchers are often particularly interested in studying deviant cases—those that do not fit into patterns of mainstream attitudes and behaviors—in order to improve their understanding of the more usual pattern.
For example, you might gain important insights into the nature of school spirit, as exhibited at a pep rally, by interviewing people who did not appear to be caught up in the emotions of the crowd or by interviewing students who did not attend the rally at all. Selecting deviant cases for study is another example of purposive sampling.

In qualitative research projects, the sampling of subjects may evolve as the structure of the situation being studied becomes clearer and certain types of subjects seem more central to understanding than others. Let's say you're conducting an interview study among the members of a radical political group on campus. You may initially focus on friendship networks as a vehicle for the spread of group membership and participation. In the course of your analysis of the earlier interviews, you may find several references to interactions with faculty members in one of the social science departments. As a consequence, you may expand your sample to include faculty in that department and other students that they interact with. This is called "theoretical sampling," since the evolving theoretical understanding of the subject steers the sampling in certain directions.

Snowball Sampling

Another nonprobability-sampling technique, which some consider to be a form of accidental sampling, is called snowball sampling. This procedure is appropriate when the members of a special population are difficult to locate, such as homeless individuals, migrant workers, or undocumented immigrants. In snowball sampling, the researcher collects data on the few members of the target population he or she can locate, then asks those individuals to provide the information needed to locate other members of that population whom they happen to know. "Snowball" refers to the process of accumulation as each located subject suggests other subjects. Because this procedure also results in samples with questionable representativeness, it's used primarily for exploratory purposes.
purposive sampling: A type of nonprobability sampling in which the units to be observed are selected on the basis of the researcher's judgment about which ones will be the most useful or representative. Also called judgmental sampling.

snowball sampling: A nonprobability-sampling method, often employed in field research, whereby each person interviewed may be asked to suggest additional people for interviewing.

Sometimes, the term chain referral is used in reference to snowball sampling and other, similar techniques in which the sample unfolds and grows from an initial selection.

Suppose you wish to learn a community organization's pattern of recruitment over time. You might begin by interviewing fairly recent recruits, then asking them who introduced them to the group. You might then interview the people named, asking them who introduced them to the group. You might then interview the next round of people named, and so forth. Or, in studying a loosely structured political group, you might ask one of the participants who he or she believes to be the most influential members of the group. You might interview those people and, in the course of the interviews, ask who they believe to be the most influential. In each of these examples, your sample would "snowball" as each of your interviewees suggested other people to interview.
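The referral chains just described can be sketched as a simple traversal: start from the subjects you can locate, and let each interview add new names to the list. The following is a minimal illustration, not a field procedure; the network, names, and the `get_referrals` helper are entirely hypothetical.

```python
from collections import deque

def snowball_sample(seeds, get_referrals, max_size):
    """Grow a sample by following referrals from initial seed subjects.

    seeds: the few members of the target population the researcher could locate
    get_referrals: function mapping a subject to the people he or she names
    max_size: stop once this many subjects have been interviewed
    """
    sampled = []
    seen = set(seeds)          # avoid re-interviewing anyone
    queue = deque(seeds)       # subjects waiting to be interviewed
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        for referral in get_referrals(person):
            if referral not in seen:
                seen.add(referral)
                queue.append(referral)
    return sampled

# Hypothetical referral network: each subject names the people they know
network = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": [],
    "D": ["E"],
    "E": [],
}
sample = snowball_sample(["A"], lambda p: network.get(p, []), max_size=4)
print(sample)  # ['A', 'B', 'C', 'D']
```

Note how the sketch makes the method's limitation concrete: subject E is reachable only through D, so anyone D declines to name never enters the sample—which is exactly the representativeness problem discussed above.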
In another example, Karen Farquharson (2005) provides a detailed discussion of how she used snowball sampling to discover a network of tobacco policy makers in Australia: both those at the core of the network and those on the periphery. Kath Browne (2005) used snowballing through social networks to develop a sample of nonheterosexual women in a small town in the United Kingdom. She reports that her own membership in such networks greatly facilitated this type of sampling and that potential subjects in the study were more likely to trust her than to trust heterosexual researchers.

In more general, theoretical terms, Chaim Noy argues that the process of selecting a snowball sample reveals important aspects of the populations being sampled: "the dynamics of natural and organic social networks" (2008: 329). Do the people you interview know others like themselves? Are they willing to identify those people to researchers? In this way, snowball sampling can be more than a simple technique for finding people to study. It, in itself, can be a revealing part of the inquiry.

Jaime Waters (2015) discovered some of the limitations of snowball sampling. In an attempt to study adult (over 40) users of illegal drugs, he discovered that his initial subjects were reluctant or unable to identify other users. Partly, this seemed to reflect a feeling that they had more to lose if their drug use were discovered. Also, he found that his adult users were not as involved in drug-using networks as younger users. Still, snowball sampling is sometimes an effective way to reach hard-to-find subjects.

Quota Sampling

quota sampling: A type of nonprobability sampling in which units are selected for a sample on the basis of prespecified characteristics, so that the total sample will have the same distribution of characteristics assumed to exist in the population being studied.

Quota sampling is the method that helped George Gallup avoid disaster in 1936—and set up the disaster of 1948.
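The core mechanics of quota sampling—collecting respondents cell by cell and then weighting each cell back to its known population proportion—can be sketched in a few lines. This is a minimal illustration only: the cells, proportions, and counts below are made up for the example.

```python
def quota_weights(population_props, sample_counts):
    """Compute a per-respondent weight for each quota cell so that the
    weighted sample reproduces the population's cell proportions.

    population_props: {cell: proportion of the population}, summing to 1
    sample_counts: {cell: number of respondents actually collected}
    """
    total_n = sum(sample_counts.values())
    return {
        cell: (population_props[cell] * total_n) / sample_counts[cell]
        for cell in sample_counts
    }

# Hypothetical quota matrix: the population is 30% urban men, 20% rural men,
# 30% urban women, 20% rural women, but the sample over-collected urban cells.
props = {"urban_m": 0.3, "rural_m": 0.2, "urban_f": 0.3, "rural_f": 0.2}
counts = {"urban_m": 40, "rural_m": 10, "urban_f": 40, "rural_f": 10}

weights = quota_weights(props, counts)
print(weights["rural_m"])  # 2.0 — each rural man stands for more people
```

After weighting, each cell's weighted total matches its population share (here, the 10 rural men carry weight 2.0 each, contributing 20 of the 100 weighted respondents). Notice what the code cannot fix: if the proportions in `props` are out of date, or the respondents within a cell are unrepresentative, the weights faithfully reproduce the error—the two problems discussed next.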
Like probability sampling, quota sampling addresses the issue of representativeness, although the two methods approach the issue quite differently. Quota sampling begins with a matrix, or table, describing the characteristics of the target population. Depending on your research purposes, you may need to know what proportion of the population is male and what proportion female, as well as what proportions of each sex fall into various categories of age, educational level, ethnic group, and so forth. In establishing a national quota sample, you might need to know what proportion of the national population is urban, Eastern, male, under 25, white, working class, and the like, and all the possible combinations of these attributes.

Once you've created such a matrix and assigned a relative proportion to each cell in the matrix, you proceed to collect data from people having all the characteristics of a given cell. You then assign to all the people in a given cell a weight appropriate to their portion of the total population. When all the sample elements are so weighted, the overall data should provide a reasonable representation of the total population.

Although quota sampling resembles probability sampling, it has several inherent problems. First, the quota frame (the proportions that different cells represent) must be accurate, and it's often difficult to get up-to-date information for this purpose. The Gallup failure to predict Truman as the presidential victor in 1948 stemmed partly from this problem. Second, the selection of sample elements within a given cell may be biased even if its proportion of the population is accurately estimated. Instructed to interview five people who meet a given, complex set of characteristics, an interviewer may still avoid people living at the top of seven-story walk-ups, having particularly rundown homes, or owning vicious dogs.

In recent years, some researchers have attempted to combine probability and quota-sampling methods, but the effectiveness of this effort remains to be seen. At present, you should treat quota sampling warily if your purpose is statistical description. At the same time, the logic of quota sampling can sometimes be applied usefully to a field research project. In the study of a formal group, for example, you might wish to interview both leaders and nonleaders. In studying a student political organization, you might want to interview radical, moderate, and conservative members of that group. You may be able to achieve sufficient representativeness in such cases by using quota sampling to ensure that you interview both men and women, both younger and older people, and so forth.

J. Michael Brick (2011), in pondering the future of survey sampling, suggested the possibility of a rebirth for quota sampling. Perhaps it is a workable solution to the problem of representativeness that bedevils falling response rates and online surveys. We'll return to this issue in Chapter 9 on survey research.

Selecting Informants

When field research involves the researcher's attempt to understand some social setting—a juvenile gang or local neighborhood, for example—much of that understanding will come from a collaboration with some members of the group being studied.
Whereas social researchers speak of respondents as people who provide information about themselves, allowing the researcher to construct a composite picture of the group those respondents represent, an informant is a member of the group who can talk directly about the group per se. Anthropologists in particular depend on informants, but other social researchers rely on them as well. If you wanted to learn about informal social networks in a local public-housing project, for example, you would do well to locate individuals who understand what you are looking for and help you find it. When Jeffrey Johnson (1990) set out to study a salmon-fishing community in North Carolina, he used several criteria to evaluate potential informants. Did their positions allow them to interact regularly with other members of the camp, for example, or were they isolated? (He found that the carpenter had a wider range of interactions than did the boat captain.) Was their information about the camp limited to their specific jobs, or did it cover many aspects of the operation? These and other criteria helped determine how useful the potential informants might be to his study. We'll return to this example in a bit. Usually, you'll want to select informants who are somewhat typical of the groups you're studying. Otherwise, their observations and opinions may be misleading. Interviewing only physicians will not give you a well-rounded view of how a community medical clinic is working, for example. Along the same lines, an anthropologist who interviews only men in a society where women are sheltered from outsiders will get a biased view. Similarly, although informants fluent in English are convenient for English-speaking researchers from the United States, they do not typify the members of many societies or even many subgroups within English-speaking countries. 
Simply because they're the ones willing to work with outside investigators, informants will almost always be somewhat "marginal" or atypical within their group. Sometimes this is obvious. Other times, however, you'll learn about their marginality only in the course of your research.

In Johnson's study, a county agent identified one fisherman who seemed squarely in the mainstream of the community. Moreover, he was cooperative and helpful to Johnson's research. The more Johnson worked with the fisherman, however, the more he found the man to be a marginal member of the fishing community:

First, he was a Yankee in a southern town. Second, he had a pension from the Navy [so he was not seen as a "serious fisherman" by others in the community]. . . . Third, he was a major Republican activist in a mostly Democratic village. Finally, he kept his boat in an isolated anchorage, far from the community harbor. (Johnson 1990:56)

Informants' marginality may not only bias the view you get but also limit their access (and hence yours) to the different sectors of the community you wish to study.

These comments should give you some sense of the concerns involved in nonprobability sampling, typically used in qualitative research projects.

informant: Someone who is well versed in the social phenomenon that you wish to study and who is willing to tell you what he or she knows about it. Not to be confused with a respondent.
I conclude with the following injunction from John Lofland, a particularly thoughtful and experienced qualitative researcher:

Your overall goal is to collect the richest possible data. By rich data, we mean a wide and diverse range of information collected over a relatively prolonged period of time in a persistent and systematic manner. Ideally, such data enable you to grasp the meanings associated with the actions of those you are studying and to understand the contexts in which those actions are embedded. (Lofland et al. 2006:15)

In other words, nonprobability sampling does have its uses, particularly in qualitative research projects. But researchers must take care to acknowledge the limitations of nonprobability sampling, especially regarding accurate and precise representations of populations. This point will become clearer as we discuss the logic and techniques of probability sampling.

probability sampling: The general term for samples selected in accordance with probability theory, typically involving some random-selection mechanism. Specific types of probability sampling include EPSEM, PPS, simple random sampling, and systematic sampling.

The Logic and Techniques of Probability Sampling

Although appropriate to some research purposes, nonprobability-sampling methods cannot guarantee that the sample we observed is representative of the whole population. When researchers want precise, statistical descriptions of large populations—for example, the percentage of the population that is unemployed, that plans to vote for Candidate X, or that feels a rape victim should have the right to an abortion—they turn to probability sampling. All large-scale surveys use probability-sampling methods. Although the application of probability sampling involves a somewhat sophisticated use of statistics, the basic logic of probability sampling is not difficult to understand.
If all members of a population were identical in all respects—all demographic characteristics, attitudes, experiences, behaviors, and so on—there would be no need for careful sampling procedures. In this extreme case of perfect homogeneity, in fact, any single case would suffice as a sample to study characteristics of the whole population.

In fact, of course, the human beings who compose any real population are quite heterogeneous, varying in many ways. Figure 7-2 offers a simplified illustration of a heterogeneous population: The 100 members of this small population differ by gender and race. We'll use this hypothetical micropopulation to illustrate various aspects of probability sampling.

FIGURE 7-2 A Population of 100 Folks. (Bar chart: 44 white women, 44 white men, 6 African American women, and 6 African American men.) Typically, sampling aims at reflecting the characteristics and dynamics of large populations. For the purpose of some simple illustrations, let's assume our total population has only 100 members.

The fundamental idea behind probability sampling is this: In order to provide useful descriptions of the total population, a sample of individuals from a population must contain essentially the same variations that exist in the population. This isn't as simple as it might seem, however. Let's take a minute to look at some of the ways researchers might go astray.
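To make the micropopulation concrete, here is a minimal sketch (in Python, with group labels assumed for illustration) that builds the 100-person population of Figure 7-2 and draws an equal-probability random sample from it. The make-up of such a sample tends to approximate the make-up of the population:

```python
import random
from collections import Counter

# The hypothetical micropopulation of Figure 7-2: 100 people who differ
# by gender and race (44 white women, 44 white men, 6 African American
# women, and 6 African American men).
population = (
    [("white", "woman")] * 44 + [("white", "man")] * 44 +
    [("African American", "woman")] * 6 + [("African American", "man")] * 6
)

random.seed(7)  # fixed seed so the illustration is repeatable

# An equal-probability sample: every one of the 100 people is equally
# likely to be chosen, regardless of group membership.
sample = random.sample(population, 25)

print(Counter(gender for _, gender in population))  # 50 women, 50 men
print(Counter(gender for _, gender in sample))      # roughly half women
```

Any single sample will deviate somewhat from the population's exact proportions; how much it is likely to deviate is precisely what probability theory, discussed below, lets us estimate.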
Then, we'll see how probability sampling provides an efficient method for selecting a sample that should adequately reflect variations that exist in the population.

Conscious and Subconscious Sampling Bias

At first glance, it may look as though sampling is pretty straightforward. To select a sample of 100 university students, you might simply interview the first 100 students you find walking around campus. Although untrained researchers often use this kind of sampling method, it runs a high risk of introducing biases into the samples.

In connection with sampling, bias simply means that those selected are not typical or representative of the larger populations they've been chosen from. This kind of bias does not have to be intentional. In fact, it's virtually inevitable when you pick people by the seat of your pants.

Figure 7-3 illustrates what can happen when researchers simply select people who are convenient for study. Although women make up 50 percent of our micropopulation, the people closest to the researcher (in the lower right corner) happen to be 70 percent women, and although the population is 12 percent African American, none were selected for the sample.

FIGURE 7-3 A Sample of Convenience: Easy, but Not Representative. Selecting and observing those people who are most readily at hand is the simplest method, perhaps, but it's unlikely to provide a sample that accurately reflects the total population.

Beyond the risks inherent in simply studying people who are convenient, other problems can arise. To begin, the researcher's personal leanings
may affect the sample to the point where it does not truly represent the student population. Suppose you're a little intimidated by students who look particularly "cool," feeling that they might ridicule your research effort. You might consciously or subconsciously avoid interviewing such people. Or, you might feel that the attitudes of "super-straight-looking" students would be irrelevant to your research purposes, and so you avoid interviewing them.

Even if you sought to interview a "balanced" group of students, you wouldn't know the exact proportions of different types of students making up such a balance, and you wouldn't always be able to identify the different types just by watching them walk by.

Further, even if you made a conscientious effort to interview, say, every tenth student entering the university library, you could not be sure of a representative sample, because different types of students visit the library with different frequencies. Your sample would overrepresent students who visit the library more often than others do.

Similarly, the "public opinion" call-in polls—in which radio stations or newspapers ask people to call specified telephone numbers, text, or tweet to register their opinions—cannot be trusted to represent general populations. At the very least, not everyone in the population will even be aware of the poll. This problem also invalidates polls by magazines and newspapers that publish questionnaires for readers to complete and mail in. Even among those who are aware of such polls, not all will express an opinion, especially if doing so will cost them a stamp, an envelope, and their time. Similar considerations apply to polls taken over the Internet.
Ironically, the failure of such polls to represent all opinions equally was inadvertently acknowledged by Phillip Perinelli (1986), a staff manager of AT&T Communications' DIAL-IT 900 Service, which offers a call-in poll facility to organizations. Perinelli attempted to counter criticisms by saying, "The 50-cent charge assures that only interested parties respond and helps assure also that no individual 'stuffs' the ballot box."

Social researchers cannot determine general public opinion while considering "only interested parties." This excludes those who don't care 50 cents' worth, as well as those who recognize that such polls are not valid. Both types of people may have opinions and may even vote on election day. Perinelli's assertion that the 50-cent charge will prevent ballot stuffing actually means that only those who can afford it will engage in ballot stuffing.

The possibilities for inadvertent sampling bias are endless and not always obvious. Fortunately, several techniques can help us avoid bias.

Representativeness and Probability of Selection

Although the term representativeness has no precise, scientific meaning, it carries a commonsense meaning that makes it useful here. For our purpose, a sample is representative of the population from which it is selected if the aggregate characteristics of the sample closely approximate those same aggregate characteristics in the population. If, for example, the population contains 50 percent women, then a sample must contain "close to" 50 percent women to be representative. Later, we'll discuss "how close" in detail. See "Applying Concepts in Everyday Life: Representative Sampling" for more on this.

Note that samples need not be representative in all respects; representativeness concerns only those characteristics that are relevant to the substantive interests of the study. However, you may not know in advance which characteristics are relevant.
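The working definition above, that the sample's aggregate characteristics should closely approximate the population's, can be expressed as a small check. This is an illustrative sketch, not a formal statistical test; the function name and the 5-percentage-point tolerance are arbitrary choices for the example:

```python
from collections import Counter

def is_representative(population, sample, tolerance=0.05):
    """Return True if, for every category, the sample's share is within
    `tolerance` of the population's share (an informal reading of
    "close to"; later sections quantify sampling error properly)."""
    pop_shares = Counter(population)
    samp_shares = Counter(sample)
    n_pop, n_samp = len(population), len(sample)
    return all(
        abs(pop_shares[v] / n_pop - samp_shares[v] / n_samp) <= tolerance
        for v in pop_shares
    )

# A population that is 50 percent women:
population = ["woman"] * 50 + ["man"] * 50

print(is_representative(population, ["woman"] * 5 + ["man"] * 5))  # True
print(is_representative(population, ["woman"] * 7 + ["man"] * 3))  # False
```

A sample that is 70 percent women fails the check, matching the convenience-sample problem illustrated in Figure 7-3.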
A basic principle of probability sampling is that a sample will be representative of the population from which it is selected if all members of the population have an equal chance of being selected for the sample. (We'll see shortly that the size of the sample selected also affects the degree of representativeness.) Samples that have this quality are often labeled EPSEM samples (EPSEM stands for "equal probability of selection method").

representativeness: That quality of a sample of having the same distribution of characteristics as the population from which it was selected. By implication, descriptions and explanations derived from an analysis of the sample may be assumed to represent similar ones in the population. Representativeness is enhanced by probability sampling and provides for generalizability and the use of inferential statistics.

Applying Concepts in Everyday Life: Representative Sampling

Representativeness applies to many areas of life, not just survey sampling. Consider quality control, for example. Imagine running a company that makes light bulbs. You want to be sure that they actually light up, but you can't test them all. You could, however, devise a method of selecting a sample of bulbs drawn from different times in the production day, on different machines, in different factories, and so forth. Sometimes the concept of representative sampling serves as a protection against overgeneralization, discussed in Chapter 1.
Later we'll discuss variations of this principle, which forms the basis of probability sampling.

Moving beyond this basic principle, we must realize that samples—even carefully selected EPSEM samples—seldom, if ever, perfectly represent the populations from which they are drawn. Nevertheless, probability sampling offers two special advantages. First, probability samples, although never perfectly representative, are typically more representative than other types of samples, because the biases previously discussed are avoided. In practice, a probability sample is more likely than a nonprobability sample to be representative of the population from which it is drawn. Second, and more important, probability theory permits us to estimate the accuracy or representativeness of the sample.

Conceivably, an uninformed researcher might, through wholly haphazard means, select a sample that nearly perfectly represents the larger population. The odds are against doing so, however, and we would be unable to estimate the likelihood that he or she has achieved representativeness. The probability sample, on the other hand, can provide an accurate estimate of success or failure. Shortly we'll see exactly how this estimate can be achieved.

I've said that probability sampling ensures that samples are representative of the population we wish to study. As we'll see in a moment, probability sampling rests on the use of a random-selection procedure.

Suppose you go to a particular restaurant and don't like the food or service. You're ready to cross it off your list of dining possibilities, but then you think about it—perhaps you hit them on a bad night. Perhaps the chef had just discovered her boyfriend in bed with that "witch" from the Saturday wait staff and her mind wasn't on her cooking. Or perhaps the "witch" was serving your table and kept looking over her shoulder to see if anyone with a meat cleaver was bursting out of the kitchen.
In short, your first experience might not have been representative.

To develop the idea of random selection, we need to give more-precise meaning to two important terms: element and population. An element is that unit about which information is collected and that provides the basis of analysis. Typically, in survey research, elements are people or certain types of people. However, other kinds of units can constitute the elements of social research: Families, social clubs, or corporations might be the elements of a study. In a given study, elements are often the same as units of analysis, though the former are used in sample selection and the latter in data analysis.

Up to now we've used the term population to mean the group or collection that we're interested in generalizing about. More formally, a population is the theoretically specified aggregation of study elements.

element: That unit of which a population is composed and that is selected for a sample. Elements are distinguished from units of analysis, which are used in data analysis.

population: The theoretically specified aggregation of the elements in a study.

EPSEM (equal probability of selection method): A sample design in which each member of a population has the same chance of being selected for the sample.

Whereas the vague term Americans might be the target for a study, the delineation of the population would include the definition of the element "Americans" (for example, citizenship, residence) and the time referent for the study (Americans as of when?). Translating the abstract "adult New Yorkers" into a workable population would require a specification of the age defining adult
and the boundaries of New York. Specifying "college student" would include a consideration of full- and part-time students, degree candidates and non-degree candidates, undergraduate and graduate students, and so forth.

A study population is that aggregation of elements from which the sample is actually selected. As a practical matter, researchers are seldom in a position to guarantee that every element meeting the theoretical definitions laid down actually has a chance of being selected in the sample. Even where lists of elements exist for sampling purposes, the lists are usually somewhat incomplete. Some students are always inadvertently omitted from student rosters. Some telephone subscribers have unlisted numbers.

Often, researchers decide to limit their study populations more severely than indicated in the preceding examples. National polling firms may limit their national samples to the 48 adjacent states, omitting Alaska and Hawaii for practical reasons. A researcher wishing to sample psychology professors may limit the study population to those in psychology departments, omitting those in other departments. Whenever the population under examination is altered in such fashion, you must make the revisions clear to your readers.

study population: That aggregation of elements from which a sample is actually selected.

Random Selection

With these definitions in hand, we can define the ultimate purpose of sampling: to select a set of elements from a population in such a way that descriptions of those elements accurately portray the total population from which the elements are selected. Probability sampling enhances the likelihood of accomplishing this aim and also provides methods for estimating the degree of probable success.

Random selection is the key to this process. In random selection, each element has an
equal chance of being selected independently of any other event in the selection process. Flipping a coin is the most frequently cited example: Provided that the coin is perfect (that is, not biased in terms of coming up heads or tails), the "selection" of a head or a tail is independent of previous selections of heads or tails. No matter how many heads turn up in a row, the chance that the next flip will produce "heads" is exactly 50-50. Rolling a perfect set of dice is another example.

Such images of random selection, though useful, seldom apply directly to sampling methods in social research. More typically, social researchers use tables of random numbers or computer programs that provide a random selection of sampling units. A sampling unit is that element or set of elements considered for selection at some stage of sampling. A little later, we'll see how computers are used to select random telephone numbers for interviewing, a technique called random-digit dialing.

random selection: A sampling method in which each element has an equal chance of being selected independently of any other event in the selection process.

sampling unit: That element or set of elements considered for selection in some stage of sampling.

parameter: The summary description of a given variable in a population.

There are two reasons for using random-selection methods. First, this procedure serves as a check on conscious or subconscious bias on the part of the researcher. The researcher who selects cases on an intuitive basis might very well select cases that would support his or her research expectations or hypotheses. Random selection erases this danger. Second, and more important, random selection offers access to the body of probability theory, which provides the basis for estimating the characteristics of the population as well as estimates of the precision of sample results. Now let's examine probability theory in greater detail.
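The coin-flip image of independence can be simulated directly. In this sketch (a simulation written purely for illustration, with an arbitrary seed), the share of heads overall, and the share of heads immediately following a run of three heads, both hover near 50 percent, showing that the process has no "memory":

```python
import random

random.seed(42)

# Simulate many fair-coin flips: each "selection" of heads or tails is
# independent of every previous flip.
flips = [random.choice(["H", "T"]) for _ in range(100_000)]

# Overall proportion of heads should be close to 0.50.
p_heads = flips.count("H") / len(flips)

# Independence check: the chance of heads immediately after three heads
# in a row should also be close to 0.50.
after_streak = [flips[i + 3] for i in range(len(flips) - 3)
                if flips[i:i + 3] == ["H", "H", "H"]]
p_after = after_streak.count("H") / len(after_streak)

print(round(p_heads, 3), round(p_after, 3))
```

Both proportions come out near 0.50, no matter how long a streak of heads precedes a flip, which is exactly the independence property that random selection requires.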
Probability Theory, Sampling Distributions, and Estimates of Sampling Error

Probability theory is a branch of mathematics that provides the tools researchers need (1) to devise sampling techniques that produce representative samples and (2) to statistically analyze the results of their sampling. More formally, probability theory provides the basis for estimating the parameters of a population. A parameter is the summary description of a given variable in a population. The mean income of all families in a city is a parameter; so is the age distribution of the city's population. When researchers generalize from a sample, they're using sample observations to estimate population parameters. Probability theory enables them to make these estimates and also to arrive at a judgment of how likely it is that the estimates will accurately represent the actual parameters in the population. So, for example, probability theory allows pollsters to infer from a sample of 2,000 voters how a population of 100 million voters is likely to vote—and to specify exactly what the probable margin of error in the estimates is.

Probability theory accomplishes these seemingly magical feats by way of the concept of sampling distributions. A single sample selected from a population will give an estimate of the population parameter.
Other samples would give the same or slightly different estimates. Probability theory tells us about the distribution of estimates that would be produced by a large number of such samples. The logic of sampling error can be applied to different kinds of measurements: mean income or mean age, for example. Measurements expressed as percentages, however, provide the simplest introduction to this general concept. To see how this works, we'll look at two examples of sampling distributions, beginning with a simple example in which our population consists of just ten cases.

The Sampling Distribution of Ten Cases

Suppose that there are ten people in a group and that each has a certain amount of money in his or her pocket. To simplify, let's assume that one person has no money, another has one dollar, another has two dollars, and so forth up to the person with nine dollars. Figure 7-4 presents the population of ten people.

FIGURE 7-4 A Population of Ten People with $0 to $9. Let's imagine a population of only ten people with differing amounts of money in their pockets—ranging from $0 to $9.

Our task is to determine the average amount of money one person has: specifically, the mean number of dollars. If you simply add up the money shown in Figure 7-4, you'll find that the total is $45, so the mean is $4.50. Our purpose in the rest of this exercise is to estimate that mean without actually observing all ten individuals.
We'll do that by selecting random samples from the population and using the means of those samples to estimate the mean of the whole population.

To start, suppose we were to select—at random—a sample of only one person from the ten. Our ten possible samples thus consist of the ten cases shown in Figure 7-4. The ten dots shown on the graph in Figure 7-5 represent these ten samples. Because we're taking samples of only one, they also represent the "means" we would get as estimates of the population. The distribution of the dots on the graph is called the sampling distribution. Obviously, it wouldn't be a very good idea to select a sample of only one, because we'll very likely miss the true mean of $4.50 by quite a bit.

Now suppose we take a sample of two. As shown in Figure 7-6, increasing the sample size improves our estimations. There are now forty-five possible samples: [$0, $1], [$0, $2], . . . [$7, $8], [$8, $9]. Moreover, some of those samples produce the same means.
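The counts in this example are easy to verify with a few lines of code. This sketch (written for illustration) enumerates every possible sample, confirming the forty-five samples of size two and showing that the sample means center on the true mean of $4.50:

```python
from itertools import combinations

# The population of ten people holding $0 through $9.
population = list(range(10))
true_mean = sum(population) / len(population)   # $4.50

# Samples of size one: each person is a possible sample, so each amount
# is itself a possible "sample mean" (the dots of Figure 7-5).
means_of_one = [float(p) for p in population]

# Samples of size two: every pair of distinct people (Figure 7-6).
pairs = list(combinations(population, 2))
means_of_two = [sum(pair) / 2 for pair in pairs]

print(true_mean)                                # 4.5
print(len(pairs))                               # 45 possible samples
print(sum(means_of_two) / len(means_of_two))    # mean of the 45 sample means
```

The mean of all forty-five sample means is exactly $4.50, and the two-person sample means cluster more tightly around that true value than the single-person "means" do, which is the improvement in estimation that Figure 7-6 illustrates.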