PART 3 Modes of Observation

Learning Objectives
After studying this chapter, you will be able to . . .
• Give examples of topics well suited to experimental studies and topics that would not be appropriate.
• Diagram and explain the key elements in the classical experiment.
• Discuss three methods for selecting and assigning subjects in an experiment.
• Understand several types of experimental designs.
• Provide examples illustrating the use of the experimental model in social research.
• Discuss the advantages and disadvantages of Web-based experiments.
• Describe what is meant by "natural" experiments, giving examples to illustrate.
• Identify and discuss both the strengths and weaknesses of experiments in social research.
• Explain some of the ethical issues involved in the use of the experimental model.

Introduction
This chapter addresses the controlled experiment: a research method commonly associated with the natural sciences. Although this is not the approach most commonly used in the social sciences, Part 3 begins with this method because it illustrates fundamental elements in the logic of explanatory research. If you can grasp the logic of the controlled experiment, you'll find it a useful backdrop for understanding techniques that are more commonly used. Of course, this chapter will also present some of the inventive ways social scientists have conducted experiments, and it will demonstrate some basic experimental techniques.

At the most basic level, experiments involve (1) taking action and (2) observing the consequences of that action. Social researchers typically select a group of subjects, do something to them, and observe the effect of what was done.

It's worth noting that experiments are often used in nonscientific human inquiry. In preparing a stew, for example, we add salt, taste, add more salt, and taste again.
In defusing a bomb, we clip the red wire, observe whether the bomb explodes, clip the blue wire, and____ We also experiment copiously in our attempt to develop an overall understanding of the world we live in. All skills are learned through experimentation: eating, walking, riding a bicycle, and so forth. This chapter will discuss some ways social researchers use experiments to develop generalized understandings. We'll see that, like other methods available to the social researcher, experimenting has its special strengths and weaknesses. Topics Appropriate for Experiments Experiments are more appropriate for some topics and research purposes than for others. Experiments are especially well suited to research projects involving relatively limited and well-defined concepts and propositions. In terms of the traditional image of science, discussed earlier in this book, the experimental model is especially appropriate for hypothesis testing. Because experiments focus on determining causation, they're also better suited to explanatory than to descriptive purposes. Let's assume, for example, that we want to discover ways of reducing prejudice against Muslims. We hypothesize that learning about the contribution of Muslims to U.S. history will reduce prejudice, and we decide to test this hypothesis experimentally. To begin, we might test a group of experimental subjects to determine their levels of prejudice against Muslims. Next, we might show them a documentary film depicting the many important ways Muslims have contributed to the scientific, literary, political, and social development of the nation. Finally, we would measure our subjects' levels of prejudice against Muslims to determine whether the film has actually reduced prejudice. Copyright 2021 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). 
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it. 230 ■ Part Three What do you think? The impact of the observer raises many serious questions regarding the usefulness of experiments in social research. How can the manipulation of people in a controlled, experimental environment tell us anything about "natural" human behavior? After all is said and done, doesn't an experiment simply tell us how people behave when they participate in an experiment? See the What do you think?... Revisited box toward the end of the chapter. Experimentation has also been successful in the study of small-group interaction. Thus, we might bring together a small group of experimental subjects and assign them a task, such as making recommendations for popularizing car pools. Then we would observe how the group organizes itself and deals with the problem. Over the course of several such experiments, we might systematically vary the nature of the task or the rewards for handling the task successfully. By observing differences in the way groups organize themselves and operate under these varying conditions, we could learn a great deal about the nature of small-group interaction and the factors that influence it. For example, attorneys sometimes present evidence in different ways to different mock juries, to see which method is the most effective. Experiments often involve putting people in unusual, controlled situations to see how they will respond. We typically think of experiments as being conducted in laboratories. Indeed, most of the examples in this chapter involve such a setting. This need not be the case, however. Increasingly, social researchers are using the World Wide Web as a vehicle for conducting experiments. 
Further, sometimes we can construct what are called natural experiments: "experiments" that occur in the regular course of social events. The latter portion of this chapter deals with such research.

The Classical Experiment
In both the natural and the social sciences, the most conventional type of experiment involves three major pairs of components: (1) independent and dependent variables, (2) pretesting and posttesting, and (3) experimental and control groups. This section looks at each of these components and the way they're put together in the execution of an experiment.

Independent and Dependent Variables
Essentially, an experiment examines the effect of an independent variable on a dependent variable. Typically, the independent variable takes the form of an experimental stimulus, which is either present or absent. That is, the stimulus is a dichotomous variable, having two attributes: present or not present. In this typical model, the experimenter compares what happens when the stimulus is present to what happens when it is not.

Chapter 8: Experiments

In the example concerning prejudice against Muslims, prejudice is the dependent variable and exposure to Muslim history is the independent variable. The researcher's hypothesis suggests that prejudice depends, in part, on a lack of knowledge of Muslim history. The purpose of the experiment is to test the validity of this hypothesis by presenting some subjects with an appropriate stimulus, such as a documentary film.
In other words, the independent variable is the cause and the dependent variable is the effect. Thus, we might say that watching the film caused a change in prejudice or that reduced prejudice was an effect of watching the film. The independent and dependent variables appropriate for experimentation are nearly limitless. Moreover, a given variable might serve as an independent variable in one experiment and as a dependent variable in another. For example, prejudice is the dependent variable in the previous example, but it might be the independent variable in an experiment examining the effect of prejudice on voting behavior. To be used in an experiment, both independent and dependent variables must be operationally defined. Such operational definitions might involve a variety of observational methods. Responses to a questionnaire, for example, might be the basis for defining prejudice. Speaking to or ignoring Muslims, or agreeing or disagreeing with them, might be elements in the operational definition of interaction with Muslims in a small-group setting. Conventionally, in the experimental model, dependent and independent variables must be operationally defined before the experiment begins. However, as you'll see in connection with survey research and other methods, it's sometimes appropriate to make a wide variety of observations during data collection and then determine the most useful operational definitions of variables during later analyses. Ultimately, however, experimentation, like other quantitative methods, requires specific and standardized measurements and observations. Pretesting and Posttesting In the simplest experimental design, pretesting occurs first, whereby subjects are measured in terms of a dependent variable. Then the subjects are exposed to a stimulus representing an independent variable. Finally, in posttesting, they are remeasured in terms of the dependent variable. 
Any differences between the first and last measurements on the dependent variable are then attributed to the independent variable. In the example of prejudice and exposure to Muslim history, we would begin by pretesting the extent of prejudice among our experimental subjects. Using a questionnaire asking about attitudes toward Muslims, for example, we could measure the extent of prejudice exhibited by each individual subject and the average prejudice level of the whole group. After exposing the subjects to the Muslim history film, we could administer the same questionnaire again. Responses given in this posttest would permit us to measure the later extent of prejudice for each subject and the average prejudice level of the group as a whole. If we discovered a lower level of prejudice during the second administration of the questionnaire, we might conclude that the film had indeed reduced prejudice. In the experimental examination of attitudes such as prejudice, we face a special practical problem relating to validity. As you may already have imagined, the subjects might respond differently to the questionnaires the second time even if their attitudes remain unchanged. During the first administration of the questionnaire, the subjects may be unaware of its purpose. By the second measurement, they may have figured out that the researchers are interested in measuring their prejudice. Because no one wishes to seem prejudiced, the subjects may "clean up" their answers the second time around. Thus, the film will seem to have reduced prejudice although, in fact, it has not. This is an example of a more general problem that plagues many forms of social research: The very act of studying something may change it. The techniques for dealing with this problem in the context of experimentation will be discussed in various places throughout the chapter. The first technique involves the use of control groups. 
pretesting: The measurement of a dependent variable among subjects before they are exposed to a stimulus representing an independent variable.
posttesting: The remeasurement of a dependent variable among subjects after they've been exposed to a stimulus representing an independent variable.

Experimental and Control Groups
Laboratory experiments seldom, if ever, involve only the observation of an experimental group to which a stimulus has been administered. In addition, the researchers observe a control group, which does not receive the experimental stimulus.

In the example of prejudice and Muslim history, we might examine two groups of subjects. To begin, we give each group a questionnaire designed to measure their prejudice against Muslims. Then we show the film only to the experimental group. Finally, we administer a posttest of prejudice to both groups. Figure 8-1 illustrates this basic experimental design.

Using a control group allows the researcher to detect any effects of the experiment itself. If the posttest shows that the overall level of prejudice exhibited by the control group has dropped as much as that of the experimental group, then the apparent reduction in prejudice must be a function of the experiment or of some external factor rather than a function of the film. If, on the other hand, prejudice is reduced only in the experimental group, this reduction would seem to be a consequence of exposure to the film, because that's the only difference between the two groups.
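The comparison just described can be made concrete with a small calculation. The following Python sketch is only an illustration: the function names and the prejudice scores are invented here, not taken from any study. It computes each group's average change from pretest to posttest and then the difference between those changes, which is the part of the change that the control group does not share.

```python
from statistics import mean


def mean_change(pretest, posttest):
    """Average posttest-minus-pretest change for one group of subjects."""
    return mean(post - pre for pre, post in zip(pretest, posttest))


def net_stimulus_effect(exp_pre, exp_post, ctl_pre, ctl_post):
    """Change in the experimental group beyond the change in the control group."""
    return mean_change(exp_pre, exp_post) - mean_change(ctl_pre, ctl_post)


# Hypothetical prejudice scores (higher = more prejudiced), invented for illustration.
exp_pre = [60, 55, 70, 65]
exp_post = [50, 45, 62, 55]
ctl_pre = [62, 58, 66, 64]
ctl_post = [60, 55, 64, 63]

exp_change = mean_change(exp_pre, exp_post)   # change among those who saw the film
ctl_change = mean_change(ctl_pre, ctl_post)   # change among those who did not
effect = net_stimulus_effect(exp_pre, exp_post, ctl_pre, ctl_post)
print(exp_change, ctl_change, effect)
```

With these made-up scores, the experimental group's average falls by 9.5 points and the control group's by 2, leaving a 7.5-point reduction that the control group does not share. Had both groups fallen by the same amount, the net figure would be zero, the pattern that points to the experiment itself or to some outside factor rather than to the film.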
Alternatively, if prejudice is reduced in both groups but to a greater degree in the experimental group than in the control group, that, too, would be grounds for assuming that the film reduced prejudice.

experimental group: In experimentation, a group of subjects to whom an experimental stimulus is administered.
control group: In experimentation, a group of subjects to whom no experimental stimulus is administered and who resemble the experimental group in all other respects. The comparison of the control group and the experimental group at the end of the experiment points to the effect of the experimental stimulus.

FIGURE 8-1 Diagram of Basic Experimental Design. The fundamental purpose of an experiment is to isolate the possible effect of an independent variable (called the stimulus in experiments) on a dependent variable. Members of the experimental group(s) are exposed to the stimulus and those in the control group(s) are not.
[Diagram not reproduced. Recoverable labels: Randomization of Experimental and Control Groups; Experimental Group: measure dependent variable, administer experimental stimulus (film), remeasure dependent variable; Control Group: measure dependent variable, remeasure dependent variable; comparisons between the groups labeled "Same?" and "Different?".]

The need for control groups in social research became clear in connection with a series of studies of employee satisfaction, conducted by F. J. Roethlisberger and W. J. Dickson (1939) in the late 1920s and early 1930s. These researchers were interested in discovering what kinds of changes in working conditions would improve employee satisfaction and productivity. To pursue this objective, they studied working conditions in the telephone "bank wiring room" of the Western Electric Works in the Chicago suburb of Hawthorne, Illinois. To the researchers' great satisfaction, they discovered that improving the working conditions increased satisfaction and productivity consistently.
As the workroom was brightened up through better lighting, for example, productivity went up. When lighting was further improved, productivity went up again. To further substantiate their scientific conclusion, the researchers then dimmed the lights. Whoops! Productivity again improved.

At this point it became evident that the wiring-room workers were responding more to the attention given them by the researchers than to the improved working conditions. As a result of this phenomenon, often called the Hawthorne effect, social researchers have become more sensitive to and cautious about the possible effects of experiments themselves. In the wiring-room study, the use of a proper control group, one that was studied intensively without any other changes in the working conditions, would have pointed to the existence of this effect.

The need for control groups in experimentation has been nowhere more evident than in medical research. Time and again, patients who participate in medical experiments have appeared to improve, but it has been unclear how much of the improvement has come from the experimental treatment and how much from the experiment. In testing the effects of new drugs, then, medical researchers frequently administer a placebo (a "drug" with no relevant effect, such as sugar pills) to a control group. Thus, the control-group patients believe they, like the experimental group, are receiving an experimental drug. Often, they improve.
If the new drug is effective, however, those receiving the actual drug will improve more than those receiving the placebo. In social science experiments, control groups provide an important guard against not only the effects of the experiments themselves but also against the effects of any events outside the laboratory during the experiments. In the example of the study of prejudice, suppose that a popular Muslim leader is assassinated in the middle of, say, a weeklong experiment. Such an event might horrify the experimental subjects, requiring them to examine their own attitudes toward Muslims, resulting in reduced prejudice. Because such an effect should happen about equally for members of the control group and the experimental group, a greater reduction of prejudice among the experimental group would, again, point to the impact of the experimental stimulus: the documentary film. Sometimes an experimental design will require more than one experimental or control group. In the case of the documentary film, for example, we might also want to examine the impact of reading a book on Muslim history. In that case, we might have one group see the film and read the book, another group only see the movie, still another group only read the book, and the control group do neither. With this kind of design, we could determine the impact of each stimulus separately, as well as their combined effect. The Double-Blind Experiment Like patients who improve when they merely think they're receiving a new drug, sometimes experimenters tend to prejudge results. In medical research, the experimenters may be more likely to "observe" improvements among patients receiving the experimental drug than among those receiving the placebo. (This would be most likely, perhaps, for the researcher who developed the drug.) A double-blind experiment eliminates this possibility, because neither the subjects nor the experimenters know which is the experimental group and which is the control group. 
In the medical case, those researchers responsible for administering the drug and for noting improvements would not be told which subjects were receiving the drug and which the placebo. Conversely, the researcher who knew which subjects were in which group would not administer the experiment.

double-blind experiment: An experimental design in which neither the subjects nor the experimenters know which is the experimental group and which is the control.

In social science experiments, as in medical ones, the danger of experimenter bias is further reduced to the extent that the operational definitions of the dependent variables are clear and precise. For example, medical researchers would be less likely to unconsciously bias their reading of a patient's temperature than they would be to bias their assessment of how lethargic the patient was. Similarly, the small-group researcher would be less likely to misperceive which subject spoke, or to whom he or she spoke, than whether the subject's comments sounded cooperative or competitive, a more subjective judgment that's difficult to define in precise behavioral terms.

The role of the placebo may be more complex than you think, according to a 2010 medical experiment on irritable bowel syndrome. One group of sufferers was given pills in a bottle marked "Placebo," and it was explained that a placebo, sometimes called a sugar pill, contained no active ingredients. Subjects were told that people sometimes seemed to benefit from the placebos. A control group was given no treatment at all. After 21 days the placebo group had improved significantly, while the control group had not. This study was further complicated, however, by the fact that those receiving the placebo pills also received examinations and counseling sessions, while the control group received no attention at all. Perhaps, as the researchers acknowledge, the positive results were produced by the comprehensive treatment package, not by the placebo pills alone. Also, they note, the measures of improvement were self-assessments. It is possible that physiological measurements might have shown no improvement. But, to complicate matters further, isn't "feeling better" the goal of such treatments?

As I've indicated several times, seldom can we devise operational definitions and measurements that are wholly precise and unambiguous. This is another reason why employing a double-blind design in social research experiments might be appropriate.

Selecting Subjects
In Chapter 7 we discussed the logic of sampling, which involves selecting a sample that is representative of some population. Similar considerations apply to experiments. Because most social researchers work in colleges and universities, it seems likely that most social research laboratory experiments are conducted with college undergraduates as subjects. Typically, the experimenter asks students enrolled in his or her classes to participate in experiments or advertises for subjects in a college newspaper. Subjects may or may not be paid for participating in such experiments. (See Chapter 3 for more on the ethical issues involved in this situation.)

In relation to the norm of generalizability in science, this tendency clearly represents a potential defect in social research. Simply put, college undergraduates do not typify the public at large.
There is a danger, therefore, that we may learn much about the attitudes and actions of college undergraduates but not about social attitudes and actions in general.

However, this potential defect is less significant in explanatory research than in descriptive research. True, having noted the level of prejudice among a group of college undergraduates in our pretesting, we would have little confidence that the same level existed among the public at large. On the other hand, if we found that a documentary film reduced whatever level of prejudice existed among those undergraduates, we would have more confidence, without being certain, that it would have a similar effect in the community at large. Social processes and patterns of causal relationships appear to be more generalizable and more stable than specific characteristics such as an individual's level of prejudice.

Generalizing from students isn't always seen as problematic, as Jerome Taylor reports in a commentary on research concerning the common cold, a disease he traces back to ancient Egypt. This elusive illness attacks only humans and chimpanzees, so you can probably guess which subjects medical researchers have tended to select. However, you might be wrong:

    Chimpanzees were too expensive to import en masse, so during the first half of the 20th century British scientists began looking into how the common cold worked by conducting experiments on medical students at St. Bartholomew's Hospital in London. (Taylor 2008)

Aside from the question of generalizability, the cardinal rule of subject selection and experimentation concerns the comparability of experimental and control groups. Ideally, the control group represents what the experimental group would have been like if it had not been exposed to the experimental stimulus. The logic of experiments requires, therefore, that experimental and control groups be as similar as possible.
There are several ways to accomplish this, as will be discussed next.

Probability Sampling
The discussions of the logic and techniques of probability sampling in Chapter 7 outline one method for selecting two groups that are similar to each other. Beginning with a sampling frame composed of all the people in the population under study, the researcher might select two probability samples. If these samples each resemble the total population from which they're selected, they'll also resemble each other.

Recall also, however, that the degree of resemblance (representativeness) achieved by probability sampling is largely a function of the sample size. As a general guideline, probability samples of less than 100 are not likely to be representative, and social science experiments seldom involve that many subjects in either experimental or control groups. As a result, then, probability sampling is seldom used in experiments to select subjects from a larger population. Researchers do, however, use the logic of random selection when they assign subjects to groups.

Randomization
Having recruited, by whatever means, a total group of subjects, the experimenter may randomly assign those subjects to either the experimental or the control group.
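As a rough illustration of random assignment, the sketch below shuffles a pool of serially numbered subjects with a software random-number generator and splits it in half. The function name, the pool of 40 subjects, and the seed value are assumptions made for this example only.

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subject pool and split it into experimental and control halves.

    If the pool has an odd number of subjects, the control group gets the extra one.
    """
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    pool = list(subjects)       # copy so the caller's list is left untouched
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]


# 40 hypothetical subjects, numbered serially from 1 to 40.
subjects = list(range(1, 41))
experimental, control = randomly_assign(subjects, seed=2021)
print(sorted(experimental))
print(sorted(control))
```

Because every ordering of the shuffled pool is equally likely, each subject has the same chance of landing in either group, which is the property the logic of random selection depends on.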
Such randomization might be accomplished by numbering all of the subjects serially and selecting numbers by means of a random-number table, or the experimenter might assign the odd-numbered subjects to the experimental group and the even-numbered subjects to the control group. Let's return again to the basic concept of probability sampling. If we recruit 40 subjects (in response to a newspaper advertisement, for example), there's no reason to believe that the 40 subjects represent the entire population from which they've been drawn. Nor can we assume that the 20 subjects randomly assigned to the experimental group represent that larger population. We can have greater confidence, however, that the 20 subjects randomly assigned to the experimental group will be reasonably similar to the 20 assigned to the control group. Following the logic of our earlier discussions of sampling, we can see our 40 subjects as a population from which we select two probability samples—each consisting of half the population. Because each sample reflects the characteristics of the total population, the two samples will mirror each other. As we saw in Chapter 7, our assumption of similarity in the two groups depends in part on the number of subjects involved. In the extreme case, if we recruited only two subjects and assigned, by the flip of a coin, one as the experimental subject and one as the control, there would be no reason to assume that the two subjects are similar to each other. With larger numbers of subjects, however, randomization makes good sense. Matching Another way to achieve comparability between the experimental and control groups is through matching. This process is similar to the quota-sampling methods discussed in Chapter 7. If 12 of our subjects are young white men, we might assign 6 of those at random to the experimental group and the other 6 to the control group. If 14 are middle-aged African American women, we might assign 7 to each group. 
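The split just described, in which subjects who share the same characteristics are divided at random between the two groups, might be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the function name, the dictionary keys, and the seed are hypothetical, and the subject records simply mirror the 12 and 14 subjects in the example above.

```python
import random
from collections import defaultdict


def assign_within_cells(subjects, keys, seed=None):
    """Group subjects by shared characteristics, then split each group at random.

    Subjects with identical values on every key form one cell. Half of each cell
    goes to the experimental group and the rest to the control group (the control
    group gets the extra subject when a cell has an odd count).
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for subject in subjects:
        cells[tuple(subject[k] for k in keys)].append(subject)

    experimental, control = [], []
    for members in cells.values():
        rng.shuffle(members)
        half = len(members) // 2
        experimental.extend(members[:half])
        control.extend(members[half:])
    return experimental, control


# Hypothetical subject records mirroring the example in the text.
subjects = (
    [{"id": i, "race": "white", "age": "young", "gender": "man"}
     for i in range(12)]
    + [{"id": 12 + i, "race": "African American", "age": "middle-aged", "gender": "woman"}
       for i in range(14)]
)
experimental, control = assign_within_cells(subjects, ["race", "age", "gender"], seed=8)
print(len(experimental), len(control))
```

Here 6 of the 12 young white men and 7 of the 14 middle-aged African American women end up in each group, so the two groups have the same composition on the chosen characteristics while chance still decides which individuals go where.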
The overall matching process could be most efficiently achieved through the creation of a quota matrix constructed of all the most relevant characteristics. Figure 8-2 provides a simplified illustration of such a matrix. In this example, the experimenter has decided that the relevant characteristics are race, age, and gender. Ideally, the quota matrix is constructed to result in an even number of subjects in each cell of the matrix. Then, half the subjects in each cell go into the experimental group and half into the control group.

FIGURE 8-2 Quota Matrix Illustration. Sometimes the experimental and control groups are created by finding pairs of matching subjects and assigning one to the experimental group and the other to the control group.

Alternatively, we might recruit more subjects than our experimental design requires. We might then examine many characteristics of the large initial group of subjects. Whenever we discover a pair of quite similar subjects, we might assign one at random to the experimental group and the other to the control group. Potential subjects who are unlike anyone else in the initial group might be left out of the experiment altogether.

Whatever method we employ, the desired result is the same. The overall average description of the experimental group should be the same as that of the control group. For example, they should have about the same average age, the same proportions of males and females, the same racial composition, and so forth. This test of comparability should be used whether the two groups are created through probability sampling or through randomization.

randomization: A technique for assigning experimental subjects to experimental and control groups randomly.
matching: In connection with experiments, the procedure whereby pairs of subjects are matched on the basis of their similarities on one or more variables, and one member of the pair is assigned to the experimental group and the other to the control group.

Thus far, I've referred to the "relevant" variables without saying clearly what those variables are. Of course, I can't give a definite answer to this question, any more than I could specify in Chapter 7 which variables should be used in stratified sampling. Which variables are relevant ultimately depends on the nature and purpose of the experiment. As a general rule, however, the control and experimental groups should be comparable in terms of those variables most likely to be related to the dependent variable under study. In a study of prejudice, for example, the two groups should be alike in terms of education, ethnicity, and age, among other characteristics. In some cases, moreover, we may delay assigning subjects to experimental and control groups until we've initially measured the dependent variable. Thus, for example, we might administer a questionnaire measuring subjects' prejudice and then match the experimental and control groups to assure ourselves that the two groups exhibit the same overall level of prejudice.

Matching or Randomization?
When assigning subjects to the experimental and control groups, you should be aware of two arguments in favor of randomization over matching. First, you may not be in a position to know in advance which variables will be relevant for the matching process. Second, most of the statistics used to analyze the results of experiments assume randomization.
Failure to design your experiment that way, then, makes your later use of those statistics less meaningful.

On the other hand, randomization makes sense only if you have a fairly large pool of subjects so that the laws of probability sampling apply. With only a few subjects, matching would be a better procedure.

Sometimes researchers can combine matching and randomization. When conducting an experiment in the educational enrichment of young adolescents, for example, Milton Yinger and his colleagues (1977) needed to assign a large number of students, ages 13 and 14, to several different experimental and control groups in a way that ensured the comparability of the students composing each group. They achieved this goal using the following method.

Beginning with a pool of subjects, the researchers first created strata of students nearly identical to one another in terms of some 15 variables. From each of the strata, students were randomly assigned to the different experimental and control groups. In this fashion, the researchers actually improved on conventional randomization. Essentially, they used a stratified sampling procedure (recall the discussion in Chapter 7), except that they employed far more stratification variables than are typically used in, say, survey sampling.

Thus far, I've described the classical experiment, the experimental design that best represents the logic of causal analysis in the laboratory.
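To make the combination of matching and randomization more concrete, here is a minimal Python sketch of the general idea: sort subjects into strata (the cells of a quota matrix), then assign randomly within each stratum. It is an illustration only, not the procedure Yinger and his colleagues actually coded. The function name `assign_within_strata`, the two stratification variables, and the invented pool of 16 subjects are all assumptions made for this example.

```python
import random
from collections import defaultdict

def assign_within_strata(subjects, strata_keys,
                         groups=("experimental", "control"), seed=0):
    """Place subjects in strata, then deal each stratum's members
    across the groups in random order (hypothetical helper)."""
    rng = random.Random(seed)

    # Step 1 (the "matching" part): subjects who share the same values
    # on every stratification variable land in the same stratum.
    strata = defaultdict(list)
    for subject in subjects:
        cell = tuple(subject[key] for key in strata_keys)
        strata[cell].append(subject)

    # Step 2 (the "randomization" part): shuffle each stratum and
    # alternate group membership down the shuffled list.
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        # A random starting point decides which group gets the extra
        # subject when a stratum cannot be divided evenly.
        offset = rng.randrange(len(groups))
        for position, subject in enumerate(members):
            assignment[subject["id"]] = groups[(position + offset) % len(groups)]
    return assignment

# Invented pool: four subjects in every gender-by-age cell.
subjects = []
for gender in ("female", "male"):
    for age in ("13", "14"):
        for _ in range(4):
            subjects.append({"id": len(subjects), "gender": gender, "age": age})

assignment = assign_within_strata(subjects, ["gender", "age"])
for subject in subjects:
    print(subject["gender"], subject["age"], assignment[subject["id"]])
```

Because every cell here holds an even number of subjects, each cell contributes equally to the two groups, which is the outcome the quota matrix is meant to produce. Chance decides only which individuals within a cell end up in which group.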
In practice, however, social researchers use a great variety of experimental designs. In the next section, we'll look at some of these approaches. Variations on Experimental Design In their classic book on research design, Experimental and Quasi-Experimental Designs for Research, Donald Campbell and Julian Stanley (1963) describe sixteen different experimental and quasi-experimental designs. This section summarizes a few of these variations to help show the potential for experimentation in social research. Preexperimental Research Designs To begin, Campbell and Stanley discuss three preexperimental designs, not to recommend them but because they're frequently used in less-than-professional research. These designs are called preexperimental to indicate that they do not meet the scientific standards of experimental designs, and sometimes they may be used because the conditions for full-fledged experiments are impossible to meet. In the first such design—the one-shot case study—a single group of subjects is measured on a dependent variable following the administration of some experimental stimulus. Suppose, for example, that we show the previously mentioned Muslim history film to a group of people and then administer a questionnaire that seems to measure prejudice against Muslims. Suppose further that the answers given to the questionnaire seem to represent a low level of prejudice. We might be tempted to conclude that the film reduced prejudice. Lacking a pretest, however, we can't be sure. Perhaps the questionnaire doesn't really represent a very sensitive measure of prejudice, or perhaps the group we're studying was low in prejudice to begin with. In either case, the film might have made no difference, though our experimental results might mislead us into thinking it did. The second preexperimental design discussed by Campbell and Stanley adds a pretest for the experimental group but lacks a control group. 
This design—which the authors call the one-group pretest-posttest design—suffers from the possibility that some factor other than the independent variable might cause a change between the pretest and posttest results, such as the scenario described earlier concerning the assassination of a respected Muslim leader. Thus, although we can see that prejudice has been reduced, we can't be sure the film caused that reduction. To round out the possibilities for preexperimental designs, Campbell and Stanley point out that some research is based on experimental and control groups but has no pretests. They call this design the static-group comparison. For example, we might show the Muslim history film to one group but not to another and then measure prejudice in both groups. If the experimental group had less prejudice at the conclusion of the experiment, we might assume the film was responsible. But unless we had randomized our subjects, we would have no way of knowing that the two groups had the same degree of prejudice initially; perhaps the experimental group started out with less prejudice. Figure 8-3 illustrates these three preexperimental research designs, using a different research question: "Does exercise cause weight reduction?" To make the several designs clearer, the figure shows individuals rather than groups, but the same logic pertains to group comparisons. Let's review the three preexperimental designs in this new example. The one-shot case study design represents a common form of logical reasoning in everyday life. Asked whether exercise causes weight reduction, we may bring to mind an example that would seem to support the proposition: someone who exercises and is thin. There are problems with this reasoning, however. Perhaps the person was thin long before beginning to exercise. Or perhaps he became thin for some other reason, such as eating less or suffering from an illness. 
The observations shown in the diagram do not guard against these other possibilities. Moreover, the observation that the man in the diagram is in trim shape depends on our intuitive idea of what constitutes trim and overweight body shapes. All told, this is very weak evidence for testing the relationship between exercise and weight loss.

FIGURE 8-3 Three Preexperimental Research Designs. The figure has three panels, each laid out along a timeline from Time 1 to Time 3:
One-Shot Case Study: A man who exercises is observed to be in trim shape, judged against some intuitive standard of what constitutes a trim shape.
One-Group Pretest-Posttest Design: An overweight man who exercises is later observed to be in trim shape.
Static-Group Comparison: A man who exercises is observed to be in trim shape while one who doesn't is observed to be overweight.
These preexperimental designs anticipate the logic of true experiments but remain open to errors of interpretation. Can you see the errors that might be made in each of these designs? The various risks are solved by the addition of control groups, pretesting, and posttesting.

The one-group pretest-posttest design offers somewhat better evidence that exercise produces weight loss. Specifically, we've ruled out the possibility that the man was thin before beginning to exercise. However, we still have no assurance that it was his exercising that caused him to lose weight.
Finally, the static-group comparison eliminates the problem of our questionable definition of what constitutes trim or overweight body shapes. In this case, we can compare the shapes of the man who exercises and the one who does not. This design, however, reopens the possibility that the man who exercises was thin to begin with.

Validity Issues in Experimental Research

At this point, I want to present in a more systematic way the factors that affect experimental research: those I've already discussed as well as additional factors. First we'll look at what Campbell and Stanley call the sources of internal invalidity, reviewed and expanded in a follow-up book by Thomas Cook and Donald Campbell (1979). Then we'll consider the problem of generalizing experimental results to the "real" world, referred to as external invalidity. Having examined these, we'll be in a position to appreciate the advantages of some of the more sophisticated experimental and quasi-experimental designs that social science researchers sometimes use.

Sources of Internal Invalidity

The problem of internal invalidity refers to the possibility that the conclusions drawn from experimental results may not accurately reflect what has gone on in the experiment itself. The threat of internal invalidity is present whenever anything other than the experimental stimulus can affect the dependent variable.
Donald Campbell and Julian Stanley (1963: 5-6) and Thomas Cook and Donald Campbell (1979: 51-55) point to several sources of internal invalidity. I will touch on eight of them here to illustrate this concern:

1. History. During the course of the experiment, historical events may occur that confound the experimental results. The assassination of a Muslim leader during the course of an experiment on reducing anti-Muslim prejudice is one example.

2. Maturation. People are continually growing and changing, and such changes affect the results of the experiment. In a long-term experiment, the fact that the subjects grow older (and wiser?) can have an effect. In shorter experiments, they can grow tired, sleepy, bored, or hungry, or change in other ways that affect their behavior in the experiment.

3. Testing. Often the process of testing and retesting influences people's behavior, thereby confounding the experimental results. Suppose we administer a questionnaire to a group as a way of measuring their prejudice. Then we administer an experimental stimulus and remeasure their prejudice. As we saw earlier, by the time we conduct the posttest, the subjects will probably have become more sensitive to the issue of prejudice and will be more thoughtful in their answers. In fact, they may have figured out that we're trying to find out how prejudiced they are, and, because few people want to appear prejudiced, they may give answers that they think the researchers are seeking or that will make themselves "look good."

4. Instrumentation. The process of measurement in pretesting and posttesting brings in some of the issues of conceptualization and operationalization discussed earlier in the book. For example, if we use different measures of the dependent variable (say, different questionnaires about prejudice), how can we be sure they're comparable? Perhaps prejudice will seem to decrease simply because the pretest measure was more sensitive than the posttest measure.
Or if the measurements are being made by the experimenters, their standards or abilities may change over the course of the experiment.

5. Statistical regression. Sometimes it's appropriate to conduct experiments on subjects who start out with extreme scores on the dependent variable. If you were testing a new method for teaching math to hard-core failures in math, you would want to conduct your experiment on people who previously have done extremely poorly in math. But consider for a minute what's likely to happen to the math achievement of such people over time without any experimental interference. They're starting out so low that they can only stay at the bottom or improve: They can't get worse. Even without any experimental stimulus, then, the group as a whole is likely to show some improvement over time. Referring to a regression to the mean, statisticians often point out that extremely tall people as a group are likely to have children shorter than themselves, and extremely short people as a group are likely to have children taller than themselves. There is a danger, then, that changes occurring by virtue of subjects starting out in extreme positions will be attributed erroneously to the effects of the experimental stimulus.

6. Selection biases. We discussed selection bias earlier when we examined different ways of selecting subjects for experiments and assigning them to experimental and control groups. Comparisons have no meaning unless the groups are comparable at the start of an experiment.

7. Experimental mortality. Although some social experiments could, I suppose, kill subjects, experimental mortality refers to a more general and less extreme problem. Often, experimental subjects will drop out of the experiment before it's completed, and this can affect statistical comparisons and conclusions. In the classical experiment involving an experimental and a control group, each with a pretest and posttest, suppose that the bigots in the experimental group are so offended by the Muslim history film that they leave before it's over. Those subjects sticking around for the posttest will have been less prejudiced to start with, so the group results will reflect a substantial "decrease" in prejudice.

8. Demoralization. On the other hand, feelings of deprivation within the control group may result in some giving up. In educational experiments, control-group subjects may feel the experimental group is being treated better and they may become demoralized, stop studying, act up, or get angry.

internal invalidity: Refers to the possibility that the conclusions drawn from experimental results may not accurately reflect what went on in the experiment itself.

These, then, are some of the sources of internal invalidity in experiments, as cited by Campbell, Stanley, and Cook. Aware of these pitfalls, experimenters have devised designs aimed at managing them. The classical experiment, coupled with proper subject selection and assignment, addresses each of these problems. Let's look again at that study design, presented in Figure 8-4, as it applies to our hypothetical study of prejudice.

If we use the experimental design shown in Figure 8-4, we should expect two findings from our Muslim history film experiment. For the experimental group, the level of prejudice measured in their posttest should be less than that found in their pretest.
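Statistical regression (item 5 in the list above) is easier to believe once you see it happen in numbers. The following Python sketch is a small simulation under invented assumptions: each test score is a stable ability plus independent luck on the day of the test, and no stimulus of any kind is administered between the two tests. The model, the numbers, and the names (`ability`, `first`, `second`, `average_gain`) are all hypothetical.

```python
import random
from statistics import mean

rng = random.Random(42)
N = 10_000

# Assumed model: score = stable ability + luck on the day of the test.
ability = [rng.gauss(50, 10) for _ in range(N)]
first = [a + rng.gauss(0, 10) for a in ability]
second = [a + rng.gauss(0, 10) for a in ability]  # nothing happens in between

# Select the extreme group: the lowest-scoring 10 percent on the first test.
cutoff = sorted(first)[N // 10]
low = [i for i in range(N) if first[i] < cutoff]

def average_gain(indices):
    """Mean change from the first test to the second for these subjects."""
    return mean(second[i] - first[i] for i in indices)

# Randomly split the low scorers into two halves, as random assignment would.
rng.shuffle(low)
half = len(low) // 2
gain_all = average_gain(low)
gain_a = average_gain(low[:half])
gain_b = average_gain(low[half:])

print(f"average gain, all low scorers: {gain_all:.1f}")
print(f"average gain, random half A:   {gain_a:.1f}")
print(f"average gain, random half B:   {gain_b:.1f}")
```

Under these assumptions the low scorers "improve" on the second test even though nothing was done to them, simply because some of their bad luck on the first test does not repeat. The two random halves improve by about the same amount, so a difference between randomly assigned groups cannot be blamed on regression alone.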
In addition, when the two posttests are compared, less prejudice should be found in the experimental group than in the control group.

This design also guards against the problem of history, in that anything occurring outside the experiment that might affect the experimental group should also affect the control group. Consequently, the two posttest results should still differ. The same comparison guards against problems of maturation as long as the subjects have been randomly assigned to the two groups. Testing and instrumentation can't be problems, because both the experimental and control groups are subject to the same tests and experimenter effects. If the subjects have been assigned to the two groups randomly, statistical regression should affect both equally, even if people with extreme scores on prejudice (or whatever the dependent variable is) are being studied. Selection bias is ruled out by the random assignment of subjects. Experimental mortality is more complicated to handle, but the data provided in this study design offer several ways to deal with it. Pretest measurements would let us discover any differences in the dropouts of the experimental and control groups. Slight modifications to the design, such as administering a placebo (a film having nothing to do with Muslims, for example) to the control group, can make the problem even easier to manage. Finally, demoralization can be watched for and taken into account in evaluating the results of the experiment.

FIGURE 8-4 The Classical Experiment: Using a Muslim History Film to Reduce Prejudice. The diagram shows the experimental group and the control group across three stages: pretest, stimulus, and posttest. This diagram illustrates the basic structure of the classical experiment as a vehicle for testing the impact of a film on prejudice. Notice how the control group, the pretesting, and the posttesting function.

Sources of External Invalidity

Internal invalidity accounts for only some of the complications faced by experimenters. In addition, there are problems of what Campbell and Stanley call external invalidity, which relates to the generalizability of experimental findings to the "real" world. Even if the results of an experiment provide an accurate gauge of what happened during that experiment, do they really tell us anything about life in the wilds of society?

Campbell and Stanley describe four forms of this problem; I'll present one of them to you as an illustration. The generalizability of experimental findings is jeopardized, as the authors point out, if there's an interaction between the testing situation and the experimental stimulus (1963: 18). Here's an example of what they mean.

Staying with the study of prejudice and the Muslim history film, let's suppose that our experimental group, in the classical experiment, has less prejudice in its posttest than in its pretest, and that its posttest shows less prejudice than that of the control group. We can be confident that the film actually reduced prejudice among our experimental subjects. But would it have the same effect on the public if the film were shown in theaters or on television? We can't be sure, because the film might be effective only when people have been sensitized to the issue of prejudice, as the subjects may have been while taking the pretest. This is an example of interaction between the testing and the stimulus. The classical experimental design cannot control for that possibility. Fortunately, experimenters have devised other designs that can.
The Solomon four-group design (Campbell and Stanley 1963: 24-25) addresses the problem of testing interaction with the stimulus. As the name suggests, it involves four groups of subjects, assigned randomly from a pool. Figure 8-5 presents this design.

Notice that Groups 1 and 2 in Figure 8-5 compose the classical experiment. Group 3 is administered the experimental stimulus without a pretest, and Group 4 is only posttested. This latest experimental design permits four meaningful comparisons. If the Muslim history film really reduces prejudice (unaccounted for by the problem of internal invalidity and unaccounted for by an interaction between the testing and the stimulus), we should expect four findings:

1. In Group 1, posttest prejudice should be less than pretest prejudice.
2. In Group 2, prejudice should be the same in the pretest and the posttest.
3. The Group 1 posttest should show less prejudice than the Group 2 posttest.
4. The Group 3 posttest should show less prejudice than the Group 4 posttest.

FIGURE 8-5 The Solomon Four-Group Design. The diagram shows what each group receives over time:
Group 1: Pretest, Stimulus (film), Posttest
Group 2 (control): Pretest, No stimulus, Posttest
Group 3: No pretest, Stimulus (film), Posttest
Group 4 (control): No pretest, No stimulus, Posttest
The classical experiment runs the risk that pretesting will have an effect on subjects, so the Solomon four-group design adds experimental and control groups that skip the pretest. Thus, it combines the classical experiment and the after-only design or "static-group comparison."
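The four expected findings can be written out as four simple comparisons of group averages. The Python sketch below is only an illustration: the function `solomon_findings`, the key names, the tolerance `tol` used to decide when two averages count as "the same," and the prejudice scores themselves are all invented for this example. A real analysis would rely on statistical tests rather than a fixed tolerance.

```python
def solomon_findings(means, tol=1.0):
    """Check the four expected findings against average prejudice scores.

    `means` is a dict of hypothetical group averages; `tol` is an assumed
    margin within which two averages are treated as the same.
    """
    return {
        1: means["g1_post"] < means["g1_pre"],             # Group 1 drops
        2: abs(means["g2_post"] - means["g2_pre"]) <= tol,  # Group 2 is stable
        3: means["g1_post"] < means["g2_post"],             # Group 1 below Group 2
        4: means["g3_post"] < means["g4_post"],             # Group 3 below Group 4
    }

# Invented scenario A: the film reduces prejudice with or without a pretest.
film_works = {"g1_pre": 60, "g1_post": 45, "g2_pre": 60, "g2_post": 60,
              "g3_post": 46, "g4_post": 60}

# Invented scenario B: the film "works" only on subjects sensitized by a
# pretest, so the unpretested Group 3 looks just like Group 4.
pretest_interaction = {"g1_pre": 60, "g1_post": 45, "g2_pre": 60, "g2_post": 60,
                       "g3_post": 60, "g4_post": 60}

print(solomon_findings(film_works))
print(solomon_findings(pretest_interaction))
```

In scenario A all four findings hold. In scenario B the first three still hold, so the classical experiment alone (Groups 1 and 2) would look like a success, and only the comparison of Groups 3 and 4 exposes the interaction between testing and stimulus.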
external invalidity: Refers to the possibility that conclusions drawn from experimental results may not be generalizable to the "real" world.

Notice that finding number 4 rules out any interaction between the testing and the stimulus. Remember that these comparisons are meaningful only if subjects have been assigned randomly to the different groups, thereby providing groups of equal prejudice initially, even though their preexperimental prejudice is measured only in Groups 1 and 2.

There is a side benefit to this research design, as the authors point out. Not only does the Solomon four-group design rule out interactions between testing and the stimulus, it also provides data for comparisons that will reveal the amount of such interaction that occurs in the classical experimental design. This knowledge would allow a researcher to review and evaluate the value of any prior research that used the simpler design.

The last experimental design I'll mention here is what Campbell and Stanley (1963: 25-26) call the posttest-only control-group design; it consists of the second half (Groups 3 and 4) of the Solomon design (refer again to Figure 8-5). As the authors argue persuasively, with proper randomization, only Groups 3 and 4 are needed for a true experiment that controls for the problems of internal invalidity as well as for the interaction between testing and stimulus. With randomized assignment to experimental and control groups (which distinguishes this design from the static-group comparison discussed earlier), the subjects will be initially comparable on the dependent variable, comparable enough to satisfy the conventional statistical tests used to evaluate the results, so it's not necessary to measure them. Indeed, Campbell and Stanley suggest that the only justification for pretesting in this situation is tradition. Experimenters have simply grown accustomed to pretesting and feel more secure with research designs that include it. Be clear, however, that this point applies only to experiments in which subjects have been assigned to experimental and control groups randomly, because that's what justifies the assumption that the groups are equivalent, without actually measuring them to find out.

This discussion has introduced the intricacies of experimental design, its problems, and some solutions. Of course, researchers use a great many other possible experimental designs as well. Some involve more than one stimulus and combinations of stimuli. Others involve several tests of the dependent variable over time and the administration of the stimulus at different times for different groups. If you're interested in pursuing this topic, you might want to look at the Campbell and Stanley book.

Examples of Experimentation

Experiments have been used to study a wide variety of topics in the social sciences. Some experiments take place within laboratory situations; others occur out in the "real world" and are referred to as field experiments. The following discussion will give you a glimpse of both. We'll begin with an example of a field experiment.

field experiment: A formal experiment conducted outside the laboratory, in a natural setting.

In George Bernard Shaw's well-loved play, Pygmalion (the basis for the musical My Fair Lady), Eliza Doolittle speaks of the powers others have in determining our social identity. Here's how she distinguishes between the ways she's treated by her tutor, Professor Higgins, and by Higgins's friend, Colonel Pickering:

You see, really and truly, apart from the things anyone can pick up (the dressing and the proper way of speaking, and so on), the difference between a lady and a flower girl is not how she behaves, but how she's treated. I shall always be a flower girl to Professor Higgins, because he always treats me as a flower girl, and always will, but I know I can be a lady to you, because you always treat me as a lady, and always will. (Act 5)

The sentiment Eliza expresses here is basic social science, addressed more formally by sociologists such as Charles Horton Cooley ("looking-glass self") and George Herbert Mead ("the generalized other"). The basic point is that who we think we are (our self-concept) and how we behave is largely a function of how others see and treat us. Further, the way others perceive us is largely conditioned by their expectations. If they've been told we're stupid, for example, they're likely to see us that way, and we may come to see ourselves that way and, in fact, act stupidly. "Labeling theory" addresses the phenomenon of people acting in accord with the ways they are perceived and labeled by others. These theories have served as the premise for numerous movies, such as the 1983 film Trading Places, in which Eddie Murphy and Dan Aykroyd play a derelict converted into a stockbroker and vice versa.
The tendency to see in others what we've been led to expect takes its name from Shaw's play and is called the Pygmalion effect. This effect is nicely suited to controlled experiments. In one of the best-known experiments on this topic, Robert Rosenthal and Lenore Jacobson (1968) administered what they called a "Harvard Test of Inflected Acquisition" to students in a West Coast school. Subsequently, they met with the students' teachers to present the results of the test. In particular, Rosenthal and Jacobson identified certain students as very likely to exhibit a sudden spurt in academic abilities during the coming year, based on the results of the test. When IQ test scores were compared later, the researchers' predictions proved accurate. The students identified as "spurters" far exceeded their classmates during the following year, suggesting that the predictive test was a powerful one. In fact, the test was a hoax! The researchers had made their predictions randomly among both good and poor students. What they told the teachers did not really reflect students' test scores at all. The progress made by the spurters was simply a result of the teachers' expecting the improvement and paying more attention to those students, encouraging them, and rewarding them for achievements. (Notice the similarity between this situation and the Hawthorne effect, discussed earlier in this chapter.) The Rosenthal-Jacobson study attracted a lot of popular, as well as scientific, attention. Subsequent experiments have focused on specific aspects of what has become known as the attribution process, or the expectations communication model. This research, largely conducted by psychologists, parallels research primarily by sociologists, which takes a slightly different focus and is often gathered under the label expectations-states theory. 
The psychological studies focus on situations in which the expectations of a dominant individual affect the performance of subordinates—as in the case of a teacher and students or a boss and employees. The sociological research has tended to focus more on the role of expectations among equals in small, task-oriented groups. In a jury, for example, how do jurors initially evaluate each other, and how do those initial assessments affect their later interactions? Here's a different kind of social science experiment. Shelley Correll, Stephen Benard, and In Paik (2007) were interested in learning whether race, gender, and/or parenthood might produce discrimination in hiring. Specifically, they wanted to find out if there was a "motherhood penalty." They decided to explore this topic with an experiment using college undergraduates. The student-subjects chosen for the study were told that a new communications company was looking for someone to manage the marketing department of their East Coast office. They heard that the communications company was interested in receiving feedback from younger adults since young people are heavy consumers of communications technology. To further increase their task orientation, participants were told that their input would be incorporated with the other information the company collects on applicants and would impact actual hiring decisions. (Correll, Benard, and Paik 2007:1311) The researchers had created several resumes describing fictitious candidates for the manager's position. Initially, the resumes had no indication of race, gender, or parenthood. A group of subjects was asked to evaluate the quality of the candidates. The group decided that the resumes reflected equivalent quality. In the next part of the experiment, the resumes were augmented with additional information. Gender became apparent when names were added to the resumes. 
Moreover, the use of typically African American names (such as Latoya and Ebony for women, Tyrone and Jamal for men) or typically white names (such as Allison and Sarah for women, Brad and Matthew for men) allowed subjects to guess the candidates' races. Finally, including participation in a parent-teacher association (PTA) or listing names of children identified some candidates as parents.

Over the course of the experiment, these different status indicators were added to the same resumes used in the initial trial. Thus, a particular resume might appear as an African American mother, a white non-mother, a white father, and so forth. Of course, no student-subject would evaluate the same resume with different status indicators.

Finally, the experimental subjects were given sets of resumes to evaluate in several ways. For example, they were asked how competent they felt the candidates were and how committed they seemed. They were asked to suggest a salary that might be offered a given candidate and to predict how likely it was that the candidate would eventually be promoted within the organization. They were even asked to indicate how many days the candidate should be allowed to miss work or come late before being fired.

Since each of the resumes was evaluated with different status indicators attached, the experimenters could determine whether those statuses made a difference. Specifically, they could test for the existence of a motherhood penalty. And they found it.
Among other things,
» Mothers were judged to be less competent and less committed than non-mothers. Students offered mothers lower salaries than they did non-mothers and would allow mothers fewer missed or late days on the job.
» They felt that mothers were less likely to be promoted than non-mothers. They recommended hiring non-mothers almost twice as often as they did mothers.
Rounding out the analysis of gender and parenthood, the researchers found that, while the differences were smaller for men than for women, fathers were rated higher than non-fathers, just the opposite of the pattern found among women candidates. The motherhood penalty was found among both white and African American candidates. Moreover, the gender of the subject evaluators did not matter. Both women and men rated mothers lower than non-mothers. Web-Based Experiments Increasingly, researchers are using the World Wide Web as a vehicle for conducting social science experiments. Because representative samples are not essential in most experiments, researchers can often use volunteers who respond to invitations online. To get a better idea of this form of experimentation, go to www.socialpsychology.org/expts.htm. This website offers links to numerous professional and student research projects on such topics as "interpersonal relations," "beliefs and attitudes," and "personality and individual differences." In addition, the site offers resources for conducting Web experiments. Participating as a subject will provide data for other researchers, may offer you some insights into your own thinking patterns, and can suggest experiments you might want to conduct. Web experiments have raised new ethical questions, as we saw in Chapter 3. Here's the abstract of an online experiment: We show, via a massive (N = 689,003) experiment on Facebook, that emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness.
We provide experimental evidence that emotional contagion occurs without direct interaction between people (exposure to a friend expressing an emotion is sufficient), and in the complete absence of nonverbal cues. (Kramer et al. 2014) What is most interesting about this online research report is its numerous comments. Most raise ethical concerns, chiefly concerning informed consent. More generally, every new research technique will raise new issues of research ethics as well as scientific validity. Social research is unlikely to ever become dull. "Natural" Experiments Although people tend to equate the terms experiment and laboratory experiment, we've seen that experiments are sometimes conducted outside the lab (field experiments) and can be conducted on the Web. Other important social science experiments occur outside controlled settings altogether, often in the course of normal social events. Sometimes nature designs and executes experiments that we can observe and analyze; sometimes social and political decision makers serve this natural function. Imagine, for example, that a hurricane has struck a particular town. Some residents of the town suffer severe financial damages, whereas others escape relatively lightly. What, we might ask, are the behavioral consequences of suffering a natural disaster? Are those who suffer the most more likely to take precautions against future disasters than are those who suffer the least? To answer these questions, we might interview residents of the town some time after the hurricane. We might question them regarding the precautions they had taken before the hurricane and those they're currently taking. We could then compare the precautionary actions of the people who suffered a great deal from the hurricane with those taken by citizens who suffered relatively little. In this fashion, we might take advantage of a natural experiment, which we could not have arranged even if we'd been perversely willing to do so. Because in natural experiments the researcher must take things pretty much as they occur, such experiments raise many of the validity problems discussed earlier. Thus, when Stanislav Kasl, Rupert Chisolm, and Brenda Eskenazi (1981) chose to study the impact that the Three Mile Island (TMI) nuclear accident in Pennsylvania had on plant workers, they had to be especially careful when devising the study design: Disaster research is necessarily opportunistic, quasi-experimental, and after-the-fact. In the terminology of Campbell and Stanley's classical analysis of research designs, our study falls into the "static-group comparison" category, considered one of the weak research designs. However, the weaknesses are potential and their actual presence depends on the unique circumstances of each study. (1981:474) The foundation of this study was a survey of the people who had been working at Three Mile Island on March 28, 1979, when the cooling system failed in the number 2 reactor and the uranium core began to melt. The survey was conducted 5 to 6 months after the accident. Among other things, the survey questionnaire measured workers' attitudes toward working at nuclear power plants. If they had measured only the TMI workers' attitudes after the accident, the researchers would have had no idea whether attitudes had changed as a consequence of the accident.
But they improved their study design by selecting another, nearby—seemingly comparable—nuclear power plant (abbreviated as PB) and surveyed workers there as a control group: hence their reference to a static-group comparison. Even with an experimental and a control group, the authors were wary of potential problems in their design. In particular, their design was based on the idea that the two sets of workers were equivalent to each other, except for the single fact of the accident. The researchers could have assumed this if they had been able to assign workers to the two plants randomly, but of course they couldn't. Instead, they compared characteristics of the two groups to see whether they were equivalent. Ultimately, the researchers concluded that the two sets of workers were very much alike, and the plant the employees worked at was merely a function of where they lived. Even granting that the two sets of workers were equivalent, the researchers faced another problem of comparability. They could not contact all the workers who had been employed at TMI at the time of the accident. The researchers discuss the problem as follows: One special attrition problem in this study was the possibility that some of the no-contact nonrespondents among the TMI subjects, but not PB subjects, had permanently left the area because of the accident. This biased attrition would, most likely, attenuate the estimated extent of the impact. Using the evidence of disconnected or "not in service" telephone numbers, we estimate this bias to be negligible (1 percent). (Kasl, Chisolm, and Eskenazi 1981:475) The TMI example points both to the special problems involved in natural experiments and to the possibility of taking those problems into account. Social research generally requires ingenuity and insight, and natural experiments are certainly no exception. Earlier in this chapter, we used a hypothetical example of studying whether an ethnic history film reduced prejudice. 
Sandra Ball-Rokeach, Joel Grube, and Milton Rokeach (1981) were able to address that topic in real life through a natural experiment. In 1977, the television dramatization of Alex Haley's Roots, a historical saga about African Americans, was presented by ABC on eight consecutive nights. It garnered the largest audiences in television history at that time. Ball-Rokeach and her colleagues wanted to know whether Roots changed white Americans' attitudes toward African Americans. Their opportunity arose in 1979, when a sequel, Roots: The Next Generations, was televised. Although it would have been nice (from a researcher's point of view) to assign random samples of Americans either to watch or not watch the show, this wasn't possible. Instead, the researchers selected four samples in Washington State and mailed questionnaires (before the broadcast) that measured attitudes toward African Americans. Following the last episode of the show, respondents were called and asked how many, if any, episodes they had watched. Subsequently, questionnaires were sent to respondents, remeasuring their attitudes toward African Americans. By comparing attitudes before and after for both those who watched the show and those who didn't, the researchers reached several conclusions. For example, they found that people with already egalitarian attitudes were much more likely to watch the show than were those who were more prejudiced toward African Americans: a self-selection phenomenon.
Comparing the before and after attitudes of those who watched the show, moreover, suggested that the show itself had little or no effect. Those who watched it were no more egalitarian afterward than they had been before. This example anticipates the subject of Chapter 12, evaluation research, which can be seen as a special type of natural experiment. As you'll see, evaluation research involves taking the logic of experimentation into the field to observe and evaluate the effects of stimuli in real life. Because this is an increasingly important form of social research, an entire chapter is devoted to it. Strengths and Weaknesses of the Experimental Method Experiments are the primary tool for studying causal relationships. However, like all research methods, experiments have both strengths and weaknesses. The chief advantage of a controlled experiment lies in the isolation of the experimental variable's impact over time. This is seen most clearly in terms of the basic experimental model. A group of experimental subjects are found, at the outset of the experiment, to have a certain characteristic; following the administration of an experimental stimulus, they are found to have a different characteristic. To the extent that subjects have experienced no other stimuli, we may conclude that the change of characteristics is caused by the experimental stimulus. Further, because individual experiments are often rather limited in scope, requiring relatively little time and money and relatively few subjects, we often can replicate a given experiment several times using many different groups of subjects. (This isn't always the case, of course, but it's usually easier to repeat experiments than, say, surveys.) As in all other forms of scientific research, replication of research findings strengthens our confidence in the validity and generalizability of those findings. The greatest weakness of laboratory experiments lies in their artificiality. 
Social processes that occur in a laboratory setting might not necessarily occur in natural social settings. For example, a Muslim history film might genuinely reduce prejudice among a group of experimental subjects. This would not necessarily mean, however, that the same film shown in neighborhood movie theaters throughout the country would reduce prejudice among the general public. Artificiality is not as much of a problem, of course, for natural experiments as for those conducted in the laboratory. In discussing several of the sources of internal and external invalidity mentioned by Campbell, Stanley, and Cook, we saw that we can create experimental designs that logically control such problems. This possibility points to one of the great advantages of experiments: They lend themselves to a logical rigor that is often much more difficult to achieve in other modes of observation. Ethics and Experiments As you've seen, many important ethical issues come up in the conduct of social science experiments. I'll mention only two here. First, experiments almost always involve deception. In most cases, explaining the purpose of the experiment to subjects would probably cause them to behave differently—trying to look good, for example. It's important, therefore, to determine (1) whether a particular deception is essential to the experiment and (2) whether the value of what may be learned from the experiment justifies the ethical violation. Second, experiments typically intrude on the lives of the subjects. Experimental researchers commonly put subjects in unusual situations and ask them to undergo unusual experiences. Rarely, if ever, do they physically injure the subjects (don't do that, by the way); however, psychological damage to subjects may occur, as some of the examples in this chapter illustrate. As with the matter of deception, then, researchers must balance the potential value of the research against the potential damage to subjects. 
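To make the logic of a natural experiment such as the Roots study a bit more concrete, here is a minimal sketch in Python. It assumes each respondent is recorded as a tuple of (watched, attitude before, attitude after). The function name, the 0-to-10 egalitarianism scale, and every score below are invented for illustration; none of it is data from the study itself.

```python
from statistics import mean

def before_after_by_group(records):
    """Summarize mean before/after scores for two self-selected groups.

    records: list of (watched, before, after) tuples, where `watched`
    is True or False and the scores are hypothetical attitude measures.
    """
    summary = {}
    for watched in (True, False):
        rows = [r for r in records if r[0] == watched]
        before = mean(r[1] for r in rows)
        after = mean(r[2] for r in rows)
        summary[watched] = {
            "before": before,
            "after": after,
            "change": after - before,
        }
    return summary

# Invented scores on a 0-10 egalitarianism scale (not real data).
records = [
    (True, 7, 7), (True, 8, 8), (True, 9, 9),     # viewers
    (False, 4, 4), (False, 5, 5), (False, 6, 6),  # non-viewers
]
summary = before_after_by_group(records)

# Gap between the groups before the broadcast: a sign of self-selection.
self_selection_gap = summary[True]["before"] - summary[False]["before"]
print(summary)
print(self_selection_gap)
```

In this made-up data the viewers start out more egalitarian than the non-viewers, while neither group changes afterward. That is the general shape of the pattern described earlier: a difference between groups that was already there before the stimulus, rather than one produced by it.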
What do you think? Revisited As we've seen, the impact of the experiment itself on subjects' responses is a major concern in social research. Several elements of experimental designs address this concern. First, the use of control groups allows researchers to account for any effects of the experiment that are not related to the stimulus. Second, the Solomon four-group design tests for the possible impact of pretests on the dependent variable. And, finally, so-called natural experiments are done in real-life situations, imposing an experimental template over naturally occurring events. Thus, although the impact of the observer can affect experimental results negatively, researchers have developed methods for addressing it. MAIN POINTS Introduction In experiments, social researchers typically select a group of subjects, do something to them, and observe the effect of what was done. Topics Appropriate for Experiments Experiments provide an excellent vehicle for the controlled testing of causal processes. The Classical Experiment The classical experiment tests the effect of an experimental stimulus (the independent variable) on a dependent variable through the pretesting and posttesting of experimental and control groups. A double-blind experiment guards against experimenter bias because neither the experimenter nor the subject knows which subjects are in the control and experimental groups.
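The classical experiment summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `assign_groups` and `estimated_effect` are hypothetical, the prejudice scores are invented, and the effect is estimated in one simple way, as the mean pretest-to-posttest change in the experimental group minus the mean change in the control group.

```python
import random
from statistics import mean

def assign_groups(subject_ids, seed=0):
    """Randomly split subjects into experimental and control halves."""
    rng = random.Random(seed)  # seeded so the split can be reproduced
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def estimated_effect(pre, post, experimental, control):
    """Mean change in the experimental group minus mean change in the control group."""
    exp_change = mean(post[s] - pre[s] for s in experimental)
    ctl_change = mean(post[s] - pre[s] for s in control)
    return exp_change - ctl_change

subjects = list(range(8))
experimental, control = assign_groups(subjects, seed=1)

# Invented pretest: everyone starts at the same prejudice score.
pre = {s: 50 for s in subjects}

# Invented posttest: every subject drops 2 points (say, from being tested
# twice), and the experimental group drops 5 more (the stimulus).
post = {s: pre[s] - 2 - (5 if s in experimental else 0) for s in subjects}

effect = estimated_effect(pre, post, experimental, control)
print(effect)
```

Because the control group experiences everything except the stimulus, subtracting its change removes the 2-point drop that both groups share and leaves only the part attributable to the stimulus in this invented data.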
Selecting Subjects It's generally less important that a group of experimental subjects be representative of some larger population than that experimental and control groups be similar to each other. Probability sampling, randomization, and matching are all methods of achieving comparability in the experimental and control groups. Randomization is the generally preferred method. In some designs, it can be combined with matching. Variations on Experimental Design Campbell and Stanley describe three forms of preexperiments: the one-shot case study, the one-group pretest-posttest design, and the static-group comparison. Campbell and Stanley list, among others, eight sources of internal invalidity in experimental design: history, maturation, testing, instrumentation, statistical regression, selection biases, experimental mortality, and demoralization. The classical experiment with random assignment of subjects guards against each of these. Experiments also face problems of external invalidity, in that experimental findings might not reflect real life. The interaction of testing with the stimulus is an example of external invalidity that the classical experiment does not guard against. The Solomon four-group design and other variations on the classical experiment can safeguard against external invalidity. Campbell and Stanley suggest that, given proper randomization in the assignment of subjects to the experimental and control groups, there is no need for pretesting in experiments. Examples of Experimentation In a controlled field experiment, researchers exposed the Pygmalion effect as one phenomenon that researchers must account for in experimental design. One recent experiment in a laboratory setting showed that a "motherhood penalty" exists in the work world. Web-Based Experiments The World Wide Web has become an increasingly common vehicle for performing social science experiments.
"Natural" Experiments Natural experiments often occur in the course of social life in the real world, and social researchers can implement them in somewhat the same way they would design and conduct laboratory experiments. Strengths and Weaknesses of the Experimental Method Like all research methods, experiments have strengths and weaknesses. The primary weakness of experiments is artificiality: What happens in an experiment may not reflect what happens in the outside world. The strengths of experimentation include the isolation of the independent variable, which permits causal inferences; the relative ease of replication; and scientific rigor. Ethics and Experiments Experiments typically involve deceiving subjects. By their intrusive nature, experiments open the possibility of inadvertently causing damage to subjects.
KEY TERMS control group, double-blind experiment, experimental group, external invalidity, field experiment, internal invalidity, matching, posttesting, pretesting, randomization
PROPOSING SOCIAL RESEARCH: EXPERIMENTS In the next series of exercises, we focus on specific data-collection techniques, beginning here with experiments. If you're doing these exercises as part of an assignment in the course, your instructor will tell you whether you should skip those chapters dealing with methods you won't be using. If you're doing them on your own, to improve your understanding of the topics in the book, I suggest that you do all of these exercises. You can temporarily modify your proposed data-collection method and explore how you would research your topic using the method at hand. In the case of experimentation, your proposal should make clear why you chose the experimental model over other forms of research and how it best serves your goals. More specifically, you'll want to describe the experimental stimulus and how it will be administered, as well as detailing the experimental and control groups you'll use. You'll need to describe the pretesting and posttesting that will be involved in your experiment. Where will you conduct your experiments: in a laboratory setting or under natural circumstances? If you plan to conduct a double-blind experiment, you should describe how you'll accomplish it. You may also want to explore some of the internal and external problems of validity that might complicate the analysis of your results. Finally, the experimental model is typically used to test specific hypotheses, so you should specify how you'll accomplish that in your study. What standard will determine whether hypotheses are accepted or rejected?
REVIEW QUESTIONS
1. What are some examples of internal invalidity? Pick four of the eight sources discussed in the book and make up your own examples to illustrate each.
2. Think of a recent natural disaster you've witnessed or read about. What research question might be studied by treating that disaster as a natural experiment? In two or three paragraphs, outline how the study might be done.
3. Say you want to evaluate a new operating system or other software. How might you set up an experiment to see what people really think of it? Keep in mind the use of control groups and the placebo effect.
4. Think of a recent, highly publicized trial. How might the attorneys have used mock juries to evaluate different strategies for presenting evidence?