CHAPTER 4
General design issues

This chapter:
• develops a framework for designing a real world study linking purpose, conceptual framework, research questions, methods and sampling strategy;
• sensitizes the reader to the issues involved in selecting a research strategy;
• introduces experimental and non-experimental fixed design strategies;
• suggests that flexible design strategies particularly appropriate for real world studies include case studies, ethnographic studies and grounded theory studies;
• covers a range of multi-strategy (mixed-method) designs;
• emphasizes that it is advisable to read the other chapters in Part II before making decisions about strategy; and
• concludes by considering the trustworthiness of research findings, and its relationship to research design.

Introduction

Design is concerned with turning research questions into projects. This is a crucial part of any research project, but it is often slid over quickly without any real consideration of the issues and possibilities. There is a strong tendency, both for those carrying out projects and those who want them carried out, to assume that there is no alternative to their favoured approach. Comments have already been made on the assumption by many psychologists that an experimental design is inevitably called for. For other social scientists, and for quite a few clients when commissioning studies, designs involving the statistical analysis of sample survey data are seen as the only possible approach. As stressed in the previous chapter, the strategies and tactics you select in carrying out a piece of research depend very much on the type of research question you are trying to answer.

Hakim (2000), in one of the few books which focuses on design issues across a range of social science disciplines, makes a comparison between designers of research projects and architects, and then goes on to extend this to suggest that those who actually carry out projects are like builders. For her:

Design deals primarily with aims, purposes, intentions and plans within the practical constraints of location, time, money and availability of staff. It is also very much about style, the architect's own preferences and ideas (whether innovative or solidly traditional) and the stylistic preferences of those who pay for the work and have to live with the final result (p. 1, emphasis in original).

In small-scale research, the architect-designer and builder-researcher are typically one and the same person. Hence the need for sensitivity to design issues, to avoid the research equivalent of the many awful houses put up by speculative builders without benefit of architectural expertise. Such muddling through should be distinguished from the opportunity to develop and revise the original plan, which is easier in a small-scale project than in one requiring the coordination of many persons' efforts. Design modification is more feasible with some research strategies than with others - it is an integral part of what are referred to in this text as flexible designs. However, this kind of flexibility calls for a concern for design throughout the project, rather than providing an excuse for not considering design at all.

A framework for research design

Design, in the sense discussed above, concerns the various things which should be thought about and kept in mind when carrying out a research project. Many models have been put forward and Figure 4.1 is my attempt.
Figure 4.1: Framework for research design.

The components are:

• Purpose(s). What is this study trying to achieve? Why is it being done? Are you seeking to describe something, or to explain or understand something? Are you trying to assess the effectiveness of something? Is it in response to some problem or issue for which solutions are sought? Is it hoped to change something as a result of the study?
• Conceptual framework. Your theory about what is going on, of what is happening and why. What are the various aspects or features involved, and how might they be related to each other?
• Research questions. To what questions is the research geared to providing answers? What do you need to know to achieve the purpose(s) of the study? What is it feasible to ask given the time and resources that you have available?
• Methods. What specific techniques (e.g. semi-structured interviews, participant observation) will you use to collect data? How will the data be analysed? How do you show that the data are trustworthy?
• Sampling procedures. Who will you seek data from? Where and when? How do you balance the need to be selective with that of collecting the data needed?

Ethical considerations, though not included in the design framework, inevitably arise when carrying out research involving people and should be taken into account both in the planning and carrying out of your project (see Chapter 9).

All these aspects need to be interrelated and kept in balance. The diagram suggests that there is some directionality about the whole process. Both your purposes and the conceptual framework feed in to, and help you specify, the research questions. When you know something about the research questions you want to be answered, then you are able to make decisions about the methods and the procedures to be used when sampling. However, unless you are dealing with a fixed design which is tightly pre-specified, this should not be taken to imply a once-only consideration of the different aspects. In flexible designs there should be a repeated revisiting of all of the aspects as the research takes place. In other words, the detailed framework of the design emerges during the study. The various activities of collecting and analysing data, of refining and modifying the set of research questions, of developing theory, of changing the intended sample to follow up interesting lines or to seek answers to rather different questions - and perhaps even reviewing the purposes of the study in the light of a changed context arising from the way in which the other aspects are developing - are likely to be going on together. This might suggest that a better representation of the relationship between these aspects in flexible designs would show two-way arrows between each of the components in the figure. Maxwell (2005, p. 5) approximates to this in a very similar diagram which he refers to as an 'interactive' model of research design. Or even that one might revert to what Martin (1981) has called the 'garbage can' model of research design where such components are 'swirling around in the garbage can or decision space of the particular research project' (Grady and Wallston, 1988, p. 12). However, providing the interactive nature of what goes on in this kind of project is understood, Figure 4.1 has the advantage of presenting a simple and logical structure.
The design framework should have high compatibility between purposes, research questions, conceptual framework and sampling strategy. Some mismatches call for serious attention. For example:

• If the only research questions to which you can think of ways to get answers are not directly relevant to the purposes of the study, then something has to change. Probably the research questions.
• If the methods and/or the sampling strategy are not providing answers to the research questions, something should change. Collect additional data and/or change the data collection method(s), extend the sampling or cut down on or modify the research questions.
• If there are research questions which do not link to the conceptual framework, or parts of the conceptual framework which are not represented in the set of research questions, then one or other (or both) needs changing.

This is something of a counsel of perfection. Don't let it block any further progress if you can't get it quite right. You may not get an ideal solution with the time and resources you have available. Go for a practical solution that seems reasonably adequate (an example of the strategy of satisficing as advocated by Simon, 1979).

In fixed research designs you should get as much of this right as you can before embarking on the major phase of data collection. Hence the importance of pilot work, where you have the opportunity of testing out the feasibility of what you propose. In flexible research designs you have to get all of this sorted out by the end of the study. As Brewer and Hunter (2005, p. 45) put it, 'Once a study is published, it is in many ways irrelevant whether the research problem prompted the study or instead emerged from it'. This is not a licence to rewrite history. In many qualitative research traditions there is an expectation that you provide an account of your journey, documenting the various changes made along the way. However, you are definitely not bound to some form of 'honour code' where, say, you declare your initial set of research questions and then stick to them through thick and thin. Your aim is to come up with a final set of research questions which are relevant to the purposes of the study (which may, or may not, have been renegotiated along the way); which show clear linkage to the conceptual structure (from whatever source it has been obtained); and for which the sampling has been such that the data you have collected and analysed provide answers to those questions.

In the real world, of course, it won't be as neat and tidy as this. Some research questions may remain stubbornly unanswerable given the amount of sampling and data collection your resources permit. This is not a capital offence. Providing you have answers to some of the questions which remain on your agenda, then you have made worthwhile progress. And the experience will no doubt be salutary in helping you to carry out more realistically designed projects in the future. You could even claim this for a study where you ended up with no answers to relevant research questions, but this is not going to further your career as a researcher. It may also be that you come up with unexpected findings which appear interesting and illuminative. These findings may well be assimilable into your framework by appropriate extension or modification of your research questions.
There is nothing wrong with adding a further question providing it is relevant to your purposes and it can be incorporated within a (possibly modified) theoretical framework. If your ingenuity fails, and you can't link it in, then simply regard this as a bonus to be reported under the heading of 'an interesting avenue for further research'.

Getting a feel for design issues

The shaded pages below give an overview of what is involved in choosing a research strategy, including a short description of the strategies you might consider. This might be a good time for you to get hold of the reports of a range of published studies (journal articles, research reports, dissertations, etc.) and to read them through to get a feel for different designs. Try not to get bogged down in the details, and don't be put off by complex analyses. When you get on to the detailed design of your own study and its analysis, you can seek assistance on such matters. The obvious sources are academic and professional journals close to your own concerns but, as previously suggested, there is a lot to be said for 'spending some time in the next village'. If your field is, say, social work, browse through a few health-related, educational or management journals. The purpose here is not so much to build up knowledge of directly relevant literature, or to find something you can replicate, although both of these are reputable aims in their own right. It's the overall design that you are after.

The website gives references to a selection of examples of research using fixed, flexible and multi-strategy designs worth chasing up and looking through. Note that they won't necessarily use the terminology adopted here of research questions, purposes, etc. (it is instructive to try to work these out as an exercise). If you follow up these examples you will notice that several of them involve evaluating some practice, intervention or programme, or have an action perspective where they are concerned with change of some kind taking place. Chapter 8 covers the additional features to be considered in studies which have these purposes.

Choosing a research design strategy

This section seeks to sensitize you to the issues involved in choosing a research design strategy.

A. Is a FIXED, FLEXIBLE or MULTI-STRATEGY design strategy appropriate?

• A fixed design calls for a tight pre-specification before you reach the main data collection stage. If you can't pre-specify the design, don't use the fixed approach. Data are almost always in the form of numbers; hence this type is commonly referred to as a quantitative strategy. See Chapter 5 for details.
• A flexible design evolves during data collection. Data are typically non-numerical (usually in the form of words); hence this type is often referred to as a qualitative strategy. See Chapter 6 for details.
• A multi-strategy design combines substantial elements of both fixed and flexible design. A common type has a flexible phase followed by a fixed phase (the reverse sequence is more rare). See Chapter 7 for details.

Note: Flexible designs can include the collection of small amounts of quantitative data (Chapter 6, p. 135). Similarly, fixed designs can include the collection of small amounts of qualitative data (Chapter 5, p. 81).

B. Is your proposed study an EVALUATION?
Are you trying to establish the worth or value of something such as an intervention, innovation or service? This could be approached using either a fixed, flexible or multi-strategy design, depending on the specific purpose of the evaluation. If the focus is on outcomes, a fixed design is probably indicated; if it is on processes, a flexible design is probably preferred. Many evaluations have an interest in both outcomes and processes and use a multi-strategy design. See Chapter 8, p. 176, for details.

C. Do you wish to carry out ACTION RESEARCH?

Is an action agenda central to your concerns? This typically involves direct participation in the research by others likely to be involved, coupled with an intention to initiate change. A flexible design is almost always used. See Chapter 8, p. 188, for details.

D. If you opt for a FIXED design strategy, which type is most appropriate?

Two broad traditions are widely recognized: experimental and non-experimental designs. Box 4.1 on p. 78 summarizes their characteristics.

E. If you opt for a FLEXIBLE design strategy, which type is most appropriate?

Flexible designs have developed from a wide range of very different traditions. Three of these are widely used in real world studies. These are case studies, ethnographic studies and grounded theory studies. Box 4.2 on p. 79 summarizes their characteristics.

F. If you are considering a MULTI-STRATEGY design strategy, which type is most appropriate?

It may well be that a strategy which combines fixed and flexible design elements seems to be appropriate for the study with which you are involved. One or more case studies might be linked to an experiment. Alternatively, a small experiment might be incorporated within a case study. Issues involved in the carrying out of multi-strategy designs are discussed in Chapter 7.

Note: The research strategies discussed above by no means cover all possible real world research designs. They are more of a recognition of the camps into which researchers have tended to put themselves, signalling their preferences for certain ways of working. Such camps have the virtue of providing secure bases within which fledgling researchers can be inculcated in the ways of the tribe, and, more generally, high professional standards can be maintained. They carry the danger of research being 'strategy driven' in the sense that someone skilled in, say, doing experiments assumes automatically that every problem has to be attacked through that strategy.

G. The purpose(s) helps in selecting the strategy

The strategies discussed above represent different ways of collecting and analysing empirical evidence. Each has its particular strengths and weaknesses. It is also commonly suggested that there is a hierarchical relationship between the different strategies, related to the purpose of the research; that:

• flexible (qualitative) strategies are appropriate for exploratory work;
• non-experimental fixed strategies are appropriate for descriptive studies; and
• experiments are appropriate for explanatory studies.

There is some truth in this assertion - certainly as a description of how the strategies have tended to be used in the past.
There is a further sense in which a flexible strategy lends itself particularly well to exploration, a sense in which certain kinds of description can be readily achieved using non-experimental fixed strategies (typically survey approaches), and a traditional view that the experiment is a particularly appropriate tool for getting at cause and effect relationships (although see the discussion in Chapter 2, p. 32). However, these are not necessary or immutable linkages. Each strategy (fixed, flexible or multi-strategy) can be used for any or all of the purposes. For example, grounded theory studies aim to be explanatory through the development of theory; also there can be, and have been, exploratory, descriptive and explanatory case studies (Yin, 2003, 2009).

Real world studies are very commonly evaluations, i.e. their purpose is to assess the worth or value of something. A fixed, flexible or multi-strategy design may be appropriate depending on the specific focus of the evaluation (see B above). If a purpose is to initiate change and/or to involve others, then an action research strategy may be appropriate. A flexible design is probably called for (see C above).

H. The research questions have a strong influence on the strategy to be chosen

While purpose is of help in selecting the research design strategy, the type of research questions you are asking is also important. For example, questions asking 'how many?' or 'how much?' or 'who?' or 'where?' suggest the use of a non-experimental fixed strategy such as a survey. 'What' questions concerned with 'what is going on here?' lend themselves to some form of flexible design study. 'How?' and 'why?' questions are more difficult to pin down. They often indicate a flexible design. However, if the researcher can have control over events and if there is substantial prior knowledge about the problem and the likely mechanisms involved, then an experiment might be indicated. Box 4.3 on p. 80 considers the research questions set out in Box 3.3, p. 60, and discusses research strategies that might be appropriate.

I. Specific methods of investigation need not be tied to particular research strategies

The methods or techniques used to collect information, what might be called the tactics of enquiry, such as questionnaires or various kinds of observation, are sometimes regarded as necessarily linked to particular research strategies. Thus, in fixed non-experimental designs, surveys may be seen as being carried out by structured questionnaire and experiments through specialized forms of observation, often requiring the use of measuring instruments of some sophistication. In flexible designs, grounded theory studies were often viewed as interview-based and ethnographic studies seen as entirely based on participant observation. However, this is not a tight or necessary linkage. For example, while participant observation is a central feature of the ethnographic approach, it can be augmented by interviews and documentary analysis. Similarly, there is no reason in principle for particular fixed design studies to be linked to specific data collection techniques. Non-experimental surveys could well be carried out using observation, or the effect of an experiment assessed through questionnaire responses.

You should now have some appreciation of what is involved in selecting an appropriate research strategy. Before plunging in and making a decision, you need to know more about the issues involved in working within these strategies to help you get a feel for what might be involved.
The rest of the chapters in Part II cover them in some detail.

Establishing trustworthiness

How do you persuade your audiences, including yourself, that the findings of your research are worth taking account of? What is it that makes the study believable and trustworthy? What are the kinds of argument that you can use? What questions should you ask? What criteria are involved? In this connection validity and generalizability are central concepts. Validity is concerned with whether the findings are 'really' about what they appear to be about. Generalizability refers to the extent to which the findings of the enquiry are more generally applicable outside the specifics of the situation studied. These issues, together with the related one of reliability (the consistency or stability of a measure; for example, if it were to be repeated, would the same result be obtained), were initially developed in the context of traditional fixed designs and there is considerable debate about their applicability to flexible designs. Hence trustworthiness is considered separately in each of the following chapters covering different types of research designs.

Further reading

The website gives annotated references to further reading for Chapter 4.

Box 4.1: Experimental and non-experimental fixed design research strategies

Experimental strategy

The central feature is that the researcher actively and deliberately introduces some form of change in the situation, circumstances or experience of participants with a view to producing a resultant change in their behaviour. In 'experiment-speak' this is referred to as measuring the effects of manipulating one variable on another variable. The details of the design are fully pre-specified before the main data collection begins (there is typically a 'pilot' phase before this when the feasibility of the design is checked and changes made if needed).

Typical features:
• selection of samples of individuals from known populations;
• allocation of samples to different experimental conditions;
• introduction of planned change on one or more variables;
• measurement on very small number of variables;
• control of other variables; and
• testing of formal hypotheses.

Non-experimental strategy

The overall approach is the same as in the experimental strategy but the researcher does not attempt to change the situation, circumstances or experience of the participants. The details of the design are fully pre-specified before the main data collection begins (there is typically a 'pilot' phase before this when the feasibility of the design is checked and changes made if needed).

Typical features:
• selection of samples of individuals from known populations;
• measurement on relatively small number of variables;
• control of other variables; and
• may or may not involve hypothesis testing.

Box 4.2: Three widely used flexible design research strategies

Case study

Development of detailed, intensive knowledge about a single 'case', or of a small number of related 'cases'. The details of the design typically 'emerge' during data collection and analysis.
Typical features:
• selection of a single case (or a small number of related cases) of a situation, individual or group of interest or concern;
• study of the case in its context; and
• collection of information via a range of data collection techniques including observation, interview and documentary analysis (typically, though not necessarily exclusively, producing qualitative data).

Ethnographic study

Seeks to capture, interpret and explain how a group, organization or community live, experience and make sense of their lives and their world. It typically tries to answer questions about specific groups of people, or about specific aspects of the life of a particular group.

Typical features:
• selection of a group, organization or community of interest or concern;
• immersion of the researcher in that setting; and
• use of participant observation.

Grounded theory study

The central aim is to generate theory from data collected during the study. Particularly useful in new, applied areas where there is a lack of theory and concepts to describe and explain what is going on. Data collection, analysis and theory development and testing are interspersed throughout the study.

Typical features:
• applicable to a wide variety of phenomena;
• commonly interview-based; and
• a systematic but flexible research strategy which provides detailed prescriptions for data analysis and theory generation.

Notes: There are many other types of flexible design, some of which are summarized in Chapter 6. Many studies involving flexible designs focus on a particular 'case' in its context and can be conceptualized as case studies. Case studies can follow an ethnographic or grounded theory approach, but don't have to.

Box 4.3: Linking research questions to research strategy

Consider the research questions discussed in Box 3.3 (p. 60):

1. Do the children read better as a result of this programme? or
2. Do the children read better in this programme compared with the standard programme? or
3. For what type of special need, ability level, class organization or school is the programme effective?

If the interest is in quantitative outcome measures, and it is feasible to exert some degree of control over the situation (e.g. setting up different groups of children for the innovatory and standard programmes), these questions could be approached using an experimental strategy. If random allocation is used, this becomes a true experiment; if not, a quasi-experiment. If this control were not feasible, or not desired, but quantitative data were still sought, a non-experimental fixed design is possible. If there is a broader notion of what is meant by 'reading better' or of an 'effective' programme than that captured by a small number of quantitative variables, some type of flexible strategy is called for. This is likely to be a multimethod case study, and could also be ethnographic or grounded theory in style. A multi-strategy approach, where the case study could incorporate, say, an experimental component, could be considered.

4. What is the experience of children following the programme?
5. What are teachers' views about the programme? and/or
6. To what extent are parents involved in and supportive of the programme?

These questions could be approached using any of the flexible strategies, though (4) might particularly indicate an ethnographic approach. Questions (5) and (6) could, alternatively or additionally, follow a non-experimental fixed design if quantitative data are sought.
The overall message is that, while the research questions help in deciding research strategy, much is still dependent on your own preferences and on the type of design and data which are going to speak most strongly to the stakeholders.

CHAPTER 5
Fixed designs

This chapter:
• covers general features of fixed design research, typically involving the collection of quantitative data;
• discusses how the trustworthiness (including reliability, validity and generalizability) of findings from this style of research can be established;
• explores the attractions and problems of doing experiments in real world research;
• gives particular attention to the randomized controlled trial (RCT) and whether it can be legitimately viewed as the 'gold standard' of research designs;
• attempts to provide a balanced view of the ubiquitous evidence-based movement;
• differentiates between true experimental, quasi-experimental and single-case experimental designs;
• considers non-experimental fixed designs; and
• concludes by discussing how to decide on sample sizes in fixed design research.

Introduction

This chapter deals with approaches to social research where the design of the study is fixed before the main stage of data collection takes place. In these approaches the phenomena of interest are typically quantified. This is not a necessary feature. As pointed out by Oakley (2000, p. 306) there is nothing intrinsic to such designs which rules out qualitative methods or data (see Murphy et al., 1998, for examples of purely qualitative fixed design studies, and of others using both qualitative and quantitative methods, in the field of health promotion evaluation).

It has already been argued in Chapter 3 that there can be considerable advantage in linking research to theory. With fixed designs, that link is straightforward: fixed designs are theory-driven. The only way in which we can, as a fixed design requires, specify in advance the variables to be included in our study and the exact procedures to be followed, is by having a reasonably well-articulated theory of the phenomenon we are researching. Put in other terms, we must already have a substantial amount of conceptual understanding about a phenomenon before it is worthwhile following the risky strategy of investing precious time and resources into such designs. This may be in the form of a model, perhaps represented pictorially as a conceptual framework as discussed in Chapter 3. Such models help to make clear the multiple and complex causality of most things studied in social research. Hard thinking to establish this kind of model before data collection is invaluable. It suggests the variables we should target: those to be manipulated or controlled in an experiment and those to be included in non-experimental studies. In realist terms, this means that you have a pretty clear idea about the mechanisms likely to be in operation and the specific contexts in which they will, or will not, operate. You should also know what kind of results you are going to get, and how you will analyse them, before you collect the data. If the study does deliver the expected relationships, it provides support for the existence of these mechanisms and their actual operation in this study. This does not preclude your following up interesting or unexpected patterns in the data. They may suggest the existence of other mechanisms which you had not thought of. Large-scale studies can afford to draw the net relatively wide.
Large numbers of participants can be involved: several subgroups established, perhaps a range of different contexts covered, more possible mechanisms tested out. For the small-scale studies on which this text focuses, and in real world settings where relevant previous work may be sparse or non-existent, there is much to be said for a multi-strategy design (see Chapter 7) with an initial flexible design stage which is primarily exploratory in purpose. This seeks to establish, both from discussions with professionals, participants and others involved in the initial phase, and from the empirical data gathered, likely 'bankers' for mechanisms operating in the situation, contexts where they are likely to operate and the characteristics of participants best targeted. The second fixed design phase then incorporates a highly focused survey, experiment or other fixed design study.

Even with a preceding exploratory phase, fixed designs should always be piloted. You carry out a mini-version of the study before committing yourself to the big one. This is, in part, so you can sort out technical matters to do with methods of data collection to ensure that, say, the questions in a questionnaire are understandable and unambiguous. Just as importantly, it gives you a chance to ensure you are on the right lines conceptually. Have you 'captured' the phenomenon sufficiently well for meaningful data to be collected? Do you really have a good grasp of the relevant mechanisms and contexts? This is an opportunity to revise the design: to sharpen up the theoretical framework; develop the research questions; rethink the sampling strategy. And perhaps to do a further pilot. Also, while the central part of what you are going to do with your data should be thought through in advance, i.e. you are primarily engaged in a confirmatory task in fixed designs, there is nothing to stop you also carrying out exploratory data analysis (see Chapter 16, p. 419). It may be that there are unexpected patterns or relationships which reveal inadequacies in your initial understanding of the phenomenon. You cannot expect to confirm these revised understandings in the same study but they may well provide an important breakthrough suggesting a basis for further research.

This chapter seeks to provide a realist-influenced view of fixed design research. There is coverage of true experimental, single-case experimental, quasi-experimental and non-experimental fixed designs. The differences between these types of design are brought out and some examples given. In the 'true' experiment, two or more groups are set up, with random allocation of people to the groups. The experimenter then actively manipulates the situation so that different groups get different treatments. Single-case design,¹ as the name suggests, focuses on individuals rather than groups and effectively seeks to use persons as their own control, with their being subjected to different experimentally manipulated conditions at different times. Quasi-experiments lack the random allocation to different conditions found in true experiments. Non-experimental fixed designs do not involve active manipulation of the situation by the researcher. However, the different fixed designs are similar in many respects, as discussed in the following section.

General features of fixed designs

Fixed designs are usually concerned with aggregates: with group properties and with general tendencies.
In traditional experiments, results are reported in terms of group averages rather than what individuals have done. Because of this, there is a danger of the ecological fallacy - that is, of assuming that inferences can be made about individuals from such aggregate data (Connolly, 2006). Single-case experimental designs are an interesting exception to this rule. Most non-experimental fixed research also deals with averages and proportions. The relative weakness of fixed designs is that they cannot capture the subtleties and complexities of individual human behaviour. Even single-case designs are limited to quantitative measures of a single simple behaviour or, at most, a small number of such behaviours. The advantage of fixed designs is in being able to transcend individual differences and identify patterns and processes which can be linked to social structures and group or organizational features.

Fixed designs traditionally assume a 'detached' researcher to guard against the researcher having an effect on the findings of the research. Researchers typically remain at a greater physical and emotional distance from the study than those using flexible designs.

¹ Single-case fixed designs are, typically, very different from case studies. The latter are almost always of flexible design using several data collection methods (see Chapter 6, p. 135). However, it would be feasible to have a multi-strategy design which incorporated a single-case fixed design element within a case study.

In experimental research, the experimenter effect is well known. It is now widely acknowledged that the beliefs, values and expectations of the researcher can influence the research process at virtually all of its stages (Rosenthal, 1976, 2003; Rosnow and Rosenthal, 1997). Hence the stance now taken is that all potential biases should be brought out into the open by the researcher and every effort made to counter them. There are often long periods of preparation and design preliminaries before data collection and a substantial period of analysis after data collection. This does not, of course, in any way absolve the researcher from familiarity with the topic of the research, which is typically acquired vicariously from others, or from a familiarity with the literature, or from an earlier, possibly qualitative, study. There will be involvement during the data collection phase, but with some studies such as postal surveys this may be minimal. Your personal preference for a relatively detached, or a more involved, style of carrying out research is a factor to take into account when deciding the focus of your research project and the selection of a fixed or flexible design.

It has been fashionable in some academic and professional circles to denigrate the contribution of quantitative social research. As Bentz and Shapiro (1998) comment, in a text primarily covering qualitative approaches:

There is currently an antiquantitative vogue in some quarters, asserting or implying that quantitative research is necessarily alienating, positivistic, dehumanizing, and not 'spiritual'. In fact, it is clear that using quantitative methods to identify causes of human and social problems and suffering can be of immense practical, human, and emancipatory significance, and they are not necessarily positivistic in orientation. For example, quantitative methods are currently being used in the analysis of statistics to help identify the principal causes of rape.
Baron and Straus have analyzed police records on rape quantitatively to look at the relative roles of gender inequality, pornography, gender cultural norms about violence, and social disorganization in causing rape (1989). Clearly knowing the relative contribution of these factors in causing rape would be of great significance for social policy, economic policy, the law, socialization, and the criminal justice system, and it is difficult to see how one would arrive at compelling conclusions about this without quantitative analysis (p. 124).

They also point out that quantitative and experimental methods have been used to understand social problems and criticize prevailing ideologies in a way which contributes to social change and the alleviation of human suffering (i.e. for emancipatory purposes as discussed in Chapter 2, p. 39). Oakley (2000) suggests that this antipathy to quantitative, and in particular experimental, research derives in part from the influence of feminist methodologists who have viewed quantitative research as a masculine enterprise, contrasting it with qualitative research which is seen as embodying feminine values. She rejects this stereotyping and in her own work has made the transition from being a qualitative researcher to a staunch advocate of true randomized experiments.

Establishing trustworthiness in fixed design research

This is to a considerable extent a matter of common sense. Have you done a good, thorough and honest job? Have you tried to explore, describe or explain in an open and unbiased way? Or are you more concerned with delivering the required answer or selecting the evidence to support a case? If you can't answer these questions with yes, yes and no, respectively, then your findings are essentially worthless in research terms. However, pure intentions do not guarantee trustworthy findings. You persuade others by clear, well-written and presented, logically argued accounts which address the questions that concern them. These are all issues to which we will return in Chapter 18 on reporting.

This is not simply a presentational matter, however. Fundamental issues about the research itself are involved. Two key ones are validity and generalizability. Validity, from a realist perspective, refers to the accuracy of a result. Does it capture the real state of affairs? Are any relationships established in the findings true, or due to the effect of something else? Generalizability refers to the extent to which the findings of the research are more generally applicable, for example in other contexts, situations or times, or to persons other than those directly involved.

Suppose that we have been asked to carry out some form of research study to address the research question: Is educational achievement in primary schools improved by the introduction of standard assessment tests at the age of seven? Leave on one side issues about whether or not this is a sensible question and about the most appropriate way to approach it. Suppose that the findings of the research indicated a 'yes' answer - possibly qualified in various ways. In other words, we measure educational achievement, and it appears to increase following the introduction of the tests. Is this relationship what it appears to be - is there a real, direct, link between the two things? Central to the scientific approach is a degree of scepticism about our findings and their meaning (and even greater scepticism about other people's). Can we have been fooled so that we are mistaken about them?
Unfortunately, yes - there is a wide range of possibilities for confusion and error.

Reliability

Some problems come under the heading of reliability. This is the stability or consistency with which we measure something. For example, consider how we are going to assess educational achievement. This is no easy task. Possible contenders, each with their own problems, might include:

• a formal 'achievement test' administered at the end of the primary stage of schooling;
• teachers' ratings, also at the end of the primary stage; or
• the number, level and standard of qualifications gained throughout life.

Let's say we go for the first. It is not difficult to devise something which will generate a score for each pupil. However, this might be unreliable in the sense that if a pupil had, say, taken it on a Monday rather than a Wednesday, she would have got a somewhat different score. There are logical problems in assessing this, which can be attacked in various ways (e.g. by having parallel forms of the test which can be taken at different times, and their results compared). These are important considerations in test construction - see Chapter 12 for further details.

Unless a measure is reliable, it cannot be valid. However, while reliability is necessary, it is not sufficient. A test for which all pupils always got full marks would be totally consistent but would be useless as a way of discriminating between the achievements of different pupils (there could of course be good educational reasons for such a test if what was important was mastery of some material).

Unreliability may have various causes, including:

Participant error

In our example the pupil's performance might fluctuate widely from occasion to occasion on a more or less random basis. Tiredness due to late nights could produce changes for different times of the day, pre-menstrual tension monthly effects or hay fever seasonal ones. There are tactics which can be used to ensure that these kinds of fluctuations do not bias the findings, particularly when specific sources of error can be anticipated (e.g. keep testing away from the hay fever season).

Participant bias

This is more problematic from a validity point of view. It could be that pupils might seek to please or help their teacher, knowing the importance of 'good results' for the teacher and for the school, by making a particularly strong effort at the test. Or for disaffected pupils to do the reverse. Here it would be very difficult to disentangle whether this was simply a short-term effect which had artificially affected the test scores, or a more long-lasting side-effect of a testing-oriented primary school educational system. Consideration of potential errors of these kinds is part of the standard approach to experimental design.

Observer error

This would be most obvious if the second approach, making use of teachers' ratings as the measure of pupil achievement, had been selected. These could also lead to more or less random errors if, for example, teachers made the ratings at a time when they were tired or overstretched and did the task in a cursory way. Again, there are pretty obvious remedies (perhaps involving the provision of additional resources).

Observer bias

This is also possible and, like participant bias, causes problems in interpretation. It could be that teachers in making the ratings were, consciously or unconsciously, biasing the ratings they gave in line with their ideological commitment either in favour of or against the use of standard assessment tests. This is also a well-worked area methodologically, with procedures including 'blind' assessment (the ratings being made by someone in ignorance of whether the pupil had been involved in standard assessment tests) and the use of two independent assessors (so that inter-observer agreements could be computed). Further details are given in Chapter 13, p. 341.
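As a concrete illustration of the kind of inter-observer agreement check mentioned above, the sketch below (not from the text; the ratings are invented purely for illustration) computes simple percentage agreement and Cohen's kappa, a chance-corrected agreement coefficient, for two assessors rating the same set of pupils. The category labels and numbers are hypothetical.

# Illustrative sketch only: percentage agreement and Cohen's kappa for two
# independent assessors rating the same pupils. The ratings are invented data.
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters give the same category."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (observed - expected) / (1 - expected)."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Expected agreement if both raters assigned categories independently,
    # each in their own observed proportions.
    p_expected = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)

if __name__ == "__main__":
    # Hypothetical ratings of ten pupils ('low', 'medium', 'high' achievement).
    teacher_1 = ["high", "medium", "high", "low", "medium",
                 "high", "low", "medium", "medium", "high"]
    teacher_2 = ["high", "medium", "medium", "low", "medium",
                 "high", "low", "low", "medium", "high"]
    print(f"Percentage agreement: {percent_agreement(teacher_1, teacher_2):.2f}")
    print(f"Cohen's kappa:        {cohens_kappa(teacher_1, teacher_2):.2f}")

A kappa close to 1 indicates agreement well beyond chance; values near 0 suggest the assessors agree little more than they would by accident, flagging a possible observer reliability problem.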
Types of validity

If you have made a serious attempt to get rid of participant and observer biases and have demonstrated the reliability of whatever measure you have decided on, you will be making a pretty good job of measuring something. The issue then becomes - does it measure what you think it measures? In the jargon - does it have construct validity? There is no easy, single, way of determining construct validity. At its simplest, one might look for what seems reasonable, sometimes referred to as face validity. An alternative looks at possible links between scores on a test and the third suggested measure - the pupils' actual educational achievement in their later life (i.e. how well does it predict performance on the criterion in question, or predictive criterion validity). These and other aspects of construct validity are central to the methodology of testing.

The complexities of determining construct validity can lead to an unhealthy concentration on this aspect of carrying out a research project. For many studies there is an intuitive reasonableness to assertions that a certain approach provides an appropriate measure. Any one way of measuring or gathering data is likely to have its shortcomings, which suggests the use of multiple methods of data collection. One could use all three of the approaches to assessing educational achievement discussed above (achievement tests, teachers' ratings and 'certificate counting') rather than relying on any one measure. This is one form of triangulation - see Chapter 6, p. 158. Similar patterns of findings from very different methods of gathering data increase confidence in the validity of the findings. Discrepancies between them can be revealing in their own right. It is important to realize, however, that multiple methods do not constitute a panacea for all methodological ills. They raise their own theoretical problems; and they may in many cases be so resource-hungry as to be impracticable (see Chapter 14, p. 385).

Let us say that we have jumped the preceding hurdle and have demonstrated satisfactorily that we have a valid measure of educational achievement. However, a finding that achievement increases after the introduction of the tests does not necessarily mean that it increased because of the tests. This gets us back to the consideration of causation which occupied us in Chapter 2 (see p. 32). What we would like to do is to find out whether the treatment (introduction of the tests) actually caused the outcome (the increase in achievement). If a study can plausibly demonstrate this causal relationship between treatment and outcome, it is referred to as having internal validity. This term was introduced by Campbell and Stanley (1963), who provided an influential and widely used analysis of possible 'threats' to internal validity.
These threats are other things that might happen which confuse the issue and make us mistakenly conclude that the treatment caused the outcome (or obscure possible relationships between them). Suppose, for example, that the teachers of the primary school children involved in the study are in an industrial dispute with their employers at the same time that testing is introduced. One might well find, in those circumstances, a decrease in achievement related to the disaffection and disruption caused by the dispute, which might be mistakenly ascribed to the introduction of tests per se. This particular threat is labelled as 'history' by Campbell and Stanley - something which happens at the same time as the treatment. There is the complicating factor here that a case might be made for negative effects on teaching being an integral part of the introduction of formal testing into a child-centred primary school culture, i.e. that they are part of the treatment rather than an extraneous factor. However, for simplicity's sake, let's say that the industrial dispute was an entirely separate matter.

Campbell and Stanley (1963) suggested eight possible threats to internal validity which might be posed by other extraneous variables. Cook and Campbell (1979) have developed and extended this analysis, adding a further four threats. All 12 are listed in Box 5.1 (Onwuegbuzie and McLean, 2003, expand this list to 22 threats at the research design and data collection stage, with additional threats present at the data analysis and interpretation stages). The labels used for the threats are not to be interpreted too literally - mortality doesn't necessarily refer to the death of a participant during the study (though it might). Not all threats are present for all designs. For example, the 'testing' threat is only there if a pre-test is given, and in some cases its likelihood, or perhaps evidence that you had gained from pilot work that a 'testing' effect was present, would cause you to avoid a design involving this feature.

Box 5.1: Threats to internal validity

1. History. Things that have changed in the participants' environments other than those forming a direct part of the enquiry (e.g. occurrence of major air disaster during study of effectiveness of desensitization programme on persons with fear of air travel).
2. Testing. Changes occurring as a result of practice and experience gained by participants on any pre-tests (e.g. asking opinions about factory farming of animals before some intervention may lead respondents to think about the issues and develop more negative attitudes).
3. Instrumentation. Some aspect(s) of the way participants were measured changed between pre-test and post-test (e.g. raters in observational study using a wider or narrower definition of a particular behaviour as they get more familiar with the situation).
4. Regression. If participants are chosen because they are unusual or atypical (e.g. high scorers), later testing will tend to give less unusual scores ('regression to the mean'); e.g. an intervention programme with pupils with learning difficulties where 10 highest-scoring pupils in a special unit are matched with 10 of the lowest-scoring pupils in a mainstream school - regression effects will tend to show the former performing relatively worse on a subsequent test; see further details on p. 113.
5. Mortality. Participants dropping out of the study (e.g. in a study of an adult literacy programme - selective drop-out of those who are making little progress).
6. Maturation. Growth, change or development in participants unrelated to the treatment in the enquiry (e.g. evaluating extended athletics training programme with teenagers - intervening changes in height, weight and general maturity).
7. Selection. Initial differences between groups prior to involvement in the enquiry (e.g. through use of arbitrary non-random rule to produce two groups: ensures they differ in one respect which may correlate with others).
8. Selection by maturation interaction. Predisposition of groups to grow apart (or together if initially different); e.g. use of groups of boys and girls initially matched on physical strength in a study of a fitness programme.
9. Ambiguity about causal direction. Does A cause B, or B cause A? (e.g. in any correlational study, unless it is known that A precedes B, or vice versa - or some other logical analysis is possible).
10. Diffusion of treatments. When one group learns information or otherwise inadvertently receives aspects of a treatment intended only for a second group (e.g. in a quasi-experimental study of two classes in the same school).
11. Compensatory equalization of treatments. If one group receives 'special' treatment there will be organizational and other pressures for a control group to receive it (e.g. nurses in a hospital study may improve the treatment of a control group on grounds of fairness).
12. Compensatory rivalry. As above but an effect on the participants themselves (referred to as the 'John Henry' effect after the steel worker who killed himself through over-exertion to prove his superiority to the new steam drill); e.g. when a group in an organization sees itself under threat from a planned change in another part of the organization and improves performance.

(after Cook and Campbell, 1979, pp. 51-5)

In general design terms, there are two strategies to deal with these threats. If you know what the threat is, you can take specific steps to deal with it. For example, the use of comparison groups who have the treatment at different times or places will help to neutralize the 'history' threat. This approach of designing to deal with specific threats calls for a lot of forethought and is helped by knowledge and experience of the situation that you are dealing with. However, you can only hope to deal with a fairly small number of predefined and articulated threats in this way. In flexible design research it is feasible to address such threats to validity after the research has begun, as discussed in the following chapter. The alternative strategy, central to the design philosophy of true experiments as developed by Fisher (1935, 1960), is to use randomization, which helps offset the effect of a myriad of unforeseen factors (a minimal simulation illustrating this is sketched below). While true experiments are therefore effective at dealing with these threats, they are by no means totally immune to them. The threats have to be taken very seriously with quasi-experimental designs, and non-experimental fixed designs, and a study of the plausibility of the existence of various threats provides a very useful tool in interpretation. The interpretability of designs in the face of these threats depends not only on the design itself but also on the specific pattern of results obtained.

If you rule out these threats, you have established internal validity. You will have shown (or, more strictly, demonstrated the plausibility) that a particular treatment caused a certain outcome.
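The following sketch (not from the text; all numbers are invented for illustration) shows in miniature why randomization offsets unforeseen factors: when pupils are allocated to treatment and control groups at random, an unmeasured characteristic such as prior motivation tends, on average, to be balanced across the groups, so it cannot systematically masquerade as a treatment effect.

# Illustrative sketch only: random allocation tends to balance an unmeasured
# characteristic across treatment and control groups. Invented data.
import random
import statistics

random.seed(42)

def simulate_allocation(n_pupils=60, n_replications=1000):
    """Repeatedly allocate pupils at random and record the between-group
    difference on an unmeasured characteristic ('prior motivation')."""
    differences = []
    for _ in range(n_replications):
        # Unmeasured characteristic, unknown to the researcher.
        motivation = [random.gauss(50, 10) for _ in range(n_pupils)]
        # Random allocation: shuffle and split into two equal groups.
        random.shuffle(motivation)
        treatment = motivation[: n_pupils // 2]
        control = motivation[n_pupils // 2:]
        differences.append(statistics.mean(treatment) - statistics.mean(control))
    return differences

if __name__ == "__main__":
    diffs = simulate_allocation()
    print(f"Mean treatment-control difference over replications: "
          f"{statistics.mean(diffs):.2f}")
    print(f"Typical size of a single allocation's difference (SD): "
          f"{statistics.stdev(diffs):.2f}")

The average difference is close to zero: randomization does not guarantee identical groups in any single study, but it removes systematic bias from the allocation itself, and the chance differences that remain are precisely what statistical tests are designed to take into account.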
Note, however, that while an experiment can be effective in doing this, it tells you nothing about the actual mechanisms by which it did so, except insofar as you have anticipated possible alternative mechanisms and controlled for them in your design. As Shadish, Cook and Campbell (2002) put it:

The unique strength of experimentation is in describing the consequences attributable to deliberately varying a treatment. We call this causal description. In contrast, experiments do less well in clarifying the mechanisms through which and the conditions under which that causal relationship holds - what we call causal explanation (p. 9, emphases in original).

This limitation of experiments is central to Pawson and Tilley's (1997) critique of randomized controlled trials (RCTs) discussed later in the chapter (p. 100). It is important to appreciate that 'validity threats are made implausible by evidence, not methods; methods are only a way of getting evidence that can help you rule out these threats' (Maxwell, 2005, p. 105, emphasis in original). The view that methods themselves can guarantee validity is characteristic of the discredited positivist approach and is itself untenable. Whatever method is adopted there is no such guarantee. The realist assumption is that all methods are fallible: 'a realist conception of validity . . . sees the validity of an account as inherent, not in the procedures used to produce and validate it, but in its relationship to those things that it is intended to be an account of' (Maxwell, 1992, p. 281, emphasis in original). See also House (1991). The whole 'threat' approach sits well with a realist analysis, which is not surprising as Campbell was an avowed realist (see, however, House, Mathison and McTaggart, 1989, which makes a case for his approach, particularly in Cook and Campbell, 1979, as being essentially eclectic, taking aspects from a whole range of theoretical positions). These threats tend to be discussed only in relation to experimental and quasi-experimental designs. However, validity is an important issue for all types of fixed designs and Onwuegbuzie and McLean (2003) have expanded Campbell and Stanley's framework for use with non-experimental fixed designs.

Generalizability

Sometimes one is interested in a specific finding in its own right. You may have shown, say, that a new group workshop approach leads, via a mechanism of increases in self-esteem, to subsequent maintained weight loss in obese teenagers at a residential unit. This may be the main thing that you are after if you are only concerned with whether or not the approach works with that specific group of individuals at the unit. If, however, you are interested in what would happen with other client groups or in other settings, or with these teenagers when they return home, then you need to concern yourself with the generalizability of the study.
findings from laboratory research may not be relevant to real world situations. If your teenagers are a representative sample from a known population, then the generalization to that population can be done according to the rules of statistical inference (note, however, that experimenters rarely take this requirement seriously). Generalizability to other settings or to other client groups has to be established on other, non-statistical, bases. LeCompte and Goetz (1982) have provided a classification of threats to external validity similar to that given for internal validity, which is listed in Box 5.2.

Box 5.2 Threats to generalizability (external validity)
1. Selection. Findings being specific to the group studied.
2. Setting. Findings being specific to, or dependent on, the particular context in which the study took place.
3. History. Specific and unique historical experiences may determine or affect the findings.
4. Construct effects. The particular constructs studied may be specific to the group studied.
(after LeCompte and Goetz, 1982)

There are two general strategies for showing that these potential threats are discountable: direct demonstration and making a case. Direct demonstration involves you, or someone else who wishes to apply or extend your results, carrying out a further study involving some other type of participant, or in a different setting, etc. Making a case is more concerned with persuading that it is reasonable for the results to generalize, with arguments that the group studied, or setting, or period is representative (i.e. it shares certain essential characteristics with other groups, settings or periods, and hence that the same mechanism is likely to apply in those also). This sorting out of the wheat of what is central to your findings from the chaff of specific irrelevancies can be otherwise expressed as having a theory or conceptual framework to explain what is going on. Such a theory or conceptual framework may be expressed in formal and explicit terms by the presenter of the findings, as discussed in Chapter 3 (p. 67). A study may be repeated with a different target group or in a deliberately different setting to assess the generalizability of its findings. There is a strong case, particularly with important or controversial findings, for attempting a replication of the original study. While in practice no replication is ever exact, an attempt to repeat the study as closely as possible which reproduces the main findings of the first study is the practical test of the reliability of your findings. Whether it is worthwhile to devote scarce resources to replication depends on circumstances. Replication is nowhere near as common as it should be in social research. In consequence, we may well be seeking to build on very shaky foundations. The argument is sometimes put that, since validity depends on reliability, we should simply worry about validity: if we can show that validity is acceptable then, necessarily, so is reliability. The problem here is that it becomes more difficult to disentangle what lies behind poor validity. It might have been that the findings were not reliable in the first place. It is easy to guarantee unreliability. Carelessness, casualness and lack of commitment on the part of the researcher help, as does a corresponding lack of involvement by participants. Reliability is essentially a quality control issue.
Punctilious attention to detail, perseverance and pride in doing a good job are all very important, but organization is the key. While validity and generalizability are probably the central elements in establishing the value and trustworthiness of a fixed design enquiry, there are other aspects to which attention should be given. They include, in particular, objectivity and credibility.

Objectivity

The traditional, scientific approach to the problem of establishing objectivity is exemplified by the experimental approach. The solution here is seen to be to distance the experimenter from the experimental participant, so that any interaction that takes place between the two is formalized - indeed, some experimenters go so far as not only to have a standardized verbatim script but even to have it delivered via a tape-recorder. To some, this artificiality is lethal for any real understanding of phenomena involving people in social settings. An alternative is to erect an objective/subjective contrast. 'Objective' is taken to refer to what multiple observers agree to as a phenomenon, in contrast to the subjective experience of the single individual. In other words, the criterion for objectivity is intersubjective agreement. This stance tends to go along with an involved rather than a detached investigator, and notions of 'triangulation' (see Chapter 6, p. 158), where the various accounts of participants with different roles in the situation are obtained by investigators who, by combining them with their own perceptions and understandings, reach an agreed and negotiated account. Formulated in terms of threats, objectivity can be seen to be at risk from a methodology where the values, interests and prejudices of the enquirer distort the response (experiment being for some the answer, and for others an extreme version of the problem). Relying exclusively on data from a single individual can similarly threaten objectivity. And again, a project carried out for an ideological purpose other than that of research itself clearly threatens objectivity.

Credibility

Shipman (1997) has suggested that we should go beyond the traditional concerns for reliability, validity and generalizability when considering the trustworthiness of research, and also ask whether there is sufficient detail on the way the evidence is produced for the credibility of the research to be assessed. We cannot satisfy ourselves about the other concerns unless the researcher provides detailed information on the methods used and the justification for their use. This is a responsibility which has always been accepted by those using experimentation. The report of an experiment in a journal article carries an explicit requirement that sufficient detail must be given about procedures, equipment, etc. for the reader to be able to carry out an exact replication of the study. This kind of requirement may be rejected as scientistic by some practitioners using flexible designs, relying largely on qualitative data. However, it could be argued that there is a strong case for such research calling for an even greater emphasis on explaining the methods used and the warrant for the conclusions reached, because of the lack of codification of the methods of data collection or of the approaches to analysis. This need is increasingly recognized in the design of qualitative research (e.g. Marshall and Rossman, 2006).
However, there is considerable debate about the applicability of concepts such as reliability and validity, and the possibility and appropriateness of objectivity, when assessing the trustworthiness of flexible qualitative research. The following chapter pays considerable attention to this issue.

Experimental fixed designs

If, following your reading of the previous chapter, it appears possible that an experimental fixed design may be appropriate for your project and its research questions, then perusal of this section should help in choosing a specific experimental design. However, before confirming that choice, it will be necessary to read the chapters in Part III of this book to help select appropriate methods of collecting data, and Chapter 16 to establish how you will analyse the data after they have been collected.

To 'experiment', or to 'carry out an experiment', can mean many things. In very general terms, to be experimental is simply to be concerned with trying new things - and seeing what happens, what the reception is. Think of 'experimental' theatre, or an 'experimental' car, or an 'experimental' introduction of a mini-roundabout at a road junction. There is a change in something, and a concern for the effects that this change might have on something else. However, when experimentation is contrasted with the other research designs, a stricter definition is employed, usually involving the control and active manipulation of variables by the experimenter. Experimentation is a research strategy involving:
• the assignment of participants to different conditions;
• manipulation of one or more variables (called 'independent variables', IVs) by the experimenter;
• the measurement of the effects of this manipulation on one or more other variables (called 'dependent variables', DVs); and
• the control of all other variables.
Note the use of the term variable. This is widespread within the experimental strategy and simply denotes something which can vary. However, it carries within it the notion that there are certain specific aspects which can be isolated and which retain the same meaning throughout the study. The experimental strategy is a prime example of a fixed research design. You need to know exactly what you are going to do before you do it. It is a precise tool that can only map a very restricted range. A great deal of preparatory work is needed (either by you or someone else) if it is going to be useful. An experiment is an extremely focused study. You can only handle a very few variables, often only a single independent variable and a single dependent variable. These variables have to be selected with extreme care. You need to have a well-developed theory or conceptual framework. The major problem in doing experiments in the real world is that you often only have, at best, a pretty shaky and undeveloped theory; you don't know enough about the thing you are studying for this selectivity of focus to be a sensible strategy. This need to know what you are doing before you do it is a general characteristic of fixed research designs, but experiments are the most demanding in this respect because of their extreme selectivity.

Laboratory experiments

Real world research seeks to address social problems and issues of current concern and to find ways of addressing such problems. Experiments typically take place in special places known as laboratories.
In principle, just as certain kinds of academic research can be carried out in real world settings, which anthropologists and other social scientists refer to as 'field' settings, so research with a real world problem-solving focus might be carried out in a laboratory. However, the necessary artificiality of laboratories can limit their value. Aronson, Brewer and Carlsmith (1985) have distinguished two senses in which laboratory experimentation may lack realism (incidentally, nothing to do with realist philosophy). One is experimental realism. In this sense an experiment is realistic if the situation which it presents to the participant is realistic, if it really involves the participants (then referred to as 'subjects'), and has impact upon them. In the well-known Asch (1956) experiment on conformity, subjects made what seemed to them to be straightforward judgements about the relative length of lines. These judgements were contradicted by others in the room whom they took also to be subjects in the experiment. This study showed experimental realism in the sense that subjects were undergoing an experience which caused them to show strong signs of tension and anxiety. They appeared to be reacting to the situation in the same realistic kind of way that they would outside the laboratory. However, it might be argued that the Asch study lacks what Aronson et al. term mundane realism (see also Aronson, Wilson and Akert, 2007). That is, the subjects were encountering events in the laboratory setting which were very unlikely to occur in the real world. Asch, following a common strategy in laboratory experimentation, had set up a very clearly and simply structured situation to observe the effects of group pressure on individuals. The real life counterpart, if one could be found, would be more complex and ambiguous, and in all probability would result in findings which were less conclusive. (The ethics of Asch's study are a different matter - see Chapter 9.) Notwithstanding worries about the realism of laboratory-based studies, they remain popular with researchers, including those with real world concerns. After a review of the two approaches, Levitt and List (2006) conclude that 'the sharp dichotomy sometimes drawn between lab experiments and data generated in natural settings is a false one. Each approach has strengths and weaknesses, and a combination of the two is likely to provide deeper insights than either in isolation' (p. i).

Bias in experiments

Simplification of the situation, which is central to the experimental approach, may lead to clear results, but it does not protect against bias in them. The effects of two types of bias have been investigated in some detail. These are the demand characteristics of the experimental situation, and experimenter expectancy effects. In a very general sense, these are the consequences of the participants and the experimenters being human beings. Bias due to demand characteristics occurs because participants know that they are in an experimental situation, know that they are being observed, and know that certain things are expected or demanded of them (Orne, 1962; Strohmetz, 2008). Hence the way in which they respond is some complex amalgam of the experimental manipulation and their interpretation of what effect the manipulation is supposed to have on them. Their action based on that interpretation is likely to be cooperative but could well be obstructive.
Even in situations where participants are explicitly told that there are no right or wrong answers, that one response is as valued as another, participants are likely to feel that certain responses show them in a better light than others. There is evidence that persons who volunteer for experiments are more sensitive to these effects than those who are required to be involved (Rosenthal and Rosnow, 1975; Rosnow, 1993). However, Berkowitz and Troccoli (1986) are not persuaded of the widespread existence of biasing effects from demand characteristics. The classic ploy to counteract this type of bias is deception by the experimenter. Participants are told that the experiment is about X when it is really about Y. X is made to appear plausible and is such that, if the participants modify their responses in line with, or antagonistically to, what the experimenter appears to be after, there is no systematic effect on the experimenter's real area of interest. As discussed in Chapter 9 (p. 205), increasing sensitivity to the ethical issues raised by deceiving participants means that this ploy, previously common in some areas of social psychology, is now looked on with increasing suspicion. Experimenter expectancy effects are reactive effects produced by the experimenters, who have been shown, in a wide variety of studies, to bias findings (usually unwittingly) to provide support for the experimental hypothesis. Rosenthal and Rubin (1980) discuss the first 345 such studies! The effects can be minimized by decreasing the amount of interaction between participant and experimenter: using taped instructions, automated presentation of materials, etc. However, for many topics (apart from studies in areas such as human-computer interaction) this further attenuates any real world links that the laboratory experiment might possess. Double-blind procedures can also be used, where data collection is subcontracted so that neither the person working directly with the participants, nor the participants themselves, are aware of the hypothesis being tested. Knowledge about the determinants of laboratory behaviour (demand characteristics, etc.) can be of value in real life settings. For example, police identity parades can be thought of as experiments, and suggestions for improving them have been based on this knowledge and on general principles of experimental design (Wells et al., 1998).

Experiments in natural settings

The laboratory is essentially a place for maximizing control over extraneous variables. Move outside the laboratory door and such tight and comprehensive control becomes impossible. The problems of experimentation discussed in the previous section remain. Any special conditions marking out what is happening as 'an experiment' can lead to reactive effects. The classic demonstration of such effects comes from the well-known series of experiments carried out at the Hawthorne works of the Western Electric Company in the USA in the 1920s and 1930s (Dickson and Roethlisberger, 2003), and hence called the 'Hawthorne effect'. The studies, investigating changes in length of working day, heating, lighting and other variables, found increases in productivity during the study which were virtually irrespective of the specific changes. The workers were in effect reacting positively to the attention and special treatment given by the experimenters.
Re-evaluations of the original study have cast serious doubt on the existence of the effect (Kompier, 2006) and on the interpretation of the original study (Wickstrom and Bendix, 2000). However, new, more strictly controlled, studies have demonstrated the existence of (relatively small) Hawthorne effects (McCarney et al., 2007; Verstappen et al., 2004). Problems in carrying out experiments in natural settings are listed in Box 5.3.

Box 5.3 Problems in carrying out experiments in natural settings
Moving outside the safe confines of the laboratory may well be traumatic. Particular practical difficulties include:
1. Random assignment. There are practical and ethical problems in achieving random assignment to different experimental treatments or conditions (e.g. in withholding the treatment from a no-treatment control group). Random assignment is also often only feasible in atypical circumstances or with selected respondents, leading to questionable generalizability. Faulty randomization procedures are not uncommon (e.g. when procedures are subverted through ignorance, kindness, etc.). For small samples of the units being randomly assigned, sampling variability is a problem. Treatment-related refusal to participate or continue can bias sampling.
2. Validity. The actual treatment may be an imperfect realization of the variable(s) of interest, or a restricted range of outcomes may be insensitively or imperfectly measured, resulting in questionable validity. A supposed no-treatment control group may receive some form of compensatory treatment, or be otherwise influenced (e.g. through deprivation effects).
3. Ethical issues. There are grey areas in relation to restricting the involvement to volunteers, the need for informed consent and the debriefing of participants after the experiment. Strict adherence to ethical guidelines is advocated, but this may lead to losing some of the advantages of moving outside the laboratory (e.g. leading to unnecessary 'obtrusiveness', and hence reactivity, of the treatment). Common sense is needed. If you are studying a natural experiment, where some innovation would have taken place whether or not you were involved, then it may simply be the ethical considerations relating to the innovation which apply (fluoridation of water supplies raises more ethical implications for users than an altered design of a road junction). See Chapter 9 for further discussion.
4. Control. Lack of control over extraneous variables may mask the effects of treatment variables, or bias their assessment. Interaction between participants may vitiate random assignment and violate their assumed independence.

There are gains, of course. Notwithstanding some degree of artificiality, and related reactivity, generalizability to the 'real world' is almost self-evidently easier to achieve when the study takes place outside the laboratory, in a setting which is 'real life' or close to it. Note, however, that there are claims of good generalization of some findings from laboratory to field settings (Locke, 1986). Other advantages are covered in Box 5.4. Experimental designs as such are equally applicable both inside and outside laboratories. The crucial feature of so-called 'true' experiments (distinguishing them from the 'quasi-experiments' discussed below) is random allocation of participants to experimental conditions.
If you can find a feasible and ethical means of doing this when planning a field experiment, then you should seriously consider carrying out a true experiment.

Box 5.4 Advantages in carrying out experiments in natural settings
Compared to a laboratory, natural settings have several advantages:
1. Generalizability. The laboratory is necessarily and deliberately an artificial setting where the degree of control and isolation sets it apart from real life. If we are concerned with generalizing results to the real world, the task is easier if experimentation is in a natural setting. Much laboratory experimentation is based on student participants, making generalization to the wider population hazardous. Although this is not a necessary feature of laboratory work, there is less temptation to stick to student groups when experiments take place in natural settings.
2. Validity. The demand characteristics of laboratory experiments, where participants tend to do what they think you want them to do, are heightened by the artificiality and isolation of the laboratory situation. Real tasks in a real world setting are less prone to this kind of game playing. So you are more likely to be measuring what you think you are measuring.
3. Participant availability. It is no easy task to get non-student participants to come into the laboratory (although the development of pools of volunteers is valuable). You have to rely on them turning up. Although it depends on the type of study, many real life experiments in natural settings have participants in abundance, limited only by your energy and staying power - and possibly your charm.

The advantage of random allocation or assignment is that it allows you to proceed on the assumption that you have equivalent groups under the two (or more) experimental conditions. This is a probabilistic truth, which allows you, among other things, to employ a wide battery of inferential statistical tests legitimately. It does not guarantee that in any particular experiment the two groups will in fact be equivalent. No such guarantee is ever possible, although the greater the number of persons being allocated, the more confidence you can have that the groups do not differ widely. An alternative way of expressing this advantage is to say that randomization gets rid (probabilistically at least) of the selection threat to internal validity (see Box 5.1, p. 88). That is, it provides a defence against the possibility that any change in a dependent variable is caused not by the independent variable but by differences in the characteristics of the two groups. Other potential threats to internal validity remain, and the discussion of some of the designs that follows is largely couched in terms of their adequacy, or otherwise, in dealing with these threats.

Randomized controlled trials and the 'gold standard'

A randomized controlled trial (RCT) is a type of experiment where participants are randomly allocated either to a group who receive some form of intervention or treatment, or to a control group who don't. Use of RCTs is a central feature of the evidence-based movement currently highly influential in many fields of social research. Proponents argue that it is the 'gold standard' - the scientific method of choice, primarily because they consider it to be the best means of assessing whether or not an intervention is effective. There is a growing tendency in some circles to equate the doing of science with the carrying out of RCTs.
For example, the US Department of Education's Institute of Education Sciences in 2003 (www.eval.org/doe.fedrcg.htm) made it clear that it would privilege applications for funding of applied research and evaluation which used RCTs (with a grudging acceptance of other experimental approaches when RCTs were not feasible) in the interests of using 'rigorous scientifically based research methods'. This has sparked a heated debate among applied social researchers on 'what counts as credible evidence in applied research and evaluation practice' (Donaldson, Christie and Mark, 2009). Privileging RCTs in this way is a serious distortion of the nature of scientific activity. It is historically inaccurate and carries with it an inappropriately narrow view of what constitutes evidence. A review of practices in the natural sciences reveals the minor role played by RCTs. Phillips (2005) concludes that:

One cannot help but be struck by the huge range of activities engaged in by researchers in the natural sciences, and the variety of types of evidence that have been appealed to: establishing what causal factors are operating in a given situation; distinguishing genuine from spurious effects; determining function; determining structure; careful description and delineation of phenomena; accurate measurement; development and testing of theories, hypotheses, and causal models; elucidation of the mechanisms that link cause with effect; testing of received wisdom; elucidating unexpected phenomena; production of practically important techniques and artefacts (p. 593).

He gives a wide range of illustrative examples. In his view, any attempt to give a simple, single account of the nature of science appears quite arbitrary. Relying on RCTs, or any other specific methodology, as the criterion of scientific rigour 'detracts from the main question at hand when one is assessing an inquiry, which is this: Has the overall case made by the investigator been established to a degree that warrants tentative acceptance of the theoretical or empirical claims that were made?' (original emphasis). While the methodology used in a particular study is an important consideration, what matters is the convincingness of the argument: how well the evidence is woven into the structure of the argument; how rigorously this evidence was gathered; and how well counter-arguments and counter-claims are themselves countered or confronted with recalcitrant facts or data.

The message here is not that RCTs should be avoided - they have an important role, and many practising real world researchers will be expected to be able to carry them out competently. However, they are by no means the only show in town. The website provides further discussion of the issues involved in social experimentation and the use of RCTs.

Realist critique of RCTs

Pawson and Tilley (1997, especially Chapter 2) elaborate the view that the methodology of the RCT is inappropriate for dealing with complex social issues. They consider that, as well as generating inconsistent findings, the concentration on outcomes does little or nothing to explain why an intervention has failed (or, in relatively rare cases, succeeded). Hence, there is not the cumulation of findings which would help to build up understanding. Experimentalists acknowledge the practical and ethical problems of achieving randomization of allocation to experimental and control groups in applied field experiments. Pawson and Tilley add causal problems to these perils.
Allocation of participants to experimental or control groups by the experimenter removes that choice from the participants, but 'choice is the very condition of social and individual change and not some sort of practical hindrance to understanding that change' (Pawson and Tilley, 1997, p. 36; emphasis in original). In their discussion of correctional programmes for prison inmates, they make the undeniable point that it is not the programmes themselves which work, but people cooperating and choosing to make them work. The traditional solution to this problem is to run volunteer-only experiments. Volunteers are called for, then assigned randomly to one of the two groups. The assumption is that motivation and cooperation will be the same in each of the groups. The reasonableness of this assumption will depend on the specific circumstances of the experiment. Pawson and Tilley illustrate, through an example of what they consider to be high-quality experimental evaluation research (Porporino and Robinson, 1995), the way in which participants' choice-making capacity cuts across and undermines a volunteer/non-volunteer distinction:

The act of volunteering merely marks a moment in a whole evolving pattern of choice. Potential subjects will consider a program (or not), volunteer for it (or not), co-operate closely (or not), stay the course (or not), learn lessons (or not), retain the lessons (or not), apply the lessons (or not). Each one of these decisions will be internally complex and take its meaning according to the chooser's circumstances. Thus the act of volunteering for a program such as 'Cog Skills' might represent an interest in rehabilitation, a desire for improvement in thinking skills, an opportunity for a good skive, a respite from the terror or boredom of the wings, an opening to display a talent in those reputedly hilarious role-plays, a chance to ogle a glamorous trainer, a way of playing the system to fast-track for early parole, and so on (p. 38).

They back up this intuitive understanding of how prisoners find their way on to programmes by a detailed re-analysis of the findings. Their overall conclusion is that such volunteer-only experiments encourage us to make a pronouncement on whether a programme works