Chapter 4: Survey Question Construction

INTRODUCTION

In order to test a hypothesis or answer a research question, survey researchers must measure the concepts they reference as precisely as possible, striving always for the least amount of error. In testing hypotheses, researchers use variables to represent specific concepts. Survey questions tie the variables (traits or characteristics of the population that vary from person to person) to the theoretical concepts of interest. This involves paying careful attention to the design of each questionnaire item and response option, so the survey instrument is as valid and reliable as it can be for the job it must do. This chapter elaborates some practical guidelines to aid in this important task.

The chapter begins with a discussion of the types of concepts you may wish to measure. We then turn the focus to question development, including several topics and examples that will help you generate precise and unbiased survey items. Following this, we address issues in the crafting of appropriate response categories for multiple-choice questions, and when to use open-ended responses instead. Finally, we conclude the chapter with some closing remarks on sources of error in survey research and a summary checklist of major points to aid in the design of your own survey research projects.

The types of measures developed for the survey instrument should ideally produce relevant, unbiased, error-free data. Therefore, the first step is to set concrete research goals at the outset of the design process. Make sure you can clearly answer the key questions: What are you trying to accomplish? What information needs to be collected in order to do so? How are you going to collect this information? And how are you going to analyze the information collected?
Next, make sure you can translate these needs into unambiguous questions and concepts that communicate clearly and directly with the wide range of respondents on whom you will rely for your data (Bradburn, Sudman, and Wansink 2004). It is absolutely necessary that all respondents share a common interpretation of the meaning of each question and, furthermore, that this interpretation mirrors your intent as a researcher.

The importance of the above questions cannot be overstated. Each element (research goals, data collection, and methods of analysis) shapes and constrains the others. For example, the design of questions and response options depends largely on the mode of delivery of the questionnaire: telephone interview, online survey, pen-and-paper mailer, or face-to-face interview. Furthermore, the format of individual questions (such as open- versus closed-ended) informs the type of analyses you can later perform. For example, an open-ended response to a measure about one's attitudes toward abortion would not lend itself easily to the tight quantification required for a statistical analysis. The researcher must thus always be mindful of the optimal coordination of all of these elements for the most successful survey design.

CONCEPT MEASUREMENT: TRAITS, ASSESSMENTS, AND SENTIMENTS

The primary goal of survey research in the social sciences is to gather data on individuals' demographics, behaviors, personal details, attitudes, beliefs, and opinions. This might seem like an easy task, akin perhaps to asking people questions in everyday life, but there are actually a number of measurement issues that must be considered in order to develop a high-quality questionnaire.
The first issue relates to the complexity of the concepts about which you would like to ask. Many measures, such as asking one's age and gender, are well established by previous survey research and relate to relatively simple (i.e., easily measured) concepts. Other concepts, perhaps those that are more novel, complex, abstract, or unobservable, require a more cautious and carefully strategized approach in order to avoid bias while eliciting accurate responses. Carefully phrased questions lead to more reliable and valid data.

Simple Concepts

As mentioned above, some questions are more easily answered than others, such as asking a respondent's age and gender. While referencing a slightly more complicated concept than these, single-construct measures such as "What is your religious preference?" similarly tap into a single, well-understood concept and are likewise typically unproblematic when encountered in a questionnaire. Such straightforward and well-established measures and questions tap into unidimensional or simple concepts: concepts that reflect a single idea, attitude, or behavior.

SECTION II: QUESTIONNAIRE DESIGN

Complex Concepts

Unobservable and Multidimensional Concepts

A survey question is a measurement tool, describing your respondent by reference to a singular concept (like age or gender), in something of the same way that a tape measure is a tool that singularly describes an object by its height. However, at times the concept you wish to measure may be something that is not so easily observable, something that lacks an established, singular metric to describe it. Depending on your research goals, a more complex and multidimensional measure may be necessary to describe this obscure construct, just as some purposes might suggest the combination of a tape measure and a scale to take a multidimensional measurement of an object's height and weight.
A multidimensional concept is one that combines multiple singular constructs or attributes of an object in order to compose some new, abstract attribute that cannot be directly observed or measured. It is only by combining multiple questions that measure related singular concepts that this composite, multidimensional concept can be indirectly described. However, this is not to say that merely asking multiple questions about a single topic automatically produces a multidimensional concept. A group of related questions that yield similar answers aggregated into a single score is still considered unidimensional if the questions ultimately reflect the same underlying concept. Multidimensional constructs, on the other hand, require researchers to ask questions that span multiple concepts, from whose combination a more complex and novel concept emerges.

For example, a researcher might ask various questions all related to particulars of respondents' religious practices: what church they attend, what religious texts they use, the particular rites and practices they observe, et cetera. Yet each of these questions, and even the aggregate score resulting from their combination, aims at describing the same concept: their religious affiliation. But perhaps rather than taking a simple measure (or measures) identifying an individual's religious affiliation, a survey researcher might want to assess an individual's religious spirituality, behavior, and attitudes for a more comprehensive assessment of that individual's religiosity (i.e., a person's unobservable, overall level of religiousness). See Figure 4.1 for an example of how a researcher might take measurements across the various facets composing this multidimensional concept. Here, religiosity is a multidimensional concept, because it measures not only spirituality but also religious behaviors and religious attitudes, all similar but separate components of an individual's level of religiousness.
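As a rough sketch of how a multidimensional measure of this kind might be scored, the Python snippet below averages item scores within each dimension and then combines the dimension averages into one composite. The item keys, the 1-to-5 coding in which higher numbers mean stronger agreement, and the unweighted averaging are all illustrative assumptions rather than a procedure prescribed by the chapter.

```python
from statistics import mean

# Hypothetical item keys, each tagged with the dimension it is assumed to tap.
ITEM_DIMENSION = {
    "religion_important": "attitudinal",
    "one_true_path": "attitudinal",
    "attend_services": "behavioral",
    "read_literature": "behavioral",
    "higher_power": "spiritual",
    "religion_sacred": "spiritual",
}


def dimension_scores(responses):
    """Average the 1-5 item scores within each dimension."""
    by_dimension = {}
    for item, score in responses.items():
        by_dimension.setdefault(ITEM_DIMENSION[item], []).append(score)
    return {dim: mean(scores) for dim, scores in by_dimension.items()}


def religiosity_score(responses):
    """Composite score: unweighted mean of the dimension means (an assumption)."""
    return mean(list(dimension_scores(responses).values()))


# One hypothetical respondent (assumed coding: 5 = strongly agree).
respondent = {
    "religion_important": 5,
    "one_true_path": 3,
    "attend_services": 2,
    "read_literature": 2,
    "higher_power": 4,
    "religion_sacred": 5,
}

print(dimension_scores(respondent))
print(religiosity_score(respondent))
```

Keeping the per-dimension scores alongside the composite makes it possible to see when a respondent is, say, highly spiritual but behaviorally inactive, which a single aggregate number would hide.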
Of course, religious attitudes, religious behaviors, and spirituality are probably closely associated with one another, so the responses on each question in Figure 4.1 may be similar. However, the responses related to religious behaviors would likely be more closely associated with one another than with responses related to spirituality or attitudes, which is evidence of the essential independence of the composite dimensions.

Figure 4.1 Multidimensional Measures for Religiosity

Response scale: 1 = Strongly Agree; 2 = Agree; 3 = Neutral; 4 = Disagree; 5 = Strongly Disagree

  Religion is important in my life. (attitudinal dimension)                  1  2  3  4  5
  I regularly attend religious services. (behavioral dimension)              1  2  3  4  5
  I feel connected to a higher power. (spiritual dimension)                  1  2  3  4  5
  Religion is the one true path to eternal life. (attitudinal dimension)     1  2  3  4  5
  I frequently read religious literature. (behavioral dimension)             1  2  3  4  5
  I believe that religion is sacred. (spiritual dimension)                   1  2  3  4  5

QUESTION DEVELOPMENT

Certain types of questions produce more accurate data than others (Schwarz 1999). As mentioned above, questions about the unidimensional concepts of age and gender tend to produce highly accurate results. But other questions, especially those measuring behaviors, opinions, and attitudes, may yield less precise responses. This is why survey research questions require careful calibration. Questions about behaviors, opinions, and attitudes must be carefully worded, so researchers can obtain responses that might be difficult for a respondent to communicate and express.

Guidelines for Developing Questions

Aim for Simplicity

Be concise; use simple, clear language; and avoid vague, abstract terms. Recall the lessons proffered in chapter three about respondent burden and fatigue.
When a question is too long or too complex, or it contains abstract language (Converse and Presser 1986), respondents may be unwilling or unable to complete the survey, may stop paying attention to the question, or may become increasingly annoyed with the entire process. Limited attention or annoyance may lead respondents to provide rushed and potentially incorrect responses or to drop out altogether, further affecting the quality of the data you are able to collect. Thus, the best way to collect high-quality data is to keep survey items short, simple, and clear. Use as few words as possible to ask questions that everyone will understand in exactly the same way.

The examples below will lead you through this issue (and, subsequently, others) in question design by comparing some improperly worded sample questions with their improved, more efficient counterparts.

1a. When did you move from Philadelphia to Los Angeles?

This is an unclear question that does not indicate exactly what type of answer is required. For example, an individual could answer "after high school," "when I was 17," "in 1999," or in another way. In contrast, a more clearly articulated question would read,

1b. In what year did you move from Philadelphia to Los Angeles?

2a. Do you favor or oppose the systemic reform of immigration policies that will assist lawmakers with adequately addressing delays in visa processing and the enforcement of contemporary immigration laws?

This question is just too long and too complex. (In fact, it was difficult to even write it without losing interest!) If a question requires a second, third, or fourth read to be completely understood, this is a surefire sign that it is too long or too complex to include in a survey. The following question is much more concise and effective:

2b. Do you favor or oppose immigration reform policies in the United States?

3a. How important are family values to you?
The problem with this question is that "family values" could mean a number of different things to different people. Could family values be related to spending time with one's family? In that case, exactly how much time spent with one's family would constitute "family values"? For some socially conservative individuals, family values may mean opposition to premarital sex, same-sex marriage, and reproductive rights. For many socially liberal individuals, family values might imply accepting same-sex partnership adoptions, embracing nontraditional family forms, and providing financial assistance for underprivileged families. Thus, in many respects, this question does not adequately measure a single construct that is interpreted the same way by all respondents. A better idea is to remove the guesswork by phrasing the question to clarify exactly what you mean:

3b. How much do you enjoy spending quality time with your immediate family?

4a. Do you favor or oppose the use of assisted reproductive technology?

Researchers should avoid using technical language and jargon in their surveys. "Assisted reproductive technology," a term commonly used by fertility specialists, may not be easily understood by all respondents. Although technical language may appeal as a way to sound more professional, it can easily make questions unclear and confusing to respondents unfamiliar with the concept or phrasing. In other words, although the survey may sound less formal, using a conversational tone can sometimes yield higher quality data:

4b. Do you favor or oppose the use of fertility treatments in order to conceive children when an individual is unable to do so through sexual intercourse?

Be Specific

5a. Are you a "social drinker"?

This question is vague, because not all respondents will interpret the term "social drinker" to mean the same thing.
For some people, social drinking could simply refer to enjoying a few drinks with friends once or twice a week. For others, casual drinking with friends might entail multiple drinks across several hours, possibly extended across several nights per week. In the latter case, you might have classed respondents as heavy drinkers rather than agreeing with their own interpretation and identification as social drinkers. A more direct question might ask exactly how many drinks an individual consumes in a given time period:

5b. In the past 30 days, how many alcoholic beverages have you consumed?

Avoid Double-Barreled Questions

There are times when you must ask multiple questions about a topic to obtain the information you desire; this requires the separation of these questions into completely distinct survey items. Failing to do this creates double-barreled questions: "two-in-one" questions that are problematic because a respondent might be willing or able to answer only a single part of the question:

6a. How important are family get-togethers to you and your family?

This two-part question is really asking two questions: How important are family get-togethers for you, and how important are family get-togethers for your family? The researcher's assumption here that these two answers are always in agreement may confuse, discourage, or mislead a respondent and lead to lower quality data. In addition, the latter half of the question asks the respondent to report on someone else's feelings toward something, a task that is virtually impossible to do! In most instances, this type of question can be easily broken out into two separate questions. For this particular example, however, it is better just to eliminate the second, highly speculative component altogether:

6b. How important do you think family get-togethers are?

7a.
Do you favor or oppose the use of fertility treatments in order to conceive children when an individual or couple is unable or unwilling to do so through sexual intercourse?
□ Favor
□ Oppose

This question introduces two additional confounding variables (individual vs. couple, unable vs. unwilling), each of which is problematic on its own. What if a respondent favors treatments for couples but not individuals? Or for people unable to conceive, but not those who are merely unwilling? In combination, this creates the potential for multiple different answers (technically 16, since each of the four scenarios could be favored or opposed independently) that are hardly served by the two response options provided. The researcher needs to decide whether all of these variables are truly of interest and then separate the question into multiple, more precise items:

7b. Do you favor or oppose the use of fertility treatments in order to conceive children when a couple is unable to do so through sexual intercourse?

7c. Do you favor or oppose the use of fertility treatments in order to conceive children when an individual is unable to do so through sexual intercourse?

8a. Do you think that capital punishment is an archaic form of discipline and that the federal government should abolish it?

This is another example of a double-barreled question: it assumes the respondent is going to respond to both parts of the question in the same way, either affirmative or negative across the board. However, if respondents do not feel that capital punishment is an outdated form of punishment but do nevertheless have some other reason to oppose it (or perhaps even vice versa, approving of archaic discipline), they may be conflicted about how to answer, adding burden and frustration. A more appropriate design for a question of this sort is to separate the elements, or to use a skip pattern if you are truly only interested in respondents who agree that capital punishment is archaic:

8b.
Do you think that capital punishment is an archaic form of discipline?

(And then, depending on response:)

8c. Do you think that the federal government should abolish capital punishment?

Misleading Single-Barreled Questions. Certain phrases may appear to be double-barreled (because the word "and" appears) when they are simply using established terminology that refers to a single construct. For example, "Do you own your home free and clear?" is a question that may seem to be asking two questions in one. However, "free and clear" is a term in property law that indicates that property is owned outright (without an outstanding mortgage or lien). It is important to understand and be mindful of these terms when crafting survey questions with the simplest and clearest possible terminology.

Avoid Biased and Leading Questions

Designing questions with accurate, unbiased, and simple phrasing, especially when measuring complex, multidimensional, and intangible concepts, is one of the most difficult parts of designing a survey. A significant problem related to question phrasing in survey research design is that researchers may bias their results by "leading" respondents to answer questions in a specific way. Leading questions often use strong, biased language and/or contain unclear messages that can manipulate or mislead a respondent to answer in a specific way. Even the most ethical researchers can do this unintentionally if they are not attentive enough to this common pitfall. Consider the following questions:

9a. In the wake of the largest economic downturn since the Great Depression, did you support our Republican-led Congress's 2013 decision to shut down the government?

The use of the phrases "largest economic downturn since the Great Depression" and "Republican-led Congress" makes it difficult to disentangle whether the responses you receive are reactions to the strong language or even the unnecessary detail about our current congressional profile.
Instead, consider the following:

9b. Did you support Congress's 2013 decision to shut down the government?

10a. You wouldn't say that you support affirmative action in California, would you?

This question is judgment laden, suggesting a strong disapproval of affirmative action that encourages a respondent to answer "no." Questions like this make the intentions of the researcher too clear and color the resulting data to the point that it is useless. A less biased question might ask this:

10b. Do you agree or disagree with affirmative action in the state of California?

11a. Did our president, Barack Obama, make a mistake when he enacted the Patient Protection and Affordable Care Act, which forces individuals into universal health care?

This question uses emotionally suggestive language such as "make a mistake" and "forces individuals" that may bias an individual toward taking a specific stance. There is an even more subtle bias in this example: With the word "our" before "president," the question may appeal to nationalism and pressure respondents to answer favorably. (To be fair, the word "our" in this example is unlikely to create much confusion; nevertheless, it is certainly possible.) Such a subtle bias shows how survey respondents are influenced in many indirect and complex ways. It is, therefore, important to be vigilant and consistently use the most balanced, inoffensive, and unbiased language possible. Consider this less biased (and much simpler) alternative to our previous example:

11b. Tell us how much you agree or disagree with the following statement: Enacting the Patient Protection and Affordable Care Act benefited Americans.

12a. Do you favor or oppose the use of fertility treatments in order to conceive children when an individual is unable to do so through more traditional means?

Though it is commonly used, the word "traditional" can potentially be a very loaded term, implying that something is deep-rooted and established.
Respondents may be influenced to respond based on their general attitudes toward tradition and change, rather than on careful consideration of the specific issue at hand. For example, people who are resistant to or unnerved by change might be more inclined to oppose fertility treatment simply because it bears the marker of something unconventional or conceptually foreign. A better question avoids introducing this potential bias while also adding greater specificity:

12b. Do you favor or oppose the use of fertility treatments in order to conceive children when an individual is unable to do so through sexual intercourse?

13a. The tragedy in Newtown, Connecticut, has motivated Americans to enter the debate on gun control. What do you believe is the root cause of gun violence in America?

This question appeals to respondents' emotions by referencing the "tragedy in Newtown" before asking a question about gun control. A less biased question will instead get to the point without sensationalism:

13b. What do you believe are the primary causes of gun violence in America? (Please mark all that apply.)

Weighing Bias Against Straightforwardness. Offensive and biased terms are occasionally difficult to avoid, such as when they are part of widely recognized colloquialisms for which no clear alternative is available. For example, a question probing whether or not an individual supports "partial birth abortion" is likely easier to answer than a question asking about "abortion in the instance of intact dilation and extraction."
Although "partial birth abortion" may seem like a biased (and possibly even offensive) term, it may still elicit a more valid response than "intact dilation and extraction," a phrase likely to be unfamiliar to the vast majority of respondents. However, even in such cases, the inclusion of such potentially loaded terms and questions in your work will surely be identified and criticized by readers and peers. Be aware of this potentiality, and carefully weigh all phrasing options so that you can defend your choices later.

Avoid Making Assumptions

Premising a question on a controversial assumption can be just as problematic as the use of biased language described above. Consider the question below:

14a. In your opinion, does the increase in work hours among employed mothers have an influence on the lack of respect children now have for their families?

This question makes two assumptions that might introduce error into the responses. First, the question proclaims that there has been an increase in work hours among employed mothers. The respondent may or may not even agree with this assertion. Second, the question assumes that the respondent agrees that youth have lost respect for their families. Therefore, even if respondents have no opinion on this issue and accept that there has been an increase in work hours among employed mothers, they are still unable to answer accurately, because they do not agree that children have a lack of respect for their families. As is typically the case, simplicity and brevity can reduce the burden on respondents and improve the precision and quality of their responses:

14b. Has there been an increase in work hours among employed mothers recently?

Carefully Ask Personal and Sensitive Questions

Often, researchers need to ask for personal and sensitive information that respondents may be hesitant to provide out of concern for their privacy. Item nonresponse (i.e., skipping questions) is normally higher for these questions (Tourangeau and Yan 2007). Try to reduce the potential sensitivity of such personal questions with careful phrasing and question/response structures that make items quicker and easier to answer. Some common examples of sensitive questions include those relating to income, voting behavior, and the more intimate details of people's private lives.

Sensitive questions may be more successful when possible responses are presented as bracketed categories, in which each response choice encompasses a range of numbers or categories defined in relevance to the question (Tourangeau and Yan 2007). For example, asking respondents to report their number of sexual partners may elicit a greater number of responses when followed by a range of choices like the following: 0, 1, 2, 3-5, 6-10, and 11 or more. Identifying a category may feel less like a revelation of specific, personal information, and may thus encourage a higher response rate at the expense of a little bit of precision. This approach is helpful in another way, since some respondents might not remember their exact number of sexual partners, and the longer they think about it, the more burdensome the question becomes.

Above all, do not force respondents to answer any questions. Respondents who feel they are being coerced into answering a personal question may skip the question at best, or abandon the survey altogether at worst. Simply adding a response option of "decline to state" can make the difference between a skipped item and an abandoned survey, and can preserve respondents' trust in the sensitivity of the researcher.

Social Desirability.
Social desirability refers to the inclination of respondents to overreport socially acceptable behaviors and underreport socially undesirable behaviors (Krumpal 2013). For example, it is not socially desirable to express homophobic or racist attitudes, so respondents may be apprehensive about admitting to them. This tendency is especially marked in face-to-face interviews, where the presence of an interviewer can magnify respondents' perceptions of being judged. In the same vein, respondents will underreport socially undesirable behavior, such as illegal substance abuse; again, this is even more likely in a face-to-face setting. In addition, earlier advice about avoiding bias is especially relevant here, as desirability effects can easily be triggered by loaded words like "illegal" or "abuse" (when referencing substance use; other topics will entail different linguistic sensibilities). Privacy and social desirability are important concerns that should be taken seriously so that respondents feel comfortable answering fully and truthfully.

Emphasize Anonymity and Confidentiality. It is important to emphasize at the outset that survey responses will remain anonymous or confidential and, if necessary, to reiterate this assurance when touching on sensitive subjects.

Choose Delivery Mode Carefully. Response bias and social desirability effects are stronger in face-to-face interview surveys.

Organize Sensitive Questions Strategically. Do not start a survey with personal or sensitive items or questions prone to social desirability. When respondents begin a survey, their level of commitment is usually quite low, and personal questions early on may deter them from continuing. Also, placing these questions at the end of the survey is ill-advised, because this might lead to respondents feeling apprehensive, unpleasant, or offended, as though they are being manipulated.
This may even introduce selection bias into future results when follow-up surveys are necessary and respondents are unwilling to participate. It is best to include personal and sensitive questions in the middle of the survey, so those respondents who have committed time and energy to the survey will still want to complete it.

Additional Guidelines for Question Development

In addition to the general concerns detailed above, there are a number of minor details that, left unaddressed, may add to overall respondent burden and discourage participation and attention to quality answers. Many of these details are presented here, with positive and negative examples to illustrate each one:

• Avoid abbreviations; spell out the entire phrase instead. If the abbreviation is absolutely necessary (which is rare), define it for the respondent.
  Incorrect: How do you feel about the GOP?
  Correct: How do you feel about the Republican Party?

• Avoid slang and contractions.
  Incorrect: How many kids live in your household?
  Correct: How many children live in your household?

• Avoid ambiguous phrases, even those that are common in everyday talk.
  Incorrect: Do you agree that abortion should be illegal most of the time?
  Correct: Do you agree or disagree that abortion should be illegal under X circumstance? Do you agree or disagree that abortion should be illegal under Y circumstance?

• Avoid negatively phrased questions.
  Incorrect: How frequently do you not attend church?
  Correct: How many times in the last month did you attend religious services?

• Avoid double negative questions.
  Incorrect: Should the Supreme Court not have opposed the right of same-sex couples to marry?
  Correct: Do you favor or oppose the Supreme Court's ruling on the Defense of Marriage Act?

• Use a realistic time frame when asking about attitudes and behaviors.
  Incorrect: How many cigarettes have you smoked in your entire life?
  Correct: In the past month, how many cigarettes have you smoked?

• Make sure all questions are absolutely necessary.

One of the most important ways to elicit high-quality responses to survey questions is to have an organized questionnaire with questions that are clearly and impartially articulated. Verifying that questions are relevant and do not contain (potentially inaccurate) assumptions will also safeguard against measurement error. Once you are satisfied that you have included all the relevant questions you need for your research purposes, excluded irrelevant and unnecessary items, and arrived at a question wording that is clear, concise, and unbiased, the next step is to focus on the response options you provide for these questions. It is to this area of survey design that we turn our attention in the sections that follow.

RESPONSES TO QUESTIONS

Types of Responses

There are many options to consider in the way you allow respondents to answer your survey questions. For some purposes you may wish to allow them to speak freely in their own words; other times, it will be more effective and efficient to provide them a range of responses from which to choose. The type of response you solicit will depend on the nature of the concept you wish to measure (and the research question or hypothesis that underlies it) as well as the subsequent analysis you intend to perform with the data collected. The sections below outline the strengths, weaknesses, and special considerations of the various response options, to help you choose the ones best suited to your specific research needs.

Closed-Ended Questions

Closed-ended questions are formatted such that the response possibilities are limited ("closed") to a specific list from which the respondent must choose. They are also referred to as fixed-choice questions. There are several different types of closed-ended questions, and depending on the concept being measured, some are more appropriate than others. Your questionnaire will likely involve a combination of different types and may require separate sets of instructions to ensure respondent comprehension. Nevertheless, questions of different types must flow together seamlessly in order for you to collect first-rate data.

Strengths of Closed-Ended Questions

The strengths of closed-ended questions are mostly on the back end, in the ease they bring to the compilation and analysis of data by the researcher. Strengths include the following:

• Responses are easy to quantify (e.g., 1 = Strongly Disagree; 2 = Disagree; 3 = Neutral; 4 = Agree; 5 = Strongly Agree).
• They are easier to enter into statistical analysis software and analyze.
• Data collection is usually quicker.
• Results are easier to summarize and present in tables, charts, and graphs.
• There is more reliability across responses, especially when only a few collapsed response categories are included.
• There is less interviewer and social desirability bias.
• There is a higher degree of anonymity.

Types of Closed-Ended Questions

Multiple Choice

Most readers will be familiar with the standard multiple-choice format so prevalent in the exams we encounter in our years of schooling. The only difference in survey questionnaires is that, ideally, less guesswork will be involved.

Which of the following is the mode of transportation you most frequently take to work?
1) Automobile (self-driven)
2) Automobile (carpool)
3) Bus
4) Train
5) Other (please specify)

As in all aspects of survey design, it is important to check that all response categories are clear and unambiguous, and that they do not suggest different meanings to different (types of) respondents. For example, questions asking respondents about their occupations often list "education" among the options. However, "education" does not mean the same thing to everyone. "Education" includes students, teachers (at various levels), and administrators. These are all quite different but would be categorized as the same if the response options are ambiguous.

Dichotomous

When there are only two possible response options (e.g., agree/disagree, yes/no, true/false), the question is dichotomous. Technically, this is a type of multiple-choice question, but with only two options, which makes it simple to design. However, keep in mind that questions rarely have only two possible answers, and so responses to dichotomous questions are easy to misinterpret. They also make it even easier for ambivalent respondents, already prone to random guessing, to respond incorrectly or thoughtlessly. Consider whether the item below is well suited for a dichotomous type of response:

My neighbors are an important part of my life.
□ Yes
□ No

Checklist

A checklist is appropriate when you want to allow the respondent to select multiple responses. Note the use of check-boxes (rather than numbers or letters) to identify separate items, to remind respondents that choices are not mutually exclusive. For example:

In the past 30 days, which of the following have caused you a lot of stress? (Check all that apply.)
□ My friends
□ My partner
□ My job/finances
□ My family
□ My children
□ None of the above

Scales

Scales provide a set of response options representing ordered points on a continuum of possible answers.
There are several distinct types of scales suited to different types of applications, as outlined below.

Rating Scale. Rating scale questions are a type of multiple-choice option that uses ordered responses to represent a continuum from which respondents choose the single best answer choice. It is often helpful to include additional explication of the categories in parentheses to ensure all respondents (and researchers) interpret them in the same way. Consider this example:

How often are you late for work?
1) Very frequently (almost daily)
2) Frequently (twice a week)
3) Occasionally (once a week)
4) Seldom (twice a month)
5) Rarely (once every six months)
6) Never

Rank Order Scale. A rank order scale allows respondents to put answer choices in order themselves, according to some criteria expressed in the prompt. Note here that blanks or brackets should be used to differentiate the choices from a rating-type or checklist-type question. Here is an example:

If you had to choose an alternate mode of transportation to work, what would be your order of preference among the following? (1 is first choice, 2 is second choice, etc.)
[ ] Walk/run
[ ] Bicycle
[ ] Skateboard
[ ] Rollerblade
[ ] Taxi cab

Likert Scale. In a Likert scale, participants are asked to indicate their agreement or disagreement with a statement (or number of statements) by scoring their response along a range. Responses typically range from "strongly agree" to "strongly disagree," with each response option on the scale associated with a numerical score. Likert scales are particularly helpful when measuring respondents' attitudes and opinions about particular topics, people, ideas, or experiences. Likert scales should not be used when the responses are not on a scale or when the items are not interrelated; these would, in effect, not even be called scales. Typically, Likert scales will have a midpoint for a neutral response between agree and disagree (see below).
However, researchers sometimes use an even number of possible responses, to the exclusion of a neutral/undecided option. This is called a forced choice question because ambivalent respondents are forced to form an opinion in one direction or the other. The issue of whether or not to include a midpoint is addressed later in this chapter, under "The Problem of the Neutral Point."

Overall, I feel that the current US president is...

                   Strongly Agree   Agree   Neither   Disagree   Strongly Disagree
1  Trustworthy          [1]          [2]      [3]       [4]            [5]
2  Strong               [1]          [2]      [3]       [4]            [5]
3  Capable              [1]          [2]      [3]       [4]            [5]
4  Intelligent          [1]          [2]      [3]       [4]            [5]

Semantic Differential. Semantic differential scales provide contradictory adjectives as endpoints on a Likert-type scale where the respondent can assess a person, idea, or object according to a dimension of special interest to the researcher (Osgood, Suci, and Tannenbaum 1957):

Indicate your attitudes regarding the current president of the United States on the scale below:

Trustworthy   [1] [2] [3] [4] [5] [6] [7]   Untrustworthy
Strong        [1] [2] [3] [4] [5] [6] [7]   Weak
Capable       [1] [2] [3] [4] [5] [6] [7]   Incapable
Intelligent   [1] [2] [3] [4] [5] [6] [7]   Unintelligent
Respectful    [1] [2] [3] [4] [5] [6] [7]   Disrespectful

Important Guidelines for Categorization Scheme Development

When developing response categories for closed-ended/fixed-choice questions, there are three important things to keep in mind: relevance, comprehensiveness, and mutual exclusivity.

Relevant

Just as appropriate questions need to be designed with the population and topic in mind, response scales and categories must be designed with the same considerations. Researchers need to be familiar with the most relevant or common types of answers given in order to choose relevant response options.
Response options can be based on common knowledge or can be identified through research or by asking individuals involved with or particularly knowledgeable about a specific topic. Appropriate response categories vary depending on culture, time, and even geographic region of the country. The goal is to make sure that all of the most important (and expected) possible response categories are listed, so respondents will not need to struggle to align their own ideas with the options available to them.

Comprehensive

On a distinct but related note, response options must reflect a comprehensive list of the possible options, exhausting all categories a respondent may wish to select. The best way to ensure a comprehensive category set is to include an "other" category at the end of a list, followed by the instructions "please specify" and a blank line for respondents to write on. This is also useful for future survey designs: If a large number of similar open-ended responses are noted, a researcher can create a new category option for it. Unless the researcher is 100% confident that all categories are covered, an "other" category is necessary. As sociologists, we know that gender is composed of more than the typical two categories (e.g., male/female); therefore including an "other" response option will improve participant responses. A lack of comprehensive options is both common and very frustrating for individuals excluded by the choices (especially with regard to overlooked racial and ethnic categories). The examples below illustrate the difference between a noncomprehensive list and a comprehensive one:

What is your marital status?
1) Single
2) Married

What is your marital status?
1) Single
2) Married
3) Divorced
4) Widowed
5) Separated
6) Never Married

Mutual Exclusivity

Having mutually exclusive categories means that the response options do not conflict or overlap with each other in any way. In other words, respondents must perceive that only one of the available responses fits the answer they imagine (unless, of course, the item includes a checklist of responses). If not, respondents have the burden of deciding which is closest or most appropriate, and how they choose to do so may vary from survey to survey, lowering the precision and overall quality of your data. Note that this is the opposite of the previously discussed problem, that of a lack of comprehensive categories: In this case, there are too many possible choices instead of not enough. For example, consider the following question:

How long have you been seeing your primary care physician?
1) 1-2 Years
2) 3-5 Years
3) 6-10 Years
4) 11 or more years

How would an individual respond who has been seeing a primary care physician for 2.5 years? Do not assume everyone will round up; people who are rarely sick or do not frequently see a doctor may be inclined to round down. You also should not assume everyone will round down; people who like their doctor and frequent the doctor's office may round up. It is important to exercise caution when creating numerical categories where an individual might land in between two possibilities. Sometimes this requires instruction to round up or round down. Other times, it may be a better idea to simply add specificity to your presented options.

In recent years, it has become common to use an "infinite cap" on response options based on the last category listed. In order to avoid confusion, do not list categories using a plus sign (e.g., 1-2 times, 3-5 times, 6-10 times, 10+ times).
An individual may not realize that "10+" implies "anything more than 10" and may interpret "10 times" as being included in this categorization. A better response option will spell out "11 or more" to avoid this confusion.

The Problem of the Neutral Point

As noted at the beginning of this chapter, the testing of a hypothesis or answering of a research question requires survey researchers to measure concepts as precisely as possible. Therefore, it is necessary to decide whether or not a midpoint or neutral point in a scale is helpful for your specific research purposes. There is no definitive evidence whether or not the midpoint is valuable; the best answer is that it depends on the respondent and the type of question being asked (Kulas and Stachowski 2013; Krosnick 2002).

When researchers argue that we should not include a midpoint, they are basically claiming that respondents should always be forced to take a stance on a given topic. They also suggest that neutral points are potentially meaningless and offer no insight on an individual's real opinions or attitudes. However, the unavoidable reality is that people sometimes have neutral feelings. There are several scenarios when this is conceivable, such as when a respondent

• Lacks interest in a topic.
• Has limited recall regarding the event(s) in question.
• Is legitimately undecided on an issue.
• Lacks knowledge about a topic.
• Lacks experience relating to a topic.
• Finds the question too personal to answer.

In such cases, respondents may randomly guess, skip the question entirely, or abandon the survey altogether. (Recall from the previous section on sensitive questions that respondents may be hesitant to complete a survey when they feel forced to answer a question.)
Therefore, given the topic, question, and types of respondents participating, a researcher should choose carefully whether or not a neutral point is advantageous for the study, that is, whether the value of "forcing their hand" outweighs the risks of introducing error or additional respondent burden to the survey experience. In addition to these general concerns, there are some common, more specific sources of error, as detailed in the next section.

Additional Guidelines for Response Options

• Use an equal number of positive and negative responses for scale questions. To understand why this is important, consider the question below, where 75% of the answers indicate some form of close relationship, and there is only one possible option for something other than close:

Incorrect: How emotionally close would you say you are to your mother?
1) Extremely close
2) Somewhat close
3) Close enough
4) Not close at all

Correct: How emotionally close would you say you are to your mother?
1) Very close
2) Somewhat close
3) Somewhat withdrawn
4) Very withdrawn

• Use a consistent rating scale throughout the entire survey. Do not mix questions with a scale where 1 = strongly agree with other questions with a scale where 1 = strongly disagree.
• Similarly, do not mix scale ratings within the same survey. Choose the standard number of points in your Likert scales (e.g., three, five, seven) and be consistent to avoid confusion.
• Limit the number of points on scales. Any more than nine points will hinder the respondent's ability to discriminate between points. Scales with five or seven points are common and reliable.
• Neutral is not the same as "no opinion." "Neutral" might actually be an opinion. Individuals may feel "neutral" about the hours they work per week, but this is not the same as having no opinion on the matter.
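Two of the guidelines above (balanced response options and a consistent scale direction) lend themselves to a quick mechanical check. The sketch below is only an illustration, not a procedure from this chapter: the function names are hypothetical, and the +1/-1 valence assigned to each response option is our own reading of the "emotionally close" example.

```python
def is_balanced(option_valences: list[int]) -> bool:
    """Return True when a scale has as many positive as negative options.

    Each option is tagged +1 (positive), -1 (negative), or 0 (neutral).
    The tags are a judgment call made by the researcher.
    """
    positives = sum(1 for v in option_valences if v > 0)
    negatives = sum(1 for v in option_valences if v < 0)
    return positives == negatives


def reverse_code(score: int, n_points: int) -> int:
    """Flip a score on a 1..n_points scale so that 1 becomes n_points."""
    if not 1 <= score <= n_points:
        raise ValueError(f"score {score} is outside 1..{n_points}")
    return n_points + 1 - score


# Assumed valences for the two versions of the question above.
incorrect_scale = [+1, +1, +1, -1]  # three "close" options, one "not close"
correct_scale = [+1, +1, -1, -1]    # two "close" options, two "withdrawn"

print(is_balanced(incorrect_scale))  # False
print(is_balanced(correct_scale))    # True

# If two items were fielded with opposite directions, one can be flipped
# before analysis so that 1 means the same thing on both.
print(reverse_code(1, n_points=5))   # 5
```

Reverse coding repairs the data after the fact; it does not undo the confusion a respondent may have experienced while answering, so the advice to keep one direction throughout the survey still applies.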
The above guidelines should sensitize you to some of the advantages, drawbacks, and special considerations surrounding the various types of closed-ended questions. In the next section, we turn to a very different type of survey item, the open-ended question, and elaborate a similar discussion of strengths, weaknesses, and other issues.

LEVEL OF MEASUREMENT

The end result of collecting survey data is to quantify our concepts into variables and then analyze them statistically to test our hypotheses (Nunnally and Bernstein 1994). The types of statistical analyses that are possible depend entirely on the levels at which variables are measured. To allow for our chosen type of statistical analysis, we must know, and even plan ahead for, the level of measurement of our variables. Level of measurement can be defined as the mathematical property of variables. As the precision of measurement increases, more mathematical options or tools become available with which to analyze the variables statistically. This ultimately suggests that the process of measurement is an ongoing iterative process that needs to consider both theoretical and empirical needs of the research (Carmines and Zeller 1979).

An important reason for choosing a particular set of response options for each survey question is that the response options dictate the level of measurement of the variables we will use to answer our research question. We are literally assigning numbers to attributes of the variables. Note that in the examples of survey questions presented above, all the response categories were tied to a number. This is measurement in a nutshell and the end product of all our hard work developing questions and assessing validity and reliability.

We classify variables with four levels of measurement that we will present in order from the lowest level of precision to the highest level of precision: nominal, ordinal, interval, and ratio.
Again, precision is the degree of specificity of the numbers assigned to specific response options. Nunnally and Bernstein argue that measurement is really about "how much of the attribute is present in an object" (1994: 3). Therefore, as precision increases, we are better able to gauge how much of the attribute we have observed. Another way of thinking about it is that lower levels of precision mean there are fewer mathematical operations we can use with the variables because it's harder to estimate the quantity we have, while greater precision increases the mathematical options we have.

Nominal measures have the least precision. Nominally measured variables are actually qualitatively measured variables rather than numerically measured variables. That is, we cannot evaluate how much of an attribute is present; rather, we can only know whether the attribute is or is not present. Nominal variables can be distinguished only by their names. Favorite type of fruit is an example of a nominal variable. Response options include apples, oranges, grapes, bananas, et cetera. We can assign a number to apples (1), oranges (2), and bananas (3), but the number is essentially arbitrary. We cannot say, for example, that there is more favorableness associated with bananas than is associated with apples based on the number assigned. Any number used with a nominal variable is used as a label only (Nunnally 1967). Apples, oranges, and bananas are qualitatively distinct, and all we can do is examine whether or not participants chose apples (in the set of apples) or did not choose apples. A few other examples of nominal variables are gender (male, female, other), race/ethnicity (white, African American, Asian, Native American, Hispanic, multiracial), and participation in food stamp programs (yes, no).

It is possible to measure nonnominal variables nominally. For instance, let's say a research team is interested in the food security of Americans who are living below the poverty line.
They use income as their primary variable of interest, because it defines the US poverty line. However, the researchers may decide to measure income, a variable that is easily and precisely quantifiable, simply as "yes" for below the poverty line or "no" for not below the poverty line. This limits what the researchers can do with income. They cannot tease out how variations in income levels among those below the poverty line may affect food security, because that information was not gathered; the nominal variable used lacks the precision necessary to conduct that analysis.

Ordinal measurement quantifies variables by ordering the response categories from least to most or most to least (Nunnally and Bernstein 1994). The following question is measured ordinally, because the three response categories are rank ordered from least problematic to most problematic.

Upon moving to your new home, please tell me to what extent your commute to work may be a problem.
1) Not a problem
2) Somewhat of a problem
3) A big problem

The numbers assigned to the response categories are rank ordered, with 1 being one unit less than 2, which is one unit less than 3. There is the same distance numerically between each category. Yet the difference or distance between "not a problem" and "somewhat of a problem" may be bigger or smaller than the distance between "somewhat of a problem" and "a big problem." In terms of mathematical computation, this is limiting. All we really know is that 1 is less than 2, but we cannot calculate the actual distance between the ordered categories. This means we cannot add or subtract across the scores, nor can we multiply or divide the values of this variable. Most Likert scale variables, such as opinion or attitude scales with five category response options, are ordinal, going from least amount of agreement to most amount of agreement without meaningful numbers to determine the distance between the categories.
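The practical consequences of nominal and ordinal measurement described above can be sketched in a few lines of Python. Everything in this example is invented for illustration: the commute responses, the incomes, and especially the poverty threshold, which is a placeholder and not an official figure.

```python
from statistics import median_low, mode

# Ordinal variable: the commute question, coded 1..3 as in the text.
COMMUTE_LABELS = {
    1: "Not a problem",
    2: "Somewhat of a problem",
    3: "A big problem",
}
commute_codes = [1, 2, 2, 3, 2, 1, 3, 2]  # made-up responses

# Order-based summaries respect what an ordinal scale can tell us.
# median_low always returns a value that actually occurs in the data,
# so the result can be mapped back to a category label.
typical_code = median_low(commute_codes)
most_common_code = mode(commute_codes)
print(COMMUTE_LABELS[typical_code])      # Somewhat of a problem
print(COMMUTE_LABELS[most_common_code])  # Somewhat of a problem

# Measuring a precise variable nominally: income collapsed to yes/no.
HYPOTHETICAL_POVERTY_LINE = 15_000  # placeholder value, not official


def below_poverty_line(income: float) -> str:
    """Collapse a dollar amount into a yes/no nominal category."""
    return "yes" if income < HYPOTHETICAL_POVERTY_LINE else "no"


incomes = [4_000, 14_000, 40_000]  # made-up incomes
flags = [below_poverty_line(x) for x in incomes]
print(flags)  # ['yes', 'yes', 'no']
```

The first two respondents differ by $10,000 in income, yet both are recorded simply as "yes." If only the yes/no answer is collected, that variation cannot be recovered later, which is the limitation the food security example describes.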
An interval level of measurement is one that rank orders response categories and, additionally, provides known distances between each response category without providing information about the absolute magnitude of the trait or characteristic (Nunnally and Bernstein 1994). Conceptually, this is very difficult to understand without an example. Temperature is an interval level of measurement. In the United States, temperature is measured on the Fahrenheit scale. On this scale, 32 degrees is considered freezing. In other parts of the world, temperature is measured on the Celsius scale where 0 degrees is considered freezing. Therefore, 32 degrees Fahrenheit and 0 degrees Celsius are the same value on different scales. The zero in Celsius is meaningless, though the "freezing" designation is meaningful. The difference between 0 degrees and 1 degree Celsius is clearly understood as a 1-degree change, as is the difference between 32 degrees and 33 degrees Fahrenheit. But 33 degrees Fahrenheit is not equivalent to 1 degree Celsius. When the absolute magnitude of the trait is not specified, we do not have a zero score that is consistently anchored to a meaning. Mathematically, then, we can add and subtract values with interval measured variables, but we cannot multiply or divide them. Another example of an interval measured variable is intelligence quotient (IQ), which is measured on a scale created from many questions and is standardized to have a mean of 100. The zero point therefore has no true meaning, and we cannot say that someone with an IQ of 100 is twice as smart as someone with an IQ of 50.

Ratio measurement is the most mathematically precise of the levels of measurement. Variables that are measured on the ratio scale have response categories that can be rank ordered, the distances between the response categories are known, and there is a true zero value that is meaningfully anchored (Nunnally and Bernstein 1994).
Income is an excellent example of a ratio measured variable. The question "How much did you earn in wages and salary in the previous year?" followed by a line for participants to write in a value, will create a ratio measured version of income. A participant who earns $30,000 per year makes half as much as someone who earns $60,000 per year. If a person reports $0, then that person had no earnings or wages in the previous year. The absolute magnitude of the trait is known. Thus, we can add and subtract, multiply and divide, rank order, or create nominal variables out of income if it is measured on a ratio scale. Other examples of ratio measures are age, height, weight, and years of education.

When devising questions for a survey, think carefully about the best configuration of response categories. Measure them as precisely as possible to allow for more statistical analysis options later in the research process. Look to previous published literature to determine how the field is measuring the variable to see if there are known issues with measurement. For example, how well can people articulate their previous year's earnings and wages? If this is too hard, participants might guess, estimate, or skip the question entirely. If this is what the literature says, use a less precise level of measurement.

OPEN-ENDED QUESTIONS

Open-ended questions are survey items formatted to allow respondents to answer questions or provide feedback in their own words. In contrast to closed-ended questions, there are no limitations to the response possibilities, and respondents are encouraged to provide in-depth answers. A quick caveat might be raised before proceeding: Open-ended questions are not necessarily always questions per se; they may include any type of prompt (question or not) to elicit an original and self-guided response from the respondent. Consider the following example:

Closed-Ended Question: How emotionally close do you feel to your mother? (Add response options.)

Open-Ended Question: Tell me how you feel about your mother.

Open-ended questions are sometimes useful when they follow a closed-ended question. This configuration ensures that researchers can learn about some specific aspect of the issue they find relevant, but it also opens the floor to detailed elaboration, or even novel issues, that the respondent finds interesting or important as well. Consider the following question posed by a researcher studying public opinion of the US Supreme Court:

1. Please rank your support for the current US Supreme Court:
1) Strongly support
2) Somewhat support
3) Somewhat oppose
4) Strongly oppose