INTERPRETING QUANTITATIVE DATA WITH SPSS

to back with horizontal bars pointing in opposite directions, where each bar represents a five-year span. This type of histogram is called a population pyramid. In a population pyramid, the last class is left open. Usually it is the '80 years and more' class, as shown in Figure 3.10.

FREQUENCY POLYGONS AND DENSITY CURVES

If we join all the midpoints at the top of the columns in a histogram, we get what is called a frequency polygon. The polygon shows the general pattern of the distribution. Imagine now a frequency polygon drawn on a histogram with a very large number of columns. We could redraw it as a smooth curve, called a density curve (Figure 3.11). A density curve is drawn in such a way that the total area under it is equal to 1. The area under the curve between any two values then gives the exact proportion of data that falls between these two values.

Figure 3.11 A density curve can be thought of as the curve resulting from joining the midpoints at the top of the various bars of a histogram with a large number of classes

We can now be more specific about the definition of the mode for a quantitative variable. If the variable is represented by a histogram, the mode is the class with the highest frequency. If it is represented by a density curve, the mode is the x-value that corresponds to the highest point on the density curve.

HISTOGRAM OR BAR CHART?

When we have a quantitative variable that has been grouped into a small number of categories, we can represent it either by a histogram or by a bar chart. But which of the two representations is better? It depends on what we want to convey. To explain this point, consider a situation where we have the variable Age represented by seven categories as shown in Figure 3.4. If we want to convey how the ages of the sample studied are distributed over the whole range of ages, the histogram shown in Figure 3.9 is better.
But if we want to show how the various age groups are divided among men and women, or among married vs. unmarried individuals, a clustered bar chart allows us to do that, as shown in Figure 3.12. A histogram would not permit us to juxtapose corresponding categories of age groups for men and women. In Figure 3.12, we see the distribution separately for men and women, and we can determine that women are more represented in the older categories, as they tend to live longer than men. Notice that the vertical axis represents the percentages, not the frequencies. If we made it represent the frequencies instead, we would still get the same general shape, but we would not be able to determine whether men or women are more represented in a given class, as the overall number of women in this sample is greater than the overall number of men. In almost every age category, we would therefore find more women, not because a higher percentage of women (as opposed to men) fall into that category, but because there are more women in the sample as a whole.

UNIVARIATE DESCRIPTIVE STATISTICS

Figure 3.12 A clustered bar chart allows us to show the pattern of ages separately for men and for women. It is appropriate for a quantitative variable grouped into a small number of categories (legend: Respondent's Sex, Male and Female; horizontal axis: Age into 7 categories, 18-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81+)

Box plots

Box plots are very useful to show how the values of a quantitative variable are distributed. The box plot indicates the minimum and maximum values, and the three quartiles. The central 50% of the data (the 2nd and 3rd quarters) are represented as a shaded solid box, whereas the first and last quarters are represented by thin lines. The box plot automatically gives the five-number summary of the data: the minimum, the 1st quartile, the median (which is the 2nd quartile), the 3rd quartile, and the maximum. In symbols, the five-number summary is given by: Min, Q1, Median, Q3, Max.
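The five-number summary described above can be sketched in a few lines of Python. This is only an illustration, not the procedure SPSS follows: the function name `five_number_summary` and the list of ages are hypothetical, and the "inclusive" quartile convention is an assumption (several conventions exist, and they can give slightly different quartiles).

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # The "inclusive" method is one of several quartile conventions; a
    # statistical package may use another one and report slightly
    # different values for Q1 and Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical ages, used only to illustrate the calculation.
ages = [18, 21, 22, 25, 30, 34, 41, 47, 60]
minimum, q1, median, q3, maximum = five_number_summary(ages)
print(minimum, q1, median, q3, maximum)
```

The box of a box plot would span Q1 to Q3, with the median marked inside it, and the thin lines would extend toward the minimum and the maximum.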
The box plot is shown in Figure 3.13.

Figure 3.17 Symmetric and skewed distributions (panels: symmetric distribution, positively skewed distribution, negatively skewed distribution)

How can we know that a distribution is skewed? The first indication is the histogram: the tail end of the histogram is longer on one side than on the other. We can also see that a distribution is skewed through its numerical features: the mean is different from the median. When the distribution is positively skewed, the mean is larger than the median, as it is pushed by the extreme values toward the longer tail. For negatively skewed distributions, the mean is smaller than the median. Therefore, a mean larger than the median tells us that the extreme values on the higher end of the distribution are much larger than the bulk of the data in the distribution, pulling the mean toward the positive side. This is illustrated by the numerical example given in the section on the median, where one extreme value (60) pulls the mean up but does not affect the median. Therefore, when a distribution is highly skewed, the median is usually a better representative of the center of the data than the mean.

Kurtosis

This is a measure of the degree of peakedness of the curve. It tells you whether the curve representing the distribution tends to be very peaked, with a high proportion of data entries clustered near the center, or rather flat, with data spread out over a wide range. A normal distribution has a kurtosis equal to 0. A positive value indicates that the data is clustered around the center, and that the curve is highly peaked. A negative value indicates that the data is spread out, and that the curve is flatter than a normal curve. Figure 3.18 shows three curves with zero, positive, and negative kurtosis respectively.
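The two ideas above, comparing the mean with the median and measuring peakedness, can be sketched as follows. The function names and the data are hypothetical, and the kurtosis shown is the simple moment-based "excess" kurtosis (fourth central moment divided by the squared variance, minus 3, so that a normal curve scores 0). This is an assumption made for illustration: statistical packages usually apply a small-sample correction, so their values will differ somewhat.

```python
import statistics


def skew_direction(values):
    """Apply the rule of thumb from the text: compare the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "none indicated"


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (no sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Hypothetical data: one extreme value (60) pulls the mean above the median.
with_extreme_value = [2, 3, 4, 5, 60]
print(skew_direction(with_extreme_value))

# Hypothetical data: evenly spread values give a flat shape (negative value),
# values piled up at the center give a peaked shape (positive value).
print(excess_kurtosis([1, 2, 3, 4, 5]))
print(excess_kurtosis([0, 5, 5, 5, 5, 5, 10]))
```

The mean and median comparison is only an indication; the histogram remains the first thing to look at.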
Methodological Issues

Although they seem to be simple, descriptive measures can be tricky to use. We would like to point out here some of the pitfalls and difficulties associated with their use.

Figure 3.18 Illustration of zero, positive and negative kurtosis (panels: a curve with kurtosis = 0, a curve with positive kurtosis, a curve with negative kurtosis)

The Definition of the Categories over which the Counting is Done

Suppose I say that the passing rate in a given class is 82%. In another college, a colleague tells me that his passing rate is 95%. Before concluding that his passing rate is much higher, I have to make sure that we are defining the passing rate in the same way. I may define the passing rate as the number of students who pass a course compared to those who were registered at the beginning of the semester. If he defines it the same way, we can make meaningful comparisons. But if he defines it as the number of students who pass the course compared to the number registered at the end of the semester, we cannot make a meaningful comparison. This is so because all the students who dropped out would not be taken into account in his calculation, whereas they would be taken into account in mine. A careful definition of the categories used to define a concept is therefore important. Such problems arise when we define the unemployment rate in various countries, or even wealth. The conclusion is that careful attention should be given to the way categories are defined when comparing the statistics that refer to different populations.

Outliers

Outliers are values that are unusually large or unusually small in a distribution. They have to be examined carefully to determine if they are the result of an error of measurement, or a typing error, or whether they actually represent an extreme case.
For instance, the value 69 in the column of the variable age for college students could be a typing error, but it could also represent the interesting case of a retired person who decided to pursue a college program. Even if they represent an extreme case, it may be desirable to disregard extreme values in some of the statistical computations. When producing a Box Plot diagram, SPSS excludes the outliers from the computation, and prints them above or below the box plot. An option allows users to have the case number printed next to the dot representing the outlier, so as to be able to identify the case and examine it more closely.

Summary

We have seen in this chapter the various measures used to summarize the data pertaining to a single variable as well as the various types of charts that could be used to illustrate the distribution. You should keep in mind one fundamental point: the level of measurement used for the variable determines which measures and graphs are appropriate. It does not make sense, for example, to compute the mean of the variable when the level of measurement is nominal, that is, when the variable is qualitative.

There are three types of univariate descriptive measures:
• measures of central tendency,
• measures of dispersion, and
• measures of position.

Measures of central tendency, also called measures of the center, tell us the values around which most of the data is found. They give us an order of magnitude of the data, allowing comparisons across populations and subgroups within a population. They include the mean, the median, and the mode. The mean should not be used when the variable is qualitative.

Measures of dispersion are an indication of how spread out the data is. They are mostly used for quantitative data. The most important ones are the range, the interquartile range, the variance, and the standard deviation.
Measures of position tell us how one particular data entry is situated in comparison to the others. The percentile rank is one such measure. Other measures include the quartiles and the deciles.

In addition to these measures, we have seen the weighted mean. When calculating it, the various entries are multiplied by a weight, which is a positive number between 0 and 1. All the weights add up to 1. The weighted mean is used when the numbers that are averaged have been calculated over populations of unequal size. For instance, if you have the birth rates in all Canadian provinces and you want to find the average birth rate for Canada as a whole, you must weight these numbers by the demographic importance of every province. The weighted mean is also used when you want to increase or decrease the relative importance of the numbers you are averaging, as is done when finding the average grade over exams that do not count for the same percentage in the final grade.

When categories are involved (either because the variable is qualitative, or when quantitative values have been grouped) we can find ratios, percentages, and proportions of the groups corresponding to the categories.

The general shape of a distribution is analyzed in terms of symmetry or skewness, and in terms of kurtosis (the degree to which the curve is peaked).

The comparison of the mean and the median is very useful. Recall the following: If the distribution is very skewed, the median is a better representative of the center of the data, as the extreme values tend to pull the mean towards one side of the curve. The median is not affected by extreme values. If the mean is larger than the median, the distribution is positively skewed. If the mean is smaller than the median, the distribution is negatively skewed.
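The weighted mean recalled in this summary can be sketched as follows. The function names, the exam grades, the rates and the population sizes are all hypothetical and serve only to illustrate the two uses mentioned: weights chosen to reflect relative importance, and weights derived from populations of unequal size.

```python
def weighted_mean(values, weights):
    """Multiply each entry by its weight and add up the results.

    The weights must lie between 0 and 1 and add up to 1, as in the text.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 or w > 1 for w in weights):
        raise ValueError("each weight must be between 0 and 1")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the weights must add up to 1")
    return sum(v * w for v, w in zip(values, weights))


def weights_from_sizes(sizes):
    """Turn population sizes into weights that add up to 1."""
    total = sum(sizes)
    return [size / total for size in sizes]


# Hypothetical exams counting for 20%, 30% and 50% of the final grade.
final_grade = weighted_mean([70, 80, 90], [0.2, 0.3, 0.5])
print(final_grade)

# Hypothetical rates observed in two regions of unequal population size.
rates = [10.0, 20.0]
sizes = [3_000_000, 1_000_000]
overall_rate = weighted_mean(rates, weights_from_sizes(sizes))
print(overall_rate)
```

In the second example the simple average of the two rates would be 15, but the larger region counts three times as much, which pulls the weighted result toward its rate.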
As for the graphical representation of a distribution, recall again that the level of measurement of the variable determines what kind of chart is appropriate. Bar charts and pie charts are appropriate when the data is qualitative, or measured at the nominal or ordinal levels. Quantitative data (whether measured at the ordinal or numerical scale levels) could also be represented by bar charts or pie charts if the values have been grouped into a small number of categories. The essential difference between pie charts and bar charts is that in the former, the emphasis is on the relative importance of each category as compared to the other categories, whereas in the latter, the emphasis is on the size of each category. However, there is no clear-cut distinction between the two, and if one is appropriate, the other is usually appropriate also, even if the emphasis is slightly different. The great advantage of bar charts is that they allow comparisons between the distributions of subgroups, with the help of clustered bar charts.

Quantitative variables are better represented through histograms. A specific type of histogram is the population pyramid, which is a standard tool in demography.

Line charts are most suited to represent the variation of a quantity across time.

In all kinds of charts, truncating the Y-axis is sometimes done to zoom in on the variations of the variables and to represent them in a more detailed way. However, we should be aware of the fact that truncating the Y-axis may also convey a mistaken impression that the variations of the variable are more important than they are in reality.
Keywords

Univariate, Bivariate, Measures of central tendency, Measures of dispersion, Measures of position, Mean, Trimmed mean, Weighted mean, Median, Frequencies, Cumulative frequencies, Valid percent, Range, Trimmed range, Interquartile range, Deviation from the mean, Standard deviation, Variation ratio, Ratios, Proportions, Bar graph, Clustered bar graph, Pie chart, Histogram, Frequency polygon, Line chart, Box plot, Mode, Coefficient of variation, Five-number summary, Modal category, Quartiles, Symmetry, Majority, Deciles, Skewness, Plurality, Percentiles, Kurtosis, Percentile rank, Outliers

Suggestions for Further Reading

Devore, Jay and Peck, Roxy (1997) Statistics, the Exploration and Analysis of Data (3rd edn). Belmont, Albany: Duxbury Press.
Harnett, Donald L. and Murphy, James L. (1993) Statistical Analysis for Business and Economics. Don Mills, Ontario: Addison-Wesley Publishers.
Trudel, Robert and Antonius, Rachad (1991) Méthodes quantitatives appliquées aux sciences humaines. Montréal: CEC.
Wonnacott, Thomas H. and Wonnacott, Ronald J. (1977) Introductory Statistics (3rd edn). New York: John Wiley and Sons.

EXERCISES

3.1 Complete the following sentences:
(a) Three types of measures are useful to summarize a numerical distribution. They are __________, __________ and __________.
(b) The most frequent value in a distribution is called __________.
(c) When the values of the distribution are grouped into classes, the mode is the __________ with the highest frequency.
(d) When there are two classes that are bigger than the ones immediately next to them, the distribution is called __________.
(e) If the modal class includes more than 50% of the population, we say that it constitutes the __________. Otherwise, we simply talk of a __________.
(f) The median falls __________ of the ordered list of entries. ______% of the data are less than or equal to the median, and ______% are larger than or equal to it.
(g) The mean of a numerical distribution is equal to the __________ of all entries divided by __________.
(h) The mathematical measure used to find the mean when the entries do not have the same relative importance is called __________.

WRITING A DESCRIPTIVE SUMMARY

The purpose of this chapter is to explain how to proceed in order to write a good descriptive report, and how to analyze a frequency table beyond a first-level reading of the percentages, in order to identify the numerical features of the data and to highlight them.

After studying this chapter, the student should know:
• how to proceed when writing a descriptive report to summarize data;
• which measures and charts are appropriate, depending on the measurement level of the variable;
• how to summarize a set of variables that measure a given concept;
• how to analyze a frequency table in detail and identify its important features;
• the difference between a first-level description and an analytical description;
• the criteria for a good descriptive summary.

In Chapter 3, we have seen how to produce simple descriptive statistical measures, as well as simple tables and graphs. We have also seen that the statistical measures to be used depend on the level of measurement of the variable. Now, we would like to see how we can integrate all these elements and produce a synthetic report that describes certain features of a population. For the time being, we will restrict these explanations to univariate descriptions of variables. Later on, you will have to include bivariate descriptions, that is, descriptions of the statistical associations between variables, as well as confidence statements, that is, generalizations from the observed sample to the population as a whole, two statistical topics studied later on in this book. We will also learn how to report the result of a hypothesis test.

How to Write a Descriptive Report

We will consider two types of report.
Basic reports consist in a direct reading of the tables produced by SPSS, and a reformulation in direct, plain language of what the tables say, with accompanying charts as illustrations. There is very little interpretation in this case. A second level in sophistication consists in writing analytical reports: such reports would highlight the outstanding tendencies that can be seen in the data, and may include a greater degree of interpretation. We will now explore both kinds of reports.

Basic, Direct Reports

Suppose you want to describe the educational level of the individuals included in the GSS93 subset data file supplied with the SPSS package. This means that you would like to have some global description that tells you whether the people in your sample tend to have a high level of education or not (this is a description of the central tendency), and whether there is a big polarization, with some people having a lot of education and many others very little (this is a description of the dispersion).

The first thing to do is to see which variables concern education. You will find three such variables in the GSS93 subset data file. List them, and list the level of measurement of each. In this data file, you will find that the three variables are:
• Highest year of schooling completed (scale),
• Highest degree obtained (ordinal, 5 categories), and
• Possession or not of a college degree (ordinal, 2 categories).

Determine what kind of descriptive measures you would use for each. Would you use a frequency table? For which of the variables? Which charts would be more appropriate? Sometimes you will feel that you are not too sure which type of chart is appropriate. Get SPSS to produce several charts, examine them carefully to see which ones convey a better representation of the distribution of the variable, then select one of them, and paste it into your report.
One of the important pitfalls that you should avoid is to give a lot of tables or charts that are not very useful. You may want to be selective here: select the relevant information, and try to write it in a clear and concise way. For example, SPSS produces tables giving you the number of valid answers. You do not need to include the table itself. You could simply write in brackets (n = 1500) when describing the sample, to indicate that your sample contains 1500 individuals. Whenever you discuss or describe the results that relate to one of the variables, if you see that there are a lot of missing answers, add a phrase about the number of valid answers, such as (valid n = ...) and fill in the number of valid answers. Although the number of people in the sample is the same throughout the analysis of this data file (n = 1500), the number of valid answers varies a lot. This is why you have to specify how many valid answers you have to a particular question. You do not have to do that for every single question: you report the number of valid answers only when there is a lot of missing data, and the valid percentages differ by several points from the total percentages. It is advisable in this case to report the valid percentages. In some cases it may be relevant to report both the valid and total percentages.

What follows is a set of criteria that define a good descriptive report.

Criteria for a Good Report

THE GENERAL PRESENTATION

Make sure the text is clear, well organized, and concise. If the analysis is long, a cover page may be desirable. Make sure that all the relevant information is in it: a title, your name, the name of the course and the course number, the name of the instructor to whom you are presenting it, and the date. Some of this information, such as your name and the assignment number, could be written in the header of your document (refer to Lab 2 for explanations on the header).
The tables and graphs must be printed with the correct identification: a title must be given to every table or graph. If you copy the tables from SPSS with the Copy... command (rather than the Copy Object... command), you can edit the table, and delete the rows or columns that are not useful or relevant. Also avoid grammatical mistakes: a spell check may be useful, but always rely on a careful reading of your report. Include in your report a description of the data file you are using: its source, the year the survey was conducted, the kind of variables that are found in it, the institution under which it was conducted, etc.

DESCRIPTION OF THE VARIABLES UNDER STUDY

Make sure to include in your study all the variables that are relevant for your subject. If there are several variables that address a given topic, use them all to analyze this topic. For instance, 'education' can be measured in several ways. If there are several variables that deal with education, examine the distribution of each.

To describe a variable properly, you must select the appropriate measures. Do not compute the mean of a qualitative variable, because it is meaningless. You may want to use some of the recoded variables, or recode some variables yourself. Do not include a table of frequencies if the variable is quantitative. Such tables are usually quite long, and they are not useful to the reader. If the quantitative variable has been grouped into a small number of categories, a frequency table may be useful, in addition to the descriptive measures used for quantitative variables.

Finally, formulate your conclusions in full, grammatically correct sentences that highlight the meaning of your numerical results. An example of a very concise description of the educational level of the people in our sample is given in Insert 4.1.

The appropriate measures to be used are summarized in Table 4.1.
Table 4.1 Appropriate descriptive measures for the various levels of measurement

Level of measurement: Nominal (categories)
  Appropriate statistical measures: Frequencies, percentages, mode. Ratios, proportions and rates.
  Appropriate charts: Bar charts, pie charts.

Level of measurement: Ordinal
  Appropriate statistical measures: Frequencies; mode; median. Cumulative frequencies. (If there are many categories, you may compute the mean and median, but the interpretation of the numerical results may be problematic.)
  Appropriate charts: Bar charts; histograms.

Level of measurement: Numerical scale, ungrouped
  Appropriate statistical measures: Mean, median, mode, range, minimum, maximum, standard deviation, interquartile range. (Frequency tables are not useful for this type of measure.)
  Appropriate charts: Histograms, frequency polygons, box plots, time lines.

Level of measurement: Numerical scale, grouped
  Appropriate statistical measures: Frequency tables, mode. If there are a large number of groups: mean and standard deviation. The mean is usually the mean code of the categories. It can be used for comparative purposes if other samples are grouped in the same way, but it should not be mistaken for the mean of the variable itself. If grouped into a small number of categories, it should be treated like ordinal data.
  Appropriate charts: Histograms, bar charts, pie charts. Box plots may be misleading if the number of categories is small.

Examples of Concise Descriptive Reports

What follows (Insert 4.1) is an example of a short descriptive report, which answers the question: Describe the educational level of the sample given in the file GSS93 subset that comes with the SPSS program.

INSERT 4.1 Descriptive report of the educational level of the sample

The data set used here is a subset of the General Social Survey conducted in the US in 1993 (n = 1500). There are three variables in this data set that address the issue of education: the highest year of schooling completed (scale), the highest degree obtained (ordinal, 5 categories) and the possession or not of a college degree (ordinal, 2 categories).
The average highest year of schooling completed is 13 years with a standard deviation close to 3 years. The graph below shows the distribution of this variable.

If we compare that situation with the Ideal number of children, we see that the mean for that variable is 2.76 children, but the comparison with the actual number of children is difficult to make, as there are 535 missing answers for that variable (we can assume that only those who had children were asked that question). It is better to examine the histogram of the ideal number of children. Here we see that the mode, or most desirable situation, is by far the situation with two children. Very few people think that one child is the ideal situation.

Histogram of Ideal Number of Children (Std. Dev = 1.57, Mean = 2.8, N = 965)

6. Spanking Children

We have answers for 66% of the respondents, and the rest of the answers are missing. Of those who answered, about three-quarters (73.3%) indicated they either agree or strongly agree with spanking children as a disciplinary measure, while the rest (26.7%) disagree or strongly disagree.

7. Number of Siblings

We see here that the average is 3.7 brothers and/or sisters. If we examine the cumulative frequencies, we see that 60.2% of the respondents come from families of 4 children or fewer (the respondent plus 3 brothers or sisters), the rest (almost 40%) coming from families with 5 children or more. Comparing that with the number of children people currently have, we see that in general, individuals come from families that are larger than the families they themselves establish, since the average number of children in this sample tends to be much smaller than the number of brothers or sisters respondents have.

Analytical Descriptive Reports

The examples shown above are quite direct, and consist essentially in reporting, almost as is, the information provided in the frequency tables.
But a more analytical view would permit a richer reading of such tables. To illustrate what is meant by that, we will go into a more detailed, and more analytical, reading of frequency tables.

EXAMPLES OF HOW TO ANALYZE A FREQUENCY TABLE

To make our point clear, we are going to analyze four cases of the same situation, represented by the tables below. They all deal with the frequencies of the variable Political Party Affiliation, taken from the GSS93 subset file. The first table is the one that we get from the actual data in this file. The other three have been modified to illustrate how the analysis can highlight the distribution pattern.

Table 4.2 Political Party Affiliation A

                        Frequency   Percent   Valid Percent
Strong Democrat             213       14.2        14.3
Not Str Democrat            298       19.9        20.0
Ind, Near Democrat          180       12.0        12.1
Independent                 187       12.5        12.5
Ind, Near Republican        148        9.9         9.9
Not Str Republican          280       18.7        18.8
Strong Republican           168       11.2        11.3
Other Party                  17        1.1         1.1
Total valid                1491       99.4       100.0
NA                            9         .6
Total                      1500      100.0

Case A

Analysis of Case A (Table 4.2). We see from the table that those who are affiliated with the Democrats (strongly or not strongly) add up to 34.3%, or slightly more than a third. Those who are affiliated with the Republicans add up to 30.1%, or slightly less than a third. The independents add up to 34.5%, again a little more than a third. It is interesting to note that the population is almost evenly divided into three groups, and that those who affiliate to neither party are as numerous as (or a little more numerous than) those who affiliate with either of the two main parties. We can also notice that, within each of the two main parties, those who do not have a strong affiliation with the party are more numerous than those who have a strong affiliation (for the Republicans: 280:168, or about 7:4, and for the Democrats: 298:213, or about 3:2). The bar chart shown in Figure 4.1 illustrates this situation.

Case B

Analysis of Case B (Table 4.3).
We see from the table that those who affiliate with the Democrats add up to 42.1%. Those who are affiliated with the Republicans add up to 39.1%, or slightly less than the Democrats. The independents add up only to 17.6%, indicating that there is a strong polarization between the two parties, with less than 1 person out of 5 not affiliated to one of these two parties. We can also notice that, within a party, those who are not strongly affiliated with the party are more numerous than those who are (for the Republicans 23.4% vs. 15.5%, or a ratio of about 3:2, and for the Democrats 23.7% vs. 18.1%, or a ratio of about 4:3). The bar chart in Figure 4.2 illustrates this situation, and the polarization between the two parties is clearly visible.

Figure 4.1 Political Party Affiliation (bar chart of the frequencies for Case A; categories: Strong Democrat, Not Str Democrat, Ind, Near Dem, Independent, Ind, Near Rep, Not Str Republican, Strong Republican, Other Party)

Table 4.3 Political Party Affiliation B

                        Frequency   Percent   Valid Percent
Strong Democrat             272       18.1        18.2
Not Str Democrat            356       23.7        23.9
Ind, Near Democrat          122        8.1         8.2
Independent                  57        3.8         3.8
Ind, Near Republican         84        5.6         5.6
Not Str Republican          351       23.4        23.5
Strong Republican           232       15.5        15.6
Other Party                  17        1.1         1.1
Total valid                1491       99.4       100.0
NA                            9         .6
Total                      1500      100.0

Figure 4.2 Political Party Affiliation B (bar chart of the frequencies for Case B, same categories)

Case C

Analysis of Case C (Table 4.4). We see from the table that those who are affiliated with the Democrats add up to 35.4%, or slightly more than a third. Those who are affiliated with the Republicans add up to 33.5%, or about a third. The independents add up to 29.9%. Thus, the population is almost evenly split between the three groups, with the Democrats only slightly ahead of the Republicans. Notice that, within each party, those who are strongly affiliated with the party are more numerous than those who are not (a ratio of 4:3 for the Democrats, and a ratio of 6:5 for the Republicans). This is illustrated in Figure 4.3.

Table 4.4 Political Party Affiliation C

                        Frequency   Percent   Valid Percent
Strong Democrat             292       19.5        19.6
Not Str Democrat            236       15.7        15.8
Ind, Near Democrat          188       12.5        12.6
Independent                  93        6.2         6.2
Ind, Near Republican        165       11.0        11.1
Not Str Republican          233       15.5        15.6
Strong Republican           267       17.8        17.9
Other Party                  17        1.1         1.1
Total valid                1491       99.4       100.0
NA                            9         .6
Total                      1500      100.0

Figure 4.3 Political Party Affiliation C (bar chart of the frequencies for Case C, same categories)

Case D

Analysis of Case D (Table 4.5). We see from the table that this is a situation of weak polarization between the Republicans and the Democrats. The Democrats attract 42.8% of the population, while the Republicans only get 30% of the support, almost 13 points behind the Democrats. The independents add up to 26.0% of the population. Notice that, within each party, those who are strongly affiliated with the party are the majority, with a ratio of about 4:3 for the Democrats and about 5:4 for the Republicans, a situation illustrated by Figure 4.4.

Table 4.5 Political Party Affiliation D

                        Frequency   Percent   Valid Percent
Strong Democrat             356       23.7        23.9
Not Str Democrat            282       18.8        18.9
Ind, Near Democrat          188       12.5        12.6
Independent                 116        7.7         7.8
Ind, Near Republican         84        5.6         5.6
Not Str Republican          202       13.5        13.5
Strong Republican           246       16.4        16.5
Other Party                  17        1.1         1.1
Total valid                1491       99.4       100.0
NA                            9         .6
Total                      1500      100.0

As we have seen, the short descriptive paragraphs that follow each table do not simply report the frequencies.
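The arithmetic behind these case analyses can be sketched in Python, here with the Case D frequencies as read from Table 4.5. The variable and function names are hypothetical, and the grouping of categories into Democrats, Republicans and independents is taken from the analyses in the text; nothing here reproduces how SPSS builds its tables.

```python
# Case D frequencies, as read from Table 4.5 (9 answers are missing).
counts = {
    "Strong Democrat": 356,
    "Not Str Democrat": 282,
    "Ind, Near Democrat": 188,
    "Independent": 116,
    "Ind, Near Republican": 84,
    "Not Str Republican": 202,
    "Strong Republican": 246,
    "Other Party": 17,
}
missing = 9

valid_n = sum(counts.values())  # respondents who gave an answer
total_n = valid_n + missing     # everyone in the sample


def percent(label):
    """Percentage of the whole sample, missing answers included."""
    return round(100 * counts[label] / total_n, 1)


def valid_percent(labels):
    """Percentage of the valid answers that fall in the given categories."""
    return round(100 * sum(counts[label] for label in labels) / valid_n, 1)


democrats = valid_percent(["Strong Democrat", "Not Str Democrat"])
republicans = valid_percent(["Strong Republican", "Not Str Republican"])
independents = valid_percent(
    ["Ind, Near Democrat", "Independent", "Ind, Near Republican"]
)

# Strong vs. not strong affiliation within each party, as a ratio.
democrat_ratio = counts["Strong Democrat"] / counts["Not Str Democrat"]
republican_ratio = counts["Strong Republican"] / counts["Not Str Republican"]

print(valid_n, total_n)
print(democrats, republicans, independents)
print(round(democrat_ratio, 2), round(republican_ratio, 2))
```

With only 9 missing answers out of 1500, the total and valid percentages differ by a tenth of a point or two; the gap grows when many answers are missing, which is when reporting the valid percentages matters.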
We have tried to highlight the specific features of each situation by answering the following questions: Is there a polarization between the two parties? Is one of them clearly more popular than the other? Is there a large proportion of independents? How is the level of mobilization within each party? We answered that last question by providing the ratio of those who feel a strong affiliation to the party compared to those who do not feel a strong affiliation. A descriptive report that does that systematically is more analytical than one where the percentages are flatly reported as is. Insert 4.3 illustrates such a report.

Figure 4.4 Political Party Affiliation D (bar chart of the frequencies for Case D; categories: Strong Democrat, Not Str Democrat, Ind, Near Dem, Independent, Ind, Near Rep, Not Str Republican, Strong Republican, Other Party)

INSERT 4.3 Description of the Voting Behaviour and of the Political Tendencies of a Sample of US Residents

The data summarized here come from a (non-representative) sample of 1500 individuals, which is a subset of the General Social Survey conducted in the US in 1993. Four variables deal with our topic: Voting in 1992 Election, Political Party Affiliation, Think of self as Liberal or Conservative, and Political outlook. All four variables are measured at the nominal level. An examination of the frequency tables shows that the last variable is a recode of the third one, as explained below.