INTERPRETING QUANTITATIVE DATA WITH SPSS

Table 8.4 Cross-tabulation of the variables Level of socialization with peers and Intention to quit this job

                                         Intention to continue   Intention to find
                                         with the present job    another job soon    Totals
High level of socialization with peers            195                   45             240
Low level of socialization with peers              40                   20              60
Totals                                            235                   65             300

A table such as Table 8.4 is called a two-way table, or a contingency table, or a cross-tabulation of the two variables. We can read in it that we have the answers for 300 employees, of which 240 have a high level of socialization with their peers, and 60 a low level of socialization. Of these same 300 people, 235 do not plan to leave their jobs for the time being, and 65 wish to find another job soon. The number written in the lower right corner is the grand total; the other totals are called marginal totals.

Can we determine, on the basis of this table, that there is some kind of link between the fact that people do not socialize with their peers and their desire to leave this job? In order to answer this question, it may be helpful to compute some percentages. We will compute the row percentages, that is, the percentages within the categories of socialization with peers. The results are shown in Table 8.5.

Table 8.5 Cross-tabulation of the variables Level of socialization with peers and Intention to quit this job

                                         Intention to continue   Intention to find
                                         with the present job    another job soon    Totals
High level of socialization with peers            195                   45             240
  Percentage within Level of socialization [...]

[...] could be:

Would you say you have a high or low level of socialization with your peers at work? (check one)
High level: _______ Low level: _______

But this is not a very good question, because there is no uniform definition of what is a high level or a low level. Instead, there could be a series of indicators, represented by questions such as:

Do you take your lunch with your peers or alone?
Do you drive home with some of them?
Do you phone some of them during the weekends?
If you have a problem with the boss, would you trust them enough to seek their advice?

On the basis of the answers, the researcher would divide the respondents into two groups: those who display a high level of socialization and those who don't. The criterion for classification could be something like: those who answered Yes to most of these questions will be classified as having a high level of interaction. Here, the concept that we are trying to observe is the socialization with peers, and all the other variables (having lunch with them, calling them, etc.) are indicators of that concept. (Review Chapter 1 on these notions.)

We therefore notice a big difference between those who socialize with their peers and those who do not. In the latter category, a larger percentage of individuals plan to leave their job. We can say, therefore, that:

Individuals in this sample who do not socialize with their peers are more likely to want to find another job than those who do socialize with their peers.

The preceding sentence illustrates the fundamental aspect of a statistical association between two categorical variables: people who are in one of the categories of the first variable are more likely to find themselves in a given category of the second variable. Thus we can conclude:

There is a statistical association between the variables Level of socialization with peers and Intention to quit this job.

Keep in mind, though, that it does not follow from that conclusion that the level of socialization is the cause of the intention to quit. It could well be the other way around. Or both variables could result from a third reason not presented in this table, such as: this place of work is in a remote area, far from people's houses. We will come back to the interpretation of the statistical association later in this chapter.
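To make the row-percentage comparison concrete, the short Python sketch below recomputes the percentages within each level of socialization. It is only an illustration: the four cell counts (195, 45, 40, 20) are assumed here because they are consistent with the marginal totals of 240, 60, 235, and 65 quoted above, and the names `counts` and `row_percentages` are hypothetical, not part of SPSS.

```python
# Assumed cell counts (rows: socialization level, columns: intention).
counts = {
    "High socialization": {"stay": 195, "leave": 45},
    "Low socialization": {"stay": 40, "leave": 20},
}


def row_percentages(table):
    """Return each cell as a percentage of its own row total."""
    result = {}
    for row_name, cells in table.items():
        row_total = sum(cells.values())
        result[row_name] = {
            column: 100 * count / row_total for column, count in cells.items()
        }
    return result


row_pct = row_percentages(counts)
for row_name, cells in row_pct.items():
    # Each row of percentages adds up to 100 across its cells.
    print(row_name, {column: f"{pct:.1f}%" for column, pct in cells.items()})
```

Under these assumed counts, about 19% of the high-socialization group and about 33% of the low-socialization group intend to find another job, which is the kind of difference the text describes.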
There is another way of looking at the statistical association mentioned above. Instead of looking at the percentages within the levels of socialization, we could look at the percentages within the categories of the variable Intention to quit this job. We would get Table 8.6.

Table 8.6 Cross-tabulation of the variables Level of socialization with peers and Intention to quit this job

                                           Intention to continue   Intention to find
                                           with the present job    another job soon    Totals
High level of socialization with peers              195                   45             240
  Percentage within Intention to quit job          83.0%                69.2%
Low level of socialization with peers                40                   20              60
  Percentage within Intention to quit job          17.0%                30.8%
Totals                                              235                   65             300
                                                  100.0%               100.0%

We could now make an analysis similar to the one we made above. Among the people who plan to continue working at the same place, 83.0% maintain a high level of socialization with their peers, but this percentage drops down to 69.2% among those who wish to find a job somewhere else. Thus, we can say that the individuals of this sample who do plan to stay in this job tend to socialize with their peers at a higher level than those who plan to leave. Again, this indicates (or confirms) that there is a statistical association between the two variables.

Note that the percentages written in the two tables above are called either row percentages, if they add up to 100% horizontally, across the cells of one row, or column percentages, if they add up to 100% vertically, down the cells of one column. You will learn in Lab 10 how to produce similar tables with SPSS.

Keep in mind that we are only talking about statistical associations, not about causes. It does not follow from the existence of a statistical association that one of the variables is the cause of the other.
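The column percentages can be checked in the same spirit. The sketch below is a hedged illustration in Python, not the SPSS procedure of Lab 10; the cell counts are assumed (195, 45, 40, 20, chosen to match the marginal totals 235 and 65), and `column_percentages` is a hypothetical helper name.

```python
# Assumed cell counts (rows: socialization level, columns: intention).
counts = {
    "High socialization": {"stay": 195, "leave": 45},
    "Low socialization": {"stay": 40, "leave": 20},
}


def column_percentages(table):
    """Return each cell as a percentage of its own column total."""
    columns = list(next(iter(table.values())).keys())
    column_totals = {
        column: sum(row[column] for row in table.values()) for column in columns
    }
    return {
        row_name: {
            column: 100 * row[column] / column_totals[column] for column in columns
        }
        for row_name, row in table.items()
    }


col_pct = column_percentages(counts)
for row_name, cells in col_pct.items():
    # Each column of percentages adds up to 100 down its cells.
    print(row_name, {column: f"{pct:.1f}%" for column, pct in cells.items()})
```

With these counts, the share of high-socialization employees is roughly 83% among those who intend to stay and roughly 69% among those who intend to leave, in line with the figures quoted in the text.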
STATISTICAL ASSOCIATION

The Case of One Quantitative and One Qualitative Variable

Suppose now that we want to analyze the statistical relationship between one quantitative and one qualitative variable, for instance Income (quantitative) and Sex (qualitative). Several options are offered to us. The simplest is to compute the average of the quantitative variable separately for each category of the qualitative variable.

Example

The average income for a sample of 1,500 people, consisting of [...] men and [...] women, is $19,400 a year. Suppose that the average income for women and for men separately is given by:

Average income of men in that sample: $23,400
Average income of women in that sample: $17,300

This would mean that there is a large difference between the incomes of men and women. The income of men is

(23,400 - 17,300) / 17,300 × 100 = 35.3%

higher than that of women. This means that there is a statistical association between the variables income and sex for the individuals of that sample (we are not generalizing to the whole population yet). However, the preceding statement does not mean that sex is the cause of the difference in income. All we can say for the time being is that women make less money than men do. The interpretation of that difference is another matter. It could be due to discrimination in the system, or it could be due to some other intervening variable (if, for instance, the women of that sample tended to be younger than the men, and therefore have less working experience) or some other cause.

Finding the average for men and for women separately is not the only way to establish the existence of a statistical association. Another method would be to recode income into three categories (high, intermediate, low) and then treat both variables as categorical variables.
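The comparison of group averages can be sketched in a few lines of Python. This is an illustrative sketch only: the four records in `sample` are invented so that the group means come out to the two averages used in the worked formula (23,400 and 17,300), and the helper names `group_means` and `percent_higher` are hypothetical.

```python
from collections import defaultdict


def group_means(records):
    """Mean of the quantitative value within each category."""
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for category, value in records:
        totals[category] += value
        sizes[category] += 1
    return {category: totals[category] / sizes[category] for category in totals}


def percent_higher(a, b):
    """By what percentage a exceeds b."""
    return (a - b) / b * 100


# Hypothetical toy records (sex, yearly income), not data from the book.
sample = [("M", 25000), ("M", 21800), ("F", 16000), ("F", 18600)]

means = group_means(sample)
gap = percent_higher(means["M"], means["F"])
print(means, f"{gap:.1f}% higher")
```

A large gap between the group means points to a statistical association in the sample; as the text stresses, it says nothing by itself about the cause of the difference.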
In SPSS Lab 5, you have seen in detail how to illustrate the difference between the incomes of various groups graphically with box plots. SPSS Lab 11 shows how to compute average incomes for each group separately.

Ordinal Variables

There are specific methods for establishing statistical association between ordinal variables. Such methods take into account the ranking of each individual on one of the variables in comparison to his or her ranking on the other variable. They will not be treated here. Ordinal variables are often treated as quantitative variables and correlations are computed. The results of such computations are sometimes difficult to interpret.

Statistical Association as a Qualitative Relationship

The interpretation of the statements made above, in the section on two qualitative variables, about the statistical association between them is not obvious. Recall that the two variables were the level of socialization of workers with their peers in a factory and their desire to stay or quit their job. We had found that the two variables were associated statistically. But there could be several possible interpretations of that statistical association.

First interpretation. We can interpret the statistical association to mean that a high level of socialization induces people to want to stay in that job. The explanation could be that the job is therefore more enjoyable, and people want to continue working there. In a way, the high level of socialization can be considered to be a cause for staying in that job, and inversely, a low level of socialization a reason to leave. So, we are now talking about more than a statistical association: we are talking about a relationship between variables. This situation can be represented by the diagram shown in Figure 8.5.

[Figure 8.5: Level of socialization → Desire to stay or quit]

In symbolic terms, if we designate the level of socialization by X,
and the desire to quit the job by Y, we could write:

X ⇒ Y

We could go a little further in that interpretation. If, in our theoretical framework, we had used the variable Satisfaction with the job, denoted by Z, as a general concept, and the level of socialization as one indicator of that concept, we could now conclude that the relationships can be illustrated by Figure 8.6.

[Figure 8.6: Level of socialization → Satisfaction with job → Desire to stay or quit]

The following pattern illustrates the situation.

X ⇒ Z ⇒ Y

In other words, the level of socialization is used as an explanatory variable, to explain why people are more inclined to quit their jobs. Notice that this interpretation does not follow from the statistical analysis of the association between the two variables. This is clearly an interpretation, and it is not the only possible interpretation, as we will see in what follows.

Second interpretation. We could reverse the preceding interpretation and say that if individuals tend to quit their job (they may perhaps want a better salary, or a more challenging job), they will not invest a lot of energy in socializing with their peers, since they know they are going to quit soon. Here the model is reversed:

Y ⇒ X

In other words, the desire to quit the job is used to explain why people do not socialize a lot with their peers. This interpretation, like the previous one, does not follow automatically from the statistical association between the two variables. The statistical association allows such an interpretation, but it does not prove it.

Third interpretation. The results of the statistical analysis are consistent with yet another interpretation, which asserts that both the desire to quit and the lack of socialization are the result of a third variable, such as Desire to get a better salary. If people think that their present salary is too low, and that they can get a better salary if they find another job,
they may plan to quit and also they may decide not to invest too much energy and time in socializing with their peers. The model proposed here for explaining the statistical association is the following.

[...]

Fourth interpretation. The last interpretation that we could propose is to consider both variables as indicators of the general concept Satisfaction with job. This concept could be measured by several indicators: level of socialization, intention to stay, satisfaction with the salary level, pleasant atmosphere at the office, relationship of support and cooperation with the management, etc. In this interpretation, the key concept is the global satisfaction with the job. When people are globally satisfied, they are more likely to socialize with their peers, to consider staying in this job for a long time, etc.

Sometimes the qualitative relationship between two correlated variables is said to be spurious. To say that a relationship is spurious means that there is no logical link between the two variables, and that the statistical association is misleading. Such statistical association is often due to a third variable, but the logics linking each of the two correlated variables with the third one are completely unrelated. A classical example is that of height and salary. It could turn out that there is a statistical association between the height of an individual and his or her salary for a given sample. But if we break down the sample studied into men and women, we find that within each group there is no relationship. What happens is that on one hand men tend to be taller than women, and on the other hand in most societies the social structure favors men over women and the former end up tending to have higher salaries.
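The height and salary example can be imitated with a tiny synthetic data set. The Python sketch below is purely illustrative: the eight records are invented so that height and salary are unrelated within each sex while both are higher on average for men, and `pearson_r` and `r_for` are hypothetical helper names.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic records: (sex, height in cm, salary in thousands). Invented data.
people = [
    ("F", 160, 30), ("F", 160, 40), ("F", 170, 30), ("F", 170, 40),
    ("M", 175, 45), ("M", 175, 55), ("M", 185, 45), ("M", 185, 55),
]


def r_for(rows):
    """Correlation between height and salary for the given records."""
    return pearson_r([h for _, h, _ in rows], [s for _, _, s in rows])


r_all = r_for(people)
r_women = r_for([p for p in people if p[0] == "F"])
r_men = r_for([p for p in people if p[0] == "M"])
print(round(r_all, 2), round(r_women, 2), round(r_men, 2))
```

In this constructed sample the overall correlation is clearly positive (about 0.69), yet it vanishes inside each group, which is the pattern the text describes for a spurious association.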
The two kinds of associations (sex and height; gender and salary) follow logics that are totally unrelated to each other, hence our conclusion that the statistical association between height and salary is spurious. However, it is not always clear whether two sets of causal relationships are related or not, and one should be quite careful in interpreting a statistical association as spurious or as meaningful.

Summary and Conclusions

From Statistical Association to Relationship between Variables

The discussion above should help us understand better two distinct concepts, the concept of statistical association and the concept of relationship between variables.

Statistical association is something that can be observed objectively and measured, as we have seen in the examples above. Basically, it means that if you know the score of an individual on a variable X you can make a better guess of his or her score on another variable Y than if you did not know the score on X. The measure of statistical association depends on the level of measurement of the variables, which depends partly on the type of variables.

• For quantitative variables measured by a numerical scale, statistical association is called correlation. Two such quantitative variables are correlated when the values of one of them can be predicted with some precision from the values of the other variable. For linear correlation, the points representing the individuals are close to a straight line, which is called the regression line. If the association is strong, the points are very close to the line, the correlation coefficient r is close to 1 or -1, and the predictions based on the regression line involve a small error.
• For qualitative variables measured by a nominal scale, statistical association is analyzed with the help of a contingency table, also called a two-way table or a cross-tabulation.
  Statistical association means that individuals who are in a given category of the independent variable are more likely to be in a specific category of the dependent variable than in other categories. There are ways of measuring the strength of the association but they will not be discussed here.
• If one variable (X) is quantitative (measured by a numerical scale) and the other one (Y) qualitative (measured by a nominal scale), statistical association is studied by comparing the average scores on X across the various categories of Y.

This situation is summarized in Figure 8.7.

LEVEL OF MEASUREMENT OF THE VARIABLES and PROCEDURE FOR ESTABLISHING THE ASSOCIATION

NOMINAL VS. NOMINAL (Two qualitative variables)
CROSSTABS. We compare the row percentages across the categories of the independent variable. If the difference is big we say that there is a statistical association. (Lab 10)

NOMINAL VS. SCALE (One qualitative and one quantitative variable)
COMPARE MEANS. We compute the mean of the quantitative variable for each category defined by the nominal variable separately. We compare these means to see if there is a big difference across categories. (Lab 11)

SCALE VS. SCALE (Two quantitative variables)
CORRELATION. The value of r, the correlation coefficient, tells us whether the association is strong or weak, and whether it is positive or negative. The regression line (given by an equation as well as on a graph) helps us predict how an individual scores on the dependent variable when we know the score on the independent variable. When predicting there is always an error, which is small when the correlation is strong. (Lab 12)

Figure 8.7 How to measure statistical association? It depends on the level of measurement of the variables.

Relationship between variables. This notion is used to describe the logical link between variables.
The independent variable could be a cause of the dependent variable, or an explanatory factor of the dependent variable; they could both be effects of some other variable; or they may be two indicators of a concept, or even two aspects of the same phenomenon. The notion of relationship between variables is a qualitative notion. It is a matter of interpretation, and it depends on the theoretical framework used in the research and on the research question or the research hypothesis. Statistical association should not be automatically interpreted as meaning a causal link,