THE BASIC LANGUAGE OF STATISTICS

This chapter is an introduction to statistics and to quantitative methods. It explains the basic language used in statistics, the notion of a data file, the distinction between descriptive and inferential statistics, and the basic concepts of statistics and quantitative methods.

After studying this chapter, the student should know:
• the basic vocabulary of statistics and of quantitative methods;
• what an electronic data file looks like, and how to identify cases and variables;
• the different uses of the term 'statistics';
• the basic definition of descriptive and inferential statistics;
• the types of variables and of measurement scales;
• how concepts are operationalized with the help of indicators.

Introduction: Social Sciences and Quantitative Methods

Social sciences aim at studying social and human phenomena as rigorously as possible. This involves describing some aspect of the social reality, analyzing it to see whether logical links can be established between its various parts, and, whenever possible, predicting future outcomes. The general objective of such studies is to understand the patterns of individual or collective behavior, the constraints that affect it, and the causes and explanations that can help us understand our societies and ourselves better and predict the consequences of certain situations.

Such studies are never entirely objective, as they are inevitably based on certain assumptions and beliefs that cannot be demonstrated. Our perceptions of social phenomena are themselves subjective to a large extent, as they depend on the meanings we attribute to what we observe. Thus, we interpret social and human phenomena much more than we describe them, but we try to make that interpretation as objective as possible.

Some of the phenomena we observe can be quantified,
which means that we can translate into numbers some aspects of our observations. For instance, we can quantify population change: we can count how many babies are born every year in a given country, how many people die, and how many people migrate in or out of the country. Such figures allow us to estimate the present size of the population, and maybe even to predict how this size is going to change in the near future. We can quantify psychological phenomena such as the degree of stress or the rapidity of response to a stimulus; demographic phenomena such as population sizes or sex ratios (the ratio of men to women); geographic phenomena such as the average amount of rain over a year or over a month; economic phenomena such as the unemployment rate; we can also quantify social phenomena such as the changing patterns of marriage or of unions, and so on.

INTERPRETING QUANTITATIVE DATA WITH SPSS

When a social or human phenomenon is quantified in an appropriate way, we can ground our analysis of it on figures, or statistics. This allows us to describe the phenomenon with some accuracy, to establish whether there are links between some of the variables, and even to predict the evolution of the phenomenon. If the observations have been conducted on a sample (that is, a group of people smaller than the whole population), we may even be able to generalize to the whole population what we have found on a sample.

When we observe a social or human phenomenon in a systematic, scientific way, the information we gather about it is referred to as data. In other words, data is information that is collected in a systematic way, and organized and recorded in such a way that it can be interpreted correctly. Data is not collected haphazardly, but in response to some questions that the researchers would like to answer. Sometimes, we collect information (that is, data) about a character or a quality, such as the mother tongue of a person.
Sometimes, the data is something measurable with numbers, such as a person's age. In both cases, we can treat this data numerically: for instance, we can count how many people speak a certain language, or we can find the average age of a group of people. The procedures and techniques used to analyze data numerically are called quantitative methods. They include a study of the valid methods used for collecting data in the first place, as well as a discussion of the limits of validity of any given procedure (that is, an understanding of the situations when a given procedure yields valid results), and of the ways the results are to be interpreted.

This book constitutes an introduction to quantitative methods for the social sciences. The first chapter covers the basic vocabulary of quantitative methods. This vocabulary should be mastered by the student if the remainder of the book is to be understood properly.

Data Files

The first object of analysis in quantitative methods is a data file, that is, a set of pieces of information written down in a codified way. Figure 1.1 illustrates what an electronic data file looks like when we open it with the SPSS program.

Figure 1.1 The Data window in SPSS version 10.1. © SPSS. Reprinted with permission.

This data file was created by the statistical software package SPSS Version 10.1, which will be used in this course. The first lab in the second part of this manual will introduce you to SPSS, which stands for Statistical Package for the Social Sciences. On the top of the window, you can read the name of the data file: GSS93 subset. This stands for Subset of the General Social Survey, a survey conducted in the USA in 1993.
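
As a rough illustration of what it means to treat data numerically, the short sketch below counts the categories of a quality (mother tongue) and averages a quantity (age). It is written in Python, which is our own choice for illustration and is not the software used in this manual; the five cases and their values are invented, and the variable names mother_tongue and age are assumptions.

```python
from collections import Counter
from statistics import mean

# Hypothetical data file: each dict is one case (one person),
# each key is one variable. All values are invented for illustration.
cases = [
    {"mother_tongue": "English", "age": 34},
    {"mother_tongue": "French", "age": 51},
    {"mother_tongue": "English", "age": 27},
    {"mother_tongue": "Other", "age": 45},
    {"mother_tongue": "French", "age": 38},
]

# A quality cannot be averaged, but its categories can be counted.
language_counts = Counter(case["mother_tongue"] for case in cases)

# A quantity can be averaged directly.
average_age = mean(case["age"] for case in cases)

print(language_counts)
print(average_age)
```

In both computations the result is a number, even though only the second variable is itself numerical.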
When we open an SPSS data file, two views can be displayed: the Data View or the Variable View. Both views are part of the same file, and one can switch from one view to the other by clicking on the tab at the bottom left of the window.

The Data View

The information in this data view is organized in rows and columns. Each row refers to a case, that is, all the information pertaining to one individual. Each column refers to a variable, that is, a character or quality that was measured in this survey. For instance, the second column is a variable called wrkstat, and the third is a variable called marital.

But what are the meanings of all these numbers and words? A data file must be accompanied by information that allows a reader to interpret (that is, understand) the meanings of the various elements in it. This information constitutes the codebook. In SPSS, we can find the information of the codebook by clicking the word Variables... under the Utilities menu. We get a window listing all the variables contained in this data file. By clicking once on a variable, we see the information pertaining to this variable: the short name that stands on the top of the column; what the name stands for (the label of the variable); the numerical type of the variable (that is, how many digits are used, and whether it includes decimals); other technical information to be explained later; and the Value Labels, that is, what each number appearing in the data sheet stands for.

Figure 1.2 The Variables window in SPSS. The codes and value labels of the variable Marital Status are shown
Figure 1.3 The Data View window in SPSS when the Show Labels command is ticked in the View menu. The value labels are displayed rather than the codes

Figure 1.2 shows the codes used for the variable Marital Status. You may have noticed that:
1 stands for married
2 stands for widowed
3 stands for divorced
etc.

The numbers 1, 2, 3, etc. are the codes, and the terms married, widowed, divorced, etc. are the value labels that correspond to the various codes. The name marital, which appears at the top of the column, is the variable name. Marital Status is the variable label: it is a usually longer, detailed name for the variable. When we print tables or graphs, it is the variable labels and the value labels that are printed.

Figure 1.4 The Variable View window in SPSS. The variables are listed in the rows, and their properties are displayed

Figure 1.5 The Value Labels window in SPSS. In this window it is possible to add new codes and their corresponding value labels, or to modify or delete existing ones

There is a way of showing the value labels instead of the codes. This is done by clicking Value Labels under the View menu. The Data View window looks now as shown in Figure 1.3. We can see that case number 4, for example, is a person who works part time, and who has never been married. To understand the precise meaning of the numbers written in the other cells, we should first read the variable information found in the codebook for each of the variables.

In version 10.0 and version 11 of SPSS, you can read the information pertaining to the variables in the Variable View. By clicking on the tab for Variable View, you get the window shown in Figure 1.4.

In the Variable View, no data is shown. You can see, however, all the information pertaining to the variables themselves, each variable being represented by a line.
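
The vocabulary introduced so far (variable name, variable label, codes, and value labels) can be pictured as one small record per variable, much like one line of the Variable View. The sketch below is written in Python for illustration only and does not reproduce how SPSS stores this information; it uses only the three codes quoted in the text, while the function decode and the sample column of codes are hypothetical.

```python
# One codebook entry, using the chapter's vocabulary.
# Only the codes 1 to 3 are quoted in the text; the others are left out here.
marital = {
    "name": "marital",           # variable name (top of the column)
    "label": "Marital Status",   # variable label (longer, detailed name)
    "value_labels": {1: "married", 2: "widowed", 3: "divorced"},
}


def decode(variable, code):
    """Return the value label for a code, or the raw code if it is not listed."""
    return variable["value_labels"].get(code, code)


# Hypothetical column of codes, as it might appear in the Data View.
column = [1, 3, 1, 2]
decoded = [decode(marital, code) for code in column]
print(decoded)
```

Replacing each code by its value label in this way is, in spirit, what happens when the value labels are displayed instead of the codes.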
The various variable names are listed in the first column, and each is followed by information about the corresponding variable: the way it is measured and recorded, its full name, the values and their codes, etc. All these terms will be explained in detail later on. The label, that is, the long name of the variable marital, is Marital Status. By clicking on the Values cell for the variable marital, the window shown in Figure 1.5 pops up. We can see again the meanings of the codes used to designate the various marital statuses.

We can now raise a number of questions: How did we come up with this data? What are the rules for obtaining reliable data that can be interpreted easily? How can we analyze this data? Table 1.1 includes a systematic list of such questions. The answers to these questions will be found in the various chapters and sections of this manual.

Table 1.1 Some questions that arise when we want to use quantitative methods

Questions

• How did we come up with this data? What are the questions we are trying to answer? What is the place of quantitative analysis in social research, and how does it link up with the qualitative questions we may want to ask? What is the scientific way of defining concepts and operationalizing them?
• How do we conduct social research in a scientific way? What procedures should we follow to ensure that results are scientific? What are the basic types of research designs? How do we go about collecting the data?
• Once collected, the data must be organized and described. How do we do that? When we summarize the data, what are the characteristics that we focus on? What kind of information is lost?
• What are the most common types of shapes and distributions we encounter?
• What are the procedures for selecting a sample? Are some of them better than others?
• Some institutions collect and publish a lot of social data. Where can we find it? How do we use it?
• Sometimes we notice coincidences in the data: for instance, those who have a higher income tend to behave differently on some social variables than those who do not. Is there a way of describing such relationships between variables, and drawing their significance?
• Sometimes the data comes from a sample, that is, a part of the population, and not the whole population. Can we generalize our conclusions to the whole population on the basis of the data collected on a sample? How can this be done? Is it precise? What are the risks that our conclusions are wrong?

Chapters

1. The Basic Language of Statistics
2. The Research Process
3. Univariate Descriptive Statistics
5. Normal Distributions
6. Sampling Designs
7. Statistical Databases
8. Statistical Association
9. Statistical Inference: Estimation
10. Statistical Inference: Hypothesis Testing

The Discipline of Statistics

The term statistics is used in two different meanings: it can refer to the discipline of statistics, or it can refer to the actual data that has been collected.

As a scientific discipline, the object of statistics is the numerical treatment of data that pertain to a large quantity of individuals or a large quantity of objects. It includes a general, theoretical aspect which is very mathematical, but it can also include the study of the concrete problems that are raised when we apply the theoretical methods to specific disciplines. The term quantitative methods is used to refer to methods and techniques of statistics which are applied to concrete problems. Thus, the difference between statistics and quantitative methods is that the latter include practical concerns such as finding solutions to the problems arising from the collection of real data, and interpreting the numerical results as they relate to concrete situations. For instance, proving that the mean (or average) of a set of values has certain mathematical properties is part of statistics.
Deciding that the mean is an appropriate measure to use in a given situation is part of quantitative methods. But the line between statistics and quantitative methods is fuzzy, and the two terms are sometimes used interchangeably. In practice, the term statistics is often used to mean quantitative methods, and we will use it in that way too.

The term statistics also has a different meaning: it is used to refer to the actual data that has been obtained by statistical methods. Thus, we will say for instance that the latest statistics published by the Ministry of Labor indicate a decrease in unemployment. In that last sentence, the word statistics was used to refer to data published by the Ministry.

Populations, Samples, and Units

Three basic terms must be defined to explain the subject matter of the discipline of statistics:
• unit (or element, or case),
• population, and
• sample.

A unit (sometimes called element, or case) is the smallest object of study. If we are conducting a study on individuals, a unit is an individual. If our study were about the health system (we may want to know, for instance, whether certain hospitals are more efficient than others), a unit for such a study would be a hospital, not a person.

A population is the collection of all units that we wish to consider. If our study is about the hospitals in Quebec, the population will consist of all hospitals in Quebec. Sometimes, the term universe is used to refer to the set of all individuals under consideration, but we will not use it in this manual.

Most of the time, we cannot afford to study each and every unit in a population, due to the impossibility of doing so or to considerations of time and cost. In this case, we study a smaller group of units, called a sample. Thus, a sample is any subset (or subgroup) of our population.

DESCRIPTIVE STATISTICS
It aims at summarizing the data.
Some of the information is lost as a result. A good summary captures the essential aspects of the data and the most relevant ones.

INFERENTIAL STATISTICS
It aims at drawing conclusions about a population when a sample is given. The inference always implies a margin of error and a probability of error. Inferences based on representative samples have a higher chance of being correct. A random sample is more likely to be representative.

MEASURES OF CENTRAL TENDENCY
They answer the question: What are the values that represent the bulk of the data in the best way?
Mean, median, mode

MEASURES OF DISPERSION
They answer the question: How spread out is the data? Is it mostly concentrated around the center, or spread out over a large range?
Standard deviation, variance, range

MEASURES OF POSITION
They answer the question: How is one individual entry positioned with respect to all the others?
Percentiles, deciles, quartiles

MEASURES OF ASSOCIATION
They answer the question: If we know the score of an individual on one variable, to what extent can we successfully predict how he is likely to score on the other variable?
Correlation coefficient (r)

ESTIMATION
It is based on the distinction between sample and population. It consists in guessing the value of a measure on a population (i.e. a parameter) when only the value on the sample (the statistic) is known. Opinion polls are always based on estimation, and their results are given with a margin of error.

HYPOTHESIS TESTING
It is also based on the distinction between sample and population, but the process is reversed: we make a hypothesis about a population parameter. On that basis, we predict a range of values a variable is likely to take on a representative sample. Then we go and measure the sample. If the observed value falls within the predicted range, we conclude that the hypothesis is reasonable.
If the observed value falls outside the predicted range, we reject our hypothesis.

Figure 1.6 The discipline of statistics and its two branches, descriptive statistics and inferential statistics

The distinction between sample and population is absolutely fundamental. Whenever you are doing a computation, or making any statement, it must be clear in your mind whether you are talking about a sample (a group of units generally smaller than the population) or about the whole population.

The discipline of statistics includes two main branches:
• descriptive statistics, and
• inferential statistics.

The following paragraphs explain what each branch is about. Refer also to Figure 1.6. Some of the terms used in the diagram may not be clear for now, but they will be explained as we progress.

Descriptive Statistics

The methods and techniques of descriptive statistics aim at summarizing large quantities of data by a few numbers, in a way that highlights the most important numerical features of the data. For instance, if you say that your average GPA (grade point average) in secondary schooling is 3.62, you are giving only one number that gives a pretty good idea of your performance during all your secondary schooling. If you also say that the standard deviation (this term will be explained later on) of your grades is 0.02, you are saying that your marks are very consistent across the various courses. A standard deviation of 0.1 would indicate a variability that is 5 times bigger, as we will learn later on. You do not need to give the detailed list of your marks in every exam of every course: the average GPA is a sufficient measure in many circumstances. However, the average can sometimes be misleading. When is the average misleading? Can we complement it by other measures that would help us have a better idea of the features of the data we are summarizing? Such questions are part of descriptive statistics.
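
To see why an average alone can hide important features, the sketch below compares two invented lists of grades that share the same average of 3.62 but differ in how spread out they are. It is a Python illustration under stated assumptions: the grades are hypothetical, and we use the population form of the standard deviation (pstdev), which may not be the exact formula adopted later in this manual.

```python
from statistics import mean, pstdev

# Hypothetical course grades for two students with the same average.
consistent = [3.60, 3.62, 3.64, 3.62, 3.62]
variable = [3.50, 3.74, 3.62, 3.52, 3.72]

for name, grades in [("consistent", consistent), ("variable", variable)]:
    # The mean summarizes the level; the standard deviation summarizes the spread.
    print(name, round(mean(grades), 2), round(pstdev(grades), 3))
```

The two summaries agree on the first number and differ on the second, which is the kind of complementary information a measure of dispersion is meant to provide.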
Descriptive statistics include measures of central tendency, measures of dispersion, measures of position, and measures of association. They also include a description of the general shape of the distribution of the data. These terms will be explained in the corresponding chapters.

Inferential Statistics

Inferential statistics aim at generalizing a measure taken on a small number of cases that have been observed, to a larger set of cases that have not been observed. Using the terms explained above, we could reformulate this aim, and say that inferential statistics aim at generalizing observations made on a sample to a whole population. For instance, when pre-election polls are conducted, only one or two thousand individuals are questioned, and on the basis of their answers, the polling agency draws conclusions about the voting intentions of the whole population. Such conclusions are not very precise, and there is always a risk that they are completely wrong. More importantly, the sample used to draw such conclusions must be a representative sample, that is, a sample in which all the relevant qualities of the population are adequately represented. How can we ensure that a sample is representative? Well, we can't. We can only increase our chances of selecting a representative sample if we select it randomly. We will devote a chapter to sampling methods. Inferential statistics include estimation and hypothesis testing, two techniques that will be studied in Chapters 9 and 10.

A few more terms must be defined to be able to go further in our study. We need to talk a little about variables and their measurement.

Variables and Measurement

A variable is a characteristic or quality that is observed, measured, and recorded in a data file (generally, in a single column). If you need to keep track of the country of birth of the individuals in your population, you will include in your study a variable called Country of birth.
You may also want to keep track of the nationality of the individuals: you will then have another variable called Nationality. The two variables are distinct, since some people may have the nationality of a country other than the one they were born in. Here are some examples of variables used widely in social sciences.

Socio-demographic variables
• Sex
• Religion
• Level of education
• Highest degree obtained
• Marital status
• Country of birth
• Nationality
• Mother tongue

Psychological variables
• Level of anxiety
• Stimulus response time
• Score obtained in a personality test
• Score obtained in an aptitude test

Economic variables
• Working status
• Income
• Value of individual assets
• Average number of hours of work per week

Variables that refer to units other than the individual
• Number of hospitals in a country
• Percentage of people who can read
• Percentage of people who completed high school
• Total population
• Birth rate
• Fertility rate
• Number of teachers per 1000 people
• Number of doctors per 10,000 people
• Population growth
• Predominant religion

You may have noticed that some of these variables refer to qualities (such as mother tongue) and others refer to quantities, such as the total population of a country. Quantitative variables are characteristics that are numerical, such as the number of people in a household, the size of a building, or the annual sales of a product. Qualitative variables are characteristics or qualities that are not numerical, such as mother tongue, or country of origin. The scores of the individuals of a population on the various variables are called the values of that variable.

Example

Suppose you have the information shown in Table 1.2 about five students in your college.

Table 1.2 Examples of qualitative and quantitative variables
Name: …
Age: 17, 18, …
Program of Study: Social Science; Pure and Applied Science; Commerce; Office Systems Technology; Graphic Design
Grade Point Average: 3.78, …

There are three variables: Age (quantitative), Program of Study (qualitative), and Grade Point Average (quantitative). The values, or scores, taken by the individuals for the variable Age are 17, 18, 19 (twice), and 20. The values taken by the variable Program of Study are Social Science, Pure and Applied Science, Commerce, Office Systems Technology, and Graphic Design.

Qualitative variables are sometimes referred to as categorical variables because they consist of categories in which the population can be classified. For instance, we can classify all students in a college into categories according to the program of study they are in.

Careful attention must be given to the way observations pertaining to a variable are recorded. We must find a system for recording the data that is very clear, and that can be interpreted without any ambiguity. Consider, for instance, the following characteristics: age, rank in the family, and mother tongue. The first characteristic is a quantity, the second is a rank, and the third is a quality. The systems used to record our observations about these characteristics will be organized into three levels of measurement:
• measurement at the nominal level;
• measurement at the ordinal level; and
• measurement at the numerical scale level.

Each level of measurement allows us to perform certain statistical operations, and not others.

The nominal level of measurement is used to measure qualitative variables. It is the simplest system for writing down our observations: when we want to measure a characteristic at the nominal level, we establish a number of categories in such a way that each observation falls into one and only one of these categories. For example, if you want to write down your observations about mother tongue in the Canadian context, you may have the following categories:
• English,
• French,
• Native, and
• Other.

Depending on the subject of your research, you may have more categories to include other languages, or you may want to make a provision for those who have two mother tongues. It is important to note that when a variable is measured at the nominal level, the categories must be
• exhaustive, and
• mutually exclusive.

The categories are said to be exhaustive when they include the whole range of possible observations, that is, they exhaust all the possibilities. That means that every one of the observations can fit in one of the available categories. The categories are said to be mutually exclusive if they are not overlapping: every observation fits in only one category. These two properties ensure that the system used to write down the observations is clear and complete, and that there are no ambiguities when recording the observations or when reading the data file. Table 1.3 displays examples of measurements made at the nominal level. Qualitative variables must be measured at the nominal level.

The ordinal level of measurement is used when the observations are organized in categories that are ranked, or ordered. We can say that one category precedes another, but we cannot say by how much exactly (or if we can, we do not keep that information). Here too the categories must be exhaustive and mutually exclusive, but in addition you must be able to compare any two categories, and say which one precedes the other (or is bigger, or better, etc.). Table 1.4 displays examples of variables measured at the ordinal level.
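
The two requirements stated above for nominal categories, exhaustive and mutually exclusive, can be checked mechanically once the categories are written down. The sketch below is a Python illustration under our own assumptions: the raw answers and the languages listed under each category are invented examples, and the functions matching_categories, check_scheme, and classify are hypothetical helpers, not part of SPSS.

```python
# Hypothetical coding scheme: each category lists the raw answers it covers.
categories = {
    "English": {"english"},
    "French": {"french"},
    "Native": {"cree", "inuktitut", "ojibwe"},
}


def matching_categories(answer, scheme):
    """Return the names of all categories that contain this raw answer."""
    return [name for name, answers in scheme.items() if answer in answers]


def check_scheme(observations, scheme):
    """Return (answers fitting no category, answers fitting several categories)."""
    unclassified = [o for o in observations if not matching_categories(o, scheme)]
    ambiguous = [o for o in observations if len(matching_categories(o, scheme)) > 1]
    return unclassified, ambiguous


def classify(answer, scheme, catch_all="Other"):
    """Assign exactly one category, using the catch-all when nothing matches."""
    matches = matching_categories(answer, scheme)
    if len(matches) > 1:
        raise ValueError(f"categories overlap for {answer!r}: {matches}")
    return matches[0] if matches else catch_all


observations = ["english", "french", "cree", "arabic"]
print(check_scheme(observations, categories))
print([classify(o, categories) for o in observations])
```

Without a catch-all, the answer "arabic" fits no category, so the three named categories alone are not exhaustive; a category such as Other is what closes that gap.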
Table 1.3 Examples of variables measured at the nominal level
Variable: Categories used
Sex: Male; Female
Place of birth: The country where the survey is conducted; Abroad
Work status: Working full-time; Working part-time; Temporarily out of work; Unemployed; Retired; Housekeeper; Other

Table 1.4 Examples of variables measured at the ordinal level
Variable: Ranked categories
Rating of a restaurant: Excellent; Very good; Acceptable; Poor; Very poor
Rank among siblings: First child; Second child; etc.
Income: High; Medium; Low

The scale used to write down an ordinal variable is often referred to as a Likert scale. It usually has a limited number of ranked categories: anywhere from three to seven categories, sometimes more. For instance, if people are asked to rate a service as:
□ Excellent □ Very good □ Good □ Poor □ Very poor,
the proposed answers constitute a five-level Likert scale.

Another example of a Likert scale, this time with four levels, is provided by the situations where a statement is given, and respondents are asked to say whether they:
□ Totally agree □ Agree □ Disagree □ Totally disagree.

A variable measured at the ordinal level could be either qualitative or quantitative. In Table 1.4, the variable Income is quantitative, and the variable Rating of a Restaurant is qualitative, but they are both measured at the ordinal level. For a variable measured at the ordinal level, we can say that one value precedes another, but we cannot give an exact numerical value for the difference between them. For instance, if we know that a respondent is the first child and the other is the second child in the same family, we do not keep track of the age difference between them. It could be one year in one case and five years in another case, but the values recorded under this variable do not give us this information: they only give us the rank.
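
What an ordinal scale does and does not allow can be made concrete with the five-level scale quoted above. In the Python sketch below, the positions 1 to 5 are our own assumption, chosen only to record the order of the categories; the ratings and the function precedes are hypothetical.

```python
# Five-level scale from the text, listed from best to worst.
SCALE = ["Excellent", "Very good", "Good", "Poor", "Very poor"]

# Positions record the order only; the gaps between them carry no meaning.
RANK = {label: position for position, label in enumerate(SCALE, start=1)}


def precedes(a, b):
    """True if category a is ranked ahead of category b on the scale."""
    return RANK[a] < RANK[b]


# Hypothetical ratings given by five respondents.
ratings = ["Good", "Excellent", "Poor", "Very good", "Good"]
ordered = sorted(ratings, key=RANK.get)

print(precedes("Excellent", "Good"))
print(ordered)
```

Comparing and sorting are legitimate here, whereas subtracting two positions would suggest a distance between categories that the scale does not record.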
When recording information about categorical variables, the information is usually coded. Coding is the operation by which we determine the categories that will be recorded, and the codes used to refer to them. For instance, if the variable is Sex, and the two possible answers are:
Male
Female,
we usually code this variable as
1 Male
2 Female.

The numbers 1 and 2 are the codes, and the categories Male and Female are the values of the variable. When coding a variable, a code must be given to the cases where no answer has been provided by the respondent, or when the respondent refuses to answer (if the answer is judged too personal or confidential, such as the exact income of a person). We refer to these answers as missing values and we give them different codes. A later lab explains how to handle them in SPSS.

Finally, some variables are measured by a numerical scale. Every observation is measured against the scale and assigned a numerical value, which measures a quantity. These variables are said to be quantitative. Table 1.5 displays examples of numerical scale variables.

Table 1.5 Examples of variables measured at the numerical scale level
Variable: Numerical scale
Annual income: In dollars, without decimals (no cents)
Age: In years, with no fractions
Age: In years, with one decimal for fractions of a year
Temperature: In degrees Celsius
Time: In years. A starting point must be specified
Annual income: In dollars, to the nearest thousand

Notice that the same variable can be measured by different scales, as shown in the examples above. So, when we use a numerical scale, we must determine the units used (for instance years or months), and the number of decimals used. Numerical scales are sometimes subdivided into interval scales and ratio scales, depending on whether there is an absolute zero to the scale or not. Thus, temperature and time are measured by interval scales, whereas age and number of children are each measured by a ratio scale.
However, this distinction will not be relevant for most of what we are doing in this course, and we will simply use the term numerical scale to talk about this level of measurement. The program SPSS that we are going to use simply uses the term scale to refer to such variables.

Most statistical software packages include more specific ways of writing down the observations pertaining to a numerical scale. For instance, SPSS will offer the possibility of specifying that the variable is a currency, or a date. Moreover, it is also possible to group the values of a quantitative variable into classes. Thus, when observing the variable age, we can write down the exact age of a person in years, or we can simply write the age group the person falls in, as is done in the following example:
• 18 to 30 years
• 31 to 40 years
• 41 to 50 years
• 51 to 60 years
• Over 60.

When we group a variable such as age into a small number of categories as we have just done, we must code the categories as we do for categorical variables. For example,
1 would stand for the category 18 to 30 years
2 would stand for the category 31 to 40 years
etc.

In such situations, we cannot perform the same statistical operations that we do when the values are not grouped. For instance, the mean, or average, of the variable age is best calculated when the ages are not grouped. When we group the values, it is because we want to know the relative importance (that is, the frequency, in percentage) of one group as compared to the others. The information that a given share of the population is under 20 years old in some developing countries is obtained by grouping the ages into 20 years old or less and more than 20 years old. When we collect the data, it is always better to collect it in actual years, since we can easily group it later on in the data file with the help of a statistical software package. In this case, a new column is added to the data file,
and it contains the grouped data of the quantitative variable. For example, in the GSS93 subset data file that we use in the SPSS labs, you will find two variables for age: one is called age, and the other one is called agecat4. The latter is calculated from the former, by grouping individuals into four age groups. In the column of agecat4, the specific age of an individual is not recorded: only the age group of the individual is recorded.

Finally, numerical scales can be either continuous or discrete. A scale is said to be continuous if the observations can theoretically take any value over a certain range, including fractions of a unit. For instance, age, weight, and length are continuous variables because they are not limited to specific values, and they can take any value within a certain range. A variable is said to be discrete if it can take only a limited number of possible values, but not values in between. For instance, the variable Number of children is measured by a discrete scale because it can only be equal to a whole number: 0, 1, 2, etc.

Importance of the Level of Measurement

The level of measurement used for a variable depends on whether it is qualitative or quantitative. Qualitative variables must be measured at the nominal or ordinal level. They cannot be measured at the numerical scale level, even when their categories are coded with numbers. For instance, as shown above, we usually code the variable Sex as follows:

1 Male
2 Female

In this case, the numbers 1 and 2 have no numerical value. They are simply codes. It is shorter to write 1 than male, and we could have assigned the numbers differently. If you ask SPSS to compute the mean (or average) for a variable coded in this way, you will get a numerical answer. But you must always keep in mind that such a numerical answer is totally meaningless because the level of measurement of that variable is nominal.
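Two of the points made in this section can be sketched in a few lines of Python: deriving a column of coded age groups from exact ages, and seeing why the mean of nominal codes carries no information. This is only an illustration under stated assumptions: the ages, the class limits, and the names `age_group_code`, `AGE_GROUP_LABELS`, and `sex_codes` are hypothetical, and nothing here reproduces the GSS93 file or the way SPSS computes agecat4.

```python
from collections import Counter
from statistics import mean

# Hypothetical ages in years, invented for illustration (not GSS93 data).
ages = [19, 24, 35, 38, 47, 52, 66, 71]

# Assumed coding scheme for five non-overlapping age groups.
AGE_GROUP_LABELS = {
    1: "18 to 30 years",
    2: "31 to 40 years",
    3: "41 to 50 years",
    4: "51 to 60 years",
    5: "Over 60",
}


def age_group_code(age):
    """Return the code of the age group that a whole-number age falls in."""
    if age < 18:
        raise ValueError("this sketch only covers ages of 18 and over")
    for code, upper_limit in ((1, 30), (2, 40), (3, 50), (4, 60)):
        if age <= upper_limit:
            return code
    return 5


# New "column" derived from the exact ages, in the spirit of agecat4.
age_groups = [age_group_code(age) for age in ages]

# The mean is computed on the exact ages, not on the group codes.
mean_age = mean(ages)

# Grouped values are summarized by frequencies expressed in percentages.
counts = Counter(age_groups)
percentages = {
    code: 100 * counts[code] / len(ages) for code in AGE_GROUP_LABELS
}

for code, label in AGE_GROUP_LABELS.items():
    print(f"{code}  {label:<15} {percentages[code]:5.1f}%")
print("Mean age (from exact ages):", mean_age)

# Nominal codes: the "mean" changes when the arbitrary codes are swapped,
# which shows that it does not measure anything about the respondents.
sex_codes = [1, 2, 2, 1, 2]                        # 1 Male, 2 Female
swapped_codes = [3 - code for code in sex_codes]   # 1 Female, 2 Male
print("Mean of codes:", mean(sex_codes), "after swapping:", mean(swapped_codes))
```

Keeping the exact ages and deriving the grouped column from them means that both kinds of summary stay available: the mean from the exact values, and the percentages from the groups.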
The numbers used to record the information are simply codes.

Quantitative variables are usually measured by a numerical scale, but they could be measured at the ordinal level also. For instance, if you have the annual income of an individual, you may treat it as a numerical scale, but you could also group the values into Low, Medium and High income and treat the variable at the ordinal level.

When you perform a statistical analysis of data, it is very important to pay attention to the level of measurement of each variable. Some statistical computations are appropriate only to a given level of measurement, and should not be performed if the variable is measured at a different level.

Concepts, Dimensions, and Indicators

We often want to observe social phenomena that are too abstract and complex to be expressed by a single variable. Suppose for instance that we want to observe and measure the degree of religious inclination (or the tendency of a person towards religion) in a given social group. Religious inclination can be manifested in many ways: people may have or not have certain beliefs about their religion; they may also perform or not perform certain rituals such as attending religious services, fasting, praying, etc.; they may also seek the advice of the religious leadership on important decisions, or ignore such leadership; finally, they may seek to look at everything from the point of view of religion, and apply the teachings of their religion in their daily lives, or ignore them. All these aspects are not found all the time in all individuals. Some individuals may have strong beliefs, while avoiding the religious services. Others may attend all services while being skeptical about some of the religious dogma.
The way to handle this complexity is to subdivide the concept of religious inclination into dimensions, which are themselves measured by several indicators. If we were to study religious inclination in the Catholic religion, we would get a set of dimensions and indicators that would look as in Table 1.6 (we are simplifying the issues a little, of course).

Table 1.6 Example of how a concept can be broken down into dimensions and indicators

Concept                 Dimensions        Indicators
RELIGIOUS INCLINATION   I. Beliefs        Belief in God
                                          Belief in the Holy Trinity
                                          Belief in the main dogma, etc.
                        II. Rituals       Attendance of services
                                          Performing prayers
                                          Baptizing children
                        III. Guidance     Asking the priest's advice
                                          Consulting the official opinions of the church on certain issues such as birth control, etc.
                        IV. Daily life    Being kind and generous to people
                                          Not cheating others in commercial transactions

The items listed on the right-hand side of Table 1.6 are indicators of the concept of religious inclination. None of them, taken alone, is a measure of religious inclination, but each of them constitutes one aspect of it. Indicators that are seen as similar are grouped together to form one dimension of the concept. And finally the various dimensions, taken together, capture the concept as a whole. This way of breaking down a complex concept into dimensions and indicators is called the operationalization of the concept.

As an illustration, we may want to see how economists operationalize the concept of cost of living. They estimate the average cost of most of the standard expenses a family of four is expected to incur. The various expenses are divided into main dimensions such as food, housing, transportation, education, and leisure. Each dimension is then subdivided into smaller dimensions, themselves subdivided further until indicators are reached. For instance, food is broken down as: meat, vegetables, milk products, etc.,
themselves subdivided into specific items such as tomatoes, lettuce, etc. Finally, for each of these indicators, the increase or decrease in the cost of living is measured against the corresponding cost in some year, called the base year. By combining these indicators, economists are able to measure how the cost of living has changed, on the average, for a family of four.

The way a concept is broken down, or operationalized, into dimensions and indicators depends on the theoretical framework adopted for a study. Researchers may not agree on how to operationalize a concept, and you will find in the literature different studies that operationalize concepts in completely different ways, because they rely on different theoretical frameworks.

Summary

Quantitative methods are procedures and techniques for collecting, organizing, describing, analyzing, and interpreting data. In this chapter we have learned the basic vocabulary used to talk about quantitative methods.

Data is organized into electronic data files with the help of statistical packages. A data file contains the values taken by a number of cases (which are the units of the population under study) over some variables. Every row represents a case, while every column represents a variable. The units in the data file usually form a sample, which is itself a subset of the whole population. Sometimes, the data file refers to the whole population.

The variables can be either qualitative or quantitative. The system used to record the information is called a measurement scale. There are three levels of measurement: nominal, ordinal and numerical (interval or ratio). The level of measurement of a variable will determine what statistical procedures can be performed, and what kind of graphs must be used to illustrate the data. When a concept is complex, it is not measured directly.
It is usually broken down into dimensions and indicators, which are then combined to provide a single measure.

The statistical procedures themselves fall into two broad categories: descriptive statistics and inferential statistics. Descriptive statistical techniques aim at describing the data by summarizing it, while inferential statistical techniques aim at generalizing to a whole population what has been observed on a sample.

Keywords

Students should be able to define and explain all the following terms:

• Data
• Data file
• Case
• Unit
• Quantitative methods
• Variable
• Variable label
• Sample
• Population
• Level of measurement
• Value
• Value label
• Nominal level
• Ordinal level
• Variable type
• Numerical level
• Interval scale
• Quantitative variable
• Qualitative variable
• Exhaustive categories
• Mutually exclusive categories
• Likert scale
• Continuous numerical scales
• Discrete numerical scales
• Codes
• Coding
• Codebook
• Statistics (the two meanings)
• Descriptive statistics
• Inferential statistics
• Dimensions of a concept
• Indicators of a concept
• Operationalization of a concept