INTERPRETING QUANTITATIVE DATA WITH SPSS

THE BASIC LANGUAGE OF STATISTICS

This chapter is an introduction to statistics and to quantitative methods. It explains the basic language used in statistics, the notion of a data file, the distinction between descriptive and inferential statistics, and the basic concepts of statistics and quantitative methods.

After studying this chapter, the student should know:
• the basic vocabulary of statistics and of quantitative methods;
• what an electronic data file looks like, and how to identify cases and variables;
• the different uses of the term 'statistics';
• the basic definition of descriptive and inferential statistics;
• the type of variables and of measurement scales;
• how concepts are operationalized with the help of indicators.

Introduction: Social Sciences and Quantitative Methods

Social sciences aim at studying social and human phenomena as rigorously as possible. This involves describing some aspect of the social reality, analyzing it to see whether logical links can be established between its various parts, and, whenever possible, predicting future outcomes. The general objective of such studies is to understand the patterns of individual or collective behavior, the constraints that affect it, and the causes and explanations that can help us understand our societies and ourselves better and predict the consequences of certain situations. Such studies are never entirely objective, as they are inevitably based on certain assumptions and beliefs that cannot be demonstrated. Our perceptions of social phenomena are themselves subjective to a large extent, as they depend on the meanings we attribute to what we observe. Thus, we interpret social and human phenomena much more than we describe them, but we try to make that interpretation as objective as possible.

Some of the phenomena we observe can be quantified, which means that we can translate into numbers some aspects of our observations. For instance, we can quantify population change: we can count how many babies are born every year in a given country, how many people die, and how many people migrate in or out of the country. Such figures allow us to estimate the present size of the population, and maybe even to predict how this size is going to change in the near future. We can quantify psychological phenomena such as the degree of stress or the rapidity of response to a stimulus; demographic phenomena such as population sizes or sex ratios (the ratio of men to women); geographic phenomena such as the average amount of rain over a year or over a month; economic phenomena such as the unemployment rate; we can also quantify social phenomena such as the changing patterns of marriage or of unions, and so on.

When a social or human phenomenon is quantified in an appropriate way, we can ground our analysis of it on figures, or statistics. This allows us to describe the phenomenon with some accuracy, to establish whether there are links between some of the variables, and even to predict the evolution of the phenomenon. If the observations have been conducted on a sample (that is, a group of people smaller than the whole population), we may even be able to generalize to the whole population what we have found on a sample.

When we observe a social or human phenomenon in a systematic, scientific way, the information we gather about it is referred to as data. In other words, data is information that is collected in a systematic way, and organized and recorded in such a way that it can be interpreted correctly. Data is not collected haphazardly, but in response to some questions that the researchers would like to answer. Sometimes, we collect information (that is, data) about a character or a quality, such as the mother tongue of a person.
Sometimes, the data is something measurable with numbers, such as a person's age. In both cases, we can treat this data numerically: for instance, we can count how many people speak a certain language, or we can find the average age of a group of people. The procedures and techniques used to analyze data numerically are called quantitative methods. In other words, quantitative methods are procedures and techniques used to analyze data numerically; they include a study of the valid methods used for collecting data in the first place, as well as a discussion of the limits of validity of any given procedure (that is, an understanding of the situations when a given procedure yields valid results), and of the ways the results are to be interpreted.

This book constitutes an introduction to quantitative methods for the social sciences. The first chapter covers the basic vocabulary of quantitative methods. This vocabulary should be mastered by the student if the remainder of the book is to be understood properly.

Data Files

The first object of analysis in quantitative methods is a data file, that is, a set of pieces of information written down in a codified way. Figure 1.1 illustrates what an electronic data file looks like when we open it with the SPSS program.

Figure 1.1 The Data window in SPSS version 10.1. © SPSS. Reprinted with permission.

This data file was created by the statistical software package SPSS Version 10.1, which will be used in this course. The first lab in the second part of this manual will introduce you to SPSS, which stands for Statistical Package for the Social Sciences. On the top of the window, you can read the name of the data file: GSS93 subset. This stands for Subset of the General Social Survey, a survey conducted in the USA in 1993.
When we open an SPSS data file, two views can be displayed: the Data View or the Variable View. Both views are part of the same file, and one can switch from one view to the other by clicking on the tab at the bottom left of the window.

The Data View

The information in this data view is organized in rows and columns. Each row refers to a case, that is, all the information pertaining to one individual. Each column refers to a variable, that is, a character or quality that was measured in this survey. For instance, the second column is a variable called wrkstat, and the third is a variable called marital. But what are the meanings of all these numbers and words? A data file must be accompanied by information that allows a reader to interpret (that is, understand) the meanings of the various elements in it. This information constitutes the codebook. In SPSS, we can find the information of the codebook by clicking the word Variables... under the Utilities menu. We get a window listing all the variables contained in this data file. By clicking once on a variable, we see the information pertaining to this variable: the short name that stands on the top of the column; what the name stands for (the label of the variable); the numerical type of the variable (that is, how many digits are used, and whether it includes decimals); other technical information to be explained later; and the Value Labels, that is, what each number appearing in the data sheet stands for.

Figure 1.2 The Variables window in SPSS. The codes and value labels of the variable Marital Status are shown
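To make the vocabulary of cases, variables and codebook concrete, here is a minimal sketch in Python. It does not reproduce how SPSS stores a data file. The four cases, the `id` and `age` variables and the helper `show_labels` are invented for illustration; only the variable name `marital`, its label Marital Status and the codes 1, 2 and 3 come from the text's description of Figure 1.2.

```python
# A tiny, made-up data file: each dict is a case (a row),
# and each key is a variable (a column).
data_file = [
    {"id": 1, "age": 43, "marital": 1},
    {"id": 2, "age": 71, "marital": 2},
    {"id": 3, "age": 35, "marital": 3},
    {"id": 4, "age": 52, "marital": 1},
]

# A minimal codebook: for each variable, its label and its value labels.
# Only the three marital codes described in the text are included.
codebook = {
    "age": {"label": "Age of Respondent", "values": {}},
    "marital": {
        "label": "Marital Status",
        "values": {1: "married", 2: "widowed", 3: "divorced"},
    },
}


def show_labels(case, codebook):
    """Return a copy of a case with codes replaced by value labels where defined."""
    decoded = {}
    for variable, value in case.items():
        values = codebook.get(variable, {}).get("values", {})
        # Keep the raw value when the codebook has no label for it.
        decoded[variable] = values.get(value, value)
    return decoded


labelled = [show_labels(case, codebook) for case in data_file]
print(labelled[1])  # {'id': 2, 'age': 71, 'marital': 'widowed'}
```

Reading a row of `labelled` rather than `data_file` is loosely analogous to displaying value labels instead of codes in the Data View.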
Figure 1.3 The Data View window in SPSS when the Show Labels command is ticked in the View menu. The value labels are displayed rather than the codes

Figure 1.2 shows the codes used for the variable Marital Status. You may have noticed that:
1 stands for married
2 stands for widowed
3 stands for divorced
etc.

The numbers 1, 2, 3, etc. are the codes, and the terms married, widowed, divorced, etc. are the value labels that correspond to the various codes. The name marital, which appears at the top of the column, is the variable name. Marital Status is the variable label: it is a usually longer, detailed name for the variable. When we print tables or graphs, it is the variable labels and the value labels that are printed.

Figure 1.4 The Variable View window in SPSS. The variables are listed in the rows, and their properties are displayed

Figure 1.5 The Value Labels window in SPSS. In this window it is possible to add new codes and their corresponding value labels, or to modify or delete existing ones

There is a way of showing the value labels instead of the codes. This is done by clicking Value Labels under the View menu. The Data View window looks now as shown in Figure 1.3. We can see that case number 4, for example, is a person who works part time, and who has never been married. To understand the precise meaning of the numbers written in the other cells, we should first read the variable information found in the codebook for each of the variables.

In version 10.0 and version 11 of SPSS, you can read the information pertaining to the variables in the Variable View. By clicking on the tab for Variable View, you get the window shown in Figure 1.4.

In the Variable View, no data is shown. You can see, however, all the information pertaining to the variables themselves, each variable being represented by a line.
The various variable names are listed in the first column, and each is followed by information about the corresponding variable: the way it is measured and recorded, its full name, the values and their codes, etc. All these terms will be explained in detail later on. The label, that is, the long name of the variable marital, is Marital Status. By clicking on the Values cell for the variable marital, the window shown in Figure 1.5 pops up. We can see again the meanings of the codes used to designate the various marital statuses.

We can now raise a number of questions: How did we come up with this data? What are the rules for obtaining reliable data that can be interpreted easily? How can we analyze this data? Table 1.1 includes a systematic list of such questions. The answers to these questions will be found in the various chapters and sections of this manual.

Table 1.1 Some questions that arise when we want to use quantitative methods

Questions: How did we come up with this data? What are the questions we are trying to answer? What is the place of quantitative analysis in social research, and how does it link up with the qualitative questions we may want to ask? What is the scientific way of defining concepts and operationalizing them?
Chapters: 1. The Basic Language of Statistics

Questions: How do we conduct social research in a scientific way? What procedures should we follow to ensure that results are scientific? What are the basic types of research designs? How do we go about collecting the data?
Chapters: 2. The Research Process

Questions: Once collected, the data must be organized and described. How do we do that? When we summarize the data, what are the characteristics that we focus on? What kind of information is lost? What are the most common types of shapes and distributions we encounter?
Chapters: 3. Univariate Descriptive Statistics; 5. Normal Distributions

Questions: What are the procedures for selecting a sample? Are some of them better than others?
Chapters: 6. Sampling Designs

Questions: Some institutions collect and publish a lot of social data. Where can we find it? How do we use it?
Chapters: 7. Statistical Databases

Questions: Sometimes we notice coincidences in the data: for instance, those who have a higher income tend to behave differently on some social variables than those who do not. Is there a way of describing such relationships between variables, and drawing their significance?
Chapters: 8. Statistical Association

Questions: Sometimes the data comes from a sample, that is, a part of the population, and not the whole population. Can we generalize our conclusions to the whole population on the basis of the data collected on a sample? How can this be done? Is it precise? What are the risks that our conclusions are wrong?
Chapters: 9. Statistical Inference: Estimation; 10. Statistical Inference: Hypothesis Testing

The Discipline of Statistics

The term statistics is used in two different meanings: it can refer to the discipline of statistics, or it can refer to the actual data that has been collected.

As a scientific discipline, the object of statistics is the numerical treatment of data that pertain to a large quantity of individuals or a large quantity of objects. It includes a general, theoretical aspect which is very mathematical, but it can also include the study of the concrete problems that are raised when we apply the theoretical methods to specific disciplines. The term quantitative methods is used to refer to methods and techniques of statistics which are applied to concrete problems. Thus, the difference between statistics and quantitative methods is that the latter include practical concerns such as finding solutions to the problems arising from the collection of real data, and interpreting the numerical results as they relate to concrete situations. For instance, proving that the mean (or average) of a set of values has certain mathematical properties is part of statistics.
Deciding that the mean is an appropriate measure to use in a given situation is part of quantitative methods. But the line between statistics and quantitative methods is fuzzy, and the two terms are sometimes used interchangeably. In practice, the term statistics is often used to mean quantitative methods, and we will use it in that way too.

The term statistics also has a different meaning: it is used to refer to the actual data that has been obtained by statistical methods. Thus, we will say for instance that the latest statistics published by the Ministry of Labor indicate a decrease in unemployment. In that last sentence, the word statistics was used to refer to data published by the Ministry.

Populations, Samples, and Units

Three basic terms must be defined to explain the subject matter of the discipline of statistics:
• unit (or element, or case),
• population, and
• sample.

A unit (sometimes called element, or case) is the smallest object of study. If we are conducting a study on individuals, a unit is an individual. If our study were about the health system (we may want to know, for instance, whether certain hospitals are more efficient than others), a unit for such a study would be a hospital, not a person.

A population is the collection of all units that we wish to consider. If our study is about the hospitals in Quebec, the population will consist of all hospitals in Quebec. Sometimes, the term universe is used to refer to the set of all individuals under consideration, but we will not use it in this manual.

Most of the time, we cannot afford to study each and every unit in a population, due to the impossibility of doing so or to considerations of time and cost. In this case, we study a smaller group of units, called a sample. Thus, a sample is any subset (or subgroup) of our population.

Figure 1.6 The discipline of statistics and its two branches, descriptive statistics and inferential statistics

DESCRIPTIVE STATISTICS
It aims at summarizing the data. Some of the information is lost as a result. A good summary captures the essential aspects of the data and the most relevant ones.
• MEASURES OF CENTRAL TENDENCY. They answer the question: What are the values that represent the bulk of the data in the best way? Mean, median, mode.
• MEASURES OF DISPERSION. They answer the question: How spread out is the data? Is it mostly concentrated around the center, or spread out over a large region? Standard deviation, variance, range.
• MEASURES OF POSITION. They answer the question: How is one individual entry positioned with respect to all the others? Percentiles, deciles, quartiles.
• MEASURES OF ASSOCIATION. They answer the question: If we know the score of an individual on one variable, to what extent can we successfully predict how he is likely to score on the other variable? Correlation coefficient (r).

INFERENTIAL STATISTICS
It aims at drawing conclusions about a population when a sample is given. The inference always implies a margin of error and a probability of error. Inferences based on representative samples have a higher chance of being correct. A random sample is more likely to be representative.
• ESTIMATION. It is based on the distinction between sample and population. It consists in guessing the value of a measure on a population (i.e. a parameter) when only the value on the sample (the statistic) is known. Opinion polls are always based on estimation, and their results are given with a margin of error.
• HYPOTHESIS TESTING. It is also based on the distinction between sample and population, but the process is reversed: we make a hypothesis about a population parameter. On that basis, we predict a range of values a variable is likely to take on a representative sample. Then we go and measure the sample. If the observed value falls within the predicted range, we conclude that the hypothesis is reasonable. If the observed value falls outside the predicted range, we reject our hypothesis.

The distinction between sample and population is absolutely fundamental. Whenever you are doing a computation, or making any statement, it must be clear in your mind whether you are talking about a sample (a group of units generally smaller than the population) or about the whole population.

The discipline of statistics includes two main branches:
• descriptive statistics, and
• inferential statistics.

The following paragraphs explain what each branch is about. Refer also to Figure 1.6. Some of the terms used in the diagram may not be clear for now, but they will be explained as we progress.

Descriptive Statistics

The methods and techniques of descriptive statistics aim at summarizing large quantities of data by a few numbers, in a way that highlights the most important numerical features of the data. For instance, if you say that your average GPA (grade point average) in secondary schooling is 3.62, you are giving only one number that gives a pretty good idea of your performance during all your secondary schooling. If you also say that the standard deviation (this term will be explained later on) of your grades is 0.02, you are saying that your marks are very consistent across the various courses. A standard deviation of 0.1 would indicate a variability that is 5 times bigger, as we will learn later on. You do not need to give the detailed list of your marks in every exam of every course: the average GPA is a sufficient measure in many circumstances. However, the average can sometimes be misleading. When is the average misleading? Can we complement it by other measures that would help us have a better idea of the features of the data we are summarizing? Such questions are part of descriptive statistics.
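The GPA example can be illustrated with a short sketch in Python. The two lists of grades below are invented for illustration and are not taken from the text. Since the standard deviation is only defined in a later chapter, the choice of `statistics.pstdev` (which treats the grades as the whole set of interest) is an assumption made here. The point is simply that two sets of marks can share the same average while differing widely in consistency.

```python
import statistics

# Hypothetical grades for two students; both lists are invented for illustration.
consistent = [3.60, 3.62, 3.64, 3.62]
varied = [3.32, 3.62, 3.92, 3.62]

for name, grades in [("consistent", consistent), ("varied", varied)]:
    mean = statistics.mean(grades)
    # pstdev treats the grades as the whole population of interest (an assumption).
    spread = statistics.pstdev(grades)
    print(f"{name}: mean = {mean:.2f}, standard deviation = {spread:.3f}")
```

Both students have an average of 3.62, but the second set of grades has a much larger standard deviation, which the average alone would hide.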
Descriptive statistics include measures of central tendency, measures of dispersion, measures of position, and measures of association. They also include a description of the general shape of the distribution of the data. These terms will be explained in the corresponding chapters.

Inferential Statistics

Inferential statistics aim at generalizing a measure taken on a small number of cases that have been observed, to a larger set of cases that have not been observed. Using the terms explained above, we could reformulate this aim, and say that inferential statistics aim at generalizing observations made on a sample to a whole population. For instance, when pre-election polls are conducted, only one or two thousand individuals are questioned, and on the basis of their answers, the polling agency draws conclusions about the voting intentions of the whole population. Such conclusions are not very precise, and there is always a risk that they are completely wrong. More importantly, the sample used to draw such conclusions must be a representative sample, that is, a sample in which all the relevant qualities of the population are adequately represented. How can we ensure that a sample is representative? Well, we can't. We can only increase our chances of selecting a representative sample if we select it randomly. We will devote a chapter to sampling methods. Inferential statistics include estimation and hypothesis testing, two techniques that will be studied in Chapters 9 and 10.
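The polling example can be sketched as a small simulation in Python. Everything in it is hypothetical: the population of 100,000 voters, the 40% share for a party called "A", and the sample size of 1,000 are assumptions chosen for illustration, and a real poll would not know the population share in advance. The sketch only shows the idea of estimating a population value from a randomly selected sample.

```python
import random

random.seed(0)  # fixed seed so that the sketch is repeatable

# Hypothetical population: 100,000 voters, 40% of whom intend to vote for party A.
population = ["A"] * 40_000 + ["other"] * 60_000
true_share = population.count("A") / len(population)

# Simple random sample of 1,000 voters, drawn without replacement.
sample = random.sample(population, k=1_000)
estimate = sample.count("A") / len(sample)

print(f"share in the population: {true_share:.3f}")
print(f"share in the sample:     {estimate:.3f}")
```

The sample share will usually be close to the population share without matching it exactly, which is why such conclusions always carry a margin of error.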