Computing Descriptive Statistics

Standard Scores

The mean often serves as a convenient reference point to which individual observations are compared. Whenever you receive an examination back, the first question you ask is: How does my performance compare with the rest of the class? An initially dismal-looking score of 65% may turn stellar if that's the highest grade. Similarly, a usually respectable score of 80 loses its appeal if it places you in the bottom quarter of the class. If the instructor just tells you the mean score for the class, you can tell only whether your score is less than, equal to, or greater than the mean. You can't say how far it is from the average unless you also know the standard deviation.

For example, if the average score is 70 and the standard deviation is 5, a score of 80 is quite a bit better than the rest: it is two standard deviations above the mean. If the standard deviation is 15, the same score is not very remarkable: it is less than one standard deviation above the mean (10/15, or about 0.67).

You can determine the position of a case in the distribution of observed values by calculating what's known as a standard score, or z score. To calculate the standard score, first find the difference between the case's value and the mean, and then divide this difference by the standard deviation.

$$\text{standard score} = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \qquad \text{(Equation 4.5)}$$

A standard score tells you how many standard deviation units a case is above or below the mean. If a case's standard score is 0, the value for that case is equal to the mean. If the standard score is 1, the value for the case is one standard deviation above the mean. If the standard score is -1, the value for the case is one standard deviation below the mean. (For many types of distributions, including the normal distribution discussed in Chapter 10, most of the observed values fall within plus or minus two standard deviations of the mean.)
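As a rough illustration of Equation 4.5, the following Python sketch computes standard scores for the exam example above and for a small list of scores. The function names and the list of scores are hypothetical, chosen only for illustration; the sketch assumes the sample standard deviation (n - 1 divisor), which is what `statistics.stdev` computes.

```python
import statistics


def standard_score(value, mean, sd):
    """Equation 4.5: (value - mean) / standard deviation."""
    return (value - mean) / sd


def standard_scores(values):
    """Standard score of every value, using the values' own mean and
    sample standard deviation (n - 1 divisor, an assumption of this sketch)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [standard_score(v, mean, sd) for v in values]


# Exam example from the text: mean 70, score 80.
z_sd5 = standard_score(80, 70, 5)    # two standard deviations above the mean
z_sd15 = standard_score(80, 70, 15)  # less than one standard deviation above

# Hypothetical exam scores, not taken from any data file.
scores = [55, 65, 70, 75, 85]
z_scores = standard_scores(scores)

print(z_sd5, round(z_sd15, 2))
print([round(z, 2) for z in z_scores])
```

A score equal to the mean (70 in the hypothetical list) gets a standard score of 0, and scores below the mean get negative standard scores.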
The mean of the standard scores for a variable is always 0, and their standard deviation is 1. You can use the Descriptives procedure in SPSS to obtain standard scores for your cases and to save them as a new variable. Figure 4.3 shows the notes from the Descriptives procedure that indicate that a new variable, the standard score for age, has been created. In addition, the new variable, zage, has been saved in the Data Editor, containing the standard scores for age (see Figure 4.4).

Figure 4.3 Descriptive Statistics in the Viewer
[Notes table listing ZAGE under Variables Created or Modified]

Figure 4.4 Data Editor with standard scores saved as a new variable

You see that the first case has an age of 43. From the standard score, you know that the case has an age less than average, but not by much. The age for the case is less than a quarter of a standard deviation below the mean. The fifth case has an observed age of 78, which is almost two standard deviations above the mean.

Standard scores allow you to compare relative values of several different variables for a case. For example, if a person has a standard score of 2 for income and a standard score of -1 for education, you know that the person has a larger income than most and somewhat fewer years of education. You couldn't meaningfully compare the original values, since the variables all have different units of measurement, different means, and different standard deviations.

The Normal Distribution

• What is the normal distribution, and why is it important for data analysis?
• What does a normal distribution look like?
• What is a standard normal distribution?
• What is the Central Limit Theorem, and why is it important?

In Chapter 9, you learned how to evaluate a claim about the mean of a variable that has two possible values.
Using the binomial test, you calculated the probabilities of getting various sample results when the probability of a success was assumed to be known. In this chapter, you'll learn how to test claims about the mean of a variable that has more than two values. You'll also learn about the normal distribution and the important role it plays in statistics.

► This chapter examines data on serum cholesterol levels from the electric.sav data file. In addition, some figures use simulated data sets included in the file simul.sav. The histograms and output shown can be obtained using the SPSS Graphs menu (see Appendix A) and the Descriptives procedure (see Chapter 4).

The Normal Distribution

You may have noticed that the shapes of the two stem-and-leaf plots in Chapter 9 are similar. They look like bells (on their sides). The same data are displayed as histograms in Figure 10.1 and Figure 10.2, where a bell-shaped distribution with the same mean and variance as the data is superimposed. You can see that most of the values are bunched in the center. The farther you move from the center, in either direction, the fewer observations there are. The distributions are also more or less symmetric. That is, if you divide the distribution into two pieces at the peak, the two halves are very similar in shape, but mirror images of each other. (The theoretical bell distribution is perfectly symmetric.)

Figure 10.1 Simulated experiments: sample size 10
[Histogram; horizontal axis: Sample Mean. You can obtain histograms using the Graphs menu, as described in Appendix A.]

Many variables, such as blood pressure, weight, and scores on standardized tests, turn out to have distributions that are bell-shaped. For example, look at Figure 10.3, which is a histogram of cholesterol levels for a sample of 239 men enrolled in the Western Electric study (Paul et al., 1963).
Note that the shape of the distribution is very similar to that in Figure 10.2. That's a pretty remarkable coincidence, since Figure 10.2 is a plot of many sample means from a distribution that has only two values (1=cured, 0=not cured), while Figure 10.3 is a plot of actual cholesterol values.

Figure 10.3 Histogram of cholesterol values
[Histogram; horizontal axis: serum cholesterol. To obtain this histogram, open the electric.sav data file and select chol58 in the Histograms dialog box.]

The bell distribution that is superimposed on Figure 10.1, Figure 10.2, and Figure 10.3 is called the normal distribution. A mathematical equation specifies exactly the distribution of values for a variable that has a normal distribution. Consider Figure 10.4, which is a picture of a normal distribution that has a mean of 100 and a standard deviation of 15. The center of the distribution is at the mean. The mean of a normal distribution has the same value as the most frequently occurring value (the mode) and as the median, the value that splits the distribution into two equal parts.

Figure 10.4 A normal distribution
[Annotations: 68% of cases fall within one standard deviation of the mean; 95% of all cases fall within two standard deviations of the mean]

If a variable has exactly a normal distribution, you can calculate the percentage of cases falling within any interval. All you have to know are the mean and the standard deviation. Suppose that scores on IQ tests are normally distributed, with a mean of 100 and a standard deviation of 15, as was once thought to be true. In a normal distribution, 68% of all values fall within one standard deviation of the mean, so you would expect 68% of the population to have IQ scores between 85 (one standard deviation below the mean) and 115 (one standard deviation above the mean).
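The 68% figure can be checked numerically. The sketch below is only an illustration, not the SPSS procedure used in this book: it relies on Python's standard-library `statistics.NormalDist` class, and the helper name `proportion_between` is hypothetical. It assumes IQ scores are exactly normal with a mean of 100 and a standard deviation of 15.

```python
from statistics import NormalDist

# Assumed model: IQ scores exactly normal, mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)


def proportion_between(dist, low, high):
    """Proportion of a normal distribution falling between low and high."""
    return dist.cdf(high) - dist.cdf(low)


within_one_sd = proportion_between(iq, 85, 115)   # one SD either side of the mean
within_two_sd = proportion_between(iq, 70, 130)   # two SDs either side of the mean

print(f"within one standard deviation:  {within_one_sd:.1%}")
print(f"within two standard deviations: {within_two_sd:.1%}")
```

The computed proportions are slightly above the rounded values of 68% and 95% quoted in the text.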
Similarly, 95% of the values in a normal distribution fall within two standard deviations of the mean, so you would expect 95% of the population to have IQ scores between 70 and 130.

Since a normal distribution can have any mean and standard deviation, the location of a case within the distribution is usually given by the number of standard deviations it is above or below the mean. (Recall from Chapter 4 that this is called a standard score, or z score.) A normal distribution in which all values are given as standard scores is called a standard normal distribution. A standard normal distribution has a mean of 0 and a standard deviation of 1. For example, a person with an IQ of 100 would have a standard score of 0, since 100 is the mean of the distribution. Similarly, a person with an IQ of 115 would have a standard score of +1, since the score is one standard deviation (15 points) above the mean, while a person with an IQ of 70 would have a standard score of -2, since the score is two standard deviation units (30 points) below the mean.

Figure 10.5 The standard normal distribution
[Areas marked under the curve: 34%, 13.5%, 2.5%, and 47.5%]

Some of the areas in a standard normal distribution are shown in Figure 10.5. Since the distribution is symmetric, half of the values are greater than 0, and half are less. Also, the area to the right of any given positive score is the same as the area to the left of the same negative score. For example, 16% of cases have standardized scores greater than +1, and 16% of cases have standardized scores less than -1. Appendix D gives areas of the normal distribution for various standard scores. The exercises show you how to use SPSS to calculate areas in a normal distribution.

If you're more than two standard deviations from the me