NORMAL DISTRIBUTION AND STANDARDIZED NORMAL DISTRIBUTION
Week 5

- Mean, median, and mode measure the central tendency of a variable.
- Measures of dispersion include variance, standard deviation, range, and interquartile range (IQR).
- We can draw a histogram, a stem-and-leaf plot, or a box plot to see how a variable is distributed (interval/cardinal/continuous variables).
- We run various statistical tests to check to what extent our data correspond to a certain model.
- To do this, we need normally distributed variables.

Normal distribution
- Normal distribution = bell-shaped curve (Carl Friedrich Gauss, 18th-19th century).
- It is typical of a large number of biological or physical phenomena.
- It can also characterize some social phenomena.

COMMON ASSUMPTION: A RANDOM VARIABLE IS NORMALLY DISTRIBUTED!
If this assumption does not hold, interpretation and inference may not be reliable or valid.

[Figure 1. Normal distribution curve and its basic characteristics (σ)]
[Figure 2. Standardized normal distribution]

Why is it important for statistical analysis?
- The majority of values lie around the average and are symmetrically distributed: mean = median = mode.
- It has only one peak.
- We can calculate the percentage of values that fall within a given interval around the mean.
- It is just a model and a helpful instrument: a mathematical ideal.
- If we find that our variables are very close to being normally distributed, then we are lucky!

Assumptions of parametric data
- Normally distributed data: the data are assumed to come from a normally distributed population.
- Homogeneity of variance: the variance should not change systematically throughout the data.
- Interval data: variables should be measured at least at the interval level.
- Independence: data from different subjects are independent.

How to tell if a distribution is normal?
STEP 1: Run a histogram with a normal curve and check whether your variable looks normally distributed.
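As a rough numerical counterpart to the visual check in STEP 1, the sketch below bins a variable and compares each bin's observed count with the count that a normal curve with the same mean and SD would predict. It assumes Python 3.8+ (for `statistics.NormalDist`); `histogram_vs_normal` is a hypothetical helper, and the simulated ages are made up, not the EVS data. It illustrates the idea only and does not replace the SPSS histogram or a formal test.

```python
import random
from statistics import NormalDist, mean, stdev


def histogram_vs_normal(data, n_bins=10):
    """Return bin edges, observed counts, and the counts expected under a
    normal curve with the same mean and SD as the data (STEP 1, in numbers)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins  # assumes the values are not all identical
    edges = [lo + i * width for i in range(n_bins + 1)]

    observed = [0] * n_bins
    for x in data:
        # bins are [left, right); the maximum value is kept in the last bin
        i = min(int((x - lo) / width), n_bins - 1)
        observed[i] += 1

    fitted = NormalDist(mean(data), stdev(data))
    n = len(data)
    expected = [n * (fitted.cdf(edges[i + 1]) - fitted.cdf(edges[i]))
                for i in range(n_bins)]
    return edges, observed, expected


# Hypothetical simulated "age" values, NOT the EVS dataset
random.seed(1)
ages = [random.gauss(45, 15) for _ in range(1000)]

edges, observed, expected = histogram_vs_normal(ages)
for i in range(len(observed)):
    bar = "#" * (observed[i] // 10)  # one '#' per 10 cases
    print(f"{edges[i]:6.1f}-{edges[i + 1]:6.1f} {bar:<30} "
          f"observed={observed[i]:4d} expected={expected[i]:7.1f}")
```

Large, systematic gaps between the observed and expected columns correspond to what the eye looks for when comparing the SPSS histogram with its superimposed normal curve.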
In SPSS: ANALYZE > DESCRIPTIVE STATISTICS > FREQUENCIES (do not display frequency tables) > CHARTS > HISTOGRAMS (with normal curve).
Example: EVS dataset, variable age.
Alternatively, use P-P plots: Analyze > Descriptive Statistics > P-P Plots.

STEP 2: Examine the skewness and kurtosis statistics for the distribution. A normal distribution is symmetrical.
1. A normal distribution has zero skewness and zero (excess) kurtosis; values close to zero are consistent with, though not proof of, normality.
2. As a rule of thumb, if skewness is greater than 1 (in absolute value), the distribution is not normal.

[Figure 3. Probability distributions with different kurtosis]

Table 1 shows the relevant statistics for the variable age.

Caution: with large samples (N > 200), the tests for skewness and kurtosis give statistically significant values even for small deviations from normality, so these criteria for asymmetry should not be used with large samples (Field 2009, p. 139).

STEP 3: Use the Kolmogorov-Smirnov Z test.
- If the Kolmogorov-Smirnov Z test gives a significance level below 0.05, the distribution is probably not normal.
- In SPSS: ANALYZE > Descriptive Statistics > Explore > Plots > Normality plots with tests.

The table shows the results of the test: the Kolmogorov-Smirnov Z test indicates that this distribution is not normal.

But remember:
- No significance criterion should be relied on when we have large samples (N > 200).
- With large samples, statistically significant values are obtained even for very small deviations from normality (Field 2009, p. 139).
- If N < 50, use the Shapiro-Wilk test instead.
- Graphical options: normal Q-Q plots and detrended normal Q-Q plots (Explore > Plots > Normality plots with tests).

What to do when variables are not normally distributed?
1) Use non-parametric statistics (to be discussed later).
2) Transform variables using mathematical functions, e.g.
the log function.
3) Decide to ignore it when working with sufficiently large samples (at least 100-200 cases).

STANDARDIZED NORMAL DISTRIBUTION AND Z-SCORES: HOW TO CALCULATE AND USE THEM

Why are z-scores important?
- How do we compare bananas and oranges?
- Are you as good a student of French as you are of Sociology?
- How many people did better or worse than you on a test?
- When you analyze data: to compare scores within a sample or across variables.

You may be asked:
- What percentage of people fall below a given score?
- What is the relative standing of a score in one distribution versus another?
- What score or scores can be used to define an extreme or deviant situation?

Example
- Test results in SOC758: Student 1 = 66 points, but on its own we do not know what this means.
- If we know the mean, then we can say whether Student 1's result is better or worse than average.
- If we also know the result of another student, then we can locate both students relative to the whole distribution of results.
- For this, we need z-scores!
- To calculate them, we also need the SD.
- A z-score tells us how many SDs above or below the mean a particular case lies.

Example
- Student A = 66 points
- Student B = 81 points
- Mean = 70 points, SD = 5
- Student A: z = (66 - 70)/5 = -0.8
- Student B: z = (81 - 70)/5 = 2.2

In SPSS: Analyze > Descriptive Statistics > Descriptives > Save standardized values as variables.

Why do we need z-scores?
- Attributes are often measured using items with different upper and lower limits.
- The measures have different numbers of categories.
- It is difficult to compare across these variables!
- When creating multi-item scales, items with different lower and upper points contribute differently to the final score!

How to solve these problems?
- Convert each scale to have the same lower and upper limits,
- OR standardize the variables and express scores in standard deviation units: z-scores.
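The Student A/B example can be reproduced in a few lines. This is a minimal sketch assuming Python 3.8+; `z_score` and `standardize` are hypothetical helper names. `standardize` uses the sample mean and sample SD (n - 1), which we assume is what SPSS uses when saving standardized values. The percentage below each student comes from the standard normal cumulative distribution, so it is only meaningful if the test results are roughly normal.

```python
from statistics import NormalDist, mean, stdev


def z_score(x, mu, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mu) / sd


def standardize(values):
    """Convert every value to a z-score using the sample mean and sample SD."""
    mu, sd = mean(values), stdev(values)
    return [z_score(x, mu, sd) for x in values]


# Worked example from the slides: mean = 70 points, SD = 5
z_a = z_score(66, 70, 5)  # Student A
z_b = z_score(81, 70, 5)  # Student B

# Share of a normally distributed population scoring below each student
below_a = NormalDist().cdf(z_a)
below_b = NormalDist().cdf(z_b)

print(f"Student A: z = {z_a:.1f}, about {below_a:.1%} score lower")
print(f"Student B: z = {z_b:.1f}, about {below_b:.1%} score lower")
```

Student A lands at roughly the 21st percentile and Student B at roughly the 99th, which answers the "relative standing" question for these two scores.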
1. Convert each scale to have the same lower and upper limits
Formula:

$$Y = \frac{X - X_{\min}}{X_{\text{range}}} \cdot n$$

where
- Y: the new, adjusted variable
- X: the original variable to be adjusted
- Xmin: the minimum observed value of the original variable
- Xrange: the difference between the maximum and minimum observed values of the original variable
- n: the upper limit of the adjusted variable

Note: this formula maps the observed minimum to 0 and the observed maximum to n. For a 1-10 scale, use Y = 1 + [(X - Xmin)/Xrange] * 9.

Example: political involvement/orientation
- 4 variables:
  - V186: measured on a 4-point scale
  - V193: measured on a 10-point scale
  - V222: measured on a 4-point scale
  - V224: measured on a 10-point scale
- We want to convert them to a common 1-10 scale.
- This helps us compare scores and averages across them!

2. Standardize the variables and express scores in standard deviation units: z-scores
- A z-score expresses each person's score as the number of standard deviations it lies from the mean!
- A z-score reflects how many standard deviations above or below the population mean a score is.
- A normal distribution that is standardized is called the standard normal distribution, or the normal distribution of z-scores.
- It has a mean of 0 and an SD of 1.

How to calculate z-scores?
Here are the formulas for z-scores, z-skewness, and z-kurtosis:

$$z = \frac{X - \bar{X}}{S_x}, \qquad z_{\text{skewness}} = \frac{S - 0}{SE_{\text{skewness}}}, \qquad z_{\text{kurtosis}} = \frac{K - 0}{SE_{\text{kurtosis}}}$$

where
- X̄: the mean
- Sx: the standard deviation
- S, K: the skewness and kurtosis of the distribution
- SEskewness: the standard error of skewness
- SEkurtosis: the standard error of kurtosis

Things to know about the z-score:
- The z-score can be positive or negative.
- Positive is above the mean.
- Negative is below the mean.
- The mean of the z-scores is always zero.
- The SD of the z distribution = 1.

Does it matter if my dependent variable is normally distributed?
- YES.
- When running a t-test or ANOVA, the assumption is that the sampling distribution of the means is normal.
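The rescaling formula and the z-skewness/z-kurtosis formulas can be checked numerically. The sketch below assumes Python 3.8+, and all function names are hypothetical. For skewness, kurtosis, and their standard errors it uses the usual bias-adjusted formulas (G1, G2, and their standard errors), which we believe match what SPSS reports; verify against your own output. `rescale` follows Y = [(X - Xmin)/Xrange] * n with the observed minimum and range; the optional `lower` argument is our addition and shifts the result to start at 1, as the 1-10 example needs. A common reading is that |z| > 1.96 indicates a significant departure from 0 at p < .05, subject to the large-sample caveat from STEP 2.

```python
import math
from statistics import mean, stdev


def rescale(values, n=10, lower=0):
    """Y = [(X - Xmin) / Xrange] * n, shifted so the scale starts at `lower`.
    lower=0 is the slide formula (0..n); lower=1, n=10 gives 1..10."""
    x_min = min(values)
    x_range = max(values) - x_min  # observed range, as on the slide
    return [lower + (x - x_min) * (n - lower) / x_range for x in values]


def skewness(values):
    """Bias-adjusted sample skewness (G1); 0 for a symmetric distribution."""
    n, m, s = len(values), mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def kurtosis(values):
    """Bias-adjusted excess kurtosis (G2); 0 for a normal distribution."""
    n, m, s = len(values), mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def se_skewness(n):
    """Standard error of skewness for a sample of size n (n > 2)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of kurtosis for a sample of size n (n > 3)."""
    return 2 * se_skewness(n) * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))


def z_skew_kurt(values):
    """z = (statistic - 0) / standard error, for skewness and kurtosis."""
    n = len(values)
    return skewness(values) / se_skewness(n), kurtosis(values) / se_kurtosis(n)


# Hypothetical answers to a 4-point item (like V186), rescaled to 1..10
item = [1, 2, 2, 3, 4, 4, 4, 1]
print(rescale(item, n=10, lower=1))

z_s, z_k = z_skew_kurt(item)
print(f"z-skewness = {z_s:.2f}, z-kurtosis = {z_k:.2f}")
```

Because the standard errors shrink as N grows, the same skewness produces ever larger z-values in big samples, which is why these criteria are not recommended for N > 200.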