Descriptives, Crosstabs, Correlation
Methodology of Conflict and Democracy Studies
December 2

Aim of this lecture
•How to obtain basic information about your data
•Checking the assumptions
•Association of two variables:
  •Crosstabs (contingency tables)
  •Correlation

Descriptive Statistics
•Basic measures that summarize the characteristics of your data
•Various types:
  •Central tendency: mean, median, mode
  •Dispersion: standard deviation, variance, minimum, maximum
•Not all descriptives are suitable for all types of variables
•We use them to describe and explore the data
[Figure: same mean, different standard deviation]

How to Obtain Descriptives in SPSS
•Analyze > Descriptive Statistics > Frequencies
•Move the variables of interest to the right
•In ‘Statistics’ choose all the measures you require

Assumptions of Data
•Not all data are suitable for all statistical tests
•Parametric and non-parametric tests
•Parametric tests are preferred, but they place higher demands on the data

Parametric Data
1. Scale data (at least interval)
2. Independence
3. Normally distributed data
4. Homogeneity of variance

Independence
[Figure: https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Asch_experiment.svg/600px-Asch_experiment.svg.png]

Normal Distribution
[Figure: http://curvebank.calstatela.edu/gaussdist/normal.jpg]

Skewness

Kurtosis

How to Check the Distribution
•Visual check: histogram
•Calculation of skewness and kurtosis
•Statistical tests:
  •Kolmogorov-Smirnov
  •Shapiro-Wilk

Histogram
•Analyze > Descriptive Statistics > Frequencies
•In ‘Charts’ choose ‘Histogram’
•Select ‘Show normal curve on histogram’ to draw a curve corresponding to the normal distribution

Skewness and Kurtosis
•Analyze > Descriptive Statistics > Frequencies
•In ‘Statistics’ choose these two options
•The raw values are only indicative: divide each by its standard error
•Acceptable values of the result:
  •Small samples: between -1.96 and 1.96
  •Medium samples: between -2.58 and 2.58
  •Large samples: do not use this check
•Skewness: -0.020 / 0.045 = -0.44
•Kurtosis: 0.279 / 0.097 = 2.88

Statistical Tests
•Kolmogorov-Smirnov (Shapiro-Wilk)
•Both test the null hypothesis that your data are normally distributed
•Results:
  •Significant (p <= 0.05): we reject the null hypothesis
  •Not significant (p > 0.05): we keep the null hypothesis
•With large samples the tests tend to return significant results even for negligible deviations from normality

Statistical Tests
•Analyze > Descriptive Statistics > Explore
•Place the variable of interest into ‘Dependent List’
•In ‘Plots’ select ‘Normality plots with tests’

Homogeneity of Variance
•The assumption that the variances are equal across the levels of the data
•The levels are defined by another (categorical) variable
•We use only a single test for this assumption:
  •Levene test

Homogeneity of Variances

Levene Test
•Tests the null hypothesis that the variances are equal
•Results:
  •Significant (p <= 0.05): we reject the null hypothesis
  •Not significant (p > 0.05): we keep the null hypothesis
•With large samples the test tends to return significant results even for negligible differences in variance

Levene Test
•Analyze > Descriptive Statistics > Explore
•Place the variable of interest into ‘Dependent List’
•Place the second variable, which defines the levels of the data, into ‘Factor List’
•In ‘Plots’ select ‘Spread vs Level with Levene Test’ and ‘Untransformed’

Association of Two Variables
•Depends on the types of variables
•Crosstabs:
  •Suitable for two categorical variables
  •A small number of categories in each variable (but at least two per variable)
•Correlation:
  •Two scale variables, scale and ordinal, two ordinal variables
  •Specific case: scale and binary variable

Crosstabs
•Contingency tables
•Describe the interaction of two categorical variables
•Example: age group vs. turnout in an election (yes/no)
•Allow generalization to the population

Crosstabs
•Analyze > Descriptive Statistics > Crosstabs
•Select variables for Columns and Rows
•Features:
  •Cells: counts, percentages, residuals
  •Statistics: Chi-square, Cramer’s V
•Try not to fill your crosstab with too many features

[Table: Counts: Observed]
[Table: Counts: Observed; Percentages: Row]
[Table: Counts: Observed; Percentages: Column]
[Table: Counts: Observed + Expected]

Counts: Observed; Percentages: Row
•Younger people do not vote to the same extent as older people
•But can we apply this to the whole population?

Chi-square, Cramer’s V
•There is a relationship between age and turnout, and it applies to the population
•But is it okay to end the analysis at this point? Can we find out more?

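The crosstab statistics named above (expected counts, Chi-square, Cramer’s V) can also be computed by hand. The following is a minimal sketch in plain Python, not the SPSS procedure itself. The counts in `observed` are hypothetical numbers invented for illustration, and the function names are our own.

```python
import math

# Hypothetical counts, invented for illustration only:
# rows = age group (younger, older), columns = turnout (voted, did not vote).
observed = [
    [30, 70],
    [60, 40],
]


def expected_counts(table):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def cramers_v(table):
    """Cramer's V: sqrt(chi-square / (n * (min(rows, columns) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))


chi2 = chi_square(observed)
v = cramers_v(observed)
print(f"chi-square = {chi2:.2f}, Cramer's V = {v:.2f}")
```

For a 2 x 2 table there is one degree of freedom, and the critical Chi-square value at p = 0.05 is 3.84, so a statistic above that value would be reported as significant. The sketch deliberately leaves out the exact p-value, which SPSS prints in the Chi-square output.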
[Table: Counts: Observed + Expected; Residuals: Unstandardized]
[Table: Counts: Observed + Expected; Residuals: Adjusted standardized]
[Table: Counts: Observed + Expected; Residuals: Adjusted standardized; Chi-square, Cramer’s V]

Why Not Make It Too Complicated?

Correlation
•Association between two variables (for cases other than crosstabs)
•Examples: two scale variables, scale and ordinal, two ordinal variables
•Three coefficients:
  •Pearson
  •Spearman
  •Kendall

Correlation
•Results vary on a scale between -1 and 1
•Interpretation:
  •Zero means no association between the variables
  •The farther the value is from zero, the stronger the association, regardless of direction (negative or positive)
  •-1: perfect negative association
  •1: perfect positive association
•Beware of a false absence of association
•It is always good to visualize the data before calculating correlations

Pearson’s Correlation Coefficient
•Parametric operation
•Requirements:
  •Scale data (exception: scale and binary)
  •If we aim to apply the findings to the population, we need normally distributed data (or a large sample)
•Sensitive to outliers

Pearson’s Correlation Coefficient
•Visualize the data
  •Graphs > Chart Builder
  •Select Scatter/Dot and the variables of interest
•Correlation
  •Analyze > Correlate > Bivariate
  •Select the variables and the proper coefficient (Pearson is selected by default)
  •For significance select ‘Flag significant correlations’
[Figure: correlation]

Pearson’s Correlation Coefficient
•Scale variable and binary variable
•Works the same as for two scale variables
•Beware of the coding of the binary variable (you provide the codes for each value)

Non-Parametric Correlation
•Spearman’s Rho and Kendall’s Tau
•Correlation for cases other than two scale variables (or scale and binary)
•Same interpretation as for Pearson’s coefficient
•Prefer Kendall’s Tau when the variables have fewer categories and for smaller samples
•Analyze > Correlate > Bivariate
•Select the variables and Spearman/Kendall
•For significance select ‘Flag significant
correlations’

Interpretation
•Correlation does not imply causality
  •No control for other variables
  •No independent and dependent variable
•You cannot conclude that one variable affects the other, even when such a relationship seems meaningful and logical
•Leave the interpretation of the effects of IVs on the DV to regression analysis
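
To make the difference between the Pearson and Spearman coefficients from the correlation slides concrete, here is a minimal sketch in plain Python. It is an illustration, not what SPSS runs internally: the data in `x` and `y` are hypothetical, the function names are our own, and Spearman’s Rho is computed as Pearson’s coefficient applied to ranks (ties receive the average rank).

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's Rho: Pearson's r computed on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because `y` always increases with `x`, the ranks agree perfectly and Spearman’s Rho reaches 1, while Pearson’s coefficient stays slightly below 1 because the relationship is not linear. Neither number says anything about which variable affects the other.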