Descriptives, Crosstabs, Correlation
Methodology of Conflict and Democracy Studies
November 28

Aim of this lecture
• How to obtain basic information about your data
• Checking the assumptions
• Association of two variables:
  • Crosstabs (contingency tables)
  • Correlation

Descriptive Statistics
• Basic measures that summarize the characteristics of your data
• Various types:
  • Central tendency – mean, median, mode
  • Dispersion – variance, standard deviation, minimum, maximum
• Not all descriptives are suitable for all types of variables (e.g. the mean of a nominal variable is meaningless)
• We use them to describe and explore our data

[Figure: distributions with the same mean but different standard deviations]

How to Obtain Descriptives in SPSS
• Analyze > Descriptive Statistics > Frequencies
• Move the variables of interest to the right
• In 'Statistics', choose all the measures you require

Assumptions of Data
• Not all data are suitable for all statistical tests
• Parametric and non-parametric tests
• Parametric tests are preferred, but they place higher demands on the data (e.g. a normal distribution)

Normal Distribution
[Figure: normal distribution curve, http://curvebank.calstatela.edu/gaussdist/normal.jpg]

How to Check the Distribution
• 1) Visual check – histogram
• 2) Statistical tests:
  • Kolmogorov-Smirnov
  • Shapiro-Wilk

1) Histogram
• Analyze > Descriptive Statistics > Frequencies
• In 'Charts', choose 'Histogram'
• Select 'Show normal curve on histogram' to overlay the curve of a normal distribution

2) Statistical Tests
• Kolmogorov-Smirnov (Shapiro-Wilk)
• Both test the null hypothesis that your data are normally distributed
• Results:
  • Significant (p ≤ 0.05) – we reject the null hypothesis: the data are not normally distributed
  • Not significant (p > 0.05) – we do not reject the null hypothesis: the data can be treated as normally distributed
• With large samples, the tests tend to flag even small deviations from normality that do not matter in practice as significant → rely on the histogram instead

How to read the significance in SPSS outputs

SPSS output    Significance
,900           10 %
,750           25 %
,500           50 %
,200           80 %
,100           90 %
,050           95 %
,010           99 %
,001           99.9 %
,000           > 99.9 %

Significance = (1 – SPSS output) * 100
Example: (1 – 0.234) * 100 = 0.766 * 100 = 76.6 %

2) Statistical Tests in SPSS
• Analyze > Descriptive Statistics > Explore
• Place the variable of interest into 'Dependent List'
• In 'Plots', select 'Normality plots with tests'

Association of Two Variables
• Depends on the types of variables
• Crosstabs:
  • Suitable for two categorical variables
  • Few categories in each variable (but at least two per variable)
• Correlation:
  • Two scale variables, scale and ordinal, two ordinal variables
  • Special case – scale and binary variable

Crosstabs
• Contingency tables
• Describe the relationship between two categorical variables
• Example: age groups of people v. turnout in elections (yes/no)
• Allow generalization to the population (using the chi-square test)

Crosstabs
• Analyze > Descriptive Statistics > Crosstabs
• Select variables for Columns and Rows
• Features:
  • Cells – counts, percentages, residuals
  • Statistics – Chi-square (significance), Cramer's V (strength of the association, from 0 to 1)
• Try not to fill your crosstab with too many features

[SPSS output – Counts: Observed]
[SPSS output – Counts: Observed; Percentages: Row]
[SPSS output – Counts: Observed; Percentages: Column]
[SPSS output – Counts: Observed; Percentages: Row]
• Younger people do not vote to the same extent as older people
• But can we apply this to the whole population?

Chi-square, Cramer's V
• There is a relationship between age and turnout, and it applies to the population
• But is it okay to end the analysis at this point? Can we find out more?

[SPSS output – Counts: Observed + Expected]
[SPSS output – Counts: Observed + Expected; Residuals: Unstandardized]
[SPSS output – Counts: Observed + Expected; Residuals: Adjusted standardized]
[SPSS output – Counts: Observed + Expected; Residuals: Adjusted standardized; Chi-square, Cramer's V]
• Adjusted standardized residuals larger than 1.96 in absolute value mark cells whose observed count differs significantly (at roughly the 5 % level) from the expected count

Why Not Make It Too Complicated?
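To see what SPSS computes when you tick 'Chi-square', 'Cramer's V' and 'Adjusted standardized' residuals, here is a minimal sketch in Python using only the standard library. The age/turnout counts are hypothetical, the function names are illustrative, and the p-value helper assumes an integer number of degrees of freedom. It computes the Pearson chi-square without continuity correction; if SciPy is available, scipy.stats.chi2_contingency is a ready-made alternative (it applies Yates' correction to 2 x 2 tables by default).

```python
import math


def chi2_sf(x, df):
    """Upper-tail p-value of the chi-square distribution for integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) for the
    regularized upper incomplete gamma function, with a = df / 2, y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def crosstab_stats(table):
    """Pearson chi-square, Cramer's V and adjusted standardized residuals
    for an r x c table of observed counts (a list of rows)."""
    rows, cols = len(table), len(table[0])
    row_tot = [sum(r) for r in table]
    col_tot = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    n = sum(row_tot)

    # Expected count under independence: row total * column total / N
    expected = [[row_tot[i] * col_tot[j] / n for j in range(cols)]
                for i in range(rows)]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(rows) for j in range(cols))
    df = (rows - 1) * (cols - 1)
    cramers_v = math.sqrt(chi2 / (n * (min(rows, cols) - 1)))

    # Adjusted standardized residual:
    # (O - E) / sqrt(E * (1 - row share) * (1 - column share))
    adj_resid = [[(table[i][j] - expected[i][j])
                  / math.sqrt(expected[i][j] * (1 - row_tot[i] / n)
                              * (1 - col_tot[j] / n))
                  for j in range(cols)] for i in range(rows)]
    return {"chi2": chi2, "df": df, "p": chi2_sf(chi2, df),
            "cramers_v": cramers_v, "expected": expected,
            "adj_resid": adj_resid}


# Hypothetical counts: rows = age groups, columns = turnout (yes, no)
age_groups = ["18-29", "30-49", "50+"]
turnout = [[40, 60],
           [70, 50],
           [90, 30]]

res = crosstab_stats(turnout)
print(f"chi2 = {res['chi2']:.2f}, df = {res['df']}, p = {res['p']:.2g}")
print(f"Cramer's V = {res['cramers_v']:.3f}")
for label, resid in zip(age_groups, res["adj_resid"]):
    # |residual| > 1.96 marks a cell deviating from independence at about the 5 % level
    flags = ["*" if abs(r) > 1.96 else " " for r in resid]
    print(label, [f"{r:+.2f}{f}" for r, f in zip(resid, flags)])
```

With these made-up counts the chi-square is about 27.6 (df = 2, p < 0.001) and Cramer's V is about 0.28. The adjusted residuals then show where the association comes from: the youngest group votes less and the oldest group more than expected under independence, while the middle group does not deviate noticeably.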
Correlation
• Association between two variables (for cases other than crosstabs)
• Examples: two scale variables, scale and ordinal, two ordinal variables
• Three coefficients:
  • Pearson
  • Spearman
  • Kendall

Correlation
• Results range from -1 to 1
• Interpretation:
  • Zero means no (linear) association between the variables
  • The further from zero, the stronger the association (regardless of the direction – negative or positive)
  • -1: perfect negative association
  • 1: perfect positive association
• Beware of a false absence of association: a coefficient close to zero can hide a strong non-linear (e.g. U-shaped) relationship
• Always visualize the data before calculating correlations

[Figure: scatter plots illustrating different values of the correlation coefficient]

Pearson's Correlation Coefficient
• Parametric method
• Requirements:
  • Scale data (exception – scale and binary)
  • If we aim to generalize the findings to the population, we need normally distributed data (or a large sample)
• Sensitive to outliers

Pearson's Correlation Coefficient
• Visualize the data:
  • Graphs > Chart Builder
  • Select Scatter/Dot and the variables of interest
• Correlation:
  • Analyze > Correlate > Bivariate
  • Select the variables and the proper coefficient (Pearson is set by default)
  • For significance, select 'Flag significant correlations'

Pearson's Correlation Coefficient
• Scale variable and binary variable
• Works the same way as for two scale variables
• Beware of the coding of the binary variable (be sure which values the codes represent): reversing the coding flips the sign of the coefficient

Non-Parametric Correlation
• Spearman's rho and Kendall's tau
• Correlation for cases other than two scale variables (or scale and binary)
• Same interpretation as Pearson's correlation coefficient
• Kendall's tau is preferred when the variables have few categories and for small samples
• Analyze > Correlate > Bivariate
• Select the variables and Spearman/Kendall
• For significance, select 'Flag significant correlations'

Interpretation
• Correlation does not imply causality:
  • No control of other variables
  • No independent and dependent variable
• You cannot claim that one variable affects the other, even when such a relationship seems meaningful and logical
• Leave the interpretation of the effects of IVs on the DV to regression analysis

[Figures: examples of spurious correlations (math doctorates awarded v. suicides by hanging; Nicolas Cage)]
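The differences between the three coefficients, and the warnings about a false absence of association and about binary coding, can be reproduced with a short standard-library Python sketch. The function names and all data are hypothetical; Spearman's rho is computed as Pearson's r on average ranks, and Kendall's coefficient as tau-b (the tie-corrected variant SPSS reports). Only the coefficients are computed, not p-values; for those, use SPSS or, if SciPy is available, scipy.stats.pearsonr, scipy.stats.spearmanr and scipy.stats.kendalltau.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant pairs), corrected for ties."""
    n = len(x)
    conc = disc = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    n0 = n * (n - 1) / 2
    return (conc - disc) / math.sqrt((n0 - ties_x) * (n0 - ties_y))


# 1) False absence of association: y depends perfectly on x, but not linearly
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [v ** 2 for v in xs]
print("U-shape, Pearson:", round(pearson(xs, ys), 3))

# 2) Monotonic but non-linear: rank-based coefficients reach 1, Pearson does not
xm = [1, 2, 3, 4, 5, 6]
ym = [math.exp(v) for v in xm]
print("Pearson:", round(pearson(xm, ym), 3),
      "Spearman:", spearman(xm, ym), "Kendall:", kendall_tau_b(xm, ym))

# 3) Scale and binary variable: reversing the 0/1 coding flips the sign of r
age = [22, 25, 31, 40, 47, 55, 63, 70]   # hypothetical ages
voted = [0, 0, 1, 0, 1, 1, 1, 1]         # hypothetical: 1 = voted
abstained = [1 - v for v in voted]       # same data coded 1 = did not vote
print("age/voted:", round(pearson(age, voted), 3),
      "age/abstained:", round(pearson(age, abstained), 3))
```

In the first demo Pearson's r is zero although y is fully determined by x. In the second, Spearman's rho and Kendall's tau are exactly 1 while Pearson's r is only about 0.85, because the relationship is monotonic but not linear. In the third, the two codings of the same turnout variable give coefficients of the same size but opposite sign.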