Descriptives, Crosstabs, Correlation
Methodology of Conflict and Democracy Studies
December 10

Aim of this lecture
• How to obtain basic information about your data
• Association of two variables:
  • Crosstabs (contingency tables)
  • Correlation

Descriptive Statistics
• Basic measures that summarize the characteristics of your data
• Various types:
  • Central tendency: mean, median
  • Dispersion: variance, minimum, maximum
• Not all descriptives are suitable for all types of variables
• We use them to describe and explore the data

How to Obtain Descriptives in SPSS
• Analyze > Descriptive Statistics > Frequencies
• Move the variables of interest to the right
• In ‘Statistics’, choose all the measures you require

Association of Two Variables
• The appropriate method depends on the types of variables
• Crosstabs:
  • Suitable for two categorical variables
  • Each variable should have a small number of categories (but at least two)
• Correlation:
  • Two scale variables, a scale and an ordinal variable, or two ordinal variables
  • Special case: a scale variable and a binary variable

Crosstabs
• Contingency tables
• Describe the interaction of two categorical variables
• Example: age groups of people vs. turnout in an election (yes/no)
• Allow generalization to the population
• Analyze > Descriptive Statistics > Crosstabs
• Select the variables for Columns and Rows
• Features:
  • Cells: counts, percentages, residuals
  • Statistics: Chi-square, Cramer’s V
• Try not to fill your crosstabs with too many features at once

[Output screenshots: Counts: Observed · Counts: Observed, Percentages: Column · Counts: Observed, Percentages: Row]
• Younger people do not vote to the same extent as older people
• But can we apply this to the whole population?

Chi-square, Cramer’s V
• There is a relationship between age and turnout, and it applies to the population
• But is it okay to end the analysis at this point? Can we find out more?
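
To make the crosstab statistics above more concrete, here is a minimal Python sketch (the lecture itself uses SPSS menus, so this is only an illustration). The counts are hypothetical and were invented for this example; the function name `chi_square_and_cramers_v` is also an assumption. It computes row percentages, the chi-square statistic and Cramer’s V by hand for an age group × turnout table.

```python
from math import sqrt


def chi_square_and_cramers_v(observed):
    """Return (chi-square, degrees of freedom, Cramer's V) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count we would expect in this cell if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (count - expected) ** 2 / expected

    n_rows, n_cols = len(row_totals), len(col_totals)
    dof = (n_rows - 1) * (n_cols - 1)
    cramers_v = sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return chi2, dof, cramers_v


# Hypothetical data: rows = age groups, columns = voted (yes, no)
age_groups = ["18-29", "30-59", "60+"]
observed = [
    [30, 70],
    [50, 50],
    [70, 30],
]

# Row percentages: share of voters and non-voters within each age group
for label, row in zip(age_groups, observed):
    total = sum(row)
    print(label, [round(100 * count / total, 1) for count in row])

chi2, dof, cramers_v = chi_square_and_cramers_v(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, Cramer's V = {cramers_v:.2f}")
```

With these made-up counts the chi-square statistic is 32 with 2 degrees of freedom and Cramer’s V is about 0.33. The sketch does not compute the significance; in Python, `scipy.stats.chi2_contingency` returns the p-value together with the expected counts.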

How to read the significance in SPSS outputs

SPSS output    Significance
.900           10 %
.750           25 %
.500           50 %
.200           80 %
.100           90 %
.050           95 %
.010           99 %
.001           99.9 %
.000           > 99.9 %

Significance = (1 - SPSS output) * 100
Example: (1 - 0.234) * 100 = 0.766 * 100 = 76.6 %

Expected counts
• The counts we would see in the absence of any effect

[Output screenshots: Counts: Observed + Expected · Counts: Observed + Expected, Residuals: Unstandardized · Counts: Observed + Expected, Residuals: Adjusted standardized]

Why Not Make It Too Complicated?

Correlation
• Association between two variables (for cases other than crosstabs)
• Examples: two scale variables, a scale and an ordinal variable, two ordinal variables
• Three coefficients:
  • Pearson
  • Spearman
  • Kendall
• Results range from -1 to 1
• Interpretation:
  • Zero means no association between the variables
  • The further the value is from zero, the stronger the association (regardless of the direction, negative or positive)
  • -1: perfect negative association
  • 1: perfect positive association
• Beware of a false absence of association
• It is always good to visualize the data before calculating correlations

Pearson’s Correlation Coefficient
• Parametric method
• Requirements:
  • Scale data (exception: a scale and a binary variable)
  • If we aim to apply the findings to the population, we need normally distributed data (or a large sample)
• Sensitive to outliers
• Visualize the data:
  • Graphs > Chart Builder
  • Select Scatter/Dot and the variables of interest
• Correlation:
  • Analyze > Correlate > Bivariate
  • Select the variables and the proper coefficient (Pearson is set by default)
  • For significance, select ‘Flag significant correlations’
  • Negative result = higher values in variable 1 are associated with lower values in variable 2 and vice versa
  • Example: larger towns have lower election turnout (and smaller towns have higher turnout)
• Scale variable and binary variable:
  • Works the same as for two scale variables
  • Beware of the coding of the binary variable (be sure what values the codes represent)

Non-Parametric Correlation
• Spearman’s Rho and Kendall’s Tau
• Correlation for cases other than two scale variables (or a scale and a binary variable)
• Same interpretation as for Pearson’s correlation coefficient
• Kendall’s Tau is preferred if the variables contain fewer categories and for smaller samples
• Analyze > Correlate > Bivariate
• Select the variables and Spearman/Kendall
• For significance, select ‘Flag significant correlations’

Interpretation
• Correlation does not imply causality
  • No control for other variables
  • No independent and dependent variable
• You cannot claim that one variable affects the other, even when such a relationship seems meaningful and logical
• Keep the interpretation of the effects of IVs on the DV for regression analysis
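
The correlation coefficients discussed above can also be sketched in plain Python (again only as an illustration next to the SPSS workflow). All numbers are hypothetical, and the helper names `pearson`, `ranks` and `spearman` are assumptions made for this example. The sketch shows a negative correlation in the town size vs. turnout example and a false absence of association: a perfect U-shaped relationship gives a Pearson coefficient of zero. Kendall’s Tau is not covered here.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson's correlation coefficient for two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's coefficient computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: town size (thousands of inhabitants) and turnout (%)
town_size = [1, 2, 5, 10, 50]
turnout = [70, 68, 62, 60, 55]
print("Pearson: ", round(pearson(town_size, turnout), 2))   # negative
print("Spearman:", round(spearman(town_size, turnout), 2))  # -1.0, strictly decreasing

# False absence of association: y depends fully on x, but not linearly
x_u = [-2, -1, 0, 1, 2]
y_u = [v ** 2 for v in x_u]
print("Pearson for the U-shape:", pearson(x_u, y_u))
```

The last line prints zero even though y is completely determined by x, which is why a scatter plot should come before the coefficient. Note also how the single large town (50) pulls the Pearson result away from -1 while Spearman, which only uses the ranks, is unaffected.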