Descriptives, Crosstabs, Correlation
Methodology of Conflict and Democracy Studies
December 10

Aim of This Lecture
•How to obtain basic information about your data
•Association of two variables:
•Crosstabs (contingency tables)
•Correlation

Descriptive Statistics
•Basic measures that summarize the characteristics of your data
•Various types:
•Central tendency – mean, median
•Dispersion – variance, minimum, maximum
•Not all descriptives are suitable for all types of variables
•Use them to describe and explore your data

How to Obtain Descriptives in SPSS
•Analyze > Descriptive Statistics > Frequencies
•Move the variables of interest to the right
•In ‘Statistics’ choose all measures you require
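
The same measures can be computed outside SPSS. Below is a minimal sketch in Python using only the standard library; the `ages` list and the `describe` helper are hypothetical and are not part of the lecture data.

```python
import statistics

# Hypothetical sample: ages of ten survey respondents (not the lecture data).
ages = [18, 21, 21, 25, 30, 34, 41, 47, 52, 61]


def describe(values):
    """Return basic descriptives for one scale variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample variance (divides by n - 1).
        "variance": statistics.variance(values),
        "minimum": min(values),
        "maximum": max(values),
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

These measures only make sense for scale variables; for a categorical variable a frequency count is the more suitable summary.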


Association of Two Variables
•The appropriate method depends on the types of the variables
•Crosstabs:
•Suitable for two categorical variables
•Each variable should have a small number of categories (but at least two)
•Correlation:
•Two scale variables, one scale and one ordinal variable, or two ordinal variables
•Specific case – scale and a binary variable
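
The decision rule on this slide can be written down as a small lookup. This is only a sketch: the function name `suggest_method` and the four type labels are assumptions made for illustration. An ordinal variable with few categories (such as age groups) can be labelled "categorical" if you want to treat it that way.

```python
def suggest_method(type_a, type_b):
    """Suggest an association measure for a pair of variable types.

    The labels "categorical", "binary", "ordinal" and "scale" are
    hypothetical names chosen for this sketch.
    """
    pair = {type_a, type_b}
    if pair <= {"categorical", "binary"}:
        return "crosstabs"
    if pair == {"scale"} or pair == {"scale", "binary"}:
        return "Pearson correlation"
    if pair <= {"scale", "ordinal"}:
        return "Spearman or Kendall correlation"
    # Combinations the slide does not mention.
    return "not covered in this lecture"


print(suggest_method("categorical", "binary"))
print(suggest_method("scale", "binary"))
print(suggest_method("ordinal", "scale"))
```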

Crosstabs
•Contingency tables
•Describe the association between two categorical variables
•Example: age group vs. turnout in an election (yes/no)
•Allow generalization to the population (with a significance test such as chi-square)

Crosstabs
•Analyze > Descriptive Statistics > Crosstabs
•Select variables for Columns and Rows
•Features:
•Cells – counts, percentages, residuals
•Statistics – Chi-square, Cramer’s V
•Try not to fill your crosstabs with too many features at once
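
To see what the Cells options produce, here is a minimal Python sketch that builds observed counts and row percentages from raw records. The `respondents` data and both helper functions are hypothetical; SPSS produces the same quantities through the dialog described above.

```python
from collections import Counter

# Hypothetical respondents as (age group, voted) pairs, not the lecture data.
respondents = [
    ("18-29", "no"), ("18-29", "no"), ("18-29", "yes"), ("18-29", "no"),
    ("30-59", "yes"), ("30-59", "no"), ("30-59", "yes"), ("30-59", "yes"),
    ("60+", "yes"), ("60+", "yes"), ("60+", "yes"), ("60+", "no"),
]


def crosstab(pairs):
    """Return observed counts as {row category: {column category: count}}."""
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


def row_percentages(table):
    """Convert each row of counts to percentages that sum to 100."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


observed = crosstab(respondents)
percent = row_percentages(observed)
for row in observed:
    print(row, observed[row], percent[row])
```

Row percentages answer "what share of each age group voted", which is usually the comparison of interest when age group is in the rows.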

Counts: Observed


Counts: Observed
Percentages: Column


Counts: Observed
Percentages: Row
•Younger people do not vote to the same extent as older people
•But can we apply this to the whole population?

Chi-square, Cramer’s V
•There is a relationship between age and turnout, and it applies to the population
•But is it okay to end the analysis at this point? Can we find out more?
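
For reference, the two statistics can be computed by hand. The sketch below uses a hypothetical 2x2 table (the counts are invented, not the lecture output) and the Pearson chi-square without continuity correction. The p-value shortcut via `math.erfc` is valid only for one degree of freedom, that is, for a 2x2 table.

```python
import math

# Hypothetical observed counts.
# Rows: younger, older. Columns: did not vote, voted.
observed = [[60, 40],
            [30, 70]]


def chi_square(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramer's V: sqrt(chi2 / (n * (min(rows, columns) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))


chi2 = chi_square(observed)
v = cramers_v(observed)
# Upper-tail probability of chi-square with 1 degree of freedom only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, Cramer's V = {v:.3f}, p = {p_value:.6f}")
```

Chi-square tells us whether the association is likely to hold in the population; Cramer's V (0 to 1) describes how strong it is.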

How to read the significance in SPSS outputs

SPSS output    Significance
,900           10 %
,750           25 %
,500           50 %
,200           80 %
,100           90 %
,050           95 %
,010           99 %
,001           99.9 %
,000           > 99.9 %

Significance = (1 – SPSS output) * 100
Example: (1 – 0.234) * 100 = 0.766 * 100 = 76.6 %
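
The conversion above is simple enough to automate. A minimal sketch follows; the function name `spss_sig_to_percent` is hypothetical, and it assumes the value is copied from the output as text with a decimal comma.

```python
def spss_sig_to_percent(sig_text):
    """Convert an SPSS significance value such as ',234' to (1 - p) * 100."""
    # SPSS may print a decimal comma and no leading zero: ',234' means 0.234.
    p = float(sig_text.replace(",", "."))
    if not 0.0 <= p <= 1.0:
        raise ValueError("significance must lie between 0 and 1")
    return round((1 - p) * 100, 1)


print(spss_sig_to_percent(",234"))
print(spss_sig_to_percent(",050"))
# ',000' is a rounded value, so a result of 100.0 should be read as '> 99.9 %'.
print(spss_sig_to_percent(",000"))
```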

In the Absence of Any Effect


Counts: Observed + Expected


Counts: Observed + Expected
Residuals: Unstandardized


Counts: Observed + Expected
Residuals: Adjusted standardized


Counts: Observed + Expected
Residuals: Adjusted standardized
Chi-square, Cramer’s V
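
The expected counts and residuals shown in these tables follow from the row and column totals. The sketch below reproduces them for a hypothetical 2x2 table (invented counts, not the lecture output); the function names are assumptions made for illustration.

```python
import math

# Hypothetical observed counts.
# Rows: younger, older. Columns: did not vote, voted.
observed = [[60, 40],
            [30, 70]]


def expected_counts(table):
    """Counts we would see if rows and columns were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def unstandardized_residuals(table):
    """Observed minus expected count for every cell."""
    expected = expected_counts(table)
    return [[obs - exp for obs, exp in zip(row, exp_row)]
            for row, exp_row in zip(table, expected)]


def adjusted_residuals(table):
    """Residual divided by its estimated standard error."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = expected_counts(table)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, obs in enumerate(row):
            exp = expected[i][j]
            se = math.sqrt(exp * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            out.append((obs - exp) / se)
        result.append(out)
    return result


expected = expected_counts(observed)
raw = unstandardized_residuals(observed)
adjusted = adjusted_residuals(observed)
print(expected)
print(raw)
print(adjusted)
```

A common rule of thumb is that an adjusted standardized residual beyond roughly ±1.96 marks a cell that differs from its expected count at the 95 % level.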

Why Not to Make It Too Complicated


Correlation
•Association between two variables (for cases where crosstabs are not suitable)
•Examples: two scale variables, one scale and one ordinal variable, two ordinal variables
•Three coefficients:
•Pearson
•Spearman
•Kendall

Correlation
•Coefficients range from -1 to 1
•Interpretation:
•Zero means no association between the variables
•The greater the distance from zero, the stronger the association (regardless of direction, negative or positive)
•-1: perfect negative association
•1: perfect positive association
•Beware of a false absence of association: a coefficient near zero can hide a non-linear relationship
•Always good to visualize data before calculating correlations
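
A short sketch of why visualizing matters: a coefficient of zero can coexist with a perfect, but non-linear, relationship. The data and the `pearson_r` helper below are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * value + 1 for value in x]  # straight line
curved = [value ** 2 for value in x]     # U-shaped, fully determined by x

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
print(f"linear: r = {r_linear:.2f}")
print(f"curved: r = {r_curved:.2f}")
```

In the curved case y is completely determined by x, yet the coefficient reports no association. A scatterplot would reveal the pattern immediately.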



Pearson’s Correlation Coefficient
•Parametric measure
•Requirements:
•Scale data (exception: one scale and one binary variable)
•To generalize the findings to the population, we need normally distributed data (or a large sample)
•Sensitive to outliers

Pearson’s Correlation Coefficient
•Visualize the data
•Graphs > Chart Builder
•Select Scatter/Dot and the variables of interest
•Correlation
•Analyze > Correlate > Bivariate
•Select the variables and the appropriate coefficient (Pearson is selected by default)
•For significance, select ‘Flag significant correlations’

•Negative result = higher values of variable 1 are associated with lower values of variable 2, and vice versa
•Example: larger towns have lower election turnout (and smaller towns have higher turnout)

Pearson’s Correlation Coefficient
•Scale variable and binary variable
•Works the same way as for two scale variables
•Beware of coding of the binary variable (be sure what values the codes represent)
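
The warning about coding can be illustrated with a short sketch: reversing the 0/1 codes of the binary variable flips the sign of the coefficient but not its size. The data and the `pearson_r` helper are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: years of education and turnout.
education = [8, 10, 12, 12, 14, 16, 16, 18]
voted = [0, 0, 0, 1, 1, 1, 1, 1]        # coding A: 1 = voted
abstained = [1 - v for v in voted]      # coding B: 1 = did not vote

r_voted = pearson_r(education, voted)
r_abstained = pearson_r(education, abstained)
print(f"1 = voted:        r = {r_voted:.3f}")
print(f"1 = did not vote: r = {r_abstained:.3f}")
```

Both results describe the same relationship, so always check which category is coded as 1 before interpreting the sign.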


Non-Parametric Correlation
•Spearman’s Rho and Kendall’s Tau
•Correlation for cases other than two scale variables (or scale and binary)
•Same interpretation as Pearson’s coefficient
•Prefer Kendall’s Tau when the variables have few categories or the sample is small
•Analyze > Correlate > Bivariate
•Select variables and Spearman/Kendall
•For significance select ‘Flag significant correlations’
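
As a rough illustration of what the two coefficients measure, the sketch below computes Spearman's rho as a Pearson correlation of ranks and Kendall's tau-b from concordant and discordant pairs. The data and function names are hypothetical, and significance tests are not included.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: compares every pair of cases, with a correction for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: not counted
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + tied_x_only) * (untied + tied_y_only))


# Hypothetical ordinal answers of five respondents on two 5-point items.
interest = [1, 2, 3, 4, 5]
trust = [2, 1, 4, 3, 5]
rho = spearman_rho(interest, trust)
tau = kendall_tau_b(interest, trust)
print(f"Spearman's rho = {rho:.2f}, Kendall's tau-b = {tau:.2f}")
```

Kendall's tau is typically smaller in absolute value than Spearman's rho for the same data, so the two should not be compared directly.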

Interpretation
•Correlation does not imply causality
•No control for other variables
•No independent and dependent variables
•You cannot conclude that one variable affects the other, even when such a relationship seems meaningful and logical
•Leave the interpretation of the effects of independent variables (IVs) on the dependent variable (DV) for regression analysis