Descriptives, Crosstabs, Correlation
Methodology of Conflict and Democracy Studies
December 2

Aim of this lecture
•How to obtain basic information about your data
•Checking the assumptions
•Association of two variables:
  •Crosstabs (contingency tables)
  •Correlation

Descriptive Statistics
•Basic measures that summarize the characteristics of your data
•Various types:
  •Central tendency – mean, median, mode
  •Dispersion – standard deviation, variance, minimum, maximum
•Not all descriptives are suitable for all types of variables (e.g., a mean is meaningless for a nominal variable with more than two categories)
•We use them to describe and explore the data
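As an illustration outside SPSS, the measures above can be computed with Python's built-in `statistics` module. The grant counts below are a hypothetical example, not the lecture's data. `statistics.stdev` and `statistics.variance` use the sample formulas (dividing by n - 1), which is also what SPSS reports.

```python
import statistics

# Hypothetical data: number of grants received by 10 municipalities (invented for illustration)
grants = [0, 1, 1, 2, 0, 3, 1, 1, 2, 5]

summary = {
    # Central tendency
    "mean": statistics.mean(grants),
    "median": statistics.median(grants),
    "mode": statistics.mode(grants),
    # Dispersion (sample formulas, dividing by n - 1)
    "std_dev": statistics.stdev(grants),
    "variance": statistics.variance(grants),
    "minimum": min(grants),
    "maximum": max(grants),
}

for name, value in summary.items():
    print(f"{name:>9}: {value:.2f}")
```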

[Figure: same mean, different standard deviations]


Variable                   | Level of measurement | Mean  | Std. Dev. | Minimum | Maximum
Reelection                 | Nominal              | 0.75  | 0.43      | 0       | 1
Number of grants           | Scale                | 1.33  | 1.52      | 0       | 10
Grant in election year     | Nominal              | 0.36  | 0.48      | 0       | 1
Incumbent terms            | Scale                | 2.19  | 1.21      | 1       | 6
Unemployment               | Scale                | 19.41 | 13.15     | 0.00    | 94.94
Number of challengers      | Scale                | 2.18  | 1.38      | 1       | 7
Grants in EUR (per capita) | Scale                | 77.74 | 209.62    | 0.00    | 5,331.82
Mayor from governing party | Nominal              | 0.41  | 0.49      | 0       | 1
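The nominal variables in the table are binary, which is why a mean is reported for them: for a variable coded 0/1, the mean is the proportion of 1s, and the standard deviation follows from that proportion as sqrt(p(1 - p)) (population form; the sample form is marginally larger). The sketch below checks this against the table, assuming 0/1 coding as the minima and maxima suggest; `implied_sd` is a hypothetical helper name.

```python
import math

def implied_sd(p):
    # For a 0/1 variable the mean p is the share of 1s; population SD = sqrt(p * (1 - p))
    return math.sqrt(p * (1 - p))

# Means of the binary (0/1) variables, copied from the table above
binary_means = {
    "Reelection": 0.75,
    "Grant in election year": 0.36,
    "Mayor from governing party": 0.41,
}

for variable, p in binary_means.items():
    print(f"{variable}: {p:.0%} coded 1, implied SD = {implied_sd(p):.2f}")
```

The implied values (0.43, 0.48, 0.49) match the reported standard deviations.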

How to Obtain Descriptives in SPSS
•Analyze > Descriptive Statistics > Frequencies
•Move the variables of interest to the right-hand box
•Under ‘Statistics’, tick all the measures you need


Assumptions of Data
•Not all data are suitable for all statistical tests
•Parametric and non-parametric tests
•Parametric tests are preferred (they have more statistical power when their assumptions hold), but they place stricter requirements on the data

Parametric Data
1. Scale data (at least interval)
2. Independence
3. Normally distributed data
4. Homogeneity of variance

Independence
https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Asch_experiment.svg/600px-Asch_experiment.svg.png

Normal Distribution
http://curvebank.calstatela.edu/gaussdist/normal.jpg


Skewness


Kurtosis


How to Check the Distribution
•Visual inspection – histogram
•Calculation of skewness and kurtosis
•Statistical tests:
  •Kolmogorov-Smirnov
  •Shapiro-Wilk

Histogram
•Analyze > Descriptive Statistics > Frequencies
•Under ‘Charts’, choose ‘Histogram’
•Tick ‘Show normal curve on histogram’ to overlay a normal curve with the same mean and standard deviation as your data
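SPSS draws the histogram for you; the sketch below only illustrates what the overlay compares. It bins a hypothetical, randomly generated sample and puts the observed count in each bin next to the count that a normal curve with the sample's mean and standard deviation would predict. The function name `histogram_vs_normal` and the choice of 8 bins are assumptions for illustration.

```python
import random
import statistics
from statistics import NormalDist

def histogram_vs_normal(data, bins=8):
    """Return (left edge, right edge, observed count, count expected under a normal curve) per bin."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    # Normal curve with the sample's own mean and SD, as in the SPSS overlay
    curve = NormalDist(statistics.mean(data), statistics.stdev(data))
    counts = [0] * bins
    for x in data:
        # The maximum value goes into the last bin instead of opening a new one
        counts[min(int((x - lo) / width), bins - 1)] += 1
    rows = []
    for i, observed in enumerate(counts):
        left, right = lo + i * width, lo + (i + 1) * width
        expected = len(data) * (curve.cdf(right) - curve.cdf(left))
        rows.append((left, right, observed, expected))
    return rows

# Hypothetical sample drawn from a normal distribution (seeded for reproducibility)
random.seed(1)
sample = [random.gauss(50, 10) for _ in range(300)]

for left, right, observed, expected in histogram_vs_normal(sample):
    bar = "#" * (observed // 3)
    print(f"{left:6.1f} to {right:6.1f} | {bar:<30} {observed:3d}  (normal curve: {expected:5.1f})")
```

Bars that stay close to the "normal curve" counts in every bin are what a roughly normal histogram looks like.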


Skewness and Kurtosis
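•Analyze > Descriptive Statistics > Frequencies
•Under ‘Statistics’, tick ‘Skewness’ and ‘Kurtosis’
•The raw values are hard to judge on their own – divide each by its standard error (reported next to it) to obtain a z-score
•Acceptable z-scores:
  •Small samples – between -1.96 and 1.96
  •Medium samples – between -2.58 and 2.58
  •Large samples – do not rely on this check (the standard errors shrink, so even trivial deviations exceed the limits); inspect the histogram instead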

•Skewness: -0.020 / 0.045 = -0.44
•Kurtosis: 0.279 / 0.097 = 2.88
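The division is easy to script. The sketch below reproduces it with the values from the example and, for reference, adds the standard textbook formulas for the standard errors of skewness and kurtosis, which depend only on the sample size n. That these match SPSS's output exactly is an assumption; the function names are hypothetical.

```python
import math

def se_skewness(n):
    # Standard error of skewness for a sample of size n
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    # Standard error of (excess) kurtosis, derived from the SE of skewness
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def z_check(statistic, se, limit):
    # Divide by the standard error and compare the result with the critical value
    z = statistic / se
    return round(z, 2), abs(z) <= limit

# Values from the example above, checked against the small-sample limit of 1.96
print("skewness:", z_check(-0.020, 0.045, 1.96))
print("kurtosis:", z_check(0.279, 0.097, 1.96))
```

With these inputs the skewness z-score stays within ±1.96, while the kurtosis z-score exceeds both ±1.96 and ±2.58.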

Statistical Tests
•Kolmogorov-Smirnov and Shapiro-Wilk
•Both test the null hypothesis that your data are normally distributed
•Results:
  •Significant (p <= 0.05) – we reject the null hypothesis: the distribution deviates from normal
  •Not significant (p > 0.05) – we cannot reject the null hypothesis: no evidence of a deviation from normality
•With large samples, the tests become significant even for small deviations that have no practical importance

Statistical Tests
•Analyze > Descriptive Statistics > Explore
•Move the variable of interest into the ‘Dependent List’
•Under ‘Plots’, tick ‘Normality plots with tests’
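Outside SPSS, both tests are available in SciPy. The sketch below assumes NumPy and SciPy are installed and uses two hypothetical, randomly generated samples. One caveat: when the mean and SD are estimated from the data, SPSS reports the Kolmogorov-Smirnov test with the Lilliefors correction, whereas SciPy's `kstest` does not apply it, so its p-values here are conservative (too large); statsmodels offers `lilliefors` for the corrected version.

```python
import numpy as np
from scipy import stats

def normality_report(x):
    """Shapiro-Wilk and Kolmogorov-Smirnov p-values for one sample."""
    x = np.asarray(x, dtype=float)
    shapiro_p = stats.shapiro(x).pvalue
    # K-S against a normal distribution with the sample's own mean and SD
    # (no Lilliefors correction, unlike the SPSS output)
    ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1))).pvalue
    return shapiro_p, ks_p

# Hypothetical samples: one roughly normal, one clearly right-skewed
rng = np.random.default_rng(42)
samples = {
    "normal": rng.normal(loc=50, scale=10, size=200),
    "skewed": rng.exponential(scale=10, size=200),
}

for name, sample in samples.items():
    shapiro_p, ks_p = normality_report(sample)
    verdict = "reject H0: not normal" if shapiro_p <= 0.05 else "cannot reject H0"
    print(f"{name}: Shapiro-Wilk p = {shapiro_p:.4f}, K-S p = {ks_p:.4f} -> {verdict}")
```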


Parametric Data
1. Scale data (at least interval)
2. Independence
3. Normally distributed data
4. Homogeneity of variance

Homogeneity of Variance
•Assumption that the variance of a variable is roughly equal across the groups (levels) being compared
•The groups are defined by another (categorical) variable
•We use a single test for this assumption:
  •Levene test

Homogeneity of Variances


Levene Test
•Tests the null hypothesis that the variances are equal across groups
•Results:
  •Significant (p <= 0.05) – we reject the null hypothesis: the variances differ
  •Not significant (p > 0.05) – we cannot reject the null hypothesis: no evidence that the variances differ
•With large samples, the test becomes significant even for small differences that have no practical importance

Levene Test
•Analyze > Descriptive Statistics > Explore
•Move the variable of interest into the ‘Dependent List’
•Move the categorical variable that defines the groups into the ‘Factor List’
•Under ‘Plots’, select ‘Spread vs Level with Levene Test’ and ‘Untransformed’
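For comparison, SciPy implements the test as `scipy.stats.levene`, assuming SciPy is installed. Its default, `center='median'`, is the Brown-Forsythe variant; `center='mean'` gives the classic Levene test based on absolute deviations from each group's mean (SPSS lists both, among other variants, in its output). The two groups below are hypothetical and constructed so that their spreads clearly differ.

```python
from scipy import stats

# Hypothetical scores for two groups defined by a categorical variable
group_a = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8]   # small spread
group_b = [0.0, 20.0, 5.0, 15.0, 2.0, 18.0]    # large spread

# center="mean": classic Levene test (absolute deviations from each group's mean)
result = stats.levene(group_a, group_b, center="mean")

print(f"W = {result.statistic:.2f}, p = {result.pvalue:.5f}")
if result.pvalue <= 0.05:
    print("Significant: reject H0, the variances differ")
else:
    print("Not significant: no evidence that the variances differ")
```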


Association of Two Variables
•Depends on the types of variables
•Crosstabs:
  •Suitable for two categorical variables
  •Few categories per variable (but at least two per variable)
•Correlation:
  •Two scale variables, scale and ordinal, two ordinal variables
  •Special case – scale and binary variable

Crosstabs
•Contingency tables
•Describe the association between two categorical variables
•Example: age group v. turnout in an election (yes/no)
•With the chi-square test, the findings can be generalized to the population

Crosstabs
•Analyze > Descriptive Statistics > Crosstabs
•Select the variables for ‘Row(s)’ and ‘Column(s)’
•Options:
  •Cells – counts, percentages, residuals
  •Statistics – chi-square, Cramér’s V
•Do not overload the crosstab with options; request only what you will interpret

Counts: Observed


Counts: Observed
Percentages: Row


Counts: Observed
Percentages: Column
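To see what row and column percentages are computed from, the sketch below uses a hypothetical age group × turnout table (the counts are invented for illustration, not taken from the lecture's data). Row percentages answer "within each age group, what share voted?"; column percentages answer "among voters, what share comes from each age group?".

```python
# Hypothetical observed counts: rows = age groups, columns = turnout (voted, did not vote)
observed = {
    "18-29": (60, 40),
    "30-49": (90, 30),
    "50+": (120, 20),
}

column_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]

def row_percentages(table):
    # Each cell divided by its row total
    return {group: tuple(100 * c / sum(counts) for c in counts) for group, counts in table.items()}

def column_percentages(table, totals):
    # Each cell divided by its column total
    return {group: tuple(100 * c / t for c, t in zip(counts, totals)) for group, counts in table.items()}

rows = row_percentages(observed)
cols = column_percentages(observed, column_totals)
for group in observed:
    print(f"{group:>5}: voted {rows[group][0]:5.1f}% of row | {cols[group][0]:5.1f}% of all voters")
```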


Counts: Observed + Expected


Counts: Observed
Percentages: Row
•Younger people do not vote to the same extent as older people
•But can we apply this to the whole population?
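The chi-square test addresses this by comparing the observed counts with the counts expected if the two variables were independent: expected = row total × column total / N. The sketch below assumes a hypothetical age group × turnout table (invented counts) and uses SciPy only for the p-value.

```python
from scipy.stats import chi2

# Hypothetical observed counts: rows = age groups (18-29, 30-49, 50+), columns = (voted, did not vote)
observed = [[60, 40], [90, 30], [120, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = chi2.sf(chi_square, df)

print("expected:", expected)
print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.6f}")
```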

Counts: Observed + Expected
Residuals: Unstandardized


Counts: Observed + Expected
Residuals: Adjusted standardized


Counts: Observed + Expected
Residuals: Adjusted standardized
Chi-square, Cramér’s V
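Unstandardized residuals are simply O - E. Adjusted standardized residuals divide that difference by its standard error, sqrt(E × (1 - row total / N) × (1 - column total / N)), so they can be read roughly like z-scores (beyond ±1.96 at the 5% level). Cramér's V rescales chi-square to the 0 to 1 range: V = sqrt(chi-square / (N × (min(rows, columns) - 1))). The sketch uses a hypothetical age group × turnout table with invented counts.

```python
import math

# Hypothetical observed counts: rows = age groups (18-29, 30-49, 50+), columns = (voted, did not vote)
observed = [[60, 40], [90, 30], [120, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
adjusted = []
for i, row in enumerate(observed):
    adj_row = []
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n           # expected count
        chi_square += (o - e) ** 2 / e
        se = math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
        adj_row.append((o - e) / se)                    # adjusted standardized residual
    adjusted.append(adj_row)

k = min(len(observed), len(col_totals))
cramers_v = math.sqrt(chi_square / (n * (k - 1)))

for label, adj_row in zip(["18-29", "30-49", "50+"], adjusted):
    print(label, [round(r, 2) for r in adj_row])
print(f"Cramer's V = {cramers_v:.3f}")
```

Here the youngest group votes clearly less and the oldest clearly more than expected, while the middle group matches the expectation exactly.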

Why Not Make It Too Complicated?


Adding Layers


Adding Layers


Correlation
•Association between two variables (for cases not suited to crosstabs)
•Examples: two scale variables, scale and ordinal, two ordinal variables
•Three coefficients:
  •Pearson
  •Spearman
  •Kendall

Correlation
•Coefficients range from -1 to 1
•Interpretation:
  •0: no association between the variables
  •The further from zero, the stronger the association (regardless of direction – negative or positive)
  •-1: perfect negative association
  •1: perfect positive association
•Beware of a false absence of association: a coefficient near zero can hide a strong non-linear relationship
•Always visualize the data before calculating correlations
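A small constructed example makes the "false absence" warning concrete: y = x² is completely determined by x, yet Pearson's coefficient is exactly zero because the relationship is not linear. The `pearson_r` helper is written out by hand for transparency, and the data are made up.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # perfect straight-line relationship
u_shape = [v ** 2 for v in x]     # perfect, but non-linear (U-shaped), relationship

print(f"linear:  r = {pearson_r(x, linear):.2f}")
print(f"U-shape: r = {pearson_r(x, u_shape):.2f}")
```

A scatterplot of `x` against `u_shape` would reveal the pattern immediately.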

Pearson’s Correlation Coefficient
•Parametric measure
•Requirements:
  •Scale data (exception – one scale and one binary variable)
  •To generalize the findings to the population, we need normally distributed data (or a large sample)
•Sensitive to outliers
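To see how much a single outlier can move Pearson's coefficient, the sketch below takes hypothetical, almost perfectly linear data and adds one extreme point. The hand-written `pearson_r` helper and all values are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical, nearly linear data
x = list(range(1, 11))
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 8.9, 10.1]

r_clean = pearson_r(x, y)
# The same data plus one extreme observation
r_outlier = pearson_r(x + [11], y + [-10.0])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

One point out of eleven is enough to turn an almost perfect correlation into one close to zero.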

Pearson’s Correlation Coefficient
•Visualize the data
  •Graphs > Chart Builder
  •Select Scatter/Dot and the variables of interest
•Correlation
  •Analyze > Correlate > Bivariate
  •Select the variables and the coefficient (Pearson is selected by default)
  •For significance, tick ‘Flag significant correlations’
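The Bivariate procedure reports each coefficient together with its significance. A rough Python counterpart, assuming SciPy is installed, is `scipy.stats.pearsonr`, which returns the coefficient and a two-sided p-value. The data below are hypothetical; as on the slide, it is worth plotting them first.

```python
from scipy import stats

# Hypothetical paired observations for ten cases
x = [2, 4, 5, 7, 8, 10, 12, 13, 15, 18]
y = [30, 28, 27, 22, 24, 18, 15, 16, 11, 8]

# Pearson's r with a two-sided p-value
r, p = stats.pearsonr(x, y)

flag = "*" if p <= 0.05 else ""   # mimic SPSS flagging of significant correlations
print(f"r = {r:.3f}{flag}, p = {p:.6f}")
```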


[Figure: correlation]


Pearson’s Correlation Coefficient
•Scale variable and binary variable
•Works the same way as for two scale variables
•Beware of the coding of the binary variable (you assign the codes): swapping the codes flips the sign of the coefficient
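With a binary variable, Pearson's coefficient (here equivalent to the point-biserial correlation) depends on which group gets the higher code. The hypothetical sketch below shows that swapping the codes flips the sign but keeps the size, and that 1/2 coding gives the same result as 0/1. The data and the `pearson_r` helper are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scale variable and group membership for eight respondents
scores = [12, 15, 11, 18, 20, 17, 9, 14]
group_01 = [0, 0, 0, 1, 1, 1, 0, 1]      # 0/1 coding
group_10 = [1 - g for g in group_01]     # same groups, codes swapped
group_12 = [g + 1 for g in group_01]     # 1/2 coding, same order as 0/1

r_01 = pearson_r(group_01, scores)
r_10 = pearson_r(group_10, scores)
r_12 = pearson_r(group_12, scores)
print(f"0/1: {r_01:.3f}   1/0: {r_10:.3f}   1/2: {r_12:.3f}")
```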


Non-Parametric Correlation
•Spearman’s Rho and Kendall’s Tau
•Correlation for cases other than two scale variables (or scale and binary)
•Same interpretation as Pearson’s correlation coefficient
•Kendall’s Tau is preferred when the variables have few categories (many tied ranks) and for smaller samples
•Analyze > Correlate > Bivariate
•Select the variables and Spearman/Kendall
•For significance, tick ‘Flag significant correlations’
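Both coefficients work on ranks rather than raw values, which also makes them much less sensitive to outliers than Pearson's. The sketch below uses hypothetical, nearly linear data with one extreme point and computes Spearman's rho as Pearson's r on average ranks, and Kendall's tau-b from concordant and discordant pairs. The helpers are hand-written for illustration; in practice SPSS (or `scipy.stats.spearmanr` and `scipy.stats.kendalltau`) does this for you.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    # Rank from 1 upwards; tied values share the average of their ranks
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(x, y):
    # Spearman's rho is Pearson's r computed on the ranks
    return pearson_r(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                      # tied on both variables: ignored
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    # tau-b adjusts the denominator for ties in either variable
    return (concordant - discordant) / math.sqrt(
        (concordant + discordant + tied_x_only) * (concordant + discordant + tied_y_only)
    )

# Hypothetical, nearly linear data plus one extreme point
x = list(range(1, 11)) + [11]
y = [1.2, 1.9, 3.1, 4.0, 5.2, 5.8, 7.1, 8.0, 8.9, 10.1] + [-10.0]

print(f"Pearson r    = {pearson_r(x, y):.2f}")
print(f"Spearman rho = {spearman_rho(x, y):.2f}")
print(f"Kendall tau  = {kendall_tau_b(x, y):.2f}")
```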

Interpretation
•Correlation does not imply causality
  •No control of other variables
  •No independent and dependent variable
•You cannot claim that one variable affects the other, even when such a relationship seems meaningful and logical
•Leave the interpretation of the effects of IVs on the DV to regression analysis
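A short simulation illustrates why "no control of other variables" matters. Below, a hypothetical third variable z drives both x and y, while x has no effect on y at all; the two are still strongly correlated (about 0.8 in expectation with these settings). The data are simulated and the setup is an assumption for illustration.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(7)
# Hypothetical common cause z
z = [random.gauss(0, 1) for _ in range(1000)]
x = [zi + random.gauss(0, 0.5) for zi in z]   # x depends only on z
y = [zi + random.gauss(0, 0.5) for zi in z]   # y depends only on z, not on x

print(f"r(x, y) = {pearson_r(x, y):.2f}")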