10: Data transformation and non-parametric tests


What to do if t-test assumptions are substantially violated?
• Large difference in variances
- Welch approximation is usable only when the difference is low to moderate (and with a rather high number of observations)
- The data might follow the log-normal distribution → use transformation
- Use a non-parametric test (but this might be tricky)
• Data do not come from a normal distribution
- Check the log-normal possibility
- Use non-parametric tests

The log-normal distribution
• log(X) ~ N(μ, σ²)
• Positively skewed
• Defined for numbers > 0
• Very common situation in biological research
- Masses, dimensions of biological objects
- Counts can be approximated by the log-normal distribution
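The properties listed above can be checked on simulated data. The following is a minimal sketch using only the Python standard library; the parameter values MU and SIGMA and the sample size are arbitrary assumptions chosen for illustration, not values from the lecture.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA = 2.0, 0.5  # hypothetical mean and SD of log(X)

# Draw X so that log(X) ~ N(MU, SIGMA^2)
x = [random.lognormvariate(MU, SIGMA) for _ in range(10_000)]
log_x = [math.log(v) for v in x]

mean_x = statistics.fmean(x)
median_x = statistics.median(x)
mean_log = statistics.fmean(log_x)
sd_log = statistics.stdev(log_x)

print(f"min(x)       = {min(x):.3f}")    # log-normal values are positive
print(f"mean(x)      = {mean_x:.3f}")    # pulled upwards by the long right tail
print(f"median(x)    = {median_x:.3f}")
print(f"mean(log x)  = {mean_log:.3f}")  # should be close to MU
print(f"sd(log x)    = {sd_log:.3f}")    # should be close to SIGMA
```

A mean clearly above the median is the typical signature of positive skew; after the log transformation the two should nearly coincide.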

Data transformation using the log function
• Changes the scale from additive to multiplicative
- geometric instead of arithmetic means; exp(mean(log(data))) = geometric mean
- H0: The ratio between geometric means is 1.0
- Results say how many times larger the geometric mean is (e.g. 1.2 times = larger by 20%)
• If suitable, improves both normality and homogeneity of variances
• Test results do not depend on the type of logarithm used (just consistency is needed)
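A short sketch of the two claims above: that exp(mean(log(data))) is the geometric mean, and that the t statistic is the same whichever logarithm is used. The two data sets and the helper names `geometric_mean` and `t_statistic` are hypothetical, made up for illustration; the t statistic here is the classical pooled-variance version.

```python
import math
import statistics

# Hypothetical measurements (e.g. body mass in g) for two groups
group_a = [12.1, 15.3, 9.8, 20.4, 13.7, 11.2]
group_b = [18.9, 14.6, 25.1, 16.3, 30.2, 19.8]


def geometric_mean(values):
    """Back-transformed mean of the log data: exp(mean(log(x)))."""
    return math.exp(statistics.fmean([math.log(v) for v in values]))


def t_statistic(a, b):
    """Two-sample t statistic with pooled variance (classical t-test)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


# Ratio of geometric means: how many times larger group_b is than group_a
ratio = geometric_mean(group_b) / geometric_mean(group_a)

# The same test on natural-log and base-10-log data
t_ln = t_statistic([math.log(v) for v in group_a],
                   [math.log(v) for v in group_b])
t_log10 = t_statistic([math.log10(v) for v in group_a],
                      [math.log10(v) for v in group_b])

print(f"ratio of geometric means = {ratio:.3f}")
print(f"t (natural log) = {t_ln:.6f}")
print(f"t (log base 10) = {t_log10:.6f}")
```

Changing the base of the logarithm multiplies every transformed value by the same constant, which cancels between the numerator and the denominator of t.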

Some more tricky types of data
• Ordinal data
- e.g. behavioral experiments
- Measures of an animal's reaction to a stimulus
• Data do not follow the normal distribution
• Transformation provides no help
• Non-parametric tests
- Do not test null hypotheses on parameters of the distributions

Various non-parametric analogues of t-tests
• Permutation tests
- Based on repeatedly re-assigning the data to groups at random and recalculating the t statistic each time
- P-value corresponds to the number of permutations for which |t| is higher than the value calculated from the original data, divided by the total number of permutations
$$P = \frac{\text{number of permutations where } |t_{\text{permut}}| > |t_{\text{data}}|}{\text{total number of permutations}}$$
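The permutation procedure can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a reference implementation: the data, the function names, the number of permutations and the seed are all hypothetical, and the comparison uses >= rather than a strict >, a common convention that differs only when a permuted |t| exactly equals the observed one.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def permutation_test(a, b, n_perm=5000, seed=0):
    """Return (t_data, p): the observed t and the two-sided permutation P-value."""
    rng = random.Random(seed)          # local generator, reproducible
    t_data = t_statistic(a, b)
    pooled = list(a) + list(b)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # random re-assignment to groups
        t_perm = t_statistic(pooled[:len(a)], pooled[len(a):])
        if abs(t_perm) >= abs(t_data):  # assumption: >= instead of strict >
            n_extreme += 1
    return t_data, n_extreme / n_perm


# Hypothetical data for two groups
control = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4]
treated = [5.9, 6.4, 5.1, 6.8, 5.6, 6.1]

t_data, p_value = permutation_test(control, treated)
print(f"t = {t_data:.3f}, permutation P = {p_value:.4f}")
```

Because the permutations are sampled at random rather than enumerated, the P-value is an estimate whose precision grows with `n_perm`.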

Non-parametric tests based on order
• Mann-Whitney test
- Analogue of a two-sample t-test
- Original values are replaced by their ranks (order) in the whole dataset
- These are then used for the calculation of the U statistic
- P-value based on direct comparison with the theoretical U distribution
- Or on an approximation by the standard normal distribution (Z), usually applied if ties are present
• Wilcoxon test
- Analogue of a paired t-test
- P-value also mostly based on the normal (Z) approximation (if ties are present)
• Kruskal-Wallis test
- Analogue of ANOVA
- Dunn test for multiple comparisons
• Spearman correlation coefficient
- Order-based non-parametric correlation coefficient
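To make the role of ranks concrete, here is a minimal standard-library sketch of the Mann-Whitney U statistic with a normal (Z) approximation, and of the Spearman coefficient computed as a Pearson correlation of ranks. The function names and example scores are hypothetical, and the Z approximation shown omits the tie and continuity corrections that statistical packages usually apply, so its P-value is only approximate.

```python
import math
from statistics import NormalDist, fmean


def midranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U, Z, two-sided P) using a plain normal approximation."""
    na, nb = len(a), len(b)
    ranks = midranks(list(a) + list(b))    # ranks in the whole dataset
    u_a = sum(ranks[:na]) - na * (na + 1) / 2
    u = min(u_a, na * nb - u_a)            # smaller of the two U values
    mean_u = na * nb / 2
    sd_u = math.sqrt(na * nb * (na + nb + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u                # z <= 0 because u <= mean_u
    return u, z, min(1.0, 2 * NormalDist().cdf(z))


def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = midranks(x), midranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    return cov / math.sqrt(sum((p - mx) ** 2 for p in rx)
                           * sum((q - my) ** 2 for q in ry))


# Hypothetical ordinal scores (e.g. strength of a behavioral response)
group_a = [1, 2, 2, 3, 1, 2]
group_b = [3, 4, 2, 5, 4, 3]
u, z, p = mann_whitney_u(group_a, group_b)
print(f"U = {u}, Z = {z:.3f}, approximate P = {p:.4f}")
```

For small samples without ties, comparing U directly with its exact distribution is preferable to the Z approximation used in this sketch.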

Non-parametric tests also have some assumptions
• Identical (though not normal) distributions from which the samples come
- If we state the null hypothesis about the shift (i.e. difference of means)
• Homogeneity of variances, quite similar to the t-test/ANOVA
• Same size of intervals for data on the ordinal scale