10: Data transformation and non-parametric tests

What to do if t-test assumptions are substantially violated?
• Large difference in variances
  - The Welch approximation is usable only when the difference is low to moderate (and with a rather high number of observations)
  - The data might follow the log-normal distribution → use a transformation
  - Use a non-parametric test (but this might be tricky)
• Data do not come from a normal distribution
  - Check the log-normal possibility
  - Use non-parametric tests

The log-normal distribution
• log(X) ~ N(μ, σ²)
• Positively skewed
• Defined for numbers > 0
• Very common situation in biological research
  - Masses, dimensions of biological objects
  - Counts can be approximated by the log-normal distribution

Data transformation using the log function
• Changes the scale from additive to multiplicative
  - Geometric instead of arithmetic means; exp(mean(log(data))) = geometric mean
  - H0: The ratio between geometric means is 1.0
  - Results say how many times the mean is larger (e.g. 1.2 times = by 20%)
• If suitable, improves both normality and homogeneity of variances
• Test results do not depend on the type of logarithm used (just consistency is needed)

Some more tricky types of data
• Ordinal data
• E.g. behavioral experiments
  - Measures of the reaction of an animal to a stimulus
• Data do not follow the normal distribution
• Transformation provides no help
• Non-parametric tests
  - Do not test null hypotheses on parameters of the distributions

Various non-parametric analogues of t-tests
• Permutation tests
  - Based on the principle of repeated random re-assignment of data to groups and calculating the t statistic each time
  - The P-value is the number of permutations for which |t| is higher than the value calculated from the original data, divided by the total number of permutations:
    P = (number of permutations where |t_permut| > |t_data|) / (total number of permutations)

Non-parametric tests based on order
• Mann-Whitney test
  - Analogue of a two-sample t-test
  - Original values are replaced by their order (rank) in the whole dataset
  - The ranks are then used for the calculation of the U statistic
  - P-value based on direct comparison to the theoretical U distribution
  - Or on an approximation by the standard normal distribution (Z), usually applied if ties are present
• Wilcoxon test
  - Analogue of a paired t-test
  - P-value also mostly based on the normal (Z) approximation (if ties are present)
• Kruskal-Wallis test
  - Analogue of ANOVA
  - Dunn test for multiple comparisons
• Spearman correlation coefficient
  - Order-based non-parametric correlation coefficient

Non-parametric tests also have some assumptions
• Identical (though not normal) distributions from which the samples come
  - If we state the null hypothesis about the shift (i.e. difference of means)
• Homogeneity of variances, quite similar to t-test/ANOVA
• Same size of intervals for data on the ordinal scale
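
Two ideas from the slides above can be sketched in a few lines of Python: the geometric mean obtained by back-transforming the mean of the logs, and a permutation analogue of the two-sample t-test. This is only an illustration under stated assumptions, not the course's own code. The data values and the function names (geometric_mean, t_statistic, permutation_p_value) are hypothetical, the t statistic uses the classical pooled-variance form, and the permutation count includes ties with the observed value (>=), a common convention, where the slide writes a strict ">".

```python
import math
import random
import statistics


def geometric_mean(values):
    """Back-transform the arithmetic mean of the logs: exp(mean(log(x)))."""
    return math.exp(statistics.fmean([math.log(v) for v in values]))


def t_statistic(a, b):
    """Two-sample t statistic with pooled variance (classical Student form)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb))


def permutation_p_value(a, b, n_permutations=5000, seed=1):
    """Share of random re-assignments with |t_permut| >= |t_data|."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    t_data = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)            # random re-assignment of data to groups
        t_perm = abs(t_statistic(pooled[:len(a)], pooled[len(a):]))
        if t_perm >= t_data:
            hits += 1
    return hits / n_permutations


# Hypothetical, positively skewed measurements (e.g. body masses in grams).
group_a = [12.1, 15.3, 9.8, 22.4, 18.0, 11.5]
group_b = [25.2, 31.0, 19.7, 44.9, 28.3, 36.1]

# Multiplicative effect: how many times larger the geometric mean of B is.
ratio = geometric_mean(group_b) / geometric_mean(group_a)

# Test on the log scale, where H0 "ratio of geometric means = 1" becomes
# "difference of means of the logs = 0".
log_a = [math.log(v) for v in group_a]
log_b = [math.log(v) for v in group_b]
p_value = permutation_p_value(log_a, log_b)

print(f"ratio of geometric means: {ratio:.2f}")
print(f"permutation P-value: {p_value:.4f}")
```

Because the permutations are sampled at random rather than enumerated, the P-value is an estimate whose precision grows with n_permutations. For rank-based tests such as Mann-Whitney, an established statistical package would normally be used instead of a hand-written routine.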