Statistical methods in biology and medicine

• A group of mathematical methods concerning the collection, analysis and interpretation of data
• A complete description of the world is both impossible and impractical (statistics is a tool for reducing the variability of the data)
• Statistics creates mathematical models of reality that can help in making decisions
• It works correctly only when the assumptions of its methods are met

Descriptive statistics
• Population-wide: works with data on the whole surveyed population (e.g. census, medical registry)
• Inductive: conclusions based on sample data (obtained from a part of the target population) are extrapolated to the whole population (assumption: the sample is selected at random)

Statistics as a data processing tool
• "Raw data" are often difficult to grasp
• Descriptive statistics can make the data (of a given sample) understandable

code            no.  adrenaline   noradrenaline  hypokinesis  ERα 397/PvuII  ERα 351/XbaI
TTCB113-2013    1    354          [illegible]    base         CT             AG
TTCKE14-2013    2    307          2955           apex         TT             AA
TTCKH15-2013    3    473          [illegible]    apex         CT             AG
TTCAJ16-2013    4    341          2108           apex         CT             AG
TTCCHM17-2013   5    321          2031           apex         CC             GG
TTCCHS18-2013   6    [illegible]  1931           apex         TT             AA
TTCRK19-2013    7    508          1753           diffuse      TT             AA
TTCPD20-2013    8    374          1088           diffuse      CT             AA
TTCMJ21-2013    9    597          1798           apex         CC             GG
TTCPO22-2013    10   420          2856           apex         CT             AG
TTWA23-2013     11   367          [illegible]    apex         CT             AA
TTCNL24-2013    12   327          2467           apex         CT             AG
TTCJF25-2013    13   395          3929           apex         CC             GG
TTCZM26-2013    14   344          [illegible]    apex         CT             AG
TTCHJ27-2013    15   [illegible]  4225           apex         TT             AA
TTCGT28-2013    16   [illegible]  [illegible]    apex         CT             AG
TTCSB29-2013    17   295          3186           apex         CT             AG

Kinds of data
• Continuous (always quantitative): the parameter can theoretically take any value in a given interval (e.g. glucose concentration: 0-∞; ejection fraction: 0-100%)
• Ratio vs. interval data: for interval data only differences, not ratios, of two values can be determined (e.g. IQ score)
• Categorical (usually qualitative): the parameter can take only certain specified values (e.g.
blood group: 0, A, B, AB; sex: male, female; a disease is present/absent)
• Ordinal data: categorical, yet quantitative in the sense that they can be ordered (e.g. NYHA heart failure classification I-IV)
• Count data: can be ordered and form a linearly increasing sequence (e.g. number of children in a family: 0, 1, 2...); they are often treated as continuous data
• Binary data: only two possibilities (patients / healthy controls)

The distribution of continuous data - histograms
• The distribution of a continuous parameter can be visualized graphically (e.g. using histograms)
• The values usually cluster around some numbers

[Figure: histogram "Heights of 30 people"; x axis: heights in cm]

Description of continuous data

Measures of central tendency
• The arithmetic mean (μ): sum of the values divided by their number (n)
• The median (= 50% quantile): cuts the ordered values in half
• The mode: the most frequent value

Measures of variability
• variance (σ²)
• standard deviation (SD, σ)
• coefficient of variation (CV): CV = σ/μ
• standard error of the mean (SE, SEM = σ/√n)
• min-max (= range)
• quartiles: lower quartile (25%), median (50%), upper quartile (75%)
• skewness
• kurtosis

The probability distribution of continuous random variables

[Figure: negatively (left) skewed distribution, normal distribution, positively (right) skewed distribution]

• Probability density function
• In graphs, each value of a continuous variable (x axis) is linked to its probability density (y axis)

Examples of continuous data distribution

[Figure: histograms with the corresponding probability density functions]

Other ways of graphical visualisation
• Box and whisker plots
• Instead of the median, e.g. the mean can be used; instead of quartiles (the "box"), ± k

α      z critical
0.10   1.28
0.05   1.65
0.01   2.33

• H0 is symmetric: there is no difference between drug A and drug B (i.e.
A is neither better nor worse than B)
• Two-tailed tests can reveal differences in both directions
• They are usually more suitable: we don't know the result a priori, and we are interested in both possible effects

Tests for continuous data, 2 samples - examples

Test      Parametric                               Non-parametric
Paired    Paired (dependent) Student's t-test      Wilcoxon paired test, sign test
Unpaired  Unpaired (independent) Student's t-test  Mann-Whitney U-test, Kolmogorov-Smirnov test

Tests for continuous data, more than 2 samples - examples

Test      Parametric                                                Non-parametric
Paired    Repeated measures ANOVA (ANalysis Of VAriance), RM ANOVA  Friedman test ("ANOVA")
Unpaired  One-way ANOVA (and its variants)                          Kruskal-Wallis test ("ANOVA")

• When ANOVA rejects H0, it is necessary to find out which specific samples differ from each other, using post hoc tests

Choose the best test
In a clinical trial, patients take either a new drug to treat epilepsy or a placebo. The study is randomized (the study group is randomly drawn). Only patients who have at least one and at most ten seizures in three months are included. The study evaluates the number of seizures during the first year of treatment.
A. Paired t-test
B. Unpaired t-test
C. Mann-Whitney U-test
D. Sign test
E. Repeated measures ANOVA
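
The rank-based logic behind the Mann-Whitney U-test, listed among the unpaired non-parametric tests, can be sketched in a few lines of Python. This is a minimal sketch rather than a full test: it computes only the two U statistics (no p-value or critical value), and the function names `average_ranks` and `mann_whitney_u` as well as the seizure counts are hypothetical, invented purely for illustration.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent (unpaired) samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Rank all observations together, ignoring group membership.
    ranks = average_ranks(list(chain(sample_a, sample_b)))
    rank_sum_a = sum(ranks[:n_a])
    # U_a counts the (a, b) pairs in which a exceeds b (ties count as 0.5).
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical seizure counts per patient, NOT taken from any real study.
drug = [4, 7, 9, 12, 15]
placebo = [10, 14, 18, 21, 25, 30]

u_drug, u_placebo = mann_whitney_u(drug, placebo)
print(u_drug, u_placebo)  # prints 3.0 27.0
```

Because only the ordering of the pooled values enters the calculation, the statistic does not depend on the data being normally distributed, which is what makes the test non-parametric. In practice the smaller of the two U values would be compared with a tabulated critical value or converted to a p-value by statistical software.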