E0410 Fundamentals of Statistics for Scientific Data Using R
by Daria Sapunova, PhD student, RECETOX
daria.sapunova@recetox.muni.cz
Bohunice, D29, room 123

Theoretical part

Mentimeter.com

[Figure: Height (cm). The data is only for visual demonstration!]

R for Data Science, Garrett Grolemund and Hadley Wickham: https://r4ds.had.co.nz/

[Meme: "Trying to apply a fancy new R library to my real-world data" (r/rstatsmemes)]

How your data should not look before importing into R
[Figure with numbered callouts 1-5]

How your data should look before importing into R

ID | Age | Gender | Height (cm) | Weight (kg) | Education | Freq. of fish consumption (times/month) | Source of fish (majority) | Chronic disease | Last med. exam (years ago)
Person 1 | 45 | Male | 170.4 | 76.4 | PhD | 21 | Market | Diabetes | 10
Person 2 | 26 | Male | 168.3 | 65.3 | BS | 5 | Self-fishing | NA | NA
Person 3 | 54 | Female | 168.6 | 75.3 | MS | 23 | Grocery store | Bladder infection | 10
Person 4 | 65 | Male | 156.6 | 44.2 | MS | 10 | Market | Diabetes | 5
Person 5 | 21 | Female | 170.1 | 69.9 | HS | 8 | Market | No | NA
Person 6 | 44 | Male | 176.4 | 84.3 | MS | 43 | Self-fishing | No | 2
…
Person 400 | 6 | Male | 121.2 | 21.9 | NA | 15 | Market | NA | 1

Data preparation – cleaning, harmonizing and structuring
✓ Choose the variables you are going to work with.
✓ Import (load) your data into R.
✓ Inspect the variables: compute descriptive statistics and check unique values, data types and the amount of missing data.
✓ Harmonize your variables, especially categorical ones.
✓ Structure your data.

Preliminary analysis
✓ Build histograms for numerical variables.
✓ Build box plots to compare a numerical variable across the groups of a categorical variable.
✓ Build scatter plots to look for associations between numerical variables.
✓ Check the distributions of numerical variables (e.g., with normality tests).
✓ Transform the data if necessary.

You may proceed to the analysis.

Task!

Practical part
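The data-preparation checklist above can be sketched in base R, using the example rows from the "How your data should look" table. This is a minimal illustration, not the course's own script. The following are all assumptions made for the example:

- read.csv(text = ...) stands in for reading a real file.
- The file name fish_survey.csv is hypothetical.
- The shortened column names (FishFreq, FishSource, ChronicDisease, LastMedExam) are illustrative.
- The objects raw and clean are illustrative names.

```r
# Minimal base-R sketch of the data-preparation checklist (illustrative only).
# read.csv(text = ...) stands in for importing a real file, e.g. a hypothetical
# read.csv("fish_survey.csv"); column names are shortened for convenience.
raw <- read.csv(text = "
ID,Age,Gender,Height,Weight,Education,FishFreq,FishSource,ChronicDisease,LastMedExam
Person 1,45,Male,170.4,76.4,PhD,21,Market,Diabetes,10
Person 2,26,Male,168.3,65.3,BS,5,Self-fishing,NA,NA
Person 3,54,Female,168.6,75.3,MS,23,Grocery store,Bladder infection,10
Person 4,65,Male,156.6,44.2,MS,10,Market,Diabetes,5
Person 5,21,Female,170.1,69.9,HS,8,Market,No,NA
Person 6,44,Male,176.4,84.3,MS,43,Self-fishing,No,2
Person 400,6,Male,121.2,21.9,NA,15,Market,NA,1
", na.strings = "NA")

# Inspect: data type of each column, descriptive statistics,
# unique values of the categorical columns, and missing values per column
str(raw)
summary(raw)
lapply(raw[c("Gender", "Education", "FishSource", "ChronicDisease")], unique)
colSums(is.na(raw))

# Harmonize and structure: trim stray spaces (a common cause of duplicate
# categories such as "Male" vs "Male "), then store categories as factors
clean <- raw
clean$Gender <- factor(trimws(clean$Gender), levels = c("Female", "Male"))
clean$FishSource <- factor(trimws(clean$FishSource))
# Education has a natural order, so an ordered factor keeps HS < BS < MS < PhD
clean$Education <- factor(clean$Education,
                          levels = c("HS", "BS", "MS", "PhD"), ordered = TRUE)
str(clean)
```

Fixing the factor levels explicitly makes any unexpected spelling show up as NA. You can then find it with colSums(is.na(...)) instead of having it silently create a new category.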
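The preliminary-analysis checklist above can likewise be sketched with base R graphics and shapiro.test(). Again this is only a sketch, under the following assumptions:

- It uses the numerical columns of the example table rows, with shortened names.
- The Shapiro-Wilk test and the Q-Q plot are one common way to check normality, not necessarily the method used in the practical part.
- The log transformation is one common option for right-skewed, strictly positive variables.

With only seven rows, the normality test has very little power, so the example shows the calls rather than a meaningful result.

```r
# Minimal base-R sketch of the preliminary-analysis checklist (illustrative only),
# using values from the example rows of the table above.
d <- data.frame(
  Gender   = factor(c("Male", "Male", "Female", "Male", "Female", "Male", "Male")),
  Height   = c(170.4, 168.3, 168.6, 156.6, 170.1, 176.4, 121.2),  # cm
  Weight   = c(76.4, 65.3, 75.3, 44.2, 69.9, 84.3, 21.9),         # kg
  FishFreq = c(21, 5, 23, 10, 8, 43, 15)                          # times/month
)

# Histogram: shape of the distribution of one numerical variable
hist(d$Height, main = "Height", xlab = "Height (cm)")

# Box plot: one numerical variable compared across groups of a categorical one
boxplot(Weight ~ Gender, data = d, ylab = "Weight (kg)")

# Scatter plot: association between two numerical variables
plot(Weight ~ Height, data = d, xlab = "Height (cm)", ylab = "Weight (kg)")

# Distribution check: Q-Q plot plus Shapiro-Wilk test (H0: the data are normal)
qqnorm(d$FishFreq)
qqline(d$FishFreq)
sw <- shapiro.test(d$FishFreq)
print(sw$p.value)

# Transformation: log() requires strictly positive values;
# log1p() would be needed if the variable contained zeros
d$logFishFreq <- log(d$FishFreq)
print(shapiro.test(d$logFishFreq)$p.value)
```

Running this as a script (Rscript) writes the plots to Rplots.pdf; in an interactive session they appear in the plot window. In larger samples, normality tests flag even tiny deviations, which is why the Q-Q plot is shown alongside the test.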