E7490 Advanced non-parametric methods

Faculty of Science
Autumn 2024
Extent and Intensity
1/1/0. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium).
In-person direct teaching
Teacher(s)
Mgr. Klára Komprdová, Ph.D. (lecturer)
Guaranteed by
prof. RNDr. Ladislav Dušek, Ph.D.
RECETOX – Faculty of Science
Contact Person: Mgr. Klára Komprdová, Ph.D.
Supplier department: RECETOX – Faculty of Science
Timetable
Fri 12:00–13:50 D29/347-RCX2
Prerequisites
Bi5040 Biostatistics - basic course || Bi5045 Biostatistics for Comp. Biol.
Knowledge of basic univariate exploratory statistical techniques, analysis of variance, and correlation analysis.
Course Enrolment Limitations
The course is also offered to students of fields other than those with which it is directly associated.
Course objectives
The aim of the course is to give students knowledge of basic and advanced nonparametric methods for classification and regression, and to teach them how to apply these methods in different software tools (R-project, Matlab, Statistica).
Learning outcomes
At the end of the course, students should be able to:
- critically evaluate a data set in terms of the distribution of the data
- use classification and regression nonparametric methods
- validate the model outputs using different validation techniques
- compare results from different models
- use various software tools (R-project, Matlab, Statistica) to create models
- compare the advantages and disadvantages of different methods
Syllabus
  • 1. Introduction to Nonparametric Methods - basic concepts: process modeling, types of variables, classification model, classification vs. regression, parametric and nonparametric multivariate statistics, a comparison of different approaches, introduction to the software used (Statistica, R-project, Matlab).
  • 2. Decision trees I - tree topology, splitting criteria, stability of the tree, cross-validation, measurement of accuracy, tree pruning, surrogate variables, classification vs. regression trees, CART algorithm, the advantages and disadvantages of decision trees.
  • 3. Decision trees II - other tree-building algorithms: Patient Rule Induction Method (PRIM), Chi-squared Automatic Interaction Detector (CHAID), Quick, Unbiased and Efficient Statistical Tree (QUEST), Hierarchical Mixture of Experts (HME), Multivariate Adaptive Regression Splines (MARS).
  • 4. Random Forests I - extension of decision trees, creation and validation of forests, different types of forests: Bagging, Boosting, Arcing.
  • 5. Random Forests II - measuring importance of variables, the effect of variables on the prediction, clustering, outlier detection, precision, prediction.
  • 6. Accuracy of models I - confusion matrix, definition of threshold-dependent and threshold-independent indices. Threshold-dependent indices: Normalized Mutual Information (MI), Average Mutual Information (AMI), Overall Accuracy, Cohen's kappa, Tau index.
  • 7. Accuracy of models II - threshold-independent indices, specificity vs. sensitivity, Receiver Operating Characteristic (ROC) curve, Area Under the ROC Curve (AUC), coefficient of determination (R²), explained deviance (D²), maximum overall accuracy (MXOA), maximum kappa (MXKp), mean cross entropy (MXE), mean absolute prediction error (MAPE).
  • 8. Validation techniques I - validation, testing and training subsets, analytical methods for validation: Akaike's information criterion (AIC), Bayesian information criterion (BIC), Minimum description length (MDL), Structural risk minimization (SRM).
  • 9. Validation techniques II - Monte Carlo methods, principles of resampling techniques: simple splitting, cross-validation, bootstrap and jackknife.
  • 10. Real examples of using nonparametric models: predictive modeling of species occurrence and of pollutant concentrations.
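As an illustration of the threshold-dependent indices named in topic 6, the following minimal sketch computes Overall Accuracy and Cohen's kappa from a confusion matrix. It is not part of the official course materials. The function names and the example matrix are hypothetical, and Python is used only for illustration (the course itself works in R-project, Matlab and Statistica).

```python
# Illustrative sketch only: function names and data are assumptions.
# Rows of the confusion matrix are observed classes, columns are predicted classes.

def overall_accuracy(cm):
    """Share of cases on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def cohens_kappa(cm):
    """Agreement corrected for the agreement expected by chance."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical two-class example (e.g. species present / absent).
cm = [[40, 10],
      [5, 45]]

print(round(overall_accuracy(cm), 3))  # 0.85
print(round(cohens_kappa(cm), 3))      # 0.7
```

In this example 85 of 100 cases are classified correctly, while the marginal totals alone would give 50 % agreement by chance, so kappa is (0.85 - 0.50) / (1 - 0.50) = 0.7.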
Literature
  • Legendre P., Legendre L. (1998) Numerical ecology (second ed.), Elsevier, Amsterdam
  • Klaschka J., Kotrč E. (2004) Klasifikační a regresní lesy, proceedings of the ROBUST 2004 conference
  • Breiman L. (2001) Random forests. Machine Learning 45, pp. 5–32.
  • Lažanský et al.: Umělá inteligence I–IV.
  • Hastie T., Tibshirani R., Friedman J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer 2003
  • Breiman L. et al. (1984) Classification and Regression Trees, Chapman and Hall
  • Breiman L. (1996) Bagging predictors. Machine Learning 24, pp. 123–140.
  • McCulloch C. E., Searle S. R. (2001) Generalized, Linear, and Mixed Models, John Wiley & Sons.
  • MANLY, Bryan F. J. Randomization, bootstrap and Monte Carlo methods in biology. 3rd ed. Boca Raton, Fla.: Chapman & Hall, 2007, 455 pp. ISBN 9781584885412.
  • EDGINGTON, Eugene S. and Patrick ONGHENA. Randomization tests. 4th ed. Boca Raton, FL: Chapman & Hall/CRC, 2007, 345 pp. ISBN 9781584885894.
Teaching methods
The course is taught as lectures with PowerPoint presentations. Each lecture block is supplemented by a practical computer session in which different approaches are tested in various software packages. Real examples from experimental biology, ecology and chemistry are presented during the lectures, and students are asked to interpret the results of the practical examples. Each student develops a project on a selected topic during the semester.
Assessment methods
The final assessment (at the end of the semester) combines a written examination and project evaluation.
Language of instruction
Czech
Further Comments
The course can also be completed outside the examination period.
The course is taught annually.
Teacher's information
http://www.iba.muni.cz/vyuka/
The course is also listed under the following terms: Autumn 2022, Autumn 2023.
  • Permalink: https://is.muni.cz/course/sci/autumn2024/E7490