PřF:Bi7490 Advanced non-parametric methods - Course Information
Bi7490 Advanced non-parametric methods
Faculty of Science
Spring 2012
- Extent and Intensity
- 2/1/0. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium).
- Teacher(s)
- Mgr. Klára Komprdová, Ph.D. (lecturer)
- Guaranteed by
- prof. RNDr. Ladislav Dušek, Ph.D.
RECETOX – Faculty of Science
Contact Person: Mgr. Klára Komprdová, Ph.D.
Supplier department: RECETOX – Faculty of Science
- Timetable
- Thu 16:00–19:50 F01B1/709
- Prerequisites
- Bi5040 Biostatistics - basic course || Bi5045 Biostatistics for Comp. Biol.
Knowledge of basic univariate exploratory statistical techniques, analysis of variance, and correlation analysis.
- Course Enrolment Limitations
- The course is also offered to students of fields other than those with which it is directly associated.
- fields of study / plans the course is directly associated with
- General Biology (programme PřF, N-BI, specialization Ecotoxicology)
- Special Biology (programme PřF, N-EXB)
- Special Biology (programme PřF, N-EXB, specialization Ecotoxicology)
- Course objectives
- At the end of the course, students should be able to:
- critically evaluate a data set in terms of the distribution of its data
- use nonparametric classification and regression methods
- validate the model outputs using different validation techniques
- compare results from different models
- use various software packages to create models (R-project, MATLAB, Statistica)
- compare the advantages and disadvantages of different methods
- Syllabus
- Introduction to Nonparametric Methods
- Basic concepts: process modeling, types of variables, classification models, classification vs. regression, parametric and nonparametric multivariate statistics (a comparison of the approaches), introduction to the software used (Statistica, R-project, MATLAB)
- Decision trees I
- tree topology, splitting criteria, stability of the tree, cross-validation, measurement of accuracy, tree pruning, surrogate variables, classification vs. regression trees, CART algorithm, advantages and disadvantages of decision trees
- Decision trees II
- other tree-building algorithms and related methods: Patient Rule Induction Method (PRIM), Chi-squared Automatic Interaction Detector (CHAID), Quick, Unbiased and Efficient Statistical Tree (QUEST), Hierarchical Mixture of Experts (HME), Multivariate Adaptive Regression Splines (MARS)
- Random Forests I
- extension of decision trees, creation and validation of forests, different types of forests: bagging, boosting, arcing
- Random Forests II
- measuring importance of variables, the effect of variables on the prediction, clustering, outlier detection, precision, prediction
- Accuracy of models I
- confusion matrix, definition of threshold-dependent and threshold-independent indexes; threshold-dependent indexes: Normalized Mutual Information (NMI), Average Mutual Information (AMI), overall accuracy, Cohen's kappa, Tau index
- Accuracy of models II
- threshold-independent indexes, specificity vs. sensitivity, Receiver Operating Characteristic (ROC) curve, Area Under the ROC Curve (AUC), coefficient of determination (R²), explained deviance (D²), maximum overall accuracy (MXOA), maximum kappa (MXKp), mean cross entropy (MXE), mean absolute prediction error (MAPE)
- Validation techniques I
- validation, testing and training subsets, analytical methods for validation: Akaike's information criterion (AIC), Bayesian information criterion (BIC), minimum description length (MDL), structural risk minimization (SRM)
- Validation techniques II
- Monte Carlo methods, principles of resampling techniques: simple splitting, cross-validation, bootstrap and jackknife
- Real examples of using nonparametric models
- predictive modeling of species occurrence and of pollutant concentrations
- Literature
- Breiman L. (2001) Random forests. Machine Learning 45, pp. 5–32.
- Lažanský et al.: Umělá inteligence I–IV.
- Legendre P., Legendre L. (1998) Numerical ecology (second ed.), Elsevier, Amsterdam
- Breiman, L. et al (1984) Classification and Regression Trees, Chapman and Hall
- Hastie T., Tibshirani R., Friedman J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer 2003
- Klaschka J., Kotrč E.: Klasifikační a regresní lesy, proceedings of the ROBUST 2004 conference
- Breiman L. (1996) Bagging predictors. Machine Learning 24, pp. 123–140.
- McCulloch C. E., Searle S. R. (2001) Generalized, Linear, and Mixed Models, John Wiley & Sons.
- MANLY, Bryan F. J. Randomization, bootstrap and Monte Carlo methods in biology. 3rd ed. Boca Raton, Fla.: Chapman & Hall, 2007, 455 pp. ISBN 9781584885412.
- EDGINGTON, Eugene S. and Patrick ONGHENA. Randomization tests. 4th ed. Boca Raton, FL: Chapman & Hall/CRC, 2007, 345 pp. ISBN 9781584885894.
- Teaching methods
- Teaching consists of lectures with PowerPoint presentations. Each lecture block is supplemented with a practical computer session in which the different approaches are tested in various software packages. Real examples from experimental biology, ecology and chemistry are presented during the lectures, and students are asked to interpret the results of the practical examples. Each student develops a project on a selected topic during the semester.
- Assessment methods
- The final assessment (at the end of the semester) is a combination of a written examination and project evaluation.
- Language of instruction
- Czech
- Further Comments
- Study Materials
The course can also be completed outside the examination period.
The course is taught annually.
- Teacher's information
- http://www.cba.muni.cz/vyuka/
- Permalink: https://is.muni.cz/course/sci/spring2012/Bi7490