PřF:Bi7496 Modern regr.& class.tech.biol. - Course Information
Bi7496 Modern regression and classification techniques in computational biology
Faculty of Science, Autumn 2007
- Extent and Intensity
- 0/0. 4 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: z (credit).
- Teacher(s)
- prof. Michael Schimek, Ph.D. (lecturer)
RNDr. Tomáš Pavlík, Ph.D. (seminar tutor)
RNDr. Eva Gelnarová (seminar tutor)
- Guaranteed by
- prof. RNDr. Jiří Hřebíček, CSc.
Faculty of Science
- Course Enrolment Limitations
- The course is offered to students of any study field.
- Course objectives
- Regression and classification methods based on the classical linear model and its extensions form a core part of multivariate statistics and play an important role in computational biology and medicine. Over the last twenty years, however, various new approaches have been developed that are better suited to the analysis of biological data. Most of them relax parametric model assumptions and add flexibility when fitting models to quantitative observations. Others are designed for predictive tasks, e.g. risk estimation; these belong to the group of statistical learning procedures. An additional complication is the size and complexity of the data to be analysed. Special techniques have been proposed to handle huge data sets and problems in which the number of observations n is much smaller than the number of variables p (typical for genetic data). Modern regression and classification techniques rely heavily on efficient computing, and the course uses the open-source R environment for statistics and graphics. The lectures (2 hours) introduce selected statistical approaches and the corresponding computing concepts. In the computer laboratory (2 hours), applied data problems are discussed and analysed with R procedures; the laboratory begins with an introduction to R. In addition, each student is asked to carry out a small case study independently as an exercise and to summarize the results in a short written report in English.
- Syllabus
* Typical applications of regression and classification techniques in computational biology.
* Typical data structures (errors, complexity, size, and dimensionality) in the modern biosciences.
* The concept of regression model fitting.
* The concept of statistical learning (prediction).
* Curse of dimensionality and ill-posed problems (incl. the problem of n much smaller than p).
* Complexity control, regularization, and penalization.
* The role of computing and algorithms.
* Introduction to smoothing techniques (including k-Nearest-Neighbors).
* Generalized additive non- and semiparametric regression models.
* Metric, distance, and similarity.
* Regression and classification trees.
* Linear classification methods and extensions.
* Nonparametric classification methods.
* Support vector machines as a statistical learning tool.
- Language of instruction
- English
- Further Comments
- The course is taught only once.
- Permalink: https://is.muni.cz/course/sci/autumn2007/Bi7496