Bi7496 Modern regression and classification techniques in computational biology

Faculty of Science
Autumn 2007 - for the purpose of the accreditation
Extent and Intensity
0/0. 2 credit(s). Type of Completion: z (credit).
Teacher(s)
prof. Michael Schimek, Ph.D. (lecturer)
RNDr. Tomáš Pavlík, Ph.D. (seminar tutor)
RNDr. Eva Gelnarová (seminar tutor)
Guaranteed by
prof. RNDr. Jiří Hřebíček, CSc.
Faculty of Science
Course Enrolment Limitations
The course is offered to students of any study field.
Course objectives
Regression and classification methods based on the classical linear model and its extensions constitute a core part of multivariate statistics, and they play an important role in computational biology and medicine. Over the last twenty years, however, various new approaches have been developed that are more appropriate for the analysis of biodata. Most of them relax parametric model assumptions and add flexibility when the task is to fit models to quantitative observations. Others are designed for predictive tasks, e.g. risk estimation; these belong to the group of statistical learning procedures. An additional complication is the size and complexity of the data to be analysed. Special techniques have been proposed to handle huge data sets and problems in which the number of observations n is much smaller than the number of variables p (typical for genetic data). Modern regression and classification techniques rely heavily on efficient computing, so the course uses the open-source R statistics and graphics environment.
In the lectures (2 hours), selected statistical approaches and the corresponding computing concepts are introduced. In the computer laboratory (2 hours), applied data problems are discussed and analysed with R procedures. There will be an introduction to R at the beginning of the computer laboratory. In addition, each student is asked to carry out a small case study independently as an exercise and to summarize the results in a short written report in English.
Syllabus
  • Typical applications of regression and classification techniques in computational biology.
  • Typical data structures (errors, complexity, size, and dimensionality) in the modern biosciences.
  • The concept of regression model fitting.
  • The concept of statistical learning (prediction).
  • Curse of dimensionality and ill-posed problems (incl. the problem of n much smaller than p).
  • Complexity control, regularization, and penalization.
  • The role of computing and algorithms.
  • Introduction to smoothing techniques (including k-Nearest-Neighbors).
  • Generalized additive non- and semiparametric regression models.
  • Metric, distance, and similarity.
  • Regression and classification trees.
  • Linear classification methods and extensions.
  • Nonparametric classification methods.
  • Support vector machines as a statistical learning tool.
Language of instruction
English
Further Comments
The course is taught only once.
The course is also listed under the following terms: Spring 2011 - only for the accreditation, Spring 2007, Autumn 2007, Autumn 2008, Autumn 2009, Spring 2010, Spring 2011, Spring 2012, Spring 2012 - accreditation, Spring 2013, Spring 2014.