Bi7496 Modern regression and classification techniques in computational biology

Faculty of Science
Spring 2012
Extent and Intensity
0/0. 4 credits (plus extra credits for completion). Recommended type of completion: zk (examination). Other possible types of completion: z (credit).
Teachers
prof. Michael Schimek, Ph.D. (lecturer)
RNDr. Tomáš Pavlík, Ph.D. (seminar tutor)
RNDr. Eva Gelnarová (seminar tutor)
Guaranteed by
prof. RNDr. Jiří Hřebíček, CSc.
RECETOX – Faculty of Science
Supplier department: RECETOX – Faculty of Science
Prerequisites
Students should be familiar with the basics of regression modelling. There will be an introduction to R at the beginning of the computer laboratory.
Course Enrolment Limitations
The course is open to students of any field of study.
Course objectives
The aim of this course is to introduce students to modern regression and classification methods and their extensions, which constitute a core part of multivariate statistics. The goals can be summarised as follows:
To demonstrate various new approaches that have been developed in the last twenty years and that are appropriate for the analysis of biodata.
To describe the assumptions of parametric models and how to check them.
To show how to control model flexibility when fitting models to quantitative observations.
To teach students how to perform predictive tasks, e.g. in risk estimation.
To demonstrate how to cope with the size and complexity of the data using special techniques that have been proposed recently.
Syllabus
  • Typical applications of regression and classification techniques in computational biology.
  • Typical data structures (errors, complexity, size, and dimensionality) in the modern biosciences.
  • The concept of regression model fitting.
  • The concept of statistical learning (prediction).
  • Curse of dimensionality and ill-posed problems (incl. the "n much smaller than p" problem).
  • Complexity control, regularization, and penalization.
  • The role of computing and algorithms.
  • Introduction to smoothing techniques (including k-Nearest-Neighbors).
  • Generalized additive non- and semiparametric regression models.
  • Metric, distance, and similarity.
  • Regression and classification trees.
  • Linear classification methods and extensions.
  • Nonparametric classification methods.
  • Support vector machines as a statistical learning tool.
Literature
  • Hastie T., Tibshirani R., and Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2001.
Teaching methods
In the lectures (2 hours) selected statistical approaches and appropriate computer concepts are introduced. In the computer laboratory (2 hours) applied data problems are discussed and analyzed with R procedures.
Assessment methods
Each student is requested to do a small case study on her/his own as an exercise. The results should be summarized in a short written report in English and subsequently presented to other students for discussion.
Language of instruction
English
Further Comments
The course is taught only once.
The course is also listed under the following terms: Autumn 2007 - accreditation, Spring 2011 - accreditation, Spring 2007, Autumn 2007, Autumn 2008, Autumn 2009, Spring 2010, Spring 2011, Spring 2012 - accreditation, Spring 2013, Spring 2014.