Statistics for Computer Science

Important information, lecture details, syllabus

Statistics for Computer Science (MV013) is a foundational statistics course for master’s students at the Faculty of Informatics. It builds on and extends the knowledge acquired during bachelor’s studies and focuses on the practical application of more advanced statistical methods.

If you are more interested in the theoretical background of the methods covered, I recommend enrolling in MA012 Statistics II (held in the autumn semester) instead.

The teaching and examination language of the course is English. Assignments will be given in English, but solutions in Czech or Slovak will also be accepted.

Prerequisites

Basic knowledge of calculus (functions, derivatives, definite integrals) and linear algebra (matrices, determinants, eigenvalues, eigenvectors) is assumed. Basic knowledge of probability and statistics and experience with writing code in the statistical software R are required, within the scope of MB153 Statistics I, MB143 Design and analysis of experiments, or similar courses.

Specifically, you are expected to have sufficient knowledge of:

  • Random variables and vectors (probability distribution, probability mass function, probability density function, cumulative distribution function, independent random variables).
  • Numerical characteristics of random variables and vectors (expectation, variance, standard deviation, quantiles, covariance, correlation).
  • Descriptive statistics.
  • Confidence intervals and hypothesis testing (one-sample, paired, and two-sample t-tests).
  • Linear regression model.
  • Use of the R language for statistical calculations and analysis.

Students without adequate knowledge of statistical methods or without experience with the R language are strongly advised to complete MB143 or MB153 (in Czech only) first.

Course schedule

  1. Why do we need statistics? Data preparation.
  2. Exploratory data analysis (descriptive statistics, data visualization).
  3. What about the multivariate case? Principal component analysis, multivariate outliers.
  4. Nonparametric and parametric models - comparison, application in practice.
  5. More on the maximum likelihood method (from point to interval estimates).
  6. One-sample tests - what if your data is not normal? How to find out?
  7. Alternative approaches to the one-sample t-test (Wilcoxon test, bootstrapping).
  8. From one-sample to two-sample (vs. paired) problems, and on to problems with more than two samples.
  9. What if the standard approach fails? (Wilcoxon, bootstrap).
  10. Linear regression model (model and statistical tests).
  11. Linear regression model (checking model assumptions).
  12. Is there a connection? - tests of independence for various types of data.