P033 Scientific Data Processing

Faculty of Informatics
Spring 1997
Extent and Intensity
2/1. 3 credit(s). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
Teacher(s)
doc. RNDr. Vladimír Znojil, CSc. (lecturer)
Guaranteed by
Contact Person: doc. RNDr. Vladimír Znojil, CSc.
Course Enrolment Limitations
The course is also offered to students of fields other than those with which it is directly associated.
fields of study / plans the course is directly associated with
Syllabus
  • Data sets, objects and attributes; types of data: alternative (binary), categorical, quantitative. Elementary characteristics of data-gathering methods. Methods for describing data: histogram, mean, median, mode, alpha-quantiles. Frequency functions and frequency density. Application to one- and two-dimensional data sets.
  • Elementary terms in the theory of probability. Discrete and continuous probability. Probability density and distribution functions. Stochastically independent and dependent events, conditional probability. Bayes' theorem.
  • Elementary types of distribution functions: binomial, Poisson, normal and log-normal distributions. Their basic characteristics and applications. Some special types of distribution functions, truncated distributions.
  • The law of large numbers, central limit theorems. Their importance for statistical evaluation and the assumptions that limit their validity.
  • Characteristics of distribution functions, moments and their characteristics, principles of testing various types of distributions. The role of the normal distribution in statistics.
  • Interval estimation; individual and simultaneous confidence intervals.
  • Hypothesis testing, types of tests, sequential tests. Type I and Type II errors and their mutual relation. Parametric and non-parametric procedures. Some other modern approaches and a comparison of the various methods.
  • Frequent statistical calculations: correlation and regression, analysis of variance (ANOVA) in simple and complex situations. The least squares method (LSM), its advantages and disadvantages. Some interesting applications of LSM as a substitute for ANOVA.
  • Comparison of means and deviations of experimental values, comparison of groups, Holm's method.
  • Multidimensional data and methods for processing them: dimensionality reduction and exploratory methods of data analysis. Representativeness of data and problems of data distortion. Statistical models of data sets.
  • Principal component analysis (PCA), method of "reciprocal averaging" (RA), detrended correspondence analysis (DCA).
  • Factor analysis, its tasks and methods, searching for factors and basic types of factor rotation. Relationships and problems in interpreting results. The use of factor analysis.
  • Cluster analysis: metrics of similarity spaces, the use of alternative and categorical data, "mixed data" and their metrics. Methods for evaluating distances between clusters. Hierarchical clustering methods "from the top" (divisive) and "from the bottom" (agglomerative), non-hierarchical clustering methods. Advantages and disadvantages of the methods. The method of "two-way clustering". Applications of cluster analysis in ecology and biology.
  • Discriminant analysis, selection of the parameter space. Posterior classification probabilities. The use of discriminant methods in biology and medicine.
  • Heuristic methods of data analysis, GUHA methods. Their use and the risks involved.
  • A short review of what not to forget, and of what to use and when. Statistical software packages and their contents (Statgraph, BMDP, SPSS, SyStat, Statistica).
Language of instruction
Czech
The course is also listed under the following terms: Spring 1996, Spring 1998, Spring 1999, Spring 2000, Spring 2001, Spring 2002.
  • Permalink: https://is.muni.cz/course/fi/spring1997/P033