E7528 Analysis of genomic and proteomic data

Faculty of Science
Spring 2025
Extent and Intensity
2/1/0. 3 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium), z (credit).
In-person direct teaching
Teacher(s)
Mgr. Eva Budinská, Ph.D. (lecturer)
Mgr. Barbora Zwinsová (assistant)
Guaranteed by
Mgr. Eva Budinská, Ph.D.
RECETOX – Faculty of Science
Contact Person: Mgr. Eva Budinská, Ph.D.
Supplier department: RECETOX – Faculty of Science
Prerequisites
( Bi5040 Biostatistics - basic course || Bi5045 Biostatistics for Comp. Biol. || Bi5046 Biostatistics for Comp. Biol. || E5046 Biostatistics for Comp. Biol. ) && E8600 Multivariate Methods && E7527 Data Analysis in R && Bi4010 Essential molecular biology && E0034 Analysis & classif. data
Bi5040 Biostatistics - basic course or Bi5045 Biostatistics for Computational Biology. A solid foundation in biostatistics, molecular biology and genetics is necessary. Having attended the following courses is an advantage: Bi7527 Data analysis in R, Bi8600 Multivariate Methods, Bi4010 Basic molecular biology, Bi3060 Basic genetics, B7250 Human genetics.
Course Enrolment Limitations
The course is also offered to students of fields other than those it is directly associated with.
The capacity limit for the course is 30 student(s).
Current registration and enrolment status: enrolled: 0/30, only registered: 5/30, only registered with preference (fields directly associated with the programme): 5/30
Course objectives
The aim of the course is to teach students to analyze data from microarray experiments and mass spectrometry. The data analysis covers all steps, from pre-processing and normalization of raw data to biological interpretation of results, using advanced statistical methods.
Learning outcomes
After completing the course, the student:
Knows basic types of biological and medical questions in genomic and proteomic experiments.
Knows selected technologies that are sources of high-density genomic and proteomic data (types of DNA microarrays, arrayCGH, and mass spectrometry).
Knows basic data types produced by genomic and proteomic technologies and their drawbacks from a biostatistician's point of view.
Can list basic steps of genomic and proteomic data analysis.
Is aware of technological details of microarrays and mass spectrometry that can influence data structure, quality and subsequent analyses.
Understands basic methods of quantification leading to raw data matrix and the necessity of further data quality control and normalization.
Knows specific, technology dependent sources of noise in the data.
Can identify this noise in the data using graphical and statistical tools.
Applies statistical methods to remove the noise from the data.
Is able to perform necessary and specific data transformations (normalization).
Can standardize measurements between experiments.
Creates a final data matrix of samples and proteins/genes from multiple raw datasets for downstream analyses.
Can identify and remove batch effects in the data.
Describes general principles of analysis of genomic and proteomic data.
Selects the correct method for hypothesis testing based on the hypothesis and data type.
Understands and applies SAM and limma.
Applies hypothesis testing for detection of differentially expressed genes and proteins.
Knows basic statistical methods for class prediction and applies them to genomic and proteomic data.
Knows basic non-parametric data-mining techniques and applies them to genomic and proteomic data.
Knows the advantages and drawbacks of different prediction methods.
Applies MAQC II standards for creating classifiers from microarray data.
Selects and applies multivariate regression strategies which combine gene expression and clinical data.
Applies the Cox proportional hazards model to assess the prognostic role of genes/proteins, and Kaplan-Meier estimates to compare groups defined by gene/protein expression.
Knows principles and methods for gene sets analysis.
Knows basic principles and methods for gene network analysis.
Applies gene set analysis on a model example.
Knows public databases of genomic and proteomic data.
Knows Fisher Z-transformation and other basic meta-analytical concepts in genomic data.
Applies meta-analytical methods for ordering.
Performs genomic and proteomic analyses in R and Bioconductor.
Knows selected specific R and Bioconductor data structures and packages and applies them for data analysis.
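As an illustration of the Fisher Z-transformation named among the outcomes above, the following is a minimal sketch of how correlation coefficients from several studies could be pooled. It is written in Python for brevity, although the course itself works in R and Bioconductor. The function names and the three example studies are hypothetical; the sketch assumes a simple fixed-effect model in which each transformed value is weighted by n - 3, the inverse of its approximate variance, and it requires every sample size to be greater than 3.

```python
import math


def fisher_z(r):
    """Fisher Z-transformation of a correlation coefficient r in (-1, 1)."""
    return math.atanh(r)


def combine_correlations(studies):
    """Pool (r, n) pairs under a fixed-effect model.

    Each z value is weighted by n - 3 (the inverse of its approximate
    variance). Returns the pooled correlation on the original scale and
    the standard error of the pooled z value.
    """
    weights = [n - 3 for _, n in studies]
    z_values = [fisher_z(r) for r, _ in studies]
    z_pooled = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    se_z = 1.0 / math.sqrt(sum(weights))
    return math.tanh(z_pooled), se_z


# Hypothetical correlations of one gene pair in three studies: (r, sample size)
studies = [(0.50, 23), (0.30, 53), (0.40, 103)]
r_pooled, se_z = combine_correlations(studies)
print(f"pooled r = {r_pooled:.3f}, SE(z) = {se_z:.3f}")
```

The back-transformation with tanh returns the pooled estimate to the correlation scale, so it always lies between the smallest and largest input correlation.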
Syllabus
  • 1. Challenges of genomic and proteomic technologies
  • 2. DNA microarrays: principles, types and design of probes, image analysis and data quantification
  • 3. Quality control and normalization of cDNA microarray data
  • 4. Quality control and normalization of oligonucleotide microarray data
  • 5. Quality control and normalization of other microarrays (epigenetic chips, SNP chips, Illumina BeadChip, ...)
  • 6. Protein mass spectrometry: principles, data quantification, data quality control and normalization
  • 7. Basic principles of downstream analysis of genomic and proteomic data
  • 8. Class comparison
  • 9. Class prediction
  • 10. Class discovery
  • 11. Survival analysis and other regression techniques
  • 12. Gene set and gene network analysis
  • 13. aCGH analysis
  • 14. Meta-analysis
Literature
    recommended literature
  • Meta-analysis and combining information in genetics and genomics. Edited by Rudy Guerra and Darlene Renee Goldstein. Boca Raton: CRC Press, 2010, xxiii, 335 pp. ISBN 9781584885221.
  • GENTLEMAN, Robert. R programming for bioinformatics. Boca Raton: CRC Press, 2009, xii, 314 pp. ISBN 9781420063677.
  • Bioinformatics and computational biology solutions using R and Bioconductor. Edited by Robert Gentleman. New York: Springer, 2005, xix, 473 pp. ISBN 0387251464.
  • Data analysis and visualization in genomics and proteomics. Edited by Francisco Azuaje and Joaquín Dopazo. Hoboken, NJ: John Wiley, 2005, xv, 267 pp. ISBN 0470094397.
  • DRĄGHICI, Sorin. Data analysis tools for DNA microarrays. Boca Raton: Chapman & Hall/CRC, 2003, 477 pp. ISBN 1-58488-315-4.
Teaching methods
Lectures are combined with practicals in R and its extension Bioconductor. The theory, concepts and methods are explained first; students then apply them to the analysis of real data. Each student chooses a project to analyze during the semester. In the second half of the semester, students present their interim results during lectures.
Assessment methods
The final written test consists of approximately 10 questions worth 20 points in total and counts for 50% of the final evaluation. The remaining 20 points are awarded for activity during the course (5 points) and for the quality of the project (15 points). To complete the course successfully, a student must earn at least 21 points in total, including at least 10 points from the project. Students may use all study materials during the test, as the questions mainly test knowledge of important general principles and the ability to quickly apply what was learned in the course to a real analysis.
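The scoring rule above can be written out as a short check. This is only a sketch in Python: the constant and function names are hypothetical, and it assumes that the 21-point threshold applies to the sum of the test, activity and project points (40 points at most).

```python
TEST_MAX = 20      # final written test
ACTIVITY_MAX = 5   # activity during the course
PROJECT_MAX = 15   # quality of the project
PASS_TOTAL = 21    # minimum overall score (assumed to be out of 40)
PASS_PROJECT = 10  # minimum score from the project


def passes(test, activity, project):
    """Return True if the scores meet both completion thresholds."""
    limits = ((test, TEST_MAX), (activity, ACTIVITY_MAX), (project, PROJECT_MAX))
    for value, maximum in limits:
        if not 0 <= value <= maximum:
            raise ValueError(f"score {value} is outside 0..{maximum}")
    total = test + activity + project
    return total >= PASS_TOTAL and project >= PASS_PROJECT


print(passes(10, 3, 10))  # 23 points in total, project threshold met
print(passes(20, 5, 9))   # 34 points in total, project threshold missed
```

Under this reading, a high test score cannot compensate for a project that earns fewer than 10 points.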
Language of instruction
Czech
Further comments (probably available only in Czech)
The course is taught annually.
The course is taught: every week.
Information on course enrolment limitations: It is recommended to have completed Bi8600, Bi4010 and Bi3060.
The course is also listed under the following terms: Spring 2023, Spring 2024.
  • Permalink: https://is.muni.cz/course/sci/spring2025/E7528