M7DataSP Advanced Data Science Practicum

Faculty of Science
Autumn 2024
Extent and Intensity
0/2/1. 3 credit(s) (příf plus uk k 1 zk 2 plus 1 > 4). Type of Completion: z (credit).
In-person direct teaching
Teacher(s)
Mgr. Eva Maršálková (lecturer)
Mgr. Petr Šimeček, MSc., Ph.D. (lecturer)
Guaranteed by
doc. PaedDr. RNDr. Stanislav Katina, Ph.D.
Department of Mathematics and Statistics – Departments – Faculty of Science
Supplier department: Department of Mathematics and Statistics – Departments – Faculty of Science
Timetable of Seminar Groups
M7DataSP/01: Mon 10:00–11:50 MP1,01014, P. Šimeček
Prerequisites
Students are expected to have some experience with a programming language suitable for data analysis, e.g. Python or R. Code examples will be given in Python.
Course Enrolment Limitations
The course is also offered to students of fields other than those the course is directly associated with.
The capacity limit for the course is 30 student(s).
Current registration and enrolment status: enrolled: 30/30, only registered: 2/30, only registered with preference (fields directly associated with the programme): 0/30
Course objectives
The main goal is to give students hands-on experience with data analysis and machine learning methods and to deepen their programming skills.
Learning outcomes
This course will enable students to
- predict a dependent variable with linear or logistic regression
- explore unfamiliar data using Principal Component Analysis and/or clustering
- split data into training and testing sets and understand the bias-variance trade-off
- use classification and regression trees, forests, bagging and boosting (XGBoost, LightGBM, CatBoost)
- learn the basics of PyTorch, applying neural networks and fine-tuning to image and NLP data
- gain experience with large language models, both through an API and with the Hugging Face transformers package

As a by-product, students in this course will also practice
- data cleaning
- visualizations
- data transformation (group by, summary)
- working with Git and GitHub
- working on the command line
- reproducible analysis and documents (Jupyter notebook, Markdown, Quarto)
- social skills, working in groups
Syllabus
  • Details can be found on GitHub (see also the materials from previous years): https://github.com/simecek/dspracticum2024
Teaching methods
Each lecture will focus on one dataset and problem, which we use to demonstrate a new data science skill. Students are expected to submit homework before each lecture.
Assessment methods
Group homework (in groups of 2-4 students), plus an optional individual final project worth an extra 30%. To pass, you must achieve at least 70% of the points.
Language of instruction
Czech
Further Comments
Study Materials
Teacher's information
https://github.com/simecek/dspracticum2024
The course is also listed under the following terms: Autumn 2020, Autumn 2021, Autumn 2023.
  • Permalink: https://is.muni.cz/course/sci/autumn2024/M7DataSP