PV197 GPU Programming

Faculty of Informatics
Autumn 2009
Extent and Intensity
1/0. 2 credit(s) (plus extra credits for completion). Recommended Type of Completion: zk (examination). Other types of completion: k (colloquium).
Teacher(s)
doc. RNDr. Jiří Filipovič, Ph.D. (lecturer)
doc. RNDr. Petr Holub, Ph.D. (lecturer)
RNDr. Jiří Matela, Ph.D. (assistant)
Guaranteed by
prof. Ing. Václav Přenosil, CSc.
Department of Machine Learning and Data Processing – Faculty of Informatics
Contact Person: prof. Ing. Václav Přenosil, CSc.
Timetable
Mon 18:00–19:50 D2
Course Enrolment Limitations
The course is also offered to students of fields other than those with which it is directly associated.
fields of study / plans the course is directly associated with
Course objectives
The class focuses on the programming of graphics processing units (GPUs), which offer computing power unavailable to traditional general-purpose processors, provided the parallelism of the GPU is properly exploited. Students will learn the architecture of GPUs as well as the CUDA programming model. Basic design patterns suitable for implementation on GPUs will be analyzed, and students will prepare solutions to given problems using GPUs. At the end of the course, successful students will understand the SIMD/SIMT programming model and its use on GPUs. They will be able to design a GPU-suitable parallelization of an algorithm and implement it using the CUDA programming model.
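For illustration only (this sketch is not part of the official course materials, and identifiers such as vecAdd are hypothetical), a minimal CUDA program in the SIMT style described above, in which each thread adds one pair of array elements:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc((void **)&da, bytes);
    cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    int block = 256;                        // threads per block
    int grid = (n + block - 1) / block;     // enough blocks to cover n elements
    vecAdd<<<grid, block>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);           // expected 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}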
Syllabus
  • Introduction: information about the class, motivation for GPU programming, overview of parallelism model, basics of CUDA, first demonstration code
  • GPU hardware and parallelism: hardware description (multiprocessor, SP, SFU, memory hierarchy), detailed description of parallelism (threads and blocks, SIMT model, warp scheduling, synchronization), atomic operations and thread voting, computation on the GPU -- instruction processing time, arithmetic precision, example of different approaches to matrix multiplication -- naïve versus block-based (a brief illustrative sketch of the naïve kernel follows this syllabus)
  • Memory model of GPUs: detailed description of the different memory types, coalescing of global memory accesses, bank efficiency of shared memory, access through PCIe, example of matrix transposition
  • CUDA, tools and libraries: detailed comparison of C and CUDA, detailed description of API aspects (initialization, memory management on the GPU, stream management, error detection), compilation using nvcc, CPU emulation, debugging, profiling, CUBLAS, CUFFT, third-party libraries, project assignment
  • Optimization: basic rules for algorithm design for GPU, latency hiding for global memory, redistribution of work in threads and thread blocks, parallelism inside thread, optimization of the sequential code, resource usage optimization
  • Basic patterns for parallel algorithms I: requires reading selected parts of the Patterns for Parallel Programming book and integrating that knowledge with the CUDA model
  • Basic patterns for parallel algorithms II
  • Case studies 1: Parallel reduction, prefix scan
  • Case studies 2: Molecular dynamics, Coulomb potential calculations, force calculation using cut-off
  • Case studies 3: Matrix multiplication, hybrid CPU/GPU algorithm for LU factorization
  • Case studies 4: MRI, Quick sort, other examples
  • Discussion of projects, presentation of best achieved results: presentation of 3 best solutions by authors, discussion of individual solutions, final discussion
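As a companion to the matrix-multiplication item above, the following is an illustrative sketch only (identifiers such as matMulNaive and the fixed size N are assumptions, not course code) of the naïve kernel; the block-based variant discussed in the lecture would stage tiles of A and B in shared memory to reduce global-memory traffic:

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024  // matrix dimension (assumed square for simplicity)

// Naïve kernel: each thread computes one element of C = A * B and
// repeatedly re-reads rows of A and columns of B from global memory.
__global__ void matMulNaive(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

int main(void)
{
    size_t bytes = (size_t)N * N * sizeof(float);
    float *hA = (float *)malloc(bytes), *hB = (float *)malloc(bytes), *hC = (float *)malloc(bytes);
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);                       // one 16x16 thread block per tile of C
    dim3 grid((N + block.x - 1) / block.x,
              (N + block.y - 1) / block.y);
    matMulNaive<<<grid, block>>>(dA, dB, dC, N);
    cudaDeviceSynchronize();

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("C[0] = %f (expected %f)\n", hC[0], 2.0f * N);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return 0;
}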
Literature
  • MATTSON, Timothy G., Beverly A. SANDERS and Berna MASSINGILL. Patterns for Parallel Programming. Boston: Addison-Wesley, 2005, xiii, 355 pp. ISBN 0321228111.
  • The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications. Edited by Guy-René Perrin and Alain Darte. Berlin: Springer, 1996, xv, 284 pp. ISBN 3540617361.
  • GPU Gems 3. Edited by Hubert Nguyen. Upper Saddle River, NJ: Addison-Wesley, 2007, l, 942 pp. ISBN 9780321515261.
Teaching methods
Lectures, reading of recommended literature, and solving programming assignments.
Assessment methods
Scores for the assignment solution: 50% for a working solution, plus a 50% bonus for the performance of the solution. Oral exam after all the lectures: 50%. To pass, neither the score for the working solution nor the oral exam score may be zero.
Language of instruction
English
Further comments (probably available only in Czech)
The course is taught only once.
Information on the per-term frequency of the course: in subsequent semesters the course will be part of IV112.
The course is also listed under the following terms Autumn 2010, Autumn 2011, Autumn 2012, Autumn 2013, Autumn 2014, Autumn 2015, Autumn 2016, Autumn 2017, Autumn 2018, Autumn 2019, Autumn 2020, Autumn 2021, Autumn 2022, Autumn 2023, Autumn 2024.
  • Enrolment Statistics (Autumn 2009, recent)
  • Permalink: https://is.muni.cz/course/fi/autumn2009/PV197