Revision 1

C2115 Practical Introduction to Supercomputing
Lesson 15

Petr Kulhánek
kulhanek@chemi.muni.cz
National Center for Biomolecular Research, Faculty of Science, Masaryk University, Kotlářská 2, CZ-61137 Brno

Content
> GPGPU and its comparison to the CPU
> Running applications: pmemd on the GPU

GPGPU
General-Purpose computing on Graphics Processing Units
(illustration: Nvidia Tesla P100)

CPU vs GPU
(figure: CPU vs GPU architecture, from the CUDA C Programming Guide)
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
The GPU contains a large number of computing cores that are organized into groups called streaming multiprocessors (SMs). The GPU performs computational operations on groups of data at once - vector data processing.
Example: Nvidia RTX 3070
• 5,888 CUDA cores
• 46 streaming multiprocessors
Demonstration video: https://www.youtube.com/watch?v=-P28LKWTzrl

Using the GPU
Parallelizing a job so that it can run on a GPU requires non-trivial modifications of the algorithms and the use of special development environments.
Programming methods:
• Nvidia CUDA
• OpenCL (Nvidia, AMD, etc.)
• or the use of optimized libraries (cuBLAS, cuFFT, etc.)

GPU task monitoring (Nvidia)
• In batch systems, only the assigned GPUs are available to the job (controlled by the CUDA_VISIBLE_DEVICES variable set by the batch system).
• The progress of a job on the GPU can be monitored with the tool nvidia-smi:

[kulhanek@wolf30 main]$ nvidia-smi
Mon Mar  1 20:37:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3070    Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   48C    P8    15W / 240W |    257MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 3070    Off  | 00000000:C1:00.0 Off |                  N/A |
|  0%   42C    P8     7W / 240W |      2MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      9252      C   pmemd.cuda                        255MiB |
+-----------------------------------------------------------------------------+

Exercise

pmemd.cuda
• pmemd is a program for molecular dynamics. More detailed information can be found at http://ambermd.org
• Script for the GPU run of the application (run.sh):

#!/bin/bash
# activate pmemd for the GPU
module add pmemd-cuda:18.1
# run the application
pmemd.cuda -O -i prod.in -p 6000.parm7 \
           -c 6000.rst7

Exercise 1
Input data are on the WOLF cluster in the directory:
/home/kulhanek/Documents/C2115/data/chitin
1. Determine the performance of pmemd on 1 GPU in ns per day and compare it with the fastest run of pmemd on the CPU (exercise L14.E1); a sketch for reading the ns/day value from the pmemd output is given after the submission command below.
2. Run the job on 1 GPU in a separate directory in the Infinity environment, request the entire node (place=excl) and use the same computing node (props=vnode=wolf30).
3. Monitor the progress of the job on the computing node with the tool nvidia-smi; a monitoring sketch is given after the submission command below.
Submitting the job into the batch system:

$ psubmit default run.sh ngpus=1 place=excl props=vnode=wolf30
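One possible way to handle point 3 of the exercise is sketched below. It is a minimal sketch that only uses standard nvidia-smi options; the 2-second refresh interval and the selected query fields are illustrative choices, not part of the original assignment, and the commands are meant to be run in a shell on the computing node (e.g. opened via ssh to wolf30).

#!/bin/bash
# Minimal GPU monitoring sketch (illustrative choices of interval and fields).

# Inside the job's environment the batch system exports CUDA_VISIBLE_DEVICES,
# which restricts the job to its assigned GPUs; in an ordinary login shell
# the variable is usually not set.
echo "CUDA_VISIBLE_DEVICES = ${CUDA_VISIBLE_DEVICES:-<not set in this shell>}"

# Refresh the full nvidia-smi overview every 2 seconds (stop with Ctrl+C).
watch -n 2 nvidia-smi

# Alternatively, print selected fields in CSV form every 2 seconds:
# nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,temperature.gpu \
#            --format=csv -l 2

During the run you should see pmemd.cuda listed in the Processes table of nvidia-smi on the GPU assigned to the job, as in the example output shown earlier.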
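For point 1 of the exercise, the throughput in ns per day can be read from the timing summary that pmemd writes at the end of its output. The sketch below is based on two assumptions not stated in the original slides: that the default output file name mdout is used (the run script above does not pass -o) and that the timing summary contains lines with "ns/day", as recent Amber versions report; the directory path is a hypothetical placeholder.

#!/bin/bash
# Sketch for reading the pmemd throughput (assumes the default output file
# name "mdout" and an Amber-style timing summary reporting "ns/day";
# check your own output files if the grep finds nothing).

cd /path/to/the/job/directory        # hypothetical path - use your own run directory
grep -i "ns/day" mdout               # the final "Average timings" section gives ns/day

# While the job is still running, the same estimate is periodically
# written to mdinfo:
# grep -i "ns/day" mdinfo

Compare the resulting ns/day value with the best CPU result from exercise L14.E1 to see the speed-up obtained on a single GPU.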