Revision 1
C2115 Practical introduction to supercomputing, Lesson 14
Petr Kulhánek
kulhanek@chemi.muni.cz
National Center for Biomolecular Research, Faculty of Science, Masaryk University,
Kotlářská 2, CZ-61137 Brno

Content
> Infinity: role, overview of commands
> Starting applications: pmemd, parallel run
> Exercises: efficiency of running pmemd in parallel

Infinity
https://lcc.ncbr.muni.cz/whitezone/development/infinity/

Overview of commands
Software management:
• site       activation of logical computing resources
• software   activation/deactivation of software
Job management:
• pqueues    overview of the batch system queues available to the user
• pnodes     overview of the computing nodes available to the user
• pqstat     overview of all jobs submitted into the batch system
• pjobs      overview of the user's jobs submitted into the batch system
• psubmit    submits a job into the batch system
• pinfo      shows information about a job
• pgo        logs the user on to the computing node where the job is running
• psync      manual data synchronization

Job
A job has to fulfill the following conditions:
• each job runs in a separate directory
• all job input data must be in the job directory
• job directories must not be nested
• the progress of the job is controlled by a script or by an input file (for automatically detected jobs)
• the job script must be written in bash
• absolute paths must not be used in the job script; all paths must be given relative to the job directory

Job script
A job script can be introduced by the standard bash interpreter or by the special interpreter infinity-env, which does not allow the script to run outside a computing node. The second approach prevents possible damage/overwriting/deletion of already calculated data by an accidental re-run of the script.

    #!/bin/bash
    # script itself

    #!/usr/bin/env infinity-env
    # script itself

Submitting a job
A job is submitted from the job directory with the command psubmit:

    psubmit destination job [resources]

destination (where to run) is either:
• a queue name
• an alias
job is either:
• the name of the job script
• the name of the input file (for automatically detected jobs)
resources are the resources required by the job; if not specified, 1 CPU is requested.

Resource specification (selected)
Resource   Description
ncpus      total number of CPUs required
ngpus      total number of GPUs required
nnodes     number of computational nodes (WN)
mem        total amount of required memory (CPU), units mb, gb
walltime   maximum job run time
workdir    type of working directory on the WN
place      method of occupying computing nodes
props      required properties of computational nodes

Monitoring the progress of a job
The command pinfo monitors the progress of a job; it can be run either in the job directory or in the working directory on the computing node. Other options are the commands pjobs and pqstat.
If the job is running on a computing node, the command pgo logs the user on to that node and changes the current directory to the job working directory, so the job can be monitored directly in the terminal:

    pgo job_id
    pgo          # without an argument when run in the job directory
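To put the previous slides together, a typical session might look like the sketch below. It is only an illustration: the job directory name myjob is made up, and the exact value syntax of the mem and walltime resources is an assumption here; the queue default and the script name run.sh are the ones used in the examples later in this lesson.

    # sketch of a typical Infinity session (illustrative names, see above)
    mkdir myjob && cd myjob          # separate job directory with all input data
    # the job script run.sh starts with "#!/usr/bin/env infinity-env"

    # submit into the queue "default"; mem/walltime value syntax is assumed
    psubmit default run.sh ncpus=2 mem=1gb walltime=02:00:00

    pinfo                            # monitor the job from the job directory
    pgo                              # log on to the computing node of the job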
Service files
In the job directory, service files are created when the job is submitted into the batch system, during the life of the job, and after its completion. Their meaning is as follows:
• *.info       control file with information about the progress of the job
• *.infex      wrapper script that is actually run by the batch system
• *.infout     standard runtime output of the *.infex script; it must be analyzed when the job terminates abnormally
• *.nodes      list of nodes reserved for the job
• *.mpinodes   list of nodes reserved for the job in the format used by MPI
• *.gpus       list of GPU cards reserved for the job
• *.key        unique job identifier
• *.stdout     standard output of the job script

Data synchronization
Default operating mode:
workdir=scratch-local
    Data are copied from the job input directory to the working directory on the computing node; the working directory is created at the beginning of the job by the batch system. When the job is completed, all data from the working directory are copied back to the job input directory. Finally, the working directory is deleted if the data transfer was successful.
Diagram: the job input directory on the frontend (User Interface, UI) is synchronized by rsync with /scratch/job_id/ on the worker node (WN); inputs are transferred with datain=copy-master and results with dataout=copy-master.

Data synchronization (continued)
Mode suitable for analyses:
workdir=jobdir
    Job data stay on shared storage.

Running applications

Request/use of resources
Native batch system (PBSPro):
• the user specifies the required computing resources
• the user must ensure that the job uses the assigned computing resources
Infinity:
• the user specifies the required computing resources
• the Infinity environment ensures correct starting of the job (selected applications only)
• for other jobs, the user must ensure that the job uses the assigned computing resources

pmemd
pmemd is a program for molecular dynamics. More detailed information can be found at http://ambermd.org.

Script for a CPU run of the application:

    #!/bin/bash
    # activate the module amber containing
    # the application pmemd
    module add amber
    # run the application
    pmemd -O -i prod.in -p 6000.parm7 \
          -c 6000.rst7

pmemd - parallel run
When running in parallel, only the resource specification in the psubmit command changes. Nothing else changes: the input data and the job script remain the same.

    $ psubmit default run.sh ncpus=1     # ncpus=1 can be omitted

*.stdout:
    Module build: amber:16.0:x86_64:single

    $ psubmit default run.sh ncpus=2

*.stdout:
    Module build: amber:16.0:x86_64:para

Computational node (excerpt from top):
    S   %CPU  %MEM  TIME+    COMMAND
    R   99.7   0.2  0:06.41  pmemd
    S    0.3   0.0  0:00.01  sshd
    R    8.3   6.0  0:00.09  top
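Once a parallel job has finished, the service files listed above give a quick way to check what was actually used. The snippet below is only a sketch built on those files; the "Module build" line is the one shown in the *.stdout examples above.

    # run in the job directory after the job has finished
    grep "Module build" *.stdout    # e.g. Module build: amber:16.0:x86_64:para
    cat *.nodes                     # nodes reserved for the job
    cat *.mpinodes                  # the same list in the format used by MPI
    pinfo                           # overall information about the job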
Exercise

Exercise 1
Job input data are on the WOLF cluster in the directory:
    /home/kulhanek/Documents/C2115/data/chitin/cpu

1. The goal of the exercise is to determine how well the pmemd application scales for numbers of CPUs that are multiples of two. Determine the actual and theoretical length of the calculation, the real speedup, and the real CPU usage as a percentage. Plot the real speedup as a function of the number of CPUs and compare the resulting curve with the curve for ideal scaling.
2. Submit the jobs using the Infinity environment, varying the value of ncpus. Run each test in a separate directory. Regardless of the value of ncpus, always request the whole node (place=excl) and use the same computing node (props=vnode=wolf30). How to submit a job:

    $ psubmit default run.sh ncpus=8 place=excl props=vnode=wolf30

See the pmemd notes below; a submission and evaluation sketch follows after them.

pmemd
Simulation length: the length of the simulation (calculation) is determined by the keyword nstlim in the prod.in file, which specifies the number of integration steps. Choose nstlim so that the job run time is about 60 minutes on 1 CPU.
The simulation produces the following files:
• mdout
• mdinfo   <- contains statistical information, e.g., how many ns per day the program is able to simulate
• mdcrd
• restrt
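One possible way to organize the scaling tests is sketched below; only the psubmit line is taken from the exercise, while the directory names, the list of CPU counts, and the copying of inputs are illustrative assumptions. The real speedup is S(n) = T(1)/T(n), where T(n) is the run time measured with n CPUs; ideal scaling corresponds to S(n) = n, and the real CPU usage is E(n) = S(n)/n x 100 %.

    #!/bin/bash
    # Sketch only: directory layout and CPU counts are assumptions.
    DATA=/home/kulhanek/Documents/C2115/data/chitin/cpu

    for n in 1 2 4 8; do                # tested numbers of CPUs
        dir=scaling_ncpus_$n            # one separate directory per test
        mkdir -p "$dir"
        cp "$DATA"/* "$dir"/            # job script and pmemd inputs
        (
            cd "$dir"
            # whole node, always the same node (wolf30);
            # psubmit may ask for an interactive confirmation
            psubmit default run.sh ncpus=$n place=excl props=vnode=wolf30
        )
    done

    # After all jobs finish, read T(n) for each test (e.g., from mdinfo or pinfo)
    # and compute the real speedup S(n) = T(1)/T(n) and CPU usage S(n)/n * 100 %.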