C2115 Practical introduction to supercomputing
Lesson 3

Petr Kulhánek
kulhanek@chemi.muni.cz
National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, CZ-62500 Brno
Revision 2

Content
➢ Architecture of clusters and (super)computers
➢ Front nodes
➢ Computational nodes and elements
➢ Data storage
➢ Network infrastructure

Key parts
(diagram: the key components of a cluster or supercomputer, each possibly present in several instances)
➢ User Interface (UI) - front nodes through which users enter the system
➢ Computing Element (CE) - a group of Work Nodes (WN) joined by an interconnect
➢ Storage Element (SE) - data storage
➢ Interconnect (IN) - the network that links the UIs, CEs, and SEs together

Frontend (UI)
The frontend (front-end node, user interface) is a computer dedicated to direct interaction with the user. On it, the user prepares input data for jobs, submits jobs into the batch system, manages running jobs, and manipulates job results (e.g., visualization). Unless explicitly allowed, the front node should not be used to run CPU- or memory-intensive tasks; any pre-processing or post-processing of job data must be submitted as separate jobs into the batch system. In small compute clusters, the front node is often also a compute node.
A cluster or supercomputer usually handles the jobs of many users, so only remote access to the front node is provided. (Moreover, the front node is typically physically located in the server room, where the noise level is considerable.)
Computationally intensive tasks are NOT run on the UI!!!

Remote access - front node (UI)
The front node (frontend) is the computer through which the user interacts with the supercomputer.
• The UI usually offers only a command line (CLI - command line interface) over a secure ssh (secure shell) connection.
• If a graphical interface (GUI) is available, it is more appropriate to use a remote desktop (VNC - Virtual Network Computing) than to export the X11 display.
Two access patterns are common (see the connection sketch below):
• direct access - ssh to a single UI with a public name, e.g. skirit.ics.muni.cz
• indirect access - ssh to a public name, e.g. salomon.it4i.cz, behind which a load balancer forwards the connection to one of several login nodes (login1 ... loginN)
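As a minimal sketch of the two access patterns - the host names come from the slide above, while the user name jdoe is purely hypothetical:

    # direct access: one UI machine with a public name (MetaCentrum example)
    ssh jdoe@skirit.ics.muni.cz

    # indirect access: the public name belongs to a load balancer that
    # forwards the connection to one of the login nodes (IT4I example);
    # which loginN you land on may differ between sessions
    ssh jdoe@salomon.it4i.cz

The login nodes behind a balancer typically share the user's home directory, so it usually does not matter which one the balancer selects.

Computational element (CE)
A computational element (CE), very often also referred to as a cluster, is a group of computational nodes, most often of the same architecture (a homogeneous cluster). The nodes are usually connected by a very fast local network (1 Gbit/s Ethernet, InfiniBand, or a proprietary solution).
Direct use of CEs and WNs is prohibited; jobs must be submitted through the batch system.
Computing node and compute node are synonyms for the work node (WN - worker node).

Computing node (WN)
A computing node (worker node - WN, computational node) is a unit that acts as a standalone computer dedicated to solving user jobs. A node can run several jobs at the same time, and one job can use multiple compute nodes. However, the combined demands of the jobs should not exceed the computing resources (CPU, RAM, HDD) that the node provides. The efficient use of computing resources is taken care of by the operating system (OS) in conjunction with the batch system.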
A job can also run on nodes that belong to different CEs, but this is suitable only for special types of jobs.
(figure: a possible organization of jobs on WNs and CEs - within CE #1, jobs #1-#3 are distributed over work nodes WN #1-#3, with one job spanning several nodes)

Access to WN - monitoring only
Users may log in to work nodes only to monitor their running jobs; this access MUST NOT be used to start jobs on the nodes directly. Access is restricted to the nodes where the user has running jobs.
• direct access: MetaCentrum, all NCBR/CEITEC MU clusters
• indirect access: IT4I

Computational element / node
(photo: the LEX cluster - a computational element - with its InfiniBand controller, computing nodes, and rack)

Computing node
1U height = 1.75 inches (44.45 mm)
(photos: examples of typical compute nodes used in "cheap" clusters - one computer in one chassis, or a "twin" with two computers in one chassis; supercomputers mostly use proprietary solutions)

Computing node
(photo: node internals - processors, memory, cooling, and disks for local data storage)

Computing node
(photo: SGI UV2000 "pip" - a single computing node with 192 CPU cores and 4 TB of memory; blades, bus, and a disk array for local data storage)

Computing node - accelerators
(photos: NVidia Tesla K20 (GPGPU) and Intel Xeon Phi (MIC))
The computing power of the accelerators may exceed the performance of the CPUs installed in the computing node.

Typical computer scheme
(diagram: the CPU and its memory controller connect to the system memory and, through the north bridge, to graphics and to peripherals with quick access via PCI Express, such as accelerators; the south bridge serves the slower peripherals - USB (mouse, keyboard), SATA controllers with hard drives, BIOS, the real-time clock, sound, and the network (Ethernet) on the PCI bus)

Multiprocessor nodes
Nowadays, compute nodes contain multiple physical processors (at least two), each containing multiple CPU cores. The RAM is then usually accessible at different speeds (NUMA - Non-Uniform Memory Access). The reason for this arrangement is increasing computing power, which, however, brings increased demands on the preparation and execution of computational tasks. (A short sketch showing how to inspect a node's NUMA layout is given below, after the Local data storage section.)

Data storage (SE) - partitioning
Types of data storage (SE - storage element) and their use:
➢ local data storage - temporary job data
➢ (remote) data storage (disk array) - live data of running jobs and currently solved projects
➢ hierarchical data storage - completed projects and backups

Local data storage
▪ A disk or disk array connected locally to the compute node.
▪ HDD (Hard Disk Drive) is a device used in computers for the permanent storage of large amounts of data by means of magnetic induction.
▪ SSD (Solid-State Drive) is a type of data medium which, unlike magnetic hard disks, contains no moving mechanical parts and has a much lower power consumption.
(wikipedia.org)
Local temporary storage (scratch directories) is intended for jobs currently running on the compute node. These directories MUST NOT* be used for long-term data storage.
*) Of course you can, but then don't be surprised when one day you won't find your data there, because the administrator, or some other intelligent tool, has cleaned the storage.
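To illustrate the intended use of scratch directories, here is a minimal job-script sketch. It assumes that the batch system exports the SCRATCHDIR variable for the job (as MetaCentrum's PBS does); the file names and my_program are hypothetical:

    #!/bin/bash
    # stage in -> compute in local scratch -> stage out

    cd "$SCRATCHDIR" || exit 1           # work in the node-local scratch
    cp "$HOME/project/input.dat" .       # copy input data to the fast local disk

    my_program input.dat > output.dat    # the actual computation

    cp output.dat "$HOME/project/"       # copy results back to permanent storage
    cd "$HOME" && rm -rf "$SCRATCHDIR"/* # leave nothing behind for the cleaner

Sites often provide their own cleanup helpers for the last step; the manual rm above is only a stand-in.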
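The NUMA layout mentioned under Multiprocessor nodes can be inspected directly on a node with standard Linux tools; a short sketch (my_program is again hypothetical):

    lscpu | grep -i numa    # number of NUMA nodes and the CPU cores in each
    numactl --hardware      # per-node memory sizes and inter-node access distances

    # pin a run to the cores and memory of NUMA node 0, avoiding the
    # slower cross-node memory accesses
    numactl --cpunodebind=0 --membind=0 ./my_program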
Disk array
File servers make the disk array accessible remotely via the NFS (Network File System) protocol.
(figure: brno9-ceitec, 269 TiB - a large number of HDDs organized into RAID 6 arrays that are combined by RAID 0)
Disk arrays are suitable for currently solved projects.

Disk array - data protection
Arrays contain a large number of HDDs, which are mechanical components prone to failure. To reduce the risk of data loss, the data is most often protected using RAID technology.
RAID (Redundant Array of Inexpensive/Independent Disks), in informatics, is a method of securing data against hard disk failure. The protection is realized by a specific layout of the data across multiple independent disks, such that the stored data is retained even if one of the disks fails. The level of protection varies with the selected type of RAID, which is indicated by numbers (most often RAID 0, RAID 1, RAID 5, or more recently RAID 6). When part of the disk array is damaged, the array runs in degraded mode, in which a further failure would be irreparable. Therefore, so-called spare disks are put into service immediately as replacements for the damaged ones. The speed of data access may be reduced while the disk array is rebuilding (recalculating data parity). (wikipedia.org)
(A hedged software-RAID sketch illustrating these concepts is given near the end of this lesson, after the batch-system example.)

Hierarchical data storage
Hierarchical storage management (HSM) is a data storage technique which automatically moves data between high-cost and low-cost storage media. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices and then copy data to faster disk drives when needed. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices. (wikipedia.org)
(photo: brno10-ceitec-hsm - a tape robot)
HSM repositories are suitable for archiving and backing up data.

Units
Decimal prefixes (the original marking) vs. the binary prefixes used where powers of two are meant (wikipedia.org):
kB = 10^3 B     KiB = 2^10 B
MB = 10^6 B     MiB = 2^20 B
GB = 10^9 B     GiB = 2^30 B
TB = 10^12 B    TiB = 2^40 B

Network infrastructure
Ethernet is the name of a family of technologies for local area networks (LANs) that communicate over twisted-pair or optical cables at transmission speeds from 10 Mbit/s to 100 Gbit/s.
InfiniBand (abbreviated IB) is a computer-networking communications standard used in high-performance computing; it features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also utilized as either a direct or a switched interconnect between servers and storage systems, as well as an interconnect between storage systems. (wikipedia.org)
InfiniBand is suitable for data-intensive parallel jobs that use multiple compute nodes.

Batch system
Batch processing is the execution of a series of programs (so-called batches) on a computer without the participation of the user. Batches are prepared in advance so that they can be processed without the user's involvement. All input data is prepared in advance in files (scripts) or entered as parameters on the command line. Batch processing is the opposite of interactive processing, where the user provides the required inputs only while the program is running.
Advantages of batch processing (wikipedia.org):
▪ sharing computer resources among many users and programs
▪ postponing batch processing until the computer is less busy
▪ eliminating delays caused by waiting for user input
▪ maximizing computer utilization, which improves the return on investment (especially for more expensive computers)
Our local clusters and MetaCentrum use PBSPro; IT4I uses PBSPro as well. PBSPro is derived from OpenPBS.
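Since all the clusters named above run PBSPro, a minimal job-script sketch may help; the resource values and my_program are hypothetical, and every site documents additional required parameters:

    #!/bin/bash
    #PBS -N example_job                  # job name
    #PBS -l select=1:ncpus=8:mem=16gb    # one chunk with 8 cores and 16 GB of RAM
    #PBS -l walltime=02:00:00            # maximum run time

    cd "$PBS_O_WORKDIR"                  # directory from which the job was submitted
    ./my_program > output.log            # the actual computation

The script is submitted with "qsub job.sh" and monitored with "qstat -u $USER"; it then runs on a work node without any further participation of the user - exactly the batch processing described above.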
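Returning to the data-protection section: real disk arrays usually implement RAID in a dedicated hardware controller, but the Linux software-RAID tool mdadm can illustrate the same concepts; a sketch with hypothetical device names:

    # build a RAID 6 array from six member disks plus one hot spare;
    # RAID 6 survives the failure of any two member disks
    mdadm --create /dev/md0 --level=6 --raid-devices=6 --spare-devices=1 \
          /dev/sd[b-g] /dev/sdh

    # when a member disk fails, the array switches to degraded mode and
    # automatically starts rebuilding onto the spare; progress is visible here
    cat /proc/mdstat
    mdadm --detail /dev/md0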
Exercises 1
1. What is the name of your workstation (computer) on the WOLF cluster?
2. What is the role of this computer within the WOLF cluster?
3. Find the names of the front nodes of the MetaCentrum virtual organization in its documentation.
4. Verify that you can log in to one of the front nodes of MetaCentrum.
5. How many hard disks may fail in a disk group that is protected by RAID 6?
6. Can RAID 0 be used for data protection?
7. What is the combination of RAID 6 and RAID 0 called?
8. What type of accelerator is used in the Salomon supercomputer (IT4I)?