C2115 Practical Introduction to Supercomputing
7th Lesson

Petr Kulhánek, Jakub Štěpán
kulhanek@chemi.muni.cz
National Centre for Biomolecular Research, Faculty of Science
Masaryk University, Kotlářská 2, CZ-61137 Brno
CZ.1.07/2.2.00/15.0233

Contents
- Exercise LV.2: results, discussion
- Computer architecture: limiting factors, application types and their relation to limiting factors
- Exercise: network transfer measurements
- Batch systems: definition, overview

Exercise LV.2
- Results
- Discussion

Results

Machine: wolf01, 4 CPU, Intel(R) Xeon(R) CPU X3460 @ 2.80GHz, L1: 32kB, L2: 256kB, L3: 8192kB
Program: load_cpu

Concurrent processes   Real runtime [s]   Theoretical runtime [s]   Overhead [%]
                   1              20.15                     20.15              -
                   4              30.20                     20.15           49.9
                   8              61.67                     40.30           53.0
                  12              94.12                     60.45           55.7
                  16             126.23                     80.60           56.6
                  20             159.87                    100.75           58.7
                  24             191.64                    120.90           58.5

The theoretical runtime assumes that the 4 CPUs share the work perfectly, i.e. $t_\text{theory} = t_1 \cdot \max(1, N/4)$, where $N$ is the number of concurrent processes and $t_1 = 20.15$ s is the runtime of a single process. The overhead expresses how many percent slower the real run is than this theoretical ideal:

$$\text{overhead} = \frac{t_\text{real}}{t_\text{theory}} \cdot 100 - 100 \quad [\%]$$

The overhead grows with the number of concurrent processes.
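The theoretical runtime and overhead columns can be recomputed from the measured runtimes. The sketch below assumes, as the table does, that the ideal runtime is the single-process runtime scaled by max(1, N/4) for the 4 CPUs of wolf01; the function names are illustrative only.

```python
# Real runtimes [s] of load_cpu on wolf01 (4 CPUs), copied from the table above
REAL_RUNTIMES = {1: 20.15, 4: 30.20, 8: 61.67, 12: 94.12,
                 16: 126.23, 20: 159.87, 24: 191.64}
N_CPUS = 4


def theoretical_runtime(n_procs, t_single, n_cpus=N_CPUS):
    """Ideal runtime: the total work n_procs * t_single is spread evenly over
    the CPUs, but a single process cannot run faster than t_single."""
    return t_single * max(1.0, n_procs / n_cpus)


def overhead_percent(t_real, t_theory):
    """How many percent slower the real run is than the theoretical ideal."""
    return t_real / t_theory * 100 - 100


t_single = REAL_RUNTIMES[1]
for n, t_real in REAL_RUNTIMES.items():
    t_theory = theoretical_runtime(n, t_single)
    print(f"{n:3d} {t_real:8.2f} {t_theory:8.2f} {overhead_percent(t_real, t_theory):6.1f}")
```

Rounded to one decimal place, the printed overhead column should match the table, with 0.0 for the single-process run.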
Find the reason for the high initial overhead (about 50 % already with 4 concurrent processes).

Computer architecture
- Limiting factors
- Application types and their relation to limiting factors

Architecture, general view

[Figure: block diagram of a computer: CPU, north bridge, memory, south bridge, SATA controller with hard drives, network (Ethernet).]

Architecture, limiting factors

[Figure: the same block diagram, now also showing the CPU cache.]

The fastest component is the CPU. The other components are slower and therefore act as limiting factors:
- RAM: ~10 GB/s
- SATA disk (SATA III): 600 MB/s
- Network: 10/100/1000 Mb/s (megabits, so 1000 Mb/s corresponds to at most 125 MB/s)

The slower components also have high latencies.

Exercise VI.1
1. Use the wget command to download the installation image of Ubuntu Server 12.04.1 LTS.
2. Working in teams, determine the transfer rate for different numbers of concurrently downloading processes. What is the limiting factor for the transfer?
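A first estimate of which component limits a transfer compares the nominal bandwidths listed above in a common unit. The sketch below is a simplification under several assumptions: it considers only the local components from the slide (the real limit in the exercise may lie elsewhere, for example at the remote server, which is what the measurement should reveal), it ignores latency and protocol overhead, it assumes that concurrent downloads share the slowest component equally, and the 700 MB image size is a hypothetical value, not the actual size of the Ubuntu image.

```python
# Nominal bandwidths from the slide, converted to MB/s.
# The network is given in megabits per second, so it is divided by 8.
BANDWIDTH_MB_S = {
    "RAM": 10_000.0,                  # ~10 GB/s
    "SATA III disk": 600.0,           # 600 MB/s
    "network (1000 Mb/s)": 1000 / 8,  # 125 MB/s
}

IMAGE_SIZE_MB = 700.0  # hypothetical image size, for illustration only


def bottleneck(bandwidths):
    """Return (name, rate) of the slowest component on the data path."""
    name = min(bandwidths, key=bandwidths.get)
    return name, bandwidths[name]


def per_process_rate(total_rate, n_procs):
    """Rate per download if n_procs downloads share the bottleneck equally (assumption)."""
    return total_rate / n_procs


name, rate = bottleneck(BANDWIDTH_MB_S)
print(f"limiting component: {name} ({rate:.0f} MB/s)")
for n in (1, 2, 4, 8):
    r = per_process_rate(rate, n)
    print(f"{n} downloads: {r:6.1f} MB/s each, ~{IMAGE_SIZE_MB / r:5.1f} s per image")
```

If the measured total rate stays well below this local estimate regardless of the number of processes, the limiting factor is probably outside the local machine.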
Download command for Exercise VI.1 (quote the URL; otherwise the shell treats each & as a background operator and wget receives only a truncated URL):

$ wget "http://www.ubuntu.com/start-download?distro=server&bits=64&release=lts"

Batch systems
- Definition
- Overview

Batch processing

Batch processing is the execution of a series of programs (so-called batches) on a computer without user interaction. Batches are prepared in advance and then submitted for processing; all input data are prepared beforehand in files (scripts) or given as command-line arguments. Batch processing is the opposite of interactive processing, in which the user supplies input data while the program is running.

Advantages of batch processing:
- resources are shared among many users and programs
- a batch run can start when the computer has enough free resources (low load)
- no delays caused by waiting for user input
- maximizing computer usage improves the return on the investment in (expensive) machines

Source: www.wikipedia.cz, adjusted

Batch system tools
- OpenPBS (open source)
  http://www.mcs.anl.gov/research/projects/openpbs/
- PBSPro
  http://www.pbsworks.com
- Oracle Grid Engine
  http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html
- Open Grid Scheduler (open source)
  http://gridscheduler.sourceforge.net/
- Torque (open source; used as the batch system in MetaCentrum VO and on the clusters SOKAR and WOLF)
  http://www.adaptivecomputing.com/products/open-source/torque/
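The batch-processing idea (jobs prepared in advance and started without user interaction once enough resources are free) can be illustrated with a toy first-come-first-served queue. This is a simplified sketch, not how OpenPBS, Torque, or the other tools listed above actually schedule jobs; the Job fields, the assumption that runtimes are known in advance, and the strict FIFO policy are our own choices for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int        # CPUs requested by the job
    runtime: float   # runtime in seconds, assumed to be known in advance


def schedule_fifo(jobs, total_cpus):
    """Toy batch scheduler: start jobs strictly in submission order, each as
    soon as enough CPUs are free. Returns {name: (start, end)}."""
    free = total_cpus
    now = 0.0
    running = []  # min-heap of (end_time, cpus) of started, not yet released jobs
    result = {}
    for job in jobs:
        if job.cpus > total_cpus:
            raise ValueError(f"job {job.name} requests more CPUs than the machine has")
        # Wait (advance the clock) until finished jobs release enough CPUs
        while free < job.cpus:
            end, cpus = heapq.heappop(running)
            now = max(now, end)
            free += cpus
        start, end = now, now + job.runtime
        heapq.heappush(running, (end, job.cpus))
        free -= job.cpus
        result[job.name] = (start, end)
    return result


# Jobs are prepared in advance; no user input is needed while they run
jobs = [Job("a", 2, 10), Job("b", 2, 5), Job("c", 4, 3), Job("d", 1, 1)]
for name, (start, end) in schedule_fifo(jobs, total_cpus=4).items():
    print(f"{name}: start {start:5.1f} s, end {end:5.1f} s")
```

In this example job d waits until c finishes at 13 s, although it would fit into the 2 CPUs left free after b ends at 5 s. Production batch systems typically add priorities and backfilling to avoid such idle gaps.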