Revision 3

C2115 Practical introduction to supercomputing
Lesson 2

Petr Kulhánek
kulhanek@chemi.muni.cz
National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, CZ-62500 Brno

Content
> Computational Chemistry Group
  overview of solved projects
> Model problems and systems for exercises
  matrix multiplication, numerical integration, QM and MD calculations

Computational Chemistry Group
overview of solved projects
group leader: prof. RNDr. Jaroslav Koča, DrSc.

Computational chemistry
[Figure: cartoon, source http://www.ninger.com/images/comp.jpg]

Computational chemistry
Computational chemistry (computer chemistry)
Computational chemistry is a branch of chemistry that uses computer simulation to assist in solving chemical problems. It uses methods of theoretical chemistry, incorporated into efficient computer programs, to calculate the structures and properties of molecules and solids. While computational results normally complement the information obtained by chemical experiments, it can in some cases predict hitherto unobserved chemical phenomena. It is widely used in the design of new drugs and materials.
www.wikipedia.org

Nobel Prize in Chemistry 1998/201
Walter Kohn, John A. Pople

$\hat{H}\psi_k(\mathbf{r}) = E_k\psi_k(\mathbf{r})$

Scaling, time demands (http://en.wikipedia.org/wiki/Time_complexity):

Scaling        CI methods   MP methods       CC methods
N^2 -> N^1     HF, DFT
N^5                         MP2              CC2 (iterative)
N^6            CISD         MP3, MP4(SDQ)    CCSD (iterative)
N^7                         MP4              CCSD(T), CC3 (iterative)
N^8            CISDT        MP5              CCSDT
N^9                         MP6
N^10           CISDTQ       MP7              CCSDTQ (iterative)

HF - Hartree-Fock method, DFT - density functional theory, CI - configuration interaction methods, MP - Møller-Plesset perturbation theory, CC - coupled cluster methods, N - number of basis functions

Jensen, F. Introduction to Computational Chemistry, 2nd ed.; John Wiley & Sons: Chichester, England; Hoboken, NJ, 2007.

Quantum chemical calculations

Molecular mechanics
Schrödinger equation => quantum mechanical view
$\hat{H}\psi(\mathbf{r}) = E\psi(\mathbf{r})$
approximations using classical physics: no explicit electron motion is considered (it is included implicitly in the empirical parameters)
Classical physics => mechanical view
$E(\mathbf{r}) = E_{bonds} + E_{angles} + E_{torsions} + E_{el} + E_{vdw}$
covalent contributions: bonds, angles, torsions
non-covalent contributions: electrostatic (el), van der Waals (vdw)
Formal scaling: N^2 -> N log2 N
N - number of atoms

Molecular dynamics
$-\frac{\partial E(\mathbf{R})}{\partial \mathbf{r}_i} = \mathbf{F}_i$
$\mathbf{F}_i = m_i \mathbf{a}_i, \quad \mathbf{a}_i = \frac{d^2 \mathbf{r}_i}{dt^2}$ (Newton's second law of motion, the law of force)
$-\frac{\partial E(\mathbf{R})}{\partial \mathbf{r}_i} = m_i \frac{d^2 \mathbf{r}_i}{dt^2}$
• a system of second-order differential equations that requires a numerical solution
• molecular motion is discretized into short time intervals whose length is given by the fastest motion (bond vibrations)
• typical integration step: 1 fs
Integration errors are compensated by thermostats and barostats, which also maintain the required simulation conditions.

Repair of damaged DNA
DNA is exposed to a number of factors that damage it.
To avoid degradation of genetic information, damaged DNA is repaired by a number of mechanisms that work with different efficiencies. The aim of the project is to understand how damage is detected at the molecular level, with a primary focus on the mechanical properties of damaged DNA.
[Figure: sources of DNA damage (UV light exposure, ionizing radiation, chemical exposure, cellular metabolism, replication errors) and the resulting lesion]

Bouchal, T.; Durník, I.; Illík, V.; Réblová, K.; Kulhánek, P. Importance of Base-Pair Opening for Mismatch Recognition. Nucleic Acids Res. 2020. https://doi.org/10.1093/nar/gkaa896.

Glycosyltransferases
Glycosyltransferases are enzymes that catalyze the transfer of an activated sugar moiety to oligosaccharides, proteins or other biomolecules. They are important in post-translational modification of proteins, in regulation, and in the creation of structural support.

Mycobacterium tuberculosis (pathogenic bacterium)
Motivation: inhibitor of the synthesis of an important membrane component -> antibiotic
[Figure: mycobacterial cell envelope with capsule-like material, mycolic acid layer and membrane; legend: MurNAc/Gc, Galf, GlcNAc, Ara, L-Rhamnose]

Clostridium difficile (pathogenic bacterium)
Motivation: inhibitor of the glycosyltransferase activity of the toxin -> antidote
[Figure: toxin action leading to cell death]

Example: ppGalNAcT2 (QM/MM)
[Figure: reaction scheme, transfer of N-acetylgalactosamine to a Ser/Thr residue]

Supervisors or consultants:
> prof. RNDr. Jaroslav Koča, DrSc. (Computational Chemistry - Center for Structural Biology - Central European Institute of Technology)
> Mgr. Stanislav Kozmon, Ph.D. (Institute of Chemistry, Slovak Academy of Sciences)
> Ing. Igor Tvaroška, DrSc.
(Institute of Chemistry, Slovak Academy of Sciences)

Janoš, P.; Trnka, T.; Kozmon, S.; Tvaroška, I.; Koča, J. Different QM/MM Approaches To Elucidate Enzymatic Reactions: Case Study on ppGalNAcT2. J. Chem. Theory Comput. 2016, 12 (12), 6062-6076. https://doi.org/10.1021/acs.jctc.6b00531.

[Figure sequence: starts from the initial state]

Result
[Figure: plot, x-axis: point number]

Specifics of methods
Quantum mechanical methods:
• computational cost increases with the required accuracy of the calculation and with the size of the studied model
• the calculations are demanding in both computation (CPU) and memory (RAM)
• acceleration by parallelization is possible but usually does not scale well (scaling is not linear for very accurate methods)
• parallel runs are best suited to SMP nodes; when run on clusters, they require a fast interconnect between the computing nodes
Molecular dynamics simulations (using
molecular mechanics):
• computational cost increases with the size of the model and the length of the required sampling
• owing to the low algorithmic complexity, calculations can be performed on GPGPUs
• simulations produce a large amount of data (trajectories)
• speeding up the calculation by parallel execution is easy
• parallelization can be performed on several levels (calculation of forces, multiple walkers or replicas); in the last two cases, linear scaling can be achieved

Exercise 1
1. What does the time complexity O(N) determine?
2. How many times longer does the calculation of the potential energy of a benzene molecule by the quantum chemical method CCSD(T) take if we change the basis set from aug-cc-pVDZ to aug-cc-pVTZ? The number of basis functions is 192 for aug-cc-pVDZ and 414 for aug-cc-pVTZ.
3. If the potential energy calculation using CCSD(T)/aug-cc-pVDZ takes 5 hours, how long will the calculation using CCSD(T)/aug-cc-pVTZ take?
4. An enzyme-catalyzed first-order reaction has a single rate-determining step with an activation Gibbs energy of 18 kcal/mol. What is the reaction half-life at 300 K?
5. How long would a molecular dynamics simulation of one enzyme-substrate complex from the previous task have to be in order to observe the substrate transformation with 50% probability?
6. Determine the number of integration steps that need to be performed in the simulation from task 5, assuming an integration step of 0.125 fs (QM/MM dynamics in CPMD).
7. Determine the machine time that would be required to perform the simulation, assuming that one integration step takes 5 seconds. Discuss the value.
8. Determine the machine time required to perform a 1 μs long molecular dynamics simulation of a cellulose fragment in a water box with a total of 408609 atoms on one GTX 1080 graphics card under NPT conditions.
Use the data provided here for a solution: https://ambermd.org/GPUPerformance.php

Model problems and systems

Matrix multiplication
A(n, m) · B(m, k) = C(n, k)
Use:
• finding eigenvalues and eigenvectors of square matrices (quantum chemistry)
• solution of a system of linear equations (QSAR, QSPR)
• transformations (translation, rotation, scaling; display and graphics)
Revision/self-study:
• How is matrix multiplication done?
• How many operations need to be performed?

Numerical integration
Definite integrals can be calculated by numerical methods, which are used if:
• the function cannot be integrated analytically
• analytical integration is impractical (accuracy vs computational complexity)
[Equation: example definite integral, not recoverable from the source]
A definite integral is the area under the curve between the integration limits.

Numerical integration methods
trapezoidal method: $I_i = \frac{1}{2}(y_i + y_{i+1})\,h$
rectangular method: $I_i = y_i\,h$
Here h is the width of one integration interval and $y_i$ is the function value at its left edge.

Fullerene C60
https://en.wikipedia.org/wiki/Buckminsterfullerene
Tasks:
• creating a model of the C60 molecule
• geometry optimization
• calculation of molecular vibrations
Methods:
• semiempirical quantum-chemical method PM6

Chitin fibers
[Figure: building unit of chitin]
mechanical properties of chitin nanofibers
[Figure: chitin nanofiber models 4400, 6000, 6600, 6760, 8998]
Tasks:
• MD simulation of the 6000 fiber

Strelcova, Z.; Kulhanek, P.; Friak, M.; Fabritius, H.-O.; Petrov, M.; Neugebauer, J.; Koca, J. The structure and dynamics of chitin nanofibrils in an aqueous environment revealed by molecular dynamics simulations. RSC Adv.
2016, 6 (36), 30710-30721. DOI: 10.1039/c6ra00107f

Relationship with course C2115
Matrix multiplication:
• limiting factors related to computer architecture (memory throughput)
• optimized libraries for numerical calculations (BLAS, LAPACK, Intel MKL, AMD ACML)
Numerical integration:
• limiting factors related to computer architecture (rounding errors and their impact on the integration result)
• parallelization of the calculation (OpenMP versus MPI)
Fullerene C60:
• running calculations in the program Gaussian
  • in MetaCentrum (PBSPro)
  • in the WOLF cluster (PBSPro and Infinity)
Chitin fiber:
• molecular dynamics simulations in pmemd
• scaling of the parallel CPU implementation
• CPU and GPU runtime comparison

Exercise 2
Fullerene C60:
1. Build a 3D model of a fullerene C60 molecule and optimize it using the MMFF94 force field. To build the 3D model, start from the structure in SMILES format (see the Wikipedia page for C60). Save the resulting model in the xyz format. Use either the Avogadro or the Nemesis program to build the model.
Chitin fiber:
The equilibrated chitin fiber model can be found in the directory:
/home/kulhanek/Documents/C2115/Lesson02/chitin
The system topology is in 6000.parm7; the coordinates, velocities and box size are in 6000.rst7.
2. Display the model in VMD.
3. How many atoms does the model contain?
4. How many chitin fibers does the model contain?
5. What is the shape of the simulation box?

Self-study
1. How is matrix multiplication done?
2. How many operations need to be performed when multiplying matrices?
3. What is the computational complexity of matrix multiplication?
4. Which numerical integration method is more accurate, rectangular or trapezoidal?
5. Find other methods of numerical integration.
6. Is it possible to calculate an indefinite integral by numerical integration?
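As a starting point for the matrix multiplication questions, the following is a minimal Python sketch of the textbook triple-loop algorithm for A(n, m) · B(m, k) = C(n, k). It is an illustration only, not the optimized approach used by BLAS-type libraries; the function name `matmul_count` and the small test matrices are hypothetical and do not come from the lesson.

```python
def matmul_count(A, B):
    """Multiply A (n x m) by B (m x k) with the textbook triple loop.

    Returns the product C (n x k) and the number of scalar
    multiplications that were performed.
    """
    n, m, k = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions of A and B do not match")

    C = [[0.0] * k for _ in range(n)]
    multiplications = 0
    for i in range(n):            # rows of A
        for j in range(k):        # columns of B
            total = 0.0
            for p in range(m):    # shared inner dimension
                total += A[i][p] * B[p][j]
                multiplications += 1
            C[i][j] = total
    return C, multiplications


# Hypothetical example data: a 2 x 3 matrix times a 3 x 2 matrix.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
B = [[7.0, 8.0],
     [9.0, 10.0],
     [11.0, 12.0]]

C, count = matmul_count(A, B)
print(C)
print(count)
```

The counter equals n · m · k, so for square matrices of size N the naive algorithm performs N^3 multiplications, which is the usual O(N^3) estimate.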
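The rectangular and trapezoidal methods from this lesson can be compared with a short Python sketch. The integrand 4/(1 + x^2) on the interval [0, 1] is an assumption chosen here because its exact integral is π; it is not necessarily the example used on the slides. The function names are likewise hypothetical, and the rectangular rule is implemented in its left-edge variant.

```python
import math


def rectangular(f, a, b, n):
    """Left-rectangle rule: sum of y_i * h over n equal intervals."""
    h = (b - a) / n
    return sum(f(a + i * h) for i in range(n)) * h


def trapezoidal(f, a, b, n):
    """Trapezoidal rule: sum of (y_i + y_{i+1}) * h / 2 over n equal intervals."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return (0.5 * (f(a) + f(b)) + inner) * h


def f(x):
    # Assumed test integrand; its integral over [0, 1] is exactly pi.
    return 4.0 / (1.0 + x * x)


# Print the absolute error of both methods for an increasing number of intervals.
for n in (10, 100, 1000):
    err_rect = abs(rectangular(f, 0.0, 1.0, n) - math.pi)
    err_trap = abs(trapezoidal(f, 0.0, 1.0, n) - math.pi)
    print(n, err_rect, err_trap)
```

For a smooth integrand such as this one, the error of the rectangular rule shrinks roughly in proportion to h, while the error of the trapezoidal rule shrinks roughly in proportion to h^2, so the trapezoidal result is closer to π for the same number of intervals.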