C7790 Introduction to Molecular Modelling
Lesson 10: Structure

Petr Kulhánek
kulhanek@chemi.muni.cz
National Centre for Biomolecular Research, Faculty of Science
Masaryk University, Kamenice 5, CZ-62500 Brno

JS/2022 Present Form of Teaching: Rev2

C7790 Introduction to Molecular Modelling
TSM Modelling Molecular Structures
C9087 Computational Chemistry for Structural Biology

Context

microworld / macroworld
• equilibrium (equilibrium constant)
• kinetics (rate constant)
• free energy (Gibbs/Helmholtz)
• partition function
• phenomenological thermodynamics
• statistical thermodynamics
• microstates (mechanical properties, E)
• states (thermodynamic properties, G, T, …)
microstate ≠ microworld

Description levels (model chemistry):
• quantum mechanics
   • semiempirical methods
   • ab initio methods
   • post-HF methods
   • DFT methods
• molecular mechanics
• coarse-grained mechanics

Structure, Energy, Function

Simulations:
• molecular dynamics
• Monte Carlo simulations
• docking
• …

Structure

Configuration Space

$E(\mathbf{R})$

$\mathbf{R}$ = point in 3N-dimensional space (N is the number of atoms)

$\mathbf{R} = \{x_1, y_1, z_1, x_2, y_2, z_2, \ldots, x_N, y_N, z_N\}$

Here $x_1, y_1, z_1$ are the Cartesian coordinates of the first atom and $x_N, y_N, z_N$ are the Cartesian coordinates of the last atom.

Every point in the configuration space represents a unique structure of the system.

Models - Small Molecules

• line model
• tube model
• CPK model
• vdW model
same structure, other visualization

Models - Biomolecules

• line model
• line model with the backbone of the protein
• cartoon model
• surface of the biomolecule
same structure, other visualization

Different models are used to highlight certain structural information or internal properties of a molecule or group of molecules, which facilitates an easier understanding of the studied problem.
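Returning to the configuration space defined above, the point R can be sketched in code as a flat list of 3N numbers. This is only a minimal illustration: the `Atom` tuple layout, the function name `configuration_vector`, and the three-atom coordinates are assumptions made for this example, not part of the lecture.

```python
from typing import List, Tuple

# Hypothetical atom record for this sketch: (element, x, y, z)
Atom = Tuple[str, float, float, float]


def configuration_vector(atoms: List[Atom]) -> List[float]:
    """Flatten per-atom Cartesian coordinates into one point R of length 3N."""
    r: List[float] = []
    for _element, x, y, z in atoms:
        r.extend((x, y, z))
    return r


# Three atoms with made-up coordinates (illustrative values only)
atoms = [
    ("O", 0.000, 0.000, 0.000),
    ("H", 0.960, 0.000, 0.000),
    ("H", -0.240, 0.930, 0.000),
]

R = configuration_vector(atoms)
print(len(R))  # 3N for N = 3 atoms
print(R[0:3])  # Cartesian coordinates of the first atom
print(R[-3:])  # Cartesian coordinates of the last atom
```

Changing any single number in `R` gives a different point in the configuration space, that is, a different structure of the same set of atoms.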
Coarse-grained Models

Computer Representation of Models

The structure can be represented in various ways. More than 100 formats are used in chemistry. They are either text or binary files.

The format describes:
➢ the geometry of the system
➢ the names of atoms
➢ groups of atoms
➢ connectivity between atoms (bonds)
➢ and other information

The system geometry is usually provided as:
➢ Cartesian coordinates
➢ internal coordinates
➢ variants of internal coordinates

Cartesian vs Internal Coordinates

Cartesian coordinates
Number of degrees of freedom: 3N

         x           y           z
   O  -0.180077   -0.046023   -0.062789
   H   0.196208   -0.747659    0.498793
   O   0.006537    1.047922    0.877207
   H  -0.931885    1.299156    0.951390

Internal coordinates (Z-matrix)
Number of degrees of freedom: 3N-6, or 3N-5 for a linear molecule (e.g., a diatomic molecule)

Each row lists the atom, then a reference atom with the bond length, a reference atom with the bond angle, and a reference atom with the torsion angle:

   O
   H   1   0.974298
   O   1   1.454349   2   96.868054
   H   3   0.974298   1   96.868054   2   239.552651

Internal Coordinates

   1   O
   2   H   1   0.974298
   3   O   1   1.454349   2   96.868054
   4   H   3   0.974298   1   96.868054   2   239.552651

(a) bond length: 2-1, 4-3
(b) bond angle: 3-1-2, 4-3-1
(c) torsion angle: 4-3-1-2

http://www.ccl.net/cca/documents/molecular-modeling/node4.html

XYZ format

   number of atoms
   comment
   element x y z
   element x y z
   ...................
   element x y z

The xyz format is a free-format text file (values in columns can be separated by any number of spaces or other whitespace).

The format only describes the geometry of the system. It does not contain information about bonds in the system. A program that works with the format must calculate this information (e.g., using atomic radii).
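The last point can be sketched in a few lines: read an xyz file and guess the bonds from interatomic distances. This is a minimal sketch under stated assumptions, not the method of any particular program. The covalent radii are approximate literature values, the criterion "distance below 1.3 times the sum of radii" is an assumed rule of thumb, and the names `parse_xyz` and `guess_bonds` are hypothetical. The input is the H-O-O-H geometry from the Cartesian vs Internal Coordinates slide.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (assumed values for this sketch)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

HOOH_XYZ = """4
hydrogen peroxide, geometry from the slide
O -0.180077 -0.046023 -0.062789
H  0.196208 -0.747659  0.498793
O  0.006537  1.047922  0.877207
H -0.931885  1.299156  0.951390
"""


def parse_xyz(text):
    """Return (comment, atoms); atoms is a list of (element, x, y, z)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    comment = lines[1].strip()
    atoms = []
    for line in lines[2:2 + n_atoms]:
        # Free format: columns are separated by any amount of whitespace
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("fewer atom lines than declared in the header")
    return comment, atoms


def guess_bonds(atoms, tolerance=1.3):
    """Pairs (i, j) closer than tolerance * (sum of covalent radii)."""
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        distance = math.dist(a[1:], b[1:])
        limit = tolerance * (COVALENT_RADII[a[0]] + COVALENT_RADII[b[0]])
        if distance < limit:
            bonds.append((i, j))
    return bonds


comment, atoms = parse_xyz(HOOH_XYZ)
bonds = guess_bonds(atoms)
print(bonds)  # zero-based atom index pairs
```

For this geometry the sketch reports the two O-H bonds and the O-O bond as the index pairs (0, 1), (0, 2) and (2, 3). Real programs use more careful criteria, so the tolerance here should be treated as a tunable assumption.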
Example (positions are in angstroms, Å):

   24
   chorismate
   C   -1.86100   -0.57700    0.31800
   O   -2.56800    0.47600    0.32600
   O   -2.20900   -1.75300    0.64200
   C   -0.38900   -0.41000   -0.18800
   ................................................
   H   -0.50900    1.67900   -0.44800

PDB format

The pdb format is used to store the structures of biomolecules and their complexes. It is widely used, but it has several limitations. Therefore, it is gradually being replaced by more advanced formats such as PDBx/mmCIF and others.

   ..................................................................
   ATOM      7  CB  SER     1       5.814  16.335   8.213  1.00  0.00
   ATOM      8  HB2 SER     1       6.870  16.427   7.958  1.00  0.00
   ATOM      9  HB3 SER     1       5.610  16.900   9.123  1.00  0.00
   ATOM     10  OG  SER     1       5.491  14.946   8.427  1.00  0.00
   ATOM     11  HG  SER     1       6.026  14.600   9.145  1.00  0.00
   ATOM     12  C   SER     1       3.604  16.323   6.927  1.00  0.00
   ATOM     13  O   SER     1       2.605  16.742   7.521  1.00  0.00
   ATOM     14  N   GLN     2       3.567  15.251   6.134  1.00  0.00
   ATOM     15  H   GLN     2       4.401  14.914   5.675  1.00  0.00
   ..................................................................

Columns: keyword, atom number, atom name, residue name, residue number, Cartesian coordinates of atoms (in angstroms, Å).

The pdb format does not usually contain information about bonds in the system. The program that works with the format must calculate this information (based on template structures). For non-standard residues, the CONECT keyword can be used.

Jungle of formats I

acr -- ACR format
adf -- ADF cartesian input format
adfout -- ADF output format
alc -- Alchemy format
arc -- Accelrys/MSI Biosym/Insight II CAR format
bgf -- MSI BGF format
box -- Dock 3.5 Box format
bs -- Ball and Stick format
c3d1 -- Chem3D Cartesian 1 format
c3d2 -- Chem3D Cartesian 2 format
cac -- CAChe MolStruct format
caccrt -- Cacao Cartesian format
cache -- CAChe MolStruct format
cacint -- Cacao Internal format
can -- Canonical SMILES format
car -- Accelrys/MSI Biosym/Insight II CAR format
ccc -- CCC format
cdx -- ChemDraw binary format
cdxml -- ChemDraw CDXML format
cht -- Chemtool format
cif -- Crystallographic Information File
ck -- ChemKin format
cml -- Chemical Markup Language
cmlr -- CML Reaction format
com -- Gaussian 98/03 Input
copy -- Copies raw text
crk2d -- Chemical Resource Kit diagram (2D)
crk3d -- Chemical Resource Kit 3D format
csr -- Accelrys/MSI Quanta CSR format
cssr -- CSD CSSR format
ct -- ChemDraw Connection Table format
cub -- OpenDX cube format for APBS
cube -- OpenDX cube format for APBS
dmol -- DMol3 coordinates format
dx -- OpenDX cube format for APBS
ent -- Protein Data Bank format
fa -- FASTA format
fasta -- FASTA format
fch -- Gaussian formatted checkpoint file format
fchk -- Gaussian formatted checkpoint file format
fck -- Gaussian formatted checkpoint file format
feat -- Feature format
fh -- Fenske-Hall Z-Matrix format
fix -- SMILES FIX format
fpt -- Fingerprint format
fract -- Free Form Fractional format
fs -- FastSearching
fsa -- FASTA format
g03 -- Gaussian98/03 Output
g92 -- Gaussian98/03 Output
g94 -- Gaussian98/03 Output
g98 -- Gaussian98/03 Output
gal -- Gaussian98/03 Output
gam -- GAMESS Output
gamin -- GAMESS Input
gamout -- GAMESS Output

Jungle of formats II

gau -- Gaussian 98/03 Input
gjc -- Gaussian 98/03 Input
gjf -- Gaussian 98/03 Input
gpr -- Ghemical format
gr96 -- GROMOS96 format
gukin -- GAMESS-UK Input
gukout -- GAMESS-UK Output
gzmat -- Gaussian Z-Matrix Input
hin -- HyperChem HIN format
inchi -- InChI format
inp -- GAMESS Input
ins -- ShelX format
jin -- Jaguar input format
jout -- Jaguar output format
k -- Compare molecules using InChI
mcdl -- MCDL format
mcif -- Macromolecular Crystallographic Information
mdl -- MDL MOL format
ml2 -- Sybyl Mol2 format
mmcif -- Macromolecular Crystallographic Information
mmd -- MacroModel format
mmod -- MacroModel format
mol -- MDL MOL format
mol2 -- Sybyl Mol2 format
molden -- Molden input format
molreport -- Open Babel molecule report
moo -- MOPAC Output format
mop -- MOPAC Cartesian format
mopcrt -- MOPAC Cartesian format
mopin -- MOPAC Internal
mopout -- MOPAC Output format
mpc -- MOPAC Cartesian format
mpd -- Sybyl descriptor format
mpqc -- MPQC output format
mpqcin -- MPQC simplified input format
msi -- Accelrys/MSI Cerius II MSI format
msms -- M.F. Sanner's MSMS input format
nw -- NWChem input format
nwo -- NWChem output format
outmol -- DMol3 coordinates format
pc -- PubChem format
pcm -- PCModel Format
pdb -- Protein Data Bank format
png -- PNG files with embedded data
pov -- POV-Ray input format
pqr -- PQR format
pqs -- Parallel Quantum Solutions format
prep -- Amber Prep format
qcin -- Q-Chem input format
qcout -- Q-Chem output format
report -- Open Babel report format
res -- ShelX format
rsmi -- Reaction SMILES format
rxn -- MDL RXN format
sd -- MDL MOL format
sdf -- MDL MOL format

Jungle of formats III

smi -- SMILES format
smiles -- SMILES format
sy2 -- Sybyl Mol2 format
t41 -- ADF TAPE41 format
tdd -- Thermo format
test -- Test format
therm -- Thermo format
tmol -- TurboMole Coordinate format
txt -- Title format
txyz -- Tinker MM2 format
unixyz -- UniChem XYZ format
vmol -- ViewMol format
xed -- XED format
xml -- General XML format
xtc -- XTC format
xyz -- XYZ cartesian coordinates format
yob -- YASARA.org YOB format
zin -- ZINDO input format

In addition to the 3D/2D structure, the formats usually also contain accompanying information such as connectivity, force field parameters, atomic partial charges, various properties, etc.

How to convert?

Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.

http://openbabel.org/wiki/Main_Page
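One possible answer to the conversion question is to call the Open Babel command-line program `obabel` from a script. The sketch below assumes that `obabel` is installed and available on the PATH. The file names `chorismate.xyz` and `chorismate.pdb` and the helper names `obabel_command` and `convert` are hypothetical and serve only as an illustration.

```python
import shutil
import subprocess


def obabel_command(src, dst, in_format, out_format):
    """Build the obabel argument list, e.g. for an xyz -> pdb conversion."""
    return ["obabel", f"-i{in_format}", src, f"-o{out_format}", "-O", dst]


def convert(src, dst, in_format, out_format):
    """Run the conversion; fails early if obabel is not installed."""
    if shutil.which("obabel") is None:
        raise RuntimeError("obabel executable not found on PATH")
    subprocess.run(obabel_command(src, dst, in_format, out_format), check=True)


# Show the command that would be executed for a hypothetical input file
command = obabel_command("chorismate.xyz", "chorismate.pdb", "xyz", "pdb")
print(" ".join(command))
```

Calling `convert("chorismate.xyz", "chorismate.pdb", "xyz", "pdb")` would then write the pdb file. Because the xyz format carries no bond information, whatever connectivity appears in the output has been inferred by the conversion tool and is not present in the input.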
Software for visualizations

In addition to molecular modelling software (Avogadro, Nemesis, etc.), there is specialized software that serves only to visualize structures and results.

Visual Molecular Dynamics (VMD): https://www.ks.uiuc.edu/Research/vmd
PyMOL: https://en.wikipedia.org/wiki/PyMOL

➢ scriptable (Tcl, Python)
➢ advanced rendering
➢ available for MS Windows, Linux, macOS

Overview of software: https://en.wikipedia.org/wiki/List_of_molecular_graphics_systems

Summary

➢ Structures (models) can be visualized in different ways.
➢ The visualization type is typically chosen according to the phenomenon or property to be described.
➢ Geometry can be represented in Cartesian and/or internal coordinates.
➢ Computational chemistry (molecular modelling) employs a huge number of formats describing models, which complicates interoperability between different software.

Homework: Is there a principal advantage of using Cartesian or internal coordinates?
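As a starting point for thinking about the homework, the two representations can be compared numerically. The sketch below recomputes the internal coordinates of the H-O-O-H example from its Cartesian coordinates given on the Cartesian vs Internal Coordinates slide. The helper names are hypothetical, and the torsion uses the common atan2 formula, which returns a signed angle that is mapped here to the 0 to 360 degree range.

```python
import math

# Cartesian coordinates from the slide, keyed by Z-matrix atom number
XYZ = {
    1: (-0.180077, -0.046023, -0.062789),  # O
    2: (0.196208, -0.747659, 0.498793),    # H
    3: (0.006537, 1.047922, 0.877207),     # O
    4: (-0.931885, 1.299156, 0.951390),    # H
}


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def bond_length(a, b):
    return norm(sub(a, b))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u, v = sub(a, b), sub(c, b)
    return math.degrees(math.acos(dot(u, v) / (norm(u) * norm(v))))


def torsion_angle(a, b, c, d):
    """Dihedral a-b-c-d in degrees, mapped to the range [0, 360)."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n2 = cross(b2, b3)
    y = norm(b2) * dot(b1, n2)
    x = dot(cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x)) % 360.0


r_21 = bond_length(XYZ[2], XYZ[1])
r_31 = bond_length(XYZ[3], XYZ[1])
a_312 = bond_angle(XYZ[3], XYZ[1], XYZ[2])
t_4312 = torsion_angle(XYZ[4], XYZ[3], XYZ[1], XYZ[2])
print(f"{r_21:.6f} {r_31:.6f} {a_312:.4f} {t_4312:.4f}")
```

Within rounding of the printed coordinates, the results should agree with the Z-matrix entries 0.974298, 1.454349, 96.868054 and 239.552651. Note that the four Cartesian rows hold 12 numbers while the Z-matrix holds only 6, which matches the 3N versus 3N-6 count of degrees of freedom.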