Validation of biomacromolecular structures: motivation
Radka Svobodová Vařeková

Validation: Why validate?
• The structural biology community found that some published structures contained serious errors
• Garbage in, garbage out
Nightmare before Christmas

Interesting articles:
• Matthews, B. W. (2007) Five retracted structure reports: inverted or incorrect? Protein Science, 16, 1013–6.
• Johnston, C. A., Kimple, A. J., Giguere, P. M., and Siderovski, D. P. (2008) RETRACTED: Structure of the Parathyroid Hormone Receptor C Terminus Bound to the G-Protein Dimer Gβ1γ2. Structure, 16, 1086–1094.

Can we still find wrong structures in PDB?
Example: Nipah G attachment glycoprotein (PDB ID 3D12, PNAS)
Contains 30 instances of 11 different carbohydrates, each with one ring and five chiral atoms.
Results:
• 13 of these ligands have incorrect chirality
• In a few cases, all chiral atoms exhibit incorrect chirality
Unfortunately, yes.

Validation approaches
What to validate? | How? | Software
Geometry (3D) | Against tabular values | Proteins and nucleic acids: WHAT_CHECK, PROCHECK, PROCHECK-NMR, AQUA, MolProbity, OOPS; ligands: ValLigURL, Mogul, Coot, PHENIX
Topology (2D) | Against a template | Proteins and nucleic acids: same tools as above; ligands: pdb-care, MotiveValidator, ValidatorDB
[Figure: example geometric parameters, labeled 1.5 Å, 120°, 1.7 Å]
Compilation: PDB validation reports

Improved data quality
• Cleaned up mmCIF data
• Standard vocabularies
• Experimental details, binding sites, secondary structure, antibiotic/inhibitor information, nucleic-acid parameters
• Clean mmCIF files are used in production

Validation reports
• Summary
• Quality vs. all PDB
• Quality vs.
  entries at similar resolution
• Overview of residue-based quality for every polymer
• Table of ligands that may need your attention
Shortcut: pdbe.org/valrep/1cbs

Topology validation (= validation of annotation)

Topology validation – basic terms
Residue
• Any component of a biomacromolecule
• Examples: amino acids, nucleotides, saccharides, ions, …
• In PDB, a residue is annotated via its “Residue ID”, a unique 3-letter code
• Examples of Residue IDs: FUC (alpha-L-fucose), PHE (phenylalanine)

Types of residues:
• Standard residues:
  • amino acids
  • nucleotides
• Non-standard residues:
  • modified amino acids and nucleotides
• Ligands:
  • Chemical compounds which form a complex with a biomacromolecule (e.g., sugar, drug, heme)
  • Ions are also often referred to as ligands

Principles of topology validation:
• The subjects of topology validation are residues
• The validated residue is compared with a model residue that has the same Residue ID
• The model residues are taken from a reference database
• Differences between the model residue and the validated residue are reported

Topology validation – approach
Complication with “shared atoms”:
• When some residues bind together, one of them can lose an atom
  [Figure: MAG, FUC, and H2O; PDB ID 1g1t]
• Solution: when we validate a residue, we must also include its close surroundings
  [Figure: MAG-FUC]
Input motif = validated residue + surroundings

Workflow:
• Input PDB entry
• Selection of the validated residue and its close surroundings
• Mapping of the input motif to the model residue (via subgraph matching)
• Result: validated motif mapped to the model residue

Topology validation – types of validation analyses
• Completeness analyses
  • Missing atoms
  • Missing rings
• Chirality analyses
  • Chirality on C atom, metal atom, high bond order atom, planarity
• Advanced analyses
  • Substitution
  • Different atom naming
  • Foreign atoms

Prerequisite: Can we map the validated and the
model residue?
[Figure: degenerated structure vs. correct residue; example 1IVG_17_7716 (MAN)]

ERRORS – INCOMPLETE STRUCTURE
[Figure: missing atom, missing ring, correct residue]

ERRORS – CHIRALITY
• Wrong chirality on C atoms: 1E4M_16_4280 (MAN)
• Wrong chirality on metal atom: 2P7E_0_95 (AVC)

ERRORS – CHIRALITY II
• Wrong chirality (planar): 1BZ0_1_4428 (HEM)
• Wrong chirality on atom having high order bonds: 4A2U_2_13909 (CMP)
Correct chirality (Tolerant) = correct residues + residues having only these issues:
• Wrong chirality (planar)
• Wrong chirality on atom having high order bonds

WARNINGS
• Substitution
• Foreign atom (= atom from neighboring residue)
• Different atom name (e.g., O1 instead of O5)
[Figure: the warnings above shown next to the correct residue]
ADVANCED WARNINGS:
• Alternate locations
• Zero RMSD with model

Different atom names: example
[Figure]

Topology validation – software tools
pdb-care (Lütteke et al., 2006):
• Tool focused on carbohydrate validation
• The first application to implement topology validation
• Performs basic validation analyses (missing atoms, missing rings, wrong chirality)
MotiveValidator (http://ncbr.muni.cz/MotiveValidator):
• Tool that allows validation of all residues
• Performs basic validation analyses + reports basic warnings (substitutions, foreign atoms, different naming)
ValidatorDB (http://ncbr.muni.cz/ValidatorDB):
• Database containing validation results for all* ligands and non-standard residues in PDB (updated weekly)
• Performs basic validation analyses + advanced validation analyses (reports degenerated residues, distinguishes the type of chirality error)
• Reports basic warnings + further warnings (Alternate locations, Zero RMSD with model)
* Except amino acids, nucleotides, and small residues (<7 heavy atoms)

Central European Institute of Technology
Masaryk University
Kamenice 753/5, 625 00 Brno,
Czech Republic
www.ceitec.muni.cz | info@ceitec.muni.cz

Thank you for your attention
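
Backup: topology validation as a mapping problem

The approach described earlier (map the validated residue onto a model residue with the same Residue ID, then report the differences) can be illustrated with a small Python sketch. This is not the implementation used by pdb-care, MotiveValidator, or ValidatorDB. The function names (map_to_model, validate), the naive backtracking search used here as a stand-in for subgraph matching, and the four-atom toy "model residue" are all assumptions made for illustration. Chirality and ring checks are left out.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Atoms = Dict[str, str]          # atom name -> element symbol
Bonds = Set[FrozenSet[str]]     # undirected bonds between atom names


def bonds(*pairs: Tuple[str, str]) -> Bonds:
    """Build an undirected bond set from (atom, atom) name pairs."""
    return {frozenset(p) for p in pairs}


def map_to_model(val_atoms: Atoms, val_bonds: Bonds,
                 mod_atoms: Atoms, mod_bonds: Bonds) -> Optional[Dict[str, str]]:
    """Map each validated atom to a distinct model atom of the same element
    so that every validated bond is also a model bond (naive backtracking)."""
    order: List[str] = list(val_atoms)
    mapping: Dict[str, str] = {}

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        v = order[i]
        for m in mod_atoms:
            if m in mapping.values() or mod_atoms[m] != val_atoms[v]:
                continue
            # Every already-mapped neighbour of v must be bonded to m in the model.
            if all(frozenset((m, mapping[u])) in mod_bonds
                   for u in mapping if frozenset((v, u)) in val_bonds):
                mapping[v] = m
                if extend(i + 1):
                    return True
                del mapping[v]  # undo and try the next candidate
        return False

    return mapping if extend(0) else None


def validate(val_atoms: Atoms, val_bonds: Bonds,
             mod_atoms: Atoms, mod_bonds: Bonds) -> dict:
    """Report whether the residue can be mapped, plus missing and renamed atoms."""
    mapping = map_to_model(val_atoms, val_bonds, mod_atoms, mod_bonds)
    if mapping is None:
        return {"mappable": False, "missing_atoms": [], "renamed_atoms": {}}
    missing = sorted(set(mod_atoms) - set(mapping.values()))
    renamed = {v: m for v, m in mapping.items() if v != m}
    return {"mappable": True, "missing_atoms": missing, "renamed_atoms": renamed}


# Toy model residue (hypothetical, not taken from any reference database).
MODEL_ATOMS: Atoms = {"C1": "C", "C2": "C", "O5": "O", "N2": "N"}
MODEL_BONDS: Bonds = bonds(("C1", "C2"), ("C1", "O5"), ("C2", "N2"))

# Validated residue: N2 is absent and the oxygen is named O1 instead of O5.
VAL_ATOMS: Atoms = {"C1": "C", "C2": "C", "O1": "O"}
VAL_BONDS: Bonds = bonds(("C1", "C2"), ("C1", "O1"))

if __name__ == "__main__":
    print(validate(VAL_ATOMS, VAL_BONDS, MODEL_ATOMS, MODEL_BONDS))
```

For the toy input, the sketch reports one missing atom (N2) and one differently named atom (O1 mapped to O5). A residue that cannot be mapped at all corresponds to the prerequisite check shown before the error slides. Foreign atoms would need the surrounding residues in the input motif, which this sketch does not model.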