Engineering of protein structures

Outline
• Overview of mutations
• Databases of mutations
• Missense mutations
• Prediction of mutational effects
• Rational design of proteins

Overview of mutations
• Mutations in DNA or RNA may occur through
  • Errors in DNA replication during cell division
  • Exposure to mutagens (physical or chemical agents)
  • Viral infections
  • …or intervention by a scientist
• They can be harmful or not
• Location in the DNA
  • Non-coding region → may affect gene expression (transcriptional regulation, mRNA stability, translation rates, localization, etc.)
  • Coding region → may affect the protein sequence
• Types
  • Point mutations – a single nucleotide is changed in DNA (or RNA)
    • Substitutions
      • Single nucleotide polymorphism (SNP, pronounced "snip")
        • A genetic variant occurring in > 1 % of the population
        • About 10,000,000 in the human genome
    • Insertions or deletions
      • Codons are triplets (3 nucleotides → 1 amino acid)
      • Potential for a frameshift (a change in the grouping of codons, resulting in a different translation)
      • Can be very deleterious
  • Other types (duplications, translocations, inversions, etc.)
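To make the frameshift point concrete, the sketch below translates a short coding sequence with the standard genetic code and compares a single-nucleotide insertion with an in-frame insertion of a whole codon. The DNA sequence and the helper name `translate` are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order and '*' marks a stop codon
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


wild_type = "ATGGCTGAAAAACTGGGT"                   # hypothetical: ATG GCT GAA AAA CTG GGT
frameshift = wild_type[:4] + "T" + wild_type[4:]   # +1 nucleotide: every downstream codon is regrouped
in_frame = wild_type[:6] + "GGT" + wild_type[6:]   # +3 nucleotides: one extra codon, frame preserved

print(translate(wild_type))   # MAEKLG
print(translate(frameshift))  # MV  (ATG GTC TGA: the shifted frame hits a stop codon)
print(translate(in_frame))    # MAGEKLG
```

A single inserted base regroups every downstream codon and, in this example, immediately produces a premature stop; inserting a full codon only adds one residue and leaves the rest of the protein unchanged.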
Point mutations at the protein level
• Types of point mutations
  • Silent (synonymous SNP) – no effect on the protein sequence
  • Missense (non-synonymous SNP) – substitution of an amino acid
  • Nonsense – introduction of a stop codon → protein truncation

Databases of mutations
• Human Genome Variation Society
  • http://www.hgvs.org
  • Lists all the available databases of human mutations
• Central mutation databases (> 20)
  • Substitutions in all genes
  • Variability in protein sequences
  • Data mainly from the literature
• Locus-specific databases (about 700)
  • Substitutions in specific genes
  • Typically manually annotated

Central mutation databases
• Database of Single Nucleotide Polymorphisms (dbSNP)
  • http://www.ncbi.nlm.nih.gov/SNP/
  • Repository for SNPs as well as short deletions and insertions
  • For the human genome
• Online Mendelian Inheritance in Man (OMIM)
  • http://omim.org/
  • Comprehensive database of human genes and genetic phenotypes
• Human Gene Mutation Database (HGMD)
  • http://www.hgmd.cf.ac.uk/ac/index.php
  • Comprehensive collection of mutations in nuclear genes that underlie or are associated with human inherited disease
• UniProtKB/Swiss-Prot
  • http://www.uniprot.org/UniProtKB/
  • High-quality, manually annotated protein entries with partial lists of known sequence variants

Locus-specific databases
• For information on gene-specific databases

Missense mutations
• What are they?...
• How can they affect proteins?
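The three classes of point mutation listed above (silent, missense, nonsense) can be told apart mechanically by translating the codon before and after the substitution. This is a minimal sketch; the function name `classify_point_mutation` and its 0-based `position` argument are illustrative assumptions, and the wild-type codon is assumed to be a sense codon.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order and '*' marks a stop codon
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution (position 0, 1 or 2) in a sense codon."""
    before = CODON_TABLE[codon]
    if before == "*":
        raise ValueError("the wild-type codon must be a sense codon, not a stop codon")
    mutant = codon[:position] + new_base + codon[position + 1:]
    after = CODON_TABLE[mutant]
    if after == before:
        return "silent"      # synonymous SNP: same amino acid
    if after == "*":
        return "nonsense"    # premature stop codon -> truncated protein
    return "missense"        # non-synonymous SNP: different amino acid


# GAA encodes glutamate (Glu, E)
print(classify_point_mutation("GAA", 2, "G"))  # silent   (GAG also encodes Glu)
print(classify_point_mutation("GAA", 1, "T"))  # missense (GTA encodes Val)
print(classify_point_mutation("GAA", 0, "T"))  # nonsense (TAA is a stop codon)
```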
Missense mutations
• Mutations affecting structure
  • Stability & folding
  • Aggregation
• Mutations affecting function
  • Binding & catalysis
  • Transport processes
  • Protein dynamics
  • Protein localization

Missense mutations - structure
• Major pathogenic consequences of missense mutations
  • Compromised folding – the protein adopts altered folds or populates more unfolded states
  • Decreased stability – the lifetime of the protein is shortened
  • Increased aggregation
• Molecular basis of mutations affecting folding & stability
  • Introduced clashes – common for small-to-large mutations of buried residues
  • Loss of interactions – the most pronounced effects involve H-bonds, salt bridges and aromatic interactions
  • Altered conformation of the protein backbone – mutations of residues with specific backbone angles (especially glycine and proline)
  • Changes in charge/hydrophobicity
    • Introducing a hydrophilic/charged residue into the protein core
    • Introducing a hydrophobic residue onto the protein surface
• NOTE:
  • Glycine – the most flexible amino acid
  • Proline – the most rigid
• Mutations can reduce solubility or increase aggregation
  • Alterations of surface residues may affect solubility (e.g., reduction of charge)
  • Hydrophobic mutations can increase protein aggregation
• Aggregating proteins usually have a high content of β-structure
• Aggregation is modulated by short specific sequences
  • Aggregation-prone regions (APRs) are stretches of 5–15 hydrophobic residues
  • They tend to stack and form amyloid fibrils (cross-β spines)
  • Some mutations can increase the propensity to form such amyloid structures

Missense mutations - function
• Effect on binding and catalysis
  • Binding sites are tuned to bind specific molecules and to stabilize transition states
  • Mutations can improve or disrupt binding and catalysis
  • Example – drug resistance of HIV-1 protease mutants
    • Loss of interactions with inhibitors
    • Ile84Val: 3× higher Ki
    • Ile50Val: 37× higher Ki
    • Notation: Ile84Val means that the isoleucine at position 84 was mutated to valine
• Effect on ligand transport
  • Pathways are adapted to permit the transport of specific molecules
  • Mutations can speed up or disrupt transport, or allow the transport of different molecules
  • Example: Leu177Trp → the tunnel becomes almost closed; product release is 500× slower
• Effect on protein dynamics
  • Dynamics enables proteins to adapt to their binding partners and to interconvert between conformations
  • Mutations can:
    • Make regions more rigid (targeting hinges or very mobile regions, e.g., loops) → reduced adaptability
    • Increase the flexibility of rigid regions (targeting residues with many contacts in mobile elements) → increased adaptability
  • These changes may affect activity, specificity or even recognition
• Effect on protein localization
  • After translation, the protein must be translocated to the appropriate cellular compartment
  • Translocation can be regulated by short N-terminal sequences (signal peptides), by translocation complexes, chaperones, etc.
  • Mutations can disrupt or alter the signal, or the formation of these complexes → the protein fails to be transported to the correct subcellular location
    • Missing protein → inactive reaction pathways or unregulated signaling cascades
    • Mislocalized protein → active in the wrong cellular compartment, causing harmful effects

Prediction of mutational effects
• Identification of mutable residues
• Prediction of the effects on structure
• Prediction of pathogenicity

Prediction of mutational effects - mutable residues
• What is it?
• The effect of a mutation on the protein can be predicted directly from the role of the modified residue
• Mutation of evolutionarily conserved residues
  • Residues important for protein function or stability tend to be highly conserved over evolution
  • Mutation of highly conserved residues → often leads to destabilization or loss of function
  • Mutation of highly variable residues → often neutral
• Mutations affecting stability & folding
  • Mutation of residues with many contacts or with favorable interaction energy → often destabilizing or compromising folding
  • Mutation of residues in the protein core → often destabilizing
    • Small to large → steric clashes
    • Large to small → loss of contacts (creation of a void)
    • Polar to non-polar → loss of H-bonds
    • Neutral to charged → introduction of an isolated charge
  • Mutation of residues on the protein surface (often neutral)
    • Polar to hydrophobic → desolvation penalty (destabilizing)
  • Mutation involving proline or glycine → altered backbone conformation
• Mutations affecting function
  • Mutation of residues in binding or active sites → modified binding or catalysis
  • Mutation of residues in transport pathways → modified transport
  • Mutation of hinge or mobile residues, or of residues on loops with many contacts → modified flexibility
  • Mutation of residues directing protein localization → mislocalization of the protein
• Tools for annotating (identifying) the role of residues
  • Individual tools for specific analyses
    • Evolutionary conservation – e.g., ConSurf, …
    • Residue contacts – e.g., Contact Map Web Viewer, …
    • Residue interactions – e.g., Protein Interaction Calculator, …
    • Accessible surface area – e.g., AsaView, Naccess, …
    • Binding sites – e.g., CASTp, metaPocket 2.0, meta-PPISP, …
    • Transport pathways – e.g., CAVER 3.0, POREWALKER, …
    • Protein dynamics – e.g., NMA, molecular dynamics, …
    • Protein localization – e.g., SignalP, TargetP, Phobius, TMHMM, …
• HotSpot Wizard – meta-server combining several tools
  • http://loschmidt.chemi.muni.cz/hotspotwizard/
  • Homology modelling, MSA, conservation, correlation, pocket and tunnel detection, docking, stability prediction, design of smart libraries
  [Figure: HotSpot Wizard results – panels "Functional hot-spots", "Stability hot-spots (flexibility)", "Stability hot-spots (evolution)" and "Correlated hot-spots"]

Prediction of mutational effects - structure
• Prediction of mutant structures – general workflow
  • The mutated residue and its surroundings are represented by rotamers from a rotamer library (conformations derived from X-ray structures)
  • The best set of rotamers is selected by a Monte Carlo approach
  • Optionally – energy minimization, backbone flexibility
  • Comparing the structures of the mutant and native protein → assessment of the mutational effect ($\Delta\Delta G = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{native}}$)
• Available tools
  • Geometric: PyMOL, WHATIF
  • Energy-based: FOLDX, Rosetta-ddG
  • Homology: Swiss Model, MODELLER, etc.
• PyMOL
  • https://pymol.org/
  • Mutagenesis module
  • The user can choose rotamers and visualize potential clashes
  • Very fast; fixed backbone; no scoring of the mutation
• WHATIF
  • https://swift.cmbi.umcn.nl/servers/html/index.html
  • Web server for multiple purposes, including mutagenesis
  • Very fast
  • Fixed backbone conformation
  • Construction of single mutants only, or of stabilizing proline mutations
  • No scoring function
• FOLDX
  • http://foldxsuite.crg.eu/
  • Stand-alone, with a plug-in for the YASARA modeling tool
  • Fast (minutes)
  • Fixed backbone conformation
  • Construction of single or multiple mutants
  • Empirical scoring function for calculating the stability change (ΔΔG)
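The rules of thumb for mutable residues in the core and on the surface can be written down as a small lookup. This is a minimal sketch, not any tool's method: side-chain size is approximated by the number of side-chain heavy atoms, and the size threshold, the residue classes (His treated as neutral) and the function name `flag_substitution` are all assumptions made for illustration.

```python
# Side-chain heavy atoms (Cβ included), used here as a crude measure of residue size
SIDE_CHAIN_ATOMS = {
    "G": 0, "A": 1, "S": 2, "C": 2, "P": 3, "T": 3, "V": 3, "D": 4, "N": 4, "I": 4,
    "L": 4, "M": 4, "E": 5, "Q": 5, "K": 5, "H": 6, "F": 7, "R": 7, "Y": 8, "W": 10,
}
CHARGED = set("DEKR")                  # assumption: His treated as neutral
POLAR = set("STNQYH") | CHARGED        # assumption: simplified polar class
HYDROPHOBIC = set("AVILMFWC")          # assumption: simplified hydrophobic class
SIZE_THRESHOLD = 2                     # hypothetical cut-off for "small to large" / "large to small"


def flag_substitution(wt: str, mut: str, buried: bool) -> list[str]:
    """Return rule-of-thumb warnings for a wt -> mut substitution (one-letter codes)."""
    flags = []
    if "P" in (wt, mut) or "G" in (wt, mut):
        flags.append("altered backbone conformation (Pro/Gly)")
    size_change = SIDE_CHAIN_ATOMS[mut] - SIDE_CHAIN_ATOMS[wt]
    if buried:
        if size_change >= SIZE_THRESHOLD:
            flags.append("steric clash (small to large)")
        elif size_change <= -SIZE_THRESHOLD:
            flags.append("loss of contacts / void (large to small)")
        if wt in POLAR and mut not in POLAR:
            flags.append("loss of H-bond (polar to non-polar)")
        if wt not in CHARGED and mut in CHARGED:
            flags.append("isolated charge (neutral to charged)")
    elif wt in POLAR and mut in HYDROPHOBIC:
        flags.append("desolvation penalty (polar to hydrophobic on surface)")
    return flags or ["likely neutral"]


print(flag_substitution("A", "W", buried=True))   # ['steric clash (small to large)']
print(flag_substitution("K", "L", buried=False))  # ['desolvation penalty (polar to hydrophobic on surface)']
```

In practice the `buried` flag would come from an accessible-surface-area calculation and conservation from an alignment-based tool; the sketch only encodes the qualitative rules.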
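The workflow step "the best set of rotamers is selected by a Monte Carlo approach" can be illustrated with a toy Metropolis search. This sketch assumes a hypothetical system of three side chains with made-up self and pairwise energies in arbitrary units; real tools such as FOLDX or Rosetta use full rotamer libraries and force fields, so only the acceptance logic carries over.

```python
import itertools
import math
import random

# Hypothetical toy system: three neighbouring side chains and the number of rotamers each may adopt
N_ROTAMERS = {"A": 3, "B": 3, "C": 2}
# Hypothetical self energies of each rotamer (arbitrary units)
SELF_E = {"A": [0.0, 0.5, 1.0], "B": [0.2, 0.0, 0.8], "C": [0.0, 0.3]}
# Hypothetical pairwise energies, indexed [rotamer of first residue][rotamer of second residue]
PAIR_E = {
    ("A", "B"): [[2.0, -1.0, 0.0], [-0.5, 1.5, 0.0], [0.0, 0.0, -1.2]],
    ("B", "C"): [[0.0, -0.8], [1.0, 0.0], [-0.3, 0.5]],
}


def total_energy(state: dict[str, int]) -> float:
    """Sum of rotamer self energies plus pairwise interaction energies."""
    energy = sum(SELF_E[res][rot] for res, rot in state.items())
    for (first, second), table in PAIR_E.items():
        energy += table[state[first]][state[second]]
    return energy


def monte_carlo_packing(steps: int = 5000, kT: float = 1.0, seed: int = 0):
    """Metropolis Monte Carlo over rotamer assignments; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    residues = list(N_ROTAMERS)
    state = {res: rng.randrange(N_ROTAMERS[res]) for res in residues}
    energy = total_energy(state)
    best_state, best_energy = dict(state), energy
    for _ in range(steps):
        res = rng.choice(residues)                   # pick one side chain
        trial = dict(state)
        trial[res] = rng.randrange(N_ROTAMERS[res])  # propose a random rotamer for it
        trial_energy = total_energy(trial)
        delta = trial_energy - energy
        # Metropolis criterion: accept downhill moves, uphill moves with probability exp(-delta/kT)
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            state, energy = trial, trial_energy
            if energy < best_energy:
                best_state, best_energy = dict(state), energy
    return best_state, best_energy


def brute_force_minimum() -> float:
    """Exhaustive check, feasible only because the toy system has 3 * 3 * 2 = 18 states."""
    names = list(N_ROTAMERS)
    combos = itertools.product(*(range(N_ROTAMERS[n]) for n in names))
    return min(total_energy(dict(zip(names, combo))) for combo in combos)


best_state, best_energy = monte_carlo_packing()
print(best_state, round(best_energy, 2), round(brute_force_minimum(), 2))
```

Exhaustive enumeration is only possible for tiny systems; the point of Monte Carlo is that the number of rotamer combinations grows exponentially with the number of residues.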
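The HIV-1 protease example (Ile84Val: 3× higher Ki; Ile50Val: 37× higher Ki) can be put on the same kcal/mol scale that FOLDX uses for ΔΔG. Since Ki is a dissociation constant, the loss of binding free energy is ΔΔG_bind = RT ln(Ki,mut / Ki,wt). A temperature of 25 °C is an assumption, as the slide does not state it, and the helper names are illustrative.

```python
import math
import re

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split three-letter notation such as 'Ile84Val' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", code)
    if match is None:
        raise ValueError(f"not a three-letter mutation code: {code!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def ddg_binding(ki_fold_change: float, temperature: float = 298.15) -> float:
    """ddG_bind = RT ln(Ki_mut / Ki_wt) in kcal/mol; positive means weaker inhibitor binding."""
    return R_KCAL * temperature * math.log(ki_fold_change)


for code, fold in [("Ile84Val", 3), ("Ile50Val", 37)]:
    wt, pos, mut = parse_mutation(code)
    print(f"{wt} at {pos} -> {mut}: Ki x{fold}, ddG_bind = {ddg_binding(fold):.2f} kcal/mol")
```

At 25 °C this gives roughly 0.65 kcal/mol for Ile84Val and 2.14 kcal/mol for Ile50Val, showing that even a 37-fold drop in affinity corresponds to a modest free-energy change.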
• Rosetta-ddG
  • Available from https://www.rosettacommons.org/
  • Stand-alone, with bash and Python scripts available
  • Slow (hours to days)
  • Fixed or flexible backbone conformation
  • Construction of single or multiple mutants
  • Empirical force field for calculating the structure and stability of the wild type and the mutant
  • Construction of PDB files and prediction of the stability change (ΔΔG)

Prediction of mutational effects - pathogenicity
• Prediction of the impact of a mutation on protein function
  • Many tools employ machine learning approaches
  • Trained on experimental functional data
  • Predictions can be based on sequence only
  • Qualitative results, i.e., deleterious versus neutral
  • Primarily intended for pathogenicity prediction (mutations leading to disease)
• Available tools
  • MutPred, SNAP, PhD-SNP, SIFT, MAPP, …
  • PredictSNP – meta-server combining many tools
• PredictSNP
  • http://loschmidt.chemi.muni.cz/predictsnp/
  • Combines many tools for protein- or DNA-level assessment of SNPs
• There are many more tools out there

Rational design of proteins
• Protein engineering: sometimes we can use mutagenesis to rationally design proteins according to our needs
• Properties that can be modified by mutagenesis
  • Such as?...
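Meta-servers such as PredictSNP combine the qualitative outputs of several predictors into one call. The sketch below uses a plain majority vote, which is a simplification and is not claimed to be PredictSNP's own consensus scheme; the tool names and their labels are invented placeholders, not real predictions.

```python
from collections import Counter


def consensus(predictions: dict[str, str]) -> tuple[str, float]:
    """Plain majority vote over per-tool labels; returns the winning label and the fraction of tools agreeing."""
    if not predictions:
        raise ValueError("need at least one prediction")
    ranked = Counter(predictions.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undecided", ranked[0][1] / len(predictions)  # tie between labels
    label, votes = ranked[0]
    return label, votes / len(predictions)


# Invented labels from five hypothetical tools for one variant (not real tool output)
votes = {"tool_1": "deleterious", "tool_2": "deleterious", "tool_3": "neutral",
         "tool_4": "deleterious", "tool_5": "neutral"}
print(consensus(votes))  # ('deleterious', 0.6)
```

Weighting each tool by its reliability or confidence score is a natural refinement, because an unweighted vote treats a weak predictor the same as a strong one.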
Rational design of proteins
• Properties that can be modified by mutagenesis
  • Stability
  • Function
    • Binding site (catalytic activity or substrate specificity)
    • Macromolecular interface
    • Molecular tunnels/channels
  • Solubility

Rational design: stability
• Prediction of stability change upon mutation
  • The structure of the mutant protein is not necessarily produced
  • Tools often employ
    • Empirical scoring functions
    • Evolutionary conservation analysis (e.g., back-to-consensus)
    • Machine learning approaches
  • Available tools
    • Energy-based: Rosetta-ddG, FOLDX
    • Evolution-based: FireProtASR
    • Hybrid approaches: FireProt, PROSS
• FireProt
  • https://loschmidt.chemi.muni.cz/fireprotweb
  • In silico analysis of all possible mutations
  • Energy- and evolution-based analyses
  • Multiple-point mutants for gene synthesis
• PROSS
  • https://pross.weizmann.ac.il/step/pross-terms/
  • Combination of mutations "allowed" by conservation analysis and Rosetta calculations (energy)
• FireProtASR
  • https://loschmidt.chemi.muni.cz/fireprotasr
  • Ancestral sequence reconstruction (ASR)
  • Automated ancestral inference & phylogenetic tree
  • Useful for finding stable ancestral enzymes

Rational design: function
• RosettaDesign
  • http://rosettadesign.med.unc.edu/
  • Monte Carlo sampling (random search) to predict the minimum-energy structure of mutants
  • Predicts free energy changes upon mutation (ΔΔG)
  • Helps design mutations that optimize the binding site and increase interactions with a ligand/substrate
• PocketOptimizer
  • https://github.com/Hoecker-Lab/pocketoptimizer/
  • Aims to maximize the affinity of a binding site towards a ligand
  • Modular pipeline with different tools
    • Flexibility, docking, mutagenesis, energy calculation
  • Predicts global minimum-energy designs
• FuncLib
  • https://funclib.weizmann.ac.il
  • Redesigns and/or optimizes the binding site
  • Uses evolution (conservation) and Rosetta calculations (energy) to introduce multiple-point mutations that modify the properties of the binding site
  • Can be used to improve the binding affinity towards a ligand
  • Outputs up to 50 multiple-point mutants for protein synthesis
• AffiLib
  • https://affilib.weizmann.ac.il
  • Optimizes protein-protein interfaces
  • Uses evolution (conservation) and Rosetta (energy) to introduce mutations and optimize the macromolecular interface
  • Suggests mutations of interface residues to improve the binding affinity
  • Outputs up to 50 multiple-point mutants for protein synthesis
• Mutation Cutoff Scanning Matrix (mCSM-PPI2)
  • http://biosig.unimelb.edu.au/mcsm_ppi2/
  • Optimizes protein-protein interfaces
  • Based on machine learning, evolutionary data and energy (FoldX)
  • Provides the mutational ΔΔG
  • Calculation modes
    • Single mutation – single-point mutations at the interface
    • Mutation list – single mutations from a user-defined list
    • Alanine scanning – all interface residues are mutated to alanine
    • Systematic – position saturation (all interface residues are mutated to each of the other 19 amino acids)

Rational design: solubility
• Aggrescan3D; SoluProt (see lecture 6 – Analysis of protein structures)
• SolubiS
  • https://solubis.switchlab.org/
  • Identifies stabilizing mutations that reduce the aggregation tendency of a protein
    1) Identifies exposed APRs
    2) Introduces "gatekeeper" residues (P, R, K, D and E) into APRs
    3) Assesses the stability changes of the mutations (ΔΔG)
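The alanine-scanning and systematic (saturation) modes of mCSM-PPI2 amount to enumerating mutation lists over the interface residues. This sketch only generates such lists in a generic one-letter notation (e.g. L45A); the residue list is hypothetical, and the exact input format the server expects is not shown here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alanine_scan(residues: list[tuple[str, int]]) -> list[str]:
    """One X->Ala substitution per interface residue; residues that are already Ala are skipped."""
    return [f"{wt}{pos}A" for wt, pos in residues if wt != "A"]


def saturation(residues: list[tuple[str, int]]) -> list[str]:
    """Every position mutated to each of the other 19 amino acids."""
    return [f"{wt}{pos}{aa}" for wt, pos in residues for aa in AMINO_ACIDS if aa != wt]


interface = [("L", 45), ("D", 47), ("A", 50)]  # hypothetical interface residues (one-letter code, position)
print(alanine_scan(interface))     # ['L45A', 'D47A']
print(len(saturation(interface)))  # 57, i.e. 3 positions x 19 substitutions
```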
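The SolubiS steps (find exposed APRs, then place gatekeeper residues in them) can be sketched with the slide's definition of an APR as a run of 5–15 hydrophobic residues. This is not SolubiS's algorithm: the server relies on dedicated aggregation and stability predictors, whereas here the hydrophobic alphabet, the "mostly exposed" rule and the example sequence are assumptions, and step 3 (ΔΔG assessment) is left to a stability predictor.

```python
HYDROPHOBIC = set("AVILMFWYC")  # assumption: a simple hydrophobic alphabet
GATEKEEPERS = "PRKDE"           # gatekeeper residues listed for SolubiS


def find_aprs(seq: str, min_len: int = 5, max_len: int = 15) -> list[tuple[int, int]]:
    """0-based, half-open spans of uninterrupted hydrophobic runs with min_len <= length <= max_len."""
    spans, start = [], None
    for i, aa in enumerate(seq + "*"):  # '*' sentinel closes a run at the sequence end
        if aa in HYDROPHOBIC:
            if start is None:
                start = i
        elif start is not None:
            if min_len <= i - start <= max_len:
                spans.append((start, i))
            start = None
    return spans


def gatekeeper_candidates(seq: str, exposed: list[bool]) -> list[str]:
    """Single gatekeeper substitutions (1-based numbering) inside APRs that are mostly solvent-exposed."""
    candidates = []
    for start, end in find_aprs(seq):
        if 2 * sum(exposed[start:end]) < end - start:  # hypothetical rule: skip mostly buried APRs
            continue
        for i in range(start, end):
            candidates.extend(f"{seq[i]}{i + 1}{g}" for g in GATEKEEPERS if g != seq[i])
    return candidates


seq = "MSDEKLLVIAFGSDER"       # hypothetical sequence with one hydrophobic stretch (LLVIAF)
exposed = [True] * len(seq)    # hypothetical exposure, e.g. from a relative-ASA cut-off
print(find_aprs(seq))                             # [(5, 11)]
print(gatekeeper_candidates(seq, exposed)[:5])    # ['L6P', 'L6R', 'L6K', 'L6D', 'L6E']
```

Each candidate would then be checked for its stability change, so that only substitutions that break the APR without destabilizing the fold are kept.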