Engineering of protein structures

Outline
❑ Overview of mutations
❑ Databases of mutations
❑ Missense mutations
❑ Prediction of mutational effects
❑ Rational design of proteins

Overview of mutations
❑ Mutations in DNA or mRNA may arise from
  ▪ Errors in DNA replication during cell division
  ▪ Exposure to mutagens (physical or chemical agents)
  ▪ Viral infections
  ▪ Deliberate intervention by scientists
❑ Mutations can be harmful or not
❑ Location in the DNA
  ▪ Non-coding region -> may affect gene expression (transcriptional regulation, mRNA stability, translation rates, location, etc.)
  ▪ Coding region (exons) -> may affect protein sequence
❑ Types
  ▪ Point mutations – a single nucleotide is changed in DNA (or RNA)
    ▪ Substitutions
      ▪ Single nucleotide polymorphism (SNP – pronounced “snip”)
      ▪ Genetic variation; occurs in > 1 % of the population
      ▪ About 10,000,000 in the human genome
    ▪ Insertions or deletions
      ▪ Codons are triplets (3 nucleotides → 1 amino acid)
      ▪ Potential for frameshift (a change in the grouping of codons, resulting in a different translation)
      ▪ Can be very deleterious
  ▪ Other types (duplications, translocations, inversions, etc.)
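The frameshift effect described above can be illustrated with a short Python sketch. It is only a toy example under stated assumptions: the sequence `wild_type` is invented for illustration, and the helper `translate` is a hypothetical function written here, not part of any tool named in this lecture. It uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence codon by codon, stopping at the first stop codon.

    Trailing nucleotides that do not fill a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: ATG GCC AAG TGG GAA TAA
wild_type = "ATGGCCAAGTGGGAATAA"

# Deleting ONE nucleotide shifts the reading frame of everything downstream.
frameshift = wild_type[:3] + wild_type[4:]

# Deleting THREE nucleotides removes one codon but keeps the frame intact.
in_frame = wild_type[:3] + wild_type[6:]

print(translate(wild_type))   # MAKWE
print(translate(frameshift))  # MPSGN
print(translate(in_frame))    # MKWE
```

In this toy case the single-nucleotide deletion changes every residue after the first one, while the three-nucleotide deletion only removes one residue, which is why frameshifts tend to be far more deleterious.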
Point mutations at protein level
❑ Types of point mutations
  ▪ Silent (synonymous SNP) – no effect on protein sequence
  ▪ Missense (non-synonymous SNP) – substitution of an amino acid
  ▪ Nonsense – introduction of a stop codon -> protein truncation

Databases of mutations
❑ Human Genome Variation Society
  ▪ http://www.hgvs.org
  ▪ Lists all the available databases of human mutations by type
❑ Central mutation databases (>20)
  ▪ Substitutions in all genes
  ▪ Variability in protein sequences
  ▪ Data mainly from literature
❑ Locus-specific databases (about 700)
  ▪ Substitutions in specific genes
  ▪ Typically manually annotated

Central mutation databases
❑ Database of Single Nucleotide Polymorphisms – dbSNP
  ▪ https://www.ncbi.nlm.nih.gov/snp/
  ▪ Repository for SNPs and short deletions and insertions
  ▪ For the human genome
❑ Online Mendelian Inheritance in Man – OMIM
  ▪ http://omim.org/
  ▪ Comprehensive database of human genes and genetic phenotypes
❑ Human Gene Mutation Database – HGMD
  ▪ http://www.hgmd.cf.ac.uk/ac/index.php
  ▪ Comprehensive collection of mutations in nuclear genes that underlie or are associated with human inherited disease
❑ UniProtKB/Swiss-Prot
  ▪ http://www.uniprot.org/UniProtKB/
  ▪ High-quality, manually annotated protein entries with partial lists of known sequence variants

Locus-specific databases
❑ For information on gene-specific databases

Missense mutations
What are they? How can they affect proteins?
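As a first answer to "what are they?": a missense mutation is a single-nucleotide substitution that changes the encoded amino acid, as opposed to a silent or nonsense substitution. The following Python sketch classifies a substitution within one codon using the standard genetic code. The function name `classify_substitution` and its arguments are hypothetical, chosen only for this illustration, and the sketch assumes the original codon is a sense codon (stop-loss mutations are outside the three types listed on the slide).

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-nucleotide substitution as silent, missense or nonsense.

    codon    -- original sense codon, e.g. "GAA"
    position -- 0-based index of the substituted nucleotide within the codon
    new_base -- replacement nucleotide (must differ from the original)
    """
    if codon[position] == new_base:
        raise ValueError("new_base is identical to the original nucleotide")
    wild_type_aa = CODON_TABLE[codon]
    if wild_type_aa == "*":
        raise ValueError("original codon is a stop codon (not handled in this sketch)")

    mutant_codon = codon[:position] + new_base + codon[position + 1:]
    mutant_aa = CODON_TABLE[mutant_codon]

    if mutant_aa == wild_type_aa:
        return "silent"
    if mutant_aa == "*":
        return "nonsense"
    return "missense"


print(classify_substitution("GAA", 2, "G"))  # silent   (Glu -> Glu)
print(classify_substitution("ATC", 0, "G"))  # missense (Ile -> Val)
print(classify_substitution("GAA", 0, "T"))  # nonsense (Glu -> stop)
```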
Missense mutations
❑ Mutations affecting structure
  ▪ Stability & folding
  ▪ Aggregation
❑ Mutations affecting function
  ▪ Binding & catalysis
  ▪ Transport processes
  ▪ Protein dynamics
  ▪ Protein localization

Mutations affecting structure
❑ Major pathogenic consequences of missense mutations
  ▪ Compromised folding – the protein adopts a modified fold or populates more unfolded states
  ▪ Decreased stability – the lifetime of the protein is decreased
  ▪ Increased aggregation
❑ Molecular basis of mutations affecting folding & stability
  ▪ Introduced clashes – common for small-to-large mutations of buried residues
  ▪ Loss of interactions – the most pronounced effects relate to H-bonds, salt bridges and aromatic interactions
  ▪ Altered conformation of the protein backbone – mutations of residues with specific backbone angles (especially glycine and proline)
  ▪ Changes in charge/hydrophobicity
    ▪ Introducing a hydrophilic/charged residue into the protein core
    ▪ Introducing a hydrophobic residue onto the protein surface
  NOTE: glycine is the most flexible amino acid; proline is the most rigid
❑ Mutations can reduce solubility or increase aggregation
  ▪ Alterations of surface residues may affect solubility (e.g. reduction of charge)
  ▪ Hydrophobic mutations can increase protein aggregation
  ▪ Aggregating proteins usually have a high content of β-structure
❑ Aggregation is modulated by short specific sequences
  ▪ Aggregation-prone regions (APRs) are sequences of 5-15 hydrophobic residues
  ▪ They tend to stack and form amyloid fibrils (cross-β spines)
  ▪ Some mutations can increase the propensity to form such amyloid structures

Mutations affecting function
❑ Effect on
binding and catalysis
  ▪ Binding sites are tuned to bind specific molecules and stabilize transition states
  ▪ Mutations can disrupt or improve binding and catalysis
❑ Example – drug resistance of HIV-1 protease mutants
  ▪ Loss of interactions with inhibitors
    ▪ Ile50Val: 37x higher Ki
    ▪ Ile84Val: 3x higher Ki
  ▪ Notation: Ile84Val means that isoleucine at position 84 was mutated to valine
❑ Effect on ligand transport
  ▪ Pathways are adjusted to permit transport of specific molecules
  ▪ Mutations can speed up or disrupt the transport, or allow the transport of different molecules
  ▪ Example: Leu177Trp -> tunnel becomes almost closed; release of products 500x slower
❑ Effect on protein dynamics
  ▪ Dynamics enable proteins to adapt to their binding partners and to interchange between conformations
  ▪ Mutations can:
    ▪ Make regions more rigid (targeting hinges or very mobile regions, e.g. loops) -> reduced adaptability
    ▪ Increase the flexibility of rigid regions (targeting residues with many contacts in mobile elements) -> increased adaptability
  ▪ These changes may affect activity, specificity or even recognition
❑ Effect on protein localization
  ▪ After translation, the protein must be translocated to the appropriate cellular compartment
  ▪ Translocation can be regulated by short sequences (signal peptides) on the N-terminus, by translocation complexes, chaperones, etc.
  ▪ Mutations can disrupt or alter the signal or the complex formation -> the protein fails to be transported to the correct subcellular location
    ▪ Missing protein -> inactive reaction pathways or unregulated signaling cascades
    ▪ Mislocalized protein -> active in the wrong cellular compartment, causing harmful effects

Prediction of mutational effects
❑ Identification of mutable residues
❑ Prediction of the effects on structure
❑ Prediction of pathogenicity

Identification of mutable residues
What are these?
❑ The effect of a mutation on the protein can often be predicted directly from the role of the modified residue
❑ Mutation of evolutionarily conserved residues
  ▪ Residues important for protein function or stability tend to be highly conserved over evolution
  ▪ Mutation of highly conserved residues -> often leads to destabilization or loss of function
  ▪ Mutation of highly variable residues -> often neutral
❑ Mutations affecting stability & folding
  ▪ Mutation of residues with many contacts or with favorable interaction energy -> often destabilizes the protein or compromises folding
  ▪ Mutation of residues in the protein core -> often destabilizing
    ▪ Small residue to large -> steric clashes
    ▪ Large to small -> loss of contacts (creation of a void)
    ▪ Polar to non-polar -> loss of H-bond
    ▪ Neutral to charged -> introduction of an isolated charge
  ▪ Mutation of residues on the protein surface (often neutral)
    ▪ Polar to hydrophobic -> desolvation penalty (destabilizing)
  ▪ Mutation involving proline or glycine -> altered conformation
❑ Mutations affecting function
  ▪ Mutation of residues in binding or active sites -> modify binding or catalysis
  ▪ Mutation of residues in transport pathways -> modify transport
  ▪ Mutation of hinge or mobile residues,
residues on loops with many contacts -> modify flexibility
  ▪ Mutation of residues directing protein localization -> mislocalization of proteins
❑ Tools for annotating (identifying) the role of residues
  ▪ Individual tools for specific analyses
    ▪ Evolutionary conservation – e.g. ConSurf, …
    ▪ Residue contacts – e.g. Contact Map Web Viewer, …
    ▪ Residue interactions – e.g. Protein Interaction Calculator, …
    ▪ Accessible surface area – e.g. AsaView, Naccess, …
    ▪ Binding sites – e.g. CASTp, metaPocket 2.0, meta-PPISP, …
    ▪ Transport pathways – e.g. CAVER 3.0, POREWALKER, …
    ▪ Protein dynamics – e.g. NMA, molecular dynamics, …
    ▪ Protein localization – e.g. SignalP, TargetP, Phobius, TMHMM, …
❑ HotSpot Wizard – meta-server combining several tools
  ▪ http://loschmidt.chemi.muni.cz/hotspotwizard/
  ▪ Homology modelling, MSA, conservation, correlation, detection of pockets and tunnels, docking, stability prediction, design of smart libraries
[Figure: hot-spot categories illustrated on a multiple sequence alignment – functional hot-spots, stability hot-spots (evolution), correlated hot-spots]
[Figure panel: stability hot-spots (flexibility)]

Prediction of effects on structure
❑ Prediction of mutant structures – general workflow
  ▪ The mutated residue and its surroundings are represented by rotamers from a rotamer library (conformations derived from X-ray structures)
  ▪ The best set of rotamers is selected by a Monte Carlo approach
  ▪ Optionally – energy minimization, backbone flexibility
  ▪ Comparing the structures of the mutant and the native protein -> assessment of the mutational effect (ΔΔG = ΔG_Mut - ΔG_Native)
❑ Available tools
  ▪ Geometric: PyMOL; WhatIF
  ▪ Energy-based: FOLDX, Rosetta-ddG
  ▪ Homology: SWISS-MODEL, MODELLER, etc.
❑ PyMOL
  ▪ https://pymol.org/
  ▪ Mutagenesis module
  ▪ User can choose rotamers and visualize potential clashes
  ▪ Very fast; fixed backbone; no mutational scoring
❑ FOLDX
  ▪ http://foldxsuite.crg.eu/
  ▪ Stand-alone, with a plug-in for the YASARA modeling tool
  ▪ Fast (minutes)
  ▪ Fixed backbone conformation
  ▪ Construction of single or multiple mutants
  ▪ Empirical scoring function for calculation of stability change (ΔΔG)
❑ Rosetta-ddG
  ▪ Available from https://www.rosettacommons.org/
  ▪ Stand-alone, with Bash and Python scripts available
  ▪ Slow (hours to days)
  ▪ Fixed or flexible backbone conformation
  ▪ Construction of single or multiple mutants
  ▪ Empirical force field for calculating the structure and stability of wild-type and mutant
  ▪
Construction of PDB structures and prediction of stability change (ΔΔG)
❑ AlphaFold 3, ESMFold, etc. (ML-based)
  ▪ Structure prediction only (no stability score)

Prediction of pathogenicity
❑ Prediction of the impact of a mutation on protein function
  ▪ Tools employ machine learning approaches
  ▪ Trained on experimental functional data
  ▪ Predictions can be based on sequence only
  ▪ Qualitative results – i.e. deleterious versus neutral
  ▪ Primarily intended for predicting pathogenicity (mutations leading to disease)
❑ Available tools
  ▪ MutPred, SNAP, PhD-SNP, SIFT, MAPP, …
  ▪ PredictSNP – meta-server combining a pipeline of many tools
❑ PredictSNP
  ▪ http://loschmidt.chemi.muni.cz/predictsnp/
  ▪ Combines many tools for assessing SNPs at the protein or DNA level
❑ There are many more tools out there
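The idea of a meta-server that combines qualitative calls (deleterious versus neutral) from several predictors can be sketched as a simple majority vote. This is only an illustration of the general concept and an assumption on our part: it is not the algorithm used by PredictSNP or any other server, real meta-predictors may weight tools by their confidence or accuracy, and the tool names and calls below are hypothetical placeholders.

```python
from collections import Counter


def consensus_call(predictions: dict[str, str]) -> tuple[str, float]:
    """Combine per-tool labels by unweighted majority vote.

    predictions -- mapping of tool name to its label ("deleterious" or "neutral")
    Returns the winning label (or "undecided" on a tie) and the fraction of
    tools that voted for the most frequent label.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")

    ranked = Counter(predictions.values()).most_common()
    top_label, top_votes = ranked[0]
    agreement = top_votes / len(predictions)

    # A tie between the two most frequent labels gives no consensus.
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return "undecided", agreement
    return top_label, agreement


# Hypothetical calls for one missense variant (tool names are placeholders).
calls = {
    "tool_A": "deleterious",
    "tool_B": "deleterious",
    "tool_C": "neutral",
}

label, agreement = consensus_call(calls)
print(f"{label} ({agreement:.0%} of tools agree)")
```

Reporting the agreement fraction alongside the label gives a rough sense of how much the individual predictors disagree, which is often more informative than the consensus label alone.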
Rational design of proteins
❑ Protein engineering: sometimes we can use mutagenesis to rationally design proteins according to our needs
❑ Properties that can be modified by mutagenesis
  ▪ Stability
  ▪ Function
    ▪ Binding site (catalytic activity or substrate specificity)
    ▪ Macromolecular interface
    ▪ Molecular tunnels/channels
  ▪ Solubility

Rational design: stability
❑ Prediction of stability change upon mutation
  ▪ A structure of the mutant protein is not necessarily produced
  ▪ Tools often employ
    ▪ Empirical scoring functions
    ▪ Evolutionary conservation analysis (e.g. back-to-consensus)
    ▪ Machine learning approaches
❑ Available tools
  ▪ Energy-based: Rosetta-ddG, FOLDX
  ▪ Evolution-based: FireProtASR
  ▪ Hybrid approaches: FireProt, PROSS
❑ FireProt
  ▪ https://loschmidt.chemi.muni.cz/fireprotweb
  ▪ In silico analysis of all possible mutations
  ▪ Energy- and evolution-based analyses
  ▪ Multiple-point mutants for gene synthesis
❑ PROSS
  ▪ https://pross.weizmann.ac.il/step/pross-terms/
  ▪ Combination of mutations “allowed” by conservation analysis and Rosetta calculations (energy)
❑ FireProtASR
  ▪ https://loschmidt.chemi.muni.cz/fireprotasr
  ▪ Ancestral sequence reconstruction (ASR)
  ▪ Automated ancestral inference & phylogenetic tree
  ▪ Useful for finding stable ancestral enzymes

Rational design: function
❑ RosettaDesign
  ▪
http://rosettadesign.med.unc.edu/
  ▪ Monte Carlo sampling (random search) to predict the minimum-energy structure of mutants
  ▪ Predicts free energy changes upon mutation (ΔΔG)
  ▪ Helps design mutations to optimize the binding site and increase interactions with a ligand/substrate
❑ PocketOptimizer
  ▪ https://github.com/Hoecker-Lab/pocketoptimizer/
  ▪ Aimed at maximizing the affinity of a binding site towards a ligand
  ▪ Modular pipeline with different tools
    ▪ Flexibility, docking, mutagenesis, energy calculation
  ▪ Predicts global minimum-energy designs
❑ FuncLib
  ▪ https://funclib.weizmann.ac.il
  ▪ To redesign and/or optimize a binding site
  ▪ Uses evolution (conservation) and Rosetta calculations (energy) to introduce multiple-point mutations that modify the properties of the binding site
  ▪ Can be used to improve the binding affinity towards a ligand
  ▪ Outputs up to 50 multiple-point mutants for protein synthesis
❑ AffiLib
  ▪ https://affilib.weizmann.ac.il
  ▪ To optimize a protein-protein interface
  ▪ Uses evolution (conservation) and Rosetta (energy) to introduce mutations and optimize the macromolecular interface
  ▪ Suggests mutations of interface residues to improve the binding affinity
  ▪ Outputs up to 50 multiple-point mutants for protein synthesis
❑ Mutation Cutoff Scanning Matrix (mCSM-PPI2)
  ▪ http://biosig.unimelb.edu.au/mcsm_ppi2/
  ▪ To optimize a protein-protein interface
  ▪ Based on machine learning, evolutionary data and energy (FoldX)
  ▪ Provides the mutational ΔΔG
  ▪ Modes of calculation
    ▪ Single mutation – single point mutations on the interface
    ▪ Mutation list – single mutations according to a user-supplied list
    ▪ Alanine scanning (all
interface residues are mutated to alanine)
    ▪ Systematic – position saturation (each interface residue is mutated to all 19 other amino acids)

Rational design: solubility
❑ Aggrescan3D; SoluProt (see lecture 7 - Analysis of protein structures)
❑ SolubiS
  ▪ https://solubis.switchlab.org/
  ▪ To identify stabilizing mutations that reduce the aggregation tendency of a protein
  ▪ 1) Identifies exposed APRs
  ▪ 2) Introduces “gatekeeper” residues (P, R, K, D and E) into APRs
  ▪ 3) Assesses the stability changes of the mutations (ΔΔG)