PROTEIN ENGINEERING
7. Rational and semi-rational design

Loschmidt Laboratories
Department of Experimental Biology
Masaryk University, Brno

Outline
• Protein engineering approaches
• Semi-rational design
  • identification of hot-spots
  • evaluation of hot-spots
  • selection of substitutions
  • design of library
  • mutagenesis and screening
• Rational design
  • molecular modeling

Protein engineering
• altering protein structure to improve its properties
• three main approaches
  • rational design
  • directed evolution
  • semi-rational design

Protein engineering approaches

                                             Rational design  Directed evolution  Semi-rational design
high-throughput screening/selection          not essential    essential           advantageous but not essential
structural and/or functional information     both essential   neither essential   either is sufficient
sequence space exploration                   low              high, random        moderate, targeted
probability to obtain synergistic mutations  moderate         low                 high

Structural information
• worldwide Protein Data Bank (wwPDB)
  • http://www.wwpdb.org/
  • central repository of ~220,000 experimental macromolecular structures (April 2024)
• RCSB PDB
  • https://www.rcsb.org
• PDBe
  • https://www.ebi.ac.uk/pdbe
• PDBj
  • https://pdbj.org

Structure prediction
• AlphaFold 2
  • Galaxy: https://usegalaxy.eu/?tool_id=alphafold
  • Colab: https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb
  • structure prediction directly from sequence using deep learning, evolutionary information (MSA), and structure optimization
  • multimer mode: lower accuracy
  • side-chain orientations are not predicted precisely (not appropriate for protein-ligand interaction studies such as molecular docking)
  • rare folds, alternative conformations, and cofactors are not predicted
• AlphaFold database
  • https://alphafold.ebi.ac.uk
  • database of protein structures predicted by AlphaFold 2
  • over 200 million sequences modeled (also available in UniProt)

Semi-rational design
• combines advantages of rational and random approaches
• selection of promising target sites (hot-spots) → mutagenesis → creation of small “smart” libraries
• based on knowledge of protein structure and function
(+) high-throughput screening usually not needed
(+) increased chance of obtaining variants with desired properties
(-) some knowledge of protein structure-function relationships is still required, though less than for rational design

Identification of hot-spots
• hot-spots for engineering catalytic properties
• hot-spots for engineering thermostability

Hot-spots for engineering catalytic properties
• residues mediating substrate binding, transition-state stabilization or product release
  → mutations can improve or disrupt binding, catalysis or ligand transport
• residues involved in protein-ligand interactions
• residues located in binding pockets
• residues located in access tunnels
→ these residues also include catalytic or other essential residues, which generally should not be mutated!

Analysis of protein-ligand interactions
• requires 3D structure of protein-ligand complex
  • experimental structure (wwPDB, PDBbind)
  • theoretical model (molecular docking)
• schematic diagrams of protein-ligand interactions
  • LigPlot, LigPlot+, PoseView

Analysis of binding pockets
• binding and active sites of enzymes are often associated with structural pockets and cavities
[Figure: ligand bound in an active-site pocket; effect of a mutation in the pocket]
• most amino acid residues located in these pockets may come into contact with the ligands during the catalytic cycle
  → one can accurately predict which residues may interact with the ligand even without precise knowledge of ligand orientation in the active site
• requires 3D structure of protein
• software for detection of pockets
  • CASTp, fPocket, CavityPlus, etc.

Analysis of binding pockets
• detailed characterization of all pockets in the structure
[Figure: CavityPlus]

Analysis of access tunnels
• buried binding or active sites are connected with bulk solvent by access tunnels
[Figure: ligand, active-site pocket and access tunnel; effect of mutations in the tunnel]
• tunnels are adjusted to permit transport of specific molecules
• mutations can speed up or hinder transport of molecules, as well as allow transport of other molecules
• requires 3D structure of protein
• software for detection of tunnels
  • Caver, Mole, HOLE, PoreWalker
• Caver Web
  • https://loschmidt.chemi.muni.cz/caverweb
[Figure: tunnel profile, radius [Å] vs. length [Å]]

Hot-spots for engineering thermostability
• highly flexible residues: introduction of rigidifying mutations
• residues located in access tunnels
→ these residues may also include catalytic or other essential residues, which generally should not be mutated!
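
Highly flexible residues are commonly flagged from crystallographic B-factors, which reflect thermal motion. The following is a minimal sketch of that idea, not the B-FITTER implementation: it averages the B-factor over the atoms of each residue using the fixed-column ATOM records of the PDB format and ranks residues by that average. The function names are hypothetical, and insertion codes and alternate locations are ignored for brevity.

```python
from collections import defaultdict


def mean_b_factors(pdb_lines):
    """Average B-factor per residue from PDB ATOM records.

    Keys are (chain ID, residue number, residue name).
    Assumption: standard fixed-column PDB format; insertion codes
    and alternate locations are not treated separately.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [sum of B, atom count]
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, headers, END, etc.
        key = (line[21], int(line[22:26]), line[17:20].strip())
        entry = sums[key]
        entry[0] += float(line[60:66])  # B-factor, columns 61-66
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def most_flexible(pdb_lines, top=5):
    """Return the `top` residues with the highest mean B-factor."""
    means = mean_b_factors(pdb_lines)
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:top]
```

Given an X-ray structure saved locally (the file name here is only a placeholder), `most_flexible(open("structure.pdb"), top=10)` would list candidate positions for rigidifying mutations. Catalytic and other essential residues would still need to be excluded by hand.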

Identification of highly flexible residues
• prediction based on crystallographic B-factors
  • reflect the degree of thermal motion, and thus the flexibility, of individual residues
• requires 3D structure of protein
  • experimental structure determined by X-ray crystallography (wwPDB)
• average B-factor of each residue in the target protein
  • B-FITTER

Analysis of access tunnels
• saturation mutagenesis of tunnel residues has a 2× better chance of significantly improving stability than mutagenesis in other protein regions (based on computational predictions)

Evaluation of hot-spots
• hot-spots identified by computational tools can be further evaluated to prevent replacing indispensable amino acid residues and to prioritize the hot-spots (i.e., order the hot-spots by their suitability for mutagenesis)
  • analysis of evolutionary conservation
  • prediction of effects of mutations on protein stability or function

Analysis of evolutionary conservation
• residues essential for maintaining structural or functional properties of a protein tend to be conserved during evolution
• conserved residues are generally not recommended as targets for mutagenesis: their replacement often leads to the loss of protein function
• mutagenesis targeting highly mutable positions provides a significantly higher proportion of viable variants than random mutagenesis
• targeting moderately or highly variable positions, which are expected to tolerate a wide range of substitutions, is a good approach for producing efficient smart libraries (i.e., libraries with a high proportion of correctly folded and active variants)
• residue conservation can be derived from a multiple sequence alignment of a set of related proteins (3D structure not required)
• evolutionary conservation of individual positions can be mapped on the protein 3D structure
  • ConSurf

Prediction of mutation effects
• computational tools for the prediction of the effect of amino acid substitutions on protein stability or protein function
• in silico site-saturation mutagenesis of identified hot-spots: check whether mutations at a given site are likely to be tolerated
  • many highly destabilizing/deleterious mutations predicted for a position: the site is not a very good target for mutagenesis
  • sites with only a few highly destabilizing/deleterious mutations predicted can still be promising hot-spots (the amino acids with potentially destabilizing/deleterious effects can be excluded from the library by appropriate selection of degenerate codons)
• effects on protein stability: usually require 3D structure of protein
  • experimental structure (wwPDB)
  • theoretical model (AlphaFold, homology modeling)
• effects on protein function: sequence information often sufficient

Prediction of effect of substitutions on protein stability
• evaluation of the change of protein free energy upon mutation
• evaluation of contributions of individual interactions to total energy
• usually requires structural information
• software for prediction of effect of mutation on stability
  • Rosetta, FoldX, CUPSAT, ERIS
[Figure: Rosetta in HotSpot Wizard]

Prediction of effect of substitutions on protein function
• evaluation of whether a mutation would impair protein function
• hard to describe by physico-chemical properties → machine learning
• usually sequence-based calculation
• software for prediction of effect of mutation on function
  • PredictSNP, SIFT, MAPP, PhD-SNP…
[Figure: PredictSNP]
• AlphaMissense
  • https://github.com/google-deepmind/alphamissense
  • deep learning predictor based on AlphaFold
  • analysis of human and primate missense mutations
  • trained on population frequency data; uses sequence and predicted structural context
  • predictions for all single amino acid substitutions in the human proteome are provided

Selection of substitutions
• substitutions introduced using degenerate codons
  • e.g., NNK (N = A/T/G/C; K = T/G)
• all possible substitutions: NNK or NNS degenerate codons
  (+) encode all 20 amino acids with the lowest redundancy and price (mixture of 32 codons)
  (-) redundancy is not completely eliminated (3× Arg, Leu, Ser; 2× Ala, Gly, Pro, Thr and Val)
• introduction of only selected substitutions using degenerate codons encoding reduced amino acid alphabets
  (-) do not encode all 20 amino acids
  (+) decreased library size → improved screening efficiency
  • NDT: balanced set of 12 amino acids (12 codons)

Selection of reduced amino acid alphabets
• introduction of amino acids exhibiting certain properties
  • VRK: 8 hydrophilic amino acids (12 codons)
  • NYC: 8 hydrophobic amino acids (8 codons)
  • KST: 4 small amino acids (4 codons)
  • ...
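
The codon counts quoted above can be checked by expanding a degenerate codon into its concrete codons and translating them. The sketch below assumes the standard genetic code and the IUPAC nucleotide ambiguity codes; the function names `expand` and `summarize` are illustrative and do not come from any particular library.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate):
    """List all concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate.upper()))]


def summarize(degenerate):
    """Count codons, distinct amino acids, and stop codons for a degenerate codon."""
    counts = Counter(CODON_TABLE[codon] for codon in expand(degenerate))
    stops = counts.pop("*", 0)
    return {
        "codons": sum(counts.values()) + stops,
        "amino_acids": len(counts),
        "stops": stops,
        "per_amino_acid": dict(counts),
    }


if __name__ == "__main__":
    for codon in ("NNK", "NDT", "NYC", "KST"):
        info = summarize(codon)
        print(codon, info["codons"], "codons,", info["amino_acids"],
              "amino acids,", info["stops"], "stop codons")
```

For NNK this gives 32 codons covering all 20 amino acids, with one stop codon included in the mixture. The `per_amino_acid` entry shows the residual redundancy, for example three codons each for Arg, Leu and Ser.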

Selection of reduced amino acid alphabets
• introduction of amino acids exhibiting certain properties
• introduction of a balanced set of amino acids
  • NDT: balanced set of 12 amino acids (12 codons)
• introduction of substitutions existing (at a given site) in known natural proteins
  • likely increasing the proportion of viable variants in the resulting library
  • can be obtained by analysis of a multiple sequence alignment
• discarding amino acids with potentially destabilizing/deleterious effects
  • can be obtained by prediction of effects of mutations on protein stability or function

HotSpot Wizard
• meta-server combining several tools
• automatic identification of hot-spots for engineering of enzyme catalytic properties
• prioritization of hot-spots by their mutability
• distribution of amino acids at individual positions
• prediction of stability
• molecular docking
• design of smart libraries
[Figure: hot-spot types illustrated on a multiple sequence alignment: functional hot-spots, stability hot-spots (flexibility), stability hot-spots (evolution), correlated hot-spots]

HotSpot Wizard
1. protein structure
2. residues indispensable for protein function: catalytic and binding residues
3. functional residues: active site pocket and tunnels
4. mutability of individual positions of protein
[Figure: HotSpot Wizard modules: Hot spots, Tunnels, Cavities, Docking, Stability, Design Library]

Design of library
• decisions to be made after evaluation and prioritization of hot-spots:
  • how many and which positions to target?
  • should the positions be randomized simultaneously or separately?
  • should all or only a reduced set of amino acids be introduced at individual positions?
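
The consequences of these decisions can be estimated with simple arithmetic. The sketch below is a rough illustration under two assumptions that are not part of the slides: every codon combination is equally likely, and the number of clones needed for a given library coverage follows the usual oversampling estimate T = -V ln(1 - F), where V is the number of variants and F the desired coverage. The function names are hypothetical.

```python
import math


def library_size(codons_per_position, simultaneous=True):
    """Number of distinct codon variants in a saturation library.

    codons_per_position: codons encoded by the degenerate codon at each
    targeted position, e.g. 32 for NNK or 12 for NDT.
    simultaneous=True  -> positions randomized together (sizes multiply)
    simultaneous=False -> positions randomized one at a time (sizes add)
    """
    if simultaneous:
        return math.prod(codons_per_position)
    return sum(codons_per_position)


def clones_to_screen(n_variants, coverage=0.95):
    """Clones to screen for the requested coverage, assuming equiprobable variants."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    for label, codons in (("NNK", 32), ("NDT", 12)):
        for n_sites in (1, 2, 3):
            size = library_size([codons] * n_sites)
            print(label, n_sites, "site(s):", size, "variants,",
                  clones_to_screen(size), "clones for 95% coverage")
```

Randomizing three positions simultaneously with NNK gives 32^3 = 32,768 codon variants, while NDT at the same three positions gives 12^3 = 1,728. Randomizing the three positions separately with NNK gives only 3 × 32 = 96, at the cost of missing combinations of mutations.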
→ these decisions have a dramatic effect on the size of the resulting library

Design of library: HotSpot Wizard
[Figure: library design in HotSpot Wizard]

Mutagenesis and screening
• saturation mutagenesis: next lecture

Rational design
• site-specific changes on the target enzyme
• few amino acid substitutions that are predicted to elicit desired improvements of enzyme function
• based on detailed knowledge of protein structure, function and catalytic mechanism
(+) relatively simple characterization of constructed variants
(-) complexity of protein structure-function relationships
(-) molecular modeling expertise usually required

Molecular modeling
• “Theoretical or computational technique that provides insight into the behavior of molecular system.” A. R. Leach
• ligand binding
  • molecular docking
• protein dynamics and transport of molecules
  • molecular dynamics
• reaction barriers and mechanisms
  • quantum chemistry or QM/MM
• protein design
  • molecular mechanics, machine learning

Molecular docking
• predicts the structure of a receptor (protein)-ligand complex
• two-component procedure
  • searching: finding the conformation of the ligand in the active site of the enzyme
  • scoring: evaluation of the binding free energy
• docking software
  • AutoDock, Vina, Gold, Medusa, Rosetta Dock…
• virtual screening
  • database of compounds + protein structure → molecular docking → re-scoring → compound prioritization → experimental testing

Molecular dynamics
• principle
  • physical description of interactions within the system (force field)
  • Newton’s laws of motion
  • forces acting on all atoms due to all atoms
• provides information on energetics, amplitudes, and time scales of local motions on the atomic level
[Figure: force field]
[Figure: molecular dynamics applications: analysis of interactions, ligand transport, interaction with membrane, ligand conversion]

Quantum chemistry
• modeling of reaction barriers
  • enzymes increase the speed of chemical reactions by decreasing the activation barrier
• using quantum mechanics to create or break bonds (usually hybrid quantum mechanics/molecular mechanics simulation)

Design of stability
• FireProt
  • https://loschmidt.chemi.muni.cz/fireprotweb
  • in silico analysis of all mutations
  • energy- and evolution-based analyses
  • multiple-point mutants for gene synthesis
  • single-point prediction
  • user-defined mutations
• FireProtASR
  • https://loschmidt.chemi.muni.cz/fireprotasr
  • ancestral sequence reconstruction
  • analysis of protein evolution, sequence-based protein stabilization
  • ancestral proteins are often highly stable, with broad specificity and good yields

Design of solubility
• ProteinMPNN
  • https://huggingface.co/spaces/simonduerr/ProteinMPNN
  • deep learning model for protein optimization via mutations
  • takes a structure as input and provides an optimized sequence folding into the same backbone
  • good for improving yields and rescuing folding-compromised designs

Design of protein-protein interactions
• AffiLib
  • https://affilib.weizmann.ac.il/bin/steps
  • RosettaDesign and evolution analysis to optimize a macromolecular interface
  • mutations for improvement of the binding affinity
  • up to 50 multiple-point mutants for protein synthesis

Design of mutations
[Figure: workflow connecting the investigated system, knowledge and experiment: MD of free protein, docking, MD of complex, QM of complex, protein design]
• design of modified enzymes by in silico screening
• study of effects of all relevant mutations
• selection and combination of the best mutations

PROTEIN ENGINEERING
8. Directed evolution

Loschmidt Laboratories
Department of Experimental Biology
Masaryk University, Brno