Macromolecular complexes and interactions

Outline
- Macromolecular complexes
- Structure of complexes
- Prediction of 3D structures of complexes
- Analysis of macromolecular complexes

Macromolecular complexes

Types of biologically relevant complexes
- Protein – small molecule
- Protein – protein
- Protein – nucleic acids
- Nucleic acids – small molecule

Biological relevance
- Many proteins are formed by two or more polypeptide chains (protomers) interacting with each other
- Protein-protein and protein-nucleic acid interactions are central to virtually every process in a living cell (molecular recognition):
  - Regulation
  - Transport
  - Signal transduction
  - Genetic activity (transcription, translation, replication, repair, ...)
  - ...

Protein-protein complexes
- Oligomerization
  - Native interactions between proteins under native conditions
- Aggregation
  - Interactions between native proteins under extreme conditions
  - Interactions between misfolded or partially folded proteins (linked to disease)
- Obligate complexes
  - Protomers (individual polypeptides) do not function as independent structures, only when associated
  - Examples: GABA receptors, ATP synthase, many ion channels, ribosome, etc.
- Non-obligate complexes
  - Protomers can exist and be functional as independent structures
  - Examples: hemoglobin, beta-2 adrenergic receptor, insulin receptor, etc.
[Figure: GABAB receptor]

Protein oligomerization
- Oligomerization is common
  - More than 35% of proteins in a cell are oligomers
  - Tetramer is the average oligomeric state of proteins in E. coli
  - Homo-oligomers are the most common
- Some proteins exist solely in the oligomeric state
- Oligomers are often symmetric
- Oligomerization interfaces are complementary
- Oligomerization is favored by evolution

Advantages of oligomerization
- Why do proteins form oligomers?
  - Morphological function: more complex structures are often required for multiple functions
  - Cooperative function: allostery, multivalent binding
  - Enhanced stability: smaller surface area, more interactions
  - ... (e.g., translation error control)

Oligomerization interface
- Characteristics of the oligomeric interface
  - Large surface area (> 1400 Å²)
  - Tendency toward a circular, planar shape (not for obligate complexes)
  - Some residues protrude from the surface
  - More non-polar residues (about 2/3) than other parts of the surface
  - More polar residues (about 1/5) than protein cores
  - About 1 H-bond per 200 Å²
- Hot-spot residues
  - Responsible for most of the oligomeric interactions
  - More evolutionarily conserved than other surface residues
  - Frequently polar residues, located near the center of the interface

Protein-nucleic acid complexes
- Protein-nucleic acid interactions
  - Non-specific: electrostatic interactions with the negatively charged nucleic acid backbone, mainly through Lys and Arg residues
  - Specific: recognition of particular nucleotide sequences
    - Major groove – B-DNA
    - Minor groove – A-DNA or A-RNA
    - Single-stranded RNA
- Typical interfaces/motifs
  - DNA binding proteins
  - RNA binding proteins

DNA binding proteins
- Helix-turn-helix
- Zinc finger

RNA binding proteins
- Recognition is often also governed by particular structures of RNA
- Many motifs are employed
[Figure: RNA recognition motif; K-homology domain; Pumilio repeat domain]

Structure of complexes
- Quaternary structure in the PDB database
- Complex or crystallization artifact?

Quaternary structure in the PDB database
- Asymmetric unit (ASU)
  - Macromolecular structures from X-ray crystallography are deposited in the PDB as a single asymmetric unit
  - The smallest portion of a crystal structure to which symmetry operations can be applied to generate the unit cell
- Unit cell (crystal unit)
  - The basic unit of a crystal that, when repeated in three dimensions, can generate the entire crystal

Crystalline environment
- Crystal contacts
  - Intermolecular contacts that arise solely from protein crystallization
  - Cause crystallization artifacts
  - Crystal packing complicates identification of the native quaternary structure
- Artifacts of crystallization
  - Raise concerns about the conformation of some surface regions
  - Loops or side chains are most often affected
  - Can complicate the evaluation of the effects of mutations
[Figure: asymmetric unit (ASU) and crystal unit]

Biological unit
- The functional form of a protein in nature
- Also called: functional unit, biological assembly, quaternary structure
- Can depend on the environment, on post-translational modifications of the protein, and on its mutations
[Figure: hemoglobin homotetramer]
- A biological unit can consist of:
  - Multiple copies of the ASU
  - One copy of the ASU
  - A portion of the ASU
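Building a biological unit from multiple copies of the ASU amounts to applying rigid-body operators x' = Rx + t to the deposited coordinates. The Python sketch below assumes the operators come from the REMARK 350 BIOMT records of a legacy-format PDB file and that atom positions are already available as (x, y, z) tuples. The function names are hypothetical, and splitting on whitespace simplifies the fixed-column PDB format; this is not a complete parser.

```python
# Minimal sketch (assumptions: legacy PDB REMARK 350 BIOMT records, whitespace-separated
# values, coordinates as (x, y, z) tuples). Function names are illustrative only.

def parse_biomt(lines):
    """Return {(biomolecule_id, operator_id): (3x3 rotation, 3-vector translation)}."""
    operators = {}
    biomolecule = None
    for line in lines:
        tokens = line.split()
        if len(tokens) < 4 or tokens[0] != "REMARK" or tokens[1] != "350":
            continue
        if tokens[2] == "BIOMOLECULE:":
            biomolecule = int(tokens[3])
        elif tokens[2].startswith("BIOMT") and len(tokens) >= 8:
            row = int(tokens[2][-1]) - 1           # BIOMT1/2/3 -> matrix row 0/1/2
            key = (biomolecule, int(tokens[3]))    # operator serial number
            rotation, translation = operators.setdefault(
                key, ([[0.0] * 3 for _ in range(3)], [0.0] * 3))
            rotation[row] = [float(v) for v in tokens[4:7]]
            translation[row] = float(tokens[7])
    return operators


def apply_operator(rotation, translation, xyz):
    """Rotate then translate one atom position: x' = R x + t."""
    return tuple(
        sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def build_assembly(asu_coords, operators, biomolecule=1):
    """Return one transformed copy of the ASU coordinates per operator of a biomolecule."""
    return [
        [apply_operator(rot, shift, xyz) for xyz in asu_coords]
        for (bio, _serial), (rot, shift) in sorted(operators.items())
        if bio == biomolecule
    ]
```

Consider a two-operator dimer: an identity operator plus a 180° rotation about z, shifted 10 Å along x. For this case, a single ASU atom at (1, 2, 3) yields two copies, (1, 2, 3) and (9, -2, 3). The resulting assembly is only as reliable as the deposited operators, so it still needs the independent checks used for any author-specified assembly.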
Biological versus asymmetric unit
- Large assemblies
  - Viral capsid
  - Filamentous bacteriophage Pf1
[Figure: ASU versus biological unit]

Complex or artifact?
- Problem
  - Most proteins in the PDB have three or more crystal contacts whose areas sum up to 30% of the protein's solvent-accessible surface area
  - How can biologically relevant contacts be distinguished from crystal contacts?
- Experimental knowledge of the oligomeric state helps identify the structure of the native complex
  - Search the literature
  - Experimental methods: gel filtration, static or dynamic light scattering, analytical ultracentrifugation, native electrophoresis, …
- How to get the structure of a biological unit?
  - Author-specified assembly
  - Databases
  - Predictive tools

Author-specified assembly
- REMARK 350 in the header of a PDB file
  - Contains the symmetry operations needed to reconstruct the biological unit, but…
- Verify the author-proposed biological unit by other means
  - Sometimes the specific oligomers were not known at the time the ASU was published
  - Some authors may have failed to specify the biological unit even when it was known
  - Rarely, the specified biological unit might be incorrect
- Employed by RCSB PDB and other tools
- RCSB PDB generates a PDB file in which all protein chains are stored as separate models, which complicates visualization and analysis

Crystal lattice
- PyMOL: Generate > Symmetry mates to visualize the nearest partners
- Selected symmetry mates can be combined into a single PDB file

Prediction of 3D structure of complexes
- How can we predict macromolecular complexes?
  - Homology-based methods
  - Machine learning-based threading
  - Macromolecular docking

Homology-based methods
- The model of a protein complex is built from a similar protein complex with a known 3D structure
- Assumes that interaction information can be extrapolated from one complex structure to close homologs of the interacting proteins
  - Close homologs (≥ 40% sequence identity) almost always interact in the same way (if they interact with the same partner)
  - Below that level, sequence similarity is only rarely associated with similarity in interactions
- Limited applicability (low number of templates)
- HOMCOS (Homology Modeling of Complex Structure)
  - https://homcos.pdbj.org/
  - Predicts the 3D structure of homodimers and heterodimers by homology modeling
  - Optionally, identifies potentially interacting proteins
  - Steps:
    1. BLAST search to identify homologous templates in the latest representative dataset of heterodimer (homodimer) structures
    2. Evaluation of model validity by combining sequence similarity and a knowledge-based contact potential energy
    3. Generation of a script for building a full-atom model with MODELLER

Machine learning-based methods
- AlphaFold-Multimer
  - Predicts the 3D structure of multimers; similar to AlphaFold
[Figure: experimental structure versus AlphaFold-Multimer model, RMSD (Cα) = 0.81 Å]

Macromolecular docking
- Prediction of the best bound state for given 3D structures of two or more macromolecules
- Difficult task
  - Large search space: there are many potential ways in which macromolecules can interact
  - Flexibility of the macromolecular surface and conformational changes upon binding
- Can be facilitated by prior knowledge
  - E.g., a known binding site significantly restricts the search space
  - Distance constraints on some residues
- Main components:
  - Macromolecule representation
  - Search algorithm
  - Scoring function

Macromolecule representation
- Representation of the macromolecular surface (applicable to both receptor and ligand)
  - Geometrical descriptors of shape (set of spheres, surface normals, vectors radiating from the center of the molecule, ...)
  - Discretization of space: grid representation
- Macromolecule flexibility
  - Fully rigid approximation
  - Soft docking: tolerant "soft" potential scoring functions simulate the plasticity of an otherwise rigid molecule
  - Explicit side-chain flexibility: residues are optimized by rotating parts of their structure, or whole side chains are rotated using predefined rotamer libraries
  - Docking to a molecular ensemble of protein structures, composed of multiple crystal structures, of NMR models, or of snapshots from an MD simulation trajectory
- Macromolecule flexibility (docking modes)
  - Rigid-body docking: the basic model, which treats the two macromolecules as two rigid solid bodies
  - Semiflexible docking: one molecule is rigid and the other is flexible (typically the smaller one)
  - Flexible docking: both molecules are considered flexible

Macromolecular docking - search
- Generally based on the idea of complementarity between the interacting molecules (geometric, electrostatic or hydrophobic contacts)
- The main problem is the dimension of the conformational space to be explored:
  - Rigid docking: 6D (hard)
  - Flexible docking: 6D + N_fb, where N_fb is the number of additional flexible degrees of freedom (impossible!)
- Information on the rough location of the binding surface (experimental or predicted) reduces the search space
- Exhaustive search
  - Full search of the conformational space: try every possible relative orientation of the two molecules
  - Computationally very expensive: 6 degrees of freedom for rigid molecules (3 translations + 3 rotations)
  - Grid approaches
- Stochastic methods
  - Monte Carlo
  - Genetic algorithms
  - Brownian dynamics
  - ...

Macromolecular docking - scoring
- Scoring functions evaluate the large number of putative solutions generated by the search algorithms
- Methods often use a two-stage ranking:
  1. An approximate, fast-to-compute function eliminates very unlikely solutions
  2. A more accurate function selects the best among the remaining solutions
- Types of scoring functions
  - Empirical
  - Knowledge-based
  - Force field-based
  - Clustering-based: the presence of many similar solutions is taken as an indication of correctness (all solutions are clustered, and the size of each cluster is used as a scoring parameter)
- Good scores combine several parameters:
  - Low free energy or pseudo-energy based on force field functions
  - Large buried surface area
  - Good geometric complementarity
  - Many H-bonds
  - Good charge complementarity
    - Polar/polar contacts are favored
    - Polar/non-polar contacts are disfavored
  - Many similar solutions (large clusters)
  - ...
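The clustering-based criterion can be illustrated with a greedy, fixed-radius scheme. The pose with the most neighbors becomes a cluster center, that pose and its neighbors are removed, and the process repeats, so the largest clusters (many similar solutions) come first. This is a minimal Python sketch built on stated assumptions. Poses are lists of matched ligand atom coordinates in a fixed receptor frame, and RMSD is computed without superposition. The radius and the function names are illustrative and are not taken from any particular docking program. Real rankers of this kind differ in their RMSD definition and cutoffs.

```python
import math


def pose_rmsd(pose_a, pose_b):
    """RMSD between two ligand poses given as equal-length lists of (x, y, z).
    The receptor frame is assumed fixed, so no superposition is performed."""
    squared = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def cluster_poses(poses, radius):
    """Greedy fixed-radius clustering of docking poses.

    Repeatedly picks the unassigned pose with the most unassigned neighbors
    (RMSD <= radius, counting itself) as a cluster center and removes its members.
    Returns [(center_index, sorted_member_indices), ...] in non-increasing size order,
    so cluster size can be used directly as a score.
    """
    n = len(poses)
    neighbors = [
        {j for j in range(n) if pose_rmsd(poses[i], poses[j]) <= radius}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Largest remaining neighborhood wins; ties go to the lower index
        # (e.g. the pose ranked better by the first-stage score).
        center = max(unassigned, key=lambda i: (len(neighbors[i] & unassigned), -i))
        members = sorted(neighbors[center] & unassigned)
        clusters.append((center, members))
        unassigned -= set(members)
    return clusters
```

A program following this idea would typically report the center of each large cluster as a candidate binding mode. The first-stage energy filter decides which poses enter the clustering in the first place.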
Macromolecular docking - programs
- ClusPro 2.0
  - http://cluspro.bu.edu/
  - Performs a global soft rigid-body search using the PIPER docking program; employs a knowledge-based potential
  - The top 1,000 structures are retained and clustered to isolate highly populated low-energy binding modes
  - A special mode predicts molecular assemblies of homo-oligomers
- PatchDock
  - http://bioinfo3d.cs.tau.ac.il/PatchDock/index.html
  - Performs a geometry-based search for docking transformations that yield good molecular shape complementarity (driven by local feature matching rather than brute-force searching of the 6D space):
    1. The molecular surface is divided into concave, convex and flat patches
    2. Complementary patches are matched, yielding candidate transformations
    3. Each docking candidate is evaluated by a scoring function that considers both geometric fit and atomic desolvation energy
    4. The candidate solutions are clustered to discard redundant solutions
  - Results can be redirected to FireDock for refinement and re-scoring
- FireDock
  - http://bioinfo3d.cs.tau.ac.il/FireDock/index.html
  - Refines and re-scores solutions produced by fast rigid-body docking algorithms
  - Optimizes the binding of each candidate by allowing side-chain flexibility and adjustments of the relative orientation of the molecules
  - Scores the refined candidates using softened van der Waals interactions, atomic contact energy, electrostatics, and additional binding free energy estimations

Analysis of macromolecular complexes
- Binding energy
- Macromolecular interface
- Interaction hot spots

Binding energy
- FastContact
  - http://structure.pitt.edu/servers/fastcontact/
  - Rapidly estimates the electrostatic and desolvation components of the binding free energy between two proteins
  - Additionally, evaluates van der Waals interactions using CHARMM and reports the contributions of individual residues and residue pairs to the free energy, which highlights the interaction hot spots

Macromolecular interface
- The region where two protein chains, or a protein and a nucleic acid chain, come into contact
- Can be identified by analyzing the 3D structure of the macromolecular complex

Interface analysis
- Provides information about basic features of the interactions in macromolecular complexes (e.g., shape complementarity, chemical complementarity, ...)
- Provides information about interface residues
- The acquired information is useful for a wide range of applications:
  - Design of mutants for experimental verification of the interactions
  - Development of drugs targeting macromolecular interactions
  - Understanding the mechanism of molecular recognition
  - Computational prediction of interfaces and complex 3D structures
  - ...
- Most common approaches for defining interfaces:
  - Methods based on the distance between interacting residues
  - Methods based on the change in solvent-accessible surface area (ASA) upon complex formation
  - Computational geometry methods (using Voronoi diagrams)
  - All three approaches give very similar results

Interface analysis - databases
- PDBsum (Pictorial database of 3D structures in the Protein Data Bank)
  - http://www.ebi.ac.uk/pdbsum/
  - Provides numerous structural analyses for all PDB structures and for AlphaFold DB (human proteins), including information about protein-protein and protein-nucleic acid interfaces
  - Protein-protein interactions: schematic diagrams of all protein-protein interfaces and the corresponding residue-residue interactions
  - Protein-nucleic acid interactions: schematic diagrams of protein-nucleic acid interactions generated by NUCPLOT

Interface analysis - tools
- Analyze the interface of a given macromolecular complex:
  - PISA (Protein Interfaces, Surfaces and Assemblies)
  - MolSurfer
  - Contact Map WebViewer
  - PIC (Protein Interaction Calculator)
  - …
- PISA (Protein Interfaces, Surfaces and Assemblies)
  - www.pdbe.org/pisa
  - An interactive tool for exploring macromolecular interfaces (protein, DNA/RNA and ligands), predicting probable quaternary structures, and searching databases for structurally similar interfaces and assemblies
  - Gives an overview and detailed characteristics of all interfaces found within a given structure (including those generated by symmetry operations)
  - Provides interface area, ΔiG, potential hydrogen bonds and salt bridges, interface residues and atoms, ...
- MolSurfer
  - http://projects.villa-bosch.de/dbase/molsurfer/index.html
  - Visualizes 2D projections of protein-protein and protein-nucleic acid interfaces as maps showing the distribution of interface properties (atomic and residue hydrophobicity, electrostatic potential, surface-surface distances, atomic distances, ...)
  - The 2D maps are linked with the 3D view of the macromolecular complex
  - Facilitates the study of intermolecular interaction properties and steric complementarity between macromolecules
- Contact Map WebViewer
  - http://cmweb.enzim.hu/
  - Represents residue-residue contacts within a protein, or between proteins in a complex, as a contact map
- PIC (Protein Interaction Calculator)
  - http://pic.mbu.iisc.ernet.in/
  - Identifies various interactions within a protein or between proteins in a complex

Interaction hot spots
- Hot spots: the residues contributing the most to the binding free energy of the complex
- Knowledge of hot spots has important implications for:
  - Understanding the principles of protein interactions (an important step toward understanding recognition and binding processes)
  - Design of mutants for experimental verification of the interactions
  - Development of drugs targeting macromolecular interactions
  - ...
- Hot spots are usually conserved and tend to cluster in tightly packed regions at the center of the interface
- Experimental identification by alanine scanning mutagenesis
  - A residue is labeled a hot spot if mutating it to alanine causes a significant drop in binding affinity
- Experimental identification of hot spots is costly and cumbersome, so computational predictions of hot spots can help

Prediction of hot spots - tools
- Most of the available methods are based on the 3D structure of the complex
- Knowledge-based methods
  - Combination of several physicochemical features (evolutionary conservation, ASA, residue propensity, structural location, hydrophobicity, ...)
- Energy-based methods
  - Calculation of the change in the binding free energy (ΔΔG_bind) of the complex upon in silico mutation of a given residue to alanine
- Robetta
  - http://old.robetta.org/alascansubmit.jsp
  - Energy-based method
  - Performs in silico alanine scanning mutagenesis of protein-protein or protein-DNA interface residues:
    1. The side chain of each interface residue is replaced by a methyl group (i.e., mutated to alanine)
    2. All side chains within a 5 Å radius sphere around the mutated residue are repacked; the rest of the protein remains unchanged
    3. ΔΔG_bind is calculated for each mutant (residues with predicted ΔΔG_bind ≥ +1 kcal/mol are hot spots)
- KFC2 (Knowledge-based FADE and Contacts)
  - https://mitchell-web.ornl.gov/KFC_Server/
  - Knowledge-based method utilizing machine learning
  - Predicts hot spots in protein-protein interfaces by recognizing features of important binding contacts: solvent accessibility, residue position within the interface, packing density, residue size, and the flexibility and hydrophobicity of residues around the target residue
  - Optionally, the user can provide data to improve the prediction (ConSurf conservation scores, Rosetta alanine scanning results, or experimental data)
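The energy-based hot-spot criterion given for Robetta (ΔΔG_bind ≥ +1 kcal/mol) reduces to a subtraction and a threshold. The sketch below assumes that binding free energies are already available from some external calculation or experiment, with negative values meaning favorable binding. It does not reproduce Robetta's energy function, and the residue labels and values are hypothetical.

```python
# Minimal sketch: classify alanine-scanning results as hot spots.
# Assumption: binding free energies (kcal/mol, negative = favorable) come from elsewhere.
HOT_SPOT_THRESHOLD = 1.0  # kcal/mol, the cutoff quoted for Robetta


def ddg_bind(dg_bind_wild_type, dg_bind_alanine_mutant):
    """ΔΔG_bind = ΔG_bind(Ala mutant) - ΔG_bind(wild type); positive means weaker binding."""
    return dg_bind_alanine_mutant - dg_bind_wild_type


def hot_spots(dg_bind_wild_type, mutant_dg_bind, threshold=HOT_SPOT_THRESHOLD):
    """Return {residue: ΔΔG_bind} for residues whose Ala mutation costs >= threshold."""
    result = {}
    for residue, dg_mut in mutant_dg_bind.items():
        ddg = ddg_bind(dg_bind_wild_type, dg_mut)
        if ddg >= threshold:
            result[residue] = round(ddg, 2)  # rounding hides float noise in the report
    return result
```

For example, if the wild-type complex binds with ΔG_bind = -12.0 kcal/mol and an alanine mutant binds with -9.5 kcal/mol, then ΔΔG_bind = +2.5 kcal/mol and the residue is flagged as a hot spot.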