Detection and extraction of key structural regions (patterns)
Lukáš Pravda
@WebchemTools ncbr.muni.cz/webchemistry

Outline
- Variety of structural patterns
  - How can we find them?
  - Software tools
  - PatternQuery
- Channels/tunnels/pores
  - Identification of channels and their properties
  - Software tools
  - MOLE
- Practical session with examples

Variety of structural patterns

Detection
- The goal is to identify, and possibly extract, biologically relevant regions of one's interest within a biomolecular structure.
- These include, but are not limited to:
  - Active/binding/interaction sites
  - Sequences of amino acids or nucleic acids
  - Pockets, channels or voids
  - Super-secondary motifs

OK, but wait: why do we need them?
- Database-wide detection enables experiments that were not feasible before.
- The output of these searches is often the input for further analyses:
  - Structural and functional annotation of newly determined structures
  - Comparative analyses
  - Design and engineering of novel functional sites
  - Study of the binding modes of particular atoms/residues

Software tools
- There is a plethora of different software tools, most of them single-purpose:
  - Detection of ligands
  - Binding site identification
  - Pockets/cavities
  - Channels
  - In-house scripts and tools
- The question is: can we do any better?

PatternQuery
- A web-based application designed for the detection and extraction of molecular (sub)structures, i.e. patterns of the user's interest.
- Uses a unique Python-like query language to define the composition, topology and connectivity of these patterns.
- Allows querying single structures as well as the entire PDB, or a subset of it selected by a number of criteria (organism of origin, resolution, date of release, …).

What does it look like?
http://ncbr.muni.cz/PatternQuery

PatternQuery: structure of the language
- Generator queries: Atoms(), Residues(), RegularMotifs()
- Modifier queries: ConnectedResidues(), AmbientAtoms(), Filter()
- Combinatory queries: Or(), Near(), Cluster()
- Some 50 different queries are currently available and ready to use!

PatternQuery: thinking in queries
- Find the binding pockets of all ligands in the protein structure (distance ≤ 4 Å).

Build a query I
- Select all calcium atoms: Atoms("Ca")
- Extend each of them to the residues within 4 Å: Atoms("Ca").AmbientResidues(4)

Build a query II
- Keep only the patterns that contain more than 6 atoms:
  Atoms("Ca").AmbientResidues(4).Filter(lambda l: l.Count(Atoms()) > 6)

Biologically interesting queries I
- Post-translationally modified amino acids:
  ModifiedResidues()
- HETATM residues not covalently bound to the protein:
  HetResidues().Filter(lambda l: l.IsNotConnectedTo(AminoAcids()))
- Residues with a sugar moiety (a five- or six-membered ring of carbons and one oxygen):
  Or(Rings(4 * ["C"] + ["O"]).ConnectedResidues(0), Rings(5 * ["C"] + ["O"]).ConnectedResidues(0))
- NotAminoAcids().Filter(lambda l: l.Count(HetResidues()) == 6)

Biologically interesting queries II
- Sugar-binding site of the Pseudomonas aeruginosa lectin LecB (PA-IIL):
  Near(4, Atoms("Ca"), Atoms("Ca")).AmbientResidues(3).Filter(lambda l: l.Count(Or(Rings(5 * ["C"] + ["O"]), Rings(4 * ["C"] + ["O"]))) > 0).Filter(lambda l: l.Count(Atoms("P")) == 0)

Questions? OK, let's move ON!

Channels

Protein empty voids

What are channels/tunnels?
- A type of protein empty void.
- They connect an active/binding site with the bulk solvent.
- They span through a membrane (pores).
- They greatly influence protein specificity, selectivity and the rate of chemical processes.
- They look pretty(-ish).

How can we find them?
Over time, a number of approaches have been developed. Currently, the most successful one relies on Delaunay triangulation and Dijkstra's algorithm.
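The Delaunay triangulation + Dijkstra idea can be illustrated with a minimal Python sketch. It is not the MOLE implementation and rests on several assumptions. The small hand-made graph stands in for the Voronoi graph that a real tool would derive from a Delaunay triangulation of the atom centres (for example with scipy.spatial.Delaunay). Each node's radius is taken as its distance to the nearest atom surface. Nodes narrower than the probe radius are discarded. The edge cost length / (radius + eps) is only an illustrative choice that favours short, wide routes. All names (ATOMS, NODES, EDGES, find_channel) are hypothetical.

```python
import heapq
import math

# Hypothetical toy input: atom centres (x, y, z) in Å with van der Waals radii.
ATOMS = [
    ((-3.0, 2.0, 0.0), 1.5), ((-3.0, -2.0, 0.0), 1.5),  # atoms lining a narrow gap
    ((4.0, 3.5, 0.0), 1.5), ((4.0, -3.5, 0.0), 1.5),    # atoms lining a wide gap
]

# Hand-made stand-in for the Voronoi graph obtained from a Delaunay
# triangulation: node positions plus undirected edges between neighbours.
NODES = {
    "site": (0.0, 0.0, 0.0),          # buried active site (start point)
    "narrow": (-3.0, 0.0, 0.0),
    "wide": (4.0, 0.0, 0.0),
    "exit_left": (-6.0, 0.0, 0.0),    # surface points (possible channel ends)
    "exit_right": (8.0, 0.0, 0.0),
}
EDGES = [("site", "narrow"), ("narrow", "exit_left"),
         ("site", "wide"), ("wide", "exit_right")]
EXITS = {"exit_left", "exit_right"}


def clearance(p):
    """Radius of the largest empty sphere centred at p (distance to nearest atom surface)."""
    return min(math.dist(p, centre) - r for centre, r in ATOMS)


def find_channel(start, probe_radius, eps=1e-3):
    """Dijkstra from start to the cheapest exit; returns (path, bottleneck radius) or None."""
    radius = {n: clearance(p) for n, p in NODES.items()}
    # Only nodes wide enough for the probe take part in the search.
    open_nodes = {n for n, r in radius.items() if r >= probe_radius}
    if start not in open_nodes:
        return None
    graph = {n: [] for n in open_nodes}
    for a, b in EDGES:
        if a in open_nodes and b in open_nodes:
            length = math.dist(NODES[a], NODES[b])
            # Illustrative cost: long and narrow edges are expensive.
            cost = length / (min(radius[a], radius[b]) + eps)
            graph[a].append((b, cost))
            graph[b].append((a, cost))

    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > best.get(node, math.inf):
            continue  # stale heap entry
        if node in EXITS:
            # First exit popped is the cheapest one: rebuild the path backwards.
            path = [node]
            while path[-1] != start:
                path.append(prev[path[-1]])
            path.reverse()
            return path, min(radius[n] for n in path)
        for nxt, cost in graph[node]:
            nd = d + cost
            if nd < best.get(nxt, math.inf):
                best[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None


if __name__ == "__main__":
    print(find_channel("site", probe_radius=1.0))  # (['site', 'wide', 'exit_right'], 2.0)
    print(find_channel("site", probe_radius=2.5))  # None
```

With probe_radius=1.0, the shorter route through the narrow gap (radius 0.5 Å) is discarded. The channel therefore leaves via the wide gap, whose 2.0 Å radius is the bottleneck. With probe_radius=2.5, even the start point is too narrow and no channel is found, which shows how strongly the result depends on the probe radius.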
Other approaches include:
- Grid search
- Slice and optimization algorithms
- Sphere-filling methods

Software tools
- MOLE
- CAVER
- MolAxis
- ChexVis
- BetaVoid
- HOLE
- And others…

Use case: aquaporin 0
- Aquaporins are a large family of proteins permitting the permeation of various molecules, mainly water.
- The channel is a tight fit for water molecules.
- How can water permeate through the channel, while protons cannot?
- The ar/R (aromatic/arginine) region is shown in blue.

Use case: bunyavirus
- Negative-strand RNA viruses are serious human pathogens (Crimean-Congo hemorrhagic fever, Lassa fever, influenza).
- How can one kill a virus?
- Design a channel inhibitor!

Physicochemical properties: channel duality

MOLE computation
- Input: protein structure + a set of parameters
- Output: channel profile, properties and lining residues

Result analysis: properties
- Channel length vs. channel radius
- Check for the presence of bottlenecks and local narrowings.
- Channel flexibility

Result analysis: properties
- Hydropathy, polarity, mutability, formal charge
- Evaluate individual layers as well as the entire channel.

Where are my channels? I
- Q: No channel has been identified. Why?
- A: Wrong setup of ProbeRadius, InteriorThreshold or the filtering criteria.
- A: A substrate is blocking the channel, e.g. cyclooxygenase-2 (PDB ID: 4cox) complexed with the non-selective inhibitor indomethacin.

Where are my channels? II
- Q: No channel has been identified. Why?
- A: The active site is located on the protein surface or in its vicinity, e.g. the pocket-like channel found in tyrosine kinase EPh4 (PDB ID: 2vwx).

Where are my channels? III
- Q: No channel has been identified. Why?
- A: There is no channel there whatsoever.

Where are my channels? IV
- Q: None of the channels found is relevant to me. Why?
- A: There can be multiple reasons; most often a wrong setup of the ProbeRadius or InteriorThreshold parameters.

Questions? After a break, we can continue with the hands-on session!
https://goo.gl/f5YrcE
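The result analysis described for MOLE can be sketched in a few lines of Python. That analysis covers the radius profile, bottlenecks and local narrowings, and per-layer versus whole-channel properties such as hydropathy. This is a hypothetical illustration, not MOLE's own computation. It assumes the profile is a list of (distance along the channel, radius) pairs in Å and that layers are given as lists of lining-residue names. Hydropathy is averaged on the Kyte-Doolittle scale, chosen here as one common hydropathy index. The function names bottleneck, local_narrowings and mean_hydropathy are made up for this sketch.

```python
# Kyte-Doolittle hydropathy index (one common hydropathy scale).
KYTE_DOOLITTLE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}


def bottleneck(profile):
    """Return the (position, radius) pair with the smallest radius."""
    return min(profile, key=lambda point: point[1])


def local_narrowings(profile, threshold):
    """Positions of interior local radius minima that fall below threshold (Å)."""
    hits = []
    for i in range(1, len(profile) - 1):
        pos, r = profile[i]
        if r < threshold and r <= profile[i - 1][1] and r <= profile[i + 1][1]:
            hits.append(pos)
    return hits


def mean_hydropathy(residues):
    """Average hydropathy of lining residues; non-standard names are skipped."""
    values = [KYTE_DOOLITTLE[name] for name in residues if name in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # Hypothetical profile: (distance along channel, radius), both in Å.
    profile = [(0.0, 3.0), (2.0, 1.8), (4.0, 1.2), (6.0, 1.9), (8.0, 1.4), (10.0, 2.5)]
    print(bottleneck(profile))             # (4.0, 1.2)
    print(local_narrowings(profile, 1.5))  # [4.0, 8.0]
    # Hypothetical layers of lining residues, evaluated one by one and together.
    layers = {"layer 1": ["ILE", "LEU", "VAL"], "layer 2": ["ASP", "LYS", "SER"]}
    for name, residues in layers.items():
        print(name, round(mean_hydropathy(residues), 2))
    all_residues = [r for residues in layers.values() for r in residues]
    print("whole channel", round(mean_hydropathy(all_residues), 2))
```

Averaging per layer and over the whole channel can give very different pictures. In this toy example a hydrophobic layer and a charged layer nearly cancel out, which is why the layers are worth inspecting individually.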