1 Cheminformatics
Brno. Structural Bioinformatics

2 Cheminformatics
• What is cheminformatics?
• Reading chemical structures from .sdf and .mol2
• Chemical Tables (Sorting, Grid view, ..)
• Aromaticity
• Chirality
• Tautomer Enumeration
• Chemical Editor and View Modes
• Searching a Chemical Database (substructure & similarity)
• Formulating a Chemical Query
• Chemical Clustering and Trees
• Converting a Selection to 3D
• Property Prediction (LogP, LogS, DrugLikeness, ...)

3 Cheminformatics: Introduction
• Explosive growth in the number of commercially available chemicals.
• Millions of compounds available from 10-20 major vendors.
• Quickly changing: about 10-20% of a database may change every 3 months. New compounds emerge, old ones disappear.
• Storage, Manipulation and Export of Chemical Libraries
• Predicted and Experimental Compound Properties
• Screening and SAR data
• Analog Design
• Virtual Compound Libraries
• Searching Chemical Libraries

4 Linear String Notation of Chemicals: Smiles and Smarts
Smiles - Chemical Structures:
• A good tutorial can be found at the Daylight site: http://www.daylight.com/dayhtml/smiles
• Common atoms are represented by element symbols: C, N, O, Cl, ..
• Rare elements, charges and isotopes are written in square brackets, e.g. [Au], [H+]
• Single bonds are not shown, double bonds are '=', triple bonds are '#'
• Branching is shown by parentheses (e.g. CC(=O)O)
• Ring closure is shown by matching digits (e.g. C1CCCC1)
Smarts - Chemical Patterns:
• [C,N,O] - a list of possibilities, '*' - any atom, '~' - any bond
• [C;R] - carbon in a ring, c1ccccc1 - aromatic ring

5 Smarts: Atoms and Bonds
• http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html
Atoms
* - any, a - aromatic, A - aliphatic, D - neighbors, Q - noncarbon, Hn - total H-count, Rn - ring membership, rn - ring size, vn - valence, #n - atomic number, @ - chirality (anticlockwise), @@ - clockwise chirality
Bonds
Logical operations
! - not, & - and, ',' (comma) - or, ; - and (weak)

6 Chemical Searching, Smarts
Ring Membership     R1, R2
Ring Size           r5, r6
Attachment Point    [*] or [C*]
Hybridization       ^3, ^2, ^1
Connectivity        D3, D2
Charge              [Mg++], [N-]
Hydrogens           H0, H1, H2, ..
Valence             v5
Isotopes

7 Chemical Searching: several fragments
Just draw them side by side. In Smarts, join the fragments with a dot: Smarts1.Smarts2, e.g. CCO.O

8 Chemical Searching: variable bond distance
A special type of bond can be introduced that defines a range of interatomic distances in terms of the minimal number of chemical bonds.

9 Chemical Searching: Properties

10 Chemical Searching: Similarity
• Exact Match
• Substructure Searching
• Pattern Searching
• Similarity Searching (Tanimoto of Fingerprints)
  - Divide both structures (A and B) into small fragments.
  - Merge the fragment lists and form two "bit-strings", e.g. 010001000111 and 101111011001.
  - Calculate the Tanimoto similarity as nAB/nTotal, where nAB is the number of on-bits the two strings have in common and nTotal is the number of bits that are on in at least one of them.
  - The Tanimoto similarity is between 0.0 and 1.0 (the corresponding distance is 1 - similarity).
[Figure: Chemical Similarity]

11 Pharmacophore Searching
• Hydrogen bond acceptors
• Hydrogen bond donors
• Charge
• Hydrophobicity
• Aromatic Ring Centers
A pharmacophore was first defined by Paul Ehrlich in 1909 as "a molecular framework that carries the essential features responsible for a drug's biological activity" (Ehrlich. Dtsch. Chem. Ges. 1909, 42: p.17).
[Figure: Ligand, Receptor, Complex]

12 Converting to 3D
• Mol-files: 0D (x=y=z=0), 2D (z=0), 3D
• Necessary for conversion:
  - Correct bond orders
  - Formal charges
  - Stereo indicators (chirality, cis-trans for 0D)
• ICM converts 2D drawings to 3D and optimizes them in the MMFF94 force field.

13 Aromaticity
• Requires a cyclic conjugated array of orbitals in the same plane.
• Hückel's rule: the total number of π electrons in the system is 4n+2, where n = 0, 1, 2, ...
• Identical aromatic systems can be represented by different patterns of single and double bonds. ICM/Molcart matches aromaticity, not the drawn pattern of double (=) and single (-) bonds.
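The Tanimoto calculation from the similarity-searching slide (common on-bits divided by the on-bits present in either fingerprint) can be sketched in a few lines. This is only an illustration on the two example bit-strings from that slide, not the ICM/Molcart implementation; the function name `tanimoto` and the handling of two empty fingerprints are assumptions made for the sketch.

```python
def tanimoto(bits_a: str, bits_b: str) -> float:
    """Tanimoto similarity of two equal-length bit-strings such as '0101'."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    a, b = int(bits_a, 2), int(bits_b, 2)
    n_ab = bin(a & b).count("1")     # on-bits common to both fingerprints
    n_total = bin(a | b).count("1")  # on-bits present in at least one fingerprint
    if n_total == 0:
        # Assumption: two all-zero fingerprints are treated as identical.
        return 1.0
    return n_ab / n_total


fp_a = "010001000111"  # example bit-string from the slide
fp_b = "101111011001"  # example bit-string from the slide
similarity = tanimoto(fp_a, fp_b)
print(f"similarity = {similarity:.3f}")    # 2 common / 11 total -> 0.182
print(f"distance   = {1 - similarity:.3f}")
```

The two example fingerprints share only 2 of the 11 bits that are set in either string, so they are quite dissimilar. Real fingerprints are much longer, but the arithmetic is the same.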
[Figure: Example: Two ways to draw bonds in an aromatic system]

14 Chirality
• "I call any geometrical figure, or group of points, chiral, and say that it has chirality, if its image in a plane mirror, ideally realized, cannot be brought to coincide with itself." Lord Kelvin, Baltimore Lectures, 1884
• Chiral centers: four different substituents.
• Four states: Unset; R (rectus); S (sinister); and RS (unknown, can be a racemic mixture).
• Chirality (R or S) can be shown with stereo-bonds.
• You may need to enumerate all enantiomers to find which one has biological activity.
• There are more complex types of chirality (e.g. axial chirality).

15 Stereoisomers
The Sequence Rule for Assignment of Configurations to Chiral Centers
Assign sequence priorities to the four substituents by looking at the atoms attached directly to the chiral center. The higher the atomic number of the immediate substituent atom, the higher the priority.
ICM will automatically determine the state and generate stereoisomers.

16 Tautomers
Tautomers are isomers that interconvert through a reaction called tautomerization, in which there is a formal migration of a hydrogen atom along with a switch of a single bond and an adjacent double bond. The tautomers reach a chemical equilibrium that depends on several factors, including pH, temperature and solvent.

17 Vendor Compounds vs Natural
General Trends
Vendor chemistry    Natural compounds
• Non-charged       • Charged
• Non-chiral        • Chiral
• Simple            • Complex
• Flexible          • Rigid

18 Compound Properties, ADMET
Compounds that are potent in vitro fail because of:
• Absorption (from gut, through membranes, to blood). FDp (fraction of dose in the portal vein), Fh (hepatic vein)
  - Disposition: from plasma to cells
  - Elimination: mainly via liver and kidney
• Distribution (to tissues)
• Metabolism
• Excretion
• Toxicity

19 Lipinski Rule of Five
Correlates with absorption and permeation
• <= 5 hydrogen-bond donors
• <= 10 (5*2) hydrogen-bond acceptors
• <= 500 molecular weight
• cLogP <= 5
• (extra: <= 5 torsions)

20 Properties (Contd)
log S: The aqueous solubility of a compound significantly affects its absorption and distribution characteristics. Typically, low solubility goes along with poor absorption, so the general aim is to avoid poorly soluble compounds.
PSA: Polar Surface Area
hERG: hERG potassium ion channels govern the repolarization phase of human ventricular action potentials. Many drugs or their metabolites cause hERG block, which can lead to cardiac arrhythmias and sudden death.
Cytochrome P450 (CYP) Isoenzymes: CYP isoenzymes are responsible for oxidative metabolism of many drugs, steroids and carcinogens. They are a group of heme-containing enzymes embedded primarily in the lipid bilayer of the endoplasmic reticulum of hepatocytes (liver cells).

21 Predicting Drug-Likeness
[Figure labels: Drugs, Non-Drugs, Descriptors, Support Vector Machine, Cross-validated Training; www.molsoft.com]
Only 20 to 40% of the vendor database appears to be drug-like.
Types of numerical problems
4) Two class (SVM)
5) Multi-class (SVM-multiclass)
6) Quantitative param., e.g. LogP (PLS, SVM-regression)
...
Concerns
• Insufficient data / Over-training
• Choice of descriptors
• Normalization of the descriptors
• Choice of the Non-Active training set
Drug-likeness prediction
• World Drug Index compounds were filtered and divided into 2 groups
• SVM trained on group 1
• Tested on group 2. 83% of the test group assigned correctly
[Figure: D-L Predictor]

22 logP Predictions
We have trained a Partial Least Squares (PLS) regression model on a set of 13151 compounds with experimental logP values.
The correlation coefficient (r²) for fitting this training database is 0.98, and the root-mean-square deviation (rmsd) is 0.38. A cross-validation test using a random 50% of the compounds as the training set and the other 50% as the test set gives r² = 0.94 and rmsd = 0.61. In addition to good prediction quality, this method is extremely fast and can be applied to large datasets.

23 Molcart
Molcart is a state-of-the-art, enterprise-wide chemical database management system. Compound databases of any size can be stored in Molcart and analyzed and searched using ICM cheminformatics and docking tools.
[Figure labels: SDF Files; Chemical Vendors; >3x10^6 diverse unique compounds; MySQL; Molcart; Cheminformatics, chemical searching, clustering...; Property Prediction, LogP, LogS ...]
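The Lipinski Rule of Five thresholds listed on the earlier slide can be expressed as a simple filter over precomputed descriptors. The sketch below is a minimal illustration, not how ICM evaluates the rule: the `Descriptors` class, the function `lipinski_violations`, and the descriptor values are all hypothetical, and the optional torsion criterion is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed properties of one compound (hypothetical container)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float
    clogp: float


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the names of the Rule of Five criteria that the compound breaks."""
    checks = [
        ("more than 5 hydrogen-bond donors", d.h_bond_donors > 5),
        ("more than 10 hydrogen-bond acceptors", d.h_bond_acceptors > 10),
        ("molecular weight above 500", d.molecular_weight > 500),
        ("cLogP above 5", d.clogp > 5),
    ]
    return [name for name, broken in checks if broken]


# Hypothetical descriptor values, chosen only to exercise the filter.
small = Descriptors(h_bond_donors=2, h_bond_acceptors=6,
                    molecular_weight=350.4, clogp=3.1)
large = Descriptors(h_bond_donors=6, h_bond_acceptors=12,
                    molecular_weight=640.0, clogp=5.8)

print(lipinski_violations(small))  # [] : passes all four criteria
print(len(lipinski_violations(large)))  # 4 : breaks every criterion
```

Returning the list of broken criteria, rather than a single pass/fail flag, leaves it to the caller to decide how many violations to tolerate when filtering a library.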
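The two quality measures quoted for the logP model, r² and rmsd, can be computed from paired experimental and predicted values as sketched below. The slides do not say which definition of r² was used; this sketch assumes the squared Pearson correlation. The function names and the four data points are made up for illustration and are not taken from the 13151-compound set.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def rmsd(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square deviation between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Toy values for illustration only (not real logP measurements).
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]

print(f"r2   = {r_squared(experimental, predicted):.2f}")  # 0.80
print(f"rmsd = {rmsd(experimental, predicted):.2f}")       # 0.50
```

In a 50/50 cross-validation like the one described, these functions would be applied to the held-out half only, which is why the test-set figures are expected to be somewhat worse than the fit on the training data.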