LOSCHMIDT LABORATORIES
Structure of biomolecules

□ Proteins
■ Primary structure
■ Secondary structure
■ Tertiary structure
■ Motifs and folds
■ Quaternary structure
□ Nucleic acids
■ Main types of structures
□ Primary structural databases
□ Structural data formats
□ PDB and mmCIF formats

[Figure: hierarchy of protein structure. Primary structure: amino acid residues; secondary structure: α-helix; tertiary structure: polypeptide chain; quaternary structure: assembled subunits]
Proteins - hierarchy of protein structure

□ 20 natural L-amino acids
□ Side chains
■ Charged, polar, hydrophobic
[Figure: the 20 amino acids grouped by side chain; labels: amino acid backbone, amino group, acid group, side chain, chiral centre]
A. Amino acids with electrically charged side chains
■ Positive: arginine (Arg), histidine (His), lysine (Lys)
■ Negative: aspartic acid (Asp), glutamic acid (Glu)
B. Amino acids with polar uncharged side chains
■ Serine (Ser), threonine (Thr), asparagine (Asn), glutamine (Gln)
C. Special cases
■ Cysteine (Cys), glycine (Gly), proline (Pro)
D. Amino acids with hydrophobic side chains
■ Alanine (Ala), valine (Val), isoleucine (Ile), leucine (Leu), methionine (Met), phenylalanine (Phe), tyrosine (Tyr), tryptophan (Trp)

□ Linear chain of amino acid residues
N-terminus MSLGAKPFGEKKFIEIKGRRMAYIDEGTGDPILFQHGNPTSSYLWRNIM C-terminus
□ Protein backbone
■ Runs from N-terminus to C-terminus
■ Residues connected by covalent bonds
□ Peptide bond (amide bond)
■ Partial double-bond character -> planar geometry
[Figure: resonance structures of the peptide bond, H3N+-CH(R1)-C(=O)-NH-CH(R2)-COO- <-> H3N+-CH(R1)-C(-O-)=N+H-CH(R2)-COO-]
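The side-chain groups A-D listed above can be encoded as a small lookup table. The following is only an illustrative sketch: the names `GROUPS`, `SIDE_CHAIN_CLASS` and `class_composition` are hypothetical, the grouping simply mirrors the four categories on the slide (other classifications exist), and the test string is the first ten residues of the example chain.

```python
from collections import Counter

# One-letter codes grouped as in categories A-D above
# (an illustrative encoding, not a standard library table).
GROUPS = {
    "positive": "RHK",          # Arg, His, Lys
    "negative": "DE",           # Asp, Glu
    "polar uncharged": "STNQ",  # Ser, Thr, Asn, Gln
    "special": "CGP",           # Cys, Gly, Pro
    "hydrophobic": "AVILMFYW",  # Ala, Val, Ile, Leu, Met, Phe, Tyr, Trp
}

# Invert to residue -> class for direct lookup.
SIDE_CHAIN_CLASS = {aa: cls for cls, residues in GROUPS.items() for aa in residues}


def class_composition(sequence: str) -> Counter:
    """Count how many residues of a one-letter sequence fall in each class."""
    counts = Counter()
    for aa in sequence.upper():
        if aa not in SIDE_CHAIN_CLASS:
            raise ValueError(f"not one of the 20 standard residues: {aa!r}")
        counts[SIDE_CHAIN_CLASS[aa]] += 1
    return counts


if __name__ == "__main__":
    # First ten residues of the example chain, read from N- to C-terminus.
    print(class_composition("MSLGAKPFGE"))
```

Raising an error on unknown letters is a deliberate choice here: it makes extraction errors in a sequence visible instead of silently skipping them.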
[Figure: two amino acids joined by condensation (-H2O) to form a peptide bond]
Proteins - primary structure

Geometry of protein backbone
□ Conformation of the peptide chain
■ Defined by the φ (phi) and ψ (psi) dihedral angles
■ φ (phi) = dihedral angle {C - N - Cα - C}
■ ψ (psi) = dihedral angle {N - Cα - C - N}
□ Ramachandran plot (φ, ψ)
-> The majority of proteins follow this distribution
[Figure: amide planes and side group around Cα; Ramachandran plot with φ and ψ axes from -180 to 180]

□ Local three-dimensional structure of the polypeptide chain
□ Governed by hydrogen bonding between backbone atoms
□ DSSP (hydrogen bond estimation algorithm)
■ The most common method for assigning secondary structure
■ Starts by identifying the intra-backbone hydrogen bonds (between N-H···O=C)
■ A hydrogen bond exists if E < -0.5 kcal/mol
■ The type of repetition assigns each residue to one of 7 types (3 major types: helices, strands and loops)
$$E = 0.084 \left( \frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}} \right) \cdot 332\ \text{kcal/mol}$$
where r_XY is the distance between backbone atoms X and Y in Å
Proteins - secondary structure

□ Types of helices
■ 3.6₁₃ helix (α-helix) - most common
■ 3₁₀ helix - less frequent, found at the ends of α-helices
■ 4.1₁₆ helix (π-helix) - rare
■ Left-handed helix - very rare
-> Represented by helical cartoons or cylinders
□ Right-handed (mostly)
□ Hydrogen bonding
■ Within a single chain
[Figure: 3₁₀-helix, left-handed α-helix, π-helix, α-helix]
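The DSSP hydrogen-bond criterion above can be evaluated directly from atomic coordinates. The sketch below is a minimal illustration, not the DSSP program itself: the function names are hypothetical, the constants 0.084, 332 and -0.5 kcal/mol are taken from the formula and threshold on the slide, and the coordinates in the example are made up to give a collinear C=O···H-N arrangement.

```python
import math

Q1Q2 = 0.084         # prefactor from the formula above (product of partial charges)
F = 332.0            # conversion factor; gives kcal/mol for distances in Å
HBOND_CUTOFF = -0.5  # kcal/mol; a bond is assigned below this energy


def hbond_energy(n, h, c, o):
    """Electrostatic energy between a backbone N-H group and a C=O group.

    Each argument is an (x, y, z) coordinate in Å.
    """
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1Q2 * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn) * F


def is_hbond(n, h, c, o):
    """Apply the E < -0.5 kcal/mol criterion."""
    return hbond_energy(n, h, c, o) < HBOND_CUTOFF


if __name__ == "__main__":
    # Hypothetical collinear geometry: C=O ... H-N with an O...H distance of 1.9 Å.
    c, o = (0.0, 0.0, 0.0), (1.2, 0.0, 0.0)
    h, n = (3.1, 0.0, 0.0), (4.1, 0.0, 0.0)
    print(round(hbond_energy(n, h, c, o), 2), is_hbond(n, h, c, o))
```

A full secondary-structure assignment would apply this test to every pair of backbone groups and then look for repeating patterns (n to n+3, n+4, n+5 for helices); only the energy term is shown here.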
Type                                     3₁₀        α          π
Residues per turn                        3.0        3.6        4.1
Atoms in H-bonded ring                   10         13         16
Hydrogen bonding                         n - n+3    n - n+4    n - n+5
Angle between neighboring residues (°)   120        100        88
Helical rise per amino acid residue (Å)  2.0        1.5        1.15
φ (°)                                    -75        -60        -75
ψ (°)                                    -5         -45        -40

□ Types of typical β-structures
■ β-sheets
■ β-turns
■ β-bulge
■ Polyproline helices
□ Hydrogen bonding
■ Between adjacent chains
[Figure: β-sheets and polyproline helices]

□ β-turns
■ Short structures (4-5 residues)
■ Connect two β-strands
■ Ideally an H-bond between the backbone of residues n and n+3
■ Often include glycine or proline at specific positions
□ β-bulge
■ Frequently occurs in antiparallel β-sheets
■ Disrupts the ideal H-bonding pattern
■ Increases the twist of a sheet
□ Polyproline helices
■ Typical in collagen and other strong fibers
■ Left-handed helices (unlike most other helices)
■ In collagen: triple-stranded helix composed of three chains of repetitive sequence (Proline-

□ Global three-dimensional structure of the protein
□ Governed mainly by hydrophobic interactions involving
□ Supersecondary structures (motifs)
■ Small substructures formed by several secondary structures
□ Domain
■ Structurally (functionally) independent regions
■ Compact parts of the structure - around a single hydrophobic core
■ Form separate folding units (fold independently)
□ Fold
■ General architecture of the protein
■ Type of protein structure

□ β-hairpin
□ β-meander
□ Greek key
□ Jellyroll
Proteins - tertiary structure

□ Helix-turn-helix
□ Helix bundle
□ βαβ unit

□ Parts of tertiary structure
■ Separate folding
■ Independent structure
■ Usually up to 200 residues
[Figure: pyruvate kinase (glycolysis: PEP + ADP -> pyruvate + ATP) with nucleotide-binding, substrate-binding and regulatory domains]

□ Some folds are very common, some are rare
□ Classification of folds
■ Biochemical
■ Globular, membrane, fibrous
proteins, intrinsically disordered
■ Structural: all-α, all-β, α/β and α+β proteins
□ Number of folds
■ Theoretical maximum: 10,000
■ Currently: 1,195 (SCOP) vs 1,373 (CATH)

□ Globular proteins
□ Membrane proteins
□ Fibrous proteins

Structural classification of folds
□ All-α (entirely α-helices)
■ Up-and-down bundle
■ Globin-like
□ All-β (entirely β-strands)
■ Jellyroll
■ β barrel
■ β propeller
□ α/β (sequence alternates between α-helices and β-strands)
■ Rossmann
■ TIM barrel
□ α+β (α-helices and β-strands occur separately in sequence)
■ β-Grasp (ubiquitin-like)

□ Association of several protein chains (monomers/subunits) into oligomers (multimers)
■ Homomeric protein - composed of identical monomers
■ Heteromeric protein - composed of different types of monomers
[Figure labels: homotetramer; hemoglobin; heterodimer; tryptophan synthase; heterotetramer; immunoglobulin]
Proteins - quaternary structure

□ Composition
■ Nucleotide: phosphate group, sugar, nitrogenous base
■ The phosphate group is bonded to the 5' carbon of the sugar
[Figure: nucleotide; labels: phosphate group (negative charge), sugar, nitrogenous base, 2'-deoxyribose (in DNA)]
□ Sugar puckering
■ Denotes the phosphate-phosphate proximity
■ Two main types of conformation: C3'-endo (A-conformation) and C2'-endo (B-conformation)
Nucleic acids - primary structure

□ Local interactions between nucleotide bases
□ DNA base pairs: adenine - thymine, cytosine - guanine
□ RNA base pairs: adenine - uracil, cytosine - guanine
□ Complementarity due to hydrogen bonds
Nucleic acids - secondary structure

□ Leontis/Westhof classification
■ Three base-pairing edges: Watson-Crick (WC), Hoogsteen (H), Sugar (S)
■ 12 types of base pairing
[Figure: symbols for the WC, H and S edges in cis and trans orientation; anti-parallel and parallel pairs]

Tertiary structure of DNA
□ Overall three-dimensional arrangement and folding
□ Three types: A-DNA, B-DNA, Z-DNA
□ B-DNA is the most common (described by Watson &
Crick)
[Figure: A-DNA (rare), B-DNA (predominant), Z-DNA]
Type                              A-DNA      B-DNA      Z-DNA
Helix sense                       Right      Right      Left
Bases per turn                    11         10.5       12
Helical rise per nucleotide (Å)   2.6        3.4        3.7
Sugar pucker                      C3'-endo   C2'-endo   C2'-endo / C3'-endo
Nucleic acids - tertiary structure of DNA

□ Grooves: crucial for DNA-protein interactions
□ Major groove: wide and deep - where most proteins interact

□ Quaternary structures - with support of proteins
[Figure: histones and acetyl groups; DNA inaccessible, gene inactive vs. DNA accessible, gene active]
Nucleic acids - higher structures of DNA

Secondary structures of RNA
□ Most common form: A-RNA helix (similar to A-DNA)
Nucleic acids - secondary structure of RNA

□ Junctions
■ Regions connecting two or more stems
■ Two-stem, three-stem and four-stem junctions
[Figure: two-stem, three-stem and four-stem junctions with 5' and 3' ends]

□ Hairpin loops
■ Sequence is inversely self-complementary, e.g. GGCUGGCUGUUCGCCAGCC
■ Many subtypes, e.g. GNRA, ANYA and UNCG tetraloops
[Figure: hairpin loops with 5' and 3' ends]

□ Very complex stem-loop structures
[Figure: complex stem-loop secondary structure diagram]
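The hairpin example above (GGCUGGCUGUUCGCCAGCC) can be checked for inverse self-complementarity with a few lines of code. This is a simplified sketch under stated assumptions: the function name `hairpin_split` is hypothetical, only Watson-Crick pairs (A-U, G-C) are counted, and the minimum loop length of 3 nucleotides is an assumed default rather than something stated on the slide.

```python
# Watson-Crick pairs in RNA (wobble and other non-canonical pairs are ignored here).
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def hairpin_split(seq: str, min_loop: int = 3):
    """Split an RNA sequence into (5' stem, loop, 3' stem).

    Pairs the 5' end with the 3' end and walks inwards while the bases are
    complementary and at least `min_loop` unpaired nucleotides would remain
    between them.
    """
    seq = seq.upper()
    i, j = 0, len(seq) - 1
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in WC_PAIRS:
        i += 1
        j -= 1
    return seq[:i], seq[i:j + 1], seq[j + 1:]


if __name__ == "__main__":
    stem5, loop, stem3 = hairpin_split("GGCUGGCUGUUCGCCAGCC")
    print(stem5, loop, stem3)
```

For the slide's sequence this yields a seven-base-pair stem closed by a five-nucleotide loop; real structure prediction would also need to consider non-canonical pairs and energetics.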
Tertiary structures of RNA
[Figure: A-RNA dodecamer, phenylalanine transfer RNA, group I intron ribozyme, hammerhead ribozyme, guanine riboswitch]
Nucleic acids - tertiary structures of RNA

□ Association of several chains of RNA
■ Frequently joined with proteins
■ Eukaryotic ribosome - ~6800 nt, 79 proteins
[Figure: ribosome peptide synthesis; labels: 60S, P-tRNA, growing peptide chain, incoming tRNA bound to amino acid, outgoing empty tRNA]
Ribosome in action: https://www.youtube.com/watch?v=Jml8CFBWcDs
Nucleic acids - quaternary structure of RNA

□ Worldwide Protein Data Bank (wwPDB) http://www.wwpdb.org/
□ RCSB Protein Data Bank (RCSB PDB) http://pdb.rcsb.org
□ Nucleic Acid Knowledgebase (Nucleic Acid Database) https://www.nakb.org/
□ Biological Magnetic Resonance Data Bank (BMRB) https://bmrb.io/
□ Electron Microscopy Data Bank (EMDB) http://www.emdatabank.org/
□ Cambridge Structural Database (CSD) http://www.ccdc.cam.ac.uk/products/csd/
...More details in lesson 4!
Primary structural databases

□ Different file formats are used to represent 3D structure data
■ PDB
■ mmCIF
■ PDBML
■ MOL2
■ ...
□ The spatial 3D coordinates and other information are recorded for each atom

□ Designed in the early 1970s - first entries of the PDB database
□ Rigid structure of 80 characters per line, including spaces
□ Still the most widely supported format
Structural data formats - PDB format

[Figure: excerpt of PDB entry 1DNP with the structure annotation, amino acid field and cofactor field marked]
HEADER    LYASE (CARBON-CARBON)                   03-JUL-95   1DNP
TITLE     STRUCTURE OF DEOXYRIBODIPYRIMIDINE PHOTOLYASE
...
SOURCE   2 ORGANISM_SCIENTIFIC: ESCHERICHIA COLI
KEYWDS    DNA REPAIR, ELECTRON TRANSFER, EXCITATION ENERGY TRANSFER,
KEYWDS   2 LYASE, CARBON-CARBON
...
ATOM     22  CD2 HIS A   3      57.200  28.354  61.894  1.00 13.12           C
ATOM     23  CE1 HIS A   3      56.124  26.793  62.991  1.00 13.03           C
...
ATOM     25  N   LEU A   4      55.580  32.694  59.656  1.00 12.61           N
ATOM     26  CA  LEU A   4      54.799  33.803  59.113  1.00 11.56           C
...
ATOM     28  O   LEU A   4      53.650  32.363  57.532  1.00  6.99           O
...
HETATM 7642  AC5 FAD A 472      28.524  76.026  27.955  1.00  2.00           C
...
Record fields, left to right: atom number, atom name, residue name, polypeptide chain identifier, residue number, x, y, z coordinates, occupancy, temperature factor, atom type

□ Atomic coordinates
□ Chemical and biological features
□ Experimental details of the structure determination
□ Structural features
■ Secondary structure assignments
■ Hydrogen bonding
■ Biological assemblies
■ Active sites
■ ...
• https://www.wwpdb.org/documentation/file-fo…
• https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/pdbintro.html

□ Advantages
■ Widely used -> supported by the majority of tools
■ Easy to read and easy to use
■ Can be manually edited
-> Suitable for accessing individual entries

□ Disadvantages
■ Potential inconsistency between individual PDB entries as well as between PDB records within one entry
Ex.: different residue numbering in the SEQRES and ATOM sections
-> Not suitable for computer extraction of information
Primary sequence:
SEQRES   1    396  MET ASP GLU ASN ILE THR ALA ALA PRO ALA ASP PRO ILE
SEQRES   2    396  LEU GLY LEU ALA ASP LEU PHE ARG ALA ASP GLU ARG PRO
Atoms and residues in the file:
ATOM      1  N   MET     5      41.402  11.897  15.262  1.00 48.61
ATOM      2  CA  MET     5      40.919  13.262  15.600  1.00 47.70
ATOM      9  N   PHE     6      39.627  14.840  14.228  1.00 48.66
ATOM     10  CA  PHE     6      39.199  15.440  12.964  1.00 45.33

□ Disadvantages
■ Absolute limits on the size of certain items of data
Ex.: max. number of atom records limited to 99,999; max. number of chains limited to 26, etc.
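These size limits follow from the fixed column widths: in the published PDB layout the atom serial number occupies columns 7-11 and the chain identifier a single column (22). The sketch below illustrates this with a hypothetical writer and reader for ATOM records. The function names are made up, only the fields named in the figure legend are handled, atom-name alignment is simplified, and the values are those of ATOM record 26 in the 1DNP excerpt.

```python
def format_atom_record(serial, name, res_name, chain, res_seq,
                       x, y, z, occupancy, b_factor, element):
    """Write one ATOM record using the fixed PDB column layout."""
    if not 0 < serial <= 99_999:
        raise ValueError("serial number does not fit into columns 7-11")
    if len(chain) != 1:
        raise ValueError("chain identifier must be a single character (column 22)")
    # Simplified alignment: names shorter than 4 characters start in column 14.
    atom_name = name if len(name) == 4 else f" {name:<3s}"
    return (f"ATOM  {serial:>5d} {atom_name} {res_name:>3s} {chain}{res_seq:>4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{b_factor:6.2f}"
            f"{'':10s}{element:>2s}")


def parse_atom_record(line):
    """Read the fields back by slicing fixed columns (0-based slices)."""
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
        "occupancy": float(line[54:60]),   # columns 55-60
        "b_factor": float(line[60:66]),    # columns 61-66
        "element": line[76:78].strip(),    # columns 77-78
    }


if __name__ == "__main__":
    record = format_atom_record(26, "CA", "LEU", "A", 4,
                                54.799, 33.803, 59.113, 1.00, 11.56, "C")
    print(record)
    print(parse_atom_record(record))
```

Because fields are located by position rather than by delimiter, a serial number of 100,000 cannot be written without shifting every following column, which is why the writer rejects it.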
-> Large systems such as the ribosomal subunit must be divided into multiple PDB files
-> Not suitable for analysis and comparison of experimental and structural data across the entire database

□ Macromolecular crystallographic information file (mmCIF)
□ Developed to handle increasingly complicated structural data
□ Each field of information is explicitly assigned by a tag and linked to other fields through a special syntax
PDB:
HEADER    PLANT SEED PROTEIN                      11-OCT-91   1CBN
mmCIF:
_struct.entry_id                  '1CBN'
_struct.title                     'PLANT SEED PROTEIN'
_struct_keywords.entry_id         '1CBN'
_struct_keywords.text             'plant seed protein'
_database_2.database_id           'PDB'
_database_2.database_code         '1CBN'
_database_PDB_rev.rev_num         1
_database_PDB_rev.date_original   '1991-10-11'
Structural data formats - mmCIF format

□ Advantages
■ Easily parsable by computer software
■ Consistency of data across the database
-> Suitable for analysis and comparison of experimental and structural data across the entire database
□ Disadvantages
■ Difficult to read
■ Rarely supported by visualization and computational tools
-> Not suitable for accessing individual entries

□ Protein Data Bank Markup Language (PDBML)
□ Extensible Markup Language (XML) version of PDB format
<PDBx:datablock datablockName="EXAMPLE"
    xmlns:PDBx="http://deposit.pdb.org/pdbML/pdbx-v1.000.xsd"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://deposit.pdb.org/pdbML/pdbx-v1.000.xsd pdbx-v1.000.xsd">
  <PDBx:entity_polyCategory>
    <PDBx:entity_poly entity_id="1">
      ...
      <PDBx:pdbx_seq_one_letter_code>DIVLTQSPASLSASVGETVTITCRASGNIHNYLAWYQQKQGKSPQLLVYYTTTLADG
VPSRFSGSGSGTQYSLKINSLQPEDFGSYYCQHFWSTPRTFGGGTKLEIK</PDBx:pdbx_seq_one_letter_code>
      <PDBx:pdbx_seq_one_letter_code_can>DIVLTQSPASLSASVGETVTITCRASGNIHNYLAWYQQKQGKSPQLLVYYTTTLADG
VPSRFSGSGSGTQYSLKINSLQPEDFGSYYCQHFWSTPRTFGGGTKLEIK</PDBx:pdbx_seq_one_letter_code_can>
    </PDBx:entity_poly>
  </PDBx:entity_polyCategory>
</PDBx:datablock>
Structural data formats - PDBML format

□ Gu, J. & Bourne, P. E. (2009). Structural Bioinformatics, 2nd Edition, Wiley-Blackwell, Hoboken.
□ Liljas, A. et al. (2009). Textbook of Structural Biology, World Scientific Publishing Company, Singapore.
□ Schwede, T. & Peitsch, M. C. (2008). Computational Structural Biology: Methods and Applications, World Scientific Publishing Company, Singapore.
□ Schaeffer, R. D. & Daggett, V. (2011). Protein folds and protein folding. Protein Engineering, Design & Selection 24: 11-19.