LOSCHMIDT LABORATORIES
Structure of biomolecules
□ Proteins
■ Primary structure
■ Secondary structure
■ Tertiary structure
■ Motifs and folds
■ Quaternary structure
□ Nucleic acids
■ Main types of structures
□ Primary structural databases
□ Structural data formats
■ PDB and mmCIF formats

[Figure: primary, secondary, tertiary and quaternary structure of a protein]

Proteins - primary structure
□ 20 natural L-amino acids
□ Side chains: charged, polar, hydrophobic
[Figure: general structure of an amino acid: amino group, acid (carboxyl) group, side chain and the chiral centre (Cα)]
A. Amino acids with electrically charged side chains
■ Positive: Arginine (Arg), Histidine (His), Lysine (Lys)
■ Negative: Aspartic acid (Asp), Glutamic acid (Glu)
B. Amino acids with polar uncharged side chains: Serine (Ser), Threonine (Thr), Asparagine (Asn), Glutamine (Gln)
C. Special cases: Cysteine (Cys), Glycine (Gly), Proline (Pro)
D. Amino acids with hydrophobic side chains: Alanine (Ala), Valine (Val), Isoleucine (Ile), Leucine (Leu), Methionine (Met), Phenylalanine (Phe), Tyrosine (Tyr), Tryptophan (Trp)

□ Linear chain of amino acid residues
MSLGAKPFGEKKFIEIKGRRMAYIDEGTGDPILFQHGNPTSSYLWRNIM (N-terminus on the left, C-terminus on the right)
□ Protein backbone
■ Runs from the N-terminus to the C-terminus
■ Residues are connected by covalent bonds
□ Peptide bond (amide bond)
■ Partial double-bond character -> planar geometry
[Figure: formation of a peptide bond by condensation of two amino acids (-H2O)]
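As a small illustration of reading a sequence from the N- to the C-terminus, the sketch below counts residues per side-chain class. The class assignments simply follow the grouping on the amino-acid slide (with C, G and P as the special cases); the names `SIDE_CHAIN_CLASS` and `side_chain_composition` are hypothetical, and only the 20 standard one-letter codes are handled (any other letter raises `KeyError`).

```python
from collections import Counter

# Hypothetical lookup table: one-letter code -> side-chain class,
# following the grouping A-D on the amino-acid slide.
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("RHK", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("STNQ", "polar uncharged"),
    **dict.fromkeys("CGP", "special"),
    **dict.fromkeys("AVILMFYW", "hydrophobic"),
}


def side_chain_composition(sequence: str) -> Counter:
    """Count residues per side-chain class, reading the chain N- to C-terminus."""
    return Counter(SIDE_CHAIN_CLASS[residue] for residue in sequence.upper())


# The start of the sequence shown above.
fragment = "MSLGAKPFGEKKFIEIKGRRMAYIDEGTGDPILFQHGNPTSSYLWR"
print(side_chain_composition(fragment))
```

Keeping the classes in a plain dictionary makes it easy to swap in a different grouping (for example, treating Tyr as polar) without touching the counting logic.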
□ Conformation of the peptide chain
■ Defined by the φ (phi) and ψ (psi) dihedral angles
■ φ (phi) = dihedral angle {C(i-1) - N - Cα - C}
■ ψ (psi) = dihedral angle {N - Cα - C - N(i+1)}
□ Ramachandran plot (φ, ψ)
-> Backbone conformations of most residues in proteins fall within the allowed regions of this plot
[Figure: Ramachandran plot with φ and ψ axes from -180° to 180°; amide planes and side group around Cα]

Proteins - secondary structure
□ Local three-dimensional structure of the polypeptide chain
□ Governed by hydrogen bonding between backbone atoms
□ Types of structures
■ Helices and β-structures (strands): regular patterns
■ Loops and coils: irregular patterns
[Figure: helices, strands, loops]

□ DSSP (hydrogen bond estimation algorithm)
■ The most common method for assigning secondary structure
■ Starts by identifying the intra-backbone hydrogen bonds (N-H···O=C)
■ A hydrogen bond exists if E < -0.5 kcal/mol
■ The pattern of repeating hydrogen bonds assigns each residue to one of 7 types (3 major types: helices, strands and loops)
■ Electrostatic energy of a hydrogen bond between the polypeptide C=O and N-H groups (distances r in Å; 0.084 is the product of the partial charges 0.42e and 0.20e, and 332 is a unit-conversion factor):
$$E = 0.084\left(\frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}}\right)\cdot 332\ \mathrm{kcal/mol}$$

□ Types of helices
■ 3.6₁₃ helix (α-helix): most common
■ 3₁₀ helix: less frequent, found at the ends of α-helices
■ 4.1₁₆ helix (π-helix): rare
■ Left-handed helix: very rare
-> Represented by helical cartoons or cylinders
□ Mostly right-handed
□ Hydrogen bonding within a single chain
[Figure: 3₁₀-helix, α-helix, π-helix, left-handed α-helix]
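Both backbone quantities above can be computed directly from atomic coordinates. The following is a minimal sketch, assuming Cartesian coordinates in Å and an amide hydrogen position that is already known (DSSP itself places the H atom from the preceding residue's C=O direction; that step is omitted here). The helper names are hypothetical; the dihedral uses the standard atan2 formulation, and the energy uses the DSSP constants given above.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180.

    phi: p0..p3 = C(i-1), N, Ca, C;  psi: p0..p3 = N, Ca, C, N(i+1).
    """
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)      # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def dist(a: Vec, b: Vec) -> float:
    d = sub(a, b)
    return math.sqrt(dot(d, d))


def dssp_hbond_energy(n: Vec, h: Vec, c: Vec, o: Vec) -> float:
    """DSSP electrostatic energy (kcal/mol) between donor N-H and acceptor C=O."""
    return 0.084 * 332 * (1 / dist(o, n) + 1 / dist(c, h)
                          - 1 / dist(o, h) - 1 / dist(c, n))


def is_hbond(energy: float, cutoff: float = -0.5) -> bool:
    """DSSP criterion: an H-bond exists if E is below -0.5 kcal/mol."""
    return energy < cutoff


# Toy, perfectly linear N-H...O=C geometry (coordinates in Å).
e = dssp_hbond_energy(n=(0, 0, 0), h=(1.0, 0, 0), c=(4.1, 0, 0), o=(2.9, 0, 0))
print(round(e, 2), is_hbond(e))
```

The toy geometry (N...O = 2.9 Å) gives roughly -2.9 kcal/mol, well below the -0.5 kcal/mol cutoff; moving the acceptor several Å away drives the energy toward zero.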
Type                                     3₁₀        α          π
Residues per turn                        3.0        3.6        4.1
Atoms in H-bonded ring                   10         13         16
Hydrogen bonding                         n -> n+3   n -> n+4   n -> n+5
Angle between neighboring residues (°)   120        100        88
Helical rise per residue (Å)             2.0        1.5        1.15
φ (°)                                    -75        -60        -75
ψ (°)                                    -5         -45        -40

□ Types of typical β-structures
■ β-sheets
■ β-turns
■ β-bulges
■ Polyproline helices
□ Hydrogen bonding between adjacent chains (strands)
[Figure: β-sheets, polyproline helices]

□ β-turns
■ Short structures (4-5 residues)
■ Connect two β-strands
■ Ideally an H-bond between the backbones of residues n and n+3
■ Often include glycine or proline at specific positions
□ β-bulge
■ Frequently occurs in antiparallel β-sheets
■ Disrupts the ideal H-bonding pattern
■ Increases the twist of a sheet
□ Polyproline helices
■ Typical in collagen and other strong fibers
■ Each chain forms a left-handed helix (unlike most other helices)
■ In collagen, three chains of repetitive sequence (Proline-…) wind together into a triple helix

Proteins - tertiary structure
□ Global three-dimensional structure of a protein
□ Governed mainly by hydrophobic interactions involving the side chains of amino acid residues
□ Supersecondary structures (motifs)
■ Small substructures formed by several secondary structure elements
■ Examples: helix-turn-helix, helix bundle, βαβ unit, β-hairpin, β-meander, Greek key, jellyroll
□ Domain
■ Structurally (functionally) independent region
■ Compact part of the structure, built around a single hydrophobic core
■ Forms a separate folding unit (folds independently)
□ Fold
■ General architecture of a protein
■ Type of protein structure

□ Domains
■ Parts of the tertiary structure
■ Fold separately
■ Independent structures
■ Usually up to 200 residues
■ Ex.: pyruvate kinase, with nucleotide-binding, substrate-binding and regulatory domains
[Figure: domains of pyruvate kinase; reaction PEP + ADP -> pyruvate + ATP]
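The helix parameters in the table above are tied together by simple geometry: the angle between neighboring residues is 360° divided by the residues per turn, the pitch (rise per full turn) is residues per turn times rise per residue, and an n -> n+k hydrogen bond closes a ring of 3k + 1 atoms. A short sketch checking this against the table values (all names are hypothetical):

```python
# Values copied from the helix table: residues per turn, k in the
# n -> n+k hydrogen bond, and helical rise per residue (Å).
HELICES = {
    "3_10": {"residues_per_turn": 3.0, "k": 3, "rise": 2.0},
    "alpha": {"residues_per_turn": 3.6, "k": 4, "rise": 1.5},
    "pi": {"residues_per_turn": 4.1, "k": 5, "rise": 1.15},
}


def angle_per_residue(residues_per_turn: float) -> float:
    """Rotation about the helix axis between consecutive residues (degrees)."""
    return 360.0 / residues_per_turn


def pitch(residues_per_turn: float, rise: float) -> float:
    """Axial distance covered by one full helical turn (Å)."""
    return residues_per_turn * rise


def hbond_ring_size(k: int) -> int:
    """Atoms in the ring closed by a C=O(n)...H-N(n+k) hydrogen bond.

    O(n) and C(n), then N, Ca, C of each of the k-1 intervening residues,
    then N(n+k) and H(n+k): 2 + 3(k-1) + 2 = 3k + 1.
    """
    return 3 * k + 1


for name, h in HELICES.items():
    print(f"{name:>5}: {angle_per_residue(h['residues_per_turn']):6.1f} deg/residue, "
          f"pitch {pitch(h['residues_per_turn'], h['rise']):.2f} Å, "
          f"H-bonded ring of {hbond_ring_size(h['k'])} atoms")
```

For the α-helix this gives 100° per residue and a pitch of 5.4 Å; for the π-helix, 360/4.1 ≈ 87.8°, which rounds to the 88° in the table.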
□ Some folds are very common, some are rare
□ Classification of folds
■ Biochemical: globular, membrane, fibrous and intrinsically disordered proteins
■ Structural: all-α, all-β, α/β and α+β proteins
□ Number of folds
■ Currently: 1,195 (SCOP) vs 1,373 (CATH)
■ Theoretical maximum: 10,000
[Figure: globular, membrane and fibrous proteins]

Structural classification of folds
□ All-α (entirely α-helices): up-and-down bundle, globin-like
□ All-β (entirely β-strands): jellyroll, β-barrel, β-propeller
□ α/β (sequence alternates between α-helices and β-strands): Rossmann fold, TIM barrel
□ α+β (α-helices and β-strands occur separately in the sequence): β-grasp (ubiquitin-like)

Proteins - quaternary structure
□ Association of several protein chains (monomers/subunits) into oligomers (multimers)
■ Homomeric protein: built from identical monomers
■ Heteromeric protein: built from different types of monomers

Nucleic acids - basic building blocks
□ Composition of a nucleotide
■ Phosphate group: bonded to the 5' carbon of the sugar; carries a negative charge
■ Pentose sugar: ribose in RNA, deoxyribose (no 2'-OH) in DNA
■ Heterocyclic (nitrogenous) base: bonded to the 1' carbon of the sugar
□ Pyrimidines: cytosine (C), uracil (U) in RNA, thymine (T) in DNA
□ Purines: guanine (G), adenine (A)
■ Purines are larger than pyrimidines
□ DNA bases: A, T; G, C
□ RNA bases: A, U; G, C
□ Rotation about the glycosidic bond
[Figure: anti and syn conformations of nucleosides about the glycosidic bond]
■ The anti conformation is dominant in DNA, with rare exceptions

Nucleic acids - primary structure
□ Linear chain of nucleotides (oligonucleotides or polynucleotides)
□ Sugar-phosphate backbone
■ Covalent character
■ Phosphodiester bonds
■ Runs from the 5'-end to the 3'-end
[Figure: oligonucleotide dGCAT from the 5' end to the 3' end (d indicates deoxyribose sugar, i.e. a DNA sequence)]
□ Very flexible backbone
■ Six torsion angles
□ Ribose is not planar -> sugar puckering
■ Determines the distance between neighboring phosphates
■ Two main conformations: C3'-endo (A-conformation) and C2'-endo (B-conformation)
[Figure: C3'-endo and C2'-endo puckers of 2'-deoxyribose (in DNA); a phosphate-phosphate distance of 5.9 Å is marked]

Nucleic acids - secondary structure
□ Local interactions between nucleotide bases
□ DNA base pairs: adenine-thymine, cytosine-guanine
□ RNA base pairs: adenine-uracil, cytosine-guanine
□ Complementarity is due to hydrogen bonds
□ Leontis/Westhof classification
■ Three base-pairing edges: Watson-Crick (WC), Hoogsteen (H), Sugar (S)
■ 12 types of base pairing (edge combinations in cis or trans orientation)
[Figure: symbols for the WC (circle), H (square) and S (triangle) edges in cis and trans; antiparallel and parallel pairs]

Tertiary structure of DNA
□ Overall three-dimensional arrangement and folding
□ Three types: A-DNA, B-DNA, Z-DNA
□ B-DNA is the most common (described by Watson & Crick)
[Figure: A-DNA (rare), B-DNA (predominant), Z-DNA (rarer)]
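The Watson-Crick complementarity rules and the Leontis/Westhof count described above can be expressed in a few lines. This sketch (hypothetical names) writes the complementary strand 5' -> 3' for DNA or RNA, and enumerates the 12 geometric families as the 6 unordered combinations of the three edges times the cis/trans orientation. It handles only canonical Watson-Crick complements.

```python
from itertools import combinations_with_replacement

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_strand(seq: str, rna: bool = False) -> str:
    """Complementary strand written 5' -> 3' (i.e. the reverse complement)."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return "".join(table[base] for base in reversed(seq.upper()))


EDGES = ("Watson-Crick", "Hoogsteen", "Sugar")


def leontis_westhof_families() -> list[tuple[str, str, str]]:
    """All (orientation, edge, edge) families: 6 edge combinations x cis/trans."""
    return [
        (orientation, e1, e2)
        for e1, e2 in combinations_with_replacement(EDGES, 2)
        for orientation in ("cis", "trans")
    ]


print(complementary_strand("GCAT"))      # ATGC: partner of dGCAT, read 5' -> 3'
print(len(leontis_westhof_families()))   # 12
```

Reading both strands 5' -> 3' is why the complement is reversed: the two strands of a duplex are antiparallel.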
Type                              A-DNA      B-DNA      Z-DNA
Helix sense                       Right      Right      Left
Bases per turn                    11         10.5       12
Helical rise per nucleotide (Å)   2.6        3.4        3.7
Sugar pucker                      C3'-endo   C2'-endo   C2'-endo (pyrimidines), C3'-endo (purines)

□ Grooves: crucial for DNA-protein interactions
□ Major groove: wide and deep; this is where most proteins interact

Nucleic acids - higher structures of DNA
□ Quaternary structures formed with the support of proteins (histones)
[Figure: histones without acetyl groups: DNA inaccessible, gene inactive; histone acetylation: DNA accessible, gene active]

Nucleic acids - secondary structure of RNA
□ Most common form: A-RNA helix (similar to A-DNA)
□ Junctions
■ Regions connecting two or more stems
■ Two-stem, three-stem and four-stem junctions
□ Hairpin loops
■ Inversely self-complementary sequence (the two arms are reverse complements of each other), e.g. GGCUGGCUGUUCGCCAGCC (5' -> 3')
■ Many subtypes, e.g. GNRA, ANYA and UNCG tetraloops
□ Very complex stem-loop structures
[Figure: secondary structure diagram of a large RNA with many stems and loops]
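For a hairpin whose two arms are inversely self-complementary, such as the example sequence above, the stem can be found by pairing nucleotides from both ends inward. This is a toy sketch, not an RNA folding algorithm: it assumes strict Watson-Crick pairs (no G-U wobble), no internal mismatches or bulges, and a minimum loop of 3 nucleotides; the function names are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored.
RNA_WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (both read 5' -> 3')."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def hairpin_stem(seq: str, min_loop: int = 3) -> tuple[int, str]:
    """Zip a hairpin from both ends inward; return (stem length in bp, loop).

    Pairing stops at the first non-Watson-Crick pair, or when pairing the
    next two nucleotides would leave fewer than `min_loop` unpaired ones.
    """
    seq = seq.upper()
    i, j = 0, len(seq) - 1
    stem = 0
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in RNA_WC_PAIRS:
        stem += 1
        i += 1
        j -= 1
    return stem, seq[i:j + 1]


hairpin = "GGCUGGCUGUUCGCCAGCC"
stem, loop = hairpin_stem(hairpin)
print(stem, loop)
# The 5' arm equals the reverse complement of the 3' arm.
print(hairpin[:stem] == reverse_complement_rna(hairpin[-stem:]))
```

For GGCUGGCUGUUCGCCAGCC this reports a 7 bp stem (GGCUGGC paired with GCCAGCC) closing the unpaired loop UGUUC.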
Nucleic acids - tertiary structures of RNA
□ Examples: A-RNA dodecamer, phenylalanine transfer RNA, group I intron ribozyme, hammerhead ribozyme, guanine riboswitch

Nucleic acids - quaternary structure of RNA
□ Association of several RNA chains
■ Frequently joined with proteins
■ Ex.: eukaryotic ribosome: ~6,800 nt, 79 proteins
[Figure: the ribosome in peptide synthesis: 60S subunit, P-site tRNA, incoming tRNA bound to an amino acid, outgoing empty tRNA, growing peptide chain]
Ribosome in action: https://www.youtube.com/watch?v=Jml8CFBWcDs

Primary structural databases
□ Worldwide Protein Data Bank (wwPDB) http://www.wwpdb.org/
□ RCSB Protein Data Bank (RCSB PDB) http://pdb.rcsb.org
□ Nucleic Acid Knowledgebase (Nucleic Acid Database) https://www.nakb.org/
□ Biological Magnetic Resonance Data Bank (BMRB) https://bmrb.io/
□ Electron Microscopy Data Bank (EMDB) http://www.emdatabank.org/
□ Cambridge Structural Database (CSD) http://www.ccdc.cam.ac.uk/products/csd/
□ ...and more (details in lesson 3)
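Entries in these archives are usually retrieved by their PDB ID. The sketch below is a minimal example, assuming the RCSB file-download service at https://files.rcsb.org/download and classic 4-character IDs (a digit 1-9 followed by three alphanumerics); `entry_url` and `fetch_entry` are hypothetical helpers, and only the URL construction works without network access.

```python
import re
import urllib.request

# Assumption: classic 4-character PDB IDs; extended ID formats are not handled.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

# Assumption: RCSB's public file-download service.
DOWNLOAD_BASE = "https://files.rcsb.org/download"


def entry_url(pdb_id: str, fmt: str = "cif") -> str:
    """Download URL of one entry in PDB ('pdb') or mmCIF ('cif') format."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a classic 4-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{DOWNLOAD_BASE}/{pdb_id.upper()}.{fmt}"


def fetch_entry(pdb_id: str, fmt: str = "cif") -> str:
    """Download one entry as text (requires network access)."""
    with urllib.request.urlopen(entry_url(pdb_id, fmt), timeout=30) as response:
        return response.read().decode("utf-8")


print(entry_url("1cbn", "pdb"))
```

Validating the ID before building the URL keeps typos from turning into confusing HTTP errors later.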
Structural data formats
□ Different file formats are used to represent 3D structure data
■ PDB
■ mmCIF
■ PDBML
■ MOL2
■ ...

Structural data formats - PDB format
□ The spatial 3D coordinates and other information are recorded for each atom
□ Designed in the early 1970s, with the first entries of the PDB database
□ Rigid structure of 80 characters per line, including spaces
□ Still the most widely supported format
Example records from entry 1DNP:
■ Structure annotation: HEADER LYASE (CARBON-CARBON) 1DNP; TITLE STRUCTURE OF DEOXYRIBODIPYRIMIDINE PHOTOLYASE; SOURCE ORGANISM_SCIENTIFIC: ESCHERICHIA COLI; KEYWDS DNA REPAIR, ELECTRON TRANSFER, EXCITATION ENERGY TRANSFER, LYASE, CARBON-CARBON
■ Amino acid field (ATOM records), e.g.:
ATOM     26  CA  LEU A   4      54.799  33.603  59.113  1.00 11.56           C
■ Cofactor field (HETATM records), e.g. the atoms of the FAD cofactor (residue 472)
□ Fields of each ATOM/HETATM record: record name, atom number, atom name, residue name, polypeptide chain identifier, residue number, x, y, z coordinates, occupancy, temperature factor, atom type
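Because each record has a fixed 80-column layout, an ATOM/HETATM line is best split by column position rather than by whitespace (whitespace splitting fails when adjacent fields run together, e.g. large coordinates). A minimal sketch, assuming the standard wwPDB column positions for coordinate records; the `AtomRecord` class and `parse_atom_line` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    record: str        # "ATOM" or "HETATM"
    serial: int        # atom number
    name: str          # atom name, e.g. "CA"
    res_name: str      # residue name, e.g. "LEU"
    chain_id: str      # polypeptide chain identifier
    res_seq: int       # residue number
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str       # atom type


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one ATOM/HETATM record by fixed column positions.

    Slices are 0-based; the comments give the 1-based PDB columns.
    """
    line = line.rstrip("\n").ljust(80)   # pad short lines so trailing fields read as blank
    record = line[0:6].strip()           # columns 1-6
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not a coordinate record: {record!r}")
    return AtomRecord(
        record=record,
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21],               # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        occupancy=float(line[54:60]),    # columns 55-60
        temp_factor=float(line[60:66]),  # columns 61-66
        element=line[76:78].strip(),     # columns 77-78
    )
```

Parsing the LEU 4 Cα record of entry 1DNP this way yields chain A, residue 4 and coordinates (54.799, 33.603, 59.113). The 5-column atom-number field (columns 7-11) is also why atom numbers cannot exceed 99,999.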
□ Atomic coordinates
□ Chemical and biological features
□ Experimental details of the structure determination
□ Structural features
■ Secondary structure assignments
■ Hydrogen bonding
■ Biological assemblies
■ Active sites
■ ...
• https://www.wwpdb.org/documentation/file-for…
• https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/tutorials/pdbintro.html

□ Advantages
■ Widely used -> supported by the majority of tools
■ Easy to read and easy to use
■ Can be edited manually
-> Suitable for accessing individual entries
□ Disadvantages
■ Potential inconsistency between individual PDB entries, as well as between records within one entry
■ Ex.: different residue numbering in the SEQRES (primary sequence) and ATOM (atoms and residues in the file) sections
-> Not suitable for automated extraction of information
SEQRES 1 396 MET ASP GLU ASN ILE THR ALA ALA PRO ALA ASP PRO ILE
SEQRES 2 396 LEU GLY LEU ALA ASP LEU PHE ARG ALA ASP GLU ARG PRO
ATOM 1 N MET 5 41.402 11.897 15.262 1.00 48.61
ATOM 2 CA MET 5 40.919 13.262 15.600 1.00 47.70
ATOM 9 N PHE 6 39.627 14.840 14.228 1.00 48.66
ATOM 10 CA PHE 6 39.199 15.440 12.964 1.00 45.33
(MET is the first residue in SEQRES but is numbered 5 in the ATOM records.)
□ Disadvantages (continued)
■ Absolute limits on the size of certain data items
■ Ex.: max. number of atom records limited to 99,999 (5-column atom number field); max. number of chains limited to 26 (single-character chain identifier), etc.
-> Large systems, such as a ribosomal subunit, must be split into multiple PDB files
-> Not suitable for analysis and comparison of experimental and structural data across the entire database

Structural data formats - mmCIF format
□ Macromolecular Crystallographic Information File (mmCIF)
□ Developed to handle increasingly complicated structural data
□ Each field of information is explicitly assigned by a tag and linked to other fields through a special syntax
■ PDB:
HEADER PLANT SEED PROTEIN 11-OCT-91 1CBN
■ mmCIF:
_struct.entry_id '1CBN'
_struct.title 'PLANT SEED PROTEIN'
_struct_keywords.entry_id '1CBN'
_struct_keywords.text 'plant seed protein'
_database_2.database_id 'PDB'
_database_2.database_code '1CBN'
_database_PDB_rev.rev_num 1
_database_PDB_rev.date_original '1991-10-11'
□ Advantages
■ Easily parsable by computer software
■ Consistency of data across the database
-> Suitable for analysis and comparison of experimental and structural data across the entire database
□ Disadvantages
■ Difficult to read
■ Less widely supported by visualization and computational tools than the PDB format
-> Less suitable for manual inspection of individual entries

□ Protein Data Bank Markup Language (PDBML)
□ Extensible Markup Language (XML) representation of PDB data
<PDBx:datablock datablockName="EXAMPLE"
  xmlns:PDBx="http://deposit.pdb.org/pdbML/pdbx-v1.000.xsd"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://deposit.pdb.org/pdbML/pdbx-v1.000.xsd pdbx-v1.000.xsd">
<PDBx:entity_polyCategory>