Advanced Bioinformatics (Pokrocila bioinformatika)

RNA world

Conformations of RNA
• The primary structure of RNA is similar to that of DNA
• RNA, like DNA, can be single- or double-stranded, linear or circular
• Unlike DNA, RNA can adopt different folds
• Different folds permit RNAs to carry out specific functions in the cell

Central dogma
The flow of genetic information: DNA → RNA (transcription), RNA → protein (translation), DNA → DNA (replication)

RNA
“Three” different types of RNA
• mRNA – messenger RNA, specifies the order of amino acids during protein synthesis
• tRNA – transfer RNA, interprets the mRNA information during translation
• rRNA – ribosomal RNA, combined with proteins, aids tRNA in translation

[Figure: small subunit 18S rRNA]

Then:
• Discovery of catalytic RNA (ribonuclease P, self-splicing introns, hepatitis delta virus, …)
• Discovery of other roles of RNA
  – ncRNAs: functional RNA molecules (RNA other than mRNA), the genomic dark matter
    • Ignored by gene prediction methods
    • Not in EnsEMBL
    • Computational complexity
    • ~10% of human gene count?
  – RNA interference (siRNA, miRNA, tiny non-coding RNA, small modulatory RNA, …)
  – cofactor RNA (telomerases, …)
  – … to be discovered

Local RNA structures in untranslated regions (UTR)
• have known roles in the regulation of gene expression:
  – mRNA stabilization
    • 5’ UTR elements in bacteria reduce mRNA degradation
    • 3’ UTR elements in eukaryotes control mRNA degradation
  – mRNA translation
    • Control and rate of translation
    • IRES (viruses)
  – mRNA localization
    • Transport, development
  – mRNA processing
    • Splicing of introns (alternative)
• In the coding regions, redundancy of the genetic code leaves (some) room for RNA secondary
structure on top

Properties of RNA molecules
• Assemble into double-stranded helices like DNA: carry GENETIC INFORMATION like DNA
• Fold into complex tertiary architectures like proteins: perform CHEMICAL CATALYSIS like proteins

Biological sequence analysis
• Proteins – “easy”
• RNAs – hard

2D structures
• Watson-Crick base-paired helices: the main building block is the RNA double helix, held together by Watson-Crick pairs
• Internal loops (symmetric, asymmetric, bulge)
• Hairpin loops
• Single-stranded junctions
• Multi-branched loops, from which three or more stems radiate

Hairpins
• DNA strands with self-complementary base sequences have the potential to form hairpin structures.
• A hairpin forms within a single DNA (or RNA) strand.
• The hairpin is a common secondary/tertiary structure element in RNA.
• It requires complementarity between two parts of the same strand.

RNA tertiary structures
In addition to secondary structural interactions in RNA, there are also tertiary interactions:
• pseudoknots (A)
• kissing hairpins (B)
• hairpin-bulge (C)
These complicated structures are usually not predictable by secondary structure prediction tools.
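One reason pseudoknots escape many secondary structure tools is that classic folding recursions assume nested base pairs, whereas a pseudoknot contains pairs that cross. The sketch below is only an illustration of that idea, not any tool's implementation: it reads a dot-bracket string and reports whether any two pairs cross. The function names and the use of square brackets for a second helix are assumptions made for this example.

```python
def parse_pairs(dot_bracket):
    """Return sorted base pairs (i, j) from a dot-bracket string.

    Round and square brackets are treated as two independent bracket
    families, so crossing helices can be written as e.g. "((..[[..))..]]".
    """
    closer_to_opener = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, ch in enumerate(dot_bracket):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closer_to_opener:
            stack = stacks[closer_to_opener[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character '{ch}' at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def has_pseudoknot(pairs):
    """True if two pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return True
    return False


if __name__ == "__main__":
    print(has_pseudoknot(parse_pairs("((((....))))")))    # plain hairpin: nested pairs only
    print(has_pseudoknot(parse_pairs("((..[[..))..]]")))  # two helices that cross
```

A plain hairpin yields only nested pairs, while the second string contains two helices whose pairs interleave, which is exactly the pattern a nested-only recursion cannot represent.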
[Figure: panels A, B, C]

RNA base pairing
• Watson-Crick base pairs
  – Form double-stranded helices
  – Define the 2D structure (main building block)
  – Depend on monovalent ions
• Non-Watson-Crick base pairs
  – Form RNA motifs
  – Responsible for RNA-RNA recognition & 3D fold
  – Depend on divalent ions (Mg2+)

Three interacting edges
• Purines: Watson-Crick edge, Hoogsteen edge, sugar edge
• Pyrimidines: Watson-Crick edge, “CH” edge, sugar edge

Glycosidic bond orientation
cis (default) or trans

Edge-to-edge pairing types
(Watson-Crick, Hoogsteen, sugar edge) × (Watson-Crick, Hoogsteen, sugar edge) × (cis, trans) = 12 types
(the two interacting edges form an unordered pair, giving 6 edge combinations, each in cis or trans)

Annotations for non-Watson-Crick pairs
[Figure: annotation symbols for non-Watson-Crick pairs]

Statistics
[Figure: edge usage statistics for purines (R) and pyrimidines (Y); values shown: 11.4 %, 12.0 %, 76.6 %, 94.8 %, 2.4 %, 2.8 %]

• What is an RNA motif?
• How can we detect the presence of a motif in a given RNA?
• How can we compare motifs?

RNA motif
An RNA motif is an ensemble of ordered elements under constraints.
• Sequential motifs:
  – Strict: -AUG
  – Fuzzy: -AAUAxAA
• Structural motifs:
  – GNRA, UNCG, CUUG (tetraloops)
  – Boxes C/D or H/ACA (snoRNAs)

WIREs RNA 2012, 3:397–414.
doi: 10.1002/wrna.117

Evolution laws
• Three-dimensional architectures evolve less with time than sequences
• Three-dimensional structures are dictated first by folding rules and secondarily by function
• The phonetic structure of words is more stable than the meaning of words

RNA alignments
• RNA sequences are aligned and compared differently because sequence variation in RNA maintains base-pairing patterns
• Alignments of RNA sequences show covariation at interacting base-pair positions

[Figure: covariation]

RNA folding procedures
• Water molecules
• Counter-ions
• Co-ions
• Polyamines, …

RNA/ion interactions
• Divalent cations: Mg2+, Mn2+, Ca2+, Sr2+, …
• Monovalent cations: Na+, K+, Rb+, Cs+, Tl+, …
• Anions: Cl-, SO42-, …

RNA folding procedures
• In vitro folding: kinetic vs. thermodynamic control
• In vivo folding: sequential 5’>3’ or co-transcriptional
• Modular and hierarchical

Architectural hierarchy by modular assembly in RNA
• Helices and hairpin loops form first
• Helices build sub-domains by parallel or end-to-end packing
• Local and specific recognition contacts occur cooperatively between preformed sub-domains.
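The statement on the RNA alignments slide, that interacting base-pair positions co-vary, can be made concrete with a small calculation. The sketch below scores every pair of alignment columns by mutual information, one common way to quantify covariation; it is a minimal illustration, not the method of any particular tool. The function names, the threshold, and the four-sequence toy alignment are all made up for this example.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def covarying_columns(alignment, threshold=1.0):
    """List (i, j, MI) for column pairs whose mutual information >= threshold."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = list(zip(*alignment))
    hits = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            mi = mutual_information(columns[i], columns[j])
            if mi >= threshold:
                hits.append((i, j, mi))
    return hits


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 5 change together
    # (G-C, C-G, A-U, U-A) while the columns in between stay constant.
    toy_alignment = ["GAAAAC", "CAAAAG", "AAAAAU", "UAAAAA"]
    print(covarying_columns(toy_alignment))
```

In the toy alignment only columns 0 and 5 are reported, with 2 bits of mutual information. Note that mutual information detects any correlated change, so a real analysis would additionally check that the co-varying residues can actually pair.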
[Figure: parallel packing of helices]

[Figure: end-to-end stacking of helices]

RNA self-assembly & folding
Coupled architectural & electrostatic hierarchies
• Formation of helices that build sub-domains by parallel or end-to-end packing, and rapid collapse to compact states induced by nonspecific ion binding
• Specific RNA-RNA recognition and cooperative transitions to the native state, promoted by specific ion binding

Kinetic values
• Stacking of single strands: 1 ms
• Hairpin formation: 10–100 ms
• Tertiary structure formation: 10–100 ms
• Native state: 1 s – 10 min

[Figure: only three ways to pair four segments]

Modeling algorithms
• 3D structure: assembly of fragments
• Stress the 3D fold rather than the sequence (inverse folding)
• Search for a “consensus” 3D fold (global architecture)
• 2D topology (not strongly correlated with sequence)
  – RNA is right-handed → right-handedness of stacks and of junctions

Basics of RNA structure prediction
Two primary methods of structure prediction
– Covariation analysis / comparative sequence analysis
  • Takes into account conserved patterns of base pairs during evolution (2 or more sequences).
  • Paired positions vary at the same time during evolution while maintaining structural integrity
  • Manifestation of secondary structure
– Minimum free-energy method
  • From a single sequence, determines the structure of complementary regions that are energetically stable

Comparative sequence analysis
Molecules with similar functions and different nucleotide sequences will form similar structures
• Predicts secondary and tertiary structure from the underlying sequence
• Correctly identifies a high percentage of secondary structure pairings and a smaller number of tertiary interactions
• Primarily a manual method

Positional covariation
• A helix is formed from two sets of sequences that are not identical: C G A U (G C A A) A U C G
• Search for positions that co-vary.
• Positions that co-vary with one another are possible pairing partners.

Minimum free energy method
[Figure: example sequence CGACGCAAGUCG]

Minimum free energy method
Hypothesis:
• The native secondary structure is the one with the minimum free energy
• Search for structures with stable energies
• First, a dot matrix analysis is carried out to highlight complementary regions (a diagonal indicates a succession of complementary nucleotides)
• The energy is then calculated for each predicted structure by summing the negative base stacking energies

Minimum free energy method
• Assumption: the energy of each base pair is independent of all the other pairs and of the loop structure.
• Consequence: the total free energy is the sum of all the base pair free energies.
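Under the independence assumption above, minimizing the total free energy reduces to a simple dynamic program over subsequences (a Nussinov-style recursion with pair weights). The sketch below illustrates this for nested structures only. The per-pair energies and the minimum hairpin loop size are illustrative placeholders chosen for this example, not measured thermodynamic parameters, and production tools use more detailed stacking and loop energy models.

```python
# Illustrative per-pair energies (arbitrary units); assumed values, not measured ones.
PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def fold_min_energy(seq):
    """Return (energy, dot-bracket) minimizing the sum of independent pair energies."""
    n = len(seq)
    if n == 0:
        return 0.0, ""
    # energy[i][j] = minimum energy of the subsequence seq[i..j]
    energy = [[0.0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = energy[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with k
                e = PAIR_ENERGY.get((seq[k], seq[j]))
                if e is None:
                    continue
                left = energy[i][k - 1] if k > i else 0.0
                best = min(best, left + energy[k + 1][j - 1] + e)
            energy[i][j] = best

    # Traceback: recover one structure that achieves the minimum.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if energy[i][j] == energy[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            e = PAIR_ENERGY.get((seq[k], seq[j]))
            if e is None:
                continue
            left = energy[i][k - 1] if k > i else 0.0
            if energy[i][j] == left + energy[k + 1][j - 1] + e:
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return energy[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(fold_min_energy("CGACGCAAGUCG"))
```

For the example sequence CGACGCAAGUCG and these placeholder energies, the recursion returns a four-pair stem closing the GCAA loop, `((((....))))`, with a total of -11.0 (three G-C pairs and one A-U pair).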
[Figure: de novo modeling]

Single sequence secondary structure prediction
• CentroidHomfold: secondary structure prediction using homologous sequence information.
• CyloFold: secondary structure prediction based on placement of helices, allowing complex pseudoknots.
• GTFold: fast and scalable multicore code for predicting RNA secondary structure.
• IPknot: fast and accurate prediction of RNA secondary structures with pseudoknots using integer programming.
• KineFold: folding kinetics of RNA sequences including pseudoknots, with an implementation of the partition function for knots.
• Mfold: MFE (minimum free energy) RNA structure prediction algorithm.
• RNA123: secondary structure prediction via thermodynamic-based folding algorithms and novel structure-based sequence alignment specific for RNA.
• RNAstructure: a program to predict lowest free energy structures and base pair probabilities for RNA or DNA sequences. Programs are also available to predict maximum expected accuracy structures, and these can include pseudoknots. Structure prediction can be constrained using experimental data, including SHAPE, enzymatic cleavage, and chemical modification accessibility.
• Sfold: statistical sampling of all possible structures. The sampling is weighted by partition function probabilities.
• …

RNA homology search software
• ERPIN: "Easy RNA Profile IdentificatioN" is an RNA motif search program that reads a sequence alignment and secondary structure and automatically infers a statistical "secondary structure profile" (SSP). An original dynamic programming algorithm then matches this SSP onto any target database, finding solutions and their associated scores.
• Infernal: "INFERence of RNA ALignment" searches DNA sequence databases for RNA structure and sequence similarities. It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs).
• GraphClust: fast RNA structural clustering method to identify common (local) RNA secondary structures. Predicted structural clusters are presented as alignments. Because clustering runs in linear time, large RNA datasets can be analysed.
• PHMMTS: "pair hidden Markov models on tree structures" is an extension of pair hidden Markov models defined on alignments of trees.
• RaveNnA: a slow and rigorous or fast and heuristic sequence-based filter for covariance models.
• RSEARCH: takes a single RNA sequence with its secondary structure and uses a local alignment algorithm to search a database for homologous RNAs.
• Structator: ultra-fast software for searching for RNA structural motifs, employing an index-based bidirectional matching algorithm combined with a fast fragment chaining strategy.

[Figure: benchmarking]

Inverse folding
• Another direction in sequence design is designing a sequence that folds into a given secondary structure.
• This problem is called inverse folding because it is the inverse of finding the minimum free energy secondary structure of a sequence: the inverse folding problem is to find a sequence whose minimum energy structure coincides with the given one.

Inverse folding
Main aim: discovery of novel, structured and functional RNAs in transcriptomic data.

Inverse folding
• RNAinverse: the ViennaRNA package provides RNAinverse, an algorithm for designing sequences with a desired structure.
• RNAiFold: a complete RNA inverse folding approach based on constraint programming and implemented using OR Tools, which allows a wide range of design constraints to be specified.
• RNA-SSD/RNA Designer: the RNA-SSD (RNA Secondary Structure Designer) approach first assigns bases probabilistically to each position based on probabilistic models. A stochastic local search is then used to optimize this sequence.
• INFO-RNA: uses a dynamic programming approach to generate an energy-optimized starting sequence, which is then further improved by a stochastic local search that uses an effective neighbor selection method.
• RNAexinv: an extension of RNAinverse that generates sequences which not only fold into a desired structure but also exhibit selected attributes such as thermodynamic stability and mutational robustness. This approach does not necessarily output a sequence that perfectly fits the input structure, but one that fits a shape abstraction of it, i.e. it keeps the adjacency and nesting of structural elements but disregards helix lengths and the exact number of unpaired positions.
• RNA-ensign: applies an efficient global sampling algorithm to examine the mutational landscape under structural and thermodynamic constraints.
• … and many others

EteRNA: http://www.eternagame.org/web/

Exercise
Predict the 3D structure of the RNA GCUACGAAGGAAGGAUUGGUAUGUGGUAUAUUCGUAGC at http://rnacomposer.cs.put.poznan.pl and try all modes of 2D structure prediction.
– How do the individual models differ from each other?
– Which of the models is closest to the experimental structure PDB:6E8S?
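For the first exercise question, one simple way to quantify how 2D predictions differ is to compare their base-pair sets. The sketch below is a minimal illustration, assuming the structures are available as pseudoknot-free dot-bracket strings. The function names are made up for this example, and the two structures in the usage section are hypothetical placeholders, not predictions for the exercise sequence and not the pairs found in PDB:6E8S.

```python
def base_pairs(dot_bracket):
    """Return the set of base pairs (i, j) in a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character '{ch}' at position {pos}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def compare_structures(predicted, reference):
    """Compare two structures of the same sequence by their base-pair sets."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    pred, ref = base_pairs(predicted), base_pairs(reference)
    shared = pred & ref
    return {
        # fraction of reference pairs that were predicted
        "sensitivity": len(shared) / len(ref) if ref else 0.0,
        # fraction of predicted pairs that are in the reference
        "ppv": len(shared) / len(pred) if pred else 0.0,
        # number of pairs present in exactly one of the two structures
        "bp_distance": len(pred ^ ref),
    }


if __name__ == "__main__":
    # Hypothetical structures, for illustration only.
    reference = "((((....))))"
    predicted = "(((......)))"
    print(compare_structures(predicted, reference))
```

Here the hypothetical prediction misses one of the four reference pairs, giving a sensitivity of 0.75, a PPV of 1.0 and a base-pair distance of 1. Comparing against an experimental 3D structure would first require extracting its base pairs from the coordinates with a structure annotation tool, which is not shown here.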