Lecture 6: smallRNA-seq and IP methods
Modern methods for genome analysis (PřF:Bi7420)
Vojta Bystry
vojtech.bystry@ceitec.muni.cz

NGS data analysis
[Figure: overview of NGS data analysis. Labels: Experiment design; Raw data .fastq; de-multiplexing; QC; Genome/Transcriptome Reference; Mapping .bam; Interaction analysis (ChIP-seq); Expression analysis (RNA-seq); Variant analysis (WES); Not known reference; Metagenomics; Reference assembly; Not "classic" reference; Immunogenetic VDJ-genes; CRISPR sgRNA; Methylation Bisulfite-seq…]

Small RNA-seq
• Next-generation sequencing of short RNAs allows profiling of various short (non-coding) RNAs (microRNAs, piRNAs, tRNAs, …)
• Widely used method for identifying disease biomarkers => cancer research
• Of special interest are small RNAs present in the circulatory system (biofluids), because they can serve as non-invasive biomarkers
[Figure: Blood collection -> Sequencing -> Bioinformatic analysis; miR-26a low/high]

Small RNAs pool - microRNAs
• ~22nt long, regulate expression of other RNAs
• Mature miRNA binds to the 3' UTR of coding RNA (mRNA) -> degradation
• ~2,000 mature miRNAs known for human
• isomiRs = sequence variants of miRNA
[Figure: miRNA biogenesis (label: No protein)]
[Figure: isomiR examples; labels: canonical, 5' isomiRs, 3' isomiRs, polymorphic isomiRs, seed, precursor miRNA]
canonical: 5' UAGCUUAUCAGACUGAUUGA 3'
variants shown: AGCUUAUCAGACUGAUUGA, GUAGCUUAUCAGACUGAUUGA, UAGCUUAUCAGACUGAU, UUAGCUUAUCAGACUGAUUGA, CUAGCXUAUCAGACUGAUUG A
precursor miRNA: AAGCCGUAGCUUAUCAGACUGAUUGACGAGCGC

Small RNAs pool – tRNA fragments
• ~14-45nt long
• Participate in various biological processes -> research ongoing!

Small RNAs pool – other small RNAs
• PIWI-interacting RNAs (piRNAs) -> ~30nt long; most highly expressed in germ cells, where their function is known; also expressed in somatic cells, but their functions there are mainly unknown
• Small nucleolar RNAs (snoRNAs) -> chemical modification of other RNAs
• Small nuclear RNAs (snRNAs) -> pre-processing of coding RNAs in the nucleus
• Y RNA-derived small RNAs -> DNA replication?
• mRNA fragments -> random or not?
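The isomiR categories from the microRNAs slide (5' isomiRs, 3' isomiRs, polymorphic isomiRs) can be illustrated with a minimal Python sketch. The canonical and precursor sequences are the ones shown on the slide; the function name `classify_isomir`, the `max_mismatches` threshold, and the rule of comparing a read's position in the precursor with the canonical position are assumptions made for illustration only. Real isomiR tools also handle non-templated additions and several precursors per miRNA, which this sketch does not.

```python
# Sequences taken from the isomiR figure on the slide.
PRECURSOR = "AAGCCGUAGCUUAUCAGACUGAUUGACGAGCGC"
CANONICAL = "UAGCUUAUCAGACUGAUUGA"


def classify_isomir(read, precursor=PRECURSOR, canonical=CANONICAL,
                    max_mismatches=1):
    """Label a read relative to the canonical miRNA (illustrative rules only)."""
    if not read:
        raise ValueError("read must not be empty")
    can_start = precursor.find(canonical)
    if can_start < 0:
        raise ValueError("canonical sequence not found in precursor")
    can_end = can_start + len(canonical)

    if read == canonical:
        return "canonical"

    # Templated variant: the read matches the precursor exactly, but its
    # start and/or end differs from the canonical one.
    # Note: str.find returns only the first match in the precursor.
    start = precursor.find(read)
    if start >= 0:
        end = start + len(read)
        labels = []
        if start != can_start:
            labels.append("5' isomiR")
        if end != can_end:
            labels.append("3' isomiR")
        return " + ".join(labels)

    # Same length as canonical with a few substitutions: polymorphic.
    if len(read) == len(canonical):
        mismatches = sum(a != b for a, b in zip(read, canonical))
        if mismatches <= max_mismatches:
            return "polymorphic isomiR"

    # Anything else (e.g. non-templated additions) is left unclassified.
    return "unclassified"


if __name__ == "__main__":
    # The last read is a hypothetical single-substitution variant,
    # not one of the sequences from the slide.
    for seq in ["UAGCUUAUCAGACUGAUUGA", "AGCUUAUCAGACUGAUUGA",
                "UAGCUUAUCAGACUGAU", "UAGCAUAUCAGACUGAUUGA"]:
        print(seq, "->", classify_isomir(seq))
```

Comparing coordinates in the precursor rather than only string prefixes and suffixes lets the sketch recognize templated extensions such as GUAGCUUAUCAGACUGAUUGA, whose extra 5' base comes from the precursor.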
CEITEC at Masaryk University

Module 1: First QC
• Quality control of raw sequencing data
• Scans FASTQ files for presence of adapters
Results:
• List of detected adapters (exact sequences)
• HTML/PDF report with plots and tables summarizing the quality of raw data

Module 2: Pre-processing
• Adapter trimming
• Trimming of low-quality bases, discarding of short reads
• Read collapsing based on UMIs (if present)
Results:
• Cleaned FASTQ files
• HTML/PDF report with plots and a table summarizing the number of reads after each pre-processing step
[Figure: Length distribution for sample METSEQ-T03; axis labels: nucleotide position, number of reads; annotations: miRNA, piRNA, too short reads, trimmed reads, untrimmed reads]

Module 3: RNA quantification
• Complicated due to the complex nature of different short RNAs
• Requires an individual approach for each class of short RNAs -> most problematic are tRFs and piRNAs
• miRNA identification vs. isomiR identification (3'/5' additions, precursor ambiguity, …)
• piRNAs nested within other coding/non-coding RNAs
[Figure: panels A and B]

ChIP-seq

IP methods

RNA IP methods

Primary analysis + QC
● Alignment – standard DNA (RNA for CLIP)
● RNA-seq like QC
  ○ Check sequencing quality
  ○ RSeQC – Read Distribution

Primary analysis + QC
● IP experiment quality control
  ○ Sample correlation
    ■ Replicates control treatment
  ○ Strand cross-correlation
    ■ Shift of strand mapping
    ■ Shift should correlate with expected fragment size

Primary analysis + QC
● Fingerprint profile
  ○ Profile of cumulative read coverage
  ○ Shows how evenly the reads are distributed over the genome

Peak calling
● Read extension

Peak calling
● Statistical assessment of peaks against background
● Background
  ○ Control sample – recommended
  ○ Model background from overall coverage of the sample
● Peak calling annotation
● Differential peak calling

Post peak calling QC
● FRiP score = fraction of reads in peaks
  ○ A high value is good
  ○ However, it can be low in specific experiments even though the experiment is OK
● Average peak profile

Peak calling results

www.ceitec.eu CEITEC @CEITEC_Brno
Vojta Bystry
vojtech.bystry@ceitec.muni.cz
Thank you for your attention!
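Supplementary sketch for the "Post peak calling QC" slide: the FRiP score (fraction of reads in peaks) can be computed roughly as below. This is a minimal pure-Python illustration, not the method of any particular peak caller. The function names `merge_peaks` and `frip`, the half-open `(start, end)` coordinates, and the rule that a read counts as "in a peak" when it overlaps a peak by at least one base are all assumptions; tools differ in how they define the overlap.

```python
from bisect import bisect_right


def merge_peaks(peaks):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def frip(reads, peaks):
    """Fraction of reads overlapping any peak by at least one base.

    reads: list of (chrom, start, end) tuples, half-open coordinates.
    peaks: dict mapping chrom -> list of (start, end), half-open.
    """
    if not reads:
        raise ValueError("no reads given")
    merged = {chrom: merge_peaks(p) for chrom, p in peaks.items()}
    starts = {chrom: [s for s, _ in m] for chrom, m in merged.items()}

    in_peaks = 0
    for chrom, r_start, r_end in reads:
        chrom_peaks = merged.get(chrom)
        if not chrom_peaks:
            continue
        # Merged peaks are sorted and disjoint, so only the last peak that
        # starts before the read ends can overlap the read.
        i = bisect_right(starts[chrom], r_end - 1) - 1
        if i >= 0 and chrom_peaks[i][1] > r_start:
            in_peaks += 1
    return in_peaks / len(reads)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the lecture.
    peaks = {"chr1": [(100, 200), (150, 250), (1000, 1100)]}
    reads = [("chr1", 90, 140), ("chr1", 240, 290), ("chr1", 250, 300),
             ("chr1", 500, 550), ("chr2", 100, 150)]
    print("FRiP =", frip(reads, peaks))
```

In the toy data two of the five reads overlap a merged peak, so the score is 2/5. Merging the peaks first avoids counting a read twice when peaks overlap.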