The ENCODE Registry of ccREs and SCREEN

Jill Moore, PhD
Henry Pratt, MD/PhD student
Zhiping Weng Lab
University of Massachusetts Medical School

The ENCODE Encyclopedia

[Figure: overview of the ENCODE Encyclopedia (Moore et al., Figure 2). Legend: Available / Future plan]
•  Integrative level: annotations from multiple data types
   -  Registry of candidate cis-Regulatory Elements (promoter-like signatures, enhancer-like signatures, CTCF-only)
   -  Chromatin states (ChromHMM, Segway)
   -  Linked genes
   -  Variant annotation (HaploReg, FunSeq, RegulomeDB)
   -  Allele-specific events
   -  ...
•  Ground level: annotations from individual data types
   -  DNase-seq & ATAC-seq (DHSs, peaks)
   -  TF ChIP-seq (peaks, motifs)
   -  Histone mark ChIP-seq (peaks)
   -  DNA methylation (levels)
   -  Gene expression (levels)
   -  RAMPAGE (TSS activity)
   -  RNA binding proteins (peaks, motifs)
   -  ChIA-PET (interactions)
   -  Hi-C (TADs, compartments)
   -  ...
•  Raw data & metadata (ENCODE portal; uniform processing pipelines)
   -  Reads (FASTQs)
   -  Mapped reads (BAMs)
   -  Signal (bigWigs)
•  Access: UCSC genome browser, SCREEN

Open chromatin regions across 28 cell types

•  Some regions are ubiquitously open, particularly those at TSSs
•  Distal regions are cell type-specific but maintain the same boundaries
•  We can represent chromatin accessibility across biosamples as a set of consensus sites called representative DNase hypersensitivity sites (rDHSs)
•  In total, we curate 2.1M rDHSs in GRCh38 and 1.1M rDHSs in mm10

Annotating rDHSs with ChIP-seq signals to define ccREs

[Figure: genome browser view (hg19, chr12, approximately 53.750-53.775 Mb, SP1 locus) showing GM12878 DNase, H3K4me3, H3K27ac, and CTCF signal tracks]

•  ccREs are the subset of rDHSs supported by high DNase signal AND high H3K4me3, H3K27ac, and/or CTCF signal in at least one biosample
•  This results in 1.4M GRCh38 ccREs and 499k mm10 ccREs

ccRE classification

•  Promoters → promoter-like signatures (PLS)
   -  Overlap annotated TSSs
   -  Have high DNase and high H3K4me3 signals
•  Enhancers → enhancer-like signatures (ELS)
   -  Can be proximal or distal (pELS vs dELS)
   -  Have high DNase and high H3K27ac signals
•  Other regulatory elements
   •  DNase-H3K4me3: possibly novel promoters or poised enhancers
      -  Do not overlap TSSs
      -  High DNase, high H3K4me3, but low H3K27ac signals
   •  CTCF-only: boundary elements/insulators
      -  High DNase, high CTCF, but low H3K4me3 and H3K27ac signals

[Figure labels, GRCh38 ccREs: 37k, 185k, 1M, 56k, 98k]

Classifying ccREs: cell type specific

[Figure: hepatocyte DNase, H3K4me3, H3K27ac, and CTCF signal tracks]

Hepatocyte-specific ccREs (N=137k):

Class            Count
PLS              19,820
pELS             30,708
dELS             31,320
DNase-H3K4me3    8,299
CTCF-only        22,638
DNase-only       24,913
Low DNase        1,328,658

Classifying ccREs: partial data classification

•  For biosamples without all four core marks, we implement a partial classification scheme
•  Example: spinal cord astrocyte. We can annotate:
   -  PLS
   -  DNase-H3K4me3
   -  CTCF-only
   -  DNase-only
   -  Low-DNase
•  Without DNase data, we annotate elements only as having high or low signal
•  All missing data is marked in SCREEN (Henry will show this in the demo)

Orthogonal data support our classifications

•  Transcription data:
   -  RAMPAGE
   -  CAGE
   -  GRO-seq & PRO-seq
•  Functional validation assays:
   -  Mouse transgenic experiments
   -  MPRA

Advantages to using the Registry of ccREs

1.  High-resolution elements: widths between 150 and 350 bp
    [Figure: DNase, H3K4me3, and H3K27ac tracks in dermis fibroblast cells]
2.  Boundaries of loci remain constant across hundreds of biosamples
    [Figure: DNase, H3K4me3, H3K27ac, and CTCF tracks in hepatocyte and neuron]
3.  ccREs are accessioned
    -  EH37XXXXXXX (human GRCh37/hg19 genome)
    -  EH38XXXXXXX (human GRCh38/hg38 genome)
    -  EM10XXXXXXX (mouse mm10 genome)
4.  Easy data exploration and integration via SCREEN

Data integration examples

•  Differential gene expression and ccRE activity
   [Figure: histone mark, RNA-seq, WGBS, and DNase tracks]
•  Gene-ccRE links (613k ccREs, 36k genes, 52 biosamples)
   -  Hi-C: Rao...Aiden (2014) Cell
   -  ChIA-PET: Tang...Ruan (2015) Cell
   -  eQTLs
   -  crisprQTLs: Gasperini...Shendure (2019) Cell
•  GWAS integration (NHGRI-EBI GWAS Catalog)
   1) Predicting causal variants
   2) Identifying disease-relevant biosamples
   -  3,878 studies; 397 studies with > 20 tagged SNPs
   [Figure legend: tagged variant, LD variant, no overlap]

Live demo
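
Sketch: ccRE classification rules as code

The rules on the ccRE classification slide can be read as a small decision procedure. The Python sketch below is illustrative only and is not the Registry's actual pipeline. It assumes each signal has already been called "high" or "low" upstream (the slides do not give thresholds), and the `Signals` type, its field names, the `classify` function, and the order in which the rules are checked are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Per-biosample calls for one rDHS (hypothetical structure)."""
    high_dnase: bool
    high_h3k4me3: bool
    high_h3k27ac: bool
    high_ctcf: bool
    overlaps_tss: bool   # element overlaps an annotated TSS
    tss_proximal: bool   # element is near a TSS (distance cutoff not specified here)


def classify(s: Signals) -> str:
    """Assign one class label; the precedence of the checks is an assumption."""
    if not s.high_dnase:
        return "Low-DNase"
    # Promoter-like: overlaps a TSS, high DNase and high H3K4me3.
    if s.overlaps_tss and s.high_h3k4me3:
        return "PLS"
    # Enhancer-like: high DNase and high H3K27ac, split by distance to a TSS.
    if s.high_h3k27ac:
        return "pELS" if (s.tss_proximal or s.overlaps_tss) else "dELS"
    # High H3K4me3 with low H3K27ac; TSS-overlapping cases were caught as PLS above.
    if s.high_h3k4me3:
        return "DNase-H3K4me3"
    # High CTCF with low H3K4me3 and low H3K27ac.
    if s.high_ctcf:
        return "CTCF-only"
    return "DNase-only"


if __name__ == "__main__":
    example = Signals(
        high_dnase=True, high_h3k4me3=False, high_h3k27ac=True,
        high_ctcf=True, overlaps_tss=False, tss_proximal=False,
    )
    print(classify(example))
```

Under these assumptions, CTCF signal does not change the label of an element that already qualifies as PLS or ELS; it only decides the label when both H3K4me3 and H3K27ac are low.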