D24–D33 Nucleic Acids Research, 2020, Vol. 48, Database issue
Published online 8 November 2019
doi: 10.1093/nar/gkz913

Database Resources of the National Genomics Data Center in 2020

National Genomics Data Center Members and Partners*,†

Received September 15, 2019; Revised September 30, 2019; Editorial Decision October 01, 2019; Accepted October 02, 2019

ABSTRACT

The National Genomics Data Center (NGDC) provides a suite of database resources to support worldwide research activities in both academia and industry. With rapid advances in higher-throughput and lower-cost sequencing technologies, and the resulting huge volume of multi-omics data generated at exponential scales and rates, NGDC is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. In the past year, update efforts have been devoted mainly to BioProject, BioSample, GSA, GWH, GVM, NONCODE, LncBook, EWAS Atlas and IC4R. Newly released resources include three human genome databases (PGG.SNV, PGG.Han and CGVD), eLMSG, EWAS Data Hub, GWAS Atlas, iSheep and PADS Arsenal. In addition, four web services, namely, eGPS Cloud, BIG Search, BIG Submission and BIG SSO, have been significantly improved and enhanced. All of these resources, along with their services, are publicly accessible at https://bigd.big.ac.cn.

INTRODUCTION

The National Genomics Data Center (NGDC), officially approved by the Ministry of Science & Technology and the Ministry of Finance of the People’s Republic of China in June 2019, is a national-level center dedicated to advancing life and health sciences by archiving, managing and processing a wide range of genomics-related data. NGDC was established on the basis of the BIG Data Center (1–3) at the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences (CAS), in close collaboration with two CAS institutions, namely, the Institute of Biophysics (IBP) and the Shanghai Institute of Nutrition and Health (SINH).
With the rapid advancements in higher-throughput and lower-cost sequencing technologies, huge amounts of multi-omics data are being generated at ever-growing rates and scales. The primary mission of NGDC is therefore to build archive platforms and information systems, develop advanced algorithms and tools to translate big data into big discovery, and provide open access to a suite of database resources in support of the research activities of global users from both academia and industry. During the past year, NGDC has expanded, updated and enriched the amount and type of its data through big data integration and value-added curation, particularly through close collaboration with IBP and SINH, with significant improvements and advances over the previous release. In terms of data attributes and curation intensity, database resources in NGDC can be generally divided into three categories: Data (raw sequence data and metadata), Information (value-added standardized information) and Knowledge (curated knowledge and knowledge graphs). Here, we provide a brief summary of new developments and recent updates, and describe the core resources and services of NGDC (Figure 1). All resources, along with their services, are publicly accessible through the home page of NGDC at https://bigd.big.ac.cn.

Figure 1. The National Genomics Data Center’s core data resources. Three categories, namely, data, information and knowledge, are adopted to represent resources that typically deposit raw data/metadata (archives), house value-added information (databases) and integrate validated knowledge through literature curation (knowledgebases), respectively. Several databases are not introduced in this report, namely, BioCode (Biological Tool Codes), GEN (Gene Expression Nebulas) and iDog (Integrated Resource for Dog). A full list of data resources, which contains links to each resource, is available at https://bigd.big.ac.cn/databases.

*To whom correspondence should be addressed: Zhang Zhang. Tel: +86 10 84097261; Fax: +86 10 84097720; Email: zhangzhang@big.ac.cn
Correspondence may also be addressed to Wenming Zhao. Email: zhaowm@big.ac.cn
Correspondence may also be addressed to Jingfa Xiao. Email: xiaojingfa@big.ac.cn
Correspondence may also be addressed to Yiming Bao. Email: baoym@big.ac.cn
Correspondence may also be addressed to Shunmin He. Email: heshunmin@ibp.ac.cn
Correspondence may also be addressed to Guoqing Zhang. Email: gqzhang@picb.ac.cn
Correspondence may also be addressed to Yixue Li. Email: yxli@sibs.ac.cn
Correspondence may also be addressed to Guoping Zhao. Email: gpzhao@sibs.ac.cn
Correspondence may also be addressed to Runsheng Chen. Email: crs@sun5.ibp.ac.cn
†Full list provided in the Appendix.

© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Downloaded from https://academic.oup.com/nar/article/48/D1/D24/5614641 by Masarykova Univerzita user on 13 October 2020

NEW DEVELOPMENTS

Human genome resources

PGG.SNV (http://www.pggsnv.org) (4) is a human genome database that gives much higher weight to previously under-investigated indigenous populations in Asia, as these genomes harbor an enormous number of variants that have not been observed in the extensively studied populations of European ancestry. In the current version, PGG.SNV archives 265 million single nucleotide variants (SNVs) across 220 147 present-day human genomes and 1018 ancient genomes and estimates their frequencies in 977 diverse populations, including 1009 newly sequenced genomes representing 16 indigenous populations living in unusual environments (e.g. tropical forests and highlands) in East Asia and Southeast Asia. For each variant, PGG.SNV provides various approaches to query SNV information and nine types of annotations. In addition, PGG.SNV offers user-friendly interfaces for data browsing and search and is equipped with an online tool for estimating population genetic diversity and evolutionary parameters.

PGG.Han (http://www.pgghan.org) (detailed in (5) in this issue) is a population genome database that serves as the central repository of genomic data of the Han Chinese Genomes Initiative (Phase I). PGG.Han archives whole-genome sequencing or high-density genome-wide SNVs of 114 783 Han Chinese individuals (a.k.a. the Han100K), representing geographical sub-populations covering 33 of the 34 administrative divisions of China, as well as Singapore. PGG.Han provides: (i) an interactive interface for visualization of the fine-scale genetic structure of the Han Chinese population; (ii) genome-wide allele frequencies of hierarchical sub-populations; (iii) ancestry inference for individual samples and control of population stratification based on nested ancestry informative marker panels; (iv) a population-structure-aware shared control for genotype–phenotype association studies and (v) a Han Chinese-specific reference panel for genotype imputation. Computational tools are implemented in PGG.Han and a user-friendly online interface is provided for data analysis and visualization.

The Chinese Genomic Variation Database (CGVD; https://bigd.big.ac.cn/cgvd) (detailed in (6) in this issue) is a genomic variation database for Chinese populations.
CGVD is a sub-project of the CAS Precision Medicine Initiative project (CASPMI) (7), which aims to establish the CAS professional cohort with whole-genome deep sequencing (25–30×) and build precise reference genomes for different Chinese sub-populations. In comparison with PGG.Han, CGVD features high-coverage sequencing data of 991 individuals of the CASPMI cohort and 301 Chinese individuals from the 1000 Genomes Project (1KGP). Accordingly, it houses genomic variations of 48.30 million SNVs and 5.77 million small indels; in contrast to dbSNP (8), 28.49 million (46.67%) SNVs and 2.25 million (31.88%) indels are novel, indicating the advantage of deeper whole-genome sequencing coverage and/or the heterogeneity of genetic background in Chinese populations. Moreover, CGVD provides star-allele frequencies of drug metabolism-related genes that are essential for pharmacogenomics studies in CASPMI- and 1KGP-related populations. It also integrates curated knowledge of the impacts of genomic variation on drug absorption, distribution, metabolism, excretion and toxicity.

GWAS Atlas

GWAS Atlas (https://bigd.big.ac.cn/gwas) (detailed in (9) in this issue) is a manually curated resource of genome-wide variant-trait associations in plants and animals. In the current version, GWAS Atlas contains 75 467 variant-trait associations for 614 traits across seven cultivated plants (cotton, Japanese apricot, maize, rapeseed, rice, sorghum and soybean) and two domesticated animals (goat and pig), which were manually extracted and curated from 254 publications. More importantly, associations and traits are annotated and presented based on a set of ontologies (Plant Trait Ontology, Animal Trait Ontology for Livestock, etc.). Taken together, GWAS Atlas integrates high-quality curated GWAS associations for animals and plants and accordingly serves as a valuable resource for genetic research on important traits and for breeding applications.

EWAS Data Hub

Over the past decade, a large amount of epigenetic data, especially data sourced from DNA methylation arrays, has been accumulated as a result of numerous EWAS (epigenome-wide association study) projects. Hence, we present EWAS Data Hub (https://bigd.big.ac.cn/ewas/datahub) (detailed in (10) in this issue), a data hub for collecting and normalizing DNA methylation array data as well as archiving associated metadata. The current release of EWAS Data Hub integrates a comprehensive collection of DNA methylation array data from 75 344 samples. Based on an effective normalization method that removes batch effects among different datasets, EWAS Data Hub provides high-quality reference DNA methylation profiles for different contexts, involving 81 tissues/cell types (which contain 25 brain parts and 25 blood cell types), six ancestry categories and 67 diseases (including 39 cancers).

iSheep

iSheep (https://bigd.big.ac.cn/isheep) is a specialized genomics resource for sheep (Ovis aries), providing a wealth of information on genotype and phenotype association, domestication and climatic adaptation of domestic sheep as well as their wild relatives. The current version of iSheep houses 70 390 968 unique SNPs and 12 318 530 indels obtained from 2777 samples (including 355 samples with whole-genome sequences, 1512 samples with 50K-BeadChip and 911 samples with 600K-BeadChip) and provides comprehensive phenotypic information on 1459 sheep breeds worldwide. In addition, iSheep offers an online tool to investigate the variations between individuals or among populations. Collectively, iSheep is a valuable genomics resource for the sheep research community, helping to promote molecular breeding and the farming industry for improved production traits.

eLMSG

eLMSG (eLibrary of Microbial Systematics and Genomics; http://www.biosino.org/elmsg) is a web-based microbial library that integrates not only taxonomic information, but also genomic and phenotypic information (including morphology, physiology, biochemistry and enzymology). The taxonomic system of eLMSG is manually curated and composed of all validly published and some effectively published taxa. For each taxon, the Latin name, taxon ID (NCBI taxonomy), etymology, rank, lineage, the dates of effective and/or valid publication, feature descriptions, nomenclature type and references for the proposal and emendations during the history of the taxon are presented. Besides these data, the species taxa contain information about 16S rRNA gene and/or genome sequences. All publicly available genome data of each type species, including both type and non-type strains, were collected and, if needed, re-annotated using the standardized analysis pipeline. Furthermore, pan-genomic data analyses were conducted for species with ≥5 genome sequences available. Finally, for all type species, taxonomically relevant phenotypic data were extracted and curated from the literature, and were further indexed into eLMSG as searchable and analyzable data records. Taken together, eLMSG is a comprehensive web platform for studying microbial systematics and genomics, potentially useful for better understanding microbial taxonomy, natural evolutionary processes and ecological relationships.

PADS Arsenal

PADS Arsenal (https://bigd.big.ac.cn/padsarsenal) (detailed in this issue) is a comprehensive public database of prokaryotic defense systems related genes (PADS). To address the challenges of ever-increasing prokaryotic genomic data and the progressive discovery of novel defense systems, we developed PADS Arsenal for browsing, searching and analyzing various defense system genes.
In the current version, PADS Arsenal integrates 6 600 264 defense system genes, which belong to 18 defense systems, 63 701 genomes and 33 390 species of archaea and bacteria. In addition, it supports defense system gene analysis through an interactive online pipeline that includes sequence homology search, multiple sequence alignment and phylogenetic analysis. PADS Arsenal also provides a presence-absence variation (PAV) analysis function to visualize the dynamic variation of defense system genes. Collectively, PADS Arsenal integrates a comprehensive collection of defense system genes in archaea and bacteria and thus provides valuable resources to facilitate the development of novel genome editing, engineering and regulation tools.

RECENT UPDATES

BioProject and BioSample

BioProject (https://bigd.big.ac.cn/bioproject) and BioSample (https://bigd.big.ac.cn/biosample), designed in compliance with INSDC (International Nucleotide Sequence Database Collaboration; a joint initiative by DDBJ, EMBL-EBI and NCBI) standards, are two public repositories of biological projects and biological samples, respectively. They collect and store descriptive metadata and information about biological projects and the biological materials used for experiments. By providing centralized access to all public projects and reciprocal links to their related data, BioProject supports projects covering various data types, ranging from genomic, transcriptomic, epigenomic and metagenomic sequencing projects to genome-wide association studies (GWAS) and variation analyses. Similarly, BioSample provides centralized access to all public samples and reciprocal links to BioProject as well as other relevant database resources. In the past year, BioSample has been significantly upgraded by adding batch submission functionality that allows users to submit information on multiple samples in a single table, which has greatly improved the efficiency of data submission.
As of August 2019, BioProject houses a total of 1248 biological projects submitted by 734 users from 219 organizations and BioSample includes a total of 87 107 samples from 482 species, presenting a dramatic increase in data submission (Figure 2).

Genome Sequence Archive

As a public data repository for archiving raw sequence reads, the Genome Sequence Archive (GSA; https://bigd.big.ac.cn/gsa) (11) accepts data submissions from all over the world and provides free access to all publicly available data for global scientific communities. Over the past year, GSA has been significantly enhanced by upgrading the metadata submission functionality to enable batch submission of experiments and runs in a single table. As of August 2019, GSA has archived a total of 55 057 Experiments and 59 566 Runs and housed >1200 Terabytes of submitted raw sequence data (Figure 2), roughly double the volume of the previous release last August (namely, ∼580 TB). According to the statistics (https://bigd.big.ac.cn/gsa/statistics), data housed in GSA were submitted from 150 organizations and reported in >100 scientific journals, including Cell, Genome Research, Genomics Proteomics Bioinformatics, Nature, Plant Cell and PNAS. More importantly, GSA has been designated as a supported repository for genes and gene expression data by Elsevier. All released data in GSA are publicly accessible and downloadable at ftp://download.big.ac.cn/gsa/.

Figure 2. Statistics of data submissions to BioProject, BioSample, and GSA. (A) Data statistics of BioProject and BioSample. (B) Data statistics of Experiments and Runs as well as submitted files’ size in GSA. All statistics are frequently updated and publicly available at https://bigd.big.ac.cn/bioproject, https://bigd.big.ac.cn/biosample and https://bigd.big.ac.cn/gsa.

Genome Warehouse

The Genome Warehouse (GWH; https://bigd.big.ac.cn/gwh) is a public archival resource housing genome-scale data for a wide range of species. For each collected genome assembly, GWH incorporates detailed descriptive information, including metadata of the biological sample, genome assembly, sequence data and genome annotation, and offers standardized quality control for genome sequence and genome annotation. Notably, in this version, the sequence of the northern Han reference genome (NH1.0; GWHAAAS00000000) has been deposited in GWH; it was de novo assembled with a contig N50 size of 3.6 Mb and a scaffold N50 size of 46.63 Mb (see (7) for details). In addition, GWH has been significantly upgraded by accepting updated submissions (including updates of both genome sequence and genome annotation) and improving web services for data submission, release and sharing. In particular, GWH provides data visualization for both genome sequence and genome annotation powered by JBrowse (12) and offers statistics and charts by assembly, genome, sequencing platform, assembly method, organization and download. As of September 2019, GWH has accepted 649 data submissions from organizations both nationally and internationally, covering a broad diversity of species, e.g. animals, plants, fungi, bacteria, archaea and viruses. Among them, 133 genome assemblies have been publicly released and reported in 19 international journals.
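
For readers less familiar with the N50 statistic quoted for NH1.0, the following is a minimal sketch under the commonly used definition: N50 is the length of the shortest sequence in the smallest set of longest contigs (or scaffolds) that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not part of GWH or its quality-control pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence taken is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical values, not NH1.0 data).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))  # 8: 10 + 9 + 8 = 27 reaches half of the total (54)
```

The same function applies to scaffold lengths; only the input list changes.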

Genome Variation Map

The Genome Variation Map (GVM; https://bigd.big.ac.cn/gvm) (13) is a public database of genome variations, including single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). Unlike dbSNP, which only accepts human data submissions, GVM collects genome variations for a wide range of species and accepts submissions of different types of genome variations from all over the world. In the current version, GVM incorporates a total of ∼8.4 billion variants for 13 animals and 19 plants, including 7.2 billion SNPs and 1.2 billion indels. Compared with the previous version, it has been updated by integrating 47 million variants from two newly added species (diploid wheat and cat). In addition, GVM has accepted 24 genome variation data submissions involving 23 056 samples from 10 species.

Non-coding RNA Resources

NONCODE (http://www.noncode.org) (14) is an integrated knowledgebase dedicated to the complete collection and annotation of non-coding RNAs (ncRNAs). Almost all types of ncRNAs (excluding tRNAs and rRNAs) were filtered automatically from the literature and other public databases and were later manually curated. The ncRNA sequences and their related information (such as chromosomal information, conservation, function, etc.) were collected and recorded. A BLAST alignment search service and access through our custom UCSC Genome Browser were also incorporated. In the current version (v5.0), 17 species are included in NONCODE (human, mouse, cow, rat, chicken, fruit fly, zebrafish, nematode, yeast, Arabidopsis, chimpanzee, gorilla, orangutan, rhesus macaque, opossum, platypus and pig). Consequently, NONCODE collects a total of 548 640 long ncRNAs (lncRNAs), coupled with their expression profiles identified based on RNA-seq data for human and mouse as well as their predicted functions.
Moreover, it also includes human lncRNA–disease relationships and SNP–lncRNA–disease relationships, human exosome lncRNA expression profiles and predicted RNA secondary structures of human transcripts.

NPInter (http://bigdata.ibp.ac.cn/npinter) (15) is a database that documents experimentally identified functional interactions between ncRNAs (except tRNAs and rRNAs), especially lncRNAs, and protein-related biomacromolecules (proteins, mRNAs or genomic DNAs). NPInter provides the scientific community with a comprehensive and integrated tool for efficient browsing and extraction of information on interactions between ncRNAs and biomolecules. With the development of high-throughput biotechnologies, such as cross-linking immunoprecipitation (CLIP-seq) and Chromatin Isolation by RNA purification (ChIRP-seq), the number of known ncRNA interactions has grown rapidly in recent years. In the current release, NPInter houses 609 020 RNA–RNA interactions, 488 315 RNA–protein interactions and 892 737 RNA–DNA interactions, and provides more user-friendly interfaces and functional modules.

piRBase (http://www.regulatoryrna.org/database/piRNA/) (16) is a comprehensive database of piRNA sequences; piRNAs are a class of small RNAs that are mainly expressed in the animal germ line. piRBase integrates various piRNA-related high-throughput data in multiple species, leading to the largest collection of piRNAs and their annotations. Since its launch in 2014, piRBase has incorporated 264 datasets from 21 organisms and accordingly houses a total of ∼173 million piRNAs to date. Furthermore, piRBase provides comprehensive annotations of piRNA sequences and genomic loci as well as piRNA targets and disease-related piRNAs.
In addition, epigenetic and post-transcriptional regulation data were systematically integrated to support piRNA functional studies.

LncBook (17) (https://bigd.big.ac.cn/lncbook) and LncRNAWiki (18) (https://bigd.big.ac.cn/lncrnawiki) are two dedicated resources of human lncRNAs, built through expert curation and community curation, respectively. In the past year, LncBook has been updated by removing 1196 redundant lncRNA transcripts and updating genomic annotations of 1046 lncRNA transcripts. As a result, LncBook provides a high-quality collection of 268 848 non-redundant lncRNA transcripts and 140 356 lncRNA genes. LncBook also presents tissue-specific lncRNAs (TS lncRNAs) for different tissues; among the 32 tissues, testis has the largest number of TS lncRNAs (9024 lncRNAs), followed by brain (2297 lncRNAs). In addition, LncBook is equipped with an online tool for coding potential prediction, which is able to accurately identify lncRNAs in a wide range of species (19). LncRNAWiki (18), a wiki-based platform for community curation of human lncRNAs, has been updated by curating 291 human lncRNAs with functional experimental evidence, including 149 newly added lncRNAs and 142 existing lncRNAs with updated publications. Also, 65 lncRNAs found to be redundant based on the approved and alias symbols (https://www.genenames.org) were removed. Consequently, in the current release, the number of functionally validated human lncRNAs in LncRNAWiki has grown to 1951. Together, LncBook and LncRNAWiki have great potential to achieve comprehensive integration of human lncRNAs and their annotations (20).

RNA Editing Resources

Editome Disease Knowledgebase (EDK; https://bigd.big.ac.cn/edk) (21) and Plant Editosome Database (PED; https://bigd.big.ac.cn/ped) (22) are two RNA editing resources for human and plants, respectively.
In the updated version, EDK incorporates two new diseases associated with 51 experimentally validated abnormal editing events located in six mRNAs, and 10 aberrant activities involving two editing enzymes. Furthermore, to provide an easy-to-use and downloadable reference for further functional investigation of individual RNA editing events, EDK incorporates detailed structured annotation information for each editing site, including gene, specific gene region, molecular effect, editing enzyme, and associated disease and/or phenotype. As a featured database of the RNA editosome in plants (22,23), PED has been updated by integrating two more editing factors, which were recently verified to be involved in RNA editing processes and related to important phenotypes in Arabidopsis and a new maize variety. Collectively, EDK and PED integrate more valuable information on editing enzymes (factors) and/or editing events associated with phenotypes, so as to facilitate systematic investigations of the RNA editing machinery in both human and plants.

MethBank

The Methylation Bank (MethBank; https://bigd.big.ac.cn/methbank) (24,25) is a databank of genome-wide DNA methylomes across a variety of species, with particular focus on human health and aging, animal embryonic development and plant growth and development. In the current version, MethBank offers 43 consensus reference methylomes (CRMs) for human, made possible by the large-scale DNA methylation array data that are publicly available. These are sourced from 10 healthy human tissues, including 4577 peripheral blood samples, 26 prostate samples, 241 saliva samples, 322 skin samples, 98 breast samples, 38 colon samples, 206 kidney samples, 50 liver samples, 150 lung samples and 56 thyroid samples. In addition to CRMs, MethBank provides single-base resolution methylomes (SRMs) based on whole-genome bisulfite sequencing data from human, plants and animals.
To date, MethBank includes 40 SRMs from 26 healthy human tissues, 336 from different developmental stages in five economically important plants and 18 from gametes and early embryos in two model animals. In addition, MethBank provides information on methylation data analysis tools, helping users to find any tool of interest easily.

EWAS Atlas

EWAS Atlas (https://bigd.big.ac.cn/ewas) (26) is a curated knowledgebase of epigenome-wide association studies. During the past year, it has been enriched by adding a total of 121 156 EWAS associations manually extracted and curated from 191 publications. Notably, as the MethylationEPIC (850K/EPIC) array has become increasingly popular, the number of 850K-based publications in EWAS Atlas has increased accordingly. In addition, the online trait enrichment tool was further enhanced and the EWAS knowledge graph (https://bigd.big.ac.cn/ewas/network) was newly developed to visualize and explore trait-gene networks. As of September 2019, EWAS Atlas has integrated 450 328 high-quality EWAS associations derived from 1003 studies in 401 publications, including 135 tissues/cell lines, 409 traits, 2689 cohorts and 409 ontology entities.

Information Commons for Rice

Information Commons for Rice (IC4R; http://ic4r.org) (27,28) is a comprehensive resource dedicated to integrating multi-omics data for rice. To improve the completeness of gene structure and identify novel genes, the current implementation of IC4R incorporates a new gene annotation system, IC4R-2.0, that is built on 1503 public RNA-seq datasets, accordingly achieving higher integrity and quality than previous annotation systems.
Specifically, IC4R-2.0 contains 56 221 protein-coding gene loci corresponding to 80 039 mRNAs, among which more than 27 000 gene loci are substantially improved with structural modification, 456 novel genes are identified, and 3215 lncRNAs and 4373 circular RNAs are annotated. In addition, although IC4R offers a high-density rice variation map of ∼18 million SNPs, these raw SNPs are not readily usable for population genetics, evolutionary analysis, association studies or genomic breeding in rice. To satisfy the various needs of rice researchers for mining the integrated genotypic data, a dedicated module, SnpReady for Rice (SR4R; http://sr4r.ic4r.org), has been developed and deployed in IC4R. SR4R features the lowest SNP redundancy and highest genetic diversity of rice populations. Currently, SR4R mainly integrates four reference SNP panels: ‘hapmapSNPs’ obtained after data filtration and genotype imputation, ‘tagSNPs’ selected by linkage disequilibrium (LD)-based redundancy removal, ‘fixedSNPs’ selected from genes exhibiting selective sweep signatures, and ‘barcodeSNPs’ selected by DNA fingerprinting simulation. The associated SNPs in these four panels as well as online toolkits are publicly available and downloadable.

LSD

The leaf senescence database (LSD; https://bigd.big.ac.cn/lsd) (29,30) is dedicated to the comprehensive collection of senescence-associated genes (SAGs) and their corresponding mutants through manual curation. In the current version (v3.0; see an update in (31) in this issue), LSD incorporates 5853 SAGs and 617 mutants from 68 species. Notably, it integrates leaf senescence-associated transcriptome data in Arabidopsis, rice, soybean and poplar and identifies senescence-differentially expressed small RNAs (Sen-smRNAs) in Arabidopsis. Moreover, LSD contains senescence phenotypes of 90 natural accessions (ecotypes) and 42 images of ecotypes in Arabidopsis and collects mutant seed information of SAGs in rice.
Also, interaction pairs between Sen-smRNAs and senescence-associated transcription factors are integrated into LSD. Collectively, the updated LSD has great potential to continue to provide useful information for the plant research community.

Database Commons

Database Commons (https://bigd.big.ac.cn/databasecommons), a catalog of global biological databases, provides open access to a comprehensive collection of publicly available databases and their descriptive metadata. Currently, it catalogues a total of 4615 databases, involving more than 7000 publications and ∼2000 organizations throughout the world. In the past year, Database Commons has been updated by assigning category tag(s) to each database, linking related databases and providing citation information according to Europe PMC (32). Importantly, to improve the quality of descriptive metadata for each database, we sent invitations to database owners (identified from the publications) to call for community curation of their own databases. As a result, a total of 287 database owners have responded and made valuable curations to 345 databases.

eGPS Cloud

eGPS Cloud (http://egpscloud.big.ac.cn) (33) is a multi-functional web portal that integrates comprehensive multi-omics tools and provides online data analysis services for studying evolutionary Genotype-Phenotype Systems (eGPS). In the current release, eGPS Cloud is equipped with 15 tools and 20 visualization scripts, accordingly delivering four modularized web services, that is, genomics data analysis, population data analysis, evolutionary & network data analysis, and multi-omics data visualization. It allows users to configure customized parameters for different tools and perform various data analyses online in a straightforward and friendly manner. Ongoing efforts aim to link eGPS Cloud with GSA in order to provide users with seamless services for raw sequence data analysis.
BIG Search

BIG Search (https://bigd.big.ac.cn/search) is a distributed and scalable full-text search engine built on Elasticsearch (a highly scalable open-source search and analytics engine, https://www.elastic.co/). It features cross-domain search and helps users access a wide range of biological data almost in real time. In the current version, BIG Search includes data indexes from all NGDC resources and 25 partner resources (see details at https://bigd.big.ac.cn/partners). Additionally, EBI data resources have also been integrated into BIG Search, powered by the EBI Search RESTful API (34). In summary, BIG Search has been significantly updated by incorporating more data indexes from internal and external resources and displaying search results in a more user-friendly manner.

BIG Submission

BIG Submission (https://bigd.big.ac.cn/gsub) is a one-stop submission portal that provides submission services for a series of database resources in NGDC, including BioProject, BioSample, GSA, GWH and GVM. During the past year, BIG Submission has been upgraded by optimizing the web interfaces and expanding the storage and computing resources to meet the needs of rapidly growing data submissions. Importantly, it has been equipped with Aspera, a high-speed transfer tool that can greatly improve data transfer efficiency and provide users with a better submission experience.

BIG SSO

BIG Single Sign-On (SSO; https://bigd.big.ac.cn/sso) is a user access control system in which a single authentication provides access to multiple applications by passing the authentication token seamlessly to configured applications.
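The token-passing idea behind single sign-on can be illustrated with a minimal sketch. It is not the BIG SSO implementation, whose token format and validation scheme are not described here. It assumes a hypothetical shared secret (`SHARED_SECRET`) known to the sign-on service and to each configured application, and an HMAC-signed token with an expiry time. The function names `issue_token` and `verify_token` are illustrative only.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical secret shared by the sign-on service and configured applications.
SHARED_SECRET = b"demo-secret"


def issue_token(user, now=None, ttl=3600, secret=SHARED_SECRET):
    """Sign-on service: authenticate once, then issue a signed, expiring token."""
    issued_at = time.time() if now is None else now
    payload = {"user": user, "exp": issued_at + ttl}
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(secret, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()


def verify_token(token, now=None, secret=SHARED_SECRET):
    """Configured application: accept the token without asking the user to log in again.

    Returns the user name, or None if the token is malformed, tampered with or expired.
    """
    try:
        body, sig = token.encode().rsplit(b".", 1)
    except ValueError:
        return None  # no signature part
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if payload["exp"] < current:
        return None  # expired
    return payload["user"]
```

In this sketch, any application holding the same secret can call `verify_token` on a token issued elsewhere, which is what lets one login serve several applications. Such a token would need to travel over an encrypted channel, since anyone who intercepts it could reuse it until it expires.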
In the past year, HTTPS has been deployed on all web sites for secure data transfer, and the BIG SSO system has accordingly been updated to be safer and more reliable. Meanwhile, services for user registration and update have been enhanced and delivered as a micro-service.

CONCLUDING REMARKS

NGDC provides a family of database resources through big data deposition, integration and translation, with the aim of supporting worldwide research activities in both academia and industry. In the past year, it has been significantly updated by archiving more data submissions, performing value-added curation, and improving web interfaces and services. Most importantly, it has been enhanced as the national center by joint efforts from BIG, IBP and SINH, forming an excellent line-up of field experts from the three institutions. Ongoing and future efforts include the standardization of data models and curation processes, unification of web interfaces and SSO authentication across database resources, establishment of cloud infrastructure for big data storage and transfer, and development of a variety of databases and tools to facilitate the translation of big data into big discovery. NGDC is open to worldwide collaborations, and particularly seeks to collaborate with INSDC members on big data archiving. In addition, NGDC promotes big data sharing at a worldwide scale by setting up the Global Biodiversity and Health Big Data Alliance (BHBD; http://bhbd-alliance.org); by July 2019, 20 organizational members from 11 countries had joined the BHBD Alliance, with active collaborations in organizing international meetings/symposia, training courses and joint research projects. With more stable support from the government and CAS, NGDC will continue to grow and deliver a wide range of data resources and services in aid of both domestic and international research activities.
ACKNOWLEDGEMENTS

We thank a number of users for submitting data, sending suggestions, reporting bugs and getting involved in community curation. The National Genomics Data Center is indebted to its funders, including the Ministry of Science & Technology and the Ministry of Finance of the People’s Republic of China as well as the Chinese Academy of Sciences. We would like to express our sincere thanks to the late Professor Bailin Hao (1934–2018), a leading bioinformatician of his generation, who had advocated the establishment of a national center since the 1990s.

FUNDING

Strategic Priority Research Program of the Chinese Academy of Sciences [XDA19050302, XDB13040500, XDB13040100]; National Key Research & Development Program of China [2018YFD1000505, 2018YFC2000100, 2018YFC1406902, 2018YFC0910400, 2018YFC0310602, 2017YFC1201200, 2017YFC0908405, 2017YFC0908404, 2017YFC0908403, 2017YFC0907505, 2017YFC0907503, 2017YFC0907502, 2016YFE0206600, 2016YFC0906403, 2016YFC0903003, 2016YFC0901904, 2016YFC0901903, 2016YFC0901702, 2016YFC0901604, 2016YFC0901603, 2016YFB0201702]; National Natural Science Foundation of China [91731303, 81670462, 31970565, 31871328, 31871294, 31801104, 31771465, 31771410, 31771388, 31671360, 31571358, 31525014, 1470330, 31961130380, 31711530221]; UK Royal Society-Newton Advanced Fellowship [NAF\R1\191094]; International Partnership Program of the Chinese Academy of Sciences [153F11KYSB20160008, 153D31KYSB20170121]; 13th Five-year Informatization Plan of Chinese Academy of Sciences [XXH13505-05]; Key Program of the Chinese Academy of Sciences [KJZD-EW-L14]; Key Research Program of Frontier Sciences of the Chinese Academy of Sciences [QYZDJ-SSW-SYS009]; Key Technology Talent Program of the Chinese Academy of Sciences; The 100 Talent Program of the Chinese Academy of Sciences; K.C.
Wong Education Foundation; The Youth Innovation Promotion Association of the Chinese Academy of Sciences [2019104, 2018134, 2017141]; The Special Project on Precision Medicine under the National Key R&D Program [SQ2017YFSF090210]; The Open Biodiversity and Health Big Data Initiative of IUBS. Funding for open access charge: Strategic Priority Research Program of the Chinese Academy of Sciences. Conflict of interest statement. None declared. REFERENCES 1. BIG Data Center Members (2017) The BIG Data Center: from deposition to integration to translation. Nucleic Acids Res., 45, D18–D24. 2. BIG Data Center Members (2018) Database resources of the BIG data center in 2018. Nucleic Acids Res., 46, D14–D20. 3. BIG Data Center Members (2019) Database resources of the BIG data center in 2019. Nucleic Acids Res., 47, D8–D14. 4. Zhang,C., Gao,Y., Ning,Z., Lu,Y., Zhang,X., Liu,J., Xie,B., Xue,Z., Wang,X., Yuan,K. et al. (2019) PGG.SNV: Understanding the evolutionary and medical implications of human single nucleotide variations in diverse populations. Genome Biol., doi:10.1186/s13059-019-1838-5. 5. Gao,Y., Zhang,C., Yuan,L., Ling,Y., Wang,X., Liu,C., Pan,Y., Zhang,X., Ma,X., Wang,Y. et al. (2020) PGG.Han: The Han Chinese Genome Database and analysis platform. Nucleic Acids Res., doi:10.1093/nar/gkz829. 6. Zeng,J., Yuan,N., Zhu,J., Pan,M., Zhang,H., Wang,Q., Shi,S., Du,Z. and Xiao,J. (2019) CGVD: a genomic variation database for Chinese populations. Nucleic Acids Res., doi:10.1093/nar/gkz952. 7. Du,Z., Ma,L., Qu,H., Chen,W., Zhang,B., Lu,X., Zhai,W., Sheng,X., Sun,Y., Li,W. et al. (2019) Whole genome analyses of chinese population and De Novo assembly of a northern han genome. Genomics Proteomics Bioinform., 17, 229–247. 8. Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311. 9. Tian,D., Wang,P., Tang,B.-X., Teng,X., Li,C., Liu,X., Zou,D., Song,S. and Zhang,Z. 
(2019) GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res., doi:10.1093/nar/gkz828. 10. Xiong,Z., Li,M., Yang,F., Ma,Y., Sang,J., Li,R., Li,Z., Zhang,Z. and Bao,Y.-M. (2019) EWAS Data Hub: a resource of DNA methylation array data and metadata. Nucleic Acids Res., doi:10.1093/nar/gkz840. 11. Wang,Y., Song,F., Zhu,J., Zhang,S., Yang,Y., Chen,T., Tang,B., Dong,L., Ding,N., Zhang,Q. et al. (2017) GSA: Genome Sequence Archive. Genomics Proteomics Bioinform., 15, 14–18. 12. Buels,R., Yao,E., Diesh,C.M., Hayes,R.D., Munoz-Torres,M., Helt,G., Goodstein,D.M., Elsik,C.G., Lewis,S.E., Stein,L. et al. (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol., 17, 66. 13. Song,S., Tian,D., Li,C., Tang,B., Dong,L., Xiao,J., Bao,Y., Zhao,W., He,H. and Zhang,Z. (2018) Genome Variation Map: a data repository of genome variations in BIG Data Center. Nucleic Acids Res., 46, D944–D949. 14. Fang,S., Zhang,L., Guo,J., Niu,Y., Wu,Y., Li,H., Zhao,L., Li,X., Teng,X., Sun,X. et al. (2018) NONCODEV5: a comprehensive annotation database for long non-coding RNAs. Nucleic Acids Res., 46, D308–D314. 15. Hao,Y., Wu,W., Li,H., Yuan,J., Luo,J., Zhao,Y. and Chen,R. (2016) NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database (Oxford), 2016, baw057. 16. Wang,J., Zhang,P., Lu,Y., Li,Y., Zheng,Y., Kan,Y., Chen,R. and He,S. (2019) piRBase: a comprehensive database of piRNA sequences. Nucleic Acids Res., 47, D175–D180. 17. Ma,L., Cao,J., Liu,L., Du,Q., Li,Z., Zou,D., Bajic,V.B. and Zhang,Z. (2019) LncBook: a curated knowledgebase of human long non-coding RNAs. Nucleic Acids Res., 47, D128–D134. 18. Ma,L., Li,A., Zou,D., Xu,X., Xia,L., Yu,J., Bajic,V.B. and Zhang,Z.
(2015) LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res., 43, D187–D192. 19. Wang,G., Yin,H., Li,B., Yu,C., Wang,F., Xu,X., Cao,J., Bao,Y., Wang,L., Abbasi,A.A. et al. (2019) Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics, 35, 2949–2956. 20. Ma,L., Cao,J., Liu,L., Li,Z., Shireen,H., Pervaiz,N., Batool,F., Raza,R.Z., Zou,D., Bao,Y. et al. (2019) Community curation and expert curation of human long noncoding RNAs with LncRNAWiki and LncBook. Curr. Protoc. Bioinform., 67, e82. 21. Niu,G., Zou,D., Li,M., Zhang,Y., Sang,J., Xia,L., Li,M., Liu,L., Cao,J., Zhang,Y. et al. (2019) Editome Disease Knowledgebase (EDK): a curated knowledgebase of editome-disease associations in human. Nucleic Acids Res., 47, D78–D83. 22. Li,M., Xia,L., Zhang,Y., Niu,G., Li,M., Wang,P., Zhang,Y., Sang,J., Zou,D., Hu,S. et al. (2019) Plant editosome database: a curated database of RNA editosome in plants. Nucleic Acids Res., 47, D170–D174. 23. Lo Giudice,C., Hernandez,I., Ceci,L.R., Pesole,G. and Picardi,E. (2019) RNA editing in plants: A comprehensive survey of bioinformatics tools and databases. Plant Physiol. Biochem., 137, 53–61. 24. Li,R., Liang,F., Li,M., Zou,D., Sun,S., Zhao,Y., Zhao,W., Bao,Y., Xiao,J. and Zhang,Z. (2018) MethBank 3.0: a database of DNA methylomes across a variety of species. Nucleic Acids Res., 46, D288–D295. 25. Zou,D., Sun,S., Li,R., Liu,J., Zhang,J. and Zhang,Z. (2015) MethBank: a database integrating next-generation sequencing single-base-resolution DNA methylation programming data. Nucleic Acids Res., 43, D54–D58. 26. Li,M., Zou,D., Li,Z., Gao,R., Sang,J., Zhang,Y., Li,R., Xia,L., Zhang,T., Niu,G. et al. (2019) EWAS Atlas: a curated knowledgebase of epigenome-wide association studies. Nucleic Acids Res., 47, D983–D988. 27. IC4R Project Consortium. (2016) Information Commons for Rice (IC4R). Nucleic Acids Res., 44, D1172–D1180. 28. 
Xia,L., Zou,D., Sang,J., Xu,X., Yin,H., Li,M., Wu,S., Hu,S., Hao,L. and Zhang,Z. (2017) Rice Expression Database (RED): an integrated RNA-Seq-derived gene expression database for rice. J. Genet. Genomics, 44, 235–241. 29. Li,Z., Zhao,Y., Liu,X., Peng,J., Guo,H. and Luo,J. (2014) LSD 2.0: an update of the leaf senescence database. Nucleic Acids Res., 42, D1200–D1205. 30. Liu,X., Li,Z., Jiang,Z., Zhao,Y., Peng,J., Jin,J., Guo,H. and Luo,J. (2011) LSD: a leaf senescence database. Nucleic Acids Res., 39, D1103–D1107. 31. Li,Z., Zhang,Y., Zou,D., Zhao,Y., Wang,H.-L., Zhang,Y., Xia,X., Luo,J., Guo,H. and Zhang,Z. (2019) LSD 3.0: a comprehensive resource for the leaf senescence research community. Nucleic Acids Res., doi:10.1093/nar/gkz898. 32. Levchenko,M., Gou,Y., Graef,F., Hamelers,A., Huang,Z., Ide-Smith,M., Iyer,A., Kilian,O., Katuri,J., Kim,J.H. et al. (2018) Europe PMC in 2017. Nucleic Acids Res., 46, D1254–D1260. 33. Yu,D., Dong,L., Yan,F., Mu,H., Tang,B., Yang,X., Zeng,T., Zhou,Q., Gao,F., Wang,Z. et al. (2019) eGPS 1.0: comprehensive software for multi-omic and evolutionary analyses. Natl. Sci. Rev., doi:10.1093/nsr/nwz079. 34. Madeira,F., Park,Y.M., Lee,J., Buso,N., Gur,T., Madhusoodanan,N., Basutkar,P., Tivey,A.R.N., Potter,S.C., Finn,R.D. et al. (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res., 47, W636–W641. 
APPENDIX Corresponding author: Zhang Zhang1,2,3,10,11,* Co-corresponding authors: Wenming Zhao1,2,3,10,* , Jingfa Xiao1,2,3,10,* , Yiming Bao1,2,3,10,11,* , Shunmin He1,4,10,* , Guoqing Zhang1,5,* , Yixue Li1,5,* , Guoping Zhao1,5,6,7,* , Runsheng Chen1,4,10,* NGDC MEMBERS (Arranged by project role and then by contribution except for Team Leader (TL), as indicated) PGG.Han: Yang Gao5,# , Chao Zhang5,# , Liyun Yuan5,# , Guoqing Zhang1,5,* (TL), Shuhua Xu5,14,15,16 (TL) PGG.SNV: Chao Zhang5,# , Yang Gao5,# , Zhilin Ning5,# , Yan Lu5,# , Shuhua Xu5,14,15,16 (TL) CGVD: Jingyao Zeng1,2,3,# , Na Yuan1,2,# , Junwei Zhu1,2 , Mengyu Pan1,2 , Hao Zhang1,2,3,10 , Qi Wang1,2,3,10 , Shuo Shi1,2,3,10 , Meiye Jiang1,2,3,10 , Mingming Lu1,2,3,10 , Qiheng Qian1,2,3,10 , Qianwen Gao1,2,3,10 , Yunfei Shang1,2,3,10 , Jinyue Wang1,2,3,10 , Zhenglin Du1,2,# (TL), Jingfa Xiao 1,2,3,10,* (TL) GWAS Atlas: Dongmei Tian1,2,# , Pei Wang1,2,3,10,# , Bixia Tang1,2,# , Cuiping Li1,2,# , Xufei Teng1,2,3,10 , Xiaonan Liu1,2,3,10 , Dong Zou1,2,3 , Shuhui Song1,2,3,# (TL) EWAS Data Hub: Zhuang Xiong1,2,3,10,# , Mengwei Li1,2,3,10,# , Fei Yang1,2,3,10,# , Yingke Ma1,2,3 , Jian Sang1,2,3,10 , Zhaohua Li 1,2,3,10,11 , Rujiao Li1,2,3,# (TL) iSheep: Zhonghuang Wang1,2,10,# , Qianghui Zhu9,10,# , Junwei Zhu1,2 , Xin Li9 , Sisi Zhang1,2 , Dongmei Tian1,2 , Hailong Kang1,2,10 , Cuiping Li1,2 , Lili Dong1,2 , Cui Ying1,2,10 , Guangya Duan1,2,10 , Shuhui Song1,2,3 , Menghua Li9,10 (TL), Wenming Zhao1,2,3,10,* (TL) eLMSG: Xiaoyang Zhi12,# (TL), Yunchao Ling5,# , Ruifang Cao5,# , Zhao Jiang12 , Haokui Zhou7 , Daqing Lv5 , Wan Liu5 , Hans-Peter Klenk13 , Guoping Zhao1,5,6,7,* , Guoqing Zhang1,5,* (TL) PADS: Yadong Zhang1,2,3,10,# , Zhewen Zhang1,2,3,# , Hao Zhang1,2,3,10 , Jingfa Xiao1,2,3,10,* (TL) BioProject & BioSample & GSA & BIG Submission: Tingting Chen1,2,# , Sisi Zhang1,2,# , Xu Chen1,2,# , Junwei Zhu1,2,# , Zhonghuang Wang1,2,3,10 , Hailong Kang1,2,3,10 , Lili Dong1,2 , Yanqing Wang1,2,# (TL) GWH: 
Yingke Ma1,2,3,# , Song Wu1,2,3,10 , Zhaohua Li1,2,3,10,11 , Zheng Gong1,2,3,10 , Meili Chen1,2,3,# (TL) GVM: Cuiping Li1,2,# , Dongmei Tian1,2,# , Xufei Teng1,2,3,10,# , Pei Wang1,2,3,10,# , Bixia Tang1,2,# , Xiaonan Liu1,2,3,10 , Dong Zou1,2,3 , Shuhui Song1,2,3,# (TL) NONCODE: Shuangsang Fang8 , Lili Zhang4,10 , Jincheng Guo8 , Yiwei Niu4,10 , Yang Wu8 , Hui Li8 , Lianhe Zhao8 , Xiyuan Li8 , Xueyi Teng4,10 , Xianhui Sun4,10 , Liang Sun8 , Runsheng Chen1,4,10,* , Yi Zhao8 (TL) piRBase: Jiajia Wang4,10,# , Peng Zhang4,# , Yanyan Li4,10 , Yu Zheng4,10 , Runsheng Chen1,4,10,* , Shunmin He1,4,10,* (TL) NPInter: Xueyi Teng4,10,# , Xiaomin Chen4,10,# , Hua Xue4,10,# , Yiheng Teng4,10 , Peng Zhang4 , Quan Kang4 , Yajing Hao4 , Yi Zhao8 , Runsheng Chen1,4,10,* , Shunmin He1,4,10,* (TL) LncBook & LncRNAWiki: Jiabao Cao1,2,3,10,# , Lin Liu1,2,3,10,# , Zhao Li1,2,3,10,# , Qianpeng Li1,2,3,10 , Dong Zou1,2,3 , Qiang Du1,2,3,10 , Amir A. Abbasi25 , Huma Shireen25 , Nashaiman Pervaiz25 , Fatima Batool25 , Rabail Z.
Raza25 , Lina Ma1,2,3,# (TL) EDK & PED: Guangyi Niu1,2,3,10,# , Yuansheng Zhang1,2,3,10,# , Dong Zou1,2,3,# , Tongtong Zhu1,2,3,10,11 , Jian Sang1,2,3,10 , Mengwei Li1,2,3,10 , Lili Hao1,2,3,# (TL) MethBank: Dong Zou1,2,3,# , Guoliang Wang24,# , Mengwei Li1,2,3,10,# , Rujiao Li1,2,3,# (TL) EWAS Atlas: Mengwei Li1,2,3,10,# , Rujiao Li1,2,3 , Yiming Bao1,2,3,10,11,* (TL) IC4R: Jun Yan17,# , Jian Sang1,2,3,10,# , Dong Zou1,2,3,# , Chen Li22 , Zhennan Wang10,23 , Yuansheng Zhang1,2,3,10 , Tongtong Zhu1,2,3,10,11 , Shuhui Song1,2,3 (TL), Xiangfeng Wang17 (TL), Lili Hao1,2,3 (TL) LSD: Zhonghai Li18,# (TL), Yang Zhang1,2,3,10,# , Dong Zou1,2,3 , Yi Zhao19 , Houling Wang18 , Yi Zhang18 , Xinli Xia18,20 , Hongwei Guo18,21 , Zhang Zhang1,2,3,10,11,* Database Commons: Dong Zou1,2,3,# , Lina Ma1,2,3,# (TL) eGPS Cloud: Lili Dong1,2,# , Bixia Tang1,2,# , Junwen Zhu1,2,# , Qing Zhou1,2,10 , Zhonghuang Wang1,2,10 , Hongen Kang1,2,10 , Xu Chen1,2 , Li Lan1,2 , Yiming Bao1,2,3,10,11,* (TL), Wenming Zhao1,2,3,10,* (TL) BIG Search: Dong Zou1,2,3,# (TL) BIG SSO: Junwei Zhu1,2,# (TL), Bixia Tang1,2,# BHBD: Yiming Bao1,2,3,10,11,* , Li Lan1,2 , Xin Zhang1,2 , Yingke Ma1,2,3 , Yongbiao Xue26 (Project Leader) Hardware & System Administration: Yubin Sun1,2 , Shuang Zhai1,2 , Lei Yu1,2 , Mingyuan Sun1,2 , Huanxin Chen1,2 (TL) Writing Group: Zhang Zhang1,2,3,10,11,* , Wenming Zhao1,2,3,10,* , Jingfa Xiao1,2,3,10,* , Yiming Bao1,2,3,10,11,* , Lili Hao1,2,3 NGDC PARTNERS (Listed in alphabetical order by database names) AnimalTFDB: Hui Hu27 , An-Yuan Guo27 dbPAF & WERAM: Shaofeng Lin27 , Yu Xue27 dbPPT: Chenwei Wang27 , Yu Xue27 dbPSP: Wanshan Ning27 , Yu Xue27 CellMarker: Xinxin Zhang28 , Yun Xiao28 , Xia Li28 CGDB: Yiran Tu27 , Yu Xue27 circAtlas: Wanying Wu29 , Peifeng Ji29 , Fangqing Zhao29 DEG & DoriC: Hao Luo30,31,32 , Feng Gao30,31,32 iEKPD: Yaping Guo27 , Yu Xue27 GenTree: Hao Yuan33,34 , Yong E. 
Zhang10,33,34 hTFtarget: Qiong Zhang27 , An-yuan Guo27 iUUCD: Jiaqi Zhou27 , Yu Xue27 LncRNADisease: Zhou Huang35 , Qinghua Cui35,36 lncRNASNP: Ya-Ru Miao27 , An-Yuan Guo27 MiCroKiTS: Chen Ruan27 , Yu Xue27 PceRBase: Chunhui Yuan37 , Ming Chen37 PlantTFDB: Jin-Pu Jin38 , Feng Tian38 , Ge Gao38 PLMD: Ying Shi27 , Yu Xue27 PTMD: Lan Yao27 , Yu Xue27 , Qinghua Cui35,36 RhesusBase: Xiangshang Li39 , Chuan-Yun Li39 SEGreg: Qing Tang27 , An-Yuan Guo27 THANATOS: Di Peng27 , Yu Xue27 1 National Genomics Data Center, Beijing 100101, China 2 BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China 3 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China 4 Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China 5 Bio-Med Big Data Center, Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200231, China 6 CAS Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200231, China 7 Center for Quantitative Synthetic Biology, Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China 8 Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China 9 CAS Key Laboratory of Animal Ecology and Conservation Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China 10 University of Chinese Academy of Sciences, Beijing 100049, China 11 School of Future Technology, University of Chinese Academy of Sciences, Beijing 100049, China 12 Yunnan Institute of 
Microbiology, School of Life Sciences, Yunnan University, Kunming, Yunnan 650091, China 13 School of Natural and Environmental Sciences, Ridley Building 2, Newcastle University, Newcastle upon Tyne, UK 14 School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China 15 Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China 16 Collaborative Innovation Center of Genetics and Development, Shanghai 200438, China 17 Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China 18 Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, Beijing Forestry University, Beijing 100083, China 19 College of Life Sciences, Peking University, Beijing 100871, China 20 College of Biological Sciences and Biotechnology, National Engineering Laboratory for Tree Breeding, Beijing Forestry University, Beijing 100083, China 21 Institute of Plant and Food Science, Department of Biology, Southern University of Science and Technology (SUSTech), Shenzhen, Guangdong 518055, China 22 Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China 23 Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China 24 College of Plant Protection, Hunan Agricultural University, Hunan 410128, China 25 National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan 26 Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China 27 Department of Bioinformatics and Systems Biology, Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China 28 College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China 29 Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China 30 Department of Physics, School of Science, Tianjin University, Tianjin 300072, China 31 Frontier Science Center of Synthetic Biology, Key Laboratory of Systems Bioengineering, Tianjin University, Tianjin 300072, China 32 SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China 33 Key Laboratory of Zoological Systematics and Evolution and State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China 34 CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan 650223, China 35 Department of Biomedical Informatics, School of Basic Medical Sciences, MOE Key Lab of Cardiovascular Sciences, Center
for Noncoding RNA Medicine, Peking University, Beijing 100190, China 36 Center of Bioinformatics, Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China 37 Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, Institute of Plant Science, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China 38 Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences, Peking University, Beijing 100871, China 39 Institute of Molecular Medicine, Peking University, Beijing 100871, China *To whom correspondence should be addressed: Zhang Zhang (zhangzhang@big.ac.cn). Correspondence may also be addressed to Wenming Zhao (zhaowm@big.ac.cn), Jingfa Xiao (xiaojingfa@big.ac.cn), Yiming Bao (baoym@big.ac.cn), Shunmin He (heshunmin@ibp.ac.cn), Guoqing Zhang (gqzhang@picb.ac.cn), Yixue Li (yxli@sibs.ac.cn), Guoping Zhao (gpzhao@sibs.ac.cn) and Runsheng Chen (crs@sun5.ibp.ac.cn). # The authors wish it to be known that, in their opinion, these authors should be regarded as Joint First Authors.