D18–D28 Nucleic Acids Research, 2021, Vol. 49, Database issue Published online 11 November 2020
doi: 10.1093/nar/gkaa1022
Database Resources of the National Genomics Data
Center, China National Center for Bioinformation in
2021
CNCB-NGDC Members and Partners*,†
Received September 14, 2020; Revised October 13, 2020; Editorial Decision October 14, 2020; Accepted October 16, 2020
ABSTRACT
The National Genomics Data Center (NGDC), part of
the China National Center for Bioinformation (CNCB),
provides a suite of database resources to support
worldwide research activities in both academia
and industry. With the explosive growth of multiomics
data, CNCB-NGDC is continually expanding,
updating and enriching its core database resources
through big data deposition, integration and translation.
In the past year, considerable efforts have
been devoted to 2019nCoVR, a newly established
resource providing a global landscape of SARSCoV-2
genomic sequences, variants, and haplotypes,
as well as Aging Atlas, BrainBase, GTDB (Glycosyltransferases
Database), LncExpDB, and TransCirc
(Translation potential for circular RNAs). Meanwhile,
a series of resources have been updated and
improved, including BioProject, BioSample, GWH
(Genome Warehouse), GVM (Genome Variation Map),
GEN (Gene Expression Nebulas) as well as several
biodiversity and plant resources. Particularly, BIG
Search, a scalable, one-stop, cross-database search
engine, has been signiﬁcantly updated by providing
easy access to a large number of internal and external
biological resources from CNCB-NGDC, our
partners, EBI and NCBI. All of these resources along
with their services are publicly accessible at https:
//bigd.big.ac.cn.
INTRODUCTION
The National Genomics Data Center (NGDC), part of the
China National Center for Bioinformation (CNCB) officially
founded in November 2019, was built based on the
BIG Data Center, Beijing Institute of Genomics (BIG), Chinese
Academy of Sciences (CAS), with joint efforts and
collaborations from two CAS institutions, viz., Institute of
Biophysics (IBP) and Shanghai Institute of Nutrition and
Health (SINH) as well as several partners (https://bigd.big.
ac.cn/partners). Powered by higher-throughput and lowercost
genomics sequencing technologies, large-scale sequencing
projects for precision medicine and biodiversity studies
have been conducted around the world, leading to large
amounts of multi-omics data that are still generated at evergrowing
rates and scales. Therefore, CNCB-NGDC is dedicated
to advancing life and health sciences by providing
open access to a suite of data resources and services in
support of global research activities on big data archive,
storage, management and public sharing as well as multidisciplinary
data-driven research (1–4).
During the past year of 2020, an ongoing pandemic
caused by severe acute respiratory syndrome coronavirus
2 (SARS-CoV-2) has resulted in more than 27 million infected
cases and 897 000 deaths (as of 9 September 2020). To
provide SARS-CoV-2 genome sequences and variants publicly
available for the global research community (5), in the
past year, CNCB-NGDC has made considerable efforts to
build a SARS-CoV-2 information resource (6) by genomic
data collection, curation and deep-mining with extensive
updates on a daily basis. Additionally, CNCB-NGDC has
continued to expand and update other resources through
data deposition, integration and curation. In terms of
database property, database resources of CNCB-NGDC
can be generally grouped into three layers: Data––raw data
*To whom correspondence should be addressed. Tel: +86 10 84097261; Fax: +86 10 84097720; Email: ybxue@big.ac.cn
Correspondence may also be addressed to Yiming Bao. Email: baoym@big.ac.cn
Correspondence may also be addressed to Zhang Zhang. Email: zhangzhang@big.ac.cn
Correspondence may also be addressed to Wenming Zhao. Email: zhaowm@big.ac.cn
Correspondence may also be addressed to Jingfa Xiao. Email: xiaojingfa@big.ac.cn
Correspondence may also be addressed to Shunmin He. Email: heshunmin@ibp.ac.cn
Correspondence may also be addressed to Guoqing Zhang. Email: gqzhang@picb.ac.cn
Correspondence may also be addressed to Yixue Li. Email: yxli@sibs.ac.cn
Correspondence may also be addressed to Guoping Zhao. Email: gpzhao@sibs.ac.cn
Correspondence may also be addressed to Runsheng Chen. Email: crs@ibp.ac.cn
†
Full list is provided in the Appendix.
C The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License
(http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work
is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
Nucleic Acids Research, 2021, Vol. 49, Database issue D19
NATIO
NAL GENOMICS DATA
C
ENTER
CHINANATIO
NAL CENTER FOR BIOIN
FORMATION
INFORMATION
KNOW
LEDGE
DA
TA
GWH
GSA
Database Commons
BioSam
ple
BioProject
BioCode
iSheep
iDog
IC4R
GTDB
PADS
Arsenal
eLMSG2019nCoVR
TransCirc
NPInter
piRBase
NONCODE
NucMap
EWAS DataHub
MethBank
GEN
PGG.Han
PGG.SNV
GVM
EDK
PED
LncBook
BrainBase
GWAS Atlas
EWAS Atlas
Aging Atlas
Figure 1. Core data resources of CNCB-NGDC. Three categories, viz.,
data, information and knowledge, are adopted to represent resources that
are typically to deposit raw data/metadata (archives), house value-added
information (databases) and integrate validated knowledge through literature
curation (knowledgebases), respectively. A full list of data resources,
which contains links to each resource, is available at https://bigd.big.ac.cn/
databases.
and affiliated metadata, Information––standardized information
and analyzed results, and Knowledge––curated associations
and value-added knowledge. Here we provide a
brief overview of new databases and recent updates to existing
databases in CNCB-NGDC and describe its core resources
and services (Figure 1). All these resources, along
with their services, are publicly accessible through the home
page of CNCB-NGDC at https://bigd.big.ac.cn.
NEW DATABASES
2019nCoVR
The 2019 Novel Coronavirus Resource (2019nCoVR, https:
//bigd.big.ac.cn/ncov/) (6) is an open-accessed SARS-CoV-
2 information resource. It contains a comprehensive collection
of genome sequences and clinical information for all
publicly available SARS-CoV-2 isolates, which are manually
curated with quality evaluation and value-added annotations
by our in-house automated pipeline. Consequently,
it houses a dynamic landscape of SARS-CoV-2 genomic
variants and haplotypes on a global scale. Specifically,
2019nCoVR identifies all variants from complete and highquality
genomes, visualizes the spatiotemporal change for
each variant, and constructs haplotype network maps and
phylogenetic trees for the course of the outbreak. Moreover,
2019nCoVR offers a set of online tools covering various
needs for SARS-CoV-2 genomic data analysis. In addition,
it provides a full collection of literatures on COVID-19, including
published papers from PubMed as well as preprints
from bioRxiv and medRxiv through Europe PMC. Collectively,
all SARS-CoV-2 genome sequences, variants, haplotypes
and literatures are integrated and updated daily since
January 2020, making 2019nCoVR a valuable resource for
the global research community.
Aging Atlas
The Aging Atlas (https://bigd.big.ac.cn/aging; detailed in
(7) in this issue) is an integrative database in support of aging
research. It provides open access to large-scale multiomics
datasets generated by a variety of high-throughput
sequencing technologies, involving genomics, epigenomics,
transcriptomics, proteomics, metabolomics, pharmacogenomics
and single-cell omics. The current implementation
includes five modules: RNA sequencing, epigenomic regulation,
single-cell sequencing, protein interactions and geroprotective
compounds.
BrainBase
BrainBase (https://bigd.big.ac.cn/brainbase) is a curated
knowledgebase for brain diseases. Based on manual curation
of published articles and related databases, BrainBase
features comprehensive integration of disease associations
from multiple omics levels and its current version houses
a total of 4248 associations covering 113 brain diseases
and 3996 genes/CpG sites. In addition, based on bioinformatic
analysis on expression datasets, BrainBase collects
655 brain-specific genes, 575 brain-region-specific genes and
1128 cerebrospinal fluid (CSF)-detectable genes. With a
particular focus on glioma, BrainBase integrates 22 gliomarelated
omics datasets (genome, transcriptome, epigenome
and proteome) and provides multi-omics molecular profiles
for glioma, which are of great utility to identify potential
biomarkers for glioma diagnosis, prognosis and treatment
prediction. Thus, BrainBase bears great promise to serve as
a valuable knowledgebase for brain studies.
CGIR
The Chloroplast Genome Information Resource (CGIR;
http://bigd.big.ac.cn/cgir) is a curated resource of chloroplast
genome information through comprehensive integration
and value-added annotation. The current release of
CGIR contains 4709 chloroplast genomes of 4485 species;
4290 are retrieved from NCBI, and the rest 419 are from
CNCB-NGDC Genome Warehouse, among which 403
genome assemblies of 247 species are sequenced by National
Resource Center for Chinese Materia Medica and
publicly released for the first time. Based on expert curation,
we standardize taxonomic classification for each
chloroplast (including families, genera, and species) and
present a comprehensive high-quality collection of chloroplast
genomes that belong to 1887 genera and 441 families
and cover 1165 featured plants with one or more associated
category (namely, medicinal, edible, energy and wood).
Considering the importance of photosynthesis, we further
investigate presence/absence variation (PAV) of photosynthesis
genes among all collected genomes and detect
the strength of selective pressure acting on photosynthesis
genes by comparing nonsynonymous and synonymous
substitution rates. Moreover, we identify potential molecular
markers for all collected assemblies and obtain a total
of 120,152 DNA sequence signatures (DSSs) and 1 770
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
D20 Nucleic Acids Research, 2021, Vol. 49, Database issue
546 simple sequence repeats (SSRs), which are of broad utility
to identify species in Chinese Pharmacopoeia (2020 edition),
and to develop SNP markers and PCR methods for
species identification. In conclusion, CGIR is capable to
help users easily access chloroplast genome information.
GTDB
The Glycosyltransferases Database (GTDB; https://www.
biosino.org/gtdb/) (8) is an integrated resource for glycosyltransferase
annotations, incorporating comprehensive information
of protein classification families, catalytic reactions
and metabolic pathways, etc. In the current version,
GTDB contains 520 179 glycosyltransferases from 21 647
taxonomy nodes and 394 kinds of enzymatic reactions. In
addition, GTDB provides: (i) a powerful search to retrieve
the complete details of a query by combining multiple identifiers
and data sources; (ii) an interactive browser to visualize
data by different classifications and download data in
batches; (iii) a BLAST tool (9) to search against pre-defined
sequences, facilitating the annotation of biological function
of glycosyltransferases and lastly, (iv) GTdock (8), which
uses AutoDock Vina to perform docking simulations of several
glycosyltransferases with the same single acceptor.
LncExpDB
LncExpDB (https://bigd.big.ac.cn/lncexpdb; detailed in
(10) in this issue) is an expression database of human long
non-coding RNAs (lncRNAs). Based on our previous work
on LncBook (11), LncExpDB houses abundant expression
profiles of 101 293 non-redundant, manually-curated
lncRNA genes across 337 biological conditions, which can
be further classified into nine important biological contexts,
namely, normal tissue/cell line, cancer cell line, subcellular
localization, exosome, cell differentiation, preimplantation
embryo, organ development, circadian rhythm, and virus
infection. Among them, 92 016 lncRNA genes (90.8%) are
supported with reliable transcriptional evidence and more
than one third of lncRNAs (31249) have the capacity to
be highly expressed under certain conditions. Most importantly,
LncExpDB provides a collection of featured lncRNAs
and their interacting partners and thus is of great significance
to help users conduct functional studies on lncR-
NAs.
scMethBank
Single-cell bisulfite sequencing methods are widely used
to assay epigenomic heterogeneity in cell states. Large
amounts of data have been generated over the past several
years, bearing great promises in deeper understanding
of the epigenetic regulation of key biological processes.
scMethBank (https://bigd.big.ac.cn/methbank/scm) is an
integrated database of single-cell methylation maps. It is
dedicated to the collection, integration, analysis and visualization
of single-cell methylation data and metadata.
The current release of scMethBank includes 3166 single-cell
methylation profiles as well as curated metadata, covering
two species (human and mouse), 14 projects, 26 cell types
and two diseases, and provides user-friendly web interfaces
for data browsing, search and download.
TransCirc
TransCirc (https://www.biosino.org/transcirc/) is a specialized
database that provides evidence of translation potential
for circular RNAs (circRNAs) (detailed in (12) in this issue).
It integrates seven types of direct and indirect evidence
of coding potential for human circRNAs and their putative
translation products, including ribosome/polysome
binding evidence, internal ribosomal entry sites, N-6methyladenosine
modification data, sequence composition
scores, mass spectrometry data, etc. TransCirc can serve as
an important resource for investigating the translation capacity
of circRNAs and will be expanded to add new evidence
or additional species in the future.
UPDATES TO EXISTING DATABASES
BioProject & BioSample
BioProject (https://bigd.big.ac.cn/bioproject) and BioSample
(https://bigd.big.ac.cn/biosample) are two public repositories
of biological research projects and samples, respectively.
They collect descriptive metadata on biological
projects and samples investigated in experiments, and
provide centralized accesses to all public projects and
samples, as well as cross links to their related data resources.
BioProject organizes and classifies a huge volume
of projects in terms of various data types, ranging from
genomic, transcriptomic, epigenomic and metagenomic sequencing
efforts to genome-wide association studies and
variation analyses. BioSample supports a wide scope of
sample types, including human, plant, animal, microbe,
virus, pathogen and metagenome. Up to August 2020, there
are a total of 2288 biological projects and 176 288 biological
samples submitted by 1341 users from 364 organizations
(Figure 2A).
Genome Sequence Archive
The Genome Sequence Archive (GSA; https://bigd.big.ac.
cn/gsa) (13) is a public data repository for archiving raw
sequence reads. GSA accepts multi-omics data submissions
from all over the world and provides free access to
all publicly available data for global scientific communities.
In April 2020, GSA-Human (https://bigd.big.ac.cn/
gsa-human), a sub-database of GSA, was further established,
with the specific aim to provide a set of services for
secure management of human genetic data with controlled
access. Particularly, any data submission to GSA-Human
is affiliated with a Data Administration Committee (DAC)
that is responsible for authorizing/declining data access to
data requestor. As of August 2020, GSA (together with
GSA-Human), has archived a total of 181 123 experiments
and 198 262 runs and housed >4600 Terabytes of sequencing
data (Figure 2B), exhibiting the nearly quadruple volume
compared to the previous release last August (∼1200
TB).
Genome Warehouse
The Genome Warehouse (GWH; https://bigd.big.ac.cn/
gwh) is a public resource archiving genome-scale data of
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
Nucleic Acids Research, 2021, Vol. 49, Database issue D21
Figure 2. Statistics of data submissions to BioProject, BioSample and
GSA. (A) Data statistics of BioProject and BioSample. (B) Data statistics
of Experiments and Runs as well as file size in GSA. All statistics are frequently
updated and publicly available at https://bigd.big.ac.cn/bioproject,
https://bigd.big.ac.cn/biosample and https://bigd.big.ac.cn/gsa.
a wide range of species. GWH accepts worldwide submissions
of genome assemblies and incorporates detailed
descriptive information for each assembly. It offers standardized
quality control for genome assembly and equips
with a genome browser (14) for genome visualization. By
August 2020, GWH has received 9337 direct submissions
covering a broad diversity of species. Among them, 1491
genome assemblies have been publicly released and reported
in 52 journal articles. Particularly, in collaboration
with 2019nCoVR, GWH has received the submission of
815 SARS-CoV-2 genome assemblies with standardized
genome annotations (15). So far, 78 of the genomes have
been publicly released and 25 have been shared, with the
submitters’ permission, in GenBank (16) through a data exchange
mechanism established with NCBI. In this model,
GWH accessions are represented as secondary accessions
in GenBank records, which are retrievable by the Entrez
system. Collectively, the rapid growth of genome-scale data
submissions demonstrates the great potential of GWH as an
important resource for accelerating the worldwide genomic
research.
Genome Variation Map
The Genome Variation Map (GVM; https://bigd.big.ac.cn/
gvm) (17) is a public repository of genome variations, including
single nucleotide polymorphisms (SNP) and small
insertions and deletions (indel). Unlike NCBI dbSNP (dedicated
only for human genome variations since September
2017), GVM features data collection for a wide range
of species and accepts data submissions from all over the
world. During the past year, GVM has been significantly
updated by reorganizing data entities and metadata into six
modules in terms of species, project, sample, variation, association,
and submission. In addition, it has received 56
genome variation data submissions involving 43 754 samples
from 26 species. Till August 2020, GVM houses a total
of ∼960 million variants derived from 191 projects and
64 820 samples and covering 13 animals, 25 plants and 3
viruses.
GWAS Atlas
GWAS Atlas (https://bigd.big.ac.cn/gwas) (18) is a curated
resource of genome-wide variant-trait associations in plants
and animals. In the current version, GWAS Atlas has been
updated by integrating 78 950 associations across seven cultivated
plants and five domesticated animals that were manually
curated from 1088 studies in 304 publications. As a result,
a total of 31 684 genes and 735 traits were annotated
and presented based on a set of ontologies. Together, GWAS
Atlas provides high-quality curated GWAS associations for
plants and animals, and accordingly serves as a valuable resource
for genetic research of important traits and breeding
application.
Gene Expression Nebulas
The Gene Expression Nebulas (GEN) (https://bigd.big.ac.
cn/gen/) is a comprehensive data portal of gene expression
profiles across various biological conditions. Based on a
set of ontologies on disease, tissue and cell type, GEN integrates
large-scale publicly available bulk and single-cell
RNA sequencing datasets with strict criteria from raw sequence
repositories such as CNCB-NGDC GSA (13) and
NCBI SRA (19). All high-quality sequencing data are processed
with standardized pipeline and manually curated
based on meta information from GSA, NCBI GEO (20)
as well as publications. In the current version, GEN has
integrated human expression profiles across 25 631 experiments
and 99 tissues from 141 studies, including 22 128
single-cell experiments that cover 410 149 cells in 31 diseases
and 47 development stages. In addition, GEN has also integrated
plant expression profiles in 35 organs from 50 studies,
including 945 experiments for rice, 506 for soybean, 462
for sorghum and 78 for wheat, respectively. GEN provides
convenient and user-friendly web interfaces for data browsing,
search, visualization and batch downloading, and also
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
D22 Nucleic Acids Research, 2021, Vol. 49, Database issue
equips with a suite of analysis tools for differential gene expression,
functional enrichment, regulatory network, and
cell type annotation.
Editome Disease Knowledgebase
Editome Disease Knowledgebase (EDK; http://bigd.big.ac.
cn/edk) is a curated knowledgebase of editome-disease associations,
featuring comprehensive integration of abnormal
RNA editing events and aberrant RNA editing enzyme activities
associated with human diseases (21). In the past year,
the curated associations in EDK have been updated, including
36 diseases associated with 582 experimentally validated
abnormal editing events in 143 messenger RNAs, 4 microRNAs,
47 viruses and 79 aberrant activities involved in three
editing enzyme families. Moreover, based on controlled vocabulary
for viral classification, EDK has integrated virusRNA
editing disease associations from more than 200 pub-
lications.
NONCODE
NONCODE is a comprehensive database that hosts the
most complete collection of noncoding RNAs and their annotations
(22). Particularly, it is dedicated to providing the
full landscape of long non-coding RNAs (lncRNAs). In the
current version (v6), lncRNAs in human and mouse were
greatly updated, and the number of lncRNAs has been increased
from 548 640 to 644 509. Moreover, NONCODE
summarized a total of 13 749 lncRNA-cancer associations
from public databases and literature. For plants, NONCODE
housed a set of 94 697 lncRNAs and also introduced
two important new features: (i) tissue expression profiles
and function prediction of lncRNAs in five common plants;
(ii) conservation annotation of lncRNAs for 23 plants. Collectively,
NONCODE is a comprehensive portal of lncRNAs
for both plants and animals and is freely available at
http://v6.noncode.org/.
SmProt
SmProt is a dedicated database that provides the scientific
community with valuable information about small proteins
(23). Here, we introduce the update of SmProt, which
emphasizes the reliability of the translated sORF, the genetic
variation in the translated sORF, the translation event
or sequence of the disease-specific sORF, and the significant
increase in data volume. The updated SmProt also includes
more components, such as non-AUG translation initiation,
functions and new resources. Totally, the current
version of SmProt incorporated 802,906 unique small proteins
curated from 3 695 141 primary records. These proteins
were calculated from 419 Ribo-seq data sets and collected
from literature and other sources, including 370 cell
lines or tissues of 8 species (Homo sapiens, Mus musculus,
Rattus norvegicus, Drosophila, Danio rerio, Saccharomyces
cerevisiae, Caenorhabditis elegans and Escherichia coli). In
addition, small protein families identified from human microbiomes
were also collected. All datasets in SmProt are
publicly available for browse, search and bulk downloads at
http://bigdata.ibp.ac.cn/SmProt/.
MethBank
The Methylation Bank (MethBank; http://bigd.big.ac.cn/
methbank) (24,25) is a comprehensive database that integrates
consensus reference methylomes (CRMs) and singlebase
resolution methylomes (SRMs) across a variety of
species, with a particular focus on human health and aging,
animal embryonic development, and plant growth and development.
In the current version, MethBank presents 163
CRMs and 5 687 344 methylation profiles of corresponding
genes from 80 normal tissues/cells of human (deduced from
22,775 publicly available DNA methylation 450K data). In
addition to CRMs, it provides 394 SRMs, 19 701 343 methylation
profiles of genes, 1 258 420 methylated CpG Islands
and 304 884 differentially methylated promoters in different
genomic contexts based on whole-genome bisulfite sequencing
data from normal human tissues, different developmental
stages in five economically important plants, and
multi-stage gametes and early embryos in two model animals.
Moreover, MethBank is armed with online tools to
predict human methylation age and identify differentially
methylated promoters via Fisher’s exact test (26) with FDR
correction. In addition, MethBank provides useful information
on 421 methylation data analysis tools, helpful for users
to easily find any tool of interest.
EWAS Atlas
EWAS Atlas (https://bigd.big.ac.cn/ewas) (27) is a curated
knowledgebase of epigenome-wide association studies. In
the past year, it has been enriched by adding a total of
126 393 EWAS associations manually curated from 324
publications. Taking advantage of massive high-quality
DNA methylation data, EWAS toolkit (https://bigd.big.ac.
cn/ewas/toolkit), was greatly enhanced for a wide range
of EWAS analyses (trait enrichment, GO enrichment, motif
analysis, chromatin enrichment, etc.). Till August 2020,
EWAS Atlas has integrated 577 267 high-quality EWAS associations
derived from 1216 studies in 725 publications,
including 3124 cohorts, 155 tissues/cell lines, 498 traits
and 435 ontology entities. As a data portal of EWAS Atlas,
EWAS Data Hub (https://bigd.big.ac.cn/ewas/datahub)
(28) houses 95 783 samples of standardized DNA methylation
array data and metadata, and provides DNA methylation
profiles for a list of 485 512 probes in association with
36 397 genes.
Biodiversity Resources
Biodiversity resources are dedicated for specific species, including
economically important crops, domesticated animals
and livestock. Currently, there are four major biodiversity
resources in CNCB-NGDC, namely, iDog, iSheep,
Information Commons for Rice (IC4R) and SorGVD. iDog
(https://bigd.big.ac.cn/idog) is an integrated omics data resource
for dog, including eight data modules and one analysis
module (29). As a dedicated resource for the ongoing
Dog10K Project (30), iDog has been considerably updated
by integrating more data and deploying new online
tools. In the current version, iDog mainly houses two de
novo assembly genomes, 42 871 184 non-redundant SNPs
from 127 samples, 783 curated diseases, 473 standardized
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
Nucleic Acids Research, 2021, Vol. 49, Database issue D23
breeds for phenotype traits, 594 genotype-to-phenotype
(G2P) pairs and 27 534 gene profiles from public RNAseq
projects. iSheep (https://bigd.big.ac.cn/isheep) is a specialized
resource dedicated to integrating omics data for
sheep. Currently, it contains 82 689 498 genomic variations
(including 70 370 968 SNPs and 12 318 530 Indels) from
2778 samples, 26 802 genes and 1417 breed information of
worldwide sheep. Moreover, it includes 922 genome-wide
variant-trait associations linked with 922 variants and 110
traits. IC4R (http://ic4r.org) (31,32) is a curated database
that provides rice genome sequences, gene annotations and
multi-omics data profiles. It was updated by incorporating
a new gene annotation system with improved gene structure
and completeness (33). Meanwhile, SnpReady for Rice
(SR4R; http://sr4r.ic4r.org) (34), a committed sub-database
of IC4R, was built based on a collection of 18 million SNPs
identified from 5152 rice accessions. Accordingly, SR4R delivers
four reference SNP panels (2 097 405 hapmapSNPs,
156 502 tagSNPs, 1180 fixedSNPs and 38 barcodeSNPs),
offering a highly efficient rice variation map for different
needs. SorGVD (https://bigd.big.ac.cn/sorgvd) (35), is
a comprehensive database for sorghum genomic variations
and phenotypes. The updated version of SorGSD provides
curated information of 39 547 621 genomic variations (including
33 825 236 SNPs and 5 722 385 small INDELs)
from resequencing data and phenotypes of 289 sorghum ac-
cessions.
Plant Resources
Plants are the basis of our Earth’s ecosystems, providing the
world’s molecular oxygen and serving as basic human foods
and medicines. Currently, CNCB-NGDC has two major resources
developed from different aspects, viz., Plant Editosome
Database (PED) and Leaf Senescence Database
(LSD). PED (https://bigd.big.ac.cn/ped) (36) is a curated
database of plant RNA editing factors. In the past year, it
has been updated by integrating 94 RNA editing factors,
78 edited genes, and 1,796 RNA editing events from 34 organelles
of 29 species manually curated from 39 publications.
Most editing factors and genes are related to plant
growth and development, among which 43 RNA editing
factors and 7 edited genes are newly added in PED. LSD
(https://bigd.big.ac.cn/lsd) (37) is a comprehensive database
for the leaf senescence research community. It currently incorporates
5853 senescence-associated genes and 617 mutants
from 68 species.
Database Commons
Database Commons (https://bigd.big.ac.cn/
databasecommons) is a catalogue of worldwide biological
databases. It provides easy access to a global landscape of
all publicly available databases and their descriptive metadata
manually curated from their publications. Currently,
it catalogues a total of 5064 databases, involving 7595
publications and 1944 organizations throughout the world.
In the past year, in addition to more database entries and
publications, web interfaces have been greatly improved,
allowing users to access and browse databases by country,
institution, category, data type and object. Furthermore,
powered by Europe PMC APIs (38), citations to all collected
databases are added in an automated manner and
updated weekly. To promote the incorporation of more
databases and indexed data, Database Commons is open
to accept data entry from the global research community.
BIG Search
BIG Search (https://bigd.big.ac.cn/search) is a distributed
and scalable full-text search engine built on Elasticsearch
(a highly scalable search and analytics engine, https://www.
elastic.co/). It features cross-database search and provides
uniform interfaces for retrieving information from a wide
range of biological databases in real-time. In the current
version, BIG Search has been significantly updated by incorporating
data indexes from internal and external biological
resources, including all resources in CNCB-NGDC
and 38 partner resources (see details at https://bigd.big.ac.
cn/partners). Followed by the integration of EBI resources
using the EBI Search RESTful API (39) last year, NCBI
resources were added to BIG Search powered by NCBI Entrez
(40). In summary, BIG Search offers easy access to a
large number of biological resources and provides one-stop
cross-database search services for the global research com-
munity.
Education
The interdisciplinary nature of bioinformatics, coupled with
rapid advances in genomics, artificial intelligence and data
science, has made bioinformatics an increasingly dataintensive
and data-driven field, bearing great promise to
translate big data into big discovery in life and health sciences.
To provide bioinformatics education services to our
users, this year we established our online education platform
(https://bigd.big.ac.cn/education/) that provides a series
of educational materials including online courses, tutorials
and training documents. As a starting point, we
currently offered two courses (Bioinformatics and Genomics)
and online tutorials for briefly introducing our core
databases and services. In addition, we delivered training
offerings nationally and internationally, particularly in coordination
with the Global Biodiversity and Health Big
Data (BHBD) Alliance. Over the past year, we have conducted
training and outreach programs for international researchers
in China and over 100 people in Pakistan. We plan
to establish worldwide collaborations with peers who have
common interests in developing and enriching our educational
materials and contents.
CONCLUDING REMARKS
The year of 2020 was very special. For one thing, CNCBNGDC
has been significantly reinforced by joint efforts
from BIG, IBP and SINH, close collaborations from our
partners, and long-term, continuous support from the
whole research community. For another, to deal with the
pandemic caused by SARS-CoV-2, CNCB-NGDC has developed
2019nCoVR, a SARS-CoV-2 information resource,
with daily updates on data integration, curation, and analysis.
More importantly, the COVID-19 outbreak accelerated
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
D24 Nucleic Acids Research, 2021, Vol. 49, Database issue
our collaboration in data sharing with the INSDC through
SARS-CoV-2 genome sequence exchange with NCBI. We
will be using this model to expand data sharing to genome
sequences of other organisms and other data types. Meanwhile,
growth of multi-omics data, particularly in human,
is explosive. Consequently, database resources of CNCBNGDC
have been enriched and updated by accepting data
submissions from all over the world, performing valueadded
curation and annotation and also improving web interfaces
and data services.
Ongoing efforts include, but not limited to, optimization
of curation models and processes, improvement of web
functionalities and database usage statistics, upgrade of infrastructure
capability for big data storage and transfer,
integration of more datasets from different resources, and
continuous development of new resources and tools in aid
of data-driven studies. We will also put in more efforts to establish
and improve underlying links between our database
resources, with the aim to fully realize the findability, accessibility,
interoperability and reusability (FAIR) of different
levels of data. In addition, CNCB-NGDC heavily engages
in the BHBD Alliance (http://bhbd-alliance.org) in order to
accelerate the translation of big data into knowledge discovery
by global collaborations in data sharing and mining.
With more stable support, CNCB-NGDC will continue to
grow and deliver a family of data resources and services in
support of both domestic and international research activ-
ities.
ACKNOWLEDGEMENTS
We thank our users for submitting data, sending suggestions,
reporting bugs and getting involving in community
curation. CNCB-NGDC is indebted to its funders, including
the Ministry of Science & Technology and the Ministry
of Finance of the People’s Republic of China as well as Chinese
Academy of Sciences. We also thank the whole bioinformatics
community in China, particularly the late Prof.
Bailin Hao, who advocated the establishment of CNCB
since the 1990s.
FUNDING
Strategic Priority Research Program of the Chinese
Academy of Sciences [XDB38030200, XDA19050302,
XDA19090116, XDA24040201, XDB38050300,
XDB38030100, XDB38030400, XDA12030100,
XDB38040300]; National Key Research & Development
Program of China [2019YFA0801801, 2018YFA0801405,
2018YFD1000505, 2018YFC2000100, 2018YFC1406902,
2018YFC0910400, 2018YFC0310602, 2018YFA0903700,
2018YFA0900704, 2017YFC1201200, 2017YFC0908405,
2017YFC0908404, 2017YFC0908403, 2017YFC0907505,
2017YFC0907503, 2017YFC0907502, 2016YFE0206600,
2016YFC0906403, 2016YFC0903003, 2016YFC0901904,
2016YFC0901903, 2016YFC0901702, 2016YFC0901604,
2016YFC0901603, 2016YFB0201702, 2016YFA0501704];
National Natural Science Foundation of China [91731303,
81670462, 31970565, 31871328, 31871294, 31701117,
31970647, 31801104, 31771465, 31771410, 31771388,
31671360, 81701567, 31571358, 31525014, 1470330,
31961130380, 31711530221, 31771477, 31571366,
31822030, 31801113, 31801154, 31771458, 91940303,
91940306, 31661143031, 31730110, 31871281, 31970634,
31930021, 31970633]; International Partnership Program of
the Chinese Academy of Sciences [153F11KYSB20160008,
153D31KYSB20170121]; 13th Five-year Informatization
Plan of Chinese Academy of Sciences [XXH13505-
05]; Genomics Data Center Construction of Chinese
Academy of Sciences [XXH-13514-0202]; Fundamental
Research Funds for the Central Universities [2019kfyRCPY043];
UK Royal Society-Newton Advanced Fellowship
[NAF\R1\191094]; Key Program of the Chinese Academy
of Sciences [KJZD-EW-L14]; Key Research Program
of Frontier Sciences of the Chinese Academy of Sciences
[QYZDJ-SSW-SYS009]; Key Technology Talent
Program of the Chinese Academy of Sciences; The 100
Talent Program of the Chinese Academy of Sciences;
K.C. Wong Education Foundation; The Youth Innovation
Promotion Association of the Chinese Academy
of Sciences [2019104, 2018134, 2017141]; The Special
Project on Precision Medicine under the National Key
R&D Program [SQ2017YFSF090210]; China Postdoctoral
Science Foundation [2019M652623, 2018M632830];
The Open Biodiversity and Health Big Data Program of
IUBS; The Professional Association of the Alliance of
International Science Organizations [ANSO-PA-2020-07];
Funds for Basic Resources Investigation Research of the
Ministry of Science and Technology [2018FY10080002];
Special Project on National Science and Technology Basic
Resources Investigation [2019FY100102]; CAS Pioneer
100-Talent program; Key Research Program of the Chinese
Academy of Sciences [KFZD-SW-219-5]; Zhangjiang
special project of national innovation demonstration
zone [ZJ2018-ZD-013]; Science and Technology Service
Network Initiative of Chinese Academy of Sciences. Funding
for open access charge: Strategic Priority Research
Program of the Chinese Academy of Sciences.
Conflict of interest statement. None declared.
REFERENCES
1. National Genomics Data Center Members and Partners. (2020)
Database Resources of the National Genomics Data Center in 2020.
Nucleic Acids Res., 48, D24–D33.
2. BIG Data Center Members. (2019) Database Resources of the BIG
Data Center in 2019. Nucleic Acids Res., 47, D8–D14.
3. BIG Data Center Members. (2018) Database Resources of the BIG
Data Center in 2018. Nucleic Acids Res., 46, D14–D20.
4. BIG Data Center Members. (2017) The BIG Data Center: from
deposition to integration to translation. Nucleic Acids Res., 45,
D18–D24.
5. Zhang,Z., Song,S., Yu,J., Zhao,W., Xiao,J. and Bao,Y. (2020) The
Elements of Data Sharing. Genomics Proteomics Bioinformatics, 18,
1–4.
6. Zhao,W.M., Song,S.H., Chen,M.L., Zou,D., Ma,L.N., Ma,Y.K.,
Li,R.J., Hao,L.L., Li,C.P., Tian,D.M. et al. (2020) The 2019 novel
coronavirus resource. Yi chuan = Hereditas / Zhongguo yi chuan xue
hui bian ji, 42, 212–221.
7. Aging Atlas Consortium. (2021) Aging Atlas: a multi-omics database
for aging biology. Nucleic Acids Res., doi:10.1093/nar/gkaa894.
8. Zhou,C., Xu,Q., He,S., Ye,W., Cao,R., Wang,P., Ling,Y., Yan,X.,
Wang,Q. and Zhang,G. (2020) GTDB: an integrated resource for
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
Nucleic Acids Research, 2021, Vol. 49, Database issue D25
glycosyltransferase sequences and annotations. Database (Oxford),
2020, baaa047.
9. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J.
(1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
10. Li,Z., Liu,L., Jiang,S., Li,Q., Feng,C., Du,Q., Zou,D., Xiao,J.,
Zhang,Z. and Ma,L. (2021) LncExpDB: an expression database of
human long non-coding RNAs. Nucleic. Acids. Res.,
doi:10.1093/nar/gkaa850.
11. Ma,L., Cao,J., Liu,L., Du,Q., Li,Z., Zou,D., Bajic,V.B. and Zhang,Z.
(2019) LncBook: a curated knowledgebase of human long
non-coding RNAs. Nucleic. Acids. Res., 47, D128–D134.
12. Huang,W., Ling,Y., Zhang,S., Xia,Q., Cao,R., Fan,X., Fang,Z.,
Wang,Z. and Zhang,G. (2021) TransCirc: an interactive database for
translatable circular RNAs based on multi-omics evidence. Nucleic.
Acids. Res., doi:10.1093/nar/gkaa823.
13. Wang,Y., Song,F., Zhu,J., Zhang,S., Yang,Y., Chen,T., Tang,B.,
Dong,L., Ding,N., Zhang,Q. et al. (2017) GSA: Genome Sequence
Archive. Genomics Proteomics Bioinformatics, 15, 14–18.
14. Buels,R., Yao,E., Diesh,C.M., Hayes,R.D., Munoz-Torres,M.,
Helt,G., Goodstein,D.M., Elsik,C.G., Lewis,S.E., Stein,L. et al.
(2016) JBrowse: a dynamic web platform for genome visualization
and analysis. Genome Biol., 17, 66.
15. Ren,L.L., Wang,Y.M., Wu,Z.Q., Xiang,Z.C., Guo,L., Xu,T.,
Jiang,Y.Z., Xiong,Y., Li,Y.J., Li,X.W. et al. (2020) Identification of a
novel coronavirus causing severe pneumonia in human: a descriptive
study. Chin. Med. J. (Engl.), 133, 1015–1024.
16. Sayers,E.W., Cavanaugh,M., Clark,K., Ostell,J., Pruitt,K.D. and
Karsch-Mizrachi,I. (2020) GenBank. Nucleic Acids Res., 48,
D84–D86.
17. Song,S., Tian,D., Li,C., Tang,B., Dong,L., Xiao,J., Bao,Y., Zhao,W.,
He,H. and Zhang,Z. (2018) Genome Variation Map: a data
repository of genome variations in BIG Data Center. Nucleic Acids
Res., 46, D944–D949.
18. Tian,D., Wang,P., Tang,B., Teng,X., Li,C., Liu,X., Zou,D., Song,S.
and Zhang,Z. (2020) GWAS Atlas: a curated resource of
genome-wide variant-trait associations in plants and animals. Nucleic
Acids Res., 48, D927–D932.
19. Leinonen,R., Sugawara,H. and Shumway,M. (2011) The sequence
read archive. Nucleic Acids Res., 39, 19–21.
20. Barrett,T., Wilhite,S.E., Ledoux,P., Evangelista,C., Kim,I.F.,
Tomashevsky,M., Marshall,K.A., Phillippy,K.H., Sherman,P.M.,
Holko,M. et al. (2013) NCBI GEO: archive for functional genomics
data sets–update. Nucleic Acids Res., 41, D991–D995.
21. Niu,G., Zou,D., Li,M., Zhang,Y., Sang,J., Xia,L., Li,M., Liu,L.,
Cao,J., Zhang,Y. et al. (2019) Editome Disease Knowledgebase
(EDK): a curated knowledgebase of editome-disease associations in
human. Nucleic Acids Res., 47, D78–D83.
22. Fang,S., Zhang,L., Guo,J., Niu,Y., Wu,Y., Li,H., Zhao,L., Li,X.,
Teng,X., Sun,X. et al. (2018) NONCODEV5: a comprehensive
annotation database for long non-coding RNAs. Nucleic Acids Res.,
46, D308–D314.
23. Hao,Y., Zhang,L., Niu,Y., Cai,T., Luo,J., He,S., Zhang,B., Zhang,D.,
Qin,Y., Yang,F. et al. (2018) SmProt: a database of small proteins
encoded by annotated coding and non-coding RNA loci. Brief.
Bioinform., 19, 636–643.
24. Li,R., Liang,F., Li,M., Zou,D., Sun,S., Zhao,Y., Zhao,W., Bao,Y.,
Xiao,J. and Zhang,Z. (2018) MethBank 3.0: a database of DNA
methylomes across a variety of species. Nucleic Acids Res., 46,
D288–D295.
25. Zou,D., Sun,S., Li,R., Liu,J., Zhang,J. and Zhang,Z. (2015)
MethBank: a database integrating next-generation sequencing
single-base-resolution DNA methylation programming data. Nucleic
Acids Res., 43, D54–58.
26. Sprent,P. (2011) Fisher Exact Test. In: LovricM (ed). International
Encyclopedia of Statistical Science. Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-642-04898-2 253.
27. Li,M., Zou,D., Li,Z., Gao,R., Sang,J., Zhang,Y., Li,R., Xia,L.,
Zhang,T., Niu,G. et al. (2019) EWAS Atlas: a curated knowledgebase
of epigenome-wide association studies. Nucleic Acids Res., 47,
D983–D988.
28. Xiong,Z., Li,M., Yang,F., Ma,Y., Sang,J., Li,R., Li,Z., Zhang,Z. and
Bao,Y. (2020) EWAS Data Hub: a resource of DNA methylation
array data and metadata. Nucleic Acids Res., 48, D890–D895.
29. Tang,B., Zhou,Q., Dong,L., Li,W., Zhang,X., Lan,L., Zhai,S.,
Xiao,J., Zhang,Z., Bao,Y. et al. (2019) iDog: an integrated resource
for domestic dogs and wild canids. Nucleic Acids Res., 47,
D793–D800.
30. Ostrander,E.A., Wang,G.D., Larson,G., vonHoldt,B.M., Davis,B.W.,
Jagannathan,V., Hitte,C., Wayne,R.K., Zhang,Y.P. and Dog,K.C.
(2019) Dog10K: an international sequencing effort to advance studies
of canine domestication, phenotypes and health. Natl. Sci. Rev., 6,
810–824.
31. IC4R Project Consortium. (2016) Information Commons for Rice
(IC4R). Nucleic Acids Res., 44, D1172–D1180.
32. Xia,L., Zou,D., Sang,J., Xu,X., Yin,H., Li,M., Wu,S., Hu,S., Hao,L.
and Zhang,Z. (2017) Rice Expression Database (RED): an integrated
RNA-Seq-derived gene expression database for rice. J Genet
Genomics, 44, 235–241.
33. Sang,J., Zou,D., Wang,Z., Wang,F., Zhang,Y., Xia,L., Li,Z., Ma,L.,
Li,M., Xu,B. et al. (2020) IC4R-2.0: rice genome reannotation using
massive RNA-seq data. Genomics Proteomics Bioinformatics, 18,
161–172.
34. Yan,J., Zou,D., Li,C., Zhang,Z., Song,S. and Wang,X. (2020) SR4R:
an integrative SNP resource for genomic breeding and population
research in rice. Genomics Proteomics Bioinformatics, 18, 173–185.
35. Luo,H., Zhao,W., Wang,Y., Xia,Y., Wu,X., Zhang,L., Tang,B.,
Zhu,J., Fang,L., Du,Z. et al. (2016) SorGSD: a sorghum genome
SNP database. Biotechnol. Biofuels, 9, 6.
36. Li,M., Xia,L., Zhang,Y., Niu,G., Li,M., Wang,P., Zhang,Y., Sang,J.,
Zou,D., Hu,S. et al. (2019) Plant editosome database: a curated
database of RNA editosome in plants. Nucleic. Acids. Res., 47,
D170–D174.
37. Li,Z., Zhang,Y., Zou,D., Zhao,Y., Wang,H.L., Zhang,Y., Xia,X.,
Luo,J., Guo,H. and Zhang,Z. (2020) LSD 3.0: a comprehensive
resource for the leaf senescence research community. Nucleic Acids
Res., 48, D1069–D1075.
38. Levchenko,M., Gou,Y., Graef,F., Hamelers,A., Huang,Z.,
Ide-Smith,M., Iyer,A., Kilian,O., Katuri,J., Kim,J.H. et al. (2018)
Europe PMC in 2017. Nucleic Acids Res., 46, D1254–D1260.
39. Madeira,F., Park,Y.M., Lee,J., Buso,N., Gur,T., Madhusoodanan,N.,
Basutkar,P., Tivey,A.R.N., Potter,S.C., Finn,R.D. et al. (2019) The
EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic
Acids Res., 47, W636–W641.
40. Gibney,G. and Baxevanis,A.D. (2011) Searching NCBI Databases
Using Entrez. Curr. Protoc. Hum. Genet.,
doi:10.1002/0471142905.hg0610s71.
APPENDIX
Corresponding author: Yongbiao Xue1,2,3,*
Co-corresponding authors: Yiming Bao1,2,3,4,*
, Zhang
Zhang1,2,3,4,*
, Wenming Zhao1,2,3,4,*
, Jingfa Xiao1,2,3,4,*
,
Shunmin He3,5,6,*
, Guoqing Zhang3,7,*
, Yixue Li3,7,*
,
Guoping Zhao3,7,8,9,*
, Runsheng Chen6,10,*
CNCB-NGDC MEMBERS (Arranged by project role and
then by contribution except for Team Leader (TL), as indi-
cated)
2019nCoVR: Shuhui Song1,2,3,4,#
, Lina Ma1,2,4,#
, Dong
Zou1,2,4,#
, Dongmei Tian1,2,4,#
, Cuiping Li1,2,4,#
, Junwei
Zhu1,2,4,#
, Zheng Gong1,2,3,4,#
, Meili Chen1,2,4
, Anke
Wang1,2,4
, Yingke Ma1,2,4
, Mengwei Li1,2,3,4
, Xufei
Teng1,2,3,4
, Ying Cui1,2,3,4
, Guangya Duan1,2,3,4
, Mochen
Zhang1,2,4,15
, Tong Jin1,2,3,4
, Chengmin Shi1,11
, Zhenglin
Du1,2,4
, Yadong Zhang1,2,3,4
, Chuandong Liu1,11
, Rujiao
Li1,2,4
, Jingyao Zeng1,2,4
, Lili Hao1,2,4
, Shuai Jiang1,2,4
,
Hua Chen1,11
, Dali Han1,11
, Jingfa Xiao1,2,3,4
, Zhang
Zhang1,2,3,4,*
(TL), Wenming Zhao1,2,3,4,*
(TL), Yongbiao
Xue1,2,3,*
(TL), Yiming Bao1,2,3,4,*
(TL)
Aging Atlas: Tao Zhang1,2,3,4,#
, Wang Kang1,3,11,#
, Fei
Yang1,2,3,4,#
, Jing Qu3,12,13
, Weiqi Zhang2,3,11,12,#
(TL), Yiming
Bao1,2,3,4,*
(TL), Guang-Hui Liu3,12,14,#
(TL)
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
D26 Nucleic Acids Research, 2021, Vol. 49, Database issue
BrainBase: Lin Liu1,2,3,4,#
, Yang Zhang1,2,3,4,#
, Guangyi
Niu1,2,3,4,#
, Tongtong Zhu1,2,4,15
, Changrui Feng1,2,3,4
, Xiaonan
Liu1,2,4,15
, Yuansheng Zhang1,2,3,4
, Zhao Li1,2,3,4
,
Ruru Chen1,2,4,16
, Qianpeng Li1,2,3,4
, Xufei Teng1,2,3,4
, Lina
Ma1,2,4,#
(TL)
CGIR: Zhongyi Hua17,#
, Dongmei Tian1,2,4,#
, Chao
Jiang17,#
, Ziyuan Chen17
, Fangshu He17
, Yuyang Zhao17
,
Yan Jin17
, Zhang Zhang1,2,3,4,*
, Luqi Huang17
, Shuhui
Song1,2,3,4,#
(TL), Yuan Yuan17,#
(TL)
GTDB: Chenfen Zhou7
, Qingwei Xu18
, Sheng He7,19
,
WeiYe7
, Ruifang Cao7
, Pengyu Wang7
, Yunchao Ling7
,
Xing Yan8
, Qingzhong Wang7
, Guoqing Zhang3,7,*
LncExpDB: Zhao Li1,2,3,4,#
, Lin Liu1,2,3,4,#
, Shuai Jiang1,2,4
,
Qianpeng Li1,2,3,4
, Changrui Feng1,2,3,4
, Qiang Du1,2,3,4
,
Lina Ma1,2,4,#
(TL)
scMethBank: Wenting Zong1,2,3,4,#
, Hongen Kang1,2,3,4,#
,
Mochen Zhang1,2,4,15
, Zhuang Xiong1,2,3,4
, Rujiao Li1,2,4,#
(TL)
TransCirc: Wendi Huan3,7,#
, Yunchao Ling7,#
, Sirui
Zhang3,7
, Qiguang Xia3,7
, Ruifang Cao7
, Xiaojuan Fan7
,
Zefeng Wang3,7,20,#
, Guoqing Zhang3,7,*
BioProject & BioSample & GSA & BIG Submission:
Xu Chen1,2,4,#
, Tingting Chen1,2,4,#
, Sisi Zhang1,2,4,#
,
Bixia Tang1,2,4,#
, Junwei Zhu1,2,4,#
, Lili Dong1,2,4
, Zhewen
Zhang1,2,4
, Zhonghuang Wang1,2,3,4
, Hailong Kang1,2,3,4
,
Yanqing Wang1,2,4,#
(TL)
GWH: Yingke Ma1,2,4,#
, Song Wu1,2,3,4#
, Hongen
Kang1,2,3,4
, Meili Chen1,2,4,#
(TL)
GVM: Cuiping Li1,2,4,#
, Dongmei Tian1,2,4,#
, Bixia
Tang1,2,4,#
, Xiaonan Liu1,2,3,4,#
, Xufei Teng1,2,3,4,#
, Shuhui
Song1,2,3,4,#
(TL)
GWAS Atlas: Dongmei Tian1,2,4,#
, Xiaonan Liu1,2,3,4,#
,
Cuiping Li1,2,4
, Xufei Teng1,2,3,4
, Shuhui Song1,2,3,4,#
(TL)
GEN: Yuansheng Zhang1,2,3,4,#
, Dong Zou1,2,4,#
, Tongtong
Zhu1,2,4,15,#
, Ming Chen1,2,4,15
, Guangyi Niu1,2,3,4
, Chang
Liu1,2,3,4
, Yujia Xiong21,22
, Lili Hao1,2,4,#
(TL)
EDK: Guangyi Niu1,2,3,4,#
, Dong Zou1,2,4,#
, Tongtong
Zhu1,2,4,15
, Xueying Shao23
, Lili Hao1,2,4,#
(TL)
SmProt: Yanyan Li6,24,#
, Honghong Zhou6,#
, Xiaomin
Chen3,6,#
, Yu Zheng6,24
, Quan Kang6
, Di Hao6
, Lili
Zhang3,6
, Huaxia Luo6
, Yajing Hao6
, Runsheng Chen6,10,*
,
Peng Zhang6,#
, Shunmin He3,5,6,*
MethBank: Dong Zou1,2,4,#
, Mochen Zhang1,2,4,15,#
,
Zhuang Xiong1,2,3,4
, Zhi Nie1,2,3,4
, Shuhuan Yu1,2,3,4
,
Rujiao Li1,2,4,#
(TL)
EWAS Atlas: Mengwei Li1,2,3,4,#
, Rujiao Li1,2,4
, Yiming
Bao1,2,3,4,*
(TL)
EWAS Data Hub: Zhuang Xiong1,2,3,4,#
, Mengwei Li1,2,3,4,#
,
Fei Yang1,2,3,4,#
, Yingke Ma1,2,4
, Jian Sang1,2,3,4
, Zhaohua
Li 1,2,4,15
, Rujiao Li1,2,4,#
(TL)
iDog: Bixia Tang1,2,4,#
, Xiangquan Zhang25,#
, Lili
Dong1,2,4,#
, Qing Zhou1,2,3,4
, Ying Cui1,2,3,4
, Shuang
Zhai1,2,4
, Yaping Zhang25
, Guodong Wang25,#
(TL),
Wenming Zhao1,2,3,4,*
(TL)
iSheep: Zhonghuang Wang1,2,3,4,#
, Qianghui Zhu3,26,#
,
Xin Li26
, Junwei Zhu1,2,4
, Dongmei Tian1,2,4
, Hailong
Kang1,2,3,4
, Cuiping Li1,2,4
, Sisi Zhang1,2,4
, Shuhui
Song1,2,3,4
, Menghua Li (TL)26,27
, Wenming Zhao1,2,3,4,*
(TL)
IC4R: Jun Yan28,#
, Jian Sang1,2,3,4,#
, Dong Zou1,2,4,#
, Chen
Li29
, Zhennan Wang3,30
, Yuansheng Zhang1,2,3,4
, Tongtong
Zhu1,2,4,15
, Shuhui Song1,2,3,4,#
(TL), Xiangfeng Wang28,#
(TL), Lili Hao1,2,4,#
(TL)
SorGSD: Yuanming Liu3,31,#
, Zhonghuang Wang1,2,3,4,#
,
Hong Luo31
, Junwei Zhu1,2,4
, Xiaoyuan Wu31
, Dongmei
Tian1,2,4
, Cuiping Li1,2,4
, Wenming Zhao1,2,3,4,*
(TL), HaiChun
Jing3,31,32,#
(TL)
PED: Ming Chen1,2,3,4,#
, Dong Zou1,2,4,#
, Lili Hao1,2,4,#
(TL)
NONCODE: Lianhe Zhao3,5,#
, Jiajia Wang6,24,#
, Yanyan
Li6,24,#
, Tinrui Song6
, Yu Zheng6,24
, Runsheng Chen6,10,*
,
Yi Zhao5,#
, Shunmin He3,6,*
Database Commons: Dong Zou1,2,4,#
, Furrukh
Mehmood33
, Shahid Ali33
, Amjad Ali34
, Shoaib Saleem33
,
Irfan Hussain33
, Amir A. Abbasi33
, Lina Ma1,2,4,#
(TL)
BIG Search: Dong Zou1,2,4,#
(TL)
Education: Dong Zou1,2,4,#
, Shuai Jiang1,2,4
, Zhang
Zhang1,2,3,4,*
(TL)
Writing Group: Shuai Jiang1,2,4,#
, Wenming Zhao1,2,3,4,*
,
Jingfa Xiao1,2,3,4,*
, Yiming Bao1,2,3,4,*
, Zhang Zhang1,2,3,4,*
CNCB-NGDC PARTNERS (Listed in alphabetical order
by database names)
BBCancer: Zhixiang Zuo35
, Jian Ren35
CancerSEA: Xinxin Zhang36
, Yun Xiao36
, Xia Li36
CellMarker: Xinxin Zhang36
, Yun Xiao36
, Xia Li36
CGDB: Yiran Tu37
, Yu Xue37
circAtlas: Wanying Wu38
, Peifeng Ji38
, Fangqing Zhao38
CircFunBase: Xianwen Meng39
, Ming Chen39
dbPSP & THANATOS: Di Peng37
, Yu Xue37
DEG & DoriC: Hao Luo40,41,42
, Feng Gao40,41,42
DiseaseEnhancer: Xinxin Zhang36
, Yun Xiao36
, Xia Li36
DrLLPS: Wanshan Ning37
, Yu Xue37
EPSD & WERAM: Shaofeng Lin37
, Yu Xue37
EVmiRNA: Teng Liu37
, An-Yuan Guo37
GenTree: Hao Yuan43,44
, Yong E. Zhang3,43,44
iEKPD: Xiaodan Tan37
, Yu Xue37
iUUCD: Weizhi Zhang37
, Yu Xue37
lnCAR: Yubin Xie35
, Jian Ren35
MiCroKiTS: Chenwei Wang37
, Yu Xue37
miRNASNP: Chun-Jie Liu37
, An-Yuan Guo37
PlantRegMap: De-Chang Yang45
, Feng Tian45
, Ge Gao45
PLMD: Dachao Tang37
, Yu Xue37
PTMD: Lan Yao37
, Yu Xue37
, Qinghua Cui46,47
RhesusBase: Ni A. An48
, Chuan-Yun Li48
RMVar: XiaoTong Luo35
, Jian Ren35
SEECancer: Xinxin Zhang36
, Yun Xiao36
, Xia Li36
* To whom correspondence should be addressed: Yongbiao
Xue (ybxue@big.ac.cn).
Correspondence may also be addressed to Yiming
Bao (baoym@big.ac.cn), Zhang Zhang
(zhangzhang@big.ac.cn), Wenming Zhao
(zhaowm@big.ac.cn), Jingfa Xiao (xiaojingfa@big.ac.cn),
Shunmin He (heshunmin@ibp.ac.cn), Guoqing Zhang
(gqzhang@picb.ac.cn), Yixue Li (yxli@sibs.ac.cn), Guoping
Zhao (gpzhao@sibs.ac.cn) and Runsheng Chen
(crs@ibp.ac.cn).
#
The authors wish it to be known that, in their opinion,
these authors should be regarded as Joint First Authors.
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
Nucleic Acids Research, 2021, Vol. 49, Database issue D27
1
China National Center for Bioinformation, Beijing
100101, China
2
National Genomics Data Center, Beijing Institute of Genomics,
Chinese Academy of Sciences, Beijing 100101,
China
3
University of Chinese Academy of Sciences, Beijing
100049, China
4
CAS Key Laboratory of Genome Sciences and Information,
Beijing Institute of Genomics, Chinese Academy of
Sciences, Beijing 100101, China
5
Institute of Computing Technology, Chinese Academy of
Sciences, Beijing 100190, China
6
National Genomics Data Center & Key Laboratory of
RNA Biology, Center for Big Data Research in Health, Institute
of Biophysics, Chinese Academy of Sciences, Beijing
100101, China
7
National Genomics Data Center & Bio-Med Big Data
Center, Key Laboratory of Computational Biology, CASMPG
Partner Institute for Computational Biology, Shanghai
Institute of Nutrition and Health, University of Chinese
Academy of Sciences, Chinese Academy of Sciences,
320 Yueyang Road, Xuhui, Shanghai 200031, China
8
CAS-Key Laboratory of Synthetic Biology, CAS Center
for Excellence in Molecular Plant Sciences, Institute of
Plant Physiology and Ecology, Chinese Academy of Sciences,
300 Fenglin Road, Xuhui, Shanghai 200032, China
9
Center for Quantitative Synthetic Biology, Institute of
Synthetic Biology, Shenzhen Institutes of Advanced Technology,
Chinese Academy of Sciences, Shenzhen, China
10
Guangdong Geneway Decoding Bio-Tech Co. Ltd, Foshan
528316, China
11
CAS Key Laboratory of Genomic and Precision
Medicine, Beijing Institute of Genomics, Chinese Academy
of Sciences, Beijing 100101, China
12
Institute for Stem cell and Regeneration, CAS, Beijing
100101, China
13
State Key Laboratory of Stem Cell and Reproductive Biology,
Institute of Zoology, Chinese Academy of Sciences,
Beijing 100101, China
14
State Key Laboratory of Membrane Biology, Institute
of Zoology, Chinese Academy of Sciences, Beijing 100101,
China
15
School of Future Technology, University of Chinese
Academy of Sciences, Beijing 100049, China
16
Sino-Danish College, University of Chinese Academy of
Sciences, Beijing 100049, China
17
National Resource Center for Chinese Materia Medica,
Chinese Academy of Chinese Medical Sciences (CACMS),
China
18
College of Computer, Hubei University of Education,
129 Second Gaoxin Road, Wuhan Hi-Tech Zone, WuHan
430205, China
19
School of Life Science and Technology, Shanghai Tech
University, 393 Middle Huaxia Road, Pudong, Shanghai
201210, China
20
CAS Center for Excellence in Molecular Cell Science, Chinese
Academy of Sciences, Shanghai 200031, China
21
Beijing Neurosurgical Institute, Beijing, China
22
Capital Medical University, Beijing, China
23
School of Computer Science and Engineering, South
China University of Technology, China
24
College of Life Sciences, University of Chinese Academy
of Sciences, Beijing 100049, China
25
State Key Laboratory of Genetic Resources and Evolution,
Kunming Institute of Zoology, Chinese Academy of
Sciences, Kunming 650223, China
26
CAS Key Laboratory of Animal Ecology and Conservation
Biology, Institute of Zoology, Chinese Academy of Sciences,
Beijing 100101, China
27
College of Animal Science and Technology, China Agricultural
University, Beijing 100193, China
28
Department of Crop Genomics and Bioinformatics, College
of Agronomy and Biotechnology, China Agricultural
University, Beijing 100094, China
29
Rice Research Institute, Guangdong Academy of Agricultural
Sciences, Guangzhou 510640, China
30
Institute of Zoology, Chinese Academy of Sciences, Beijing
100101, China
31
Key Laboratory of Plant Resources, Institute of Botany,
Chinese Academy of Sciences, Beijing 100093, China
32
Engineering Laboratory for Grass-Based Livestock Husbandry,
Chinese Academy of Sciences, Beijing 100093,
China
33
Faculty of Biological Sciences, Quaid-i-Azam University,
Islamabad 45320, Pakistan
34
Atta-ur-Rahman School of Applied Biosciences (ASAB),
National University of Sciences & Technology (NUST), Islamabad
44000, Pakistan
35
State Key Laboratory of Oncology in South China, Cancer
Center, Collaborative Innovation Center for Cancer
Medicine, School of Life Sciences, Sun Yat-sen University,
Guangzhou 510060, China
36
College of Bioinformatics Science and Technology,
Harbin Medical University, Harbin, Heilongjiang 150081,
China
37
Key Laboratory of Molecular Biophysics of Ministry of
Education, Hubei Bioinformatics and Molecular Imaging
Key Laboratory, Center for Artificial Intelligence Biology,
College of Life Science and Technology, Huazhong University
of Science and Technology, Wuhan 430074, Hubei,
China
38
Beijing Institutes of Life Science, Chinese Academy of Sciences,
Beijing 100101, China
39
Zhejiang University, Hangzhou, 310027, China
40
Department of Physics, School of Science, Tianjin University,
Tianjin 300072, China
41
Frontiers Science Center for Synthetic Biology and Key
Laboratory of Systems Bioengineering (Ministry of Education),
Tianjin University, Tianjin 300072, China
42
SynBio Research Platform, Collaborative Innovation
Center of Chemical Science and Engineering (Tianjin),
Tianjin 300072, China
43
Key Laboratory of Zoological Systematics and Evolution
and State Key Laboratory of Integrated Management
of Pest Insects and Rodents, Institute of Zoology, Chinese
Academy of Sciences, Beijing 100101, China
44
CAS Center for Excellence in Animal Evolution and Genetics,
Chinese Academy of Sciences, Kunming, Yunnan
650223, China
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021
D28 Nucleic Acids Research, 2021, Vol. 49, Database issue
45
Biomedical Pioneering Innovation Center (BIOPIC), Beijing
Advanced Innovation Center for Genomics (ICG),
Center for Bioinformatics (CBI), and State Key Laboratory
of Protein and Plant Gene Research at School of Life Sciences,
Peking University, Beijing 100871, China
46
Department of Biomedical Informatics, School of Basic
Medical Sciences, MOE Key Lab of Cardiovascular Sciences,
Center for Noncoding RNA Medicine, Peking University,
Beijing 100190, China
47
Center of Bioinformatics, Key Laboratory for NeuroInformation
of Ministry of Education, School of Life Science
and Technology, University of Electronic Science and Technology
of China, Chengdu, Sichuan 610054, China
48
Beijing Key Laboratory of Cardiometabolic Molecular
Medicine, Institute of Molecular Medicine, Peking University,
Beijing, China
Downloadedfromhttps://academic.oup.com/nar/article/49/D1/D18/5974090bygueston24February2021