2021
DOI: 10.1016/j.gpb.2021.08.001

The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types

Abstract: The Genome Sequence Archive (GSA) is a data repository for archiving raw sequence data, which provides data storage and sharing services for worldwide scientific communities. Considering explosive data growth with diverse data types, here we present the GSA family by expanding into a set of resources for raw data archive with different purposes, namely, GSA (https://ngdc.cncb.ac.cn/gsa/), GSA for Human (GSA-Human, https://ngdc.cncb.ac.cn/gsa-human/), and Open Archive for Miscellaneous Data (OMIX, https://ngdc.…

Cited by 839 publications
(469 citation statements)
References 18 publications
“…The datasets presented in the study are deposited in the Genome Sequence Archive (Chen et al, 2021) in National Genomics Data Center (Members and Partners, 2021), China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA005251) that are publicly accessible at https://ngdc.cncb.ac.cn/gsa.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…CircleBase comprises 601 036 eccDNAs (candidates larger than 50M were removed) gleaned from 13 published papers on PubMed (https://pubmed.ncbi.nlm.nih.gov/) and includes the following information: (i) chromosomal localizations of eccDNAs based on reference genomes hg19 and hg38; (ii) conditions or treatments; (iii) sample types; (iv) sequencing library types and (v) validation strategies. The eccDNA localizations were collected from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) (16, 17) and Genome Sequence Archive (GSA, https://ngdc.cncb.ac.cn/gsa/) (18).…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…A number of high-throughput RNA-seq projects and their associated datasets were collected from several public raw sequencing databases, including Genome Sequence Archive (GSA, https://ngdc.cncb.ac.cn/gsa/) (28, 29), Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra/) (30), European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena) (31) and DDBJ Sequence Read Archive (DRA, https://ddbj.nig.ac.jp/DRASearch/) (32). Only the datasets with median mapping rates ≥70% for bulk RNA-seq and ≥40% for scRNA-seq were kept for further processing.…”
Section: Methods
Citation type: mentioning
confidence: 99%