2019
DOI: 10.1101/618025
Preprint

recount-brain: a curated repository of human brain RNA-seq datasets metadata

Abstract: The usability of publicly-available gene expression data is often limited by the availability of high-quality, standardized biological phenotype and experimental condition information ("metadata"). We released the recount2 project, which involved re-processing ~70,000 samples in the Sequence Read Archive (SRA), Genotype-Tissue Expression (GTEx), and The Cancer Genome Atlas (TCGA) projects. While samples from the latter two projects are well-characterized with extensive metadata, the ~50,000 RNA-seq samples f…

Citations: cited by 9 publications (13 citation statements)
References: 44 publications
“…First, it will be important to continue to build better models for predicting missing metadata and correcting mistakes in metadata (Ellis et al, 2018). Second, it will be important to enable users with more detailed knowledge of the datasets to create their own collections of related datasets, possibly with their own hand-curated metadata (Razmara et al, 2019), and allow the sharing of such hand-curated collections with the wider community. Third, since metadata can sometimes be an unreliable way to find relevant datasets, it will be important to design methods that search for related datasets based on their contents rather than their metadata, e.g.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Isoform expression profiles: To collect expression profiles of isoforms, we first retrieved RNA-seq experiments for different types of normal human tissues from the NCBI Sequence Read Archive (SRA) database (40), where corresponding accession numbers were obtained from the Human Protein Atlas (HPA) database (41) and the recount-brain project (42) (see Supplementary Table S1 for a list of RNA-seq experiments). Next, we applied the tool Kallisto (43) to obtain quantified isoform expression profiles in each experiment (measured in Transcripts Per Million or TPM).…”
Section: Methods (citation type: mentioning)
confidence: 99%
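
The statement above reports isoform expression in Transcripts Per Million (TPM). As a rough illustration of what that unit means, the sketch below converts per-transcript estimated counts and effective lengths into TPM using the standard definition (length-normalized rate, rescaled so all transcripts sum to one million). It is not the citing authors' pipeline, and Kallisto reports TPM itself; the function name `tpm` and the toy numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def tpm(est_counts: Sequence[float], eff_lengths: Sequence[float]) -> List[float]:
    """Convert estimated counts to Transcripts Per Million (TPM).

    Hypothetical helper for illustration only: each count is divided by the
    transcript's effective length, and the resulting rates are rescaled so
    that they sum to 1e6 across all transcripts in the sample.
    """
    if len(est_counts) != len(eff_lengths):
        raise ValueError("est_counts and eff_lengths must have the same length")
    if any(length <= 0 for length in eff_lengths):
        raise ValueError("effective lengths must be positive")

    # Reads per base of effective length for each transcript.
    rates = [count / length for count, length in zip(est_counts, eff_lengths)]
    total = sum(rates)

    # A sample with no reads at all has no defined proportions; return zeros.
    if total == 0:
        return [0.0] * len(rates)

    return [rate / total * 1e6 for rate in rates]


# Toy example with made-up values: three transcripts in one sample.
values = tpm([100, 200, 0], [1000, 4000, 500])
print(values)
```

Because TPM is a within-sample proportion, the values of one sample always sum to one million, which is why the first transcript (fewer reads, but much shorter) ends up with twice the TPM of the second.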
“…The extraordinary large volume of user-generated sequencing data available in public databases is increasingly being utilized in research projects alongside novel experiments (Simon et al., 2018; Razmara et al., 2019; Lung et al., 2020; Rajesh et al., 2021; Hippen and Greene, 2021; Wartmann et al., 2021; Kasmanas et al., 2021; Huang et al., 2021; Klie et al., 2021; Booeshaghi et al., 2022). Collation of metadata is crucial for such reuse of publicly available data since it can provide information about the samples assayed and can facilitate the acquisition of raw data.…”
Section: Introduction (citation type: mentioning)
confidence: 99%