2023
DOI: 10.1093/bioinformatics/btac667

Metadata retrieval from sequence databases with ffq

Abstract

Motivation: Several genomic databases host data and metadata for an ever-growing collection of sequence datasets. While these databases have a shared hierarchical structure, there are no tools specifically designed to leverage it for metadata extraction.

Results: We present a command-line tool, called ffq, for querying user-generated data and metadata from sequence databases. Given an accession or a paper's DOI, ffq efficiently…
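The shared hierarchical structure mentioned in the abstract shows up directly in accession prefixes: the first letter names the database and the last letter names the level (study, experiment, sample, run). The sketch below classifies an accession this way. It is a minimal illustration assuming common INSDC (SRA/ENA/DDBJ) and GEO prefix conventions; `classify_accession` and its prefix tables are hypothetical and are not ffq's own implementation.

```python
from __future__ import annotations

import re

# Hypothetical prefix tables based on common INSDC and GEO accession
# conventions; illustrative only, not ffq's internal lookup logic.
DATABASES = {"S": "SRA (NCBI)", "E": "ENA (EMBL-EBI)", "D": "DDBJ"}
INSDC_LEVELS = {"P": "study", "X": "experiment", "S": "sample", "R": "run"}
GEO_LEVELS = {"GSE": "series", "GSM": "sample"}

# Three-letter prefix followed by digits, e.g. "SRR1234" or "GSE42".
ACCESSION_RE = re.compile(r"^([A-Z]{3})(\d+)$")


def classify_accession(accession: str) -> tuple[str, str]:
    """Return (database, level) for an accession string."""
    match = ACCESSION_RE.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized accession: {accession!r}")
    prefix = match.group(1)
    if prefix in GEO_LEVELS:
        return "GEO", GEO_LEVELS[prefix]
    # INSDC prefixes look like <database letter> + "R" + <level letter>.
    db, kind, level = prefix[0], prefix[1], prefix[2]
    if kind == "R" and db in DATABASES and level in INSDC_LEVELS:
        return DATABASES[db], INSDC_LEVELS[level]
    raise ValueError(f"unrecognized accession prefix: {prefix!r}")
```

For example, `classify_accession("ERP000123")` returns `("ENA (EMBL-EBI)", "study")`. A tool that knows an accession's level can then walk up or down the hierarchy (for instance from a study to all of its runs), which is the kind of traversal the abstract describes.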

Cited by 18 publications (11 citation statements); references 31 publications.

Citation statements
“…Our open-source Python and command-line program gget enables efficient and easy programmatic access to information stored in a diverse collection of large, public genomic reference databases. gget works alongside existing tools that fetch user-generated sequencing data (Gálvez-Merchán et al, 2022) to replace ineffective, error-prone manual web access during genomic data analysis. While the gget modules were motivated by experience with tedious single-cell RNA-seq data analysis tasks (Supplementary Figure 1), we anticipate their utility for a wide range of bioinformatics tasks.…”
Section: Discussion (mentioning; confidence: 99%)
“…For RNA-seq data, we downloaded metaSRA [73] version 1.8 to identify samples associated with potential age and sex information. We then used ffq [74] to fetch sample accession data from the Sequence Read Archive (SRA) [41] to match the sample identifiers used in metaSRA to the run identifiers used in refine.bio. We manually checked these labels as well by reading sample descriptions obtained from SRA.…”
Section: Methods (mentioning; confidence: 99%)
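The matching step in this excerpt, linking metaSRA sample identifiers to the run identifiers used in refine.bio, is essentially a join through SRA's sample-to-run relationship. The sketch below shows that join under stated assumptions: each fetched metadata record is reduced to a dict with hypothetical `run` and `sample` keys, and `runs_by_sample` and `label_runs` are hypothetical helpers. This layout is not ffq's output schema.

```python
from __future__ import annotations

from collections import defaultdict


def runs_by_sample(records: list[dict]) -> dict[str, list[str]]:
    """Group run accessions under their sample accession.

    Each record is assumed to have hypothetical keys 'run' and 'sample',
    e.g. extracted from metadata fetched for each run.
    """
    mapping: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        mapping[rec["sample"]].append(rec["run"])
    return dict(mapping)


def label_runs(sample_labels: dict[str, str], records: list[dict]) -> dict[str, str]:
    """Propagate per-sample labels (e.g. sex) to every run of that sample.

    Runs whose sample has no label are omitted from the result.
    """
    labels: dict[str, str] = {}
    for sample, runs in runs_by_sample(records).items():
        if sample in sample_labels:
            for run in runs:
                labels[run] = sample_labels[sample]
    return labels
```

Grouping first makes the one-to-many relationship explicit: a single sample can be sequenced in several runs, so a label attached at the sample level has to be copied to each of its runs before it can be matched against run-level identifiers.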
“…The status quo is ad hoc; there are a variety of different distribution mechanisms, and none is particularly machine-friendly. Much genomic metadata is deposited onto data-oriented databases, such as GEO or dbGaP, where metadata is notoriously difficult to process, leading to a variety of dedicated tools for that purpose (Davis and Meltzer, 2007; Chen et al, 2019; Gumienny, 2019; Choudhary, 2019; Ewels et al, 2020; Cannizzaro et al, 2021; Gálvez-Merchán et al, 2022; Garcia et al, 2022; Khoroshevskyi et al, 2023). Distribution is sometimes intentionally restricted on the basis of privacy.…”
Section: Challenges to Sharing Genomic Metadata (mentioning; confidence: 99%)