2023
DOI: 10.1093/bioinformatics/btac836

Efficient querying of genomic reference databases with gget

Abstract: Motivation: A recurring challenge in interpreting genomic data is the assessment of results in the context of existing reference databases. With the increasing number of command line and Python users, there is a need for tools implementing automated, easy programmatic access to curated reference information stored in a diverse collection of large, public genomic databases. Results: gget is a free and open-source command line to…

Cited by 22 publications (22 citation statements)
References 36 publications
“…During the generation of the reference index with ‘kb ref’, the D-list option may be used to mask host genomic and/or transcriptomic sequences, as further discussed in this manuscript. Here, human genomic sequences fetched from Ensembl using gget 45 are masked using the D-list. The reference index only needs to be generated once, and precomputed PalmDB reference indices for human and mouse hosts are available here: https://tinyurl.com/aaxyy8v8.…”
Section: Results (mentioning)
confidence: 99%
“…Cluster 'Undefined 1' was omitted because it only contained 12 cells. Gene names and descriptions for Ensembl IDs without annotations were obtained using gget 45.…”
Section: Macaque Cell Clustering and Cell Type Assignment (mentioning)
confidence: 99%
“…While ffq facilitates downloading of data from numerous genomic databases, the results retrieved are only useful to the extent that the metadata uploaded is meaningful and complete. Meaningful and complete user-generated data underlies the curation of genomic references essential for comparative genomic data analysis (Luebbert and Pachter, 2023). Unfortunately, there is little to no standardization of user-uploaded sequencing metadata (Rajesh et al, 2021; Wang et al, 2019), and metadata descriptions can become exceedingly complex for current multiplexed experiments, where different assays with distinct data types are combined.…”
Section: Discussion (mentioning)
confidence: 99%
“…All datasets analyzed in this paper from previous publications are publicly available. The CellxGene blood atlas dataset were downloaded from CellxGene census (https://chanzuckerberg.github.io/cellxgene-census/) via gget [7]. The drug perturbation dataset was downloaded from Kaggle (https://www.kaggle.com/competitions/open-problems-single-cell-perturbations).…”
Section: Data Availability (mentioning)
confidence: 99%