2022
DOI: 10.1093/bioinformatics/btac492

Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments

Abstract: Motivation: The ever-growing size of sequencing data is a major bottleneck in bioinformatics as the advances of hardware development cannot keep up with the data growth. Therefore, an enormous amount of data is collected but rarely ever reused, because it is nearly impossible to find meaningful experiments in the stream of raw data. Results: As a solution, we propose Needle, a fast and space-efficient index which can be built f…

Cited by 6 publications (5 citation statements)
References 29 publications
Citation types: 1 supporting, 4 mentioning, 0 contrasting
“…Means of k-mer counts best correlated with qPCR abundance and transcript-per-million (TPM) measured from RNA-seq reads by Kallisto [10] (Fig 2A,B, Fig S2), while sums of k-mer counts best correlated with raw RNA-seq counts (Fig 2C, Fig S2). Correlation coefficients (CC) with Kallisto counts were around 0.8, in line with previous reports [6]. We found that quantification accuracy could be substantially improved by masking query k-mers with multiple instances in the human genome (Methods).…”
Section: Results
supporting
confidence: 89%
“…To determine the optimal counting scheme, we used the SEQC/MAPQC dataset in which the abundance of 1000 transcripts was evaluated in 16 samples both by qPCR and Illumina RNA-seq [9]. Means of k-mer counts best correlated with qPCR abundance and transcript-per-million (TPM) measured from RNA-seq reads by Kallisto [10] Correlation coefficients (CC) with Kallisto counts were around 0.8, in line with previous reports [6]. We found that quantification accuracy could be substantially improved by masking query k-mers with multiple instances in the human genome (Methods).…”
Section: Accuracy Of RNA Expression Measure
mentioning
confidence: 54%
“…Indexes capable of taking abundance into account are an ongoing major challenge, which has seen significant contributions in recent years, see for example [10, 7, 4]. The method we propose in this article is based solely on the presence/absence of k-mers; an interesting prospect for future work would be to take into account abundances – i.e.…”
Section: Discussion
mentioning
confidence: 99%
“…Bloom filters. Sketching approaches such as sourmash [20], or Needle [9] typically suffer from high false negative rates when short sequences are queried, and are thus out of the scope of this work. Methods based on exact representations (e.g.…”
mentioning
confidence: 99%