2018
DOI: 10.1186/s40537-018-0114-y

StreamAligner: a streaming based sequence aligner on Apache Spark

Cited by 7 publications (5 citation statements). References: 31 publications.
“…Various DSPFs have been proposed for special purposes, such as multimedia streaming framework [12], P2P live framework [13], and fraud detection framework [14]. To process genomics data in a fast and efficient way, a novel sequence aligner was implemented on Apache Spark [15]. The multiquery component of Apache Flink was optimized for big data [16].…”
Section: Review on Streaming Framework (mentioning; confidence: 99%)
“…For instance, we can find in the literature several solutions for estimating the number of k-mers in genomic datasets, such as KmerStream [28], ntCard [29], KmerEstimate [30] and Khmer [31]. Other tools are focused on sequence alignment (StreamAligner [32], StreamBWA [33]), metagenomics profiling (Flint [34]) and DNA analysis (SparkGA2 [35]). These latter examples are all implemented on top of the legacy Spark Streaming API instead of using Spark Structured Streaming as in our approach.…”
Section: Related Work (mentioning; confidence: 99%)
“…Therefore the raw data (containing reads of any length) produced by a sequencing machine can be considered a static data set. Additionally, because the reads are generated individually, it would be possible to design an indexing algorithm that is built incrementally in real time [35]. Once built, a set of indexed reads can be rapidly queried for sequences of interest, such as structural variations, pathogenic variants, or viruses.…”
Section: Aiding Computation (mentioning; confidence: 99%)
“…However, if we take a broader look at the data sets involved in WGS analysis, we can see that a read set generated for a genome is unchanged during analysis, with the exception of preprocessing and error correction. Reads are reported sequentially and, thus, it is entirely possible to design an indexing algorithm that is built incrementally in real time as the reads are outputted by the sequencing machine [35].…”
Section: Box 2, Indexing a Set of Reads (mentioning; confidence: 99%)
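The last two statements describe an index over a read set that is built incrementally, one read at a time, as the sequencer reports reads, and that can then be queried for sequences of interest. The following is a minimal sketch of that idea, assuming a simple k-mer hash index (the class `IncrementalKmerIndex` and its methods are hypothetical illustrations). It is not the method of StreamAligner or of the citing paper: each new read is indexed by its k-mers on arrival, and a query is answered by seeding on its first k-mer and verifying each candidate position.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class IncrementalKmerIndex:
    """Hypothetical k-mer index that grows one read at a time (illustrative only)."""

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self.k = k
        # k-mer -> list of (read_id, offset) occurrences across all reads seen so far
        self._postings: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        self._reads: List[str] = []

    def add_read(self, read: str) -> int:
        """Index a newly reported read immediately; return its read id."""
        read_id = len(self._reads)
        self._reads.append(read)
        for offset in range(len(read) - self.k + 1):
            self._postings[read[offset:offset + self.k]].append((read_id, offset))
        return read_id

    def reads_containing(self, query: str) -> Set[int]:
        """Return ids of indexed reads that contain `query` as a substring."""
        if len(query) < self.k:
            # Query shorter than k cannot be seeded from the index; fall back to a scan.
            return {i for i, r in enumerate(self._reads) if query in r}
        seed = query[:self.k]
        hits: Set[int] = set()
        # .get() avoids inserting an empty postings list into the defaultdict.
        for read_id, offset in self._postings.get(seed, []):
            # Verify the full query at the seeded position.
            if self._reads[read_id].startswith(query, offset):
                hits.add(read_id)
        return hits
```

In use, `add_read` would be called once per read as it streams off the instrument, so the index is queryable at any point during the run; for example, after adding `"ACGTACGT"`, `"TTGACCA"` and `"GGACGTTA"` with `k=4`, `reads_containing("ACGTT")` returns `{2}`. Seeding on the query's first k-mer finds every occurrence, because any occurrence of the query at offset `o` places that k-mer at `o` as well. A production indexer would more likely use a compressed structure (e.g. an FM-index) rather than a Python dictionary.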