2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID) 2020
DOI: 10.1109/ccgrid49817.2020.00-39

SparkLeBLAST: Scalable Parallelization of BLAST Sequence Alignment Using Spark

Cited by 3 publications (3 citation statements)
References 21 publications
“…BLAST implements a highly optimized memory management layer based on memory-mapped I/O to read the sequence database. However, recent studies, including [21], have shown that paging significantly degrades BLAST's performance when the database does not fit in memory. While distributing the sequence database across multiple nodes [21], [22] circumvents paging, it introduces high network overhead for processing significantly large output.…”
Section: Discussion (mentioning)
confidence: 99%
“…However, recent studies, including [21], have shown that paging significantly degrades BLAST's performance when the database does not fit in memory. While distributing the sequence database across multiple nodes [21], [22] circumvents paging, it introduces high network overhead for processing significantly large output. Alternatively, we explore leveraging UMap with optimized parameters to mitigate paging overhead.…”
Section: Discussion (mentioning)
confidence: 99%
“…Given expanding amount of data, providing fast and biologically valuable sequence alignment tools via high-performance computing (HPC) and algorithmic innovations has been a highly active area of bioinformatics research, particularly in the context of rapidly expanding databases. For example, several sequence alignment programs have relied on contributing algorithmic improvements (e.g., HMMER [4], DIAMOND [5], CaBLAST [6]) while others have focused on improving parallelization to take advantage of emerging high-performance computing (HPC) platforms and programming paradigms (e.g., cuBLASTP [7], muBLASTP [8], mpiBLAST [9], SparkBLAST [10], and SparkLeBLAST [11]). Both DIAMOND [5] and CaBLAST [6] improve the execution time of sequence alignment by compressing the sequence database.…”
Section: Introduction (mentioning)
confidence: 99%