2023
DOI: 10.1002/cpe.7635

Performance comparison of Dask and Apache Spark on HPC systems for neuroimaging

Abstract: The general increase in data size and data sharing motivates the adoption of Big Data strategies in several scientific disciplines. However, while several options are available, no particular guidelines exist for selecting a Big Data engine. In this paper, we compare the runtime performance of two popular Big Data engines with Python APIs, Apache Spark and Dask, in processing neuroimaging pipelines. Our experiments use three synthetic neuroimaging applications to process the 606 GB BigBrain image and a…

Cited by 2 publications (1 citation statement)
References 17 publications
“…However, adapting MPI-dependent models to Spark involves significant changes to exploit cloud computing's high-performance capabilities effectively [49][50][51]. Efforts to integrate Spark with existing HPC architectures or modify it for enhanced performance are ongoing, with research focusing on extending Spark's utility for complex, high-throughput computing tasks typically handled by MPI [52]. This transition highlights the necessity of adapting high-performance computing paradigms to fit hybrid cloud environments, ensuring efficient data handling and computation.…”
Section: Computational Processes (mentioning)
confidence: 99%