24th High Performance Computing Symposium 2016
DOI: 10.22360/springsim.2016.hpc.031

SPIDAL Java: High Performance Data Analytics with Java and MPI on Large Multicore HPC Clusters

Abstract: Within the last few years, there have been significant contributions to Java-based big data frameworks and libraries such as Apache Hadoop, Spark, and Storm. While these systems are rich in interoperability and features, developing high performance big data analytic applications is challenging. Also, the study of performance characteristics and high performance optimizations is lacking in the literature for these applications. By contrast, these features are well documented in the High Performance Computing (H…

Cited by 5 publications (2 citation statements). References: 21 publications.
“…There are synergies between HPC and big data systems, and authors [29,30] among others [31] have expressed the need to enhance these systems by taking ideas from each other. In previous work [32,33] we have identified the general implications of threads and processes, cache, memory management in NUMA [34], as well as multi-core settings for machine learning algorithms with MPI. DataMPI [35] uses MPI to build Hadoop-like systems while [36] uses MPI communications in Spark for better performance.…”
Section: Related Work (mentioning; confidence: 99%)
“…An MPI programmer has to consider low-level details such as I/O, memory hierarchy and efficient execution of threads to write a parallel application that scales to large numbers of nodes. With the increasing availability of multi-core and many-core systems, the burden on the programmer to get the best available performance has increased dramatically [32,33]. Asynchronous many-task systems such as HPX [42], Legion [41], and DAGuE [38] are developed to hide some of these complexities.…”
Section: MPI for Big Data (mentioning; confidence: 99%)