SPIDAL Java: High Performance Data Analytics with Java and MPI on Large Multicore HPC Clusters

Ekanayake, Saliya; Kamburugamuve, Supun; Fox, Geoffrey

doi:10.22360/springsim.2016.hpc.031

Cited by 5 publications

(2 citation statements)

References 21 publications

“…There are synergies between HPC and big data systems, and authors [29,30] among others [31] have expressed the need to enhance these systems by taking ideas from each other. In previous work [32,33] we have identified the general implications of threads and processes, cache, memory management in NUMA [34], as well as multi-core settings for machine learning algorithms with MPI. DataMPI [35] uses MPI to build Hadoop-like systems while [36] uses MPI communications in Spark for better performance.…”

Section: Related Work (mentioning)

confidence: 99%

“…An MPI programmer has to consider low-level details such as I/O, memory hierarchy and efficient execution of threads to write a parallel application that scales to large numbers of nodes. With the increasing availability of multi-core and many-core systems, the burden on the programmer to get the best available performance has increased dramatically [32,33]. Asynchronous many task systems are such as HPX [42], Legion [41], DAGuE [38] are developed to hide some of these complexities.…”

Section: MPI for Big Data (mentioning)

confidence: 99%

Twister2: Design of a big data toolkit

Kamburugamuve, Govindarajan, Wickramasinghe, et al. 2019

Concurrency and Computation

Data-driven applications are essential to handle the ever-increasing volume, velocity, and veracity of data generated by sources such as the Web and Internet of Things (IoT) devices. Simultaneously, an event-driven computational paradigm is emerging as the core of modern systems designed for database queries, data analytics, and on-demand applications. Modern big data processing runtimes and asynchronous many task (AMT) systems from the high performance computing (HPC) community have adopted a dataflow event-driven model. Services are increasingly moving to an event-driven model in the form of Function as a Service (FaaS) for service composition. An event-driven runtime designed for data processing consists of well-understood components such as communication, scheduling, and fault tolerance. The design choices adopted by these components determine the types of applications a system can support efficiently. We find that modern systems are limited to specific sets of applications because they have been designed with fixed choices that cannot be changed easily. In this paper, we present a loosely coupled, component-based design of a big data toolkit where each component can have different implementations to support various applications. Such a polymorphic design would allow services and data analytics to be integrated seamlessly and to expand from edge to cloud to HPC environments.
