Summary
In the last few years, Apache Spark has become the de facto standard framework for big data systems in both industry and academic projects. Spark is used to execute compute- and data-intensive workflows in distinct areas such as biology and astronomy. Although Spark is an easy-to-install framework, it has more than one hundred parameters to be set, besides the domain-specific parameters of each workflow. Thus, to execute Spark-based workflows efficiently, the user has to fine-tune a myriad of Spark and workflow parameters (e.g., partitioning strategy, the average size of a DNA sequence, etc.). This configuration task cannot be performed manually in a trial-and-error manner, since it is tedious and error-prone. This article proposes an approach that focuses on generating interpretable predictive machine learning models (i.e., decision trees) and then extracting useful rules (i.e., patterns) from these models, which nonexpert users can apply to configure parameters of future executions of the workflow and of Spark. In the experiments presented in this article, the proposed parameter configuration approach led to better performance in processing Spark workflows. Finally, the approach introduced here reduced the number of parameters to be configured by identifying, in the predictive model, the domain-specific parameters most relevant to workflow performance.
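To illustrate the general idea of learning interpretable rules from past executions, the minimal sketch below trains a shallow decision tree on synthetic execution logs and prints human-readable rules. It is only an assumption-laden example using scikit-learn; the feature names (shuffle_partitions, executor_memory_gb, avg_seq_size_kb), the data, and the performance label are illustrative and are not the article's actual features or results.

```python
# Hypothetical sketch: learn configuration rules from past Spark workflow runs.
# Feature values and names are illustrative, not taken from the article.
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [shuffle partitions, executor memory (GB), avg DNA sequence size (KB)]
X = [
    [200, 4, 64], [200, 8, 64], [800, 4, 256],
    [800, 8, 256], [400, 8, 128], [400, 4, 128],
]
# Label: 1 = execution met the performance target, 0 = it did not.
y = [0, 1, 0, 1, 1, 0]

# A shallow tree keeps the predictive model interpretable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Extract human-readable rules that a nonexpert user could apply
# when configuring future executions of the workflow and Spark.
rules = export_text(
    tree,
    feature_names=["shuffle_partitions", "executor_memory_gb", "avg_seq_size_kb"],
)
print(rules)
```

In practice, the printed rules take the form of threshold conditions on the configuration parameters (for instance, bounds on executor memory or partition counts) that can be read directly and reused as configuration guidelines.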