Task-based programming in COMPSs to converge from HPC to big data

Conejero, Javier; Corella, Sandra; Badía, Rosa M.; Labarta, Jesús

doi:10.1177/1094342017701278

Cited by 29 publications

(18 citation statements)

References 12 publications

“…A recent work [34] compared COMPSs performance in Java applications to Apache Spark, using a cluster architecture normally associated with HPC applications (e.g., low-latency networks and shared network disks). In this work, our integration allows us to take COMPSs into a cluster usually adopted in Data Science scenarios, with only traditional networking hardware and with disks distributed among the cluster nodes.…”

Section: Related Work (mentioning)

confidence: 99%

“…As previously mentioned in Section 2, a recent work [34] compared COMPSs performance in Java applications to Apache Spark, using a cluster architecture normally associated with HPC applications. Since our integration allows COMPSs to better interface with a cluster architecture more frequently found in Data Science scenarios, in this study we present a comparison of the two systems under those conditions. Table 4 presents a performance comparison between COMPSs and Spark using the applications Grep, Wordcount, KMeans and KNN.…”

Section: Spark Versus COMPSs (mentioning)

confidence: 99%

Upgrading a high performance computing environment for massive data processing

Ponce, Santos, Meira, et al. 2019

J Internet Serv Appl

High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is the development of an integrated environment that integrates (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, reducing execution time. The integration with Lemonade facilitates the use of COMPSs and may help its popularization in the Data Science community, by providing efficient algorithm implementations for experts from the data domain who want to develop applications with a higher-level abstraction.

“…COMPSs is used in production at MareNostrum supercomputer and has been used to implement different real world applications, specially in the area of BioInformatics and Computational Genomics [10] [13] [17], Big Data analytics [29] and as building block for several scientific cyber-infrastructures [45] [22]. Other examples of applications developed with COMPSs can be found in [7].…”

Section: COMPSs Overview (mentioning)

confidence: 99%

Transparent Orchestration of Task-based Parallel Applications in Containers Platforms

et al. 2018

This paper presents a framework to easily build and execute parallel applications in container-based distributed computing platforms in a user-transparent way. The proposed framework is a combination of the COMP Superscalar (COMPSs) programming model and runtime, which provides a straightforward way to develop task-based parallel applications from sequential codes, and container management platforms that ease the deployment of applications in computing environments (such as Docker, Mesos or Singularity). This framework provides scientists and developers with an easy way to implement parallel distributed applications and deploy them in a one-click fashion. We have built a prototype which integrates COMPSs with different container engines in different scenarios: i) a Docker cluster, ii) a Mesos cluster, and iii) Singularity in an HPC cluster. We have evaluated the overhead in the building phase, deployment and execution of two benchmark applications compared to a Cloud testbed based on KVM and OpenStack and to the usage of bare metal nodes. We have observed an important gain in comparison to cloud environments during the building and deployment phases. This enables better adaptation of resources with respect to the computational load. In contrast, we detected an extra overhead during the execution, which is mainly due to the multi-host Docker networking.

“…PySpark is a binding to the widely extended framework Spark [17]. A previous paper compares several Big Data algorithms using the native version of both COMPSs and Spark runtimes [18], showing that COMPSs is able to get better or competitive results in comparison to Spark.…”

Section: Introduction (mentioning)

confidence: 99%

Executing linear algebra kernels in heterogeneous distributed infrastructures with PyCOMPSs

Amela, Ramon-Cortes, Ejarque, et al. 2018

Oil Gas Sci. Technol. – Rev. IFP Energies nouvelles

Self Cite

Python is a popular programming language due to the simplicity of its syntax, while still achieving good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped to put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. Also, we present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.
