2015
DOI: 10.15514/ispras-2015-27(5)-3

Implementing Apache Spark jobs execution and Apache Spark cluster creation for Openstack Sahara

Abstract: In this paper the problem of creating virtual clusters in clouds for big data analysis with Apache Hadoop and Apache Spark is discussed. Both clouds and MapReduce models are popular nowadays for a bunch of reasons: cheapness and efficient big data analysis, respectively. For these reasons, having an open source solution for building clusters is important. The article gives an overview of existing methods for Apache Spark cluster creation in clouds. We consider two open source cloud engines, OpenStack and Eucalyptus.

Cited by 4 publications (1 citation statement); references 0 publications.
“…We show a specific example, 'Word count', one of the most common data analytics tasks, in Table II. The system processes the map stage of the analytics tasks with the 'map' and 'flatMap' APIs supported by the framework, and the reduce stage with the 'reduceByKey' and 'collect' APIs [16].

    # Save and load model
    clusters.save(sc, "target/org/apache/spark/PythonKMeansExample/KMeansModel")
    sameModel = KMeansModel.load(sc, "target/org/apache/spark/PythonKMeansExample/KMeansModel")

Table III shows an example, 'K-means', which is more complex than 'Word count'.…”
Section: Execution Procedures of Big Data Analytics (confidence: 99%)
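The 'Word count' pipeline the citing statement describes (flatMap → map → reduceByKey → collect) can be sketched without a Spark cluster. The following is a plain-Python imitation of the RDD dataflow semantics, not Spark's actual API; the helper names `flat_map` and `reduce_by_key` are illustrative stand-ins for the corresponding RDD operations:

```python
def flat_map(f, xs):
    # flatMap: apply f to each element and flatten the resulting lists
    return [y for x in xs for y in f(x)]

def reduce_by_key(f, pairs):
    # reduceByKey: merge all values that share a key using f
    acc = {}
    for k, v in pairs:
        acc[k] = f(acc[k], v) if k in acc else v
    return sorted(acc.items())

lines = ["to be or not to be"]
words = flat_map(str.split, lines)                  # flatMap stage
pairs = [(w, 1) for w in words]                     # map stage
counts = reduce_by_key(lambda a, b: a + b, pairs)   # reduceByKey stage
print(counts)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

In PySpark the same dataflow would be expressed on an RDD as `sc.textFile(path).flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).collect()`, with the cluster handling partitioning and shuffling of the keyed pairs.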