2019
DOI: 10.24017/science.2019.1.2

Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java

Abstract: Nowadays, with the technology revolution, big data has become a phenomenon of the decade, and it has a significant impact on applied-science trends. Exploring a good big data tool is therefore a pressing need. Hadoop is a good technology for analyzing big data, but it is slow because the job result of each phase must be stored before the following phase starts, and because of replication delays. Apache Spark is another tool, developed and established to be the real model for analyzing big …
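
To make the abstract's point about Hadoop concrete, here is a minimal plain-Java sketch that uses only the standard library and none of the Hadoop or Spark APIs. The class `PhaseMaterialization` and its methods are hypothetical names for illustration. A two-phase word count runs twice: once writing the phase-1 result to a temporary file and reading it back before phase 2 starts (roughly how Hadoop stores each phase's result), and once keeping the intermediate list in memory, which is closer in spirit to Spark. Replication and distribution are not modeled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PhaseMaterialization {

    // Phase 1 ("map"): split every line into lowercase words.
    static List<String> mapPhase(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String w : line.toLowerCase().split("\\s+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
        }
        return words;
    }

    // Phase 2 ("reduce"): count occurrences of each word (TreeMap keeps keys sorted).
    static Map<String, Long> reducePhase(List<String> words) {
        Map<String, Long> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1L, Long::sum);
        }
        return counts;
    }

    // Stored between phases: phase 1's output is written to a temp file
    // and read back before phase 2 may start.
    static Map<String, Long> materialized(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("phase1-", ".txt");
            try {
                Files.write(tmp, mapPhase(lines));
                return reducePhase(Files.readAllLines(tmp));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Kept in memory: phase 2 consumes phase 1's list directly.
    static Map<String, Long> inMemory(List<String> lines) {
        return reducePhase(mapPhase(lines));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data with Spark", "Spark and Hadoop", "big data");
        System.out.println(materialized(lines)); // {and=1, big=2, data=2, hadoop=1, spark=2, with=1}
        System.out.println(inMemory(lines));     // same map, without the file round trip
    }
}
```

Both paths return the same counts. The only difference is the extra write and read of the intermediate file, which is the kind of I/O cost the abstract attributes to Hadoop.
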


Cited by 14 publications (5 citation statements); references 14 publications.
“…Hadoop, Spark, NO-SQL, Sklearn and Weka libraries, Hive, Cloud, and Rapid Miner technologies are gaining popularity. These technologies are computer software tools for extracting, managing, and analyzing data from a massively complex and large data collection that traditional management tools would never be able to handle [29][30][31][32][33][34]. However, in such a setting, selecting among a variety of technologies may be time consuming and difficult.…”
Section: Big Data Technologies (mentioning, confidence: 99%)
“…However, multi-threaded lightweight processes can run on Spark inside Java virtual machine (JVM). Spark can upload and download the data from Apache Hadoop by accessing Hadoop distributed file system (HDFS) since it works on top of the existing Hadoop cluster [18]. The management of several operations is quite simple with Apache Spark by providing a data pipeline method.…”
Section: Alternating Least Squares With Spark (mentioning, confidence: 99%)
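
The statement above credits Spark with making multi-step processing simple through a data pipeline. As a hedged, library-free illustration of that idea (this is not Spark's `Pipeline` API), the sketch below composes three hypothetical cleaning stages with `java.util.function.Function.andThen`, so a record set flows through every stage in order without the caller handling the intermediate results. The class and stage names are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class PipelineSketch {
    // Stage 1: trim records and drop empty ones.
    static final Function<List<String>, List<String>> CLEAN = rows -> rows.stream()
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toList());

    // Stage 2: normalize case so "Spark" and "spark" compare equal.
    static final Function<List<String>, List<String>> NORMALIZE = rows -> rows.stream()
            .map(String::toLowerCase)
            .collect(Collectors.toList());

    // Stage 3: remove duplicates, keeping first-seen order.
    static final Function<List<String>, List<String>> DEDUPE = rows -> rows.stream()
            .distinct()
            .collect(Collectors.toList());

    // The pipeline is the stages composed in order; callers only see input and output.
    static final Function<List<String>, List<String>> PIPELINE =
            CLEAN.andThen(NORMALIZE).andThen(DEDUPE);

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(" Spark ", "", "HDFS", "spark");
        System.out.println(PIPELINE.apply(raw)); // [spark, hdfs]
    }
}
```

Adding or reordering a step means changing only the `andThen` chain, which is the convenience the quoted sentence describes; Spark's own pipelines additionally distribute each stage across a cluster.
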
“…However, Spark supports four programming environments which are Java, Scala, Python, and R [20], [21]. Using Scala of Spark increases the speed computation of the algorithms and completes them in less time as compared to Java furthermore, the favorites of Scala noticed in supervised ML algorithms such as regression and unsupervised ML algorithms like clustering [22].…”
Section: Apache Spark (mentioning, confidence: 99%)
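
The statement names regression as a typical supervised algorithm. As a small, single-machine illustration of what such an algorithm computes, the sketch below fits a simple linear regression y = a + bx by ordinary least squares in plain Java. It is not MLlib code: MLlib's regression estimators operate on distributed datasets and offer regularization and several solvers, none of which are modeled here. The class `SimpleLinearRegression` is hypothetical.

```java
class SimpleLinearRegression {
    final double intercept; // a in y = a + b x
    final double slope;     // b in y = a + b x

    // Ordinary least squares: b = sum((x - meanX)(y - meanY)) / sum((x - meanX)^2),
    // a = meanY - b * meanX.
    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired points");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    double predict(double x) {
        return intercept + slope * x;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9}; // lies exactly on y = 1 + 2x
        SimpleLinearRegression m = new SimpleLinearRegression(x, y);
        System.out.println("slope=" + m.slope + ", intercept=" + m.intercept); // slope=2.0, intercept=1.0
    }
}
```

The closed-form solution suits one small, in-memory dataset; at big data scale, a framework such as Spark MLlib spreads the same kind of computation over a cluster.
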