2015
DOI: 10.1007/s11227-015-1447-3
Optimizing the Hadoop MapReduce Framework with high-performance storage devices

Cited by 27 publications (8 citation statements) | References 7 publications
“…Hadoop is a multi-tasking system that can process multiple data sets for multiple jobs in a multi-user environment across multiple machines at the same time [42] [43]. Each MapReduce job consists of multiple processes submitting I/Os concurrently for the Map, Shuffle, and Reduce stages, each of which has skewed I/O requirements [44] [45] [46]. The Hadoop Distributed File System (HDFS) uses a block-structured file system to deliver reliable storage [13] [43]. YARN (Yet Another Resource Negotiator) acts as a per-application resource-negotiating agent and provides a centralized platform for ensuring consistency and data manageability.…”
Section: Hadoop Ecosystem and MapReduce
confidence: 99%
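The Map, Shuffle, and Reduce stages quoted above can be sketched in miniature as an in-memory word count. This is a hypothetical illustration of the dataflow only, not Hadoop's actual distributed implementation (which streams over HDFS blocks and runs tasks under YARN):

```python
from collections import defaultdict

def map_phase(documents):
    """Map stage: emit (word, 1) pairs for every word in every input record."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle stage: group intermediate (key, value) pairs by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce stage: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input records standing in for HDFS blocks.
docs = ["hadoop stores data in blocks",
        "hadoop schedules map and reduce tasks"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["hadoop"])  # → 2
```

The skewed I/O the excerpt mentions arises because each stage has a different access pattern: maps read large sequential blocks, the shuffle produces many small intermediate writes, and reduces merge and write final output.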
“…Due to the physical limitations of HDDs, there have been recent efforts [1,2,11,18-21] to incorporate flash-based storage such as SSDs in data centers. High-speed, non-volatile storage devices like SSDs, typically referred to as SCMs (storage-class memories), access data via electrical signals, as opposed to the physical disk-arm movement of HDDs [3,9].…”
Section: Secondary Storage (Block Device) Characteristics
confidence: 99%
“…Since deletion (erase) happens at the granularity of blocks, a single page update requires a complete block erase and an out-of-place write. These result in unwanted phenomena such as write amplification (wear-leveling) and garbage collection (faulty-block management) [9,19,24]. These activities consume considerable CPU time, and the SSD controller and the file system take on additional work, such as bookkeeping, beyond simple data access.…”
Section: Secondary Storage (Block Device) Characteristics
confidence: 99%
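The out-of-place writes and block-granularity erases described in this excerpt can be made concrete with a toy flash-translation-layer model. All names and parameters below are hypothetical (a real FTL erases one victim block at a time and tracks wear per block); the sketch only shows why physical writes exceed logical writes, i.e. write amplification:

```python
PAGES_PER_BLOCK = 4

class TinyFTL:
    """Toy flash translation layer over a small SSD (hypothetical model).

    Flash pages are write-once, so every logical update goes out-of-place
    to a fresh page; when no free page remains, garbage collection copies
    the live pages out, erases the blocks, and copies them back. Those
    extra copies are what inflate physical writes over logical writes."""

    def __init__(self, num_blocks=2):
        self.pages = [None] * (num_blocks * PAGES_PER_BLOCK)  # physical pages
        self.map = {}            # logical page number -> physical page index
        self.logical_writes = 0
        self.physical_writes = 0
        self.erases = 0

    def write(self, logical, data):
        self.logical_writes += 1
        if None not in self.pages:        # device full of live + stale pages
            self._garbage_collect()
        free = self.pages.index(None)     # next free physical page
        self.pages[free] = data
        self.map[logical] = free          # the previous copy is now stale
        self.physical_writes += 1

    def _garbage_collect(self):
        live = {lp: self.pages[pp] for lp, pp in self.map.items()}
        self.pages = [None] * len(self.pages)      # erase (simplified: all blocks)
        self.erases += len(self.pages) // PAGES_PER_BLOCK
        for lp, data in live.items():              # copy live pages back
            free = self.pages.index(None)
            self.pages[free] = data
            self.map[lp] = free
            self.physical_writes += 1              # GC traffic: amplification

ssd = TinyFTL()                  # 2 blocks x 4 pages = 8 physical pages
for version in range(10):        # update the same logical page 10 times
    ssd.write(0, f"v{version}")

amplification = ssd.physical_writes / ssd.logical_writes
print(ssd.physical_writes, ssd.erases)   # 11 physical writes, 2 erases
```

Ten logical updates of a single page cost eleven physical writes and two block erases here, which is the bookkeeping and CPU overhead the excerpt attributes to the SSD controller and file system.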
“…In this paper, a big-data platform is built to collect and store real-time dynamic steelmaking production data, and the Hadoop Distributed File System (HDFS) is used to realize virtual resource storage of the big data in the steelmaking process. A convolutional neural network (CNN) algorithm [3] is used to predict the composition of steel slag, and the prediction provides a basic input for the steel-slag resource application recommendation system. On this basis, a steel-slag resource utilization system grounded in big data is established, which will ultimately provide new ideas for steel-slag treatment and application in steel companies.…”
Section: Introduction
confidence: 99%