2019 IEEE International Conference on Big Data, Cloud Computing, Data Science &Amp; Engineering (BCD) 2019
DOI: 10.1109/bcd.2019.8885148
|View full text |Cite
|
Sign up to set email alerts
|

Survey of Data Locality in Apache Hadoop

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2

Citation Types

0
6
0

Year Published

2019
2019
2023
2023

Publication Types

Select...
4
1
1

Relationship

1
5

Authors

Journals

citations
Cited by 7 publications
(6 citation statements)
references
References 56 publications
0
6
0
Order By: Relevance
“…This study has developed a novel method to extend the locality to all stages of MR, and minimize the overhead of the shuffle. The authors call it deep data locality (DDL) as opposed to the traditional map-only locality that the authors call shallow data locality (SDL) [25]. Furthermore, this study investigated two different types of DDL methods.…”
Section: Proposed Approachmentioning
confidence: 99%
See 2 more Smart Citations
“…This study has developed a novel method to extend the locality to all stages of MR, and minimize the overhead of the shuffle. The authors call it deep data locality (DDL) as opposed to the traditional map-only locality that the authors call shallow data locality (SDL) [25]. Furthermore, this study investigated two different types of DDL methods.…”
Section: Proposed Approachmentioning
confidence: 99%
“…Furthermore, this study investigated two different types of DDL methods. First, the block-based DDL reduces the RLM and reduces the data transfer in multicore processors [27]. In multicore processors, all the cores share the same local disk, therefore there is no data transfer overhead between the cores.…”
Section: Proposed Approachmentioning
confidence: 99%
See 1 more Smart Citation
“…In MapReduce frameworks (e.g. Hadoop), most of the existing research focused on optimizing data locality from the aspect of task scheduling [17], [21], [9]. In particular, Guo et al [16] assign map task to maximize the number of node-local tasks, and Shang et al [25] dispatch reduce task to minimize the network resources consumption.…”
Section: Introductionmentioning
confidence: 99%
“…The best known among such platforms is Apache Hadoop [6]. Hadoop is a set of software utilities [7], the core of which is a distributed file system that stores data in certain formats, and a data processor that implements the MapReduce processing model [8].…”
Section: Introductionmentioning
confidence: 99%