The MapReduce programming model was introduced for big data processing, where data nodes perform both data storage and computation. We therefore need to understand the different resource requirements of storage and computation tasks and schedule them efficiently over multi-core processors. Core affinity defines a mapping between a set of cores and a given task. It can be decided based on the resource requirements of the task, because this mapping largely affects the efficiency of computation, memory, and I/O resource utilization. In this paper, we analyze the impact of core affinity on the file upload performance of the Hadoop Distributed File System (HDFS). Our study provides insight into process scheduling issues on big data processing systems. We also suggest a framework for dynamic core affinity based on our observations and show that a preliminary implementation can improve throughput by more than 40% compared with the default Linux system.
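To make the notion of core affinity concrete, the following minimal C sketch (not the framework proposed in the paper) pins the calling thread to a group of cores chosen by task type using the standard Linux sched_setaffinity(2) interface. The task classification and the core ranges are hypothetical placeholders; a dynamic framework would adjust such a mapping at run time from observed resource usage rather than fixed ranges.

    /*
     * Minimal sketch: pin the calling thread to cores chosen by task type.
     * The TASK_* classification and the core ranges below are hypothetical.
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    enum task_type { TASK_COMPUTE, TASK_STORAGE_IO };

    /* Hypothetical policy: compute tasks use cores 0-3, storage I/O tasks cores 4-7. */
    static int pin_to_task_cores(enum task_type type)
    {
        cpu_set_t set;
        int first = (type == TASK_COMPUTE) ? 0 : 4;

        CPU_ZERO(&set);
        for (int cpu = first; cpu < first + 4; cpu++)
            CPU_SET(cpu, &set);

        /* pid 0 means "apply to the calling thread" */
        return sched_setaffinity(0, sizeof(set), &set);
    }

    int main(void)
    {
        if (pin_to_task_cores(TASK_STORAGE_IO) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned to hypothetical storage I/O cores 4-7\n");
        return 0;
    }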
Summary
As the number of cores equipped in a single computing node rapidly increases, utilizing many cores efficiently for contemporary applications is a challenging issue. We need to consider both parallelization and locality to fully exploit many cores for the diverse operations of emerging applications. In particular, big data applications perform computation-intensive and I/O-intensive operations alternately. For instance, Apache Hadoop MapReduce assumes local persistent storage for each computing node. Thus, unlike traditional parallel programming models, the MapReduce framework performs not only networking but also storage I/O. In this study, we aim to improve the locality of network and storage I/O operations on many-core systems by partitioning cores for I/O system calls and event handlers. To implement fine-grained many-core partitioning, we decouple the system call context from the user-level process by introducing message-based system calls. The proposed design provides user-level transparency and does not require any kernel-level modifications. In addition, we propose a scheme that dynamically decides the core affinity of system calls and event handlers by considering locality, run-time loads, and hardware architecture. Experimental results show that the proposed many-core partitioning improves the locality of network and storage I/O operations in an integrated manner for MapReduce applications.
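The summary does not spell out the mechanism, but the message-based idea can be illustrated with a simplified, hypothetical C sketch: an application thread packages an I/O request as a message, and a handler thread pinned to a dedicated I/O core issues the system call on its behalf and signals completion. This is a minimal illustration under those assumptions, not the paper's actual design, which additionally decides affinity dynamically from locality, load, and hardware topology.

    /*
     * Simplified sketch of a message-based system call: an application thread
     * forwards a write request to a handler thread pinned to an I/O core.
     * All names, the single-slot queue, and the core number are hypothetical.
     */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>

    struct io_msg {                 /* one outstanding request */
        int fd;
        const void *buf;
        size_t len;
        ssize_t result;
        int done;
        pthread_mutex_t lock;
        pthread_cond_t cv;
    };

    static struct io_msg *pending;  /* single-slot "queue" for brevity */
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t qcv = PTHREAD_COND_INITIALIZER;

    /* Handler thread: pinned to an I/O core, drains requests and issues write(2). */
    static void *io_handler(void *arg)
    {
        int io_core = *(int *)arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(io_core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        for (;;) {
            pthread_mutex_lock(&qlock);
            while (pending == NULL)
                pthread_cond_wait(&qcv, &qlock);
            struct io_msg *m = pending;
            pending = NULL;
            pthread_mutex_unlock(&qlock);

            m->result = write(m->fd, m->buf, m->len);   /* system call runs here */

            pthread_mutex_lock(&m->lock);
            m->done = 1;
            pthread_cond_signal(&m->cv);
            pthread_mutex_unlock(&m->lock);
        }
        return NULL;
    }

    /* Application-side stub: behaves like write(), but forwards it as a message. */
    static ssize_t msg_write(int fd, const void *buf, size_t len)
    {
        struct io_msg m = { .fd = fd, .buf = buf, .len = len, .result = -1, .done = 0 };
        pthread_mutex_init(&m.lock, NULL);
        pthread_cond_init(&m.cv, NULL);

        pthread_mutex_lock(&qlock);
        pending = &m;
        pthread_cond_signal(&qcv);
        pthread_mutex_unlock(&qlock);

        pthread_mutex_lock(&m.lock);
        while (!m.done)
            pthread_cond_wait(&m.cv, &m.lock);
        pthread_mutex_unlock(&m.lock);
        return m.result;
    }

    int main(void)
    {
        int io_core = 4;            /* hypothetical core reserved for I/O */
        pthread_t tid;
        pthread_create(&tid, NULL, io_handler, &io_core);

        int fd = open("/tmp/demo.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        const char *text = "written by the I/O handler core\n";
        msg_write(fd, text, strlen(text));
        close(fd);
        return 0;
    }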