Data Redistribution Algorithms for Heterogeneous Processor Rings

Renard, Hélène; Robert, Yves; Vivien, Frédéric

doi:10.1177/1094342006061887

Cited by 8 publications

(8 citation statements)

References 37 publications

“…Furthermore, the overhead of transferring unprocessed data from slow nodes to fast ones is high based on the large volume of data to be moved. To solve this limitation and enhance the MapReduce performance in a cluster environment, we extended the data redistribution algorithm, which aimed to partition a large data set into small fragments being distributed across multiple nodes in a cluster that arises due to dynamic data insertions and deletions. Approach 2: The Number of Map and Reduce Tasks for a Job May Cause Performance Problems in Map Reduce . The dependence among reduce and map tasks can slow down the performance of clusters by an imbalanced workload, while some nodes are underutilized and others are overloaded.…”

Section: Problem Statement (mentioning)

confidence: 99%

“…However, MapReduce consists of different interleaving stages, each requiring different I/O workloads and patterns. A novel approach was taken by Blanas et al for adaptively tuning the disk pairs’ schedulers in both the hypervisor and the virtual machines during the execution of a single MapReduce job and compare the performance improvement on a sort benchmark with Hadoop to achieve the shortest execution time of MapReduce using various parameters proposed a cost model to predict the total execution time of jobs and their optimal assignments, and Scheduling Algorithm MapReduce (SAMR) proposed a dynamic task calculate process that adapts to the continuously varying environment automatically. One of the most important requirements for effective performance tuning is to discover those important parameters that are related to tuning a job for all features.…”

Section: Introduction (mentioning)

confidence: 99%

Optimizing and Tuning MapReduce Jobs to Improve the Large-Scale Data Analysis Process

Premchaiswadi, Romsaiyud. 2012. Int. J. Intell. Syst.

Data‐intensive applications process large volumes of data using a parallel processing method. MapReduce is a programming model designed for data‐intensive applications for massive data sets and an execution framework for large‐scale data processing on clusters of commodity servers. While fault tolerance, an easy programming structure, and high scalability are considered strong points of MapReduce, its configuration parameters must be fine‐tuned to the specific deployment, which makes it more complex in configuration and performance. This paper explains the tuning of the Hadoop configuration parameters that directly affect MapReduce's job workflow performance under various conditions, with the aim of achieving maximum performance. On the basis of the empirical data we collected, it became apparent that three main methodologies can affect the execution time of MapReduce running on cluster systems. Therefore, in this paper, we present a model that consists of three main modules: (1) extending a data redistribution technique in order to find the high‐performance nodes, (2) utilizing the number of map/reduce slots in order to make it more efficient in terms of execution time, and (3) developing a new hybrid routing schedule shuffle phase in order to define the scheduler task while memory management level is reduced.

“…Lots of important projects [6,7] involves splitting load into identical and independent tasks. Second, some researchers design algorithms for certain particular cases [3], or restrict the platform architecture [4]. Finally, data redistribution is designed to equilibrate finishing times and load.…”

Section: Related Work (mentioning)

confidence: 99%

“…The existing algorithms fail to consider either unbalanced load on workers, or the computation phase in optimizations. In addition, some of existing algorithms suffer from restriction to the platforms of certain type, for example, Moore Based Binary-Search Algorithm (MBBSA) on star platforms [2] and redistribution algorithms on ring platforms [3]. Despite the existence of redistribution algorithms on tree platforms, communication and computation time were neglected, for instance, M.Y.Wu [4].…”

Section: Introduction (mentioning)

confidence: 99%

Scheduling and Data Redistribution Strategies on Tree Platforms

Zhang, Qiao, Liu, et al. 2009. 2009 International Joint Conference on Computational Sciences and Optimization

Effective task scheduling holds the key to achieving high-performance grid applications. Aimed at the problem of scheduling and data redistribution on tree platforms, this paper assumes that all tasks are situated at the participating workers. The attempt to process the tasks within a given makespan makes task redistribution necessary. The paper proposes a linear program model and mechanism on tree platforms on which the redistribution mechanism is proved, and features two heuristic algorithms for scheduling and task redistribution. One is MBBSA on the tree platform (MBBSA-TP), which involves directly applying MBBSA on tree platforms. The other is the partially optimal scheduling and redistributing algorithm (POSRA), which involves indirectly using MBBSA on tree platforms. The paper also analyzes the complexity of the two algorithms. A large number of simulation experiments demonstrate that POSRA has an advantage over MBBSA-TP.

“…Unfortunately already simple redistribution problems are NP complete [8]. For this reason, optimal algorithms can be designed only for particular cases, as it is done in [13]. In their research, the authors restrict the platform architecture to ring topologies, both uni-directional and bidirectional.…”

mentioning

confidence: 99%

Scheduling and Data Redistribution Strategies on Star Platforms

Marchal, Rehn, Robert, et al. 2007. 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP'07)

Self Cite

In this work we are interested in the problem of scheduling and redistributing data on master-slave platforms. We consider the case where the workers possess initial loads, some of which have to be redistributed in order to balance their completion times. We assume that the data consists of independent and identical tasks. As the general case is NP-complete in the strong sense, we propose three heuristics. Simulations consolidate the theoretical results.
