2018
DOI: 10.1109/access.2018.2859826

BigRoots: An Effective Approach for Root-Cause Analysis of Stragglers in Big Data System

Abstract: Stragglers are commonly believed to have a great impact on the performance of big data systems. However, the causes of stragglers are complicated. Previous work mostly focuses on straggler detection, schedule-level optimization, and coarse-grained cause analysis. These methods cannot provide valuable insights to help users optimize their programs. In this paper, we propose BigRoots, a general method incorporating both framework and system features for root-cause analysis of stragglers in big data systems. BigR…

Citations: cited by 20 publications (10 citation statements)
References: 16 publications
“…The authors in [28] use offline log analysis methods to identify the root cause of outliers in a large-scale cluster consisting of thousands of nodes by tracking resource utilization. Similarly, Zhou et al. [29] use a simple but efficient rule-based method to identify the root cause of stragglers. Alongside these works, some researchers use statistical and machine learning methods for root-cause analysis.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Such stragglers occur for many reasons, including load imbalance, scheduling inefficiencies, data locality, communication overheads, and hardware heterogeneity [12,13]. There have also been efforts to address one or more of these concerns to mitigate the problem [14][15][16]. Although all of these prior efforts are important and useful in overcoming this problem, we believe that a rigorous set of analytical tools is needed in order to better understand the consequences of stragglers on the performance slowdown in big data [17,18].…”
Section: MapReduce Framework and Stragglers (citation type: mentioning)
confidence: 99%
“…Such jobs and subsequent tasks are scheduled onto different machines in a parallelized manner to accelerate job completion and are often divided into phases, creating a Directed Acyclic Graph (DAG) [83]. Application frameworks (such as MapReduce) attempt to subdivide jobs so that tasks will complete within approximately the same timeframe for each phase [84]. This is achieved by providing a subset of data (known as a shard) to each task, and allocating the appropriate resources to tasks (CPU, memory, etc.).…”
Section: Straggler Definition and Impact (citation type: mentioning)
confidence: 99%
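
The sharding idea in the last citation statement can be illustrated with a minimal Python sketch. It is not taken from BigRoots or from any of the citing papers: the function names `make_shards` and `find_stragglers` are hypothetical, and the rule of flagging a task whose duration exceeds 1.5 times the phase median is an assumption chosen only for illustration.

```python
from statistics import median


def make_shards(records, num_tasks):
    """Split records into num_tasks contiguous shards whose sizes differ by at most one."""
    base, extra = divmod(len(records), num_tasks)
    shards, start = [], 0
    for i in range(num_tasks):
        # The first `extra` shards take one additional record each.
        size = base + (1 if i < extra else 0)
        shards.append(records[start:start + size])
        start += size
    return shards


def find_stragglers(durations, threshold=1.5):
    """Return indices of tasks slower than threshold x the phase median.

    The 1.5 default is an illustrative assumption, not a value from the source.
    """
    cutoff = threshold * median(durations)
    return [i for i, d in enumerate(durations) if d > cutoff]


if __name__ == "__main__":
    shards = make_shards(list(range(10)), 3)
    print([len(s) for s in shards])
    # Hypothetical per-task durations (seconds) for one phase.
    print(find_stragglers([10, 11, 9, 30]))
```

Even with evenly sized shards, one task can still run far longer than its peers, which is why the duration check is kept separate from the sharding step.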