2021
DOI: 10.1002/smr.2413

TraceRank: Abnormal service localization with dis‐aggregated end‐to‐end tracing data in cloud native systems

Abstract: Modern cloud native applications are generally built with a microservice architecture. To tackle various performance problems among a large number of services and machines, an end‐to‐end tracing tool is always equipped in these systems to track the execution path of every single request. However, it is nontrivial to conduct root cause analysis of anomalies with such a large volume of tracing data. This paper proposes a novel system named TraceRank to identify and locate abnormal services causing performance pr…

Cited by 18 publications (6 citation statements)
References 57 publications
“…utilized in many related works [10], [11], [15], [21], [25], [63]. Sock Shop consists of seven microservices, and Train Ticket consists of 41 microservices, positioning it as one of the largest public microservices benchmarks.…”
Section: Empirical Study On Microservices Data 1) Dataset (mentioning)
confidence: 99%
“…Trace Anomaly Score: The Spectrum-based fault localization (SBFL) technique [25, 26] has been widely used in software testing, which is mainly based on the coverage information of current program elements to assess the degree of program anomaly. Inspired by this, we calculate the anomaly score by using the SBFL technique, which is also mentioned in previous approaches [9, 11, 27, 28].…”
Section: Notation Definitions (mentioning)
confidence: 99%
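
The citation statement above describes scoring anomalies from coverage information in the SBFL style. As a minimal sketch only: the code below treats each trace as the set of services it passed through plus an abnormal/normal label, and scores services with the Ochiai formula, a common SBFL metric. The choice of Ochiai, the function name `ochiai_scores`, and the input layout are assumptions made for illustration; neither the citing paper nor TraceRank is confirmed to use this exact formula.

```python
import math


def ochiai_scores(traces):
    """Score each service with the Ochiai SBFL formula (illustrative assumption).

    traces: list of (covered_services, is_abnormal) pairs, where
    covered_services is a set of service names seen in one trace.
    """
    total_abnormal = sum(1 for _, abnormal in traces if abnormal)
    services = set().union(*(covered for covered, _ in traces))
    scores = {}
    for svc in services:
        # ef: abnormal traces covering svc; ep: normal traces covering svc
        ef = sum(1 for covered, abnormal in traces if abnormal and svc in covered)
        ep = sum(1 for covered, abnormal in traces if not abnormal and svc in covered)
        # Ochiai: ef / sqrt(total_abnormal * (ef + ep)); 0 when undefined
        denom = math.sqrt(total_abnormal * (ef + ep))
        scores[svc] = ef / denom if denom > 0 else 0.0
    return scores


# Hypothetical traces: service names are made up for the example.
example_traces = [
    ({"frontend", "cart"}, True),
    ({"frontend", "orders"}, False),
    ({"frontend", "cart"}, True),
]
example_scores = ochiai_scores(example_traces)
```

In this toy input, a service that appears only in abnormal traces scores highest, while a service shared by normal and abnormal traces is discounted by its normal coverage.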
“…In this section, we use the PageRank algorithm to locate root cause of abnormal services, which has proven good performance in anomaly propagation and root cause localization [11, 13, 32, 33]. The PageRank algorithm is a well-known approach for web analysis that aims to rank the importance of web pages.…”
Section: Abnormal Services Ranking (mentioning)
confidence: 99%
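
For readers unfamiliar with PageRank, the following is a minimal power-iteration sketch over a small directed call graph. It shows the textbook algorithm only, not the personalized or weighted variant any cited paper may use. The function name `pagerank`, the damping factor 0.85, and the example service names are assumptions introduced for illustration.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Textbook PageRank by power iteration.

    graph: dict mapping a node to the list of nodes it points to
    (here: caller service -> called services).
    """
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    if not nodes:
        return {}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical call graph: edges point from caller to callee.
call_graph = {
    "frontend": ["cart", "orders"],
    "cart": ["db"],
    "orders": ["db"],
    "db": [],
}
ranks = pagerank(call_graph)
top_service = max(ranks, key=ranks.get)
```

With edges pointing from caller to callee, rank accumulates on services that many others depend on, which is the intuition behind using PageRank-style scores to surface likely root-cause services.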
“…This is significant problem for a data center that hosts microservices in the scale of thousands. Data centers like this can produce tens of terabytes of trace data per day [YHC21], requiring a huge storage capacity. A major challenge in distributed tracing is to reduce this storage requirement by sampling the traces that can be potentially useful for further analysis [LCPAM19].…”
Section: Overview (mentioning)
confidence: 99%