Pinpointing performance inefficiencies via lightweight variance profiling

Su, Pengfei; Jiao, Shuyin; Chabbi, Milind; Liu, Xu

doi:10.1145/3295500.3356167

Cited by 9 publications

(2 citation statements)

References 17 publications


“…Using HTM without developers' knowledge can prove unwelcome because developers often demand full visibility into their programs. Developers are becoming performance and variance sensitive [55,66,81], and an accidental regression can become hard to diagnose. As a side effect, the choice of source-code patch demands us to be surgical: injecting large, complicated HTM-handling boilerplate code is a non-starter.…”

Section: Gocc Overview (mentioning)

confidence: 99%

Optimistic Concurrency Control for Real-world Go Programs (Extended Version with Appendix)

Zhang, Chabbi, Welc et al. 2021

Preprint

Self Cite


We present a source-to-source transformation framework, GOCC, that consumes lock-based pessimistic concurrency programs in the Go language and transforms them into optimistic concurrency programs that use Hardware Transactional Memory (HTM). The choice of the Go language is motivated by the fact that concurrency is a first-class citizen in Go, and it is widely used in Go programs. GOCC performs rich interprocedural program analysis to detect and filter lock-protected regions and performs AST-level code transformation of the surrounding locks when profitable. Profitability is driven by both static analyses of critical sections and dynamic analysis via execution profiles. A custom HTM library, using perceptron, learns concurrency behavior and dynamically decides whether to use HTM in the rewritten lock/unlock points. Given the rich history of transactional memory research but its lack of adoption in any industrial setting, we believe this workflow, which ultimately produces source-code patches, is more apt for industry-scale adoption. Results on widely adopted Go libraries and applications demonstrate significant (up to 10×) and scalable performance gains resulting from our automated transformation while avoiding major performance regressions.


“…Traditional performance profiling tools [17,23,27,37,54,55,58,59], developed over decades, are essential for diagnosing performance issues. These tools are typically programming language specific and automatically instrument the code to collect information about the cumulative runtimes of functions in the code base.…”

Section: Coarse-grained: Performance Profilers (mentioning)

confidence: 99%

tprof

Huang, Zhu 2021

Proceedings of the ACM Symposium on Cloud Computing


The traditional approach for performance debugging relies upon performance profilers (e.g., gprof, VTune) that provide average function runtime information. These aggregate statistics help identify slow regions affecting the entire workload, but they are ill-suited for identifying slow regions that only impact a fraction of the workload, such as tail latency effects. This paper takes a new approach to performance profiling by utilizing distributed tracing systems (e.g., Dapper, Zipkin, Jaeger). Since traces provide detailed timing information on a per-request basis, it is possible to group and aggregate tracing data in many different ways to identify the slow parts of the system. Our new approach to trace aggregation uses the structure embedded within traces to hierarchically group similar traces and calculate increasingly detailed aggregate statistics based on how the traces are grouped. We also develop an automated tool for analyzing the hierarchy of statistics to identify the most likely performance issues. Our case study across two complex distributed systems illustrates how our tool is able to find multiple performance issues that lead to 10× and 28× performance improvements in terms of average and tail latency, respectively. Our comparison with a state-of-the-art industry tool shows that our tool can pinpoint performance slowdowns more accurately than current approaches.

CCS Concepts: • Software and its engineering → Software testing and debugging; • General and reference → Performance.
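The idea of grouping traces hierarchically and computing increasingly detailed aggregates can be illustrated with a minimal sketch. This is not the tprof implementation: the trace record layout, the three grouping levels, and the names `TRACES`, `LEVELS`, `aggregate`, and `hierarchy` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace records (assumed layout, not from the paper):
# a request type, the ordered span names, and the total latency in ms.
TRACES = [
    {"request": "read", "spans": ("frontend", "cache"), "latency_ms": 5.0},
    {"request": "read", "spans": ("frontend", "cache"), "latency_ms": 7.0},
    {"request": "read", "spans": ("frontend", "cache", "db"), "latency_ms": 120.0},
    {"request": "write", "spans": ("frontend", "db"), "latency_ms": 40.0},
    {"request": "write", "spans": ("frontend", "db"), "latency_ms": 44.0},
]

# Grouping levels, from coarsest to finest (an assumed hierarchy).
LEVELS = [
    ("all", lambda t: "all"),
    ("by request", lambda t: t["request"]),
    ("by request and structure", lambda t: (t["request"], t["spans"])),
]


def aggregate(traces, key):
    """Group traces by key(trace) and summarize the latency of each group."""
    groups = defaultdict(list)
    for trace in traces:
        groups[key(trace)].append(trace["latency_ms"])
    return {
        group: {"count": len(v), "mean_ms": mean(v), "max_ms": max(v)}
        for group, v in groups.items()
    }


def hierarchy(traces):
    """Compute the aggregate statistics at every grouping level."""
    return {name: aggregate(traces, key) for name, key in LEVELS}


if __name__ == "__main__":
    for name, stats in hierarchy(TRACES).items():
        print(name)
        for group, summary in stats.items():
            print("  ", group, summary)
```

In this toy data the coarse "by request" level only shows that reads have a high mean latency, while the finer level isolates the slow group as the reads whose span sequence includes `db`.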
