Hardware-Based Profiling: An Effective Technique for Profile-Driven Optimization

Conte, Thomas M.; Patel, Burzin A.; Menezes, Kishore N.; Cox, Jeremiah

doi:10.1007/bf03356747

Cited by 26 publications

(22 citation statements)

References 10 publications

“…Coscheduling could also benefit from per-thread utilization metrics for each shared resource. Others call for cache-line monitors to measure locality and contention in caches, buses, and NUMA systems [33][34][35], using bits for memory state checking [36], using branch predictor history for path profiles [37]. Eyerman et al propose a fundamentally different HPM architecture to collect more meaningful CPI stacks for OoO machines.…”

Section: Issues Facing HPM (mentioning)

confidence: 99%

Hardware Performance Monitoring for the Rest of Us: A Position and Survey

Moseley, Vachharajani, Jalby (2011). Lecture Notes in Computer Science

“…A significant amount of recent research is devoted solely to collecting unbiased edge, path, and call stack profiles [31,37,46,47], and one such paper even won the best paper award at PLDI 2009 [2]. This ostensibly simple feature should have long ago become a commodity, programmable and accessible from user space.…”

Section: A Pragmatic Proposition (mentioning)

confidence: 99%

Hardware Performance Monitoring for the Rest of Us: A Position and Survey

Moseley, Vachharajani, Jalby (2011). Lecture Notes in Computer Science

“…For example, the Morph system [22] collects profiles via statistical sampling of the program counter on clock interrupts. Alternatively, Conte et al proposed sampling the contents of the branch-prediction hardware using kernel-mode instructions to infer an edge profile [5]. In particular, the tags and target addresses stored in the branch target buffer (BTB) serve to identify an arc in an application, and the branch history stored by the branch predictor can be used to estimate each edge's weight.…”

Section: Related Work (mentioning)

confidence: 99%

Taming hardware event samples for FDO compilation

Chen, Vachharajani, Hundt, et al. (2010). Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization

Feedback-directed optimization (FDO) is effective in improving application runtime performance, but has not been widely adopted due to the tedious dual-compilation model, the difficulties in generating representative training data sets, and the high runtime overhead of profile collection. The use of hardware-event sampling to generate estimated edge profiles overcomes these drawbacks. Yet, hardware event samples are typically not precise at the instruction or basic-block granularity. These inaccuracies lead to missed performance when compared to instrumentation-based FDO. In this paper, we use multiple hardware event profiles and supervised learning techniques to generate heuristics for improved precision of basic-block-level sample profiles, and to further improve the smoothing algorithms used to construct edge profiles. We demonstrate that sampling-based FDO can achieve an average of 78% of the performance gains obtained using instrumentation-based exact edge profiles for SPEC2000 benchmarks, matching or beating instrumentation-based FDO in many cases. The overhead of collection is only 0.74% on average, while compiler based instrumentation incurs 6.8%-53.5% overhead (and 10x overhead on an industrial web search application), and dynamic instrumentation incurs 28.6%-1639.2% overhead.
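
The Related Work excerpt above describes sampling the branch target buffer (BTB) so that each entry's tag and target address identify an arc, with the stored branch history used to estimate the arc's weight. The Python sketch below illustrates that idea only; it is not the algorithm from Conte et al. or from Chen et al. The entry layout (`BTBEntry`), the 4-bit history width (`HISTORY_BITS`), the fixed instruction size (`INSTR_BYTES`), and the function name are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class BTBEntry(NamedTuple):
    """Hypothetical view of one sampled BTB entry (layout is an assumption)."""
    branch_pc: int   # address of the branch (from the BTB tag)
    target_pc: int   # taken-target address stored in the BTB
    history: int     # recent outcomes as a bit vector, 1 = taken


HISTORY_BITS = 4  # assumed width of the per-branch history
INSTR_BYTES = 4   # assumed fixed instruction size, used for the fall-through arc


def estimate_edge_weights(samples):
    """Accumulate estimated arc weights over periodic BTB snapshots.

    Each snapshot is a list of BTBEntry values. Taken outcomes in the
    history are credited to the (branch, target) arc; not-taken outcomes
    are credited to the (branch, fall-through) arc.
    """
    weights = defaultdict(int)
    mask = (1 << HISTORY_BITS) - 1
    for snapshot in samples:
        for entry in snapshot:
            taken = bin(entry.history & mask).count("1")
            not_taken = HISTORY_BITS - taken
            weights[(entry.branch_pc, entry.target_pc)] += taken
            weights[(entry.branch_pc, entry.branch_pc + INSTR_BYTES)] += not_taken
    return dict(weights)


# Two snapshots of the same branch at 0x100 with taken-target 0x200.
samples = [
    [BTBEntry(0x100, 0x200, 0b1110)],
    [BTBEntry(0x100, 0x200, 0b1111)],
]
edge_weights = estimate_edge_weights(samples)
```

The resulting weights are relative estimates rather than exact execution counts, since only the branches resident in the BTB at each sampling point contribute.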

“…Other profiling approaches rely on hardware integrated within the microprocessor to assist software developers in profiling an executing program [7][23][24][28]. Such hardware-assisted profiling approaches utilize event counters or branch execution statistics to identify application hotspots [7] or frequently executed execution paths [23].…”

Section: Previous Work (mentioning)

confidence: 99%

“…Such hardware-assisted profiling approaches utilize event counters or branch execution statistics to identify application hotspots [7] or frequently executed execution paths [23]. Although these hardware-assisted profiling approaches may incur lower overheads compared to software-based profiling methods, the runtime overheads cannot be ignored and incur similar ramifications.…”

Section: Previous Work (mentioning)

confidence: 99%