Using Model Trees for Computer Architecture Performance Analysis of Software Applications

Ould-Ahmed-Vall, ElMoustapha; Woodlee, J.; Yount, Charles; Doshi, Kshitij; Abraham, Santosh G.

doi:10.1109/ispass.2007.363742

Cited by 41 publications

(31 citation statements)

References 9 publications

“…Despite much research in the area, the two most widely used applications of HPM continue to be architectural characterization [26,27] and application performance tuning. This is likely because the hardware best supports these applications.…”

Section: Issues Facing HPM (mentioning)

confidence: 99%

Hardware Performance Monitoring for the Rest of Us: A Position and Survey

Moseley

Vachharajani

Jalby

2011

Lecture Notes in Computer Science

“…Empirical modeling seems to be the most widely used analytical modeling technique today and was employed for modeling out-of-order processors only, to the best of our knowledge. Some prior proposals consider linear regression models for analysis purposes [Joseph et al. 2006a], nonlinear regression for performance prediction [Joseph et al. 2006b], spline-based regression for power and performance prediction [Lee and Brooks 2006], neural networks [Dubach et al. 2007; Ipek et al. 2006], or model trees [Ould-Ahmed-Vall et al. 2007].…”

Section: Analytical Modeling (mentioning)

confidence: 99%
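
The model-tree approach named above fits a decision tree whose leaves hold linear regression models, so that each region of the input space (for example, programs with low versus high cache miss rates) gets its own linear performance equation. The following is a minimal pure-Python sketch under stated assumptions, not the method of Ould-Ahmed-Vall et al.: it grows the tree greedily by choosing the split whose two child linear models minimize squared error (Quinlan's M5 instead uses standard-deviation reduction and adds pruning and smoothing), and the features (`mpki`, `branch_mpki`) and synthetic CPI data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def fit_linear(X: List[Vector], y: List[float], ridge: float = 1e-6) -> List[float]:
    """Least-squares fit of y ~ w[0] + w[1:].x via the normal equations.
    A tiny ridge term keeps the system solvable if a feature is constant."""
    k = len(X[0]) + 1
    rows = [[1.0, *x] for x in X]
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    for c in range(k):  # Gaussian elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w


def predict_linear(w: List[float], x: Vector) -> float:
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def sse(w: List[float], X: List[Vector], y: List[float]) -> float:
    return sum((predict_linear(w, x) - t) ** 2 for x, t in zip(X, y))


@dataclass
class Node:
    w: List[float]                  # linear model fitted at this node
    feature: Optional[int] = None   # None means this node is a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(X, y, min_leaf=4, max_depth=3, min_gain=1e-9) -> Node:
    """Greedy top-down growth; a split is kept only if the two child linear
    models lower the squared error by more than min_gain."""
    w = fit_linear(X, y)
    best = None
    if max_depth > 0 and len(X) >= 2 * min_leaf:
        best_err = sse(w, X, y) - min_gain
        for f in range(len(X[0])):
            for thr in sorted({x[f] for x in X}):
                lo = [i for i, x in enumerate(X) if x[f] <= thr]
                hi = [i for i, x in enumerate(X) if x[f] > thr]
                if len(lo) < min_leaf or len(hi) < min_leaf:
                    continue
                XL, yL = [X[i] for i in lo], [y[i] for i in lo]
                XR, yR = [X[i] for i in hi], [y[i] for i in hi]
                err = sse(fit_linear(XL, yL), XL, yL) + sse(fit_linear(XR, yR), XR, yR)
                if err < best_err:
                    best_err, best = err, (f, thr, XL, yL, XR, yR)
    if best is None:
        return Node(w)
    f, thr, XL, yL, XR, yR = best
    return Node(w, f, thr,
                build(XL, yL, min_leaf, max_depth - 1, min_gain),
                build(XR, yR, min_leaf, max_depth - 1, min_gain))


def predict(node: Node, x: Vector) -> float:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return predict_linear(node.w, x)


def synthetic_data() -> Tuple[List[Vector], List[float]]:
    """Hypothetical samples: (mpki, branch_mpki) -> CPI with a regime change at mpki > 9."""
    X = [(float(i), float((7 * i) % 5)) for i in range(20)]
    y = [(0.5 + 0.01 * m if m <= 9 else 0.3 + 0.05 * m) + 0.02 * b for m, b in X]
    return X, y


if __name__ == "__main__":
    tree = build(*synthetic_data())
    print(tree.feature, tree.threshold, predict(tree, (15.0, 1.0)))
```

On this synthetic data the root splits on the first feature at 9.0 and each leaf recovers its regime's linear equation; the tree structure itself is what makes such a model readable as a performance-analysis aid.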

“…A mechanistic model has the advantage of directly displaying the performance effects of individual mechanisms, expressed in terms of program characteristics such as interinstruction dependence profiles and fine-grained instruction mix; machine parameters such as processor width, number of functional units, and pipeline depth; and program-machine interaction characteristics such as cache miss rates and branch misprediction rates. Mechanistic modeling is in contrast to the more common empirical models that use machine learning techniques and/or statistical methods (e.g., neural networks, regression) to infer a performance model [Dubach et al. 2007; Ipek et al. 2006; Joseph et al. 2006a, 2006b; Lee and Brooks 2006; Ould-Ahmed-Vall et al. 2007; Mariani et al. 2013]. Empirical modeling involves running a large number of detailed cycle-accurate simulations to infer or fit a performance model.…”

Section: Introduction (mentioning)

confidence: 99%

Mechanistic Analytical Modeling of Superscalar In-Order Processor Performance

Breughe

Eyerman

Eeckhout

2015

ACM Trans. Archit. Code Optim.

Superscalar in-order processors form an interesting alternative to out-of-order processors because of their energy efficiency and lower design complexity. However, despite the reduced design complexity, it is nontrivial to get performance estimates or insight into the application-microarchitecture interaction without running slow, detailed cycle-level simulations, because performance depends heavily on the order of instructions within the application's dynamic instruction stream, as in-order processors stall on interinstruction dependences and functional unit contention. To limit the number of detailed cycle-level simulations needed during design space exploration, we propose a mechanistic analytical performance model that is built from understanding the internal mechanisms of the processor. The mechanistic performance model for superscalar in-order processors is shown to be accurate, with an average performance prediction error of 3.2% compared to detailed cycle-accurate simulation using gem5. We also validate the model against hardware, using the ARM Cortex-A8 processor, and show that it is accurate within 10% on average. We further demonstrate the usefulness of the model through three case studies: (1) design space exploration, identifying the optimum number of functional units for achieving a given performance target; (2) program-machine interactions, providing insight into microarchitecture bottlenecks; and (3) …

Extensions relative to the ISPASS paper:
- We added modeling of an arbitrary number of functional units of any type, in contrast to a fixed number in the ISPASS paper (i.e., 4 ALUs and 1 unit for all other types).
- We completely revised the modeling of interinstruction dependences and unified it with the functional unit contention modeling.
- We added modeling of memory-level parallelism, which has a nonnegligible impact on performance for some benchmarks that were not evaluated in the ISPASS paper.
- We validated the model against hardware.
- We added a case study on sizing the number of functional units.
- We reevaluated all other case studies using the new model and revealed new insights about the interaction between dependences and functional unit contention.

“…Empirical modeling seems to be the most widely used analytical modeling technique today, and was employed for modeling out-of-order processors only, to the best of our knowledge. Some prior proposals consider linear regression models for analysis purposes [14]; non-linear regression for performance prediction [15]; spline-based regression for power and performance prediction [19]; neural networks [4,13]; or model trees [26].…”

Section: Analytical Modeling (mentioning)

confidence: 99%

“…A mechanistic model has the advantage of directly displaying the performance effects of individual mechanisms, expressed in terms of program characteristics (such as instruction mix and inter-instruction dependency profiles), machine parameters (such as processor width, number of functional units, pipeline depth), and program-machine interaction characteristics such as cache miss rates and branch misprediction rates. Mechanistic modeling is in contrast to the more common empirical models which use machine learning techniques and/or statistical methods, e.g., neural networks, regression, etc., to infer a performance model [4,13,14,15,19,26]. Empirical modeling involves running a large number of detailed cycle-accurate simulations to infer or fit a performance model.…”

Section: Introduction (mentioning)

confidence: 99%

A mechanistic performance model for superscalar in-order processors

Breughe

Eyerman

Eeckhout

2012

2012 IEEE International Symposium on Performance Analysis of Systems & Software

Mechanistic processor performance modeling builds an analytical model from understanding the underlying mechanisms in the processor and provides fundamental insight into program-microarchitecture interactions, as well as microarchitecture structure scaling trends and interactions. Whereas prior work in mechanistic performance modeling focused on superscalar out-of-order processors, this paper presents a mechanistic performance model for superscalar in-order processors. We find mechanistic modeling for in-order processors to be more challenging than for out-of-order processors, because the latter are designed to hide latencies; hence, from a modeling perspective, detailed modeling of instruction execution latencies and dependencies is not required for them. The proposed mechanistic performance model for superscalar in-order processors models the impact of non-unit instruction execution latencies, inter-instruction dependencies, cache/TLB misses and branch mispredictions, and achieves an average performance prediction error of 2.5% compared to detailed cycle-accurate simulation. We extensively evaluate the model's accuracy and demonstrate its usefulness through three applications: (i) we compare in-order versus out-of-order performance, (ii) we quantify the impact of compiler optimizations on in-order performance, and (iii) we perform a power/performance design space exploration.
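
One way to picture how a mechanistic model combines the effects listed above (dependences, non-unit latencies, cache/TLB misses, branch mispredictions) is a cycle stack: an ideal base component plus one penalty term per event type. The sketch below is a generic first-order illustration under simplifying assumptions, not the model of Breughe et al.: it takes dependence stall cycles as a given input rather than deriving them from a dependence profile, charges each miss its full latency (as a stalling in-order pipeline would), and ignores overlap between events such as memory-level parallelism. All field names and latency values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProgramProfile:
    """Hypothetical per-program event counts (e.g. from a profiler or simulator)."""
    instructions: int
    dependence_stall_cycles: float  # assumed given: cycles lost to inter-instruction dependences
    l2_hits: int                    # L1 misses serviced by the L2 cache
    memory_accesses: int            # last-level cache misses serviced by DRAM
    tlb_misses: int
    branch_mispredictions: int


@dataclass
class MachineConfig:
    """Hypothetical machine parameters; latencies are in cycles."""
    width: int
    l2_latency: int
    memory_latency: int
    tlb_miss_latency: int
    frontend_refill: int            # cycles to refill the front end after a mispredict


def cycle_stack(p: ProgramProfile, m: MachineConfig) -> dict:
    """First-order additive estimate: base cycles plus independent, non-overlapping
    penalty terms (no memory-level parallelism or event overlap is modeled)."""
    stack = {
        "base": p.instructions / m.width,
        "dependences": p.dependence_stall_cycles,
        "l2": p.l2_hits * m.l2_latency,
        "memory": p.memory_accesses * m.memory_latency,
        "tlb": p.tlb_misses * m.tlb_miss_latency,
        "branch": p.branch_mispredictions * m.frontend_refill,
    }
    stack["total"] = sum(stack.values())
    return stack


if __name__ == "__main__":
    prof = ProgramProfile(1_000_000, 100_000, 10_000, 1_000, 0, 5_000)
    cfg = MachineConfig(width=2, l2_latency=10, memory_latency=200,
                        tlb_miss_latency=30, frontend_refill=8)
    s = cycle_stack(prof, cfg)
    print(s, "CPI =", s["total"] / prof.instructions)
```

Because each term is expressed in program characteristics times machine parameters, changing one parameter (say, `width`) shows directly which component of the stack moves, which is the kind of insight the mechanistic approach is meant to provide.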
