Fuse

Neill, Richard; Drebes, Andi; Pop, Antoniu

doi:10.1145/3148054

Cited by 10 publications

(4 citation statements)

References 22 publications


“…These methods require the entire trace of an application before providing corrections and cannot be run in real time. For example, Lv et al 37 use the Gumbel test for outlier detection, and Neill et al 38 use fork‐join aware agglomerative clustering to remove outlier points. These methods are unsuitable for dynamic control situations requiring online HPC correction.…”

Section: Related Work (mentioning)

confidence: 99%

“…At the end of that function, the old process can be identified via the prev parameter, while the newly-scheduled one can be already reached via current. We provide in Listing 3 the reference code to install a context switch callback by relying on kprobes: the install_kprobe function (lines 30-42). In particular, we rely on a kretprobe to be notified when the finish_task_switch function is returning.…”

Section: Per Thread Profiling (mentioning)

confidence: 99%


Strategies and software support for the management of hardware performance counters

Carnà, Marotta, Pellegrini, et al. 2023

Softw Pract Exp


Hardware performance counters (HPCs) are facilities offered by most off‐the‐shelf CPU architectures. They are a vital support to post‐mortem performance profiling and are exploited by standard tools such as Linux perf or Intel V‐Tune. Nevertheless, an increasing number of application domains (e.g., simulation, task‐based high‐performance computing, or cybersecurity) are exploiting them to perform different activities, such as self‐tuning, autonomic optimization, and/or system inspection. This repurposing of HPCs can be difficult, for example, because of the overhead for extracting relevant information. This overhead might render any online or self‐tuning activity ineffective. This article discusses various practical strategies to exploit HPCs beyond post‐mortem profiling, suitable for different application contexts. The presented strategies are accompanied by a general primer on HPC usage on Linux. We also provide reference x86 (both Intel and AMD) implementations targeting the Linux kernel, upon which we present an experimental assessment of the viability of our proposals.


“…Several works in the mainstream (high-performance) domain reason on the sources of variability in HEM values when executing several times the same piece of software. This covers from the operating system noise [119], application variability [7,121] and the particular HEM-Reading library, to the complexity of the hardware [184]. For instance, [119] focuses on the cycle count HEM and shows that its variability is often related to the executable layout and operating system issues.…”

Section: State-of-the-art on HEM Analysis (mentioning)

confidence: 99%

“…In our work, we use no operating system and access directly, with no library, the HEMs (via the PMCs) so they are not subject to software-induced variability. In [121], authors focus on task-parallel programs in high-performance environments with highly-dynamic execution conditions, including dynamic task scheduling, that cause tasks to execute in different orders and in different cores across executions. Authors propose techniques to determine which HEM readings belong to each task and hence, combine them to derive all HEMs for a task.…”

Section: State-of-the-art on HEM Analysis (mentioning)

confidence: 99%

Modelling and predicting extreme behavior in critical real-time systems with advanced statistics

Vilardell Moreno


(English) Critical Real-Time Embedded Systems (CRTES) are used in domains like transportation (e.g. avionics, automotive, space, and railway), healthcare, and industrial machinery. This subset of embedded systems requires undergoing a stringent Validation & Verification (V&V) process before they are allowed to enter in operation since any misbehavior can result in harm of humans or even fatalities. Software timing behavior is a key element to cover in the V&V process, providing with evidence that software runs timely. Software timing analysis, in turn, requires deriving bounds to each application task’s execution time. These bounds are referred to as Worst-Case Execution Time (WCET) estimates. As CRTES implement more complex safety-related functionalities in every new product, more complex software and consequently more performant computing hardware is used to satisfy the high-performance requirements. The side effect, however, of using more complex hardware and software is challenging state-of-the-art software timing analysis techniques. Measurement-Based Probabilistic Timing Analysis (MBPTA) techniques have been proposed to handle such hardware and software complexity providing tight and trustworthy WCET bounds (estimates). Specifically, Extreme Value Theory (EVT) has been used to provide with models for the most extreme occurrences in form of a probabilistic distribution. The output of the timing model is referred as a probabilistic WCET (pWCET). However trustworthy, EVT models can be cumbersome to apply and they sometimes can be exceedingly pessimistic which adds extra cost into timing budgets. This thesis investigates MBPTA techniques and develops novel methodologies within this framework in three distinct fronts. Firstly, by improving the tightness of pWCET models on sky-high quantiles with two models. A first one that combines risk analysis with EVT for a safe and accurate pWCET. 
And a second one that introduces Markov’s Inequality to the pWCET estimation problem, which provides trustworthy guarantees with fewer requirements for its correct application. Secondly, in order to boost the use of data coming from performance monitoring counters, which are increasingly used by MBPTA techniques to tighten estimates, this thesis shows two mathematically based ways of merging multiple disjoint readings, based on order statistics and copula models. Finally, this thesis proposes a model for the contention of competing tasks, for use when the timing profile obtained is limited, that yields more extreme WCET scenarios based on the dependencies between tasks. In summary, this thesis pushes the state of the art forward in the V&V methodologies for CRTES in the framework of MBPTA in terms of WCET estimation and data gathering.
(Spanish version, translated) Critical Real-Time Embedded Systems (CRTES) are used in fields such as transportation (avionics, automotive, railway, etc.), healthcare, and industry. This subset of embedded systems requires a Validation and Verification (V&V) process before being allowed into operation, since erroneous behavior can cause harm to humans or even fatalities. The timing behavior of software is a key element to cover in the V&V process, as it provides evidence that the software executes on time. Software timing analysis, in turn, requires deriving bounds on the execution time of each application task. These bounds are called Worst-Case Execution Time (WCET) estimates. As CRTES implement more complex safety-related functionality in each new product, more complex software and higher-performance computing hardware are used to satisfy the high-performance requirements. However, the side effect of using more complex hardware and software is a challenge for state-of-the-art software timing analysis techniques. Measurement-Based Probabilistic Timing Analysis (MBPTA) techniques have been proposed to handle this hardware and software complexity, providing tight and reliable WCET bounds (estimates). Specifically, Extreme Value Theory (EVT) has been used to provide models for the most extreme occurrences in the form of a probability distribution. The output of the timing model is called the probabilistic WCET (pWCET). However, although safe, EVT models can sometimes be excessively pessimistic, which adds extra cost to timing budgets. This thesis investigates MBPTA techniques and develops novel methodologies within this framework on three distinct fronts. First, by improving the tightness of pWCET models at very high quantiles with two models. The first combines risk analysis with EVT to obtain a safer and more accurate pWCET. The second introduces Markov's inequality into the pWCET estimation problem, which provides safer guarantees. Second, to boost the use of data from performance monitoring counters, this thesis shows two mathematically based ways of merging multiple disjoint readings using order statistics and copula models. Finally, this thesis proposes a model for the contention of competing tasks when the obtained timing profile is limited. This method makes it possible to anticipate more extreme WCET scenarios based on the dependencies between tasks. In summary, this thesis advances the state of the art in V&V methodologies for CRTES within the MBPTA framework in terms of WCET estimation and data gathering.
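The abstract above mentions applying Markov's inequality to pWCET estimation. For a non-negative random variable X, the inequality states P(X >= t) <= E[X] / t, so a target exceedance probability p gives the threshold t = E[X] / p. The following is only a minimal sketch of that textbook relation, not the thesis's method: the function name `markov_bound` is hypothetical, and the sample mean is used as a stand-in for E[X], which by itself does not carry the guarantees the thesis discusses.

```python
def markov_bound(samples, exceedance_prob):
    """Return t such that Markov's inequality gives P(X >= t) <= exceedance_prob.

    Assumption: the sample mean approximates E[X]; the bound is only as
    trustworthy as that estimate.
    """
    if not samples:
        raise ValueError("need at least one execution-time sample")
    if any(s < 0 for s in samples):
        raise ValueError("Markov's inequality requires non-negative values")
    if not 0 < exceedance_prob <= 1:
        raise ValueError("exceedance_prob must be in (0, 1]")
    mean = sum(samples) / len(samples)
    # P(X >= t) <= E[X] / t  implies  t = E[X] / p for exceedance probability p
    return mean / exceedance_prob


if __name__ == "__main__":
    # Hypothetical execution times in cycles
    times = [100, 110, 90]
    print(markov_bound(times, 0.01))
```

The bound grows as 1/p, which illustrates why a plain Markov bound is very loose at small exceedance probabilities.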


Accurate and Complete Hardware Profiling for OpenMP

Neill, Drebes, Pop. 2017

Scaling OpenMP for Exascale Performance and Portability

Self Cite


Analyzing the behavior of OpenMP programs and their interaction with the hardware is essential for locating performance bottlenecks and identifying performance optimization opportunities. However, current architectures only provide a small number of dedicated registers to quantify hardware events, which strongly limits the scope of performance analyses. Hardware event multiplexing can help cover more events, but incurs a significant loss of accuracy and introduces overheads that change the behavior of program execution significantly. In this paper, we present an implementation of our technique for building a unique, coherent profile that contains all available hardware events from multiple executions of the same OpenMP program, each monitoring only a subset of the available hardware events. Reconciliation of the execution profiles relies on a new labeling scheme for OpenMP that uniquely identifies each dynamic unit of work across executions under dynamic scheduling across processing units. We show that our approach yields significantly better accuracy and lower monitoring overhead per execution than hardware event multiplexing.
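The reconciliation idea in this abstract, where several executions each monitor a subset of events and are combined through labels that identify the same unit of work, can be sketched roughly as follows. This is an illustrative assumption, not the authors' implementation: the label format (a region name plus a chunk index), the event names, and the choice to average events measured in more than one run are all hypothetical.

```python
from collections import defaultdict


def merge_profiles(runs):
    """Merge per-run profiles into one profile keyed by work-unit label.

    Each run maps label -> {event_name: count}. Labels are assumed to be
    stable across executions (hypothetical format: (region, chunk_index)).
    """
    collected = defaultdict(lambda: defaultdict(list))
    for run in runs:
        for label, events in run.items():
            for event, count in events.items():
                collected[label][event].append(count)
    # Events seen in several runs for the same label are averaged
    return {
        label: {event: sum(vals) / len(vals) for event, vals in events.items()}
        for label, events in collected.items()
    }


if __name__ == "__main__":
    # Two hypothetical executions, each monitoring a different event subset
    run_a = {("region0", 0): {"cycles": 100, "l1d_miss": 7}}
    run_b = {("region0", 0): {"cycles": 110, "branch_miss": 3}}
    print(merge_profiles([run_a, run_b]))
```

Keying on a label rather than on thread id or execution order is what lets the merge tolerate dynamic scheduling, where the same unit of work may run on a different core in each execution.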
