The Slowdown or Race-to-idle Question: Workload-Aware Energy Optimization of SMT Multicore Platforms under Process Variation

Das, Anup; Merrett, Geoff V.; Al-Hashimi, Bashir M.

doi:10.3850/9783981537079_0018

Cited by 7 publications

(2 citation statements)

References 8 publications


“…An alternative paradigm of power-management is known as race-to-idle [25,26] or race-to-halt [27]. This is possible because modern processors are able to quickly power down regions of the CPU (power-gating).…”

Section: Dynamic Voltage and Frequency Scaling (mentioning)

confidence: 99%

Low-Overhead Reinforcement Learning-Based Power Management Using 2QoSM

Giardino, Schwyn, Ferri, et al. 2022, JLPEA


With the computational systems of even embedded devices becoming ever more powerful, there is a need for more effective and proactive methods of dynamic power management. The work presented in this paper demonstrates the effectiveness of a reinforcement-learning-based dynamic power manager placed in a software framework. This combination of Q-learning for determining policy and software abstractions provides many of the benefits of co-design, namely good performance, responsiveness, and application guidance, with the flexibility of easily changing policies or platforms. The Q-learning-based Quality of Service Manager (2QoSM) is implemented on an autonomous robot built on a complex, powerful embedded single-board computer (SBC) and a high-resolution path-planning algorithm. We find that the 2QoSM reduces power consumption by up to 42% compared to the Linux on-demand governor and by 10.2% compared to a state-of-the-art situation-aware governor. Moreover, performance as measured by path error improves by up to 6.1%, all while saving power.


“…While SMT-based multicore systems are emerging as the norm for achieving the computing power necessary [199,200], it is equally important to map these application to maximise energy efficiency [34,201]. With energy efficient computing emerging as an important paradigm, recent approaches adopted to using linear programming or heuristics to map threads-to-cores based on a quantifiable relative gain metric.…”

Section: Energy Efficient Scheduling (mentioning)

confidence: 99%

Energy optimising methodologies on heterogeneous data centres

Nishtala


In 2013, U.S. data centres accounted for 2.2% of the country's total electricity consumption, a figure that is projected to increase rapidly over the next decade. A significant proportion of the power consumed within a data centre is attributed to the servers, and a large percentage of that is wasted as workloads compete for shared resources. Many data centres host interactive workloads (e.g., web search or e-commerce), for which it is critical to meet user expectations, a requirement referred to as Quality of Service (QoS). There is also a wish to run both interactive and batch workloads on the same infrastructure to increase cluster utilisation and reduce operational costs and total energy consumption. Although much work has focused on the impact of shared resource contention, maintaining QoS for both interactive and batch workloads remains a major problem. The goal of this thesis is twofold. First, we investigate via modelling how, and to what extent, resource contention affects the throughput and power of batch workloads. Second, we introduce a scheduling approach that determines on the fly the best configuration to satisfy the QoS of latency-critical jobs on any architecture. To achieve these goals, we first propose a modelling technique that estimates server performance and power at runtime, called Runtime Estimation of Performance and Power (REPP). REPP's goal is to give administrators control over the power and performance of processors. It achieves this by estimating performance and power at multiple hardware settings (dynamic voltage and frequency scaling (DVFS) states, core consolidation, and idle states) and dynamically applying these settings based on user-defined constraints. The hardware counters required to build the models are available across architectures, making REPP architecture-agnostic. We also argue that traditional modelling and scheduling strategies are ineffective for interactive workloads.
To manage such workloads, we propose Hipster, which combines a heuristic and a reinforcement learning algorithm. Hipster's goal is to improve resource efficiency while respecting the QoS of interactive workloads. Hipster achieves its goal by exploring the multicore system and DVFS settings. To improve utilisation and make the best use of the available resources, Hipster can dynamically assign the remaining cores to batch workloads without violating the QoS constraints of the interactive workloads. We implemented REPP and Hipster on real platforms, namely 64-bit commercial hardware (Intel SandyBridge and AMD Phenom II X4 B97) and experimental hardware (ARM big.LITTLE Juno R1). Extensive experimental results show that REPP successfully estimates the power and performance of several single-threaded and multiprogrammed workloads. The average errors on the Intel, AMD, and ARM architectures are, respectively, 7.1%, 9.0%, and 7.1% when predicting performance, and 8.1%, 6.5%, and 6.0% when predicting power. Similarly, we show that, compared to prior work, Hipster improves the QoS guarantee for Web-Search from 80% to 96% and for Memcached from 92% to 99%, while reducing energy consumption by up to 18% on the ARM architecture.
