Modern processors often suffer from inefficient resource utilization, which leads to inferior performance and energy efficiency. This dissertation scrutinizes the utilization of datapath and cache resources in superscalar processors for opportunities to improve performance and energy efficiency.

Traditional superscalar processors usually employ a one-size-fits-all design approach that allocates a fixed amount of resources to all applications at all times to deliver the best overall performance. However, the one-size-fits-all approach is not always energy efficient, because both the application behavior and the use scenario change constantly, and the demand for processor resources changes accordingly.

To improve the utilization of datapath resources, this dissertation proposes an adaptive processor that dynamically allocates datapath resources based on the needs of applications and use scenarios. The adaptive processor is applied to two use cases to improve energy efficiency. In the first use case, front-end throttling (FET), the adaptive processor dynamically throttles the front-end instruction delivery bandwidth as program behavior changes to optimize a target metric, be it performance, energy, or an arbitrary trade-off between the two. In the second use case, dynamic core scaling (DCS), the adaptive processor extends the performance-energy trade-off capabilities of superscalar processors by scaling datapath resources rather than voltage. The adaptive processor ensures that programs run at a given percentage of their maximum speed and, at the same time, minimizes energy consumption by dynamically adjusting the active superscalar datapath resources. DCS is more effective in performance-energy trade-offs than dynamic voltage and frequency scaling (DVFS) at the high-performance end. When used together with DVFS, DCS significantly extends the range of performance-energy trade-offs.

Caches also suffer from inefficient utilization in modern processors. To minimize the access latency of set-associative caches, the data in all ways are read out in parallel with the tag lookup. However, this is energy inefficient, as only the data from the matching way is used and the rest are discarded. To improve the utilization of the L1 instruction cache, this dissertation proposes an early tag lookup (ETL) technique for L1 instruction caches that determines the matching way one cycle before the cache access, so that only the matching data way needs to be accessed. ETL incurs no performance penalty and insignificant hardware overhead, yet dramatically reduces the read energy of the L1 instruction cache.

For memory-intensive workloads, caches often suffer from thrashing, i.e., high-reuse blocks evicting one another from the cache due to the lack of space. To reduce thrashing, only a fraction of the working set should be kept in the cache, so that at least that fraction stays in the cache long enough to be reused before eviction. However, prior insertion policies take an ad hoc approach to selecting that fraction, e.g., inserting blocks with high priority at ...