Managing SMT resource usage through speculative instruction window weighting

Vandierendonck, Hans; Seznec, André

doi:10.1145/2019608.2019611

Cited by 9 publications

(6 citation statements)

References 25 publications

“…The results in Figure 7 confirm that the degree of confidence of samples selected with workload stratification outper- [footnote 6: We did not apply balanced random sampling for 4 cores and 8 cores because the method we used for automatically defining a balanced sample works with the full workload population. In real situations this would not be a problem because detailed simulations are normally done after the workload sample is defined.]…”

Section: Actual Degree Of Confidence (mentioning)

confidence: 68%

“…Among the studies using class-based workload selection, very few are fully automatic. In a recent study, Vandierendonck and Seznec use cluster analysis to define 4 classes among the SPEC CPU2000 benchmarks [6]. Van Biesbrouck et al. [7] described a fully automatic method to define workloads using microarchitecture-independent profiling data.…”

Section: B Systematic Methods (mentioning)

confidence: 99%

“…For 4 cores and 8 cores, we have simulated 250 workloads. For a given sample size and for each sampling method [footnote 6], we take 100 samples, each sample consisting of workloads that we have simulated with Zesto. We compute the per-sample throughput metric (here, the IPCT) for each of the 100 samples and for DIP and LRU.…”

Section: Actual Degree Of Confidence (mentioning)

confidence: 99%
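
The sampling experiment quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the citing authors' code: it assumes the per-sample throughput is the arithmetic mean of per-workload IPCT values, that samples are drawn by simple random sampling, and that the "degree of confidence" is read as the fraction of samples that agree with the full population about which policy has the higher throughput. The function name `degree_of_confidence` and its parameters are hypothetical.

```python
import random
from statistics import mean


def degree_of_confidence(ipct_a, ipct_b, sample_size, num_samples=100, seed=0):
    """Estimate how often a random workload sample ranks policy A vs. B
    the same way as the full workload population does.

    ipct_a[i] and ipct_b[i] are the throughput values of workload i under
    policies A and B (for example DIP and LRU). All names here are
    hypothetical; the averaging rule is an assumption.
    """
    if len(ipct_a) != len(ipct_b):
        raise ValueError("both policies need one value per workload")

    rng = random.Random(seed)
    population = range(len(ipct_a))

    # Reference conclusion drawn from the whole population.
    a_wins_overall = mean(ipct_a) > mean(ipct_b)

    agreeing = 0
    for _ in range(num_samples):
        # Draw workloads without replacement (simple random sampling).
        chosen = rng.sample(population, sample_size)
        a_wins = mean(ipct_a[i] for i in chosen) > mean(ipct_b[i] for i in chosen)
        agreeing += a_wins == a_wins_overall

    return agreeing / num_samples
```

With 250 simulated workloads per configuration, as in the quoted passage, `ipct_a` and `ipct_b` would each hold 250 values and `num_samples` would be 100.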

Selecting benchmark combinations for the evaluation of multicore throughput

Velásquez, Michaud, Seznec (2013)

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Abstract: Most high-performance processors today are able to execute multiple threads of execution simultaneously. Threads share processor resources, like the last-level cache, which may decrease throughput in a non-obvious way, depending on the threads' characteristics. Computer architects usually study multiprogrammed workloads by considering a set of benchmarks and some combinations of these benchmarks. Because detailed microarchitecture simulators are slow, we want a subset of combinations that is as small as possible, yet representative. However, there is no standard method for selecting such a sample, and different authors have used different methods. It is not clear how the choice of a particular sample impacts the conclusions of a study. We propose and compare different sampling methods for defining multiprogrammed workloads for computer architecture studies. We evaluate their effectiveness with a case study, the comparison of several multicore last-level cache replacement policies. We show that random sampling, the simplest method, is a possible way to define a representative workload sample, provided the sample is large enough. We propose a method for estimating the required sample size based on fast approximate simulation. We propose a new method, workload stratification, which is very effective at reducing the sample size in situations where random sampling would require large samples.

“…These include: avoiding register allocation by predicting transient values from branch misprediction in [4], an allocation technique on write buffer for efficient resource occupation by limiting the maximal number of write buffer entries that a thread is allowed to have in [5], improving SMT fetching with an estimation of outstanding work in the system for each thread in [6], early deallocation of registers in association with cache misses in [7], and another fetch policy by considering memory-level parallelism in [9]. None of these techniques specifically addresses the contention in the issuing stage and most come with a significant requirement in extra hardware to implement the desired intelligence.…”

Section: Introduction (mentioning)

confidence: 99%

A Real-Time Per-Thread IQ-Capping Technique for Simultaneous Multi-threading (SMT) Processors

Sahba, Zhang, Hays, et al. (2014)

2014 11th International Conference on Information Technology: New Generations

Effective use of critical resources among threads remains a challenge for simultaneous multithreading (SMT) due to the transient behaviors of threads. As the Issue Queue (IQ) is regarded as one of the most critical shared resources in the pipeline, putting a limit on its occupation by each thread may easily improve the overall throughput; however, such a limit (cap) should be set properly in real time to accommodate the transient behavior of each thread. We propose a simple dynamic algorithm to adjust the cap value for each thread in real time according to its activeness in terms of its dispatching and issuing activities. The simulation results show that the proposed technique not only achieves a significant improvement in IPC over the regular no-capping technique, but also demonstrates a performance superior to the fixed capping approach.

“…Vandierendonck and Seznec [6] propose a new fetch throttling mechanism called Speculative Instruction Window Weighting. This mechanism fetches instructions from the thread with least amount of work left in the pipeline.…”

Section: Introduction (mentioning)

confidence: 99%
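
The selection rule described in this citation statement (fetch from the thread with the least work left in the pipeline) can be illustrated with a minimal sketch. It is not the mechanism from the original paper: the class `WindowWeights`, its method names, and the idea of adding a per-instruction weight at fetch and removing it when the instruction executes or is squashed are all assumptions made for illustration.

```python
class WindowWeights:
    """Toy per-thread accounting of outstanding work (hypothetical model)."""

    def __init__(self, num_threads):
        # One running weight per hardware thread; 0 means an empty window.
        self.weight = [0] * num_threads

    def on_fetch(self, thread_id, instr_weight):
        # A fetched instruction adds its estimated work to its thread.
        self.weight[thread_id] += instr_weight

    def on_leave(self, thread_id, instr_weight):
        # An executed or squashed instruction gives its weight back.
        self.weight[thread_id] -= instr_weight

    def select_thread(self):
        # Fetch next from the thread with the least outstanding work;
        # ties go to the lowest thread id.
        return min(range(len(self.weight)), key=self.weight.__getitem__)
```

For example, after `on_fetch(0, 3)` and `on_fetch(1, 1)` on a two-thread instance, `select_thread()` picks thread 1, because thread 0 has more work outstanding.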

Hyper-Heuristics for Performance Optimization of Simultaneous Multithreaded Processors

Güney, Küçük, Özcan (2013)

Information Sciences and Systems 2013

Abstract. In Simultaneous Multi-Threaded (SMT) processor datapaths, there are many datapath resources that are shared by multiple threads. Currently, there are a few heuristics that distribute these resources among threads for better performance. A selection hyper-heuristic is a search method which mixes a fixed set of heuristics to exploit their strengths while solving a given problem. In this study, we propose learning selection hyper-heuristics for predicting, choosing and running the best performing heuristic. Our initial test results show that hyper-heuristics may improve the performance of the studied workloads by around 2%, on average. The peak performance improvement is observed to be 41% over the best performing heuristic, and more than 15% over all heuristics that are studied. Our best hyper-heuristic performs better than the state-of-the-art heuristics on almost 60% of the simulated workloads.
