2007
DOI: 10.1007/s11227-007-0149-x

Exploring the performance limits of simultaneous multithreading for memory intensive applications

Abstract: Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that diversity of simultaneously executed applications can bring significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threads is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization a…

Cited by 12 publications; references 24 publications. Of the citing works, 9 contribute citation statements, all classified as mentioning, and the citing publications span 2008 to 2021.

Citation statements (ordered by relevance):
“…This is quite predictable, since both threads have the same requirements for computational resources because they execute the same code. This is an inherent limitation of SMT machines and is also discussed in [3,14,16].…”
Section: Shared Memory Architectures (mentioning)
confidence: 99%
“…Prefetching helper threads [7], [8] run alongside the main application thread on an idle hardware context and speculatively prefetch data into a shared cache, following a technique known as Speculative Precomputation.…”
Section: Related Work (mentioning)
confidence: 99%
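The mechanism described in the statement above lends itself to a short sketch. The following is a minimal, hypothetical illustration rather than the cited papers' implementation: it assumes C++11 threads and the GCC/Clang __builtin_prefetch builtin, and the array workload, the AHEAD distance, and all identifiers are invented for the example. A regular streaming loop like this would largely be handled by hardware prefetchers anyway; the sketch only shows the structure of a helper thread occupying a spare hardware context and prefetching into the cache shared by both SMT contexts.

```cpp
// Minimal sketch of a prefetching helper thread (hypothetical example,
// not the cited papers' code). Assumes GCC/Clang for __builtin_prefetch.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

constexpr size_t N = 1 << 22;     // working set; sized arbitrarily here
constexpr size_t AHEAD = 256;     // how far the helper runs in front

std::vector<double> data(N, 1.0);
std::atomic<size_t> progress{0};  // index reached by the main thread
std::atomic<bool>   done{false};

// Helper thread: keeps at most AHEAD elements of lead over the main
// thread and issues non-binding prefetches into the shared cache.
void prefetcher() {
    size_t i = 0;
    while (i < N && !done.load(std::memory_order_relaxed)) {
        if (i < progress.load(std::memory_order_relaxed) + AHEAD) {
            __builtin_prefetch(&data[i], /*rw=*/0, /*locality=*/1);
            ++i;
        }
    }
}

int main() {
    std::thread helper(prefetcher);   // ideally pinned to the sibling context
    double sum = 0.0;
    for (size_t i = 0; i < N; ++i) {  // main computation thread
        sum += data[i] * data[i];
        progress.store(i, std::memory_order_relaxed);
    }
    done.store(true, std::memory_order_relaxed);
    helper.join();
    std::printf("sum = %f\n", sum);
}
```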
“…In the practical case of an SMT with two hardware threads, Helper Threading dictates that the second thread should perform some useful work that is different from the main computation thread's. The most interesting example of Helper Threading is Speculative Precomputation [5], [6], in which the helper thread precomputes memory accesses on behalf of the main computation thread, thereby attacking possible bottlenecks due to memory latency [7], [8].…”
Section: Introduction (mentioning)
confidence: 99%
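Speculative Precomputation, as described in the statement above, pays off most for irregular access patterns that hardware prefetchers miss, such as pointer chasing. Below is a minimal sketch under stated assumptions (C++11 threads, GCC/Clang __builtin_prefetch; the Node layout, the throttling distance, and all names are hypothetical): the helper thread re-executes only the address-generating slice of the main loop, the next-pointer chase, and touches each node ahead of the main computation thread.

```cpp
// Minimal sketch of Speculative Precomputation over a linked list
// (hypothetical example, not the paper's implementation). Assumes
// GCC/Clang for __builtin_prefetch.
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct Node { long payload; Node* next; };

std::atomic<long> consumed{0};   // index reached by the main thread
std::atomic<bool> done{false};

// Helper: stay at most `distance` nodes ahead of the main thread and
// issue a non-binding prefetch for each node it visits.
void precompute(Node* head, long distance) {
    long i = 0;
    for (Node* n = head; n; n = n->next, ++i) {
        while (i > consumed.load(std::memory_order_relaxed) + distance)
            if (done.load(std::memory_order_relaxed)) return;
        __builtin_prefetch(n, /*rw=*/0, /*locality=*/1);
    }
}

// Main computation: the full loop, publishing its progress.
long consume(Node* head) {
    long sum = 0, i = 0;
    for (Node* n = head; n; n = n->next, ++i) {
        sum += n->payload;
        consumed.store(i, std::memory_order_relaxed);
    }
    done.store(true, std::memory_order_relaxed);
    return sum;
}

int main() {
    std::vector<Node> pool(1 << 20);
    for (size_t i = 0; i < pool.size(); ++i) {
        pool[i].payload = static_cast<long>(i);
        pool[i].next = (i + 1 < pool.size()) ? &pool[i + 1] : nullptr;
    }
    std::thread helper(precompute, &pool[0], /*distance=*/128);
    long sum = consume(&pool[0]);
    helper.join();
    std::printf("sum = %ld\n", sum);
}
```

Relaxed atomics suffice here because the prefetches are purely performance hints; the throttle only keeps the helper from running so far ahead that it evicts data before the main thread touches it.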
“…Most recent research on hybrid SRAM and DRAM caches focuses mainly on enhancing the overall performance of SRAM (resp., DRAM) by utilizing the merits of DRAM (resp., SRAM). There are also many papers devoted to investigating workload performance: (1) for multi-programmed workloads, prior work discussed the issues of relieving memory contention [10,11], workload balance [12,13] and power-related optimization [14]; (2) to improve the performance of memory-intensive workloads, many solutions (e.g., architecture design [15][16][17], OS-level methods [18][19][20] and feedback control [21,22]) have also been proposed; (3) in the cache system, improved cache architectures [4,9,23,24] and 3D-stacked DRAM technologies [25][26][27] are used to achieve better workload performance; and so on (a broader overview of related work is covered in Section 2). In contrast, little attention has been paid to designing a last level cache (LLC) scheduling scheme for multi-programmed workloads with different memory footprints.…”
Section: Introduction (mentioning)
confidence: 99%