2003
DOI: 10.1007/s00224-003-1086-6

Towards a First Vertical Prototyping of an Extremely Fine-Grained Parallel Programming Approach

Abstract: Explicit multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with (1) a simple programming style that relies on fine-grained PRAM-style (parallel random-access machine) algorithms and (2) hardware support for low-overhead parallel threads, scalable load balancing, and efficient synchronization. The missing link between the algorithmic-programming level and the architecture level is provided by the first prototype XMT compiler. This paper also takes this new opportunit…
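To make "fine-grained PRAM-style algorithms" concrete, here is a minimal sketch of array compaction written as one very short virtual thread per element. It is a sequential emulation under stated assumptions, not XMT code: `fetch_and_add` and `compact_nonzero` are hypothetical names, and the plain loop only stands in for the many low-overhead hardware threads the abstract describes.

```python
from typing import List


def fetch_and_add(counter: List[int], inc: int) -> int:
    """Hypothetical stand-in for a hardware fetch-and-add primitive.

    Returns the old value and adds `inc`. Real parallel hardware would do this
    atomically; here the virtual threads run one at a time, so a plain update suffices.
    """
    old = counter[0]
    counter[0] += inc
    return old


def compact_nonzero(values: List[int]) -> List[int]:
    """PRAM-style compaction: conceptually one tiny thread per input element.

    Each thread holding a non-zero value claims the next free output slot via
    fetch-and-add and writes its value there.
    """
    out = [0] * len(values)
    count = [0]  # shared counter visible to every virtual thread
    for tid in range(len(values)):  # sequential emulation of spawning len(values) threads
        if values[tid] != 0:
            slot = fetch_and_add(count, 1)
            out[slot] = values[tid]
    return out[: count[0]]


print(compact_nonzero([0, 7, 0, 3, 5, 0, 9]))  # [7, 3, 5, 9]
```

Because this emulation runs the threads in index order, the output keeps the input order; under genuinely parallel execution the slot each thread receives, and hence the output order, would depend on timing.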

Cited by 22 publications (38 citation statements). References: 27 publications.

“…To evaluate the RAP algorithm, we are using XMT⁵, a general-purpose manycore architecture [23]. A recent study showed that when configured to use the same chip area, XMT can outperform both an Intel Core 2 (speedups up to 13.83x [6]), AMD Opteron (speedups up to 8.56x [36]) and also an NVIDIA GTX280 GPU (speedups of up to 8.10x [5] on irregular workloads).…”
Section: Fig. 1, Miss Handling Architecture (MHA) for a Banked Cache

“…The compiler can insert several short independent work units (or tasks) in a loop within a coarser task, effectively enabling the use of loop prefetching, at the possible cost of a less load-balanced execution. This compiler technique, called thread clustering [23], allowed us to evaluate the loop prefetching algorithm on all our benchmarks.…”
Section: Additional Optimizations

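The thread-clustering idea quoted above (packing several short independent tasks into a loop inside one coarser task) can be sketched as follows. This is an illustrative assumption, not the XMT compiler's transformation: `cluster_tasks` and `run_clustered` are hypothetical names, the split into contiguous, near-equal chunks is one plausible policy, and prefetching itself is not modeled.

```python
from typing import Callable, List


def cluster_tasks(num_tasks: int, num_clusters: int) -> List[range]:
    """Group task indices 0..num_tasks-1 into at most num_clusters contiguous chunks."""
    if num_clusters <= 0:
        raise ValueError("num_clusters must be positive")
    base, extra = divmod(num_tasks, num_clusters)
    clusters, start = [], 0
    for c in range(num_clusters):
        size = base + (1 if c < extra else 0)  # spread the remainder over the first chunks
        if size == 0:
            break  # fewer tasks than clusters: stop once every task is assigned
        clusters.append(range(start, start + size))
        start += size
    return clusters


def run_clustered(task: Callable[[int], None], num_tasks: int, num_clusters: int) -> None:
    """Run each chunk as one coarser thread that loops over its short tasks.

    Inside the loop, the next iteration's index is known in advance, which is
    what makes loop prefetching applicable to the clustered code.
    """
    for chunk in cluster_tasks(num_tasks, num_clusters):  # each chunk would be one thread
        for i in chunk:
            task(i)


squares = [0] * 10
run_clustered(lambda i: squares.__setitem__(i, i * i), 10, 3)
print(cluster_tasks(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```

The trade-off the quote mentions shows up directly in `num_clusters`: fewer, longer chunks amortize thread overhead and give prefetching longer loops to work on, but leave the load balancer fewer units to redistribute.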
“…We assume uniform traffic pattern, which is expected for the memory architecture described in [16], due to the use of a hashing mechanism [2,4,10,15].…”
Section: Cycle-Accurate Validation

“…The XMT architecture eliminates local private caches in order to avoid cache coherence issues and uses hashing mechanism to avoid hot spots [16]. This dramatically increases the load on the interconnection network and makes the network traffic reasonably uniform, rendering the current interconnection networks ineffective.…”
Section: Impact on Single-Chip Parallel Processing
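The two passages above say that XMT spreads memory traffic across modules with a hashing mechanism, but they do not give the hash. Purely as an illustration, the sketch below compares naive modulo interleaving with a hypothetical XOR-folding hash over 8 memory modules. `NUM_MODULES`, `module_modulo`, `module_hashed`, and `load_per_module` are assumptions, not XMT's actual design. The access pattern is a stride equal to the module count, which modulo interleaving sends entirely to one module.

```python
from collections import Counter
from typing import Callable, Iterable, List

NUM_MODULES = 8  # hypothetical number of memory modules (a power of two)
BITS = 3         # log2(NUM_MODULES)


def module_modulo(addr: int) -> int:
    """Naive interleaving: the low-order address bits select the module."""
    return addr % NUM_MODULES


def module_hashed(addr: int) -> int:
    """Illustrative XOR-fold hash: XOR together every 3-bit group of the address."""
    h = 0
    while addr:
        h ^= addr & (NUM_MODULES - 1)
        addr >>= BITS
    return h


def load_per_module(addrs: Iterable[int], mapping: Callable[[int], int]) -> List[int]:
    """Count how many of the given accesses land on each module."""
    counts = Counter(mapping(a) for a in addrs)
    return [counts.get(m, 0) for m in range(NUM_MODULES)]


strided = [i * NUM_MODULES for i in range(64)]  # stride equal to the module count
print(load_per_module(strided, module_modulo))  # [64, 0, 0, 0, 0, 0, 0, 0]
print(load_per_module(strided, module_hashed))  # [8, 8, 8, 8, 8, 8, 8, 8]
```

Spreading accesses this evenly is what makes the "reasonably uniform" traffic assumption plausible; the flip side, as the second quote notes, is that every access now crosses the interconnection network rather than being served by a private cache.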