2010
DOI: 10.1109/mm.2010.79

The SARC Architecture

Abstract: The SARC architecture is composed of multiple processor types and a set of user-managed direct memory access (DMA) engines that let the runtime scheduler overlap data transfer and computation. The runtime system automatically allocates tasks on the heterogeneous cores and schedules the data transfers through the DMA engines. SARC's programming model supports various highly parallel applications, with matching support from specialized accelerator processors. On-chip parallel computation shows great promise for …
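The abstract's central mechanism, overlapping data transfer with computation through user-managed DMA, is commonly realized as double buffering. The following is a minimal Python sketch under stated assumptions, not SARC's runtime: a single worker thread stands in for a DMA engine, and the hypothetical helpers `dma_get` and `compute` are placeholders for a real transfer and a real task body.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_get(memory, start, size):
    """Hypothetical stand-in for a user-managed DMA transfer:
    copy one chunk of 'main memory' into a local buffer."""
    return list(memory[start:start + size])


def compute(chunk):
    """Hypothetical task body: sum of squares of the chunk."""
    return sum(x * x for x in chunk)


def process_double_buffered(memory, chunk_size):
    """Process `memory` chunk by chunk, fetching chunk i+1 while chunk i is computed."""
    starts = list(range(0, len(memory), chunk_size))
    results = []
    if not starts:
        return results
    # One worker thread plays the role of a single DMA engine (assumption).
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        # Prime the pipeline: start fetching the first chunk.
        in_flight = dma_engine.submit(dma_get, memory, starts[0], chunk_size)
        for i in range(len(starts)):
            current = in_flight.result()  # wait until chunk i has arrived
            if i + 1 < len(starts):
                # Issue the next transfer *before* computing, so the two overlap.
                in_flight = dma_engine.submit(dma_get, memory, starts[i + 1], chunk_size)
            results.append(compute(current))
    return results
```

Because the transfer for chunk i+1 is issued before chunk i is computed, the compute loop stalls only when a transfer takes longer than the computation it is hidden behind.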

Cited by 47 publications (43 citation statements). References 13 publications.
“…The PRF registers are multidimensional, with arbitrary sizes, and can be created / resized at runtime. Previous studies ([4], [14]) have demonstrated that PRFs suit computationally intensive workloads such as Floyd, the Conjugate Gradient (CG) method and dense matrix multiplication. Moreover, PRFs could improve performance and efficiency in state-of-the-art many-core computers, potentially saving area and power as shown in [5].…”
Section: Introduction (mentioning; confidence: 99%)
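To make "multidimensional registers of arbitrary size, created or resized at runtime" concrete, here is a toy Python model. It is an illustrative assumption, not the PRF design from [4] or [14]: the class `PolymorphicRegisterFile`, its element-counted `capacity`, and the `define`/`vadd` operations are hypothetical names, and resizing here simply discards a register's old contents.

```python
class PolymorphicRegisterFile:
    """Toy model (hypothetical): registers are 2D blocks of user-chosen size,
    all carved out of one fixed-size storage pool."""

    def __init__(self, capacity):
        self.capacity = capacity  # total storage, counted in elements (assumed unit)
        self.regs = {}            # register name -> list of rows

    def shape(self, reg):
        """Return (rows, cols) of a register, or (0, 0) if it is undefined."""
        r = self.regs.get(reg)
        return (len(r), len(r[0])) if r else (0, 0)

    def used(self):
        """Elements of storage currently allocated to all registers."""
        return sum(len(r) * len(r[0]) for r in self.regs.values())

    def define(self, reg, rows, cols):
        """Create or resize `reg` to rows x cols at runtime (zero-filled)."""
        if rows < 1 or cols < 1:
            raise ValueError("register dimensions must be positive")
        old_rows, old_cols = self.shape(reg)
        needed = self.used() - old_rows * old_cols + rows * cols
        if needed > self.capacity:
            raise MemoryError(f"need {needed} elements, capacity is {self.capacity}")
        self.regs[reg] = [[0] * cols for _ in range(rows)]

    def vadd(self, dst, a, b):
        """Element-wise add two same-shaped registers; `dst` takes their shape."""
        if self.shape(a) != self.shape(b):
            raise ValueError("operand shapes differ")
        rows, cols = self.shape(a)
        ra, rb = self.regs[a], self.regs[b]
        # Compute first, so dst may alias a or b safely.
        result = [[ra[i][j] + rb[i][j] for j in range(cols)] for i in range(rows)]
        self.define(dst, rows, cols)  # allocate or resize dst, checking capacity
        self.regs[dst] = result
```

A single operation such as `vadd` then works on whole 2D blocks whose shape was chosen by the program, rather than on fixed-width vector registers.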
“…Furthermore, PRFs allow performance benefits when compared to the Cell processor for Floyd and the main kernel of the CG method (sparse matrix-vector multiplication) [4]. The PRF programming interface allows high-performance dense matrix multiplication with at least 35 times fewer instructions than a hand-crafted version for the Cell BE [14]. One of the objectives of the PRF, as part of the Scalable ARChitecture (SARC) project [14], is multi-core scalability.…”
Section: Introduction (mentioning; confidence: 99%)
“…Alternatively, the address translation mechanism can be augmented with a few extra bits that explicitly determine whether an address region contains cacheable or directly-addressed (scratchpad) data¹, as shown in Figure 1. This is important when remote scratchpad regions are addressed, so that the hardware accesses them remotely, rather than locally caching them.…”
Section: Memory Access Semantics: Cache Scratchpad Communication (mentioning; confidence: 99%)
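The extra translation bits described in that passage can be sketched as a page-table lookup that returns both a physical address and an access path. In this minimal Python sketch the 4 KiB page size, the `PageEntry` fields, and the `translate` function are hypothetical; the point is only that a scratchpad bit plus an owner field lets the hardware send a remote scratchpad access to its owner instead of caching it locally.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed page size for illustration


@dataclass(frozen=True)
class PageEntry:
    frame: int                 # physical frame number
    scratchpad: bool           # hypothetical extra bit: directly addressed (True) or cacheable (False)
    owner_core: Optional[int]  # hypothetical field: core whose scratchpad holds the page


def translate(page_table, vaddr, this_core):
    """Translate `vaddr` and decide how the access is served (a KeyError models a page fault)."""
    entry = page_table[vaddr // PAGE_SIZE]
    paddr = entry.frame * PAGE_SIZE + vaddr % PAGE_SIZE
    if not entry.scratchpad:
        return paddr, "cache"              # normal cacheable access
    if entry.owner_core == this_core:
        return paddr, "local scratchpad"   # direct access to this core's scratchpad
    return paddr, "remote scratchpad"      # bypass the local cache, go to the owner
```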
“…In this work, we analyze the performance of such accelerators in a heterogeneous multicore processor with specialized workers: the SARC architecture [16]. Moreover, we consider critical parameters such as the available memory bandwidth and the memory latency.…”
Section: Introduction (mentioning; confidence: 99%)