2010 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation 2010
DOI: 10.1109/icsamos.2010.5642060

Interleaving granularity on high bandwidth memory architecture for CMPs

Abstract: Memory bandwidth has always been a critical factor for the performance of many data-intensive applications. Increasing processor performance and the advent of single-chip multiprocessors have pushed memory bandwidth demands beyond what a single commodity memory device can provide. The immediate solution is to use more than one memory device and interleave data across them, so that they can be used in parallel as if they were a single device of higher bandwidth. In this paper we showed that fine-…
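As a rough illustration of what "interleaving data across devices" means, the sketch below maps a flat byte address to a device and a local offset for a given interleaving granularity. The function names, parameters, and example sizes are illustrative assumptions and are not taken from the paper.

```python
from typing import Set, Tuple


def interleave(addr: int, num_devices: int, granularity: int) -> Tuple[int, int]:
    """Map a flat byte address to (device index, byte offset inside that device).

    Hypothetical helper: consecutive blocks of `granularity` bytes rotate
    round-robin across `num_devices` memory devices.
    """
    block = addr // granularity         # interleaving block containing the address
    device = block % num_devices        # blocks rotate across the devices
    local_block = block // num_devices  # blocks already stored on that device
    return device, local_block * granularity + addr % granularity


def devices_touched(start: int, length: int, num_devices: int, granularity: int) -> Set[int]:
    """Return the set of devices hit by a sequential access of `length` bytes."""
    return {
        interleave(a, num_devices, granularity)[0]
        for a in range(start, start + length)
    }


if __name__ == "__main__":
    # A 256-byte sequential access over 4 devices:
    print(devices_touched(0, 256, 4, 64))    # fine granularity spreads the access
    print(devices_touched(0, 256, 4, 4096))  # coarse granularity keeps it on one device
```

With a 64-byte granularity the 256-byte access is spread over all four devices, while with a 4096-byte granularity it stays on a single device. This is the basic trade-off that the choice of interleaving granularity controls.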

Citations: cited by 5 publications (7 citation statements)
References: 24 publications (24 reference statements)
“…There are many arbiters belonging to the class of LR servers, such as TDM; Round-Robin and its variants Weighted Round-Robin (WRR) [Katevenis et al 1991] and Deficit Round-Robin (DRR) [Shreedhar and Varghese 1996]; and priority-based arbiters with a rate regulator, such as Credit-Controlled Static Priority (CCSP) [Akesson et al 2008] and Priority Based Scheduler (PBS) [Steine et al 2009]. The LR abstraction enables modeling of many different arbiters and is compatible with a variety of formal analysis frameworks, such as dataflow analysis [Sriram and Bhattacharyya 2000] or network calculus [Cruz 1991]. …”
Section: LR Servers (mentioning)
confidence: 98%
“…Experimental Setup. The experimental setup consists of the optimization problem model implemented in the CPLEX optimization tool [CPLEX 2014]; implementation of our proposed heuristic, the First-fit and Interleave-all algorithms in C++, for a TDM arbiter; and a synthetic use-case generator. For a fair comparison with the heuristic, the First-fit and Interleave-all algorithms are also run with different TDM frame sizes to determine the optimal frame size with the lowest overallocation of rate (considering discretization of rate) and which satisfies the condition that the sum of rates allocated to all requestors in each channel is less than or equal to one.…”
Section: Optimal Heuristic and Existing Mapping Algorithms: Perform (mentioning)
confidence: 99%
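The rate discretization mentioned in the statement above can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions, not the cited paper's heuristic or its First-fit or Interleave-all algorithms: each requestor's required rate is rounded up to a whole number of TDM slots, a frame size is feasible only if the slots fit in the frame (allocated rates sum to at most one), and the frame size with the lowest total overallocation is kept. The names `allocate` and `best_frame_size` are hypothetical.

```python
from fractions import Fraction
from math import ceil
from typing import List, Optional, Tuple


def allocate(rates: List[Fraction], frame_size: int) -> Optional[Tuple[List[int], Fraction]]:
    """Round each required rate up to whole slots of a TDM frame.

    Returns (slots per requestor, total overallocated rate), or None when
    the slots do not fit, i.e. the allocated rates would sum to more than one.
    """
    slots = [ceil(r * frame_size) for r in rates]
    if sum(slots) > frame_size:
        return None
    over = sum((Fraction(s, frame_size) - r for s, r in zip(slots, rates)), Fraction(0))
    return slots, over


def best_frame_size(rates: List[Fraction], max_frame_size: int):
    """Try frame sizes 1..max_frame_size and keep the lowest overallocation.

    Ties are resolved in favour of the smaller frame size.
    """
    best = None
    for frame_size in range(1, max_frame_size + 1):
        result = allocate(rates, frame_size)
        if result is None:
            continue
        slots, over = result
        if best is None or over < best[2]:
            best = (frame_size, slots, over)
    return best


if __name__ == "__main__":
    required = [Fraction(1, 3), Fraction(1, 4)]  # assumed example rates
    print(best_frame_size(required, 8))
    print(best_frame_size(required, 12))
```

Exact fractions are used instead of floats so that the rounding to slots and the comparison of overallocation are not affected by floating-point error.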
“…In a 64-core CMP, each group of 8 cores has access to main memory via a dedicated memory controller, whereas in a 256-core CMP, each group of 16 cores has a dedicated memory controller. We have considered memory interleaving in our architecture and adapted its specific implementation from prior work [Cabarcas et al 2010]. A node (N) is defined as an entity consisting of 1 and 4 cores for the 64-core and 256-core CMPs, respectively.…”
Section: Ultranoc Architecture and Terminology (mentioning)
confidence: 99%
“…Parallel applications are usually very sensitive to synchronization latency and, therefore, hardware mechanisms are critical for CMPs; Castell is not an exception, as it is shown in Chapter 7. For an architecture with hundreds of cores, the accesses to shared resources can become a bottleneck if the synchronization mechanism is slow.…”
Section: Synchronization Module (mentioning)
confidence: 99%