2014
DOI: 10.1504/ijes.2014.065000

Insights on memory controller scaling in multi-core embedded systems

Abstract: In recent years, the growth in the number of cores, as well as in core frequency, across processor generations has proportionally increased bandwidth demands in both CPU and GPU systems. In recent implementations of heterogeneous mobile embedded systems with hard or firm real-time requirements, sharing the same address space in order to address the communication latency between CPU and GPU memories adds significant levels of contention. In addition, when heterogeneous cores are…

Cited by 9 publications (19 citation statements)
References 16 publications
“…• Revisiting the operating system (OS) concept of address space used in Marino and Li's report [15], in RAM ON the novel concept of a region is defined as an address space range dedicated to different sets of cores (CPU, GPU, or both), caches, and the respective interconnection. The inclusion of the two latter elements differentiates RAM ON from the Non-Uniform Memory Access (NUMA node) mechanism in the Linux OS.…”
mentioning
confidence: 99%
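The citing paper describes a region as an address range tied to a dedicated set of cores, caches, and an interconnect, the latter two being what distinguishes it from a Linux NUMA node. As a rough illustration only (not code from the cited works; the region_t fields and the find_region helper are assumptions made here), such a descriptor and its lookup might be sketched as:

```c
/* Rough illustration only -- not taken from the cited RAM ON or Marino/Li papers.
 * A "region" here pairs a physical address range with the core set, caches,
 * and interconnect segment dedicated to it (the caches and interconnect being
 * what the citing paper says separates a region from a Linux NUMA node). */
#include <stdint.h>
#include <stdio.h>

typedef enum { CORES_CPU, CORES_GPU, CORES_CPU_GPU } core_set_t;

typedef struct {
    uint64_t   base;          /* start of the dedicated address range */
    uint64_t   size;          /* length of the range in bytes         */
    core_set_t cores;         /* core set allowed to access the range */
    int        cache_ids[4];  /* caches reserved for this region      */
    int        interconnect;  /* interconnect segment serving it      */
} region_t;

/* Hypothetical helper: find the region that owns a physical address. */
static const region_t *find_region(const region_t *tbl, int n, uint64_t addr)
{
    for (int i = 0; i < n; i++)
        if (addr >= tbl[i].base && addr < tbl[i].base + tbl[i].size)
            return &tbl[i];
    return NULL;
}

int main(void)
{
    const region_t regions[] = {
        { 0x00000000, 0x40000000, CORES_CPU,     {0, 1}, 0 },
        { 0x40000000, 0x40000000, CORES_GPU,     {2},    1 },
        { 0x80000000, 0x40000000, CORES_CPU_GPU, {3},    2 },
    };
    const region_t *r = find_region(regions, 3, 0x48000000ULL);
    if (r)
        printf("region: cores=%d, interconnect=%d\n", r->cores, r->interconnect);
    return 0;
}
```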
“…• An evaluation of the system implications of larger numbers of MCs in heterogeneous regions using optical- and RF-based interfaces (signal modulation) rather than the traditional digital transmission (where, to transmit a "0" or a "1", the whole line must be entirely set to the respective level) developed in [15].…”
mentioning
confidence: 99%
“…In addition, methods that combine DVFS [22] with shallower transaction queues under different memory traffic intensities and applications are likely to be considered. Furthermore, heterogeneous systems [30] offer interesting opportunities for evaluating transaction queue size reduction.…”
Section: Conclusion and Future Plans
mentioning
confidence: 99%
“…For example, assembly of the human genome using the PASHA software took around 21 hours on an 8-core workstation with 72 GB of memory (Liu et al., 2011). Hardware accelerators such as GPUs and FPGAs are used alongside processors to reduce this execution time by running the program on multiple computation units in parallel (Lin and Lin, 2014; Okuyama et al., 2012; Halstead et al., 2014; Marino and Li, 2014). Some bioinformatics applications, along with assembly programs, have been accelerated by FPGA-based accelerators.…”
Section: Introduction
mentioning
confidence: 99%