The processor-memory bottleneck

Mahapatra, Nihar R.; Venkatrao, B.V.

doi:10.1145/357783.331677

Cited by 81 publications

(42 citation statements)

References 6 publications

“…Figure 2. While the computing power of processors has risen steadily in recent years, memory access time has improved by only about 10% on average [10]. As a result, particularly where arithmetic units are deployed in massive arrays, a mismatch arises between the actual computation (multiplication, addition, etc.)…”

Section: Modeling of Combustion Processes

Classification: unclassified

Anwendung von massiv paralleler Berechnung mit Grafikkarten (GPGPU) für CFD‐Methoden im Brandschutz

Belaschk

Münch

2009

Bauphysik

Application of general-purpose computing on graphics processing units (GPGPU) in CFD techniques for fire safety simulations. The use of fire simulation programs based on computational fluid dynamics (CFD) techniques is becoming more and more widespread in practice. The increase in available computing power enables the effects of possible fire scenarios to be modelled in order to derive useful information for practical applications (e.g. analysis of the reliability of fire protection concepts). However, despite the progress in computing power the performance of currently available computers is inadequate for simulating a building fire including all relevant physical and chemical processes with maximum accuracy. The models for calculating the spread of fire and smoke implemented in the computer programs therefore always represent a compromise between practical computing efficiency and level of modelling detail. This paper illustrates the reasons for the high computing power demand of CFD techniques and describes potential problems and sources of error resulting from simplifications applied in the models. In addition, the paper presents a new technology approach that significantly increases the computing power of a PC using special software and standard 3D graphics cards. The Fire Dynamics Simulator (FDS) is used as an example to demonstrate how the required calculation time for a fire simulation on a PC can be reduced by a factor of 20 and more.

“…A major problem that hinders the reuse of kernels is their performance sensitivity to the diverse memory layout requirements of the underlying hardware. The root of the problem is the constantly increasing disparity between DRAM and processor speeds [14], which compels modern memory system designers to employ wider DRAM bursts and a high degree of memory interleaving to create sufficient bandwidth to supply operands to the numerous processing elements.…”

Section: Introduction

Classification: mentioning

confidence: 99%

DL: A data layout transformation system for heterogeneous computing

Sung

Liu

Hwu

2012

2012 Innovative Parallel Computing (InPar)

For many-core architectures like the GPUs, efficient off-chip memory access is crucial to high performance; the applications are often limited by off-chip memory bandwidth. Transforming data layout is an effective way to reshape the access patterns to improve off-chip memory access behavior, but several challenges had limited the use of automated data layout transformation systems on GPUs, namely how to efficiently handle arrays of aggregates, and transparently marshal data between layouts required by different performance sensitive kernels and legacy host code. While GPUs have higher memory bandwidth and are natural candidates for marshaling data between layouts, the relatively constrained GPU memory capacity, compared to that of the CPU, implies that not only the temporal cost of marshaling but also the spatial overhead must be considered for any practical layout transformation systems. This paper presents DL, a practical GPU data layout transformation system that addresses these problems: first, a novel approach to laying out array of aggregate types across GPU and CPU architectures is proposed to further improve memory parallelism and kernel performance beyond what is achieved by human programmers using discrete arrays today. Our proposed new layout can be derived in situ from the traditional Array of Structure, Structure of Arrays, and adjacent Discrete Arrays layouts used by programmers. Second, DL has a run-time library implemented in OpenCL that transparently and efficiently converts, or marshals, data to accommodate application components that have different data layout requirements. We present insights that lead to the design of this highly efficient run-time marshaling library. In particular, the in situ transformation implemented in the library is comparable or faster than optimized traditional out-of-place transformations while avoiding doubling the GPU DRAM usage. Third, we show experimental results that the new layout approach leads to substantial performance improvement at the applications level even when all marshaling cost is taken into account.

“…A system's performance gain is significant when a cache is used (Mahapatra, 1999). However, the presence of a cache can end up impairing predictability and increasing the system's energy consumption.…”

Section: Justification

Classification: unclassified

Otimização de memória cache em tempo de execução para o processador embarcado LEON3

Cuminato
