2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
DOI: 10.1109/ipdpsw.2012.328
Performance Estimation of GPUs with Cache

Cited by 18 publications (7 citation statements). References 8 publications.
“…Another cache model [15] is part of a complete GPU model, but assumes hit and miss rates to be known. Furthermore, other work has used reuse distance to analyse non-GPU multi-core and many-core workloads [6,17,18].…”
Section: Related Work
confidence: 99%
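The reuse-distance metric mentioned in the excerpt above can be illustrated with a minimal sketch (the function name and the brute-force window computation are illustrative choices, not the cited papers' implementation):

```python
# Minimal reuse-distance sketch: for each memory access, the reuse
# distance is the number of *distinct* addresses touched since the
# previous access to the same address (infinite on first use).
def reuse_distances(trace):
    last_seen = {}            # address -> index of its last access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # distinct addresses accessed strictly between the two uses
            window = set(trace[last_seen[addr] + 1 : i])
            distances.append(len(window))
        else:
            distances.append(None)    # first access: infinite distance
        last_seen[addr] = i
    return distances

print(reuse_distances(["a", "b", "c", "a", "a", "b"]))
# -> [None, None, None, 2, 0, 2]
```

The connection to cache modelling: under a fully associative LRU cache with capacity S lines, an access hits exactly when its reuse distance is less than S, which is why reuse-distance profiles are a popular basis for cache models.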
“…The figure confirms the hypothesis, as the number of varied set bits (final row) corresponds to the number of bits included in the hashing function, counting from the log2 of the stride. For example, with a stride of 2^12 and 128 loads, bits 12-18 are included, of which only 4 bits (13, 14, 15, 17) are used in the computation of the set index. …”
Section: Associativity Micro-benchmark
confidence: 99%
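Which address bits vary for such a strided access pattern can be reproduced with a small sketch (plain Python arithmetic, not the cited micro-benchmark itself):

```python
# For n_loads accesses at a fixed stride, report which address bits
# differ between some addresses in the pattern: only those bits can
# possibly influence the observed set-index behaviour.
def varying_bits(stride, n_loads):
    addrs = [i * stride for i in range(n_loads)]
    or_all, and_all = 0, ~0
    for a in addrs:
        or_all |= a
        and_all &= a
    diff = or_all & ~and_all          # bits set in some but not all addresses
    return [b for b in range(diff.bit_length()) if (diff >> b) & 1]

print(varying_bits(2**12, 128))       # -> [12, 13, 14, 15, 16, 17, 18]
```

This reproduces the quoted range (bits 12-18 for a stride of 2^12 and 128 loads); determining which subset of those bits the hardware's set-index hash actually consumes (bits 13, 14, 15 and 17 in the quoted experiment) then requires inspecting the measured miss patterns.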
“…The applications are launched without considering any adaptation to architectural features of new GPUs. As a consequence, cache-unaware data accesses generate a large number of cache misses [16]. We can expect much better performance if the applications and MR frameworks are tuned according to the principle of locality.…”
Section: Code Restructuring
confidence: 99%
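The cost of cache-unaware access order can be illustrated with a line-granularity sketch (the 128-byte line size and 4-byte elements are assumptions for illustration, not parameters from the cited work; a line switch only approximates a miss on a small cache):

```python
# Count cache-line switches for row-major vs column-major traversal of a
# matrix stored in row-major order: every switch to a different line is
# a potential miss when the cache cannot hold the working set.
LINE = 128    # assumed cache-line size in bytes
ELEM = 4      # assumed element size in bytes

def line_switches(n, column_major=False):
    switches, last_line = 0, None
    for i in range(n):
        for j in range(n):
            r, c = (j, i) if column_major else (i, j)
            line = (r * n + c) * ELEM // LINE
            if line != last_line:
                switches += 1
                last_line = line
    return switches

n = 64
print(line_switches(n), line_switches(n, column_major=True))
# -> 128 4096: the locality-unaware order touches a new line on every access
```

Restructuring the traversal to match the storage order (the "principle of locality" tuning the excerpt refers to) collapses the line-switch count by a factor of LINE // ELEM here.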
“…These counts are taken from an in-house developed cache simulator and address-trace generator. This cache simulator [16] has already been verified against Dinero IV.…”
Section: Code Restructuring
confidence: 99%
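A trace-driven cache simulator of the kind described can be sketched in a few lines (a minimal set-associative LRU model with assumed geometry; not the authors' in-house tool and not Dinero IV):

```python
from collections import OrderedDict

# Minimal set-associative LRU cache simulator: feed it an address trace
# and it returns (hits, misses), the kind of counts a trace-driven tool
# such as Dinero IV reports.
def simulate(trace, line_size=128, n_sets=64, ways=4):
    sets = [OrderedDict() for _ in range(n_sets)]   # tag -> None, in LRU order
    hits = misses = 0
    for addr in trace:
        line = addr // line_size
        s, tag = line % n_sets, line // n_sets
        resident = sets[s]
        if tag in resident:
            hits += 1
            resident.move_to_end(tag)               # mark most recently used
        else:
            misses += 1
            if len(resident) >= ways:
                resident.popitem(last=False)        # evict the LRU way
            resident[tag] = None
    return hits, misses

print(simulate([0, 4, 128, 0]))   # -> (2, 2): two line hits, two cold misses
```

Verifying such a model against an established simulator, as the excerpt describes, amounts to running the same address trace through both and comparing the hit/miss counts.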
“…A computing system comprises a conventional CPU (host) and at least one GPU (device). GPUs are massively parallel coprocessors/accelerators equipped with a large number of arithmetic execution units [7]. A CUDA source program comprises various stages that are executed either on the CPU (host) or on a GPU (device).…”
Section: Introduction
confidence: 99%