2014 IEEE 28th International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2014.53
TBPoint: Reducing Simulation Time for Large-Scale GPGPU Kernels

Cited by 18 publications (10 citation statements)
References 12 publications
“…Simulating an entire GPU application on a cycle-level simulator [32], [33] is often impractical, and this is even more true for long-running SQNN training applications. To aid in the simulation of long-running applications, prior works have identified representative regions within applications and ported them to simulators for CPUs [4], [18], [34]-[36] and GPUs [37], [38].…”
Section: A. Enabling Network-Level Simulation for SQNNs
confidence: 99%
“…Existing solutions in the CPU space sample randomly [35], periodically [16], [17], or based on application phase behavior [18]. TBPoint [40] very recently proposes sampling-in-time for GPGPU workloads. Although TBPoint achieves high accuracy while simulating 10 to 20 percent of the total kernel execution time, sampling workloads with high control/memory divergence behavior remains challenging.…”
Section: Revisiting CPU Simulation Acceleration Techniques for GPGPU
confidence: 99%
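The statement above contrasts random and periodic sampling of execution intervals with phase-aware approaches. As a rough illustration of the baseline idea (not the TBPoint algorithm itself), the following is a minimal sketch: the function name `estimate_total_cycles` and the interval abstraction are hypothetical, and it simply scales the mean cost of a randomly sampled subset of intervals up to the full run.

```python
import random

def estimate_total_cycles(interval_cycles, sample_rate=0.1, seed=0):
    """Estimate the total cycle count of a long-running kernel by
    "simulating" only a random sample of its execution intervals
    (random sampling; a periodic scheme would take every k-th
    interval instead).  interval_cycles[i] stands in for the cost of
    detailed simulation of interval i."""
    rng = random.Random(seed)
    n = len(interval_cycles)
    k = max(1, int(n * sample_rate))
    sampled = rng.sample(range(n), k)
    # Extrapolate: mean cost over the sample, scaled to all intervals.
    mean_cycles = sum(interval_cycles[i] for i in sampled) / k
    return mean_cycles * n
```

As the quoted statement notes, such sampling struggles when behavior diverges heavily across the run: a small sample can miss rare but expensive phases, which is exactly what representative-region selection tries to address.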
“…Recently, Huang et al. accelerated GPGPU architecture simulation by sampling thread blocks [40] using TBPoint. Sampling thread blocks is a good idea, since CUDA encourages programmers to write programs with little communication between thread blocks.…”
Section: Related Work
confidence: 99%
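The observation above, that CUDA thread blocks rarely communicate, is what makes per-block sampling sound: blocks with similar execution profiles can stand in for one another. The sketch below illustrates that principle only; the function `pick_representative_blocks`, its feature format, and the exact-bucket grouping are assumptions for illustration (TBPoint proper selects representatives from richer per-block feature vectors).

```python
def pick_representative_blocks(block_features, ndigits=1):
    """Group thread blocks by their (rounded) per-block feature
    vectors, e.g. instruction-mix counts, and pick one representative
    block per group.  Returns (reps, weights), where weights[i] is the
    number of blocks that reps[i] stands for, so simulated results can
    be scaled back up to the whole kernel."""
    groups = {}
    for block_id, feats in enumerate(block_features):
        key = tuple(round(f, ndigits) for f in feats)
        groups.setdefault(key, []).append(block_id)
    reps = [members[0] for members in groups.values()]
    weights = [len(members) for members in groups.values()]
    return reps, weights
```

Only the representative blocks are then run through detailed simulation, and each result is weighted by how many blocks it represents.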
“…We also validated our approach on a simulator and on real hardware, whereas they validated only on a simulator. Huang et al. [2014] use a sampling technique to speed up GPU architecture simulation for CUDA applications, achieving up to 10× speedup, whereas we achieve up to 7,284× speedup. Similarly, Lee and Ro [2013] parallelize the GPU architecture simulation, gaining up to 4.15× speedup.…”
Section: Related Work
confidence: 99%