2004
DOI: 10.1007/978-3-540-24688-6_58

A Tool Suite for Simulation Based Analysis of Memory Access Behavior

Abstract: In this paper, two tools are presented: an execution-driven cache simulator that relates event metrics to a dynamically built-up call graph, and a graphical front end able to visualize the generated data in various ways. To obtain a general-purpose, easy-to-use tool suite, the simulation approach allows us to take advantage of runtime instrumentation, i.e. no preparation of application code is needed, and enables sophisticated preprocessing of the data already in the simulation phase. In an ongoing…
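
The core idea of the simulator, charging simulated cache events to whichever function is active in a call graph built up while the program runs, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool's implementation: the event stream format (`call`/`ret`/`access` tuples), the cache geometry, the LRU replacement policy, and the names `LRUCache` and `simulate` are all hypothetical. A real execution-driven simulator obtains these events from runtime instrumentation of the running binary instead of a hand-written trace.

```python
from collections import OrderedDict, defaultdict


class LRUCache:
    """Hypothetical set-associative cache with LRU replacement."""

    def __init__(self, size_bytes=4096, line_bytes=64, ways=4):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set: keys are tags, order is recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Simulate one access; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = None
        return False


def simulate(events, cache):
    """Replay a hypothetical stream of ("call", name), ("ret", None) and
    ("access", address) tuples. Each access and miss is charged to the function
    on top of a shadow call stack; caller->callee edges are counted as they occur."""
    stack = ["<root>"]
    cost = defaultdict(lambda: {"accesses": 0, "misses": 0})
    edges = defaultdict(int)  # (caller, callee) -> number of calls
    for kind, arg in events:
        if kind == "call":
            edges[(stack[-1], arg)] += 1
            stack.append(arg)
        elif kind == "ret":
            stack.pop()  # assumes matched calls and returns
        elif kind == "access":
            counters = cost[stack[-1]]
            counters["accesses"] += 1
            if not cache.access(arg):
                counters["misses"] += 1
    return dict(cost), dict(edges)


if __name__ == "__main__":
    trace = [
        ("call", "main"), ("access", 0), ("access", 8),
        ("call", "f"), ("access", 4096), ("access", 0),
        ("ret", None), ("ret", None),
    ]
    cost, edges = simulate(trace, LRUCache())
    print(cost)   # per-function self cost
    print(edges)  # dynamically built call-graph edges
```

The sketch records only exclusive (self) cost per function; a fuller version would also accumulate inclusive cost along the call-graph edges, which is what lets a front end show where misses originate across call chains.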

Cited by 101 publications (63 citation statements); references 18 publications.

“…3. The callgraph is largely similar to the callgraphs given by other tools, such as callgrind [11], with the exception that the nodes are not only functions but also OpenMP constructs and user-defined regions, and the (runtime) nesting of those constructs is shown in the callgraph view. The callgraph that ompP records is the union of the callgraph of each thread.…”
Section: (not recoverable); citation type: mentioning; confidence: 92%
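
The statement that the recorded call graph is the union of each thread's call graph can be illustrated with a small sketch. The representation is an assumption, not ompP's: each thread's graph is a mapping from a call path (a tuple of node names, root first) to an entry count; nodes may equally be functions, OpenMP constructs, or user-defined regions; and the hypothetical `merge_callgraphs` keeps per-thread counts rather than summing them. Sorting the paths places every node directly below its parent, which yields the nesting view.

```python
def merge_callgraphs(per_thread):
    """Union of per-thread call graphs (hypothetical representation).

    Each element of per_thread maps a call path (tuple of node names, root
    first) to how often that thread entered it. The result maps every path
    seen by any thread to {thread_id: count}.
    """
    merged = {}
    for tid, graph in enumerate(per_thread):
        for path, count in graph.items():
            merged.setdefault(path, {})[tid] = count
    return merged


def render(merged):
    """Indent each node under its parent. Sorting tuples puts every prefix
    (the parent path) before its extensions, giving a depth-first listing."""
    lines = []
    for path in sorted(merged):
        indent = "  " * (len(path) - 1)
        lines.append(f"{indent}{path[-1]}  {merged[path]}")
    return lines


if __name__ == "__main__":
    # Hypothetical node names; "omp parallel" stands for an OpenMP construct.
    thread0 = {("main",): 1, ("main", "omp parallel"): 1,
               ("main", "omp parallel", "foo"): 2}
    thread1 = {("main", "omp parallel"): 1,
               ("main", "omp parallel", "bar"): 3}
    for line in render(merge_callgraphs([thread0, thread1])):
        print(line)
    # main  {0: 1}
    #   omp parallel  {0: 1, 1: 1}
    #     bar  {1: 3}
    #     foo  {0: 2}
```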
“…Cache simulations have been performed using Callgrind, [63] and for a density of ρσ³ = 0.5, approximately half of the cache misses result from accesses to the contents of the neighbor lists. The remaining cache misses are associated with accesses to particle and event data.…”
Section: Benchmarking; citation type: mentioning; confidence: 99%
“…The analysis has been made using a variety of tools including igProf [2], callgrind [3] [4] and AMD CodeAnalyst. For example, we improved by a factor 3 to 10, depending on the length and complexity of the class name, the performance of the TTree::SetBranchAddress and TTree::SetAddress routines.…”
Section: Performance Enhancement; citation type: mentioning; confidence: 99%