Alchemist: A Transparent Dependence Distance Profiling Infrastructure

Zhang, Xiangyu; Navabi, Armand; Jagannathan, Suresh

doi:10.1109/cgo.2009.15

Cited by 67 publications

(31 citation statements)

References 29 publications


“…Alchemist [13] is a research parallelism discovery tool developed at Purdue University. It is built on top of Valgrind [14], an instrumentation framework for building dynamic analysis tools, to discover parallelism and issue corresponding recommendations.…”

Section: Alchemist (mentioning)

confidence: 99%

“…Tools for discovering parallelism [10,57,13,62,15,63] analyzes data dependences to identify the most promising parallelization opportunities. Runtime scheduling frameworks [64,65,66,67] analyzes data dependences to add more parallelism to programs by dispatching code sections in a more effective way.…”

Section: Data-dependence Analysis (mentioning)

confidence: 99%

“…Instead of pair-wise dependences, Kemlin records only the length of the critical path. Alchemist [13], a tool that estimates the effectiveness of parallelizing program regions by asynchronously executing certain language constructs, profiles dependence distance instead of detailed dependences. Although these approaches profile data dependences with low overhead, the underlying profiling technique has difficulty in supporting other program analyses.…”

Section: Dynamic Approaches (mentioning)

confidence: 99%


Discovery of Potential Parallelism in Sequential Programs

Jannesari

Wolf

2013

2013 42nd International Conference on Parallel Processing


In the era of multicore processors, the responsibility for performance gains has been shifted onto software developers. Once improvements of the sequential algorithm have been exhausted, software-managed parallelism is the only option left. However, writing parallel code is still difficult, especially when parallelizing sequential code written by someone else. A key task in this process is the identification of suitable parallelization targets in the source code. Parallelism discovery tools help developers to find such targets automatically. Unfortunately, tools that identify parallelism during compilation are usually conservative due to the lack of runtime information, and tools relying on runtime information primarily suffer from high overhead in terms of both time and memory. This dissertation presents a generic framework for parallelism discovery based on dynamic program analysis, supporting various types of parallelism while incurring practically affordable overhead. The framework contains two main components: an efficient data-dependence profiler and a set of parallelism discovery algorithms based on a language-independent concept called Computational Unit.

The data-dependence profiler serves as the foundation of the parallelism discovery framework. Traditional dependence profiling approaches introduce a tremendous amount of time and memory overhead. To lower the overhead, current methods limit their scope to the subset of the dependence information needed for the analysis they have been created for, sacrificing generality and discouraging reuse. In contrast, the profiler shown in this thesis addresses the problem via signature-based memory management and a lock-free parallel design. It produces detailed dependences not only for sequential but also for multi-threaded code without causing prohibitive overhead, allowing it to serve as a generic base for various program analysis techniques.

Computational Units (CUs) provide a language-independent foundation for parallelism discovery. CUs are computations that follow the read-compute-write pattern. Unlike other concepts, they are not restricted to predefined language constructs. A program is represented as a CU graph, in which vertexes are CUs and edges are data dependences. This allows parallelism to be detected that spreads across multiple language constructs, taking code refactoring into consideration. The parallelism discovery algorithms cover both loop and task parallelism.

Results of our experiments show that 1) the efficient data-dependence profiler has a very competitive average slowdown of around 80× with accuracy higher than 99.6%; 2) the frame-

I would also like to thank my master students and student assistants for their coding support. Wolfram Gottschlich implemented the original version of the signature described in this thesis. Tuan Dung Nguyen implemented the lock-free parallel version of the DiscoPoP profiler. Michael Beaumont tested the memory skipping technique. Daniel Fried implemented the method of characterizing DOALL loops using machin...


“…SD3 shows a 70× slowdown on average. Alchemist [16] is designed to identify dependences across loop iterations, loop boundaries and methods. It can be used offline by speculative systems, as it provides a very precise dependence analysis, analyzing complex data.…”

Section: Related Work (mentioning)

confidence: 99%

Online Dynamic Dependence Analysis for Speculative Polyhedral Parallelization

Jimborean

Clauss

Martinez

et al. 2013

Euro-Par 2013 Parallel Processing


Abstract. We present a dynamic dependence analyzer whose goal is to compute dependences from instrumented execution samples of loop nests. The resulting information serves as a prediction of the execution behavior during the remaining iterations and can be used to select and apply a speculatively optimizing and parallelizing polyhedral transformation of the target sequential loop nest. Thus, a parallel lock-free version can be generated which should not induce any rollback if the prediction is correct. The dependence analyzer computes distance vectors and linear functions interpolating the memory addresses accessed by each memory instruction, and the values of some scalars. Phases showing a changing memory behavior are detected thanks to a dynamic adjustment of the instrumentation frequency. The dependence analyzer takes part of a whole framework dedicated to speculative parallelization of loop nests which has been implemented with extensions of the LLVM compiler and an x86-64 runtime system.
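As a rough illustration of the interpolation step described in the abstract above, the sketch below fits a linear function to sampled memory addresses and derives a dependence distance from two such functions. It is a minimal sketch for a single loop and is not the authors' analyzer, which handles loop nests and distance vectors; the function names `fit_linear` and `dependence_distance` and the sample addresses are assumptions made for this example.

```python
def fit_linear(samples):
    """Fit address = base + stride * iteration to (iteration, address) samples.

    Returns (base, stride) if every sample lies on one line, else None.
    Hypothetical helper; needs at least two samples.
    """
    (i0, a0), (i1, a1) = samples[0], samples[1]
    if i1 == i0:
        return None
    stride, rem = divmod(a1 - a0, i1 - i0)
    if rem:
        return None
    base = a0 - stride * i0
    if all(addr == base + stride * it for it, addr in samples):
        return base, stride
    return None


def dependence_distance(write_fn, read_fn):
    """Return d such that the read in iteration i + d touches the address
    written in iteration i, or None if the accesses never line up this way.

    Only handles equal, nonzero strides (an assumption of this sketch).
    """
    write_base, write_stride = write_fn
    read_base, read_stride = read_fn
    if write_stride != read_stride or write_stride == 0:
        return None
    d, rem = divmod(write_base - read_base, write_stride)
    return d if rem == 0 else None


# Sampled accesses of a loop that writes A[i] and reads A[i - 2],
# with 8-byte elements and A assumed to start at address 1000.
write_samples = [(0, 1000), (1, 1008), (2, 1016)]
read_samples = [(2, 1000), (3, 1008), (4, 1016)]

write_fn = fit_linear(write_samples)
read_fn = fit_linear(read_samples)
distance = dependence_distance(write_fn, read_fn)
```

If later samples stop matching the fitted function, `fit_linear` returns `None`, which loosely corresponds to the changing memory behavior that the abstract says triggers re-instrumentation.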


“…Stack walking is too expensive when profile information is generated at a high frequency. Context sensitive optimizations [21,7] often specify how programs should behave in various contexts to achieve efficiency. For example, region-based memory management [7] tries to cluster memory allocations into large chunks, called regions, so that they can be explicitly managed; context sensitive region-based memory management specifies in which region an allocation should be performed under various contexts.…”

Section: Introduction (mentioning)

confidence: 99%

Precise calling context encoding

Sumner

Zheng

Weeratunge

et al. 2010

Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1

Self Cite


Calling contexts are very important for a wide range of applications such as profiling, debugging, and event logging. Most applications perform expensive stack walking to recover contexts. The resulting contexts are often explicitly represented as a sequence of call sites and hence bulky. We propose a technique to encode the current calling context of any point during an execution. In particular, an acyclic call path is encoded into one number through only integer additions. Recursive call paths are divided into acyclic subsequences and encoded independently. We leverage stack depth in a safe way to optimize encoding: if a calling context can be safely and uniquely identified by its stack depth, we do not perform encoding. We propose an algorithm to seamlessly fuse encoding and stack depth based identification. The algorithm is safe because different contexts are guaranteed to have different IDs. It also ensures contexts can be faithfully decoded. Our experiments show that our technique incurs negligible overhead (1.89% on average). For most medium-sized programs, it can encode all contexts with just one number. For large programs, we are able to encode most calling contexts to a few numbers.
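The idea of encoding an acyclic call path into one number through integer additions can be sketched as follows. This is an illustrative numbering scheme in the spirit of path numbering and is not the paper's algorithm: it ignores recursion and the stack-depth optimization, and it assumes one call site per (caller, callee) pair and that every function is reachable from the root. All names (`build_encoding`, `encode`, `decode`) and the example call graph are hypothetical.

```python
from collections import defaultdict


def build_encoding(edges, root):
    """Assign an integer increment to every edge of an acyclic call graph.

    edges: list of (caller, callee) pairs.
    Returns (num_contexts, increment, incoming).
    """
    incoming = defaultdict(list)
    for caller, callee in edges:
        incoming[callee].append(caller)

    num_contexts = {root: 1}  # the root has exactly one calling context
    increment = {}

    def count(node):
        if node not in num_contexts:
            total = 0
            for caller in incoming[node]:
                # Contexts arriving over this edge get the ID range
                # [total, total + count(caller)).
                increment[(caller, node)] = total
                total += count(caller)
            num_contexts[node] = total
        return num_contexts[node]

    for _, callee in edges:
        count(callee)
    return num_contexts, increment, incoming


def encode(path, increment):
    """Encode a call path [root, ..., current] using only integer additions."""
    ctx_id = 0
    for caller, callee in zip(path, path[1:]):
        ctx_id += increment[(caller, callee)]
    return ctx_id


def decode(node, ctx_id, root, num_contexts, increment, incoming):
    """Recover the call path ending at `node` from its context ID."""
    path = [node]
    while node != root:
        for caller in incoming[node]:
            low = increment[(caller, node)]
            if low <= ctx_id < low + num_contexts[caller]:
                ctx_id -= low
                node = caller
                break
        else:
            raise ValueError("ID does not name a context of this function")
        path.append(node)
    return path[::-1]


# Hypothetical call graph: D is reachable through two distinct contexts.
edges = [("main", "A"), ("main", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
num_contexts, increment, incoming = build_encoding(edges, "main")
via_a = encode(["main", "A", "C", "D"], increment)
via_b = encode(["main", "B", "C", "D"], increment)
```

In an instrumented program the running ID would be incremented at each call and decremented on return, so no stack walk is needed to identify the current context; decoding is only performed when a context has to be reported.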

