Current parallelizing compilers do a reasonable job of extracting parallelism from programs with regular, well-behaved, statically analyzable access patterns. However, they cannot extract a significant fraction of the available parallelism when a program's access pattern is complex and/or insufficiently defined at compile time, e.g., in simulation programs with irregular domains and/or dynamically changing interactions. Since such programs represent a large fraction of all applications, techniques are needed for extracting their inherent parallelism at run-time. In this paper we present a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that the iterations are executed in the correct order. Given the original loop, the compiler generates inspector code that performs run-time preprocessing of the loop's access pattern, and scheduler code that schedules (and executes) the loop iterations. The inspector is fully parallel, uses no synchronization, and can be applied to any loop from which an inspector can be extracted. In addition, it can implement at run-time the two most effective transformations for increasing the amount of parallelism in a loop: array privatization and element-wise reduction parallelization. The ability to identify privatizable and reduction variables is very powerful, since it eliminates the data dependences involving these variables and thereby potentially increases the overall parallelism of the loop. We also describe a new scheme for constructing an optimal parallel execution schedule for the iterations of the loop. The schedule produced is a partition of the set of iterations into subsets, called wavefronts, such that there are no data dependences between iterations within a wavefront. Although the wavefronts themselves are constructed one after another, the computation of each wavefront is fully parallel and requires no synchronization. This new method has advantages over all previous run-time techniques for analyzing and scheduling partially parallel loops, since none of them possesses all of these desirable properties.
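To make the wavefront construction concrete, the following minimal sketch is ours, not the algorithm from the paper: the function name compute_wavefronts, the explicit read/write-set representation of the access pattern, and the sequential dependence scan are illustrative assumptions, and the paper's parallel inspector as well as its privatization and reduction analysis are omitted. It assigns each iteration the earliest wavefront consistent with the flow, anti, and output dependences recorded in its read and write sets.

    # Illustrative sketch: partition loop iterations into wavefronts.
    # Each iteration i is described by the sets of array elements it
    # reads and writes. Iteration j depends on an earlier iteration i
    # if they touch a common element and at least one of the two
    # accesses is a write (flow, anti, or output dependence).
    def compute_wavefronts(reads, writes):
        n = len(reads)
        wavefront = [0] * n   # wavefront number assigned to each iteration
        last_writer = {}      # element -> latest iteration that wrote it
        last_reader = {}      # element -> latest iteration that read it
        for i in range(n):
            w = 0
            for elem in reads[i]:    # flow dependence: read after write
                if elem in last_writer:
                    w = max(w, wavefront[last_writer[elem]] + 1)
            for elem in writes[i]:   # output and anti dependences
                if elem in last_writer:
                    w = max(w, wavefront[last_writer[elem]] + 1)
                if elem in last_reader:
                    w = max(w, wavefront[last_reader[elem]] + 1)
            wavefront[i] = w
            for elem in writes[i]:
                last_writer[elem] = i
            for elem in reads[i]:
                last_reader[elem] = i
        # Group iterations by wavefront; within a group there are no
        # cross-iteration dependences, so each group can run fully in
        # parallel with no synchronization.
        groups = [[] for _ in range(max(wavefront, default=-1) + 1)]
        for i, w in enumerate(wavefront):
            groups[w].append(i)
        return groups

    # Example: a loop updating A(idx(i)) with idx = [0, 1, 0, 2, 1].
    idx = [0, 1, 0, 2, 1]
    print(compute_wavefronts([{e} for e in idx], [{e} for e in idx]))
    # -> [[0, 1, 3], [2, 4]]: iterations 0, 1, and 3 touch distinct
    #    elements and run together; 2 and 4 must wait for 0 and 1.

Because each iteration receives the earliest wavefront its dependences permit, the number of wavefronts in this sketch equals the length of the longest dependence chain among the iterations, which is the sense in which such a schedule is optimal.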