Language and compiler support for auto-tuning variable-accuracy algorithms

Ansel, Jason; Wong, Yee Lok; Chan, Cy; Olszewski, Marek; Edelman, Alan; Amarasinghe, Saman

doi:10.1109/cgo.2011.5764677

Cited by 82 publications

(100 citation statements)

References 28 publications

Supporting

Mentioning: 100

Contrasting


“…The autotuner must produce an algorithm which meets a given accuracy target. These variable accuracy features are described in more detail in [4].…”

Section: Singular Value Decomposition (SVD)

mentioning

confidence: 99%

Portable performance on heterogeneous architectures

et al. 2013

Self Cite


Trends in both consumer and high performance computing are bringing not only more cores, but also increased heterogeneity among the computational resources within a single machine. In many machines, one of the greatest computational resources is now their graphics coprocessors (GPUs), not just their primary CPUs. But GPU programming and memory models differ dramatically from conventional CPUs, and the relative performance characteristics of the different processors vary widely between machines. Different processors within a system often perform best with different algorithms and memory usage patterns, and achieving the best overall performance may require mapping portions of programs across all types of resources in the machine.

To address the problem of efficiently programming machines with increasingly heterogeneous computational resources, we propose a programming model in which the best mapping of programs to processors and memories is determined empirically. Programs define choices in how their individual algorithms may work, and the compiler generates further choices in how they can map to CPU and GPU processors and memory systems. These choices are given to an empirical autotuning framework that allows the space of possible implementations to be searched at installation time. The rich choice space allows the autotuner to construct polyalgorithms that combine many different algorithmic techniques, using both the CPU and the GPU, to obtain better performance than any one technique alone. Experimental results show that algorithmic changes, and the varied use of both CPUs and GPUs, are necessary to obtain up to a 16.5x speedup over using a single program configuration for all architectures.


“…The autotuner must then consider a two dimensional objective space, where its first objective is to meet the accuracy target (with a given level of confidence) and the second objective is to maximize performance. A detailed description of the variable accuracy features of PetaBricks is given in [5]. Figure 3 describes the usage of our system for input sensitive algorithm design.…”

Section: Variable Accuracy

mentioning

confidence: 99%

Autotuning algorithmic choice for input sensitivity

et al. 2015

Self Cite


A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, complex influence by deep input features, high cost in feature extraction, and variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.
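The two-objective selection described in the citation statement for this entry (first meet the accuracy target with a given level of confidence, then maximize performance) can be illustrated with a minimal Python sketch. This is not the PetaBricks autotuner: the `Candidate` record, the function names, and the use of the fraction of trials that reach the target as a stand-in for "confidence" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one tuned configuration's measured behaviour."""
    name: str
    accuracies: Tuple[float, ...]  # accuracy seen on each trial (higher is better)
    mean_time: float               # mean running time in seconds (lower is better)


def meets_target(c: Candidate, target: float, confidence: float) -> bool:
    """Assumed feasibility test: the share of trials reaching `target`
    must be at least `confidence`. A real tuner may use a different test."""
    if not c.accuracies:
        return False
    hits = sum(1 for a in c.accuracies if a >= target)
    return hits / len(c.accuracies) >= confidence


def select(candidates: Sequence[Candidate], target: float,
           confidence: float = 0.95) -> Optional[Candidate]:
    """Objective 1: keep only candidates that meet the accuracy target.
    Objective 2: among those, return the fastest. None if none qualify."""
    feasible = [c for c in candidates if meets_target(c, target, confidence)]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.mean_time)
```

Treating accuracy as a constraint and time as the quantity to minimize keeps the search from trading away correctness for speed; a candidate that is fast but misses the target is never returned.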


“…Function recalculateIndex is shown in lines 35-46. Next, we add this incoming object x into the active container Γi (lines 4-15). If the element at newIndex is an abstract element (line 6), this abstraction is split into two separate abstractions (line 9) and then x is inserted between them (line 10).…”

Section: List Combo

mentioning

confidence: 99%

“…After inserting a concrete element o into the list (line 5), method add$CoCo adds an abstraction of o into all inactive lists (lines 6-8). Once an abstraction is found in a retrieval (lines 12-17), it is concretized to get an array of concrete elements it represents (line 14), which are then inserted into the list to replace this abstraction.…”

Section: Fig 4 An Abstraction-concretization Example In LinkedList

mentioning

confidence: 99%

CoCo: Sound and Adaptive Replacement of Java Collections

2013

ECOOP 2013 – Object-Oriented Programming


Abstract. Inefficient use of Java containers is an important source of run-time inefficiencies in large applications. This paper presents an application-level dynamic optimization technique called CoCo, that exploits algorithmic advantages of Java collections to improve performance. CoCo dynamically identifies optimal Java collection objects and safely performs run-time collection replacement, both using pure Java code. At the heart of this technique is a framework that abstracts container elements to achieve efficiency and that concretizes abstractions to achieve soundness. We have implemented part of the Java collection framework as instances of this framework, and developed a static CoCo compiler to generate Java code that performs optimizations. This work is the first step towards achieving the ultimate goal of automatically optimizing away semantic inefficiencies.

