OpenTuner

Ansel, Jason; Kamil, Shoaib; Veeramachaneni, Kalyan; Ragan-Kelley, Jonathan; Bosboom, Jeffrey; O’Reilly, Una-May; Amarasinghe, Saman

doi:10.1145/2628071.2628092

Cited by 391 publications

(50 citation statements)

References 28 publications


“…Enumerative solvers often rely on factoring the search space, aggressive pruning and lattice search. Factoring has been very successful for programming by example [8,10,17], and lattice search has been used in synchronization of concurrent data structures [23] and autotuning [2]. However, both factoring and lattice search require significant domain knowledge, so they are unsuitable for a general purpose system like Sketch.…”

Section: Related Work (mentioning)

confidence: 99%

Adaptive Concretization for Parallel Program Synthesis

Jeon

Qiu

Solar-Lezama

et al. 2015

Lecture Notes in Computer Science


Abstract. Program synthesis tools work by searching for an implementation that satisfies a given specification. Two popular search strategies are symbolic search, which reduces synthesis to a formula passed to a SAT solver, and explicit search, which uses brute force or random search to find a solution. In this paper, we propose adaptive concretization, a novel synthesis algorithm that combines the best of symbolic and explicit search. Our algorithm works by partially concretizing a randomly chosen, but likely highly influential, subset of the unknowns to be synthesized. Adaptive concretization uses an online search process to find the optimal size of the concretized subset using a combination of exponential hill climbing and binary search, employing a statistical test to determine when one degree of concretization is sufficiently better than another. Moreover, our algorithm lends itself to a highly parallel implementation, further speeding up search. We implemented adaptive concretization for Sketch and evaluated it on a range of benchmarks. We found adaptive concretization is very effective, outperforming Sketch in many cases, sometimes significantly, and has good parallel scalability.
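The two-phase search over the degree of concretization described in this abstract (exponential hill climbing, then binary search) can be sketched as below. This is a simplified sketch, not the authors' implementation: the `sample` callback that returns observed costs for a given degree is hypothetical, and the paper's statistical test is replaced by a plain comparison of mean costs. The binary-search phase also assumes the cost curve is unimodal inside the bracket found by hill climbing.

```python
import statistics
from typing import Callable, List

# Hypothetical callback: observed costs (e.g. seconds to a solution) for
# one degree of concretization.
Sampler = Callable[[int], List[float]]


def better(a: List[float], b: List[float]) -> bool:
    """Simplified stand-in for the paper's statistical test:
    `a` wins if its mean cost is strictly lower than `b`'s."""
    return statistics.mean(a) < statistics.mean(b)


def find_degree(sample: Sampler, max_degree: int) -> int:
    """Pick a degree of concretization in [1, max_degree]."""
    # Phase 1: exponential hill climbing (1, 2, 4, ...) while cost improves.
    prev, cur = 1, 1
    cur_cost = sample(cur)
    while cur * 2 <= max_degree:
        nxt_cost = sample(cur * 2)
        if not better(nxt_cost, cur_cost):
            break
        prev, cur, cur_cost = cur, cur * 2, nxt_cost

    # Phase 2: binary search for the minimum of an (assumed) unimodal cost
    # curve inside the bracket [prev, 2 * cur].
    lo, hi = prev, min(cur * 2, max_degree)
    while lo < hi:
        mid = (lo + hi) // 2
        if better(sample(mid + 1), sample(mid)):
            lo = mid + 1  # cost still falling: minimum lies to the right
        else:
            hi = mid  # cost rising or flat: minimum is at mid or to its left
    return lo
```

For a toy cost curve `lambda d: [float((d - 11) ** 2)]` with `max_degree=64`, hill climbing stops after probing 16, and binary search over `[4, 16]` settles on 11. With real, noisy measurements the unimodality assumption can fail, which is one reason the paper relies on a statistical test rather than single comparisons.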


“…However, the challenges outlined in section 1.1 do not apply to the Hexagon family (which has an LLVM compiler, a single address space and data caching). Ansel et al [2014] and Mullapudi et al [2016] have demonstrated that automatic scheduling is possible for Halide through heuristic searches or model-based analysis, respectively. These approaches can likely be applied to DSPs as well, which would further reduce development times and increase portability.…”

Section: Related Halide Work (mentioning)

confidence: 99%

Extending Halide to Improve Software Development for Imaging DSPs

Vocke

Corporaal

Jordans

et al. 2017

ACM Trans. Archit. Code Optim.


Master's thesis (Eindhoven University of Technology): Extending Halide to improve software development for imaging DSPs. Vocke, S.; award date: 2016.

Specialized Digital Signal Processors (DSPs), which can be found in a wide range of modern devices, play an important role in power-efficient, high-performance image processing. Applications including camera sensor post-processing and computer vision can benefit from being (partially) mapped onto such DSPs. However, due to their specialized instruction sets and dependence on low-level code optimization, developing applications for DSPs is often more time-consuming and error-prone than for general-purpose processors.
Halide is a domain-specific language (DSL) that enables low-effort development of portable, high-performance imaging pipelines, a combination of qualities that is currently hard, if not impossible, to find among DSP programming models. I propose a set of extensions and modifications to Halide in order to support DSPs in combination with arbitrary C compilers, including a template solution to support diverse target instruction sets and heterogeneous scratchpad memories. Using a commercial Intel DSP, I demonstrate that this solution can achieve performance comparable to tuned C code while reducing development time and code complexity. The results also show that DSPs are attractive alternatives to CPUs and GPUs for power- and area-efficient image processing using Halide.


“…While the native method is usually called in place of the Python function provided by the programmer, the established best practice is to also provide a pure-Python implementation of each DSL, so that if the programmer ventures outside the subset of Python supported by an embedded DSL or runs the application on a platform without SEJITS installed, the application still executes as legal Python (albeit orders of magnitude more slowly, which is often fine for exploratory work on small problem sizes). The SEJITS framework provides the facilities for managing JIT compilation, caching the generated code for future calls, interfacing with autotuners such as OpenTuner [2], and so on.…”

Section: Background: SEJITS (mentioning)

confidence: 99%
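The "native method with a pure-Python fallback, plus cached generated code" pattern quoted above can be sketched as a decorator. This is not the SEJITS API: `specialize`, `compile_native`, and `fake_compiler` are hypothetical names, and keying the cache on argument types is an assumption about what a specializer might specialize on. The fake compiler simply returns the Python function so the example runs anywhere.

```python
import functools
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical compiler hook: given the Python function and the argument
# types, return a native callable, or raise NotImplementedError when the
# function falls outside the supported DSL subset.
Compiler = Callable[[Callable[..., Any], Tuple[type, ...]], Callable[..., Any]]


def specialize(compile_native: Optional[Compiler] = None):
    def decorate(py_func: Callable[..., Any]) -> Callable[..., Any]:
        cache: Dict[Tuple[type, ...], Optional[Callable[..., Any]]] = {}

        @functools.wraps(py_func)
        def wrapper(*args: Any) -> Any:
            if compile_native is None:  # no JIT backend installed
                return py_func(*args)
            key = tuple(type(a) for a in args)
            if key not in cache:  # compile once per type signature
                try:
                    cache[key] = compile_native(py_func, key)
                except NotImplementedError:
                    cache[key] = None  # remember the failure, don't retry
            native = cache[key]
            # Fall back to legal (but slower) Python when compilation failed.
            return native(*args) if native is not None else py_func(*args)

        wrapper.compile_cache = cache  # exposed for inspection
        return wrapper

    return decorate


# Usage: a fake "compiler" that records how often it is invoked.
compile_calls = []


def fake_compiler(func, signature):
    compile_calls.append(signature)
    return func  # stand-in for generated native code


@specialize(fake_compiler)
def saxpy(a, x, y):
    return [a * xi + yi for xi, yi in zip(x, y)]


@specialize()  # no backend: always runs the pure-Python body
def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))
```

Because both `saxpy` calls in a typical session share the signature `(int, list, list)`, the compiler runs only once; `dot` behaves like ordinary Python, mirroring the "runs without SEJITS installed" case.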

“…The SEJITS framework is tightly integrated with OpenTuner [2], which allows the programmer to define a tuning harness for each specializer. While several individual specializers are integrated with OpenTuner, extending autotuning to meta-specialization involves much more complex tuning spaces.…”

Section: Autotuning (mentioning)

confidence: 99%
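The excerpt notes that autotuning a composition of specializers involves much more complex tuning spaces. A minimal sketch of why: if each specializer's harness exposes its own parameters, the joint space is their namespaced union and its size is the product of the individual sizes. An actual OpenTuner harness would subclass `MeasurementInterface` and declare parameters through a `ConfigurationManipulator`, and OpenTuner searches with an ensemble of techniques; to stay dependency-free, this sketch instead uses plain random search. The helpers `compose` and `random_search`, the parameter names, and the cost model are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Space = Dict[str, Tuple[int, int]]  # parameter name -> inclusive (lo, hi)
Config = Dict[str, int]


def compose(**spaces: Space) -> Space:
    """Join per-specializer spaces into one, namespacing each parameter."""
    joint: Space = {}
    for owner, space in spaces.items():
        for name, bounds in space.items():
            joint[f"{owner}.{name}"] = bounds
    return joint


def space_size(space: Space) -> int:
    size = 1
    for lo, hi in space.values():
        size *= hi - lo + 1
    return size


def random_search(space: Space, measure: Callable[[Config], float],
                  trials: int, seed: int = 0):
    """Sample configurations uniformly; keep the cheapest one seen."""
    rng = random.Random(seed)
    history: List[Tuple[Config, float]] = []
    best_cfg, best_cost = None, float("inf")
    for _ in range(trials):
        cfg = {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}
        cost = measure(cfg)
        history.append((cfg, cost))
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost, history


# Hypothetical spaces for two specializers and a toy cost model.
stencil = {"tile": (1, 16), "unroll": (1, 4)}
reduction = {"block": (32, 256)}
joint = compose(stencil=stencil, reduction=reduction)


def toy_cost(cfg: Config) -> float:
    return (abs(cfg["stencil.tile"] - 8) + abs(cfg["stencil.unroll"] - 2)
            + abs(cfg["reduction.block"] - 128) / 32)


best, cost, history = random_search(joint, toy_cost, trials=200)
```

Here the stencil space has 64 points and the reduction space 225, so the joint space already has 14,400; each further composed specializer multiplies it again.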

An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns

Truong

Markley

Fox

2014

Proceedings of the Second Workshop on Optimizing Stencil Computations


The SEJITS framework supports creating embedded domain-specific languages (DSELs) and code generators, a pair of which is called a specializer, with much less effort than creating a full DSL compiler: typically just a few hundred lines of code. SEJITS' main benefit is allowing application writers to stay entirely in high-level languages such as Python by using specialized Python functions (that is, functions written in one of the Python-embedded DSELs) to generate code that runs at native speed. One existing SEJITS DSEL is Sepya [10], a Python DSEL for stencil computations that generates OpenMP and Cilk+ code competitive with existing DSL compilers such as Pochoir and Halide. We extend Sepya to generate OpenCL code targeting GPUs and, in the process, extend SEJITS with support for meta-specializers, whose job is to enable and optimize the composition of existing specializers written by third parties. In this work, we demonstrate meta-specialization by detecting and removing extraneous data copies to and from the GPU when composing multiple specializer calls (stencil and non-stencil). We also explore variants of loop fusion to further improve the performance of composing these operations. The generated stencil code is 20× faster than SciPy and competitive with existing stencil DSELs on realistic code excerpts. Since meta-specializers must compose and optimize specializers created by third parties, we extend SEJITS with support for meta-specializer hooks, allowing existing specializers to be incrementally enabled for meta-specialization without breaking backwards compatibility.
The Sepya and SEJITS extensions together extend the range of platforms for which highly optimized code can be generated and open new possibilities for optimizing the composition of existing specializers.

Scientific application authors face a trade-off between productivity (being able to code in a natural language whose abstractions match their own problem domains) and high performance. In some simple cases, high-performance libraries bridge this gap (BLAS, OSKI), but for more sophisticated computations such as stencils, different problem instances share common structure and abstractions, yet the computations are too different to encapsulate using libraries alone. This observation has led to other approaches, such as domain-specific languages (DSLs) for stencils. By focusing on expressing what computation is to be done rather than how to do it, DSL compilers can often apply optimizations specific to both the computation and the target hardware whose applicability would be difficult to infer from imperative code expressing lower-level operations. In particular, embedded DSLs (DSELs) have the additional advantage [7] that programs can take advantage of most or all features of the embedding language, and that complete programs in the embedding language can include "subprograms" in multiple distinct embedded languages. The SEJITS framework (SEJITS.org) provides the infrastructure for creating DSELs embedded in Python. The corres...
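The copy-elimination idea in this abstract (dropping extraneous host/GPU transfers when several specializer calls are composed) can be modeled as a pass over the sequence of calls. This is not Sepya or SEJITS code: `plan`, `copies`, `live_out`, and the buffer names are hypothetical, and the model assumes the host does not read or modify intermediate buffers between calls, which is what makes skipping their copies safe.

```python
from typing import List, Sequence, Tuple

# (kernel name, input buffers, output buffers) for one specializer call
Call = Tuple[str, Sequence[str], Sequence[str]]
Step = Tuple[str, str]  # ("to_device" | "launch" | "to_host", name)


def plan(calls: Sequence[Call], live_out: Sequence[str],
         composed: bool) -> List[Step]:
    """Emit the transfer/launch schedule for a sequence of GPU calls.

    Uncomposed: each call copies its inputs to the device and its outputs
    back to the host, as independent specializers would.
    Composed: buffers already resident on the device are reused, and only
    the buffers the host reads afterwards (live_out) are copied back."""
    steps: List[Step] = []
    resident = set()
    for kernel, ins, outs in calls:
        for buf in ins:
            if not (composed and buf in resident):
                steps.append(("to_device", buf))
            resident.add(buf)
        steps.append(("launch", kernel))
        for buf in outs:
            resident.add(buf)
            if not composed:
                steps.append(("to_host", buf))
    if composed:
        steps.extend(("to_host", buf) for buf in live_out)
    return steps


def copies(steps: List[Step]) -> int:
    """Count host/device transfers, ignoring kernel launches."""
    return sum(1 for kind, _ in steps if kind != "launch")


# Two stencil sweeps followed by a non-stencil reduction.
pipeline = [("stencil", ["a"], ["b"]),
            ("stencil", ["b"], ["c"]),
            ("reduce", ["c"], ["total"])]
naive = plan(pipeline, live_out=["total"], composed=False)
fused = plan(pipeline, live_out=["total"], composed=True)
```

For this three-call pipeline the uncomposed schedule performs 6 transfers, while the composed one performs 2 (copy `a` in, copy `total` out). The model counts transfers only; it does not weigh buffer sizes or capture the loop-fusion variants the abstract also mentions.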

