Bruce Collie scite author profile

Ginsbach

2019

Fast numerical libraries have been a cornerstone of scientific computing for decades, but this comes at a price. Programs may be tied to vendor specific software ecosystems resulting in polluted, non-portable code. As we enter an era of heterogeneous computing, there is an explosion in the number of accelerator libraries required to harness specialized hardware. We need a system that allows developers to exploit ever-changing accelerator libraries, without over-specializing their code.As we cannot know the behavior of future libraries ahead of time, this paper develops a scheme that assists developers in matching their code to new libraries, without requiring the source code for these libraries.Furthermore, it can recover equivalent code from programs that use existing libraries and automatically port them to new interfaces. It first uses program synthesis to determine the meaning of a library, then maps the synthesized description into generalized constraints which are used to search the program for replacement opportunities to present to the developer.We applied this approach to existing large applications from the scientific computing and deep learning domains. Using our approach, we show speedups ranging from 1.1× to over 10× on end to end performance when using accelerator libraries.

Modeling black-box components with probabilistic synthesis

Woodruff

2020

This paper is concerned with synthesizing programs based on black-box oracles: we are interested in the case where there exists an executable implementation of a component or library, but its internal structure is unknown. We are provided with just an API or function signature, and aim to synthesize a program with equivalent behavior.To attack this problem, we detail Presyn: a program synthesizer designed for flexible interoperation with existing programs and compiler toolchains. Presyn uses high-level imperative control-flow structures and a pair of cooperating predictive models to efficiently narrow the space of potential programs. These models can be trained effectively on small corpora of synthesized examples.We evaluate Presyn against five leading program synthesizers on a collection of 112 synthesis benchmarks collated from previous studies and real-world software libraries. We show that Presyn is able to synthesize a wider range of programs than each of them with less human input. We demonstrate the application of our approach to real-world code and software engineering problems with two case studies: accelerator library porting and detection of duplicated library reimplementations.CCS Concepts: • Software and its engineering → General programming languages; Automatic programming; Programming by example; Genetic programming.

Automatically harnessing sparse acceleration

Ginsbach

2020

Sparse linear algebra is central to many scientific programs, yet compilers fail to optimize it well. High-performance libraries are available, but adoption costs are significant. Moreover, libraries tie programs into vendor-specific software and hardware ecosystems, creating non-portable code.In this paper, we develop a new approach based on our specification Language for implementers of Linear Algebra Computations (LiLAC). Rather than requiring the application developer to (re)write every program for a given library, the burden is shifted to a one-off description by the library implementer. The LiLAC-enabled compiler uses this to insert appropriate library routines without source code changes.LiLAC provides automatic data marshaling, maintaining state between calls and minimizing data transfers. Appropriate places for library insertion are detected in compiler intermediate representation, independent of source languages.We evaluated on large-scale scientific applications written in FORTRAN; standard C/C++ and FORTRAN benchmarks; and C++ graph analytics kernels. Across heterogeneous platforms, applications and data sets we show speedups of 1.1× to over 10× without user intervention.

Program Lifting using Gray-Box Behavior

2021

M ³

Ginsbach²,

Woodruff

et al. 2020

Library migration is a challenging problem, where most existing approaches rely on prior knowledge. This can be, for example, information derived from changelogs or statistical models of API usage.This paper addresses a different API migration scenario where there is no prior knowledge of the target library. We have no historical changelogs and no access to its internal representation. To tackle this problem, this paper proposes a novel approach (M 3 ), where probabilistic program synthesis is used to semantically model the behavior of library functions. Then, we use an SMT-based code search engine to discover similar code in user applications. These discovered instances provide potential locations for API migrations.We evaluate our approach against 7 well-known libraries from varied application domains, learning correct implementations for 94 functions. Our approach is integrated with standard compiler tooling, and we use this integration to evaluate migration opportunities in 9 existing C/C++ applications with over 1MLoC. We discover over 7,000 instances of these functions, of which more than 2,000 represent migration opportunities. CCS CONCEPTS• Software and its engineering → Software maintenance tools; Software verification and validation; Compilers.