Vobla

Beaugnon, Ulysse; Kravets, Alexey; Haastregt, Sven van; Baghdadi, Riyadh; Tweed, David; Absar, Javed; Lokhmotov, Anton

doi:10.1145/2597809.2597818

Cited by 10 publications

(3 citation statements)

References 7 publications

“…implementations, which originate from either hand-tuned libraries or other high-performance code generators. 9 We chose to compare against Caffe2 rather than against other optimization flows due to expressivity and automation limitations: XLA or Glow do not support custom layers, and Halide or TVM lack range inference and automatic parallelism discovery, which significantly complicates the expression of new layers such as KRU and WaveNet. The common set of comparable layers would be limited to matrix multiplications and convolutions, while one of the main contributions of TC is to enable exploration of new unconventional layers before super-optimized implementations are available.…”

Section: Performance Results (mentioning)

confidence: 99%

“…Polyhedral techniques have also been tailored for domain-specific purposes. State-of-the-art examples include the PolyMage [46] DSL for image processing pipelines and the PENCIL approach to the construction of parallelizing and compilers for DSLs [5,9]. PolyMage is a clear illustration of the benefits of operating at a high level of abstraction, closer to the mathematics of the domain of interest: While GCC/Graphite and LLVM/Polly struggle to recover affine control and flow from low-level code, PolyMage natively captures patterns amenable to domain-specific optimization, such as stencil-specific overlapped tiling with or without recomputation, and cache-conscious fusion and tiling heuristics; it also offers a more productive programming experience for end-users.…”

Section: Related Work (mentioning)

confidence: 99%

“…The polyhedral framework of compilation emerged as a natural candidate to design a versatile optimization flow satisfying the needs of the domain and target hardware. It has demonstrated strong results in domain-specific optimization [5,9,20,46], expert-driven meta-programming [6,15,26], embedding of third-party library code [40], and automatic generation of efficient code for heterogeneous targets [5,7,43,51,70,77]. We attempt to take the best of both worlds, defining a domain-specific language rich enough to capture full sub-graphs of modern Machine Learning (ML) models while enabling aggressive compilation competitive to native libraries.…”

Section: Introduction (mentioning)

confidence: 99%

The Next 700 Accelerated Layers

Vasilache, Zinenko, Theodoridis, et al. 2019

ACM Trans. Archit. Code Optim.

Deep learning frameworks automate the deployment, distribution, synchronization, memory allocation, and hardware acceleration of models represented as graphs of computational operators. These operators wrap high-performance libraries such as cuDNN or NNPACK. When the computation does not match any predefined library call, custom operators must be implemented, often at high engineering cost and performance penalty, limiting the pace of innovation. To address this productivity gap, we propose and evaluate: (1) a domain-specific language with a tensor notation close to the mathematics of deep learning; (2) a Just-In-Time optimizing compiler based on the polyhedral framework; (3) carefully coordinated linear optimization and evolutionary algorithms to synthesize high-performance CUDA kernels; (4) the transparent integration of our flow into PyTorch and Caffe2, providing the fully automatic synthesis of high-performance GPU kernels from simple tensor algebra. The performance is comparable to, and often exceeds the performance of, highly tuned libraries. CCS Concepts: • Software and its engineering → Compilers;
