Protonu Basu scite author profile

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

The application of deep learning techniques resulted in remarkable improvement of machine learning models. In this paper we provide detailed characterizations of deep learning models used in many Facebook social network services. We present computational characteristics of our models, describe high-performance optimizations targeting existing systems, point out their limitations, and make suggestions for future general-purpose/accelerated inference hardware. Also, we highlight the need for better co-design of algorithms, numerics and computing platforms to address the challenges of workloads often run in data centers.

An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability

Yang, Gayatri, Kurth et al. 2018

A script-based autotuning compiler system to generate high-performance CUDA code

Khan, Basu, Rudy et al. 2013. ACM Trans. Archit. Code Optim.

This article presents a novel compiler framework for CUDA code generation. The compiler structure is designed to support autotuning, which employs empirical techniques to evaluate a set of alternative mappings of computation kernels and select the mapping that obtains the best performance. This article introduces a Transformation Strategy Generator, a meta-optimizer that generates a set of transformation recipes, which are descriptions of the mapping of the sequential code to parallel CUDA code. These recipes comprise a search space of possible implementations. This system achieves performance comparable to, and sometimes better than, manually tuned libraries and exceeds the performance of a state-of-the-art GPU compiler.

Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs

Zhao, Basu, Williams et al. 2019

Compiler generation and autotuning of communication-avoiding operators for geometric multigrid

Basu, Venkat, Hall et al. 2013

This paper describes a compiler approach to communication-avoiding optimizations in geometric multigrid (GMG), one of the most popular methods for solving partial differential equations. Communication-avoiding optimizations reduce vertical communication through the memory hierarchy and horizontal communication across processes or threads, usually at the expense of introducing redundant computation. We focus on applying these optimizations to the smooth operator, which successively reduces the error and accounts for the largest fraction of the GMG execution time. Our compiler technology applies both novel and known transformations to derive an implementation comparable to manually tuned code. To make the approach portable, an underlying autotuning system explores the tradeoff between reduced communication and increased computation, as well as tradeoffs in threading schemes, to automatically identify the best implementation for a particular architecture and at each computation phase. Results show that we are able to quadruple the performance of the smooth operation on the finest grids while attaining similar or better performance than manually tuned code.

Protonu Basu

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications
