Marat Dukhan scite author profile

This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to fit target latency and/or energy constraints. We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors. At the core of our algorithm lies an accuracy predictor built atop Gaussian Process with Bayesian optimization for iterative sampling. With a one-time building cost for the predictors, our algorithm produces state-of-the-art model architectures on different platforms under given constraints in just minutes. Our results show that adapting computation resources to building blocks is critical to model performance. Without the addition of any bells and whistles, our models achieve significant accuracy improvements against state-of-the-art hand-crafted and automatically designed architectures. We achieve 73.8% and 75.3% top-1 accuracy on ImageNet at 20ms latency on a mobile CPU and DSP. At reduced latency, our models achieve up to 8.5% (4.8%) and 6.6% (9.3%) absolute top-1 accuracy improvements compared to MobileNetV2 and MnasNet, respectively, on a mobile CPU (DSP), and 2.7% (4.6%) and 5.6% (2.6%) accuracy gains over ResNet-101 and ResNet-152, respectively, on an Nvidia GPU (Intel CPU).
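The predictor-guided search described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the two-dimensional candidate encoding, the `toy_accuracy` and `toy_latency_ms` stand-ins, the RBF kernel, the upper-confidence-bound sampling rule, and every constant below are assumptions chosen only to show the general shape of Gaussian-process-based Bayesian optimization under a latency constraint.

```python
import numpy as np

LATENCY_BUDGET_MS = 20.0  # hypothetical latency constraint


def toy_accuracy(x):
    """Hypothetical stand-in for training a candidate and measuring accuracy."""
    return 0.60 + 0.10 * x[0] + 0.08 * x[1] - 0.05 * (x[0] - x[1]) ** 2


def toy_latency_ms(x):
    """Hypothetical stand-in for a latency predictor."""
    return 8.0 + 10.0 * x[0] + 8.0 * x[1]


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a GP with unit prior variance."""
    y_mean = y_train.mean()
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = y_mean + k_qt @ np.linalg.solve(k_tt, y_train - y_mean)
    var = 1.0 - (k_qt * np.linalg.solve(k_tt, k_qt.T).T).sum(axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def search(num_candidates=200, num_initial=5, num_rounds=15, kappa=0.05, seed=0):
    """Iteratively sample feasible candidates guided by the GP predictor."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(0.0, 1.0, size=(num_candidates, 2))
    latency = np.array([toy_latency_ms(x) for x in candidates])
    feasible = np.flatnonzero(latency <= LATENCY_BUDGET_MS)

    # Seed the predictor with a few randomly chosen feasible candidates.
    evaluated = list(rng.choice(feasible, size=num_initial, replace=False))
    scores = [toy_accuracy(candidates[i]) for i in evaluated]

    for _ in range(num_rounds):
        pending = np.setdiff1d(feasible, evaluated)
        if pending.size == 0:
            break
        mean, std = gp_posterior(
            candidates[evaluated], np.array(scores), candidates[pending]
        )
        # Upper confidence bound; kappa is small because the prior variance is 1
        # while accuracies differ by only a few percent.
        pick = pending[np.argmax(mean + kappa * std)]
        evaluated.append(pick)
        scores.append(toy_accuracy(candidates[pick]))

    best = int(np.argmax(scores))
    return candidates[evaluated[best]], scores[best], latency[evaluated[best]]


if __name__ == "__main__":
    config, accuracy, latency_ms = search()
    print(config, accuracy, latency_ms)
```

Only candidates that the latency stand-in marks as feasible are ever evaluated, so the expensive step (here `toy_accuracy`) is spent on architectures that already satisfy the constraint.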

Fast Sparse ConvNets

Elsen

Dukhan

Gale

et al. 2020

Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks

Choi

Dukhan

Liu

et al. 2014

We conducted a microbenchmarking study of the time, energy, and power of computation and memory access on several existing platforms. These platforms represent candidate compute-node building blocks of future high-performance computing systems. Our analysis uses the "energy roofline" model, developed in prior work, which we extend in two ways. First, we improve the model's accuracy by accounting for power caps, basic memory hierarchy access costs, and measurement of random memory access patterns. Secondly, we empirically evaluate server-, mini-, and mobile-class platforms that span a range of compute and power characteristics. Our study includes a dozen such platforms, including x86 (both conventional and Xeon Phi), ARM, GPU, and hybrid (AMD APU and other SoC) processors. These data and our model analytically characterize the range of algorithmic regimes where we might prefer one building block to others. It suggests critical values of arithmetic intensity around which some systems may switch from being more to less time- and energy-efficient than others; it further suggests how, with respect to intensity, operations should be throttled to meet a power cap. We hope our methods can help make debates about the relative merits of these and other systems more quantitative, analytical, and insightful.

I. INTRODUCTION

We consider the problem of estimating how much time, energy, and power an abstract algorithm may require on a given machine. Our approach starts with an abstract cost model grounded in first principles of algorithm design. The model's utility derives from the way it facilitates quick and precise reasoning about potential time-efficiency, energy-efficiency, and power-efficiency. This paper applies the model to analyze candidate compute-node building blocks being considered for emerging and future HPC systems, which include high-end server and GPU platforms as well as low-end, low-power mobile platforms.

Importantly, beyond specific findings and data, we emphasize the methodological aspects of this paper. In particular, architects may find our high-level approach to be a useful additional way to assess systems across computations; our analysis technique aims to provide more insight than a collection of blackbox benchmarks provides but without having to know too much detail about specific computations. Similarly, we hope algorithm designers may find ways to reason about algorithmic techniques for managing energy and power, and tradeoffs (if any) against time.
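The text above does not spell out the model's equations. As a rough illustration only, the sketch below assumes a simplified roofline-style cost model: computation and memory traffic overlap perfectly, so time is the larger of the two costs, and energy is the sum of per-operation costs plus a constant power drawn for the whole run. The `Machine` fields and both example machines are hypothetical values, not measurements from the study, and the paper's extensions (power caps, memory hierarchy costs, random access) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    time_per_flop: float    # seconds per arithmetic operation
    time_per_byte: float    # seconds per byte of memory traffic
    energy_per_flop: float  # joules per arithmetic operation
    energy_per_byte: float  # joules per byte of memory traffic
    constant_power: float   # watts drawn regardless of activity


def run_time(m, flops, bytes_moved):
    """Time under the assumption that compute and memory overlap perfectly."""
    return max(flops * m.time_per_flop, bytes_moved * m.time_per_byte)


def energy(m, flops, bytes_moved):
    """Dynamic energy of both operation types plus constant-power energy."""
    dynamic = flops * m.energy_per_flop + bytes_moved * m.energy_per_byte
    return dynamic + m.constant_power * run_time(m, flops, bytes_moved)


def average_power(m, flops, bytes_moved):
    return energy(m, flops, bytes_moved) / run_time(m, flops, bytes_moved)


def time_balance(m):
    """Arithmetic intensity (flops per byte) where compute time equals memory time."""
    return m.time_per_byte / m.time_per_flop


def energy_per_flop_at(m, intensity):
    """Energy spent per arithmetic operation at a given flops-per-byte intensity."""
    return energy(m, 1.0, 1.0 / intensity)


# Hypothetical machines, chosen only to show a crossover in energy efficiency.
SERVER = Machine(1e-12, 1e-11, 1e-10, 5e-10, 100.0)
MOBILE = Machine(1e-10, 1e-10, 5e-11, 3e-10, 2.0)

if __name__ == "__main__":
    for intensity in (0.25, 1.0, 4.0, 16.0, 64.0):
        print(
            intensity,
            energy_per_flop_at(SERVER, intensity),
            energy_per_flop_at(MOBILE, intensity),
        )
```

With these made-up parameters the low-power machine spends less energy per operation at low arithmetic intensity and the server-class machine spends less at high intensity, which is the kind of crossover the abstract refers to.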

Scaling up Hartree–Fock calculations on Tianhe-2

Chow

Liu

Misra

et al. 2015

The International Journal of High Performance Computing Applica

This paper presents a new optimized and scalable code for Hartree–Fock self-consistent field iterations. Goals of the code design include scalability to large numbers of nodes, and the capability to simultaneously use CPUs and Intel Xeon Phi coprocessors. Issues we encountered as we optimized and scaled up the code on Tianhe-2 are described and addressed. A major issue is load balance, which is made challenging due to integral screening. We describe a general framework for finding a well-balanced static partitioning of the load in the presence of screening. Work stealing is used to polish the load balance. Performance results are shown on Stampede and Tianhe-2 supercomputers. Scalability is demonstrated on large simulations involving 2938 atoms and 27,394 basis functions, utilizing 8100 nodes of Tianhe-2.
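The abstract does not describe the partitioning framework itself. To illustrate why screening complicates static load balance, the sketch below swaps in a generic greedy heuristic (largest task first, assigned to the least-loaded node), which is not the paper's method. Task costs are hypothetical, with zero standing for work removed by screening, and the work-stealing refinement is not modeled.

```python
import heapq


def partition_largest_first(task_costs, num_nodes):
    """Assign each task, largest first, to the currently least-loaded node."""
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_nodes)]
    order = sorted(range(len(task_costs)), key=lambda t: task_costs[t], reverse=True)
    for task in order:
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + task_costs[task], node))
    loads = [sum(task_costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical per-task costs; zeros are tasks eliminated by screening.
    costs = [5, 0, 3, 0, 0, 4, 2, 0, 1, 3]
    assignment, loads = partition_largest_first(costs, num_nodes=3)
    print(assignment, loads)
```

Splitting the same ten tasks by count alone would ignore the screened entries and could leave nodes with very different amounts of real work; balancing on estimated cost avoids that, and any remaining imbalance is what a dynamic scheme such as work stealing would smooth out.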


Marat Dukhan

Machine Learning at Facebook: Understanding Inference at the Edge

ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation

Fast Sparse ConvNets

Algorithmic Time, Energy, and Power on Candidate HPC Compute Building Blocks

Scaling up Hartree–Fock calculations on Tianhe-2
