Efficiently executing convolutional neural nets (CNNs) is important in many machine learning tasks. Since the cost of moving a word of data, either between levels of a memory hierarchy or between processors over a network, is much higher than the cost of an arithmetic operation, minimizing data movement is critical to performance optimization. In this paper, we present new lower bounds on the data movement needed for both convolutional and pooling layers of CNNs, as well as optimal sequential algorithms that attain these lower bounds. In most common cases, our optimal algorithms can attain significantly more data reuse than matrix multiplication.
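To illustrate why a convolution can admit more data reuse than the equivalent matrix multiplication, the following back-of-the-envelope sketch (our own illustration, not taken from the paper) compares the best possible arithmetic intensity, i.e. flops per word when every word of data is moved only once, of a direct convolution against the same layer lowered to an im2col matrix multiplication, whose input matrix replicates each activation R*S times. The layer sizes below are hypothetical examples.

# Hypothetical illustration (not the paper's algorithm or bound): upper bound on
# data reuse for a conv layer vs. the same layer cast as an im2col matmul.

def conv_reuse(B, C, K, H, W, R, S):
    """Flops per word if inputs, weights, and outputs are each moved exactly once."""
    flops = 2 * B * K * H * W * C * R * S
    words = (B * C * (H + R - 1) * (W + S - 1)   # input activations
             + K * C * R * S                     # filter weights
             + B * K * H * W)                    # output activations
    return flops / words

def im2col_matmul_reuse(B, C, K, H, W, R, S):
    """Same quantity for the (B*H*W x C*R*S) by (C*R*S x K) matmul after im2col."""
    flops = 2 * B * K * H * W * C * R * S
    words = (B * H * W * C * R * S               # im2col input, replicated R*S times
             + K * C * R * S                     # filter weights
             + B * K * H * W)                    # output activations
    return flops / words

# A small-filter layer typical of CNNs: 3x3 filters, 64 channels, 56x56 images.
args = dict(B=1, C=64, K=64, H=56, W=56, R=3, S=3)
print(f"direct conv reuse:   {conv_reuse(**args):.1f} flops/word")
print(f"im2col matmul reuse: {im2col_matmul_reuse(**args):.1f} flops/word")

For this example layer the direct convolution admits several times more reuse per word than the im2col formulation, consistent with the claim above; the paper's lower bounds and algorithms quantify how much of that reuse is actually achievable for a given fast-memory size.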
Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect. While DNN accelerators can take advantage of data reuse and achieve high peak throughput, they also expose a large number of runtime parameters to the programmers, who need to explicitly manage how computation is scheduled both spatially and temporally. In fact, different scheduling choices can lead to wide variations in performance and efficiency, motivating the need for a fast and efficient search strategy to navigate the vast scheduling space. To address this challenge, we present CoSA, a constrained-optimization-based approach for scheduling DNN accelerators. As opposed to existing approaches that either rely on designers' heuristics or iterative methods to navigate the search space, CoSA expresses scheduling decisions as a constrained-optimization problem that can be deterministically solved using mathematical optimization techniques. Specifically, CoSA leverages the regularities in DNN operators and hardware to formulate the DNN scheduling space into a mixed-integer programming (MIP) problem with algorithmic and architectural constraints, which can be solved to automatically generate a highly efficient schedule in one shot. We demonstrate that CoSA-generated schedules significantly outperform state-of-the-art approaches by a geometric mean of up to 2.5× across a wide range of DNN networks while improving the time-to-solution by 90×.
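As a concrete illustration of the MIP idea, the sketch below poses a heavily simplified tile-size selection problem as a mixed-integer program: binary variables pick one candidate tile size per loop dimension, and products of tile sizes (buffer footprints, tile volume) become linear expressions by working in log space. This is our own toy formulation under assumed names and sizes (candidates, WEIGHT_BUF_WORDS, a single weight buffer, a volume-maximizing objective), not CoSA's actual variables, constraints, or cost model, which also cover spatial mapping, loop permutation, and multi-level buffers.

# Toy MIP sketch of tile-size selection (hypothetical; not CoSA's formulation),
# using PuLP and its bundled CBC solver.
import math
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum, PULP_CBC_CMD

# Toy conv dimensions: K output channels, C input channels, 3x3 filter.
K, C, R, S = 64, 64, 3, 3
candidates = {                      # candidate tile sizes: divisors of each dim
    "K": [1, 2, 4, 8, 16, 32, 64],
    "C": [1, 2, 4, 8, 16, 32, 64],
}
WEIGHT_BUF_WORDS = 4096             # assumed on-chip buffer reserved for weights

prob = LpProblem("tile_selection", LpMaximize)

# x[d][i] == 1 iff dimension d uses tile size candidates[d][i].
x = {d: [LpVariable(f"x_{d}_{i}", cat=LpBinary) for i in range(len(sizes))]
     for d, sizes in candidates.items()}

# Exactly one tile size is chosen per dimension.
for d in candidates:
    prob += lpSum(x[d]) == 1

# log(tile_d) is a linear function of the binary choices.
log_tile = {d: lpSum(x[d][i] * math.log(sizes[i]) for i in range(len(sizes)))
            for d, sizes in candidates.items()}

# Capacity constraint K_t * C_t * R * S <= WEIGHT_BUF_WORDS, linearized in log space.
prob += log_tile["K"] + log_tile["C"] <= math.log(WEIGHT_BUF_WORDS / (R * S))

# Stand-in objective: maximize the on-chip tile volume (a proxy for data reuse).
prob += log_tile["K"] + log_tile["C"]

prob.solve(PULP_CBC_CMD(msg=False))
chosen = {d: candidates[d][next(i for i, v in enumerate(x[d]) if v.value() > 0.5)]
          for d in candidates}
print(chosen)  # one volume-maximizing pair of tile sizes that fits the buffer

Because every constraint and the objective are linear in the binary variables, an off-the-shelf MIP solver returns a schedule in one shot rather than by iterative search, which is the property the CoSA abstract emphasizes.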
Reducing communication, either between levels of a memory hierarchy or between processors over a network, is a key component of performance optimization (in both time and energy) for many problems, including dense linear algebra [BCD+14], particle interactions [DGK+13], and machine learning [DD18, GAB+18]. For these problems, which can be represented as nested-loop computations, previous tiling-based approaches [CDK+13, DR16] have been used to find both lower bounds on the communication required to execute them and optimal rearrangements, or blockings, that attain such lower bounds. However, such general approaches have typically assumed the problem sizes are large, an assumption that is often not met in practice. For instance, the classical (# arithmetic operations)/(cache size)^{1/2} lower bound for matrix multiplication [HK81, BCD+14] is not tight for matrix-vector multiplication, which must read at least O(# arithmetic operations) words from memory; similar issues arise for almost all convolutions in machine learning applications, which use extremely small filter sizes (and therefore loop bounds). In this paper, we provide an efficient way both to find communication lower bounds and to attain them via an appropriate, efficiently constructible blocking, yielding matching tilings for nested-loop programs with arbitrary loop bounds that operate on multidimensional arrays in the projective case, where the array indices are subsets of the loop indices. Our approach works on all such problems, regardless of dimensionality, size, memory access patterns, or number of arrays, and directly applies to (among other examples) matrix multiplication and similar dense linear algebra operations, tensor contractions, n-body pairwise interactions, pointwise convolutions, and fully connected layers.
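For concreteness, the classical bound mentioned above can be stated as follows (our restatement of the standard [HK81] result, for multiplying an $m \times k$ matrix by a $k \times n$ matrix on a machine with a fast memory of $M$ words):

\[
W \;=\; \Omega\!\left(\frac{\#\text{arithmetic operations}}{\sqrt{M}}\right) \;=\; \Omega\!\left(\frac{mkn}{\sqrt{M}}\right).
\]

For a matrix-vector product ($n = 1$) this predicts only $\Omega(mk/\sqrt{M})$ words of traffic, yet the matrix alone occupies $mk = \Theta(\#\text{arithmetic operations})$ words and must be read at least once; the trivial bound is therefore the larger one whenever $M$ exceeds a small constant, which is exactly the small-loop-bound regime this paper targets.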