Naums Mogers scite author profile

Naums Mogers

2Publications

8Citation Statements Received

42Citation Statements Given

How they've been cited

How they cite others

Affiliations

University of Edinburgh

Publications

Order By: Most citations

Automatic generation of specialized direct convolutions for mobile GPUs

Mogers

Radu

et al. 2020

View full text Add to dashboard Cite

Convolutional Neural Networks (CNNs) are a powerful and versatile tool for performing computer vision tasks in both resource constrained settings and server-side applications. Most GPU hardware vendors provide highly tuned libraries for CNNs such as Nvidia's cuDNN or ARM Compute Library. Such libraries are the basis for higher-level, commonly-used, machine-learning frameworks such as PyTorch or Caffe, abstracting them away from vendor-specific implementation details. However, writing optimized parallel code for GPUs is far from trivial. This places a significant burden on hardware-specific library writers which have to continually play catch-up with rapid hardware and network evolution.To reduce effort and reduce time to market, new approaches are needed based on automatic code generation, rather than manual implementation. This paper describes such an approach for direct convolutions using Lift, a new data-parallel intermediate language and compiler. Lift uses a high-level intermediate language to express algorithms which are then automatically optimized using a system of rewrite-rules. Direct convolution, as opposed to the matrix multiplication approach used commonly by machine-learning frameworks, uses an order of magnitude less memory, which is critical for mobile devices. Using Lift, we show that it is possible to generate automatically code that is ×10 faster than the direct convolution while using ×3.6 less space than the GEMM-based convolution of the very specialized ARM Compute Library on the latest generation of ARM Mali GPU.

show abstract

Mapping parallelism in a functional IR through constraint satisfaction: a case study on convolution for mobile GPUs

Mogers

Radu

et al. 2022

View full text Add to dashboard Cite

Graphics Processing Units (GPUs) are notoriously hard to optimize for manually. What is needed are good automatic code generators and optimizers. Accelerate, Futhark and Lift have demonstrated that using a functional approach is well suited to solve this challenge. Lift, for instance, uses a system of rewrite rules with a multi-stage approach. Algorithmic optimizations are first explored, followed by hardware-specific optimizations such as using shared memory and mapping parallelism.While the algorithmic exploration leads to correct transformed programs by construction, it is not necessarily true for the latter phase. Exploiting shared memory and mapping parallelism while ensuring correct synchronization is a delicate balancing act, and is hard to encode in a rewrite system. Currently, Lift relies on heuristics with ad-hoc mechanisms to check for correctness.This paper proposes to extract parallelization constraints automatically from a functional IR and use a solver to identify valid rewriting. Using a convolutional neural network on a mobile GPU as a use case, this approach matches the performance of the ARM Compute Library GEMM convolution and the TVM-generated kernel consuming between 2× and 3.6× less memory. Furthermore, a speedup of 12× is achieved over the ARM Compute Library direct convolution implementation.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Naums Mogers

Automatic generation of specialized direct convolutions for mobile GPUs

Mapping parallelism in a functional IR through constraint satisfaction: a case study on convolution for mobile GPUs

Contact Info

Product

Resources

About