Training Deep Neural Networks is known to be an expensive operation, both in terms of computational cost and memory load. Indeed, during training, all intermediate layer outputs (called activations) computed during the forward phase must be stored until the corresponding gradient has been computed in the backward phase. These memory requirements can rule out larger batch sizes and deeper networks, limiting both convergence speed and accuracy. Recent works have proposed to offload some of the computed forward activations from GPU memory to CPU memory. This requires determining which activations should be offloaded and when the transfers to and from GPU memory should take place. We prove that this problem is NP-hard in the strong sense, and we propose two heuristics based on relaxations of the problem. We perform an extensive experimental evaluation on standard Deep Neural Networks and compare our heuristics against previous approaches from the literature, showing that they achieve much better performance in a wide variety of situations.
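The heuristics above decide which activations to offload and when; the offloading mechanism itself can be sketched with PyTorch's saved-tensor hooks. The example below is a minimal sketch, not the paper's method: the model shape and batch size are arbitrary assumptions, and it offloads every saved activation to CPU memory, whereas the heuristics would select only a subset and schedule the transfers so they overlap with computation.

```python
import torch
import torch.nn as nn
from torch.autograd.graph import saved_tensors_hooks

device = "cuda" if torch.cuda.is_available() else "cpu"

def pack_to_cpu(t):
    # Called during the forward pass: move the saved activation to CPU memory.
    return t.to("cpu")

def unpack_to_gpu(t):
    # Called during the backward pass: bring the activation back to the GPU
    # just before its gradient is computed.
    return t.to(device)

# Arbitrary toy model and batch, for illustration only.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
x = torch.randn(32, 1024, device=device)

# Offload every saved activation; the paper's heuristics instead choose which
# activations to transfer, and when, so that transfers hide behind compute.
with saved_tensors_hooks(pack_to_cpu, unpack_to_gpu):
    loss = model(x).sum()
loss.backward()
```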
The training phase of Deep Neural Networks has become an important consumer of computing resources, and the resulting volume of computation makes it crucial to perform training efficiently on parallel architectures. Data parallelism is still the most widely used method, but the associated need to replicate all the weights on every computing resource creates memory problems at the level of each node and collective-communication problems at the level of the platform. In this context, model parallelism, which distributes the different layers of the network over the computing nodes, is an attractive alternative: it spreads the weights across nodes (mitigating the memory problem) and does not require large collective communications, since only forward activations are exchanged. However, to be efficient, it must be combined with a pipelined / streaming approach, which in turn introduces new memory costs. The goal of this paper is to model these memory costs in detail, to analyze the complexity of the associated throughput optimization problem under memory constraints, and to show that this optimization problem can be formalized as an Integer Linear Program (ILP).
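As a rough illustration of this kind of formulation (not the paper's ILP, which also models pipelining, stage contiguity, and activation memory), the toy program below uses PuLP to assign layers to nodes so as to minimize the bottleneck per-node compute time, which bounds steady-state pipeline throughput, subject to a per-node memory capacity. All layer costs and capacities are made-up numbers.

```python
import pulp

compute = [4, 6, 5, 3]   # hypothetical per-layer compute times
memory = [2, 3, 2, 1]    # hypothetical per-layer memory footprints
nodes, capacity = 2, 5   # hypothetical node count and per-node memory budget

prob = pulp.LpProblem("layer_assignment", pulp.LpMinimize)
x = {(l, n): pulp.LpVariable(f"x_{l}_{n}", cat="Binary")
     for l in range(len(compute)) for n in range(nodes)}
T = pulp.LpVariable("bottleneck", lowBound=0)

prob += T  # objective: minimize the slowest node's load, which limits throughput
for l in range(len(compute)):
    # Each layer is placed on exactly one node.
    prob += pulp.lpSum(x[l, n] for n in range(nodes)) == 1
for n in range(nodes):
    # Respect the node's memory capacity and account for its compute load.
    prob += pulp.lpSum(memory[l] * x[l, n] for l in range(len(compute))) <= capacity
    prob += pulp.lpSum(compute[l] * x[l, n] for l in range(len(compute))) <= T

prob.solve()
print(pulp.LpStatus[prob.status], pulp.value(T))
```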
Edge computing is the natural progression from Cloud computing: instead of collecting all data and processing it centrally, as in a cloud computing environment, computing power is distributed so that as much processing as possible happens close to the source of the data. This model is being adopted quickly for several reasons, including privacy and reduced power and bandwidth requirements on the Edge nodes. While it is common today to run inference on Edge nodes, it is much less common to perform training there. The reasons range from computational limitations to the lack of a clear advantage in reducing communication between Edge nodes. In this paper, we explore scenarios where training on the Edge is advantageous, as well as the use of checkpointing strategies to save memory.
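The paper's checkpointing strategies are tailored to the Edge setting, but the underlying trade of recomputation for memory can be sketched with PyTorch's built-in checkpoint_sequential; the network size and the choice of 4 segments below are arbitrary assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

device = "cuda" if torch.cuda.is_available() else "cpu"

# An arbitrary 8-block network; only segment-boundary activations are stored.
model = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                        for _ in range(8)]).to(device)
x = torch.randn(16, 512, device=device, requires_grad=True)

# Split the network into 4 checkpointed segments: each segment's internal
# activations are freed after the forward pass and recomputed on the fly
# during the backward pass, trading extra compute for a smaller memory peak.
loss = checkpoint_sequential(model, 4, x).sum()
loss.backward()
```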