Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. As a result, many models do not fit on a single GPU device or can only be trained with a small per-GPU batch size. This survey provides a systematic overview of approaches that enable more efficient DNN training. We analyze techniques that save memory and make efficient use of computation and communication resources on architectures with a single GPU or several GPUs. We summarize the main categories of strategies and compare strategies within and across categories. Along with the approaches proposed in the literature, we discuss available implementations.
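As one concrete illustration of the kind of memory-saving technique such surveys cover, the sketch below shows activation (gradient) checkpointing in PyTorch, which trades extra recomputation for lower peak activation memory; the toy model, sizes, and segment count are illustrative assumptions, not taken from the survey.

```python
# Minimal sketch of activation (gradient) checkpointing, one common
# memory-saving technique for DNN training. All sizes are illustrative.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A toy deep network; in practice this would be a transformer or CNN stack.
blocks = nn.Sequential(*[
    nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(16)
])

x = torch.randn(8, 1024, requires_grad=True)

# Instead of storing every intermediate activation, keep only segment
# boundaries and recompute the rest during the backward pass.
out = checkpoint_sequential(blocks, 4, x)
loss = out.sum()
loss.backward()
```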
Pretrained models have achieved great success in both Computer Vision (CV) and Natural Language Processing (NLP). This progress has led to Visual-Language Pretrained Models (VLPMs), which learn joint representations of vision and language by feeding visual and linguistic content into a multi-layer transformer. In this paper, we present an overview of the major advances achieved in VLPMs for producing joint representations of vision and language. As preliminaries, we briefly describe the general task definition and the generic architecture of VLPMs. We first discuss the language and vision data encoding methods and then present the mainstream VLPM structure as the core content. We further summarise several essential pretraining and fine-tuning strategies. Finally, we highlight three future directions to provide insightful guidance for both CV and NLP researchers.
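A minimal sketch of the single-stream VLPM pattern described above, in which visual features and token embeddings are projected into a shared space and processed by one transformer encoder; all module names, dimensions, and the toy inputs are illustrative assumptions, not any particular paper's architecture.

```python
# Minimal single-stream vision-language encoder sketch (illustrative only).
import torch
import torch.nn as nn

class TinySingleStreamVLPM(nn.Module):
    def __init__(self, vocab_size=30522, visual_dim=2048, hidden=768, layers=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)   # language input
        self.visual_proj = nn.Linear(visual_dim, hidden)    # vision input
        self.type_emb = nn.Embedding(2, hidden)              # 0 = text, 1 = image
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, token_ids, region_feats):
        txt = self.token_emb(token_ids) + self.type_emb(torch.zeros_like(token_ids))
        img = self.visual_proj(region_feats)
        img_type = torch.ones(img.shape[:2], dtype=torch.long, device=img.device)
        img = img + self.type_emb(img_type)
        # Concatenate the two modalities and encode them jointly.
        return self.encoder(torch.cat([txt, img], dim=1))

model = TinySingleStreamVLPM()
tokens = torch.randint(0, 30522, (2, 16))   # batch of token ids
regions = torch.randn(2, 36, 2048)          # e.g. detector region features
joint = model(tokens, regions)              # shape: (2, 16 + 36, 768)
```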
The paper presents a methodology for fine-tuning the RuGPT3-XL (Generative Pretrained Transformer 3 for Russian) language model for the text span normalization task. The solution was submitted to a competition with two tracks: normalization of named entities (Named entities) and normalization of a wider class of text spans covering different parts of speech (Generic spans). The best solution achieved 0.9645 accuracy on the Generic spans track and 0.9575 on the Named entities track. The presented solutions are publicly available at https://github.com/RussianNLP/RuNormAS-solution
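A minimal sketch of how this kind of span normalization can be framed as conditional generation with a GPT-style Russian language model; the prompt format, separator, example text, and checkpoint name are illustrative assumptions, not the authors' exact setup.

```python
# Span normalization as conditional generation (illustrative sketch).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sberbank-ai/rugpt3large_based_on_gpt2"  # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Training pairs could be serialized as "<raw span> => <normalized span>" and
# the model fine-tuned with the standard language-modeling loss. At inference,
# generation is prompted with the raw span followed by the separator.
prompt = "Московским государственным университетом =>"  # made-up example
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=16, num_beams=3,
                     pad_token_id=tokenizer.eos_token_id)
generated = out[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))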
The paper presents a methodology for news clustering and news headline generation based on a zero-shot approach and minimal tuning of the RuGPT-3 architecture (Generative Pretrained Transformer 3 for Russian). The solution was submitted to a competition on news clustering, headline selection, and headline generation. The following approaches are described: 1) zero-shot unsupervised classification based on pairwise news perplexity, which requires no training or model fine-tuning and yields an F1 score of 0.7; 2) fine-tuning for news headline generation, with a best result of 0.292 ROUGE and 0.596 BLEU.
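A minimal sketch of how pairwise-perplexity scoring could be implemented with a causal language model, assuming that a low perplexity of two concatenated news texts signals that they cover the same event; the checkpoint name and threshold are illustrative assumptions, and the threshold would be tuned on held-out data in practice.

```python
# Zero-shot pairwise-perplexity scoring with a causal LM (illustrative sketch).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sberbank-ai/rugpt3large_based_on_gpt2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the causal LM (no fine-tuning required)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return torch.exp(loss).item()

def same_event(news_a: str, news_b: str, threshold: float = 25.0) -> bool:
    # Score the concatenated pair: a low joint perplexity suggests the two
    # news texts describe the same event and belong to the same cluster.
    return perplexity(news_a + " " + news_b) < threshold
```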