Eldar Kurtic scite author profile

Eldar Kurtic

5 Publications

27 Citation Statements Received

21 Citation Statements Given

Affiliations

University of Sarajevo

Publications

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Kurtic, Campos, Nguyen et al. 2022

Preprint

Pre-trained Transformer-based language models have become a key building block for natural language processing (NLP) tasks. While these models are extremely accurate, they can be too large and computationally intensive to run on standard deployments. A variety of compression methods, including distillation, quantization, structured and unstructured pruning are known to be applicable to decrease model size and increase inference speed. In this context, this paper's contributions are two-fold. We begin with an in-depth study of the accuracy-compression trade-off for unstructured weight pruning in the context of BERT models, and introduce Optimal BERT Surgeon (O-BERT-S), an efficient and accurate weight pruning method based on approximate second-order information, which we show to yield state-of-the-art results in terms of the compression/accuracy trade-off. Specifically, Optimal BERT Surgeon extends existing work on second-order pruning by allowing for pruning blocks of weights, and by being applicable at BERT scale. Second, we investigate the impact of this pruning method when compounding compression approaches for Transformer-based models, which allows us to combine state-of-the-art structured and unstructured pruning together with quantization, in order to obtain highly compressed, but accurate models. The resulting compression framework is powerful, yet general and efficient: we apply it to both the fine-tuning and pre-training stages of language tasks, to obtain state-of-the-art results on the accuracy-compression trade-off with relatively simple compression recipes. For example, we obtain 10x model size compression with < 1% relative drop in accuracy to the dense BERT-base, 10x end-to-end CPU-inference speedup with < 2% relative drop in accuracy, and 29x inference speedups with < 7.5% relative accuracy drop.

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

Frantar, Kurtic, Alistarh 2021

Preprint

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

Kurtic, Campos, Nguyen et al. 2022

Implementation of algorithm for detection of single phase fault with electric arc on dsPIC30F4013 microcontroller

Korjenic, Kurtic, Aksamovic 2018

SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks

Nikdan, Tommaso, Iofinova et al. 2023

Preprint

