Large Models are Parsimonious Learners: Activation Sparsity in Trained Transformers

Li, Zonglin; You, Chun‐Xiang; Bhojanapalli, Srinadh; Li, Daliang; Rawat, Ankit Singh; Reddi, Sashank J.; Yao, Ke; Chern, Felix; Yu, Felix X.; Guo, Ruiqi; Kumar, Sanjiv

doi:10.48550/arxiv.2210.06313

2022


Preprint


Abstract: This paper studies the curious phenomenon that machine learning models with Transformer architectures have sparse activation maps. By activation map we refer to the intermediate output of the multi-layer perceptrons (MLPs) after a ReLU activation function, and by "sparse" we mean that on average very few entries (e.g., 3.0% for T5-Base and 6.3% for ViT-B16) are nonzero for each input to the MLP. Moreover, larger Transformers with more layers and wider MLP hidden dimensions are sparser as measured by the p…
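To make the abstract's definition concrete, the sketch below computes the quantity it describes: the fraction of nonzero entries in the ReLU output of an MLP hidden layer, averaged over inputs. It is a minimal pure-Python illustration under stated assumptions, not the authors' measurement code; the helper names (`mlp_hidden_activations`, `activation_density`, `average_density`) and the tiny random-weight layer are hypothetical. Because the weights are random rather than trained, it will not reproduce the 3.0% or 6.3% figures.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def mlp_hidden_activations(x, weights, biases):
    """Hypothetical MLP hidden layer: ReLU(W x + b), one output per row of W."""
    return [relu(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def activation_density(activations):
    """Fraction of activation entries that are nonzero for one input."""
    nonzero = sum(1 for a in activations if a != 0.0)
    return nonzero / len(activations)


def average_density(inputs, weights, biases):
    """Average the per-input nonzero fraction over a batch of inputs (e.g. tokens)."""
    return sum(activation_density(mlp_hidden_activations(x, weights, biases))
               for x in inputs) / len(inputs)


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_ff = 16, 64  # hypothetical sizes, far smaller than T5-Base
    # Random (untrained) weights: an assumption for illustration only.
    W = [[rng.gauss(0.0, 1.0 / d_model ** 0.5) for _ in range(d_model)]
         for _ in range(d_ff)]
    b = [0.0] * d_ff
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(32)]
    print(f"average nonzero fraction: {average_density(tokens, W, b):.3f}")
```

With zero biases and weights drawn symmetrically around zero, each pre-activation is equally likely to be positive or negative, so the printed fraction should land near one half; the abstract's observation is that trained Transformers end up far below that.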


Cited by 1 publication

References 35 publications


A hybrid Transformers-based convolutional neural network model for keratoconus detection in Scheimpflug-based dynamic corneal deformation videos

Abdelmotaal, Hazarbasanov, Salouti, et al. 2024

Preprint


Objective: To assess the performance of a hybrid Transformer-based convolutional neural network (CNN) model for automated detection of keratoconus in stand-alone Scheimpflug-based dynamic corneal deformation videos (DCDVs).

Design: Retrospective cohort study.

Methods: We used transfer learning for feature extraction from DCDVs. These feature maps were augmented by self-attention to model long-range dependencies before classification to directly identify keratoconus. Model performance was evaluated with objective accuracy metrics based on DCDVs from two independent cohorts with 275 and 546 subjects.

Main outcome measures: Area under the receiver operating characteristic curve (AUC), accuracy, specificity, sensitivity, and F1 score.

Results: The sensitivity and specificity of the model in detecting keratoconus were 93% and 84%, respectively. The AUC of the keratoconus probability score on the external validation database was 0.97.

Conclusions: The hybrid Transformer-based model was highly sensitive and specific in discriminating normal from keratoconic eyes using DCDVs, at levels that may prove useful in clinical practice.

Translational Relevance: The hybrid Transformer-based model can detect keratoconus directly from non-invasive corneal videos without requiring corneal topography or tomography, showing potential applications in corneal research and clinical practice.
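The outcome measures listed in this abstract follow standard confusion-matrix definitions, and AUC can be read as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The sketch below is a hypothetical illustration of those definitions, not the study's evaluation code; the function names are invented, and the counts in the usage example are made up so that the rates match the reported 93% and 84%. They are not the study's data.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification rates from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts: 100 keratoconic and 100 normal eyes (not the study's data).
    m = confusion_metrics(tp=93, fp=16, tn=84, fn=7)
    print({k: round(v, 3) for k, v in m.items()})
    # Hypothetical probability scores for three positive and three negative eyes.
    print(auc_from_scores([0.9, 0.8, 0.4], [0.1, 0.5, 0.3]))
```

The pairwise AUC loop is quadratic in the number of cases, which is fine for a few hundred subjects; a rank-based formula gives the same value more efficiently for larger sets.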
