Mohammad Saffar scite author profile

Efficient Content-Based Sparse Attention with Routing Transformers

Roy, Saffar, Vaswani, et al. (2020)

Preprint

Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from computation and memory requirements that grow quadratically with sequence length. Successful approaches to reducing this complexity have focused on attending to local sliding windows or to a small set of locations chosen independently of content. Our work proposes to learn dynamic sparse attention patterns that avoid allocating computation and memory to content unrelated to the query of interest. This work builds upon two lines of research: it combines the modeling flexibility of prior work on content-based sparse attention with the efficiency gains of approaches based on local, temporal sparse attention. Our model, the Routing Transformer, endows self-attention with a sparse routing module based on online k-means while reducing the overall complexity of attention from O(n^2 d) to O(n^1.5 d) for sequence length n and hidden dimension d. We show that our model outperforms comparable sparse attention models on language modeling on Wikitext-103 (15.8 vs. 18.3 perplexity) and on image generation on ImageNet-64 (3.43 vs. 3.44 bits/dim) while using fewer self-attention layers. Additionally, we set a new state of the art on the newly released PG-19 dataset, obtaining a test perplexity of 33.2 with a 22-layer Routing Transformer model trained on sequences of length 8192. We open-source the code for the Routing Transformer in TensorFlow.
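The routing idea in the abstract can be illustrated with a minimal pure-Python sketch. It assumes nearest-centroid assignment of unit-normalized queries and keys against one shared set of centroids, ordinary softmax attention restricted to each cluster (no causal mask), and an exponential-moving-average centroid update as the online k-means step. The names `routing_attention`, `update_centroids`, and the `decay` parameter are hypothetical and are not the API of the released TensorFlow code; the paper's implementation may differ, for example in how clusters are balanced or how causal masking is applied.

```python
import math
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    # Unit-normalize so that nearest-centroid search becomes a max dot product.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def assign(vectors: List[Vector], centroids: List[Vector]) -> List[int]:
    # Route each vector to the centroid with the highest dot product.
    return [max(range(len(centroids)), key=lambda c: dot(v, centroids[c]))
            for v in vectors]


def routing_attention(queries, keys, values, centroids):
    """Each query attends only to the keys routed to its own cluster (no causal mask)."""
    d = len(queries[0])
    q_unit = [normalize(q) for q in queries]
    k_unit = [normalize(k) for k in keys]
    q_cluster = assign(q_unit, centroids)
    k_cluster = assign(k_unit, centroids)

    # Group key indices by cluster so each query only scores its own bucket.
    buckets: Dict[int, List[int]] = defaultdict(list)
    for j, c in enumerate(k_cluster):
        buckets[c].append(j)

    value_dim = len(values[0])
    outputs = []
    for i, q in enumerate(q_unit):
        members = buckets.get(q_cluster[i], [])
        if not members:
            # No key shares this query's cluster: emit a zero vector.
            outputs.append([0.0] * value_dim)
            continue
        scores = [dot(q, k_unit[j]) / math.sqrt(d) for j in members]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out = [0.0] * value_dim
        for w, j in zip(weights, members):
            for t, v in enumerate(values[j]):
                out[t] += (w / total) * v
        outputs.append(out)
    return outputs, q_cluster, k_cluster


def update_centroids(centroids, vectors, clusters, decay=0.9):
    # Online k-means step: move each centroid toward the mean of the vectors
    # assigned to it with an exponential moving average, then re-normalize.
    updated = []
    for c, mu in enumerate(centroids):
        assigned = [v for v, k in zip(vectors, clusters) if k == c]
        if not assigned:
            updated.append(mu)
            continue
        mean = [sum(col) / len(assigned) for col in zip(*assigned)]
        updated.append(normalize([decay * a + (1 - decay) * b for a, b in zip(mu, mean)]))
    return updated


queries = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
keys = [[1.0, 0.1], [0.1, 1.0], [0.8, 0.0]]
values = [[1.0], [2.0], [3.0]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
out, q_cluster, k_cluster = routing_attention(queries, keys, values, centroids)
print(q_cluster, k_cluster)  # [0, 1, 0] [0, 1, 0]
```

The complexity bound in the abstract follows from the cluster sizes: with roughly sqrt(n) clusters of about sqrt(n) members each, every query scores about sqrt(n) keys at O(d) cost, and routing each vector against sqrt(n) centroids also costs O(sqrt(n) d), so the total is O(n · sqrt(n) · d) = O(n^1.5 d). If the clusters are badly unbalanced, this sketch degrades toward the dense O(n^2 d) cost.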

A Scale and Translation Invariant Approach for Early Classification of Spatio-Temporal Patterns Using Spiking Neural Networks

Rekabdar, Nicolescu, Saffar, et al. (2015)

Neural Process Lett


This paper addresses the problem of encoding and classifying spatio-temporal patterns, which are typical of human actions or gestures. The proposed method has the following main contributions: (i) it requires a very small number of training examples, (ii) it accepts variable-sized input patterns, (iii) it is invariant to scale and translation, and (iv) it enables early recognition from only partial information about the pattern. The underlying representation is a spiking neural network with axonal conductance delays. We designed a novel approach for mapping spatio-temporal patterns to spike trains, which are used to stimulate the network. The pattern features emerge in the network as a result of this stimulation in the form of polychronous neuronal groups, which are used for classification. The proposed method is validated on a set of gestures representing the digits 0 to 9, extracted from video of a human drawing the corresponding digits. The paper presents a comparison with several other standard pattern recognition approaches. The results show that the proposed approach significantly outperforms these methods, is invariant to scale and translation, and can recognize patterns from only partial information.


Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning

Babaeizadeh, Saffar, Hafner, et al. (2020)

Preprint


Model-based reinforcement learning (MBRL) methods have shown strong sample efficiency and performance across a variety of tasks, including when faced with high-dimensional visual observations. These methods learn to predict the environment dynamics and expected reward from interaction and use this predictive model to plan and perform the task. However, MBRL methods vary in their fundamental design choices, and there is no strong consensus in the literature on how these design decisions affect performance. In this paper, we study a number of design decisions for the predictive model in visual MBRL algorithms, focusing specifically on methods that use a predictive model for planning. We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance. A notable exception to this finding is that predicting future observations (i.e., images) leads to a significant improvement in task performance compared to predicting only rewards. We also find empirically that image prediction accuracy, somewhat surprisingly, correlates more strongly with downstream task performance than reward prediction accuracy does. We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks (which require exploration) perform on par with the best-performing models when trained on the same training data. At the same time, in the absence of exploration, models that fit the data better usually also perform better on the downstream task, but, surprisingly, these are often not the same models that perform best when learning and exploring from scratch. These findings suggest that performance and exploration place important and potentially contradictory requirements on the model.
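Planning with a learned dynamics-and-reward model, as described above, can be sketched with a random-shooting planner, one common generic approach. This is not the method of any particular algorithm studied in the paper: `plan_random_shooting`, its parameters, and `toy_model` (a hand-written stand-in for a learned predictive model) are hypothetical names chosen for illustration.

```python
import random
from typing import Callable, Tuple

# A model maps (state, action) -> (predicted next state, predicted reward).
Model = Callable[[float, float], Tuple[float, float]]


def plan_random_shooting(model: Model, state: float, horizon: int = 5,
                         num_candidates: int = 200, low: float = -1.0,
                         high: float = 1.0, seed: int = 0) -> float:
    """Return the first action of the sampled sequence with the highest predicted return."""
    rng = random.Random(seed)
    best_return = float("-inf")
    best_first_action = 0.0
    for _ in range(num_candidates):
        actions = [rng.uniform(low, high) for _ in range(horizon)]
        # Roll the candidate sequence out inside the model, not the real environment.
        s, total = state, 0.0
        for a in actions:
            s, r = model(s, a)
            total += r
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    # Model-predictive control: execute only the first action, then re-plan.
    return best_first_action


def toy_model(state: float, action: float) -> Tuple[float, float]:
    # Hypothetical stand-in for a learned model: 1-D dynamics, reward peaks at state 3.0.
    next_state = state + 0.5 * action
    return next_state, -(next_state - 3.0) ** 2
```

Calling `plan_random_shooting(toy_model, state=0.0)` returns a single action in [-1, 1]; in a model-predictive control loop the agent executes that action, observes the next real state, and plans again. Planners in this family often replace uniform sampling with the cross-entropy method, but the role of the predictive model is the same: the quality of the imagined rollouts determines which action is chosen.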


The Cell

Bagasra, McLean, Saffar (2023)


