Simple Attention-Based Representation Learning for Ranking Short Social Media Posts

Shi, Peng; Rao, Jinfeng; Lin, Jimmy

doi:10.18653/v1/n19-1229

Cited by 8 publications

(4 citation statements)

References 27 publications


“…Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks [9]. Evidence of NLP community moving towards attention-based models can be found by more attention-based neural networks developed by companies like Amazon [8], Facebook [16], and Salesforce [2]. The novel approach of Transformer is the first model to eliminate recurrence completely with self-attention to handle the dependencies between input and output.…”

Section: Related Work (mentioning)

confidence: 99%

Ftrans

Pandey

Fang

et al. 2020

Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design

109


In natural language processing (NLP), the "Transformer" architecture was proposed as the first transduction model relying entirely on self-attention mechanisms without using sequence-aligned recurrent neural networks (RNNs) or convolution, and it achieved significant improvements for sequence-to-sequence tasks. The intensive computation and storage that these pre-trained language representations require have impeded their deployment on computation- and memory-constrained devices. The field-programmable gate array (FPGA) is widely used to accelerate deep learning algorithms for its high parallelism and low latency. However, the trained models are still too large to fit on an FPGA fabric. In this paper, we propose an efficient acceleration framework, Ftrans, for transformer-based large-scale language representations. Our framework includes an enhanced block-circulant matrix (BCM)-based weight representation to enable model compression on large-scale language representations at the algorithm level with little accuracy degradation, and an acceleration design at the architecture level. Experimental results show that our proposed framework significantly reduces the model size of NLP models by up to 16 times. Our FPGA design achieves 27.07× and 81× improvement in performance and energy efficiency compared to CPU, and up to 8.80× improvement in energy efficiency compared to GPU.
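The block-circulant weight representation mentioned in this abstract can be illustrated with a minimal sketch. It shows plain BCM storage only, not the "enhanced" variant or the FPGA design, whose details the abstract does not give. The function names and the block size `b` are hypothetical, chosen for illustration. Each b×b block is stored as a single length-b vector, so a block of b² weights shrinks to b values.

```python
# Sketch only: plain block-circulant storage, not the Ftrans implementation.

def circulant_matvec(c, x):
    """Multiply the circulant matrix whose first column is c by the vector x."""
    n = len(c)
    # Entry (i, j) of a circulant matrix is c[(i - j) mod n].
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def block_circulant_matvec(blocks, x, b):
    """blocks[p][q] is the length-b defining vector of block (p, q)."""
    out = []
    for row in blocks:
        acc = [0.0] * b
        for q, c in enumerate(row):
            part = circulant_matvec(c, x[q * b:(q + 1) * b])
            acc = [a + p for a, p in zip(acc, part)]
        out.extend(acc)
    return out


def dense_from_blocks(blocks, b):
    """Expand to the full dense matrix, used here only to check the compact form."""
    rows = []
    for row in blocks:
        for i in range(b):
            rows.append([c[(i - j) % b] for c in row for j in range(b)])
    return rows


# One block row, two block columns, block size 2: a 2x4 matrix stored as 4 numbers.
blocks = [[[1, 2], [3, 4]]]
x = [1, 0, 0, 1]
compact = block_circulant_matvec(blocks, x, 2)
dense = [sum(w * v for w, v in zip(r, x)) for r in dense_from_blocks(blocks, 2)]
```

The direct loop above costs O(b²) per block. Practical implementations typically compute each circulant product with FFTs instead, since a circulant product is a circular convolution.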


“…In contrast, MP-HCNN was explicitly designed with characteristics of tweets in mind: it significantly outperforms previous neural ranking models (see original paper for comparisons, not repeated here). We also copied results from Shi et al (2018), who reported even higher effectiveness than MP-HCNN.…”

Section: Searching Social Media Posts (mentioning)

confidence: 99%

Simple Applications of BERT for Ad Hoc Document Retrieval

Yang, Zhang, Lin 2019

Preprint

Self Cite


Following recent successes in applying BERT to question answering, we explore simple applications to ad hoc document retrieval. This required confronting the challenge posed by documents that are typically longer than the length of input BERT was designed to handle. We address this issue by applying inference on sentences individually, and then aggregating sentence scores to produce document scores. Experiments on TREC microblog and newswire test collections show that our approach is simple yet effective, as we report the highest average precision on these datasets by neural approaches that we are aware of.
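The sentence-level scoring described in this abstract can be sketched as follows. The abstract says only that sentence scores are aggregated into document scores. The top-k weighting, the interpolation weight `alpha`, and the function name below are illustrative assumptions, not the authors' published settings. Sentence scores are passed in as plain floats, standing in for the output of BERT inference.

```python
# Sketch only: one possible aggregation, with hypothetical weights.

def document_score(sentence_scores, retrieval_score, alpha=0.5,
                   weights=(1.0, 0.5, 0.25)):
    """Combine a first-stage retrieval score with the top sentence scores."""
    # Keep the highest-scoring sentences, one per weight.
    top = sorted(sentence_scores, reverse=True)[:len(weights)]
    evidence = sum(w * s for w, s in zip(weights, top))
    # Interpolate between the original document score and the sentence evidence.
    return alpha * retrieval_score + (1 - alpha) * evidence


score = document_score([0.2, 0.8, 0.4], retrieval_score=1.0,
                       alpha=0.5, weights=(1.0, 0.5))
```

Scoring sentences one at a time keeps each input within the length BERT accepts, and the weights let the best-matching sentence dominate the document score.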


“…The first two blocks of the table are copied from Rao et al (2019), who compared bag-of-words baselines (QL and RM3) to several popular neural ranking models as well as MP-HCNN, the model they introduced. The results of Rao et al (2019) were further improved in Shi et al (2018); in all cases, the neural models include interpolation with the original document scores. We see that Birch yields a large jump in effectiveness across all Microblog collections.…”

Section: TREC 2011-2014 Microblog Tracks (mentioning)

confidence: 99%

Applying BERT to Document Retrieval with Birch

Yilmaz, Wang, Yang et al. 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Self Cite


We present Birch, a system that applies BERT to document retrieval via integration with the open-source Anserini information retrieval toolkit to demonstrate end-to-end search over large document collections. Birch implements simple ranking models that achieve state-of-the-art effectiveness on standard TREC newswire and social media test collections. This demonstration focuses on technical challenges in the integration of NLP and IR capabilities, along with the design rationale behind our approach to tightly-coupled integration between Python (to support neural networks) and the Java Virtual Machine (to support document retrieval using the open-source Lucene search library). We demonstrate integration of Birch with an existing search interface as well as interactive notebooks that highlight its capabilities in an easy-to-understand manner.
