2018
DOI: 10.1007/978-3-319-99501-4_24

Densely Connected Bidirectional LSTM with Applications to Sentence Classification

Abstract: Deep neural networks have recently been shown to achieve highly competitive performance in many computer vision tasks due to their abilities of exploring in a much larger hypothesis space. However, since most deep architectures like stacked RNNs tend to suffer from the vanishing-gradient and overfitting problems, their effects are still understudied in many NLP tasks. Inspired by this, we propose a novel multi-layer RNN model called densely connected bidirectional long short-term memory (DC-Bi-LSTM) in this pa…

Cited by 57 publications (30 citation statements)
References 20 publications
“…In WSJ, 81 h long SI-284 set is used for training, "dev93" set is used for validation, and "eval92" set is used for evaluation. As we failed to reproduce the powerful baseline model used in [16,17], the baseline model proposed in [16,17] is modified by adding dense connections [42][43][44] between layers of the encoder and using subword output unit, in order to get a new baseline model in our work with comparable performance. Dense connection is adopted because it facilitates the training of deep models.…”
Section: Dataset and Model Setup
confidence: 99%
“…In our proposed framework, we utilize the densely connected Bi-LSTM (Ding et al, 2018) (DC-Bi-LSTM) model. A DC-Bi-LSTM model consists of multiple Bi-LSTM layers, where the representation of each layer is estimated by concatenating its hidden states and all the preceding layers' hidden states.…”
Section: Proposed Stance Detection Framework
confidence: 99%
“…However, the majority of the existing related approaches attempts to utilize the traditional neural network models to detect the stance of a tweet, whereas recently introduced deep learning models such as densely connected Bi-LSTM [6] and nested LSTMs [7] achieved significant improvements to address the vanishing-gradient and overfitting problems as well as dealing with long-term dependencies effectively. Therefore, to bridge this research gap, in this paper, we propose a neural network model that adopts these LSTM variants with attention mechanism and multikernel convolution in a unified architecture.…”
Section: Related Work
confidence: 99%
“…In our proposed architecture, we utilize the densely connected Bi-LSTM [6] (DC-Bi-LSTM) model. A DC-Bi-LSTM model consists of multiple Bi-LSTM layers, where the representation of each layer is estimated by concatenating its hidden states and all the preceding layers' hidden states.…”
Section: Densely Connected Bi-LSTM
confidence: 99%
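The dense connectivity these statements describe — each Bi-LSTM layer reading the concatenation of the original input and the hidden states of every preceding layer — can be sketched as follows. This is a minimal NumPy illustration with random, untrained weights, not the authors' implementation; all class and variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell with randomly initialized weights."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        scale = 1.0 / np.sqrt(hidden_size)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.uniform(-scale, scale, (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_size
        i, f = sigmoid(z[0:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:4 * H])
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

class BiLSTM:
    """Runs one forward and one backward LSTM over a sequence and
    concatenates their hidden states at every time step."""
    def __init__(self, input_size, hidden_size, rng):
        self.fwd = LSTMCell(input_size, hidden_size, rng)
        self.bwd = LSTMCell(input_size, hidden_size, rng)
        self.hidden_size = hidden_size

    def __call__(self, xs):  # xs: (T, input_size)
        T, H = len(xs), self.hidden_size
        out = np.zeros((T, 2 * H))
        h, c = np.zeros(H), np.zeros(H)
        for t in range(T):                 # left-to-right pass
            h, c = self.fwd.step(xs[t], h, c)
            out[t, :H] = h
        h, c = np.zeros(H), np.zeros(H)
        for t in reversed(range(T)):       # right-to-left pass
            h, c = self.bwd.step(xs[t], h, c)
            out[t, H:] = h
        return out

class DCBiLSTM:
    """Densely connected stack: layer k reads the concatenation of the
    original input and the hidden states of ALL preceding layers."""
    def __init__(self, input_size, hidden_size, num_layers, rng):
        self.layers = []
        in_size = input_size
        for _ in range(num_layers):
            self.layers.append(BiLSTM(in_size, hidden_size, rng))
            in_size += 2 * hidden_size  # each layer adds 2H features to the next input

    def __call__(self, xs):
        feats = xs
        for layer in self.layers:
            h = layer(feats)
            feats = np.concatenate([feats, h], axis=1)  # dense connection
        return feats

rng = np.random.default_rng(0)
model = DCBiLSTM(input_size=8, hidden_size=4, num_layers=3, rng=rng)
seq = rng.standard_normal((5, 8))  # 5 time steps, 8 input features
out = model(seq)
print(out.shape)  # (5, 8 + 3 * 2 * 4) = (5, 32)
```

Note the key design choice: the feature dimension grows linearly with depth (by 2H per layer), so gradients reach early layers through short concatenation paths rather than only through the deep recurrent stack — the property the cited works credit for easing the training of deep models.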