2018
DOI: 10.1007/978-3-319-99501-4_24

Densely Connected Bidirectional LSTM with Applications to Sentence Classification

Abstract: Deep neural networks have recently been shown to achieve highly competitive performance in many computer vision tasks due to their abilities of exploring in a much larger hypothesis space. However, since most deep architectures like stacked RNNs tend to suffer from the vanishing-gradient and overfitting problems, their effects are still understudied in many NLP tasks. Inspired by this, we propose a novel multi-layer RNN model called densely connected bidirectional long short-term memory (DC-Bi-LSTM) in this pa…

Cited by 57 publications (30 citation statements)
References 20 publications
“…In WSJ, 81 h long SI-284 set is used for training, "dev93" set is used for validation, and "eval92" set is used for evaluation. As we failed to reproduce the powerful baseline model used in [16,17], the baseline model proposed in [16,17] is modified by adding dense connections [42][43][44] between layers of the encoder and using subword output unit, in order to get a new baseline model in our work with comparable performance. Dense connection is adopted because it facilitates the training of deep models.…”
Section: Dataset and Model Setup
confidence: 99%
“…In our proposed framework, we utilize the densely connected Bi-LSTM (Ding et al, 2018) (DC-Bi-LSTM) model. A DC-Bi-LSTM model consists of multiple Bi-LSTM layers, where the representation of each layer is estimated by concatenating its hidden states and all the preceding layers' hidden states.…”
Section: Proposed Stance Detection Framework
confidence: 99%
“…However, the majority of the existing related approaches attempts to utilize the traditional neural network models to detect the stance of a tweet, whereas recently introduced deep learning models such as densely connected Bi-LSTM [6] and nested LSTMs [7] achieved significant improvements to address the vanishing-gradient and overfitting problems as well as dealing with long-term dependencies effectively. Therefore, to bridge this research gap, in this paper, we propose a neural network model that adopts these LSTM variants with attention mechanism and multikernel convolution in a unified architecture.…”
Section: Related Work
confidence: 99%
“…In our proposed architecture, we utilize the densely connected Bi-LSTM [6] (DC-Bi-LSTM) model. A DC-Bi-LSTM model consists of multiple Bi-LSTM layers, where the representation of each layer is estimated by concatenating its hidden states and all the preceding layers' hidden states.…”
Section: Densely Connected Bi-LSTM
confidence: 99%
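The dense connectivity these statements describe — each Bi-LSTM layer reading the concatenation of the original input and the hidden states of every preceding layer — can be sketched as follows. This is a minimal NumPy illustration with random, untrained weights, not the authors' implementation; all class and variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell with randomly initialized weights."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        scale = 1.0 / np.sqrt(hidden_size)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.uniform(-scale, scale, (4 * hidden_size, input_size + hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_size
        i, f = sigmoid(z[0:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:4 * H])
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

class BiLSTM:
    """Runs one forward and one backward LSTM over a sequence and
    concatenates their hidden states at every time step."""
    def __init__(self, input_size, hidden_size, rng):
        self.fwd = LSTMCell(input_size, hidden_size, rng)
        self.bwd = LSTMCell(input_size, hidden_size, rng)
        self.hidden_size = hidden_size

    def __call__(self, xs):  # xs: (T, input_size)
        T, H = len(xs), self.hidden_size
        out = np.zeros((T, 2 * H))
        h, c = np.zeros(H), np.zeros(H)
        for t in range(T):                 # left-to-right pass
            h, c = self.fwd.step(xs[t], h, c)
            out[t, :H] = h
        h, c = np.zeros(H), np.zeros(H)
        for t in reversed(range(T)):       # right-to-left pass
            h, c = self.bwd.step(xs[t], h, c)
            out[t, H:] = h
        return out

class DCBiLSTM:
    """Densely connected stack: layer k reads the concatenation of the
    original input and the hidden states of ALL preceding layers."""
    def __init__(self, input_size, hidden_size, num_layers, rng):
        self.layers = []
        in_size = input_size
        for _ in range(num_layers):
            self.layers.append(BiLSTM(in_size, hidden_size, rng))
            in_size += 2 * hidden_size  # each layer adds 2H features to the next input

    def __call__(self, xs):
        feats = xs
        for layer in self.layers:
            h = layer(feats)
            feats = np.concatenate([feats, h], axis=1)  # dense connection
        return feats

rng = np.random.default_rng(0)
model = DCBiLSTM(input_size=8, hidden_size=4, num_layers=3, rng=rng)
seq = rng.standard_normal((5, 8))  # 5 time steps, 8 input features
out = model(seq)
print(out.shape)  # (5, 8 + 3 * 2 * 4) = (5, 32)
```

Note the key design choice: the feature dimension grows linearly with depth (by 2H per layer), so gradients reach early layers through short concatenation paths rather than only through the deep recurrent stack — the property the cited works credit for easing the training of deep models.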