2022
DOI: 10.3390/app12167968

Informative Language Encoding by Variational Autoencoders Using Transformer

Abstract: In natural language processing (NLP), the Transformer is widely used and has reached the state-of-the-art level in numerous NLP tasks such as language modeling, summarization, and classification. Moreover, a variational autoencoder (VAE) is an efficient generative model in representation learning, combining deep learning with statistical inference in encoded representations. However, the use of VAEs in natural language processing often brings forth practical difficulties such as posterior collapse, also known as …
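For readers unfamiliar with the term, posterior collapse can be stated directly in terms of the standard VAE objective. The following is a generic sketch in the usual VAE notation, not an equation taken from the paper itself:

    \mathcal{L}_{\mathrm{ELBO}}(x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)

Posterior collapse occurs when the approximate posterior q_\phi(z|x) degenerates to the prior p(z) for almost all inputs, so the KL term is driven to zero and the decoder learns to ignore the latent code z. This is the failure mode that makes VAEs difficult to train on text with strong autoregressive decoders.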

Cited by 5 publications (4 citation statements). References 24 publications (65 reference statements).
“…However, when the input time series is too long, the LSTM network also encounters problems such as long training times, slow parameter updates, and even vanishing gradients (Hochreiter, 1991; Dandwate et al., 2023). The LSTM then needs more time to train on a long sequence, and when backpropagation is used to update the model parameters, a longer sequence causes the gradient to shrink toward zero, so the parameters can no longer be updated; the key information from earlier time steps and from the front of the long sequence is lost, i.e., long-term dependency problems cannot be handled (Ok et al., 2022; Mercan et al., 2023). For large LSTM models, a reasonable input sequence length is between 100 and 500 (Xiao et al., 2019).…”
Section: The Proposed Modulation Classification Methods
confidence: 99%
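As a generic illustration of the long-term dependency problem described in the statement above (this derivation is standard and not taken from the cited works), the gradient that backpropagation sends from the loss at step T to an early hidden state h_k is a product of step-to-step Jacobians:

    \frac{\partial \mathcal{L}_T}{\partial h_k} = \frac{\partial \mathcal{L}_T}{\partial h_T} \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}},
    \qquad \left\| \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}} \right\| \le \gamma^{\,T-k} \quad \text{if } \left\| \frac{\partial h_t}{\partial h_{t-1}} \right\| \le \gamma .

For \gamma < 1 the bound decays exponentially in the gap T - k; for example, \gamma = 0.9 over a gap of 500 steps gives 0.9^{500} \approx 10^{-23}, so parameters that influence the loss only through early time steps receive essentially no update. This is consistent with the quoted recommendation of input lengths on the order of 100-500.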
“…Biesner et al. [44] integrate the token vectors of each word from a Transformer encoder using an RNN to obtain a single latent variable. On the other hand, Ok et al. [45] encode the token vector of the sentence representation into a latent variable through a simple linear layer.…”
Section: Transformer-based VAE
confidence: 99%
“…Biesner et al. [41] integrate the token vectors of each word from a Transformer encoder using an RNN to obtain a single latent variable. On the other hand, Ok et al. [42] encode the token vector of the sentence representation into a latent variable through a simple linear layer. Since sequential processing with RNNs undermines the advantage of parallel processing offered by the Transformer, our model is based on the approach of Ok et al. to construct the VAE model.…”
Section: Transformer-based VAE
confidence: 99%
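To make the contrast in the statements above concrete, here is a minimal PyTorch-style sketch of the second approach: a sentence-level Transformer token vector is projected to the parameters of a Gaussian latent by simple linear layers and then reparameterized. The class name, dimensions, and the use of two separate projections for mean and log-variance are illustrative assumptions, not the code of Ok et al. or Biesner et al.

    import torch
    import torch.nn as nn

    class SentenceLatentHead(nn.Module):
        """Hypothetical sketch: map one sentence-level Transformer token vector
        (e.g., a pooled or [CLS]-style representation) to a Gaussian latent."""
        def __init__(self, d_model=768, d_latent=32):
            super().__init__()
            self.to_mu = nn.Linear(d_model, d_latent)       # mean of q(z|x)
            self.to_logvar = nn.Linear(d_model, d_latent)   # log-variance of q(z|x)

        def forward(self, sentence_vec):                    # sentence_vec: (batch, d_model)
            mu = self.to_mu(sentence_vec)
            logvar = self.to_logvar(sentence_vec)
            std = torch.exp(0.5 * logvar)
            z = mu + std * torch.randn_like(std)             # reparameterization trick
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
            return z, kl                                     # z feeds the decoder; kl joins the loss

Because the projection acts on a single pooled vector rather than running an RNN over all token vectors, every sentence in a batch is handled by one matrix multiplication, preserving the Transformer's parallelism; that is the design consideration the quoted passage gives for preferring the approach of Ok et al. over the RNN aggregation of Biesner et al.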