2020
DOI: 10.1109/tnnls.2019.2947563
Major–Minor Long Short-Term Memory for Word-Level Language Model

Abstract: Language models play an important role in natural language processing (NLP) systems such as machine translation, speech recognition, token-embedding learning, natural language generation, and text classification. Recently, multi-layer Long Short-Term Memory (LSTM) models have been demonstrated to achieve promising performance on word-level language modeling. For each LSTM layer, a larger hidden size usually means more diverse semantic features, which enables the language model to perform better. However, we hav…
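The abstract's claim about LSTM layers and hidden size can be made concrete with a minimal sketch of a single textbook LSTM cell step in NumPy. This is a generic illustration, not the paper's Major–Minor variant; all function names, dimensions, and weights below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell (illustrative sketch).
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gate order in the stacked weights: input, forget, cell candidate, output."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b          # all four gate pre-activations at once
    i = sigmoid(z[0:H])                 # input gate
    f = sigmoid(z[H:2*H])               # forget gate
    g = np.tanh(z[2*H:3*H])             # cell candidate
    o = sigmoid(z[3*H:4*H])             # output gate
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden state
    return h, c

# Tiny usage example with random weights; H is the "hidden size" the
# abstract refers to -- a larger H gives a wider feature vector h.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):   # a sequence of 5 input vectors
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

Because `h = o * tanh(c)` with `o` in (0, 1), every component of the hidden state stays strictly inside (-1, 1).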

Cited by 16 publications (10 citation statements)
References 22 publications
“…In this group of experiments, we use a classic recurrent network (i.e., LSTM) and an open dataset (i.e., the Penn Treebank). As in the previous works [22], [23], [32], [33], we use perplexity to measure the performance of all the comparison algorithms.…”
Section: Image Classification
confidence: 99%
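Several of the citing works quoted on this page evaluate language models by perplexity on the Penn Treebank. A short sketch of how perplexity is computed from per-token log-probabilities may help; the function name and the uniform-model example are illustrative assumptions, not from the paper.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token.
    token_log_probs: natural-log probabilities the model assigned to each
    observed token in the evaluation set. Lower perplexity is better."""
    n = len(token_log_probs)
    avg_nll = -sum(token_log_probs) / n
    return math.exp(avg_nll)

# Sanity check: a uniform model over a vocabulary of size V assigns each
# token probability 1/V, so its perplexity equals V.
V = 10000
logp = [math.log(1.0 / V)] * 50
print(perplexity(logp))  # ~10000.0 (up to floating-point rounding)
```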
See 1 more Smart Citation
“…In this group of experiments, we use a classic recurrent network (i.e., LSTM) and an open dataset (i.e., Penn Treebank). Same as the previous works [22], [23], [32], [33], we take the perplexity to measure the performance of all the comparison algorithms.…”
Section: ) Image Classificationmentioning
confidence: 99%
“…Since $\beta_{1t} \le \beta_1$ and from Equation (32), we can further attain the following bound for $E_2$:
$$E_2 \le \frac{\alpha^2 (1-\beta_1) \sum_{i=1}^{n} \sum_{j=1}^{T} \sum_{k=1}^{T-j} \beta_1^{(T-k+1)} g_{j,i}^2}{T \sum_{j=1}^{T} (1-\beta_2^{j}) \sum_{k=1}^{T-j} \beta_2^{(T-k+1)} g_{j,i}^2 + \delta} \le \alpha^2 (1-\beta_1) + \zeta\delta,$$
and with $\omega_0 = \zeta\delta$, we obtain $\Psi = \frac{\omega_j - \omega_{j-1}}{\omega_j}$. (In addition, for any $a \ge b > 0$, the inequality $1 + x \le e^x$ implies that…”
confidence: 94%
“…It is easy to find that the Model-SLD is designed based on Model 2 with the addition of a One-Dimensional Convolutional Network [30] and LSTM (Long Short-Term Memory). [31], [32]…”
Section: The Application of Vibration Prediction Network
confidence: 99%
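For readers unfamiliar with the convolution-plus-LSTM combination mentioned in the statement above, a valid-mode one-dimensional convolution can be sketched in a few lines of NumPy. This is a generic illustration; the cited Model-SLD's actual layer configuration is not specified here.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode one-dimensional convolution (cross-correlation, as in most
    deep-learning libraries): slide the kernel over the sequence and take
    dot products, so the output has len(x) - len(kernel) + 1 entries."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.5, 0.5])  # simple moving-average filter
print(conv1d(x, kernel))  # [1.5 2.5 3.5 4.5]
```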
“…In this group of experiments, we use a classic recurrent network (i.e., LSTM) and an open dataset (i.e., the Penn Treebank). As in the previous works [16], [17], [22], [23], we use perplexity to measure the performance of all the comparison algorithms. Note that lower perplexity is better.…”
Section: Language Modeling
confidence: 99%
“…In addition, for term $E_1$ of Equation (21), and from Equation (6), we can obtain (22), where $i \in \{1, \ldots, T\}$.…”
Section: For Decision Point $x_t$ Generated by the Proposed Algorith…
confidence: 99%