2017
DOI: 10.1609/aaai.v31i1.10994
Incrementally Learning the Hierarchical Softmax Function for Neural Language Models

Abstract: Neural network language models (NNLMs) have attracted a lot of attention recently. In this paper, we present a training method that can incrementally train the hierarchical softmax function for NNLMs. We split the cost function to model the old and update corpora separately, and factorize the objective function for the hierarchical softmax. Then we provide a new stochastic gradient based method to update all the word vectors and parameters, by comparing the old tree generated based on the old corpus and the new tr…
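As a rough illustration of the decomposition the abstract describes (written in generic hierarchical-softmax notation, not the paper's exact formulation), the log-likelihood over the combined data splits into a term over the old corpus and a term over the update corpus, and each word probability factorizes along the word's root-to-leaf path in the tree:

```latex
% Sketch only: standard hierarchical-softmax notation, not taken from the paper.
\mathcal{L}(D_{\mathrm{old}} \cup D_{\mathrm{new}})
  = \sum_{(w,c) \in D_{\mathrm{old}}} \log p(w \mid c)
  + \sum_{(w,c) \in D_{\mathrm{new}}} \log p(w \mid c),
\qquad
p(w \mid c) = \prod_{j=1}^{L(w)-1}
  \sigma\!\left( \llbracket n(w,j{+}1) = \mathrm{child}(n(w,j)) \rrbracket \,
  \mathbf{v}_{n(w,j)}^{\top} \mathbf{h}_{c} \right)
```

Here n(w, j) is the j-th node on the path to word w, L(w) is the path length, and the bracket term is +1 or -1 depending on which child the path follows. The abstract's comparison of the old and new trees refers to reconciling these per-node parameters when the tree is rebuilt over the combined corpus.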

Cited by 55 publications (15 citation statements); references 15 publications.
“…Incremental learning is a learning process where the new data is continuously coming from the environment [1,30,31]. Most studies of incremental learning focus on supervised learning.…”
Section: Incremental Learning (mentioning)
confidence: 99%
“…As the pioneer work, Li et al [28] propose Learning without Forgetting (LwF) by using only the new-coming examples for the new task's training, while preserving the responses on the existing tasks to prevent catastrophic forgetting. Peng et al [34] present to train the hierarchical softmax function for deep language models for the new-coming tasks. FSLL [30] is proposed to perform on the few-shot setting by selecting very few parameters from the model.…”
Section: Life-long Learning (mentioning)
confidence: 99%
“…H-softmax technique substitutes the softmax layer with the hierarchical layer that considers words as leaves. This helps in decomposing the probability of one word into sequence of probability calculations which eliminates the need of expensive normalization of words [36,37]. It thus increases the speed of word prediction.…”
Section: Analysis Model (mentioning)
confidence: 99%
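To make the path-based decomposition in the excerpt above concrete, here is a minimal, hypothetical sketch (not code from the cited paper, and the function and variable names are illustrative): the probability of one word is a product of per-node sigmoid decisions along its root-to-leaf path, so no normalization over the whole vocabulary is required.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hsoftmax_word_prob(hidden, path_nodes, path_codes):
    """Probability of one word under hierarchical softmax.

    hidden     : context vector h (shape: [dim])
    path_nodes : inner-node vectors on the root-to-leaf path (each shape: [dim])
    path_codes : 0/1 branch decisions along that path (1 = take the "left" child)

    The probability is a product of binary decisions, one per inner node,
    instead of a softmax normalized over the full vocabulary.
    """
    prob = 1.0
    for node_vec, code in zip(path_nodes, path_codes):
        p_left = sigmoid(np.dot(node_vec, hidden))
        prob *= p_left if code == 1 else (1.0 - p_left)
    return prob

# Toy usage: a path with 3 inner nodes and 5-dimensional vectors.
rng = np.random.default_rng(0)
h = rng.normal(size=5)
nodes = [rng.normal(size=5) for _ in range(3)]
codes = [1, 0, 1]
print(hsoftmax_word_prob(h, nodes, codes))
```

For a balanced tree the path length is about log2(|V|), which is why word prediction speeds up compared with a full softmax over the vocabulary.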