2017
DOI: 10.1609/aaai.v31i1.10994
Incrementally Learning the Hierarchical Softmax Function for Neural Language Models

Abstract: Neural network language models (NNLMs) have attracted a lot of attention recently. In this paper, we present a training method that can incrementally train the hierarchical softmax function for NNLMs. We split the cost function to model the old and update corpora separately, and factorize the objective function for the hierarchical softmax. Then we provide a new stochastic gradient based method to update all the word vectors and parameters, by comparing the old tree generated based on the old corpus and the new tr…
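As a rough illustration of the decomposition the abstract describes (written in generic hierarchical-softmax notation, not the paper's exact formulation), the log-likelihood over the combined data splits into a term over the old corpus and a term over the update corpus, and each word probability factorizes along the word's root-to-leaf path in the tree:

```latex
% Sketch only: standard hierarchical-softmax notation, not taken from the paper.
\mathcal{L}(D_{\mathrm{old}} \cup D_{\mathrm{new}})
  = \sum_{(w,c) \in D_{\mathrm{old}}} \log p(w \mid c)
  + \sum_{(w,c) \in D_{\mathrm{new}}} \log p(w \mid c),
\qquad
p(w \mid c) = \prod_{j=1}^{L(w)-1}
  \sigma\!\left( \llbracket n(w,j{+}1) = \mathrm{child}(n(w,j)) \rrbracket \,
  \mathbf{v}_{n(w,j)}^{\top} \mathbf{h}_{c} \right)
```

Here n(w, j) is the j-th node on the path to word w, L(w) is the path length, and the bracket term is +1 or -1 depending on which child the path follows. The abstract's comparison of the old and new trees refers to reconciling these per-node parameters when the tree is rebuilt over the combined corpus.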

Cited by 55 publications (15 citation statements); references 15 publications.
“…Incremental learning is a learning process where the new data is continuously coming from the environment [1,30,31]. Most studies of incremental learning focus on supervised learning.…”
Section: Incremental Learning (mentioning)
confidence: 99%
“…As the pioneer work, Li et al [28] propose Learning without Forgetting (LwF) by using only the new-coming examples for the new task's training, while preserving the responses on the existing tasks to prevent catastrophic forgetting. Peng et al [34] present to train the hierarchical softmax function for deep language models for the new-coming tasks. FSLL [30] is proposed to perform on the few-shot setting by selecting very few parameters from the model.…”
Section: Life-long Learning (mentioning)
confidence: 99%
“…H-softmax technique substitutes the softmax layer with the hierarchical layer that considers words as leaves. This helps in decomposing the probability of one word into sequence of probability calculations which eliminates the need of expensive normalization of words [36,37]. It thus increases the speed of word prediction.…”
Section: Analysis Model (mentioning)
confidence: 99%
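To make the path-based decomposition in the excerpt above concrete, here is a minimal, hypothetical sketch (not code from the cited paper, and the function and variable names are illustrative): the probability of one word is a product of per-node sigmoid decisions along its root-to-leaf path, so no normalization over the whole vocabulary is required.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hsoftmax_word_prob(hidden, path_nodes, path_codes):
    """Probability of one word under hierarchical softmax.

    hidden     : context vector h (shape: [dim])
    path_nodes : inner-node vectors on the root-to-leaf path (each shape: [dim])
    path_codes : 0/1 branch decisions along that path (1 = take the "left" child)

    The probability is a product of binary decisions, one per inner node,
    instead of a softmax normalized over the full vocabulary.
    """
    prob = 1.0
    for node_vec, code in zip(path_nodes, path_codes):
        p_left = sigmoid(np.dot(node_vec, hidden))
        prob *= p_left if code == 1 else (1.0 - p_left)
    return prob

# Toy usage: a path with 3 inner nodes and 5-dimensional vectors.
rng = np.random.default_rng(0)
h = rng.normal(size=5)
nodes = [rng.normal(size=5) for _ in range(3)]
codes = [1, 0, 1]
print(hsoftmax_word_prob(h, nodes, codes))
```

For a balanced tree the path length is about log2(|V|), which is why word prediction speeds up compared with a full softmax over the vocabulary.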