Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2001

Continuous Learning in a Hierarchical Multiscale Neural Network

Abstract: We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network, while longer time-scale dependencies are encoded in the dynamic of the lower-level network by having a meta-learner update the weights of the lower-level neural network in an online meta-learning fashion…

Cited by 28 publications (44 citation statements) · References 30 publications
“…We fine-tuned BERT on our training dataset, which consisted of sentences (concepts and feature combinations) and corresponding true and false labels. We did this using the Hugging Face BertForSequenceClassification transformers package for PyTorch (Wolf et al., 2019). The trained BERT model output a 2-dimensional vector corresponding to activation in favor of true and activation in favor of false.…”
Section: Model Specification (mentioning)
confidence: 99%
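As a concrete illustration of the fine-tuning setup described in this statement, here is a minimal sketch using the transformers BertForSequenceClassification API with PyTorch; the example sentences, labels, and hyperparameters are placeholders, not taken from the cited study.

```python
# Minimal sketch of fine-tuning BERT for binary true/false classification.
# Example data and hyperparameters are illustrative only.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

sentences = ["a robin has wings", "a robin has gills"]   # hypothetical concept-feature sentences
labels = torch.tensor([1, 0])                            # 1 = true, 0 = false

enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels), batch_size=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for input_ids, attention_mask, y in loader:
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
    out.loss.backward()          # cross-entropy over the 2-dimensional output
    optimizer.step()
    optimizer.zero_grad()

# At inference time, out.logits is the 2-dimensional vector
# (activation in favor of false vs. in favor of true).
```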
“…For the BERT-based classification model, we use the Simple-Representations PyPI library, which enables its users to extract text representations from the pre-trained models and feed them directly as features to a Keras-built neural network. We extract representations from the BERT model available on the huggingface.co website (Wolf et al., 2019). Specifically, we use the “bert-base-multilingual-uncased” version for both the English and Spanish languages.…”
Section: Methods (mentioning)
confidence: 99%
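The statement names the Simple-Representations PyPI package, whose exact API is not shown here, so the sketch below reproduces the same idea directly with the transformers and Keras APIs, using the cited "bert-base-multilingual-uncased" checkpoint. The pooling choice, layer sizes, and inputs are assumptions for illustration.

```python
# Sketch: extract fixed-length BERT representations and feed them to a Keras classifier.
import numpy as np
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
encoder = TFAutoModel.from_pretrained("bert-base-multilingual-uncased")

def embed(texts):
    # Use the [CLS] token's last hidden state as a fixed-length text representation.
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")
    return encoder(**enc).last_hidden_state[:, 0, :].numpy()

X = embed(["an example sentence", "una frase de ejemplo"])   # hypothetical inputs
y = np.array([1, 0])

# Feed the frozen representations as features to a small Keras network.
clf = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
clf.fit(X, y, epochs=3, verbose=0)
```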
“…It is therefore challenging to draw general conclusions on recurrent networks in CL from this kind of experiment. Examples of applications are online learning of language models where new words are added incrementally [71,113,63], continual learning in neural machine translation on multiple languages [102], and sentiment analysis on multiple domains [77].…”
Section: Survey Of Continual Learning In Recurrent Models (mentioning)
confidence: 99%
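One recurring ingredient of the online language-modeling setting mentioned above is growing the vocabulary as new words arrive. The sketch below is a hypothetical PyTorch illustration of that single step, not an implementation from any of the cited works.

```python
# Illustrative sketch: grow an embedding table when new words appear in the stream,
# preserving the rows learned so far.
import torch
import torch.nn as nn

class GrowableEmbedding(nn.Module):
    def __init__(self, vocab_size, dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def add_words(self, n_new):
        old = self.emb
        new = nn.Embedding(old.num_embeddings + n_new, old.embedding_dim)
        with torch.no_grad():
            new.weight[: old.num_embeddings] = old.weight   # keep previously learned vectors
        self.emb = new

    def forward(self, idx):
        return self.emb(idx)

vocab = {"<unk>": 0, "the": 1}
layer = GrowableEmbedding(len(vocab), dim=16)
vocab["meta-learner"] = len(vocab)      # a new word arrives online
layer.add_words(1)
print(layer(torch.tensor([vocab["meta-learner"]])).shape)   # torch.Size([1, 16])
```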
“…Table 3: Datasets used in continual learning for sequential data processing.

Dataset | Application | Scenario
Stroke MNIST [99,33] | stroke classification | SIT+(NI/NC)
Quick, Draw! † | stroke classification | SIT+NC
MNIST-like [26] [25] † | object classification | SIT+(NI/NC)
CORe50 [88] | object recognition | SIT+(NI/NC)
MNLI [10] | domain adaptation | SIT+NI
MDSD [77] | sentiment analysis | SIT+NI
WMT17 [14] | NMT | MT+NC
OpenSubtitles18 [73] | NMT | MT+NC
WIPO COPPA-V2 [60] [102] | NMT | MT+NC
CALM [63] | language modeling | Online
WikiText-2 [113] | language modeling | SIT+NI/NC
Audioset [26,33] | sound classification | SIT+NC
LibriSpeech, Switchboard [114] | speech recognition | (SIT/MT)+NC
Synthetic Speech Commands † | sound classification | SIT+NC
Acrobot [62] | reinforcement learning | MT+NI

The scenario column indicates in which scenario the dataset has been used (or could be used when the related paper does not specify this information).…”
Section: Dataset Application Scenario (mentioning)
confidence: 99%
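For readers unfamiliar with the scenario codes in the table above, SIT/MT denote single-incremental-task vs. multi-task settings, and NI/NC denote new instances vs. new classes. The sketch below shows one hypothetical way to build an SIT+NC stream by splitting a dataset into class-incremental experiences; the use of MNIST and the group size are assumptions for illustration only.

```python
# Hypothetical sketch of an SIT+NC (new-classes) continual learning stream:
# each experience introduces a group of classes not seen before.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())

def new_classes_split(dataset, classes_per_experience=2):
    """Yield Subsets, each containing only the next group of unseen classes."""
    targets = dataset.targets.tolist()
    all_classes = sorted(set(targets))
    for i in range(0, len(all_classes), classes_per_experience):
        group = set(all_classes[i : i + classes_per_experience])
        idx = [j for j, t in enumerate(targets) if t in group]
        yield Subset(dataset, idx)

for step, experience in enumerate(new_classes_split(mnist)):
    print(f"experience {step}: {len(experience)} examples")   # train sequentially here
```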