Julien Chaumond scite author profile

Transformers: State-of-the-Art Natural Language Processing

Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at https://github.com/huggingface/transformers.


TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents

Wolf, Sanh, Chaumond et al. 2019. Preprint.



We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo which is a combination of a Transfer learning based training scheme and a high-capacity Transformer model. Fine-tuning is performed by using a multi-task objective which combines several unsupervised prediction tasks. The resulting fine-tuned model shows strong improvements over the current state-of-the-art end-to-end conversational models like memory augmented seq2seq and information-retrieval models. On the privately held PERSONA-CHAT dataset of the Conversational Intelligence Challenge 2, this approach obtains a new state-of-the-art, respectively pushing the perplexity, Hits@1 and F1 metrics to 16.28 (45% absolute improvement), 80.7 (46% absolute improvement) and 19.5 (20% absolute improvement).


Datasets: A Community Library for Natural Language Processing

Lhoest, Moral, Jernite et al. 2021



Continuous Learning in a Hierarchical Multiscale Neural Network

Wolf, Chaumond, Delangue 2018


We reformulate the problem of encoding a multi-scale representation of a sequence in a language model by casting it in a continuous learning framework. We propose a hierarchical multi-scale language model in which short time-scale dependencies are encoded in the hidden state of a lower-level recurrent neural network while longer time-scale dependencies are encoded in the dynamic of the lower-level network by having a meta-learner update the weights of the lower-level neural network in an online meta-learning fashion. We use elastic weights consolidation as a higher-level to prevent catastrophic forgetting in our continuous learning framework.


Datasets: A Community Library for Natural Language Processing

Lhoest, Moral, Jernite et al. 2021. Preprint.



