Recent Transformer-based architectures, e.g., BERT, provide impressive results in many Natural Language Processing tasks. However, most of the adopted benchmarks consist of (sometimes hundreds of) thousands of examples. In many real scenarios, obtaining high-quality annotated data is expensive and time-consuming; in contrast, unlabeled examples characterizing the target task can, in general, be easily collected. One promising method to enable semi-supervised learning has been proposed in image processing, based on Semi-Supervised Generative Adversarial Networks. In this paper, we propose GAN-BERT, which extends the fine-tuning of BERT-like architectures with unlabeled data in a generative adversarial setting. Experimental results show that the requirement for annotated examples can be drastically reduced (to as few as 50-100 annotated examples) while still obtaining good performance in several sentence classification tasks.
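To make the adversarial setting concrete, the following is a minimal PyTorch sketch of the kind of SS-GAN head GAN-BERT builds on: a generator produces fake sentence representations from noise, while a discriminator classifies real (labeled and unlabeled) representations into the k task classes plus an extra "fake" class. The encoder is assumed to be computed upstream, and all module names, dimensions, and loss weightings are illustrative assumptions, not the authors' released code.

```python
# Minimal SS-GAN head sketch (illustrative, not the official GAN-BERT code).
# Assumes sentence embeddings (e.g., BERT [CLS] vectors) are computed upstream.
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, NOISE, NUM_CLASSES = 768, 100, 5   # k real classes + 1 "fake" class

class Generator(nn.Module):
    """Maps random noise to fake sentence representations."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NOISE, HIDDEN), nn.LeakyReLU(0.2),
                                 nn.Linear(HIDDEN, HIDDEN))
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Classifies representations into k task classes + 1 'fake' class."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.LeakyReLU(0.2))
        self.head = nn.Linear(HIDDEN, NUM_CLASSES + 1)
    def forward(self, rep):
        return self.head(self.body(rep))   # logits over k+1 classes

G, D = Generator(), Discriminator()
labeled_rep = torch.randn(8, HIDDEN)            # stand-in for BERT [CLS] vectors
labels = torch.randint(0, NUM_CLASSES, (8,))
unlabeled_rep = torch.randn(32, HIDDEN)
fake_rep = G(torch.randn(32, NOISE))

# Supervised loss on the few labeled examples (restricted to the k real classes).
sup_loss = F.cross_entropy(D(labeled_rep)[:, :NUM_CLASSES], labels)

# Unsupervised losses: real (labeled + unlabeled) examples should not be assigned
# to the fake class; generated examples should.
fake_idx = NUM_CLASSES
p_fake_on_real = F.softmax(D(torch.cat([labeled_rep, unlabeled_rep])), dim=-1)[:, fake_idx]
p_fake_on_fake = F.softmax(D(fake_rep.detach()), dim=-1)[:, fake_idx]
d_loss = (sup_loss
          - torch.log(1 - p_fake_on_real + 1e-8).mean()
          - torch.log(p_fake_on_fake + 1e-8).mean())

# The generator tries to make the discriminator treat its samples as real.
g_loss = -torch.log(1 - F.softmax(D(fake_rep), dim=-1)[:, fake_idx] + 1e-8).mean()
print(d_loss.item(), g_loss.item())
```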
Kernel-based learning algorithms have been shown to achieve state-of-the-art results in many Natural Language Processing (NLP) tasks. We present KELP, a Java framework that supports the implementation of both kernel-based learning algorithms and kernel functions over generic data representations, e.g., vectorial data or discrete structures. The framework has been designed to decouple kernel functions from learning algorithms: once a new kernel function has been implemented, it can be adopted in all the available kernel machine algorithms. The platform includes different Online and Batch Learning algorithms for Classification, Regression and Clustering, as well as several Kernel functions, ranging from vector-based to structural kernels. This paper will show the main aspects of the framework by applying it to different NLP tasks.
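The decoupling of kernel functions from learning algorithms can be illustrated with a short, hypothetical sketch. Note that KELP is a Java framework; the fragment below is written in Python for brevity, does not use KELP's actual class names or API, and only mirrors the design idea: a learner written against a generic kernel interface works unchanged over vectorial data and discrete structures once a kernel for them exists.

```python
# Hypothetical sketch (not KELP's API) of decoupling kernels from learners:
# any callable implementing the kernel interface can be plugged into any
# kernel-based algorithm without modifying the algorithm itself.
from typing import Callable, List, Sequence

Kernel = Callable[[object, object], float]

def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Vector-based kernel: plain dot product."""
    return sum(a * b for a, b in zip(x, y))

def subset_kernel(x: set, y: set) -> float:
    """Toy 'structural' kernel over sets: counts shared elements."""
    return float(len(x & y))

class KernelPerceptron:
    """An online learner written only against the generic Kernel interface."""
    def __init__(self, kernel: Kernel):
        self.kernel = kernel
        self.support: List[object] = []
        self.alphas: List[float] = []

    def score(self, x) -> float:
        return sum(a * self.kernel(s, x) for a, s in zip(self.alphas, self.support))

    def learn(self, x, y: int) -> None:       # y in {-1, +1}
        if y * self.score(x) <= 0:             # mistake-driven update
            self.support.append(x)
            self.alphas.append(float(y))

# The same algorithm runs on vectors ...
p = KernelPerceptron(linear_kernel)
p.learn((1.0, 0.0), +1); p.learn((0.0, 1.0), -1)
print(p.score((1.0, 0.2)))

# ... and on discrete structures, just by swapping the kernel.
q = KernelPerceptron(subset_kernel)
q.learn({"det", "noun"}, +1); q.learn({"verb"}, -1)
print(q.score({"det", "verb"}))
```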
Named Entity Recognition (NER) is a vital task in various NLP applications. However, in many real-world scenarios (e.g., voice-enabled assistants) new named entities are frequently introduced, entailing re-training NER models to support these new entities. Re-annotating the original training data for the new entities could be costly or even impossible when storage limitations or security concerns restrict access to that data, and annotating a new dataset for all of the entities becomes impractical and error-prone as the number of entities increases. To tackle this problem, we introduce a novel Continual Learning approach for NER, which requires new training material to be annotated only for the new entities. To preserve the existing knowledge previously learned by the model, we exploit the Knowledge Distillation (KD) framework, where the existing NER model acts as the teacher for a new NER model (i.e., the student), which learns the new entities by using the new training material and retains knowledge of old entities by imitating the teacher's outputs on this new training set. Our experiments show that this approach allows the student model to "progressively" learn to identify new entities without forgetting the previously learned ones. We also present a comparison with multiple strong baselines to demonstrate that our approach is superior for continually updating an NER model.
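The distillation objective can be sketched as follows. This is a minimal, illustrative PyTorch fragment, assuming token-level logits from a frozen teacher over the old tag set and from a student over the extended tag set; the tensor names, temperature, and loss weighting are expository assumptions rather than the paper's exact formulation.

```python
# Illustrative per-token distillation objective for continual NER (assumed setup,
# not the paper's exact code): the student imitates the teacher on old tags and
# is supervised with cross-entropy on the annotations for the new entities.
import torch
import torch.nn.functional as F

OLD_TAGS, NEW_TAGS, T = 5, 2, 2.0           # old tags, new tags, KD temperature
tokens = 16
student_logits = torch.randn(tokens, OLD_TAGS + NEW_TAGS, requires_grad=True)
teacher_logits = torch.randn(tokens, OLD_TAGS)          # teacher never saw new tags
new_labels = torch.randint(0, OLD_TAGS + NEW_TAGS, (tokens,))  # gold labels on new data

# Distillation term: the student's distribution over the OLD tags should match
# the teacher's soft predictions on the new training sentences.
kd_loss = F.kl_div(
    F.log_softmax(student_logits[:, :OLD_TAGS] / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)

# Supervised term: standard cross-entropy on the new-entity annotations.
ce_loss = F.cross_entropy(student_logits, new_labels)

loss = ce_loss + kd_loss     # the mixing weight is a hyperparameter in practice
loss.backward()
print(float(loss))
```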