Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.426

Self-training Improves Pre-training for Natural Language Understanding

Abstract: Unsupervised pre-training has led to much recent progress in natural language understanding. In this paper, we study self-training as another way to leverage unlabeled data through semi-supervised learning. To obtain additional data for a specific task, we introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the web. Unlike previous semi-supervised methods, our approach …
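The abstract describes an embedding-based retrieval step: labeled task data is turned into a query embedding that is matched against a large bank of unlabeled web sentences. Below is a minimal sketch of that idea, assuming an off-the-shelf sentence encoder in place of the paper's own embeddings and a tiny in-memory sentence bank; the model choice and function names are illustrative, not the released SentAugment code.

```python
# Sketch of SentAugment-style retrieval (illustrative, not the authors' code).
# Assumes a generic sentence encoder and a small in-memory "bank" rather than
# billions of web sentences.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed stand-in encoder

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

def build_task_query(labeled_texts):
    """Average the embeddings of labeled task sentences into one query vector."""
    emb = encoder.encode(labeled_texts, convert_to_numpy=True)
    query = emb.mean(axis=0)
    return query / np.linalg.norm(query)

def retrieve(query, sentence_bank, top_k=5):
    """Rank bank sentences by cosine similarity to the task query."""
    bank = encoder.encode(sentence_bank, convert_to_numpy=True)
    bank = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    scores = bank @ query
    top = np.argsort(-scores)[:top_k]
    return [(sentence_bank[i], float(scores[i])) for i in top]

# Usage: build a task-averaged query, then pull the nearest bank sentences.
labeled_texts = ["the movie was wonderful", "a tedious, boring film"]
sentence_bank = ["great acting and a moving plot",
                 "stock prices fell on Tuesday",
                 "i could not stop yawning during the film"]
print(retrieve(build_task_query(labeled_texts), sentence_bank, top_k=2))
```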

Cited by 71 publications (66 citation statements)
References 28 publications
“…Self-training. We use the fine-tuned models with labeled data as teachers to generate pseudo soft labels on unlabeled data, following Du et al. (2021). The pseudo-labeled data are combined with the original labeled data to train student models by optimizing the objective function in Equation 2.…”
Section: Methods
confidence: 99%
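The quoted passage describes the standard teacher-student loop: a fine-tuned teacher produces soft pseudo-labels on unlabeled text, which are then mixed with the gold-labeled data to train a student. The sketch below illustrates that loop under simplifying assumptions: both models are tiny linear classifiers over pre-computed features, and the unlabeled loss is a KL divergence to the teacher's distribution, one common choice rather than the citing paper's exact Equation 2.

```python
# Sketch of self-training with soft pseudo-labels (teacher -> student).
# Assumes `teacher` has already been fine-tuned on the labeled set; here both
# models are toy linear classifiers over fixed features, purely for illustration.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, feat_dim = 3, 16
teacher = torch.nn.Linear(feat_dim, num_classes)   # stands in for a fine-tuned model
student = torch.nn.Linear(feat_dim, num_classes)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

labeled_x = torch.randn(32, feat_dim)
labeled_y = torch.randint(0, num_classes, (32,))
unlabeled_x = torch.randn(128, feat_dim)            # e.g. retrieved sentences

# Teacher produces soft pseudo-labels on the unlabeled data.
with torch.no_grad():
    soft_targets = F.softmax(teacher(unlabeled_x), dim=-1)

for step in range(100):
    opt.zero_grad()
    # Supervised loss on gold labels plus KL to the teacher's soft distribution.
    ce = F.cross_entropy(student(labeled_x), labeled_y)
    kl = F.kl_div(F.log_softmax(student(unlabeled_x), dim=-1),
                  soft_targets, reduction="batchmean")
    loss = ce + kl
    loss.backward()
    opt.step()
```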
“…He et al. (2020) injected noise into the input space as a noisy version of self-training for neural sequence generation and obtained state-of-the-art performance on tasks such as neural machine translation. Du et al. (2021) used information retrieval to retrieve task-specific in-domain data from a large bank of web sentences for self-training. Beyond these applications of self-training, Wei et al. (2021) further proved theoretically that self-training and input-consistency regularization achieve high accuracy with respect to ground-truth labels under certain assumptions.…”
Section: Related Work
confidence: 99%
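The input-consistency regularization mentioned in this passage penalizes disagreement between a model's predictions on an input and on a perturbed copy of it. The sketch below uses simple Gaussian perturbation of input features as the noise; the cited works use richer input noise (e.g. paraphrasing or token-level corruption), so this is only an illustrative stand-in.

```python
# Sketch of input-consistency regularization in the spirit of noisy self-training.
# The noise model (Gaussian on features) is an assumption for illustration.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(16, 3)          # placeholder classifier
x = torch.randn(64, 16)                 # unlabeled inputs (feature view)

def consistency_loss(model, x, noise_std=0.1):
    """Penalize disagreement between predictions on clean and noised inputs."""
    with torch.no_grad():
        clean = F.softmax(model(x), dim=-1)            # target: clean prediction
    noised = F.log_softmax(model(x + noise_std * torch.randn_like(x)), dim=-1)
    return F.kl_div(noised, clean, reduction="batchmean")

print(consistency_loss(model, x))
```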
“…Huang et al. (2019) explored a multi-task sentence encoding model for semantic retrieval in QA systems. Du et al. (2021) introduced SentAugment, a data augmentation method that computes task-specific query embeddings from labeled data to retrieve sentences from a bank of billions of unlabeled sentences crawled from the web. Yang et al. (2020) use the Universal Sentence Encoder (USE) for semantic similarity and semantic retrieval in a multilingual setting.…”
Section: Sentence-level Novelty Detection
confidence: 99%
“…Currently, the best sentence-embedding approaches are trained with supervision on large labeled datasets (Conneau et al., 2017; Cer et al., 2018; Reimers and Gurevych, 2019; Du et al., 2021; Wieting et al., 2020; Huang et al., 2021), such as NLI datasets (Bowman et al., 2015; Williams et al., 2018) or paraphrase corpora (Dolan and Brockett, 2005). [Footnote 1: Code available at https://github.com/marco-digio/Twitter4SSE] Round-trip translation has also been exploited, where semantically similar pairs of sentences are generated by translating the non-English side of NMT pairs, as in ParaNMT (Wieting and Gimpel, 2018) and Opusparcus (Creutz, 2018).…”
Section: Introduction and Related Work
confidence: 99%
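The round-trip translation idea mentioned above produces paraphrase pairs by passing text through another language and back. The cited works translate the non-English side of existing parallel pairs; the sketch below instead round-trips a monolingual English sentence, which yields the same kind of paraphrase pair. The MarianMT checkpoints are off-the-shelf choices for illustration, not those used by the cited papers.

```python
# Sketch of round-trip translation for generating a paraphrase pair.
# Model names are illustrative off-the-shelf MarianMT checkpoints.
from transformers import pipeline

en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def round_trip(sentence: str):
    """Return (original, back-translated) as a candidate paraphrase pair."""
    french = en_to_fr(sentence)[0]["translation_text"]
    back = fr_to_en(french)[0]["translation_text"]
    return sentence, back

print(round_trip("Self-training improves pre-training for language understanding."))
```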