Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons

Jo, Hwiyeol; Choi, Stanley Jungkyu

doi:10.18653/v1/w18-3003

Cited by 5 publications

(3 citation statements)

References 20 publications

“…We use GloVe (Pennington et al, 2014) as pretrained embeddings. To increase model performance, we apply a word vector post-processing method called extrofitting (Jo and Choi, 2018). We prepare 3 topic classification datasets; DBpedia ontology (DBpedia) (Lehmann et al, 2015), YahooAnswers (Yahoo) (Chang et al, 2008), AG-News.…”

Section: Methods (mentioning)

confidence: 99%

Devil’s Advocate: Novel Boosting Ensemble Method from Psychological Findings for Text Classification

Jo, Lim, Zhang

2021

Findings of the Association for Computational Linguistics: EMNLP 2021

Self Cite

We present a new form of ensemble method, Devil's Advocate, which uses a deliberately dissenting model to force other submodels within the ensemble to better collaborate. Our method consists of two different training settings: one follows the conventional training process (Norm), and the other is trained on artificially generated labels (DevAdv). After training the models, Norm models are fine-tuned through an additional loss function, which uses the DevAdv model as a constraint. In making a final decision, the proposed ensemble model sums the scores of Norm models and then subtracts the score of the DevAdv model. The DevAdv model improves the overall performance of the other models within the ensemble. In addition to our ensemble framework being based on psychological background, it also shows comparable or improved performance on 5 text classification tasks when compared to conventional ensemble methods.
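The final decision rule described in this abstract (sum the Norm scores, then subtract the DevAdv score) can be illustrated with a minimal sketch. The function name `devils_advocate_predict`, the nested-list layout of the scores, and the toy numbers are assumptions made for illustration only; the abstract does not specify how scores are normalized or weighted, and this is not the authors' implementation.

```python
from typing import List, Sequence


def devils_advocate_predict(
    norm_scores: Sequence[Sequence[Sequence[float]]],
    devadv_scores: Sequence[Sequence[float]],
) -> List[int]:
    """Combine per-class scores as described in the abstract.

    norm_scores[m][i][c]: score of Norm model m for sample i, class c.
    devadv_scores[i][c]:  score of the DevAdv model for sample i, class c.
    (This layout is a hypothetical choice for the sketch.)
    """
    predictions = []
    for i, dev_row in enumerate(devadv_scores):
        # Sum the Norm models' scores per class, then subtract the DevAdv score.
        combined = [
            sum(model[i][c] for model in norm_scores) - dev_row[c]
            for c in range(len(dev_row))
        ]
        # Predict the class with the highest combined score.
        predictions.append(max(range(len(combined)), key=combined.__getitem__))
    return predictions


# Toy example: two Norm models, one DevAdv model, two samples, two classes.
norm = [
    [[0.60, 0.40], [0.30, 0.70]],
    [[0.55, 0.45], [0.45, 0.55]],
]
devadv = [[0.20, 0.80], [0.90, 0.10]]
preds = devils_advocate_predict(norm, devadv)
```

In this toy setup the DevAdv model deliberately favors the class the Norm models reject, so subtracting its score widens the margin of the Norm models' preferred class.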

“…Joint specialization models (Yu and Dredze, 2014; Kiela et al, 2015; Liu et al, 2015; Osborne et al, 2016; Nguyen et al, 2017, inter alia) jointly train word embedding models from scratch and enforce the external constraints with an auxiliary objective. On the other hand, retrofitting models are postprocessors that fine-tune pretrained word embeddings by gauging pairwise distances according to the external constraints (Faruqui et al, 2015; Wieting et al, 2015; Mrkšić et al, 2016; Mrkšić et al, 2017; Jo and Choi, 2018; Lengerich et al, 2018).…”

Section: Specialization For Semantic Similarity (mentioning)

confidence: 99%

Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity

Lauscher, Vulić, Ponti, et al.

2020

Proceedings of the 28th International Conference on Computational Linguistics

Unsupervised pretraining models have been shown to facilitate a wide range of downstream NLP applications. These models, however, retain some of the limitations of traditional static word embeddings. In particular, they encode only the distributional knowledge available in raw text corpora, incorporated through language modeling objectives. In this work, we complement such distributional knowledge with external lexical knowledge, that is, we integrate the discrete knowledge on word-level semantic similarity into pretraining. To this end, we generalize the standard BERT model to a multi-task learning setting where we couple BERT's masked language modeling and next sentence prediction objectives with an auxiliary task of binary word relation classification. Our experiments suggest that our "Lexically Informed" BERT (LIBERT), specialized for the word-level semantic similarity, yields better performance than the lexically blind "vanilla" BERT on several language understanding tasks. Concretely, LIBERT outperforms BERT in 9 out of 10 tasks of the GLUE benchmark and is on a par with BERT in the remaining one. Moreover, we show consistent gains on 3 benchmarks for lexical simplification, a task where knowledge about word-level semantic similarity is paramount, as well as large gains on lexical reasoning probes.

“…Post-hoc Approaches. In the post-hoc approach, pre-trained word vectors such as GloVe (Pennington et al, 2014), Word2Vec (Mikolov et al, 2013), FastText (Bojanowski et al, 2017), or Paragram (Wieting et al, 2015) are fine-tuned to endow them with lexical relational information (Faruqui et al, 2015; Rothe and Schütze, 2015; Wieting et al, 2015; Mrkšić et al, 2016; Jo, 2018; Jo and Choi, 2018; Glavaš and Vulić, 2018). In this paper, we primarily discuss LEXSUB as a post-hoc model.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Lexical Subspaces in a Distributional Vector Space

Arora, Chakraborty, Cheung

2020

Transactions of the Association for Computational Linguistics

In this paper, we propose LexSub, a novel approach towards unifying lexical and distributional semantics. We inject knowledge about lexical-semantic relations into distributional word embeddings by defining subspaces of the distributional vector space in which a lexical relation should hold. Our framework can handle symmetric attract and repel relations (e.g., synonymy and antonymy, respectively), as well as asymmetric relations (e.g., hypernymy and meronomy). In a suite of intrinsic benchmarks, we show that our model outperforms previous approaches on relatedness tasks and on hypernymy classification and detection, while being competitive on word similarity tasks. It also outperforms previous systems on extrinsic classification tasks that benefit from exploiting lexical relational cues. We perform a series of analyses to understand the behaviors of our model. Code available at https://github.com/aishikchakraborty/LexSub.
