2019
DOI: 10.48550/arxiv.1908.06725
Preprint

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

Zhi-Xiu Ye,
Qian Chen,
Wen Wang
et al.

Abstract: Neural language representation models such as Bidirectional Encoder Representations from Transformers (BERT) pretrained on large-scale corpora can well capture rich semantics from plain text, and can be fine-tuned to consistently improve the performance on various natural language processing (NLP) tasks. However, the existing pre-trained language representation models rarely consider explicitly incorporating commonsense knowledge or other knowledge. In this paper, we develop a pre-training approach for incorpo…
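The abstract is truncated before the method details, but the "align, mask and select" steps named in the title can be illustrated with a small, self-contained sketch: align a commonsense triple with a sentence that mentions both concepts, mask the target concept, and select distractor concepts to form a multi-choice sample. The triple format, the toy corpus and the distractor rule below are assumptions made for illustration, not the authors' released pipeline.

```python
import random

# Toy ConceptNet-style triples: (subject concept, relation, object concept).
TRIPLES = [
    ("bird", "CapableOf", "fly"),
    ("fish", "CapableOf", "swim"),
    ("dog", "CapableOf", "bark"),
]

# Toy corpus sentences that the triples can be aligned against.
CORPUS = [
    "A bird can fly over the lake.",
    "Every fish can swim in the river.",
]

def align_mask_select(triples, corpus, num_distractors=2):
    """Build multi-choice QA samples: align a triple with a sentence containing
    both concepts, mask the object concept, and select distractors that share
    the same relation (an illustrative selection rule)."""
    samples = []
    for subj, rel, obj in triples:
        for sent in corpus:
            if subj in sent and obj in sent:                # align
                question = sent.replace(obj, "[MASK]")      # mask
                distractors = [o for _, r, o in triples
                               if r == rel and o != obj]    # select
                random.shuffle(distractors)
                samples.append({"question": question,
                                "candidates": [obj] + distractors[:num_distractors],
                                "answer": obj})
    return samples

if __name__ == "__main__":
    for sample in align_mask_select(TRIPLES, CORPUS):
        print(sample)
```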

Cited by 10 publications (16 citation statements)
References 19 publications
“…Such a strategy is adopted by multiple existing approaches: ERNIE (Baidu) [83], ERNIE (THU) [108], CokeBERT [80], KgPLM [26], LUKE [100], GLM [77], KALM [72], CoLAKE [82], JAKET [105] and AMS [103]. A typical choice of entity-related objective is an entity linking loss, which predicts, for an entity mention in text, the corresponding entity in the KG with a cross-entropy loss or a max-margin loss on the prediction [108][100][105][72].…”
Section: Entity Knowledge (mentioning)
confidence: 99%
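The entity-linking objective described in this snippet can be made concrete with a few lines of PyTorch. This is a minimal sketch of the cross-entropy variant only: the pooled mention representation, the entity-embedding table and the tensor shapes are assumptions made for the example, not a specific model's interface.

```python
import torch
import torch.nn.functional as F

def entity_linking_loss(mention_repr, entity_embeddings, gold_entity_ids):
    """Cross-entropy entity-linking loss.

    mention_repr:      (batch, hidden) encoder states pooled over each mention span
    entity_embeddings: (num_entities, hidden) learned KG entity embedding table
    gold_entity_ids:   (batch,) index of the correct KG entity for each mention
    """
    # Score every mention against every KG entity with a dot product,
    # then treat linking as classification over the entity vocabulary.
    logits = mention_repr @ entity_embeddings.t()   # (batch, num_entities)
    return F.cross_entropy(logits, gold_entity_ids)

# Toy usage with random tensors.
mentions = torch.randn(4, 768)
entities = torch.randn(1000, 768)
gold = torch.randint(0, 1000, (4,))
print(entity_linking_loss(mentions, entities, gold))
```

A max-margin variant would instead score the gold entity against a handful of sampled negative entities and apply a hinge loss on the score difference.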
“…Since many existing approaches incorporate entity-level knowledge, entity-related tasks (e.g., entity typing and relation classification) become natural testbeds for evaluating the efficacy of these KE-PLMs. By injecting entity information …

Model | Yes/No | Objective/task | Knowledge source
ERICA [66] | Yes | entity/relation discrimination | Wikipedia, Wikidata
ERNIE (THU) [108] | Yes | entity prediction | Wikipedia/Wikidata
ERNIE 2.0 (Baidu) [83] | Yes | masked entity/phrase | N/A
E-BERT [64] | Yes | entity/wordpiece alignment | Wikipedia2Vec
E(commerce)-BERT [106] | Yes | neighbor Product Reconstruction | product graph/AutoPhrase [75]
EaE [21] | Yes | mention detection/linking | Wikipedia
CokeBERT [80] | Yes | entity prediction | Wikipedia/Wikidata
COMET [6] | No | autoregressive | ATOMIC, ConceptNet
K-Adapter [96] | No | dependency relation | Wikipedia, Wikidata, Stanford Parser
KnowBERT [61] | Yes | entity linking | WordNet, Wikipedia
K-BERT [49] | No | finetuning | WikiZh, WebtextZh, CN-DBpedia, HowNet, MedicalKG
KEPLER [97] | Yes | TransE scoring | Wikipedia/Wikidata
KG-BERT [102] | Yes | relation cross-entropy | ConceptNet
KG-BART [51] | Yes | masked concept | ConceptNet
KgPLM [26] | Yes | generative/discriminative masked entity | Wikipedia/Wikidata
FaE [90] | Yes | masked entity et.al | Wikipedia/Wikidata
JAKET [105] | Yes | entity category/relation type/masked entity | Wikipedia/Wikidata
LUKE [100] | Yes | entity prediction | Wikipedia
WKLM [99] | Yes | entity replacement detection | Wikipedia/Wikidata
CoLAKE [82] | Yes | masked entity prediction | Wikipedia/Wikidata
KT-NET [101] | No | finetuning | N/A
LIBERT [40] | Yes | lexical relation prediction | WordNet
SenseBERT [41] | Yes | supersense prediction | WordNet
Syntax-BERT [2] | No | masks induced by syntax tree parsing | syntax tree
SentiLARE [36] | Yes | POS/word-level polarity/sentiment polarity | SentiWordNet
[44] | No | finetuning | ConceptNet
COCOLM [104] | Yes | discourse relation/co-occurrence relation | ASER
[22] | Yes | autoregressive | ConceptNet/ATOMIC
AMS [103] | Yes | distractor-based loss | ConceptNet
GLM …”
Section: Entity-related Tasks (mentioning)
confidence: 99%
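As a concrete example of the "natural testbed" point above, a relation-classification probe can be wired up with any pre-trained encoder. The sketch below uses Hugging Face transformers with entity-marker tokens around the two mentions; the checkpoint name, the marker strings and the toy label set are illustrative assumptions rather than a protocol taken from the surveyed papers, and the freshly initialized classification head would still need fine-tuning on labelled relation data.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINT = "bert-base-uncased"                 # stand-in for any (knowledge-enhanced) encoder
LABELS = ["CapableOf", "UsedFor", "AtLocation"]  # toy relation inventory

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=len(LABELS))

# Mark the two entity mentions so the classifier knows which pair to relate.
tokenizer.add_tokens(["[E1]", "[/E1]", "[E2]", "[/E2]"])
model.resize_token_embeddings(len(tokenizer))

text = "A [E1] bird [/E1] can [E2] fly [/E2] over the lake."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits              # (1, num_labels)
print(LABELS[logits.argmax(-1).item()])          # random until fine-tuned
```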
“…Model: generative / discriminative pre-training task(s)
ERNIE-Baidu (Sun et al. 2020): phrase and entity masking
SenseBERT (Levine et al. 2019): supersense prediction
BERT CS (Ye et al. 2019): multi-choice question answering
LIBERT (Lauscher et al. 2019): lexical relation prediction
LIMIT-BERT (Zhou, Zhang, and Zhao 2019): semantic/syntactic phrase masking
KEPLER (Wang et al. 2019): knowledge representation learning
SpanBERT (Joshi et al. 2020a): span masking
WKLM (Xiong et al. 2020): entity replacement checking
K-Adapter: relation classification, dependency relation prediction
T5+SSM (Roberts, Raffel, and Shazeer 2020): salient span masking
TEK (Joshi et al. 2020b): span masking on TEK-augmented text
CN-ADAPT (Lauscher et al. 2020): MLM training on synthetic knowledge corpus
KgPLM (ours): knowledge span masking; knowledge span replacement checking

In this paper, we design masked span prediction as the generative knowledge completion task, and span replacement checking as the discriminative knowledge verification task. Hybrid knowledge, including the link structure of Wikipedia and the structured knowledge graph in Wikidata, is used to guide both tasks.…”
Section: Generative Tasks (mentioning)
confidence: 99%
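The two KgPLM-style objectives quoted above can be illustrated at the level of training-example construction: a generative example masks a knowledge span for the model to complete, and a discriminative example sometimes swaps that span and asks the model to detect the replacement. The whitespace tokenization, span boundaries and 50% replacement rate below are simplifications chosen for the sketch, not the paper's exact procedure.

```python
import random

def make_generative_example(tokens, span, mask_token="[MASK]"):
    """Masked span prediction: hide a knowledge span and keep it as the target."""
    start, end = span
    masked = tokens[:start] + [mask_token] * (end - start) + tokens[end:]
    return masked, tokens[start:end]

def make_discriminative_example(tokens, span, candidate_spans, p_replace=0.5):
    """Span replacement checking: maybe swap the knowledge span for another
    entity span, and label whether a replacement happened (1) or not (0)."""
    start, end = span
    if random.random() < p_replace:
        replacement = random.choice(candidate_spans)
        return tokens[:start] + replacement + tokens[end:], 1
    return list(tokens), 0

sentence = "Marie Curie won the Nobel Prize in Physics".split()
knowledge_span = (0, 2)                                   # "Marie Curie"
other_entities = [["Albert", "Einstein"], ["Isaac", "Newton"]]

print(make_generative_example(sentence, knowledge_span))
print(make_discriminative_example(sentence, knowledge_span, other_entities))
```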
“…Besides, Rosset et al. (2020) introduce entity information into an autoregressive language model, called KALM, which identifies entity surface forms in a text sequence and maps word n-grams to entities to obtain an entity sequence for knowledge-aware language model pre-training. Ye et al. (2019) propose a discriminative pre-training approach for incorporating commonsense knowledge into the language model, in which the question is concatenated with different candidates to construct a multi-choice question answering sample, and each choice is used to predict whether the candidate is the correct answer. KEPLER (Wang et al. 2019) unifies the knowledge representation learning and language modeling objectives; it builds a bridge between text representation and knowledge embeddings by encoding entity descriptions, and can better integrate factual knowledge into the pre-trained language model.…”
Section: Related Work (mentioning)
confidence: 99%
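The description of the Ye et al. (2019) objective in this snippet maps naturally onto a multiple-choice head over BERT: the question is paired with every candidate and the pairs are scored jointly. The sketch below shows that inference shape with Hugging Face transformers; the checkpoint name and the toy question/candidates are placeholders, and the actual pre-training details (distractor construction, joint masked-LM loss) are not reproduced here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

CHECKPOINT = "bert-base-uncased"                 # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForMultipleChoice.from_pretrained(CHECKPOINT)

question = "A bird can [MASK] over the lake."
candidates = ["fly", "swim", "bark"]             # correct answer plus distractors

# Pair the question with every candidate; the model expects
# input shaped (batch, num_choices, seq_len).
enc = tokenizer([question] * len(candidates), candidates,
                padding=True, return_tensors="pt")
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits              # (1, num_choices)
print(candidates[logits.argmax(-1).item()])      # meaningful only after training
```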