2020
DOI: 10.1609/aaai.v34i05.6485
Causally Denoise Word Embeddings Using Half-Sibling Regression

Abstract: Distributional representations of words, also known as word vectors, have become crucial for modern natural language processing tasks due to their wide applications. Recently, a growing body of word vector postprocessing algorithms has emerged, aiming to render off-the-shelf word vectors even stronger. In line with these investigations, we introduce a novel word vector postprocessing scheme under a causal inference framework. Concretely, the postprocessing pipeline is realized by Half-Sibling Regression (HSR), …
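The abstract is truncated before the method details, but the core idea of Half-Sibling Regression is well established: variables that share a noise source but have no causal link to a target can be used to predict, and then subtract, the noise in that target. Below is a minimal sketch under the assumption that function-word vectors act as half-siblings of content-word vectors; the synthetic data, the ridge penalty, and the dimension-as-sample regression setup are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative sketch of Half-Sibling Regression (HSR) for word vector
# denoising. Assumption: function-word vectors carry mostly corpus noise,
# so whatever part of a content-word vector they can linearly predict is
# treated as noise and removed. All matrices here are synthetic.
rng = np.random.default_rng(0)
d, n_func, n_content = 300, 50, 1000

F = rng.normal(size=(n_func, d))     # function-word embeddings (half-siblings)
C = rng.normal(size=(n_content, d))  # content-word embeddings to denoise

# Each embedding dimension is one regression sample: predict the content
# words' values at that dimension from the function words' values there.
ridge = Ridge(alpha=1.0, fit_intercept=False)
ridge.fit(F.T, C.T)                  # X: (d, n_func), y: (d, n_content)
noise_estimate = ridge.predict(F.T)  # predicted noise, shape (d, n_content)

C_denoised = C - noise_estimate.T    # subtract the estimated noise component
```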

Cited by 5 publications (3 citation statements)
References 28 publications
“…After obtaining the cleaned word list, we represent each word by its word embedding and stack the embeddings together to obtain a matrix representation of the text. A word embedding represents a word as a vector of real numbers that preserves the semantic similarities between words, and it has been used in many downstream natural language processing tasks (Yang and Liu, 2020). In this study, we use FastText, a pre-trained 300-dimensional Chinese word embedding (Grave et al., 2018).…”
Section: Methods (mentioning)
confidence: 99%
“…Likewise, it is infeasible to take each particular word token (in some sentence) as a potential confounder, and it is also expensive to consider all high-level words, since there are about 30,000 words in the BERT vocabulary. In this paper, we choose nouns as potential confounders since 1) nouns are content words that carry meaning or semantic value [53]; 2) the role of nouns is similar to the role of objects in images, which might ease the inter-modality intervention. Specifically, we use the NLTK toolkit [6] to perform part-of-speech tagging and choose word tokens whose tags belong to ["NN", "NNS", "NNP", "NNPS"] as potential confounders.…”
Section: Intra- and Inter-modality Intervention (mentioning)
confidence: 99%
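The noun-selection step in this excerpt maps directly onto NLTK's part-of-speech tagger; a minimal sketch follows. The example sentence is a placeholder, and the tagger resources are assumed to be available via `nltk.download`.

```python
import nltk

# One-time downloads of the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Penn Treebank noun tags, as listed in the cited excerpt.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def candidate_confounders(sentence: str) -> list[str]:
    """Return the noun tokens of a sentence, treated as potential confounders."""
    tokens = nltk.word_tokenize(sentence)
    return [tok for tok, tag in nltk.pos_tag(tokens) if tag in NOUN_TAGS]

print(candidate_confounders("A dog chases the ball across the park."))
# e.g. ['dog', 'ball', 'park']
```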
“…Specifically, we adopt the architecture of visiolinguistic BERT [26] and choose nouns in user queries as the confounders in the model to mitigate spurious correlations between words [45]. Since nouns are content words that carry meaning or semantic value [41, 45], noun keywords are more likely to form spurious correlations because they frequently co-occur in the same sentences. Also, as the role of nouns is similar to the role of objects in images, spurious correlations caused by nouns can be harmful to the correctness of the vision-language fusion.…”
Section: Introduction (mentioning)
confidence: 99%