Proceedings of the Seventh Joint Conference on Lexical And Computational Semantics 2018
DOI: 10.18653/v1/s18-2010
The Limitations of Cross-language Word Embeddings Evaluation

Abstract: The aim of this work is to explore the possible limitations of existing methods of cross-language word embeddings evaluation, addressing the lack of correlation between intrinsic and extrinsic cross-language evaluation methods. To test this hypothesis, we construct English-Russian datasets for extrinsic and intrinsic evaluation tasks and compare the performance of 5 different cross-language models on them. The results show that scores even on different intrinsic benchmarks do not correlate with each other. We ca…

Cited by 10 publications (16 citation statements). References 28 publications.
“…This produces word vectors (vector representations of words), but a related doc2vec algorithm can be used to obtain document vectors as well (Le and Mikolov 2014). There are some recent encouraging results with anomaly detection using the word2vec representation (Bertero et al 2017; Pande and Ahuja 2017; Bakarov et al 2018). A related embedding-based data representation was also proposed for anomaly detection in a series of categorical events (Chen et al 2016).…”
Section: Text Representation (mentioning)
confidence: 99%
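The word2vec/doc2vec idea mentioned above can be illustrated with a toy sketch: a document vector is often approximated by averaging the vectors of its words. The vectors and helper below are hypothetical illustrations, not the output of a trained model:

```python
# Toy 3-dimensional word vectors; real word2vec vectors come from training
# on a corpus, not from a hand-written dictionary like this one.
word_vectors = {
    "disk":  [0.9, 0.1, 0.0],
    "error": [0.8, 0.2, 0.1],
    "login": [0.1, 0.9, 0.3],
}

def doc_vector(tokens, vectors):
    """Average the vectors of the known tokens to represent a document."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

print([round(x, 2) for x in doc_vector(["disk", "error"], word_vectors)])
# -> [0.85, 0.15, 0.05]
```

Averaging loses word order, which is why doc2vec trains document vectors directly; the average is nonetheless a common, cheap baseline for downstream tasks such as anomaly detection.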
“…Besides the most common bag of words text representation, a more recent and refined Global Vectors (GloVe) representation (Pennington, Socher, and Manning 2014) based on word embeddings is employed, which makes it easy to control the dimensionality and apply arbitrary general-purpose classification and clustering algorithms. While text representations based on word embeddings are becoming popular (Goldberg and Levy 2014; Lau and Baldwin 2016; Bakarov 2018), there have been only a few demonstrations of their utility for anomaly detection (Bertero et al 2017; Pande and Ahuja 2017; Bakarov et al 2018), using word2vec (Mikolov et al 2013a) rather than GloVe embeddings.

The applied anomaly detection techniques, based on one-class SVM and k-medoids cluster dissimilarity, are probably combined for the first time with the GloVe representation, applied to text data, and compared to their counterparts using the bag of words representation. Prior work using word2vec for anomaly detection (Bertero et al 2017; Pande and Ahuja 2017; Bakarov et al 2018) did not combine it with clustering-based detection methods and did not include comparisons with bag of words.

Unlike in most prior work on anomaly detection using word embeddings (Bertero et al 2017; Bakarov et al 2018), large datasets are used (tens of thousands rather than hundreds) to better match the scale of realistic applications. Unlike in most prior work on clustering-based anomaly detection (He et al 2003; Al-Zoubi 2009; Gao 2009; Amer and Goldstein 2012), the cluster dissimilarity approach is applied as a modeling algorithm, that is, with an anomaly detection model created on the training set and applicable to new data. New cluster dissimilarity-based anomaly score definitions are proposed that may be promising alternatives to those known from the literature (He et al 2003; Amer and Goldstein 2012).…”
Section: Introduction (mentioning)
confidence: 99%
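The cluster-dissimilarity scoring described above can be sketched in a few lines: once cluster medoids are fixed, a point's anomaly score is its distance to the nearest medoid, so points far from every cluster score high. The medoids and data here are hypothetical toy values, not the cited paper's actual score definitions:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def anomaly_score(point, medoids):
    """Distance to the nearest medoid; larger means more anomalous."""
    return min(euclidean(point, m) for m in medoids)

# Two cluster medoids in a toy 2-D embedding space.
medoids = [[0.0, 0.0], [10.0, 10.0]]

print(anomaly_score([0.1, 0.2], medoids))  # near a cluster: low score
print(anomaly_score([5.0, 5.0], medoids))  # far from both: high score
```

Used as a modeling algorithm, the medoids are computed on the training set and then scored against new, unseen points, which is the distinction the citation statement draws from purely transductive clustering-based detectors.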