2018
DOI: 10.1007/978-3-319-75477-2_6

New Word Analogy Corpus for Exploring Embeddings of Czech Words

Abstract: Word embedding methods have proven very useful in many NLP (Natural Language Processing) tasks. Much has been investigated about word embeddings of English words and phrases, but little attention has been dedicated to other languages. Our goal in this paper is to explore the behavior of state-of-the-art word embedding methods on Czech, a language characterized by very rich morphology. We introduce a new corpus for the word analogy task that inspects syntactic, morphosyntactic and sem…

Citations: cited by 13 publications (17 citation statements)
References: 12 publications
“…Word2vec vectors for Czech experiments are trained on Czech Wikipedia (Svoboda and Brychcín, 2016). For the English experiments we utilize the standard vectors trained on part of Google News dataset (Mikolov et al, 2013b).…”
Section: Tools and Corpora (mentioning)
confidence: 99%
“…This is the reason why we bring up the comparison with highly inflected language. In [33] and [32] has been shown that there is a space for the performance improvement of current state-of-the-art word embedding models on languages from Slavic families. More information about individual section of Czech word analogy corpus is described in [33].…”
Section: Discussion (mentioning)
confidence: 99%
“…Such a question can be answered with a simple equation: vec(king) − vec(queen) = vec(man) − vec(woman). We evaluate on English and Czech word analogy datasets, proposed by [21] and [33], respectively. The word-phrases were excluded from original datasets, resulting in 8869 semantic and 10,675 syntactic questions for English (19,544 in total), and 6018 semantic.…”
Section: Training (mentioning)
confidence: 99%
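The analogy equation quoted above can be illustrated with a minimal sketch. The 2-D vectors below are invented for illustration only and do not come from any trained model or from the cited datasets; `cosine` and `solve_analogy` are hypothetical helper names. The sketch answers "man is to king as woman is to ?" by computing vec(king) − vec(man) + vec(woman) and picking the nearest remaining word by cosine similarity, which is one common way such questions are scored.

```python
import math

# Toy 2-D embeddings, invented for illustration (assumption: not real model output).
embeddings = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.5, 1.0],
    "woman": [0.5, -1.0],
    "apple": [-1.0, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' using vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three question words are excluded from the candidate answers.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(solve_analogy("man", "king", "woman", embeddings))  # prints "queen"
```

With real embeddings the same search would run over the full vocabulary, and a question would typically count as correct only if the top-ranked word matches the expected answer.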
“…Finally, most evaluations of word2vec embeddings focus on English, with notable exceptions (Köper et al, 2015; Berardi et al, 2015; Svoboda and Brychcín, 2018; Venekoski and Vankka, 2017; Rodrigues et al, 2016; Chen et al, 2015; Grave et al, 2018). However, these are translations of word similarity tasks and share the weaknesses of their English language counterparts.…”
Section: Related Work (mentioning)
confidence: 99%