2017
DOI: 10.1016/j.jml.2016.04.001
Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation

Cited by 387 publications (538 citation statements)
References 66 publications
“…A difficulty metric was computed for each of these words, by summing the z scores for frequency (Reppen et al, ), concreteness (Coltheart, ), age of acquisition (Kuperman, Stadthagen‐Gonzalez, & Brysbaert, ), and length (number of letters); frequency and age of acquisition were doubly weighted. The pairwise semantic distances between all of the 4,000 words were estimated with snaut (Mandera, Keuleers, & Brysbaert, ), a prediction‐based model of distributional semantics derived from corpora. For each word, the ten most related words output by snaut were considered as possible match items (these items were not necessarily actually closely related to the target word, given the limitations of the computational model).…”
Section: Methods
confidence: 99%
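The difficulty metric described above (summed z scores, with frequency and age of acquisition doubly weighted) can be sketched in a few lines. The sign conventions below are an assumption on my part: the excerpt does not say how the variables are oriented, so here higher frequency and concreteness are treated as making a word easier (negative weight), while later age of acquisition and greater length make it harder (positive weight). The norm values are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical norm values for a tiny word set (illustrative only; the study
# used published norms such as Reppen et al. for frequency and Kuperman et
# al. for age of acquisition across 4,000 words).
norms = {
    "cat":   {"frequency": 6.1, "concreteness": 4.8, "aoa": 3.2, "length": 3},
    "ennui": {"frequency": 2.3, "concreteness": 1.9, "aoa": 12.5, "length": 5},
    "table": {"frequency": 5.5, "concreteness": 4.9, "aoa": 4.1, "length": 5},
}

def z_scores(values):
    """Standardize a list of values to z scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

variables = ["frequency", "concreteness", "aoa", "length"]
words = list(norms)

# Per-variable z scores computed across the word set.
zs = {var: z_scores([norms[w][var] for w in words]) for var in variables}

# Difficulty = weighted sum of z scores; frequency and age of acquisition
# are doubly weighted, as in the excerpt. Signs are an assumption (see above).
weights = {"frequency": -2.0, "concreteness": -1.0, "aoa": 2.0, "length": 1.0}
difficulty = {
    w: sum(weights[var] * zs[var][i] for var in variables)
    for i, w in enumerate(words)
}
```

With these toy norms, a rare, abstract, late-acquired word like "ennui" comes out harder than "cat", which is the pattern the metric is meant to capture.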
“…We defined semantic similarity as the cosine between distributional vectors representing two words in a multi-dimensional lexical space. To this end, we collected pairwise estimates of semantic similarity using the Latent Semantic Analysis (LSA) from the UKWAC and SUBTLEX-UK corpus of English (available at http://zipf.ugent.be/snaut-english/, Mandera, Keuleers, & Brysbaert, in press). This application uses a 300-dimensional semantic space with CBOW embeddings, and a 6-word window for calculating co-occurrence statistics.…”
Section: Methods
confidence: 99%
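The similarity measure defined in this excerpt, the cosine between two words' distributional vectors, and its complement as a distance (the next excerpt notes that a greater score indicates greater dissimilarity) can be written out directly. The four-dimensional toy vectors are invented; the snaut spaces described here are 200- or 300-dimensional.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors in the semantic space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Distance form: larger values mean more dissimilar meanings."""
    return 1.0 - cosine_similarity(u, v)

# Toy vectors standing in for rows of a 300-dimensional CBOW space.
dog = [0.8, 0.1, 0.6, 0.2]
cat = [0.7, 0.2, 0.5, 0.3]
car = [0.1, 0.9, 0.2, 0.8]
```

As expected, `cosine_distance(dog, cat)` is smaller than `cosine_distance(dog, car)`: semantically related words sit closer together in the space.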
“…This application uses a 300-dimensional semantic space with CBOW embeddings, and a 6-word window for calculating co-occurrence statistics. The LSA solution for Dutch was trained on SONAR-500 and subtitle corpora, and used a 200-dimensional semantic space with CBOW embeddings and a window of 10 (available at http://zipf.ugent.be/snaut-dutch/, Mandera et al, in press). A greater LSA score indicates a greater dissimilarity between the meanings of a pair of words.…”
Section: Methods
confidence: 99%
“…Several studies have already shown correspondence between priming magnitude and VSM measures of relation such as cosine similarity or neighbor rank (Mandera et al, 2016;Lapesa and Evert, 2013;Jones et al, 2006;Padó and Lapata, 2007;Herdagdelen et al, 2009;McDonald and Brew, 2004). These positive results suggest that some of the implicit relation structure in the human brain is already reflected in current vector space models, and that it is in fact feasible to evaluate relation structure of VSMs by testing their ability to predict this implicit human measure.…”
Section: Semantic Priming
confidence: 96%
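Besides cosine similarity, the excerpt names neighbor rank as a VSM measure that tracks priming magnitude. A minimal sketch of that measure, assuming a small invented vocabulary of toy vectors: the rank of a target word among all words ordered by similarity to the prime, where rank 1 (nearest neighbor) would predict the strongest priming.

```python
import math

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Invented 3-dimensional vectors for illustration.
vectors = {
    "dog":   [0.8, 0.1, 0.6],
    "cat":   [0.7, 0.2, 0.5],
    "car":   [0.1, 0.9, 0.2],
    "truck": [0.2, 0.8, 0.3],
}

def neighbor_rank(prime, target, vectors):
    """Rank of `target` among all other words, ordered by decreasing
    similarity to `prime` (1 = nearest neighbor)."""
    ranked = sorted(
        (w for w in vectors if w != prime),
        key=lambda w: -cosine(vectors[prime], vectors[w]),
    )
    return 1 + ranked.index(target)
```

Here `neighbor_rank("dog", "cat", vectors)` is 1 while `neighbor_rank("dog", "car", vectors)` is 3, mirroring the idea that a prime facilitates a closely ranked target more than a distant one.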