2022
DOI: 10.1111/cogs.13085

Context Matters: Recovering Human Semantic Structure from Machine Learning Analysis of Large‐Scale Text Corpora

Abstract: Applying machine learning algorithms to automatically infer relationships between concepts from large-scale collections of documents presents a unique opportunity to investigate at scale how human semantic knowledge is organized, how people use it to make fundamental judgments ("How similar are cats and bears?"), and how these judgments depend on the features that describe concepts (e.g., size, furriness). However, efforts to date have exhibited a substantial discrepancy between algorithm predictions and human…

Cited by 10 publications (10 citation statements)
References 81 publications
“…T. Merten notes a positive correlation between the degree of psychoticism and the number of unique associative responses under various conditions of the association experiment (Merten, 1993). In these works, as well as in the study (Innes, 1972) (Черкасова, 2008), containing 253 stimuli selected from the materials of three association surveys conducted at intervals of 10-20 years with native speakers of Russian. This dictionary includes Russian stimulus words that recurred in three or two mass association surveys, the materials of which were used to publish the most important associative dictionaries of the Russian language.…”
Section: Code and Dataset Are Available On Github
unclassified
“…To identify the degree to which the updated embedding yielded improved prediction of fine-grained similarity, we used 8 existing datasets from three studies [49][50][51] that had examined within category similarity. Note that predicted similarities are likely underestimated, given that the original similarity datasets were collected using different image examples and/or tasks.…”
Section: Fine-grained Prediction Of Perceived Similarity
mentioning
confidence: 99%
“…Third, while increases in dataset size did not lead to notable improvements in overall performance, did increasing the dataset size improve more fine-grained predictions of similarity? To address this question, we used several existing datasets of within-category similarity ratings [49][50][51] and computed similarity predictions. Rather than computing similarity across all possible triplets, these predictions were constrained to triplet contexts within superordinate categories (e.g.…”
Section: Data Quality and Data Reliability In The Behavioral Odd-one ...
mentioning
confidence: 99%
“…Word embeddings are largely founded on the notion of semantic similarity, and ensuring that word vector similarities match human judgments has been an important goal (e.g., Baroni et al, 2014; Pereira et al, 2016; An et al, 2018; Grand et al, 2018; Iordan et al, 2022). Less attention has been paid to whether the actual structure of a DSM's similarity space matches what is known about the human lexicon.…”
Section: Inspiration From Human Lexical Abilities
mentioning
confidence: 99%
“…asteroid, belt, and buckle, Griffiths et al, 2007)—are not consistently captured in embedding spaces (Griffiths et al, 2007; Nematzadeh et al, 2017; Rodriguez and Merlo, 2020). Building on the insight from Griffiths et al (2007) that interpretation of a word within the context of a topic can resolve some of these mismatches with human judgments by appropriately disambiguating the words, one avenue for the future may be to consider word embeddings that are topically-constrained (such as in Iordan et al, 2022).…”
Section: Inspiration From Human Lexical Abilities
mentioning
confidence: 99%