2019
DOI: 10.1111/cogs.12730

The Role of Negative Information in Distributional Semantic Learning

Abstract: Distributional models of semantics learn word meanings from contextual co‐occurrence patterns across a large sample of natural language. Early models, such as LSA and HAL (Landauer & Dumais, 1997; Lund & Burgess, 1996), counted co‐occurrence events; later models, such as BEAGLE (Jones & Mewhort, 2007), replaced counting co‐occurrences with vector accumulation. All of these models learned from positive information only: Words that occur together within a context become related to each other. A recent class of d…
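
As a concrete illustration of the learning scheme the abstract describes, here is a minimal sketch (not any of the cited models' actual implementations) of a count-based learner that uses positive information only: words that occur together within a window increment each other's counts, and pairs that never co-occur are simply left at zero. The toy corpus and window size are illustrative assumptions.

```python
from collections import defaultdict

def count_cooccurrences(corpus, window=2):
    """Build a sparse word-by-word co-occurrence count table.

    Only positive information is recorded: a cell is incremented whenever
    two words appear within `window` positions of each other; pairs that
    never co-occur simply stay at zero.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        for i, target in enumerate(sentence):
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if i != j:
                    counts[target][sentence[j]] += 1
    return counts

# Toy corpus (hypothetical); real models train on large natural-language samples.
corpus = [["the", "dog", "chased", "the", "cat"],
          ["the", "cat", "chased", "the", "mouse"]]
counts = count_cooccurrences(corpus)
print(dict(counts["dog"]))  # {'the': 2, 'chased': 1}
```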

Cited by 24 publications (41 citation statements). References: 49 publications.

Citation statements (ordered by relevance):
“…Hollis (2020) was correct in calling out many issues in the theoretical development of contextual diversity accounts of lexical organization. For experience-based theories of cognition to be plausible, the types of materials that a model is trained upon must be coherent with the types of experience that a typical person also receives (Johns, Jones, & Mewhort, 2019).…”
Section: Contextual Diversity (mentioning)
Confidence: 99%
“…switched to using a word frequency distributional representation, where the context representation was a count of each word that occurred in a context (defined at a much larger scale than previous implementations, namely a whole book or the writings of an individual author). A word's representation was then the sum of the contexts (word frequency distributions) that a word occurred in, similar to count-based models of distributional semantics (Johns, Mewhort, & Jones, 2019). Johns (2021) further modified this approach by using population representations (PR), which contain the communication patterns of individuals across discourses.…”
Section: The Semantic Distinctiveness Model (SDM) (mentioning)
Confidence: 99%
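
A minimal sketch of the count-based construction described in the excerpt above, assuming a context is a large unit such as a whole book: each context is reduced to a word-frequency count vector, and a word's representation is the sum of the frequency vectors of the contexts it occurs in. The function names and vocabulary indexing are illustrative, not taken from the cited implementations.

```python
import numpy as np

def context_frequency_vector(context, vocab_index):
    """Represent a context (e.g., an entire book) as a word-frequency count vector."""
    vec = np.zeros(len(vocab_index))
    for word in context:
        vec[vocab_index[word]] += 1
    return vec

def word_representations(contexts, vocab_index):
    """A word's representation is the sum of the frequency vectors of
    every context in which it appears."""
    reps = {w: np.zeros(len(vocab_index)) for w in vocab_index}
    for context in contexts:
        ctx_vec = context_frequency_vector(context, vocab_index)
        for word in set(context):
            reps[word] += ctx_vec
    return reps
```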
“…the product of each of their probabilities), the resulting PMI yields a negative association between the two words. Our work complements work by Johns et al. (2019) by further showing how the influence of negative information can best emerge through iterative feedback to drive generalization in a dynamical system. Despite using the same weight matrix, the Dynamic-Eigen-Net algorithm outperformed both the persistent Linear-Associative-Net and the Brain-State-in-a-Box.…”
Section: Discussion (mentioning)
Confidence: 60%
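
The point about PMI in the excerpt above can be made concrete with a small calculation: when a pair of words co-occurs less often than expected under independence (the product of their marginal probabilities), the PMI is negative. The counts below are made up for illustration.

```python
import math

def pmi(joint_count, count_x, count_y, total):
    """Pointwise mutual information: log of observed vs. expected co-occurrence."""
    p_xy = joint_count / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical counts: the pair co-occurs less often than chance predicts,
# so the association comes out negative.
print(pmi(joint_count=2, count_x=100, count_y=100, total=1000))  # ~ -2.32
```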
“…Again, we do not propose that using PMI is necessary, as other normalization measures may be used; however, it is possible that inhibitory connections are required. Recently, Johns et al. (2019) showed how negative information greatly improves semantic benchmarks for various representations of meaning. Our use of PMI for normalizing the weight matrix is likely benefiting from the resulting negative information.…”
Section: Discussion (mentioning)
Confidence: 99%
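
A sketch of the kind of normalization this excerpt refers to, assuming a simple word-by-word co-occurrence count matrix: transforming it with PMI and, unlike a PPMI transform, keeping the negative cells as inhibitory weights. This is an illustrative construction, not the cited papers' exact procedure.

```python
import numpy as np

def pmi_matrix(counts, keep_negative=True):
    """PMI-transform a word-by-word co-occurrence count matrix.

    Assumes every word in the matrix occurs at least once. With
    keep_negative=True, cells for pairs that co-occur below chance are
    retained as negative (inhibitory) weights; with False, the transform
    reduces to PPMI and the negative information is discarded.
    """
    total = counts.sum()
    p_xy = counts / total
    p_x = counts.sum(axis=1, keepdims=True) / total
    p_y = counts.sum(axis=0, keepdims=True) / total
    expected = p_x * p_y
    # Unobserved pairs are mapped to 0 rather than negative infinity.
    pmi = np.log2(np.where(counts > 0, p_xy / expected, 1.0))
    return pmi if keep_negative else np.maximum(pmi, 0.0)
```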
“…It was initially believed that predictive neural networks were able to more accurately discriminate between words because of back-propagation or the connectionist architectures they commonly use (which is one of the reasons this architecture has become so popular). However, recently the role of negative sampling in DSMs has been explored in more depth by Johns, Mewhort, and Jones (2019), who find that the success predictive neural networks have at discriminating between words is due to the inclusion of negative information in the training data, not the use of connectionist architecture or predictive error correction. In fact, Johns et al. demonstrated that when negative sampling information is included in the training data for other DSMs, including random vector accumulators, their ability to discriminate words is on par with predictive neural networks.…”
Section: In Neural Nets (mentioning)
Confidence: 99%
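
A hedged sketch of the general idea described above: a random vector accumulator that adds the index vectors of words that did occur in a context (positive information) and subtracts the index vectors of a few randomly sampled words that did not (negative information). The sampling scheme, learning rate, and dimensionality here are illustrative assumptions, not the specific model of Johns, Mewhort, and Jones (2019).

```python
import numpy as np

rng = np.random.default_rng(0)

def train_with_negative_samples(contexts, vocab, dim=300, k=2, lr=0.1):
    """Accumulate random index vectors for observed context words (positive
    information) and subtract index vectors of randomly sampled absent words
    (negative information), in the spirit of negative sampling."""
    index_vectors = {w: rng.standard_normal(dim) / np.sqrt(dim) for w in vocab}
    memory = {w: np.zeros(dim) for w in vocab}
    for context in contexts:
        present = set(context)
        absent = [w for w in vocab if w not in present]
        for target in present:
            # Positive step: move toward words that did occur with the target.
            for other in present - {target}:
                memory[target] += lr * index_vectors[other]
            # Negative step: move away from randomly sampled absent words.
            for neg in rng.choice(absent, size=min(k, len(absent)), replace=False):
                memory[target] -= lr * index_vectors[neg]
    return memory
```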