2009
DOI: 10.1016/j.jbi.2009.02.002

Empirical distributional semantics: Methods and biomedical applications

Abstract: Over the past fifteen years, a range of methods have been developed that are able to learn human-like estimates of the semantic relatedness between terms from the way in which these terms are distributed in a corpus of unannotated natural language text. These methods have also been evaluated in a number of applications in the cognitive science, computational linguistics and the information retrieval literatures. In this paper, we review the available methodologies for derivation of semantic relatedness from fr…

Cited by 157 publications (90 citation statements)
References 69 publications
“…A subset of these methods, distributional semantics, relies on the co-occurrence information between words obtained from large corpora of text and makes the assumption that words with similar or related meanings tend to occur in similar contexts. This approach is foundational to a number of higher-level tasks including information retrieval, word sense ambiguity resolution, automatic synonym generation and recognition, and literature-based knowledge discovery, among many others (see Cohen and Widdows, 2009 for a comprehensive review).…”
Section: Introduction
confidence: 99%
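The co-occurrence idea summarized in this statement can be made concrete with a minimal sketch (the corpus, function names, and window size below are illustrative assumptions, not from the paper): count how often each word appears within a small window of every other word, then compare words by the cosine of their count vectors.

```python
import numpy as np

def cooccurrence_vectors(sentences, window=2):
    """Build word vectors from windowed co-occurrence counts.

    sentences: list of tokenized sentences (lists of strings).
    Returns the vocabulary and a (V, V) matrix whose row for word w
    counts how often each other word appeared within `window` of w.
    """
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if i != j:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts

def cosine(u, v):
    """Cosine similarity; words that share contexts score near 1."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# Toy usage: "doctor" and "physician" occur in identical contexts,
# so their co-occurrence vectors are nearly parallel.
sents = [["the", "doctor", "treats", "patients"],
         ["the", "physician", "treats", "patients"]]
vocab, M = cooccurrence_vectors(sents)
i, j = vocab.index("doctor"), vocab.index("physician")
print(cosine(M[i], M[j]))
```

This is the distributional hypothesis the statement describes: relatedness is estimated purely from shared contexts, with no annotation of the text.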
“…The latent variables which are considered in PLSA correspond to topics. The probabilistic model relies on two conditional probabilities: the probability that a word is associated with a given topic and the probability that a document refers to a topic; refer to the work of Cohen and Widdows (2009) for details.…”
Section: The Probabilistic Approach
confidence: 99%
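As a rough illustration of the PLSA model this statement describes (a minimal sketch with assumed array shapes and names, not the authors' implementation; see Cohen and Widdows, 2009 for the actual formulation): the two conditional distributions, P(word | topic) and P(topic | document), can be fit by EM on a term-document count matrix.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (V, D) term-document count matrix.

    Returns p_w_z of shape (V, K) with P(word | topic) and
    p_z_d of shape (K, D) with P(topic | document).
    Dense (V, K, D) responsibilities: toy-scale data only.
    """
    rng = np.random.default_rng(seed)
    V, D = counts.shape
    p_w_z = rng.random((V, n_topics)); p_w_z /= p_w_z.sum(0)
    p_z_d = rng.random((n_topics, D)); p_z_d /= p_z_d.sum(0)
    for _ in range(n_iter):
        # E-step: P(z | d, w) proportional to P(w | z) P(z | d)
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: reweight responsibilities by observed counts
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=2)
        p_w_z /= p_w_z.sum(axis=0, keepdims=True)
        p_z_d = weighted.sum(axis=0)
        p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    return p_w_z, p_z_d
```

The E-step computes the posterior over topics for each word occurrence; the M-step re-estimates the two conditional probabilities from those posteriors, which is exactly the pair of distributions the statement refers to.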
“…The most representative models include Latent Semantic Analysis (LSA) [2] based on statistical analysis; Random Indexing [16], which employs random sparse vectors and random permutations; and BEAGLE [20], which computes vectors using circular convolution. For recent surveys of semantic space models, see [17,18].…”
Section: Vector Representations and Reduced Descriptions
confidence: 99%
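The circular convolution that BEAGLE uses to bind word order into its vectors can be sketched in a few lines (an illustrative FFT-based implementation under standard holographic-representation conventions, not the authors' code):

```python
import numpy as np

def circular_convolution(a, b):
    """Bind two vectors via circular convolution, as in BEAGLE.

    Computed in the Fourier domain: conv(a, b) = IFFT(FFT(a) * FFT(b)).
    """
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def circular_correlation(c, a):
    """Approximately unbind: recover b from c = conv(a, b)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(c)))

# Toy usage: elements drawn with variance 1/n, the usual convention
# for holographic reduced representations.
n = 1024
rng = np.random.default_rng(0)
a = rng.normal(0, 1 / np.sqrt(n), n)
b = rng.normal(0, 1 / np.sqrt(n), n)
c = circular_convolution(a, b)
b_hat = circular_correlation(c, a)
# b_hat is a noisy copy of b: cosine similarity well above chance
print(b @ b_hat / (np.linalg.norm(b) * np.linalg.norm(b_hat)))
```

Binding compresses a pair of vectors into a single vector of the same dimensionality, and correlation approximately inverts it, which is what makes circular convolution usable as the "reduced description" operation named in the section title.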