1975
DOI: 10.1145/361219.361220

A vector space model for automatic indexing

Abstract: In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based…
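The model the abstract describes represents each document as a vector over an indexing (term) space and compares entities by vector similarity. A minimal sketch, using cosine similarity and an illustrative three-term vocabulary and weights (not taken from the paper):

```python
import math

# Each document is a vector over the vocabulary
# ["retrieval", "indexing", "space"]; weights here are toy values.
def cosine(u, v):
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

doc_a = [2, 1, 0]   # mostly about "retrieval" and "indexing"
doc_b = [0, 1, 3]   # mostly about "space"
query = [1, 1, 0]   # a search request is embedded in the same space

# doc_a shares more query terms, so it ranks higher than doc_b.
print(cosine(doc_a, query))
print(cosine(doc_b, query))
```

Ranking documents by their angle to the query vector is what makes the geometry of the indexing space matter: the farther apart the document vectors lie, the easier it is to separate relevant from non-relevant entities.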



Cited by 5,699 publications (2,878 citation statements). References 3 publications.
“…Documents and sentences were represented as vectors in a vector space model [26], each dimension corresponding to a term and measured along a number of metrics: term occurrence (TO, the number of occurrences of the term in the document); binary term occurrence (BTO, set to 1 only if TO>0, set to 0 otherwise); term frequency (TF, given by the TO divided by the total number of terms in the document) and term frequency-inverse document frequency (TF-IDF, given by the TF divided by the frequency of the term in the whole corpus).…”
Section: Methods (mentioning)
confidence: 99%
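The four weighting schemes defined in the citation statement above can be sketched as follows. The corpus is toy data, and the TF-IDF variant follows the quoted definition (TF divided by the term's frequency in the whole corpus) rather than the more common logarithmic IDF:

```python
from collections import Counter

# Toy corpus: each document is a list of tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats are pets".split(),
]

def term_occurrence(term, doc):
    """TO: raw count of the term in the document."""
    return doc.count(term)

def binary_term_occurrence(term, doc):
    """BTO: 1 if the term appears in the document, else 0."""
    return 1 if term in doc else 0

def term_frequency(term, doc):
    """TF: TO divided by the total number of terms in the document."""
    return doc.count(term) / len(doc)

def tf_idf(term, doc, corpus):
    """TF-IDF as quoted: TF divided by the term's corpus frequency."""
    corpus_counts = Counter(t for d in corpus for t in d)
    total_terms = sum(corpus_counts.values())
    corpus_freq = corpus_counts[term] / total_terms
    return term_frequency(term, doc) / corpus_freq

doc = corpus[0]
print(term_occurrence("the", doc))         # 2
print(binary_term_occurrence("dog", doc))  # 0
print(term_frequency("the", doc))          # 2/6
```

Each metric maps a (term, document) pair to one coordinate of the document vector, so choosing a metric fixes how the documents are spread through the indexing space.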
“…After the feature extraction, we have represented each sample in the dataset using the Vector Space Model [36], which is a commonly used model in information retrieval.…”
Section: W2v (mentioning)
confidence: 99%
“…Considering all of these features, it is quite challenging to find a numerical counterpart for a word which preserves all of these properties and represents the same word in a numerical feature space. To this end, there are well-known models such as that of Salton et al. (1975) which try to transfer words and their syntactic and semantic information. Recently, NNs have become the established state-of-the-art for creating distributed representations of words (and also other textual units such as characters, etc.).…”
Section: Enriching Word Embeddings With Subword Information (mentioning)
confidence: 99%