2017
DOI: 10.1016/j.neucom.2017.05.046

Bag-of-concepts: Comprehending document representation through clustering words in distributed representation

Abstract: Two document representation methods are mainly used in solving text mining problems. Known for its intuitive and simple interpretability, the bag-of-words method represents a document vector by its word frequencies. However, this method suffers from the curse of dimensionality, and fails to preserve accurate proximity information when the number of unique words increases. Furthermore, this method assumes every word to be independent, disregarding the impact of semantically similar words on preserving document p…
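
As a rough illustration of the bag-of-words representation described in the abstract, the sketch below turns each document into a vector of word frequencies over a shared vocabulary. It is not taken from the paper: the function name `bag_of_words`, the whitespace tokenizer, and the toy documents are all assumptions made for this example.

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, vectors) for a list of raw text documents.

    Hypothetical helper for illustration only: tokenization is a plain
    lowercase whitespace split, which real pipelines would replace.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # One dimension per unique word, so the vector length grows with
    # the vocabulary (the dimensionality issue the abstract mentions).
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Toy documents (assumed for the example).
docs = ["the cat sat", "the dog sat on the mat"]
vocab, vecs = bag_of_words(docs)
print(vocab)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vecs)   # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```

Each word gets its own independent dimension here, so two near-synonyms would share no dimension at all; that is the independence assumption the abstract criticizes.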

Cited by 145 publications (76 citation statements)
References 19 publications
“…finding synonyms (Griffiths, Steyvers, & Tenenbaum, 2007), word clustering (Kovatchev, Salamo, & Marti, 2016), sentiment analysis (Duyu, Wei, Yang, Ming, Ting, & Bing, 2014), word sense disambiguation (Basile, Caputo, & Semeraro, 2014), metaphor recognition (Shutova, Sun, Gutierrez, Lichtenstein, & Narayanan, 2017), sentence paraphrasing, and documents classification (Kim, Kim, & Cho, 2017). Many problems are best solved with the help of DS models, e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…The bag of words is the data representation technique used in most of the consulted literature [13,6,14,15,1,16]. It consists on representing each text document as a vector of frequencies [13].…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent studies have evaluated text classification techniques mostly focused on classification algorithms [4] and distance measures [3]. There have also been works related to the comparison of preprocessing methods and document representations [15,14]. This work intends to achieve a similar evaluation but combining the representations with the distance metrics.…”
Section: Related Work (mentioning)
confidence: 99%
“…Bags of concepts are an extension of bags of words to successive concepts in a text [38]. A recent extension of these concepts is given in [39] where bag of graphs are introduced to encode in graphs the local structure of a digital object: bags of graphs are declined into bags of singleton graphs and bags of visual graphs.…”
Section: Multisets (mentioning)
confidence: 99%