2015
DOI: 10.1007/978-3-319-19390-8_49

Sentence Clustering Using Continuous Vector Space Representation

Abstract: In this paper, we present a clustering approach based on the combined use of a continuous vector space representation of sentences and the k-means algorithm. The principal motivation of this proposal is to split a big heterogeneous corpus into clusters of similar sentences. We use the word2vec toolkit to obtain a continuous vector representation of each word. We provide empirical evidence that our technique can lead to better clusters, in terms of intra-clus…
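
The approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional `EMBEDDINGS` table is a hypothetical stand-in for word2vec output, the sentence vector is assumed to be a plain unweighted average of word vectors (the paper's exact weighting is not given here), and `sentence_vector` and `kmeans` are illustrative helper names.

```python
import random

# Toy 2-d "embeddings" standing in for word2vec vectors (hypothetical values).
EMBEDDINGS = {
    "cat": [0.9, 0.1], "dog": [0.8, 0.2], "pet": [0.85, 0.15],
    "stock": [0.1, 0.9], "market": [0.2, 0.8], "price": [0.15, 0.85],
}

def sentence_vector(sentence, embeddings):
    """Average the vectors of known words; return None if no word is known."""
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

sentences = [
    "the cat and the dog",
    "a pet dog",
    "stock market price",
    "the market price",
]
vectors = [sentence_vector(s, EMBEDDINGS) for s in sentences]
labels = kmeans(vectors, k=2)
```

On this toy data the two animal sentences end up in one cluster and the two finance sentences in the other. With real embeddings, the averaging step is where a weighting scheme would plug in.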

Citations: cited by 2 publications (2 citation statements)
References: 10 publications
“…The mixture components of the words capture the notion of latent topics here. [15] uses K-means to cluster the sentences based on vector representations derived as a weighted sum of the word embeddings. It uses intra-cluster perplexity and F1 scores to measure the effectiveness of the clustering.…”
Section: Related Study (mentioning)
confidence: 99%
“…In addition to term expansion, utilization of contextual information of the short texts can be enhanced by machine translation (Tang et al., 2012). Direct clustering based on the continuous distributed representations of words, sentences, or paragraphs (Chinea-Rios et al., 2015; Mikolov et al., 2013) may also be worth of exploring. As a tradition in NLP research, further study will try all the promising combinations of the mentioned techniques to see which combinations perform best in which conditions.…”
Section: Demonstration (mentioning)
confidence: 99%