2015
DOI: 10.14257/ijdta.2015.8.4.07

Efficient Pairwise Document Similarity Computation in Big Datasets

Abstract: Document similarity is a common task to a variety of problems such as clustering

Cited by 12 publications (3 citation statements)
References 27 publications
“…Each row represents tokens or words extracted from each document, and every column represents a document. Each row in the matrix corresponds to a score, and every column corresponds to the amount of similarity between documents [31,32]. In the case of larger numbers of documents, such as the requirements documents for several years of a multinational company, the computations will be extremely tedious.…”
Section: Lexical-based Similarity (mentioning)
confidence: 99%
“…An extensive set of text used for research purposes is referred to as a corpus. The use of semantic similarity in query answer systems helps users to find what they are looking for regardless of how the characters are written [29][30][31]. Similarity is also measured using ontologies.…”
Section: Semantic-based Similarity (mentioning)
confidence: 99%
“…Niyigena et al [17], have presented a new method to compute the pairwise document similarity in a corpus in order to reduce the time execution and save space execution resources. Their algorithm provided an efficient solution for pairwise documents similarity in a corpus.…”
Section: Related Work (mentioning)
confidence: 99%