2019
DOI: 10.1108/el-08-2018-0165

Selecting a text similarity measure for a content-based recommender system

Abstract
Purpose: The purpose of this paper is to develop a journal recommender system that compares the content similarity between a manuscript and the existing journal articles in two subject corpora (covering the social sciences and medicine). The study examines the appropriateness of three text similarity measures and the impact of numerous aspects of corpus documents on system performance.
Design/methodology/approach: Three similarity measures were implemented, one at a time, on a journal recommender system with two …
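The abstract describes recommending journals by comparing a manuscript's content with the articles already published in each journal. The Python sketch below shows one way to turn article-level similarity scores into a journal ranking, under stated assumptions: the per-article scores come from whichever similarity measure is under test, and each journal is scored by the mean of its `top_k` best-matching articles. The helper name `rank_journals`, this aggregation rule, and the sample journal names and scores are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_journals(article_scores, top_k=3):
    """Rank journals by aggregating manuscript-to-article similarity scores.

    article_scores: iterable of (journal, score) pairs, one per corpus article.
    Hypothetical aggregation rule: a journal's score is the mean of its
    top_k highest article scores.
    """
    by_journal = defaultdict(list)
    for journal, score in article_scores:
        by_journal[journal].append(score)
    journal_scores = {}
    for journal, scores in by_journal.items():
        best = sorted(scores, reverse=True)[:top_k]
        journal_scores[journal] = sum(best) / len(best)
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(journal_scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical similarity scores for one manuscript against five articles.
scores = [
    ("journal_a", 0.42), ("journal_a", 0.10),
    ("journal_b", 0.35), ("journal_b", 0.30), ("journal_b", 0.05),
]
print(rank_journals(scores, top_k=2))
```

With `top_k=2`, `journal_b` ranks first. With `top_k=1`, the ranking flips to `journal_a`, which has the single best-matching article, so the aggregation rule alone can change the recommendation.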

Cited by 10 publications (9 citation statements)
References 49 publications

“…By using 100 fine-tuning epochs, we compare the effectiveness of the proposed approach against the one obtained with a differently implemented sentence embedding modules. Specifically, we replace S-BERT with USE (Cer et al, 2018) and employ a well-known bag-of-words approaches, such as the cosine similarity and BM25 (Wijewickrema et al, 2019). USE represents the state of the art for sentence embedding approaches based on transformers (Ahmed et al, 2019).…”
Section: Results (mentioning)
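The statement above cites Wijewickrema et al. (2019) for the bag-of-words baselines cosine similarity and BM25. For readers unfamiliar with BM25, here is a minimal Okapi BM25 scorer in Python. It assumes pre-tokenized text, the common defaults `k1 = 1.2` and `b = 0.75`, and the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)) used by Lucene. None of these settings come from the cited study, and `bm25_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every tokenized document against a tokenized query with Okapi BM25.

    Assumptions: k1 and b are common defaults, and the IDF is the non-negative
    variant log(1 + (N - n + 0.5) / (n + 0.5)); neither comes from the cited study.
    """
    n_docs = len(docs_tokens)
    avgdl = sum(len(doc) for doc in docs_tokens) / n_docs  # average document length
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:  # repeated query terms contribute repeatedly
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Hypothetical toy corpus and query, whitespace-tokenized.
docs = [
    "cosine similarity of document vectors".split(),
    "journal recommender for manuscripts".split(),
    "similarity measures for journal recommender systems".split(),
]
query = "journal recommender similarity".split()
print(bm25_scores(query, docs))
```

The third document matches all three query terms and scores highest. Because it is longer than average, the length normalization controlled by `b` slightly discounts each of its matches.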
“…, 2020; Wijewickrema et al., 2019). Cosine similarity is superior because even if a TMT and board are far apart because of size (boards in our sample are about 50% larger than TMTs), their representing vectors could still have a smaller angle between them, i.e.…”
Section: Methods (mentioning)
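This statement argues that cosine similarity depends only on the angle between two vectors and not on their magnitudes, so two profiles of very different size can still count as similar. The Python sketch below contrasts cosine similarity with Euclidean distance on three hypothetical vectors. `large` is `small` scaled by 3, while `other` has the same magnitude as `small` but points in a different direction. The vectors and helper names are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention when either vector is all zeros
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


small = [1.0, 2.0, 0.0]
large = [3.0 * x for x in small]  # same direction, three times the magnitude
other = [2.0, 1.0, 0.0]           # same magnitude as small, different direction

print(euclidean_distance(small, large), euclidean_distance(small, other))
print(cosine_similarity(small, large), cosine_similarity(small, other))
```

Euclidean distance ranks `other` as closer to `small`. Cosine similarity treats `large` as pointing in the same direction (cosine 1) and `other` as less similar (cosine about 0.8).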
“…Cosine similarity with tf-idf is a common and effective strategy to determine the similarity of documents in text mining [25]. While multiple similarity measures exist, vocabulary comparisons have been found to yield similar results to other metrics, particularly for technical vocabulary and in the social sciences [26]. Moreover, cosine similarity best approximates the consensus of domain experts when manually comparing cluster content [27].…”
Section: Methods (mentioning)
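This statement describes cosine similarity over tf-idf vectors as a standard way to compare documents. Below is a small dependency-free Python sketch. It assumes whitespace tokenization and the weighting tf × log(N / df), which is one of several tf-idf variants; libraries such as scikit-learn default to a smoothed IDF. The helper names and the three-document corpus (a manuscript followed by two articles) are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs_tokens):
    """Build one sparse tf-idf vector (dict: term -> weight) per tokenized document.

    Assumed weighting: raw term count * log(N / df), one of several tf-idf variants.
    """
    n_docs = len(docs_tokens)
    df = Counter(term for doc in docs_tokens for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs_tokens
    ]


def sparse_cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for documents with no weighted terms
    return dot / (norm_u * norm_v)


# Hypothetical corpus: a manuscript followed by two existing articles.
docs = [
    "text similarity for journal recommendation".split(),
    "journal recommendation using text similarity measures".split(),
    "clinical trial outcomes in cardiology".split(),
]
vectors = tfidf_vectors(docs)
manuscript = vectors[0]
for i, article in enumerate(vectors[1:], start=1):
    print(i, round(sparse_cosine(manuscript, article), 3))
```

Under this variant, a term that occurs in every document gets zero weight, so only discriminating vocabulary contributes to the score. The unrelated article shares no terms with the manuscript and scores 0.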