2022
DOI: 10.7717/peerj-cs.1024

Evaluating keyphrase extraction algorithms for finding similar news articles using lexical similarity calculation and semantic relatedness measurement by word embedding

Abstract: A textual data processing task that involves the automatic extraction of relevant and salient keyphrases from a document that expresses all the important concepts of the document is called keyphrase extraction. Due to technological advancements, the amount of textual information on the Internet is rapidly increasing as a lot of textual information is processed online in various domains such as offices, news portals, or for research purposes. Given the exponential increase of news articles on the Internet, manu…

Cited by 15 publications (9 citation statements)
References 37 publications
“…The most significant choice to make when applying our proposed methodology is the keyword extraction technique. We reviewed recent studies that addressed the problem of keyword extraction, focusing on those that compared the performance of state of the art techniques on the gold-standard keyword extraction datasets (Sarwar, Noor & Miah, 2022; Piskorski et al, 2021; Miah et al, 2021; Papagiannopoulou & Tsoumakas, 2020). We also checked the methods that were reported by the recent techniques as effective baselines.…”
Section: Methods (mentioning)
confidence: 99%
“…We used two lists of normalized keyphrases for each sample from the annotators. We use the Jaccard index to measure the agreement/similarity between annotations (Sarwar, Noor, and Miah 2022). Jaccard index is defined as:…”
Section: Validation Of Annotation (mentioning)
confidence: 99%
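
The quoted statement above is cut off before its definition, so the following is only a minimal sketch of the standard Jaccard index, |A ∩ B| / |A ∪ B|, applied to two annotators' keyphrase lists. The `normalize` helper, the sample phrases, and the convention of returning 1.0 for two empty lists are assumptions made for illustration; they are not taken from the citing paper.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization step)."""
    return " ".join(phrase.lower().split())


def jaccard_index(phrases_a, phrases_b) -> float:
    """Standard Jaccard index between two collections of keyphrases."""
    set_a = {normalize(p) for p in phrases_a}
    set_b = {normalize(p) for p in phrases_b}
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty annotations count as full agreement.
        return 1.0
    return len(set_a & set_b) / len(union)


# Hypothetical annotations for one sample.
annotator_1 = ["keyphrase extraction", "news articles", "word embedding"]
annotator_2 = ["Keyphrase  Extraction", "word embedding", "semantic relatedness"]

score = jaccard_index(annotator_1, annotator_2)
print(score)  # 2 shared phrases out of 4 distinct ones -> 0.5
```

A score of 1.0 means the two lists are identical after normalization, and 0.0 means they share no keyphrase.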
“…When the available knowledge is in the form of textual documents, this step is referred to as word embedding, whereas, when dealing with graph-shaped knowledge, as graph embedding. Examples about word embedding for semantic relatedness are proposed in [35], [52], and [65]. In particular, [35] aims at achieving a better accuracy on the semantic relatedness of both isolated words and words in contexts.…”
Section: Related Work (mentioning)
confidence: 99%
“…In particular, [35] aims at achieving a better accuracy on the semantic relatedness of both isolated words and words in contexts. In [52], word embedding is applied to represent keyphrases in a corpus of textual documents in order to find similar news articles. In [65], a semantic relatedness graph is constructed in order to detect sentiment polarities in a long sentence towards multiple aspect categories.…”
Section: Related Work (mentioning)
confidence: 99%
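
As a rough illustration of what "word embedding is applied to represent keyphrases" can mean in practice, the sketch below averages word vectors into a keyphrase vector and compares two keyphrases by cosine similarity. This is one common approach, not necessarily the method of the cited paper. The three-dimensional `TOY_VECTORS` table is invented for the example; a real system would load pretrained embeddings instead.

```python
import math

# Invented toy embeddings; real vectors would come from a pretrained model.
TOY_VECTORS = {
    "election": [0.9, 0.1, 0.0],
    "vote": [0.8, 0.2, 0.1],
    "results": [0.5, 0.5, 0.0],
    "football": [0.0, 0.1, 0.9],
    "match": [0.1, 0.2, 0.8],
}


def phrase_vector(phrase):
    """Average the vectors of the known words in a keyphrase."""
    words = [w for w in phrase.lower().split() if w in TOY_VECTORS]
    if not words:
        return None  # no known word, so no vector can be built
    dim = len(TOY_VECTORS[words[0]])
    return [sum(TOY_VECTORS[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def relatedness(phrase_1, phrase_2):
    """Semantic relatedness of two keyphrases; 0.0 if either has no vector."""
    u, v = phrase_vector(phrase_1), phrase_vector(phrase_2)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


close = relatedness("election results", "vote")
far = relatedness("election results", "football match")
print(close > far)  # True with these toy vectors
```

Unlike a purely lexical measure, this score can be high for keyphrases that share no word at all, which is the motivation for pairing lexical similarity with embedding-based relatedness.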