2021
DOI: 10.1007/978-3-030-79150-6_50

A Comparative Assessment of State-Of-The-Art Methods for Multilingual Unsupervised Keyphrase Extraction

Abstract: Keyphrase extraction is a fundamental task in information management, which is often used as a preliminary step in various information retrieval and natural language processing tasks. The main contribution of this paper lies in providing a comparative assessment of prominent multilingual unsupervised keyphrase extraction methods that build on statistical (RAKE, YAKE), graph-based (TextRank, SingleRank) and deep learning (KeyBERT) methods. For the experiments reported in this paper, we employ well-known dat…

Cited by 19 publications (8 citation statements)
References 16 publications
“…To understand the impact of this kind of network on group behavior and individual influence, SNA can be used to describe its structure (Wasserman and Faust 1994 ). Studies on web discourse primarily focus on computational aspects (Herring 2013 ), such as NLP (Yuksel and Tan 2018 ; Giarelis et al 2021 ), online content analysis, and speech analysis (Moser et al 2013 ; Kok and Rogers 2017 ; Pennington 2017 ; Goritz et al 2019 ), with a significant emphasis on text analysis and utilizing other knowledge areas. However, SNA and semantic analysis of posts have not received much attention and remain largely theoretical (Zhao et al 2021 ).…”
Section: Social Network Analysis and Decision-making Support For Airl... (mentioning)
confidence: 99%
“…This paper focuses mostly on showcasing the applications of the T5 generative language model [13] and comparing its performance to text classification (extremeText / fastText) [17] [10] and statistical terminology extraction (C / NC-values) [7] as baseline methods. Although the complementarity of statistical and transformer-based approaches to keyword extraction has been explored before [8], we are not aware of any published assessment of text-to-text generative models on this task.…”
Section: Keyword Extraction and Generation (mentioning)
confidence: 99%
“…Logistyka (logistics) is the only potentially irrelevant keyword in this subset which may have resulted from some over-fitting of the model on the original domain of scientific abstracts. (4) seal break (4); plomba na liczniku (8) / seal on meter (8); PESEL (2) / National Identification Number (2); weryfikacja tożsamości (1) / identity verification (1)…”
Section: Customer Support Dialogues (mentioning)
confidence: 99%
“…By acquiring ranked words, it accurately extracts just noun phrases from datasets, not keyphrases. In ranking phase, unimportant keywords are used, although this does not always screen out small scoring terms, providing longer keywords greater scores [22].…”
Section: Unsupervised Techniques (mentioning)
confidence: 99%
“…Again, because of the huge number of complex processes, statistical unsupervised techniques such as [15,21] are computationally expensive. Graph-based unsupervised approaches perform badly because of their inability to detect cohesion amongst numerous words that compose a keyphrase [22][23][24][25][26][27]. Finally, TeKET [14] is extremely versatile and acts similarly to TF-IDF for short data lengths.…”
Section: Introduction (mentioning)
confidence: 99%