2018 9th International Symposium on Telecommunications (IST) 2018
DOI: 10.1109/istel.2018.8661095

PerKey: A Persian News Corpus for Keyphrase Extraction and Generation

Abstract: Keyphrases provide an extremely dense summary of a text. Such information can be used in many Natural Language Processing tasks, such as information retrieval and text summarization. Since previous studies on Persian keyword or keyphrase extraction have not published their data, the field suffers from the lack of a human-extracted keyphrase dataset. In this paper, we introduce PerKey, a corpus of 553k news articles from six Persian news websites and agencies with relatively high-quality author-extracted key…

Citations: cited by 4 publications (3 citation statements)
References: 19 publications
“…PerKey (Doostmohammadi et al, 2018) is a key phrase extraction dataset for the Persian language crawled from six Persian news agencies. There are 553k articles available in this dataset.…”
Section: Downstream Datasets (mentioning)
confidence: 99%
“…We used KEA [36] as our supervised baseline method. For more information on the hyperparameters, settings and implementation of the baseline models see [29].…”
Section: Baseline Models (mentioning)
confidence: 99%
“…Here, we use a subset of the PerKey dataset introduced in [29] with at least 3 keyphrases for each news article. As concluded in the PerKey paper, news articles with at least 3 keyphrases are more reliable in terms of recall.…”
Section: A Training and Testing Datasets (mentioning)
confidence: 99%
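
The subset selection described in the last statement (keeping only news articles with at least 3 keyphrases) might be sketched as below. This is a minimal illustration, not the citing authors' code: the record layout (a dict with a `keyphrases` list), the `id` field, and the function name `filter_by_keyphrase_count` are all assumptions made for the example, and the actual PerKey file format may differ.

```python
from typing import Dict, Iterable, List


def filter_by_keyphrase_count(
    articles: Iterable[Dict], min_keyphrases: int = 3
) -> List[Dict]:
    """Keep only articles that carry at least `min_keyphrases` keyphrases.

    Assumes (hypothetically) that each article is a dict with a
    "keyphrases" list; articles without that key are treated as having none.
    """
    return [
        article
        for article in articles
        if len(article.get("keyphrases", [])) >= min_keyphrases
    ]


# Toy records standing in for news articles (made-up placeholder data).
sample = [
    {"id": 1, "keyphrases": ["a", "b", "c"]},
    {"id": 2, "keyphrases": ["a"]},
    {"id": 3, "keyphrases": ["a", "b", "c", "d"]},
]

subset = filter_by_keyphrase_count(sample)
print([article["id"] for article in subset])
```

The threshold is left as a parameter so that the same helper can be reused if a different minimum is wanted.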