Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval 2009
DOI: 10.1145/1571941.1572113
A ranking approach to keyphrase extraction

Cited by 110 publications (77 citation statements) · References 2 publications
“…We differentiate between cited and citing contexts for a paper: let d be a target paper and C be a citation network such that d ∈ C. A cited context for d is a context in which d is cited by some paper d_i in C. A citing context for d is a context in which d is citing some paper d_j in C. If a paper is cited in multiple contexts by another paper, the contexts are aggregated into a single one; citation tf-idf, i.e., the tf-idf score of each phrase computed from the citation contexts; (3) Novel features - Extend Other Existing Features include: first position of a candidate phrase, i.e., the distance of the first occurrence of a phrase from the beginning of a paper; this is similar to relative position except that it does not consider the length of a paper; tf-idfOver, i.e., a boolean feature, which is true if the tf-idf of a candidate phrase is greater than a threshold θ, and firstPosUnder, also a boolean feature, which is true if the distance of the first occurrence of a phrase from the beginning of a target paper is below some value β. This feature is similar to the feature is-in-title, used previously in the literature (Litvak and Last, 2008; Jiang et al., 2009). Both tf-idf and citation tf-idf features showed better results when each tf was divided by the maximum tf values from the target paper or citation contexts.…”
Section: Features (citation type: mentioning)
confidence: 94%
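
Below is a minimal Python sketch of how the features described in this excerpt might be computed. It assumes candidate phrases have already been extracted in document order; the parameter names theta and beta, the helper name phrase_features, and the idf smoothing are illustrative assumptions, not the cited authors' exact implementation.

```python
# Sketch of the candidate-phrase features from the excerpt above:
# tf-idf, citation tf-idf, first position, tfidfOver and firstPosUnder.
# Thresholds (theta, beta) and data structures are assumptions for illustration.
import math
from collections import Counter

def phrase_features(phrase, doc_phrases, citation_phrases, doc_freq, n_docs,
                    theta=0.1, beta=400):
    """doc_phrases / citation_phrases: candidate phrase occurrences, in order,
    from the target paper and from its citation contexts; doc_freq: document
    frequency of each phrase in the collection."""
    tf_doc = Counter(doc_phrases)
    tf_cit = Counter(citation_phrases)
    idf = math.log(n_docs / (1 + doc_freq.get(phrase, 0)))

    # Each tf is divided by the maximum tf of its source, as the excerpt
    # reports this normalization worked better.
    tfidf = (tf_doc[phrase] / max(tf_doc.values() or [1])) * idf
    citation_tfidf = (tf_cit[phrase] / max(tf_cit.values() or [1])) * idf

    # Distance of the first occurrence from the start of the paper;
    # unlike "relative position", it is not divided by the paper length.
    first_pos = doc_phrases.index(phrase) if phrase in doc_phrases else len(doc_phrases)

    return {
        "tfidf": tfidf,
        "citation_tfidf": citation_tfidf,
        "first_position": first_pos,
        "tfidfOver": tfidf > theta,         # boolean: tf-idf above threshold theta
        "firstPosUnder": first_pos < beta,  # boolean: first occurrence before beta
    }
```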
“…Keyphrases and non-keyphrases are used to generate positive and negative examples, respectively. Different learning algorithms have been used to train this classifier, including naïve Bayes, decision trees (Turney, 1999; Turney, 2000), bagging (Hulth, 2003), boosting (Hulth et al., 2001), maximum entropy (Yih et al., 2006; Kim and Kan, 2009), multi-layer perceptron (Lopez and Romary, 2010), and support vector machines (Jiang et al., 2009; Lopez and Romary, 2010).…”
Section: Task Reformulation (citation type: mentioning)
confidence: 99%
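
As a simple illustration of this classification reformulation, here is a hedged sketch: each candidate phrase becomes a feature vector labeled 1 if it is a gold keyphrase and 0 otherwise, and any off-the-shelf classifier can then be trained on these examples. The use of scikit-learn and the toy two-dimensional features are assumptions for illustration only.

```python
# Binary-classification view of keyphrase extraction: positive examples are
# gold keyphrases, negatives are the remaining candidates (toy data below).
from sklearn.naive_bayes import GaussianNB

# One feature vector per candidate phrase, e.g. [tf-idf, first position].
X = [[0.42, 12], [0.05, 230], [0.31, 3], [0.01, 540]]
y = [1, 0, 1, 0]  # 1 = keyphrase, 0 = non-keyphrase

clf = GaussianNB()  # could equally be a decision tree, SVM, MLP, etc.
clf.fit(X, y)
print(clf.predict([[0.37, 8]]))  # predicted label for an unseen candidate phrase
```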
“…Motivated by this observation, Jiang et al. (2009) propose a ranking approach to keyphrase extraction, where the goal is to learn a ranker to rank two candidate keyphrases. This pairwise ranking approach therefore introduces competition between candidate keyphrases, and has been shown to significantly outperform KEA, a popular supervised baseline that adopts the traditional supervised classification approach (Song et al., 2003; Kelleher and Luz, 2005).…”
Section: Task Reformulation (citation type: mentioning)
confidence: 99%
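
The pairwise reformulation can be sketched as follows: every (keyphrase, non-keyphrase) pair from the same document yields a preference constraint, which a standard RankSVM-style reduction turns into binary classification over feature differences. The feature values and the use of scikit-learn's LinearSVC are illustrative assumptions, not the exact ranking setup of Jiang et al. (2009).

```python
# Pairwise (learning-to-rank) sketch: train on feature *differences* between
# keyphrases and non-keyphrases, then score new candidates with the learned weights.
import numpy as np
from sklearn.svm import LinearSVC

def to_pairwise(pos, neg):
    """Build difference vectors with +1/-1 labels, one per ordered pair."""
    X_pairs, y_pairs = [], []
    for p in pos:
        for n in neg:
            X_pairs.append(p - n); y_pairs.append(+1)
            X_pairs.append(n - p); y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

pos = np.array([[0.42, 12.0], [0.31, 3.0]])     # gold keyphrases (toy features)
neg = np.array([[0.05, 230.0], [0.01, 540.0]])  # other candidates

ranker = LinearSVC().fit(*to_pairwise(pos, neg))

# The weight vector acts as a scoring function; higher score = better candidate.
candidates = np.array([[0.37, 8.0], [0.02, 410.0]])
scores = candidates @ ranker.coef_.ravel()
print(scores.argsort()[::-1])  # candidate indices, ranked best-first
```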
“…Treating automatic keyphrase extraction as a supervised machine learning task means that a classifier is trained using documents with known keyphrases. While the decision is binary, a ranking of phrases can be obtained using classifier confidence estimates, or alternatively, by applying a learning-to-rank approach (Jiang et al, 2009). …”
Section: Supervised Keyphrase Extraction (citation type: mentioning)
confidence: 99%
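
A short sketch of ranking by classifier confidence, as described in this excerpt: score every candidate phrase with the classifier's positive-class probability and keep the top-k. The choice of logistic regression and the toy data are assumptions for illustration.

```python
# Derive a ranking from a binary classifier's confidence estimates.
from sklearn.linear_model import LogisticRegression

X_train = [[0.42, 12], [0.05, 230], [0.31, 3], [0.01, 540]]
y_train = [1, 0, 1, 0]  # 1 = keyphrase, 0 = non-keyphrase
clf = LogisticRegression().fit(X_train, y_train)

candidates = ["learning to rank", "baseline", "keyphrase extraction"]
X_cand = [[0.37, 8], [0.02, 410], [0.29, 15]]

# Sort candidates by P(keyphrase | features) and keep the top two.
scored = sorted(zip(candidates, clf.predict_proba(X_cand)[:, 1]),
                key=lambda pair: pair[1], reverse=True)
print([phrase for phrase, _ in scored[:2]])
```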