Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval 2009
DOI: 10.1145/1571941.1572113
A ranking approach to keyphrase extraction

Cited by 110 publications (77 citation statements) · References 2 publications
“…We differentiate between cited and citing contexts for a paper: let d be a target paper and C be a citation network such that d ∈ C. A cited context for d is a context in which d is cited by some paper d_i in C. A citing context for d is a context in which d is citing some paper d_j in C. If a paper is cited in multiple contexts by another paper, the contexts are aggregated into a single one; citation tf-idf, i.e., the tf-idf score of each phrase computed from the citation contexts; (3) Novel features - Extend Other Existing Features include: first position of a candidate phrase, i.e., the distance of the first occurrence of a phrase from the beginning of a paper; this is similar to relative position except that it does not consider the length of a paper; tf-idfOver, i.e., a boolean feature, which is true if the tf-idf of a candidate phrase is greater than a threshold θ, and firstPosUnder, also a boolean feature, which is true if the distance of the first occurrence of a phrase from the beginning of a target paper is below some value β. This feature is similar to the feature is-in-title, used previously in the literature (Litvak and Last, 2008; Jiang et al., 2009). Both tf-idf and citation tf-idf features showed better results when each tf was divided by the maximum tf values from the target paper or citation contexts.…”
Section: Features (citation type: mentioning)
confidence: 94%
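
Below is a minimal Python sketch of how the features described in this excerpt might be computed. It assumes candidate phrases have already been extracted in document order; the parameter names theta and beta, the helper name phrase_features, and the idf smoothing are illustrative assumptions, not the cited authors' exact implementation.

```python
# Sketch of the candidate-phrase features from the excerpt above:
# tf-idf, citation tf-idf, first position, tfidfOver and firstPosUnder.
# Thresholds (theta, beta) and data structures are assumptions for illustration.
import math
from collections import Counter

def phrase_features(phrase, doc_phrases, citation_phrases, doc_freq, n_docs,
                    theta=0.1, beta=400):
    """doc_phrases / citation_phrases: candidate phrase occurrences, in order,
    from the target paper and from its citation contexts; doc_freq: document
    frequency of each phrase in the collection."""
    tf_doc = Counter(doc_phrases)
    tf_cit = Counter(citation_phrases)
    idf = math.log(n_docs / (1 + doc_freq.get(phrase, 0)))

    # Each tf is divided by the maximum tf of its source, as the excerpt
    # reports this normalization worked better.
    tfidf = (tf_doc[phrase] / max(tf_doc.values() or [1])) * idf
    citation_tfidf = (tf_cit[phrase] / max(tf_cit.values() or [1])) * idf

    # Distance of the first occurrence from the start of the paper;
    # unlike "relative position", it is not divided by the paper length.
    first_pos = doc_phrases.index(phrase) if phrase in doc_phrases else len(doc_phrases)

    return {
        "tfidf": tfidf,
        "citation_tfidf": citation_tfidf,
        "first_position": first_pos,
        "tfidfOver": tfidf > theta,         # boolean: tf-idf above threshold theta
        "firstPosUnder": first_pos < beta,  # boolean: first occurrence before beta
    }
```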
“…Keyphrases and non-keyphrases are used to generate positive and negative examples, respectively. Different learning algorithms have been used to train this classifier, including naïve Bayes, decision trees (Turney, 1999; Turney, 2000), bagging (Hulth, 2003), boosting (Hulth et al., 2001), maximum entropy (Yih et al., 2006; Kim and Kan, 2009), multi-layer perceptron (Lopez and Romary, 2010), and support vector machines (Jiang et al., 2009; Lopez and Romary, 2010).…”
Section: Task Reformulation (citation type: mentioning)
confidence: 99%
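
As a simple illustration of this classification reformulation, here is a hedged sketch: each candidate phrase becomes a feature vector labeled 1 if it is a gold keyphrase and 0 otherwise, and any off-the-shelf classifier can then be trained on these examples. The use of scikit-learn and the toy two-dimensional features are assumptions for illustration only.

```python
# Binary-classification view of keyphrase extraction: positive examples are
# gold keyphrases, negatives are the remaining candidates (toy data below).
from sklearn.naive_bayes import GaussianNB

# One feature vector per candidate phrase, e.g. [tf-idf, first position].
X = [[0.42, 12], [0.05, 230], [0.31, 3], [0.01, 540]]
y = [1, 0, 1, 0]  # 1 = keyphrase, 0 = non-keyphrase

clf = GaussianNB()  # could equally be a decision tree, SVM, MLP, etc.
clf.fit(X, y)
print(clf.predict([[0.37, 8]]))  # predicted label for an unseen candidate phrase
```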
“…Motivated by this observation, Jiang et al. (2009) propose a ranking approach to keyphrase extraction, where the goal is to learn a ranker to rank two candidate keyphrases. This pairwise ranking approach therefore introduces competition between candidate keyphrases, and has been shown to significantly outperform KEA, a popular supervised baseline that adopts the traditional supervised classification approach (Song et al., 2003; Kelleher and Luz, 2005).…”
Section: Task Reformulation (citation type: mentioning)
confidence: 99%
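
The pairwise reformulation can be sketched as follows: every (keyphrase, non-keyphrase) pair from the same document yields a preference constraint, which a standard RankSVM-style reduction turns into binary classification over feature differences. The feature values and the use of scikit-learn's LinearSVC are illustrative assumptions, not the exact ranking setup of Jiang et al. (2009).

```python
# Pairwise (learning-to-rank) sketch: train on feature *differences* between
# keyphrases and non-keyphrases, then score new candidates with the learned weights.
import numpy as np
from sklearn.svm import LinearSVC

def to_pairwise(pos, neg):
    """Build difference vectors with +1/-1 labels, one per ordered pair."""
    X_pairs, y_pairs = [], []
    for p in pos:
        for n in neg:
            X_pairs.append(p - n); y_pairs.append(+1)
            X_pairs.append(n - p); y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

pos = np.array([[0.42, 12.0], [0.31, 3.0]])     # gold keyphrases (toy features)
neg = np.array([[0.05, 230.0], [0.01, 540.0]])  # other candidates

ranker = LinearSVC().fit(*to_pairwise(pos, neg))

# The weight vector acts as a scoring function; higher score = better candidate.
candidates = np.array([[0.37, 8.0], [0.02, 410.0]])
scores = candidates @ ranker.coef_.ravel()
print(scores.argsort()[::-1])  # candidate indices, ranked best-first
```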
“…Treating automatic keyphrase extraction as a supervised machine learning task means that a classifier is trained using documents with known keyphrases. While the decision is binary, a ranking of phrases can be obtained using classifier confidence estimates, or alternatively, by applying a learning-to-rank approach (Jiang et al, 2009). …”
Section: Supervised Keyphrase Extraction (citation type: mentioning)
confidence: 99%
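
A short sketch of ranking by classifier confidence, as described in this excerpt: score every candidate phrase with the classifier's positive-class probability and keep the top-k. The choice of logistic regression and the toy data are assumptions for illustration.

```python
# Derive a ranking from a binary classifier's confidence estimates.
from sklearn.linear_model import LogisticRegression

X_train = [[0.42, 12], [0.05, 230], [0.31, 3], [0.01, 540]]
y_train = [1, 0, 1, 0]  # 1 = keyphrase, 0 = non-keyphrase
clf = LogisticRegression().fit(X_train, y_train)

candidates = ["learning to rank", "baseline", "keyphrase extraction"]
X_cand = [[0.37, 8], [0.02, 410], [0.29, 15]]

# Sort candidates by P(keyphrase | features) and keep the top two.
scored = sorted(zip(candidates, clf.predict_proba(X_cand)[:, 1]),
                key=lambda pair: pair[1], reverse=True)
print([phrase for phrase, _ in scored[:2]])
```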