Keyphrase Extraction as Sequence Labeling Using Contextualized Embeddings

Sahrawat, Dhruva; Mahata, Debanjan; Zhang, Haimin; Kulkarni, Mayank; Sharma, Anuradha; Gosangi, Rakesh; Stent, Amanda; Kumar, Yaman; Shah, Rajiv Ratn; Zimmermann, Roger

doi:10.1007/978-3-030-45442-5_41

Cited by 57 publications

(57 citation statements)

References 23 publications

“…Problem Formulation Similar to recent works (Sahrawat et al, 2020), we formulate keyphrase extraction as a sequence labeling task. Let D = (t_1, t_2, ..., t_n) be a document consisting of n tokens, where t_i represents the i-th token of the document.…”

Section: Preliminaries (mentioning)

confidence: 99%
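The sequence labeling formulation quoted above can be illustrated with a minimal sketch. The quoted statement is truncated before it names a label set, so the B/I/O scheme used here (B = first token of a keyphrase, I = inside a keyphrase, O = outside) is an assumption, and the helper names `bio_labels` and `decode_keyphrases` are hypothetical, not taken from any of the cited papers.

```python
from typing import List, Sequence


def bio_labels(tokens: Sequence[str],
               keyphrases: Sequence[Sequence[str]]) -> List[str]:
    """Assign one label per token of D = (t_1, ..., t_n).

    Assumption: a B/I/O label set; keyphrases are given as token sequences
    and matched case-insensitively against the document.
    """
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    # Match longer phrases first so a shorter phrase cannot overwrite
    # part of a longer one that was already labeled.
    for phrase in sorted(keyphrases, key=len, reverse=True):
        target = [p.lower() for p in phrase]
        k = len(target)
        if k == 0:
            continue
        for start in range(len(tokens) - k + 1):
            span_free = all(lab == "O" for lab in labels[start:start + k])
            if span_free and lowered[start:start + k] == target:
                labels[start] = "B"
                for j in range(start + 1, start + k):
                    labels[j] = "I"
    return labels


def decode_keyphrases(tokens: Sequence[str],
                      labels: Sequence[str]) -> List[str]:
    """Recover keyphrase strings from a B/I/O label sequence."""
    phrases: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" with no open phrase: close any open phrase.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


tokens = "We study keyphrase extraction with contextualized embeddings .".split()
gold = [["keyphrase", "extraction"], ["contextualized", "embeddings"]]
labels = bio_labels(tokens, gold)
print(list(zip(tokens, labels)))
print(decode_keyphrases(tokens, labels))
```

In this framing a tagger (for example the BiLSTM-CRF baseline mentioned in the statements on this page) would predict the label sequence, and the decoding step turns predicted labels back into phrases.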

“…Baseline Models In this work, we employ the BiLSTM-CRF architecture as the baseline architecture (Huang et al, 2015;Alzaidy et al, 2019;Sahrawat et al, 2020;Zhu et al, 2020). Figure 1 shows a high-level overview of our baseline model.…”

Section: Preliminaries (mentioning)

confidence: 99%

“…Over the past years, researchers have proposed many methods for the task, which can be divided into two major categories: supervised (Sterckx et al, 2016;Zhang et al, 2017;Alzaidy et al, 2019) and unsupervised techniques (Florescu and Caragea, 2017b;Boudin, 2018;Mahata et al, 2018). In the presence of sufficient domain-specific labeled data, supervised keyphrase extraction methods are often reported to outperform unsupervised methods (Kim et al, 2013;Caragea et al, 2014;Sahrawat et al, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents

Lai, Bui, Kim et al. 2020

Proceedings of the 28th International Conference on Computational Linguistics

Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most other articles lack such kind of annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task, achieving new state-of-the-art results on two public benchmarks: Inspec and SemEval-2017.

“…Keyphrase generation is the process of predicting both extractive and abstractive keyphrases from a given document. Most of the previous works in keyphrase domain, including both supervised and unsupervised techniques, primarily focus on extractive keyphrases (Hasan and Ng, 2014;Mahata et al, 2018;Sahrawat et al, 2020). Recent studies Meng et al (2017); Ye and Wang (2018); Chan et al (2019) have started to develop generative approaches that produce both abstractive and extractive keyphrases from documents.…”

Section: Introduction (mentioning)

confidence: 99%

A Preliminary Exploration of GANs for Keyphrase Generation

Swaminathan, Zhang, Mahata et al. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Self Cite

We introduce a new keyphrase generation approach using Generative Adversarial Networks (GANs). For a given document, the generator produces a sequence of keyphrases, and the discriminator distinguishes between human-curated and machine-generated keyphrases. We evaluated this approach on standard benchmark datasets. We observed that our model achieves state-of-the-art performance in the generation of abstractive keyphrases and is comparable to the best performing extractive techniques. Although we achieve promising results using GANs, they are not significantly better than the state-of-the-art generative models. To our knowledge, this is one of the first works that use GANs for keyphrase generation. We present a detailed analysis of our observations and expect that these findings would help other researchers to further study the use of GANs for the task of keyphrase generation.

“…The first corpora for automated keyphrase extraction were likewise assembled out of publications from scientific fields including technical reports (Witten et al, 1999), paper abstracts (Hulth, 2003), and scientific papers (Nguyen and Kan, 2007;Medelyan et al, 2009;Kim et al, 2010). To this day, scientific publications still serve as a fundamental fixed-domain benchmark for neural KPE methods (Meng et al, 2017;Alzaidy et al, 2019;Sahrawat et al, 2019) due to the availability of ample data of this kind. However, experiments have revealed that KPE methods trained directly on such corpora do not generalize well to other web-related genres or other types of documents (Chen et al, 2018;Xiong et al, 2019), where there may be far more heterogeneity in topics, content and structure, and there may be more variation in terms of where a key phrase may appear.…”

Section: Introduction (mentioning)

confidence: 99%

Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction

Wang, Fan, Rosé 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Open-domain Keyphrase extraction (KPE) on the Web is a fundamental yet complex NLP task with a wide range of practical applications within the field of Information Retrieval. In contrast to other document types, web page designs are intended for easy navigation and information finding. Effective designs encode within the layout and formatting signals that point to where the important information can be found. In this work, we propose a modeling approach that leverages these multi-modal signals to aid in the KPE task. In particular, we leverage both lexical and visual features (e.g., size, font, position) at the micro-level to enable effective strategy induction, and meta-level features that describe pages at a macro-level to aid in strategy selection. Our evaluation demonstrates that a combination of effective strategy induction and strategy selection within this approach for the KPE task outperforms state-of-the-art models. A qualitative post-hoc analysis illustrates how these features function within the model.

… to choose results from different tactics using macro-level features. In our evaluation, we compare SMART-KPE with several state-of-the-art baselines, where SMART-KPE shows its better ability to locate and extract keyphrases. We offer post-hoc case studies and ablation studies to illustrate model strengths and weaknesses. In addition to the improvement over SOTA baselines for the KPE task, to the best of our knowledge, Strategy-based Multimodal Architecture for Keyphrase Extraction is the most comprehensive treatment of multimodality in open-domain KPE.
