In the age of social media, faced with vast amounts of knowledge and information, accurate and efficient keyphrase extraction methods need to be applied in information retrieval and natural language processing. Traditional keyphrase extraction models struggle to incorporate large amounts of external knowledge, but the rise of pre-trained language models offers a new way to address this problem. Against this background, we propose SIFRank, a new baseline for unsupervised keyphrase extraction based on a pre-trained language model. SIFRank combines the sentence embedding model SIF with the autoregressive pre-trained language model ELMo, and it achieves the best performance in keyphrase extraction on short documents. We speed up SIFRank while maintaining its accuracy through document segmentation and contextual word embedding alignment. For long documents, we upgrade SIFRank to SIFRank+ with a position-biased weight, greatly improving its performance on long documents. Compared to other baseline models, our model achieves state-of-the-art results on three widely used datasets.

INDEX TERMS Keyphrase extraction, pre-trained language model, sentence embeddings, position-biased weight, SIFRank.

I. INTRODUCTION
Keyphrase extraction is the task of selecting a set of words or phrases from a document that summarize the main topics discussed in it [1]. Keyphrase extraction can greatly accelerate information retrieval and help people obtain first-hand information from a long text quickly and accurately.

A. MOTIVATION
Keyphrase extraction approaches fall into two main kinds: supervised and unsupervised. Supervised methods perform better on domain-specific tasks, but annotating the corpus takes considerable labor, and the trained model may overfit and fail to work well on other datasets.
The main traditional unsupervised methods are mainly divided into the models based on statistics and the models based on

The associate editor coordinating the review of this manuscript and approving it for publication was Shuai Han.
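The SIFRank pipeline described above can be sketched in simplified form. This is an illustrative sketch, not the paper's implementation: real SIFRank uses ELMo contextual embeddings and removes the first principal component from the SIF average, both omitted here, and the `1/(pos + 1)` position bias is an assumed stand-in for the paper's position-biased weight. The function names and the toy candidate ordering (candidates assumed listed in order of first appearance) are mine.

```python
import numpy as np

def sif_weights(word_freqs, a=1e-3):
    """SIF weight a / (a + p(w)) per word, where p(w) is corpus frequency."""
    total = sum(word_freqs.values())
    return {w: a / (a + f / total) for w, f in word_freqs.items()}

def sif_embedding(tokens, vectors, weights):
    """SIF-weighted average of word vectors (principal-component removal omitted)."""
    vecs = np.array([weights[t] * vectors[t] for t in tokens if t in vectors])
    return vecs.mean(axis=0)

def rank_candidates(candidates, doc_emb, vectors, weights, position_bias=True):
    """Score candidate phrases by cosine similarity to the document embedding,
    optionally down-weighting phrases that first appear later in the text
    (a crude proxy for SIFRank+'s position-biased weight)."""
    scores = {}
    for pos, phrase in enumerate(candidates):
        emb = sif_embedding(phrase.split(), vectors, weights)
        cos = float(emb @ doc_emb / (np.linalg.norm(emb) * np.linalg.norm(doc_emb)))
        bias = 1.0 / (pos + 1) if position_bias else 1.0  # earlier phrases favored
        scores[phrase] = bias * cos
    return sorted(scores, key=scores.get, reverse=True)
```

With toy one-hot-like vectors, a phrase whose words match the document's dominant direction ranks above an off-topic phrase.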
Using prompts to make language models perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately achieved significant success compared to the pre-train and fine-tune paradigm. Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPT's left-to-right language model or BERT's masked language model to perform cloze-style tasks. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using an original BERT pre-training task abandoned by RoBERTa and other models: Next Sentence Prediction (NSP). Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking with ease. Based on the characteristics of NSP-BERT, we offer several quick building templates for various downstream tasks. In particular, we suggest a two-stage prompt method for word sense disambiguation tasks. Our strategies for mapping the labels significantly enhance the model's performance on sentence pair tasks. On the FewCLUE benchmark, our NSP-BERT outperforms other zero-shot methods on most of these tasks and comes close to the few-shot methods.
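The sentence-level zero-shot idea can be sketched as follows: pair the input text with one prompt sentence per label and pick the label whose prompt the NSP head judges most coherent as a continuation. In this sketch, `nsp_score` is a hypothetical stand-in for a real NSP head (e.g., a pre-trained BERT with its next-sentence-prediction output); the prompt wording and label mapping are illustrative assumptions, not the paper's templates.

```python
def nsp_zero_shot_classify(text, label_prompts, nsp_score):
    """Zero-shot classification via Next Sentence Prediction.

    text          -- the input sentence to classify
    label_prompts -- dict mapping each label to a prompt sentence
    nsp_score     -- callable (sentence_a, sentence_b) -> coherence score,
                     normally backed by BERT's NSP head

    Returns the label whose prompt scores highest as a continuation of text.
    """
    scores = {label: nsp_score(text, prompt)
              for label, prompt in label_prompts.items()}
    return max(scores, key=scores.get)
```

Because the prompt is a whole sentence rather than a fixed-length cloze slot, the same routine works unchanged for labels of any length, which is the flexibility the abstract attributes to the sentence-level approach.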
Computing the intersection area of polygons is an important mathematical problem, and the processing of arbitrary polygons has long been an important and difficult research topic. In this paper, we propose a GPU-based rasterized polygon intersection area algorithm, GPURAS, and an accelerated version, GPURASMC, which uses the Monte Carlo method, and we prove the correctness of these algorithms. Experiments and comparisons were performed using simple, arbitrarily complex, and large-scale polygons. The results show that our algorithms are hundreds of times more efficient than CPU-based algorithms.
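The Monte Carlo idea behind the accelerated variant can be sketched on the CPU: sample random points in a bounding box and estimate the intersection area from the fraction of points falling inside both polygons. This is an illustrative sketch, not the paper's GPU-rasterized implementation; the function names and the choice of sampling over polygon A's bounding box are assumptions.

```python
import random

def point_in_polygon(x, y, poly):
    """Standard ray-casting test: count edge crossings of a ray from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def mc_intersection_area(poly_a, poly_b, samples=100_000, seed=0):
    """Estimate |A ∩ B| by uniform sampling over A's bounding box:
    area ≈ box_area * (hits inside both polygons) / samples."""
    rng = random.Random(seed)
    xs = [p[0] for p in poly_a]
    ys = [p[1] for p in poly_a]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    box_area = (xmax - xmin) * (ymax - ymin)
    hits = 0
    for _ in range(samples):
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if point_in_polygon(x, y, poly_a) and point_in_polygon(x, y, poly_b):
            hits += 1
    return box_area * hits / samples
```

The estimate converges at the usual Monte Carlo rate of O(1/√samples), which is why a massively parallel GPU version pays off: each sample is independent, so the loop maps directly onto GPU threads.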