In the age of social media, faced with vast amounts of knowledge and information, accurate and efficient keyphrase extraction methods need to be applied in information retrieval and natural language processing. Traditional keyphrase extraction models struggle to incorporate large amounts of external knowledge, but the rise of pre-trained language models offers a new way to address this problem. Against this background, we propose SIFRank, a new baseline for unsupervised keyphrase extraction based on a pre-trained language model. SIFRank combines the sentence embedding model SIF with the autoregressive pre-trained language model ELMo, and it achieves the best performance in keyphrase extraction on short documents. We speed up SIFRank while maintaining its accuracy through document segmentation and contextual word embedding alignment. For long documents, we upgrade SIFRank to SIFRank+ with a position-biased weight, greatly improving its performance on long documents. Compared to other baseline models, our model achieves state-of-the-art results on three widely used datasets.

INDEX TERMS Keyphrase extraction, pre-trained language model, sentence embeddings, position-biased weight, SIFRank.

I. INTRODUCTION
Keyphrase extraction is the task of selecting a set of words or phrases from a document that summarize the main topics discussed in it [1]. Keyphrase extraction can greatly accelerate information retrieval and help people obtain first-hand information from a long text quickly and accurately.

A. MOTIVATION
Keyphrase extraction approaches fall into two main kinds: supervised and unsupervised. Supervised methods perform better on domain-specific tasks, but annotating the corpus takes considerable labor, and the trained model may overfit and fail to work well on other datasets.
The main traditional unsupervised methods are mainly divided into the models based on statistics and the models based on

The associate editor coordinating the review of this manuscript and approving it for publication was Shuai Han.
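The SIFRank pipeline described above can be sketched in simplified form. This is an illustrative sketch, not the paper's implementation: real SIFRank uses ELMo contextual embeddings and removes the first principal component from the SIF average, both omitted here, and the `1/(pos + 1)` position bias is an assumed stand-in for the paper's position-biased weight. The function names and the toy candidate ordering (candidates assumed listed in order of first appearance) are mine.

```python
import numpy as np

def sif_weights(word_freqs, a=1e-3):
    """SIF weight a / (a + p(w)) per word, where p(w) is corpus frequency."""
    total = sum(word_freqs.values())
    return {w: a / (a + f / total) for w, f in word_freqs.items()}

def sif_embedding(tokens, vectors, weights):
    """SIF-weighted average of word vectors (principal-component removal omitted)."""
    vecs = np.array([weights[t] * vectors[t] for t in tokens if t in vectors])
    return vecs.mean(axis=0)

def rank_candidates(candidates, doc_emb, vectors, weights, position_bias=True):
    """Score candidate phrases by cosine similarity to the document embedding,
    optionally down-weighting phrases that first appear later in the text
    (a crude proxy for SIFRank+'s position-biased weight)."""
    scores = {}
    for pos, phrase in enumerate(candidates):
        emb = sif_embedding(phrase.split(), vectors, weights)
        cos = float(emb @ doc_emb / (np.linalg.norm(emb) * np.linalg.norm(doc_emb)))
        bias = 1.0 / (pos + 1) if position_bias else 1.0  # earlier phrases favored
        scores[phrase] = bias * cos
    return sorted(scores, key=scores.get, reverse=True)
```

With toy one-hot-like vectors, a phrase whose words match the document's dominant direction ranks above an off-topic phrase.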
Using prompts to make language models perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately achieved significant success compared to the pre-train and fine-tune paradigm. Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPT's left-to-right language model or BERT's masked language model to perform cloze-style tasks. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using an original BERT pre-training task abandoned by RoBERTa and other models: Next Sentence Prediction (NSP). Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking with ease. Based on the characteristics of NSP-BERT, we offer several quick building templates for various downstream tasks. In particular, we suggest a two-stage prompt method for word sense disambiguation tasks. Our strategies for mapping the labels significantly enhance the model's performance on sentence pair tasks. On the FewCLUE benchmark, our NSP-BERT outperforms other zero-shot methods on most of these tasks and comes close to the few-shot methods.
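The sentence-level zero-shot idea can be sketched as follows: pair the input text with one prompt sentence per label and pick the label whose prompt the NSP head judges most coherent as a continuation. In this sketch, `nsp_score` is a hypothetical stand-in for a real NSP head (e.g., a pre-trained BERT with its next-sentence-prediction output); the prompt wording and label mapping are illustrative assumptions, not the paper's templates.

```python
def nsp_zero_shot_classify(text, label_prompts, nsp_score):
    """Zero-shot classification via Next Sentence Prediction.

    text          -- the input sentence to classify
    label_prompts -- dict mapping each label to a prompt sentence
    nsp_score     -- callable (sentence_a, sentence_b) -> coherence score,
                     normally backed by BERT's NSP head

    Returns the label whose prompt scores highest as a continuation of text.
    """
    scores = {label: nsp_score(text, prompt)
              for label, prompt in label_prompts.items()}
    return max(scores, key=scores.get)
```

Because the prompt is a whole sentence rather than a fixed-length cloze slot, the same routine works unchanged for labels of any length, which is the flexibility the abstract attributes to the sentence-level approach.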
Computing the intersection area of polygons is an important mathematical problem, and the processing of arbitrary polygons has long been an important and difficult research topic. In this paper, we propose a GPU-based rasterized polygon intersection area algorithm, GPURAS, and an accelerated version, GPURASMC, which uses the Monte Carlo method, and we prove the correctness of these algorithms. Experiments and comparisons were performed using simple, arbitrarily complex, and large-scale polygons. The results show that our algorithms are hundreds of times more efficient than CPU-based algorithms.
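The Monte Carlo idea behind the accelerated variant can be sketched on the CPU: sample random points in a bounding box and estimate the intersection area from the fraction of points falling inside both polygons. This is an illustrative sketch, not the paper's GPU-rasterized implementation; the function names and the choice of sampling over polygon A's bounding box are assumptions.

```python
import random

def point_in_polygon(x, y, poly):
    """Standard ray-casting test: count edge crossings of a ray from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def mc_intersection_area(poly_a, poly_b, samples=100_000, seed=0):
    """Estimate |A ∩ B| by uniform sampling over A's bounding box:
    area ≈ box_area * (hits inside both polygons) / samples."""
    rng = random.Random(seed)
    xs = [p[0] for p in poly_a]
    ys = [p[1] for p in poly_a]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    box_area = (xmax - xmin) * (ymax - ymin)
    hits = 0
    for _ in range(samples):
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if point_in_polygon(x, y, poly_a) and point_in_polygon(x, y, poly_b):
            hits += 1
    return box_area * hits / samples
```

The estimate converges at the usual Monte Carlo rate of O(1/√samples), which is why a massively parallel GPU version pays off: each sample is independent, so the loop maps directly onto GPU threads.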