NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task--Next Sentence Prediction

Sun, Yi; Zheng, Yu; Hao, Chao; Qiu, Hangping

doi:10.48550/arxiv.2109.03564

Cited by 21 publications

(24 citation statements)

References 35 publications

“…Wang et al (2021) transform NLP tasks into textual entailment and provide label-specific descriptions for each class. Sun et al (2021) propose an approach named NSP-BERT which utilizes a BERT original next sentence prediction pre-training task to perform few-shot learning. Additionally, Puri and Catanzaro (2019) show that reformulating NLP tasks as question answering problems to query generative language models is also a feasible approach.…”

Section: Prompt-based Few-shot Learning (mentioning)

confidence: 99%

Pre-trained Token-replaced Detection Model as Few-shot Learner

Li, Li, Zhou. 2022. Preprint.

Pre-trained masked language models have demonstrated remarkable ability as few-shot learners. In this paper, as an alternative, we propose a novel approach to few-shot learning with pre-trained token-replaced detection models like ELECTRA. In this approach, we reformulate a classification or a regression task as a token-replaced detection problem. Specifically, we first define a template and label description words for each task and put them into the input to form a natural language prompt. Then, we employ the pre-trained token-replaced detection model to predict which label description word is the most original (i.e., least replaced) among all label description words in the prompt. A systematic evaluation on 16 datasets demonstrates that our approach outperforms few-shot learners with pre-trained masked language models in both one-sentence and two-sentence learning tasks.
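The token-replaced detection prompt described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the template ("It was ..."), the whitespace tokenization, the label description words, and the `toy_replaced_prob` stand-in for a real discriminator such as ELECTRA are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def build_prompt(text: str, label_words: Dict[str, str]) -> List[str]:
    # Hypothetical template: "<text> It was <word_1> <word_2> ... ."
    # Every label description word is placed in the same prompt.
    return text.split() + ["It", "was"] + list(label_words.values()) + ["."]


def classify(
    text: str,
    label_words: Dict[str, str],
    replaced_prob: Callable[[List[str]], List[float]],
) -> str:
    """Return the label whose description word looks least replaced."""
    tokens = build_prompt(text, label_words)
    probs = replaced_prob(tokens)  # one "replaced" probability per token
    # Label words sit just before the final "." token.
    first_label_pos = len(tokens) - 1 - len(label_words)
    scores = {
        label: probs[first_label_pos + offset]
        for offset, label in enumerate(label_words)
    }
    return min(scores, key=scores.get)


def toy_replaced_prob(tokens: List[str]) -> List[float]:
    # Stand-in for a discriminator (assumption): treats "great" as original
    # when the input contains "loved", and "terrible" as original otherwise.
    positive = "loved" in tokens
    probs = []
    for tok in tokens:
        if tok == "great":
            probs.append(0.1 if positive else 0.9)
        elif tok == "terrible":
            probs.append(0.9 if positive else 0.1)
        else:
            probs.append(0.05)
    return probs


LABEL_WORDS = {"positive": "great", "negative": "terrible"}
print(classify("I loved this film", LABEL_WORDS, toy_replaced_prob))
```

In a real setup the stand-in scorer would be replaced by the per-token output of a pre-trained discriminator; the selection rule (lowest "replaced" probability wins) is the only part taken from the abstract.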

“…It teaches the model to understand dependencies across sentences [53]. In spite of that, NSP is criticized as a weak task for its comparison of similarity [83]. To overcome this limitation, we introduce a harder snapshot ordering task, which aims to order a set of conformations as a coherent sub-trajectory.…”

Section: Snapshot Ordering Pre-training (mentioning)

confidence: 99%

Pre-training of Equivariant Graph Matching Networks with Conformation Flexibility for Drug Binding

Wu, Jiang, Jin, et al. 2022. Preprint.

The latest biological findings discover that the motionless 'lock-and-key' theory is no longer applicable and the flexibility of both the receptor and ligand plays a significant role in helping understand the principles of the binding affinity prediction. Based on this mechanism, molecular dynamics (MD) simulations have been invented as a useful tool to investigate the dynamical properties of this molecular system. However, the computational expenditure prohibits the growth of reported protein trajectories. To address this insufficiency, we present a novel spatial-temporal pre-training protocol, PretrainMD, to grant the protein encoder the capacity to capture the time-dependent geometric mobility along MD trajectories. Specifically, we introduce two sorts of self-supervised learning tasks: an atom-level denoising generative task and a protein-level snapshot ordering task. We validate the effectiveness of PretrainMD through the PDBbind dataset for both linear-probing and fine-tuning. Extensive experiments show that our PretrainMD exceeds most state-of-the-art methods and achieves comparable performance. More importantly, through visualization we discover that the learned representations by pre-training on MD trajectories without any label from the downstream task follow similar patterns of the magnitude of binding affinities. This strongly aligns with the fact that the motion of the interactions of protein and ligand maintains the key information of their binding. Our work provides a promising perspective of self-supervised pre-training for protein representations with very fine temporal …

“…• For the natural language inference task, we exploit NSP-based prompt training [100]. Different labels are regarded as prompts to concatenate the two sentences, and the model is trained to select the label that makes the concatenated sentence the most coherent.…”

Section: Settings (mentioning)

confidence: 99%

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Wang, Sun, Xiang, et al. 2021. Preprint.

Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 [2] was recently proposed for pre-training large-scale knowledge enhanced models and trained a model with 10 billion parameters. ERNIE 3.0 outperformed the state-of-the-art models on various NLP tasks. In order to explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle [3] platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts. To reduce the computation overhead and carbon emission, we propose an online distillation framework for ERNIE 3.0 Titan, where the teacher model will teach students and train itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets.
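The NSP-based prompt selection quoted in the citation statement for this entry (labels act as prompts joining the two sentences, and the most coherent concatenation wins) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in ERNIE 3.0 Titan or NSP-BERT: the connector words in `LABEL_PROMPTS` and the `toy_coherence` scorer, which stands in for a next sentence prediction head, are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical label prompts (connectors); not taken from the cited papers.
LABEL_PROMPTS: Dict[str, str] = {
    "entailment": "so",
    "contradiction": "but",
    "neutral": "and",
}


def select_label(
    sentence_a: str,
    sentence_b: str,
    label_prompts: Dict[str, str],
    coherence: Callable[[str], float],
) -> str:
    """Pick the label whose prompt yields the most coherent concatenation."""
    scores = {
        label: coherence(f"{sentence_a} {prompt} {sentence_b}")
        for label, prompt in label_prompts.items()
    }
    return max(scores, key=scores.get)


def toy_coherence(text: str) -> float:
    # Stand-in for an NSP-style coherence score (assumption): it simply
    # favors the connector "so", for demonstration only.
    return 1.0 if " so " in text else 0.0


print(select_label("It is raining.", "the ground is wet.", LABEL_PROMPTS, toy_coherence))
```

Passing the scorer as a function keeps the selection rule separate from the model, so a real coherence score could be substituted without changing `select_label`.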
