Pre-trained language models (PLMs) achieve remarkable performance on many downstream tasks, but may fail to give reliable estimates of their predictive uncertainty. Yet well-calibrated PLMs, whose confidence scores reflect the true probability of outcomes, are essential for decision makers in high-risk applications. Given the lack of a comprehensive understanding of PLM calibration, we take a close look into this new research problem, aiming to answer two questions: (1) Do PLMs learn to become calibrated during training? (2) How effective are existing calibration methods? For the first question, we conduct fine-grained control experiments to study the dynamic change in PLMs' calibration performance during training. We consider six factors as control variables: dataset difficulty, available training samples, training steps, the number of tunable parameters, model scale, and pretraining. In experiments, we observe a consistent change in calibration performance across all six factors. We find that PLMs do not learn to become calibrated during training, as evidenced by a continual increase in confidence regardless of whether predictions are correct. Notably, this finding contradicts two established conclusions: (a) larger PLMs are more calibrated; (b) pretraining improves model calibration. Next, we study how effectively existing calibration methods mitigate the overconfidence issue, in both in-distribution and various out-of-distribution settings. Besides unlearnable calibration methods (e.g., label smoothing), we adapt two recently proposed learnable methods that directly collect data to train models to produce reasonable confidence estimates. We also propose extensions of these learnable methods that further improve or maintain PLMs' calibration without sacrificing the original task performance. Experimental results show that learnable methods significantly reduce PLMs' confidence in wrong predictions, and that our extensions outperform previous methods. All code and data are available at https://github.com/lifan-yuan/PLMCalibration.
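The abstract does not name a specific calibration score; a standard choice in this literature is the expected calibration error (ECE), which measures the gap between a model's confidence and its empirical accuracy. The following is a minimal illustrative sketch, not the paper's implementation; the equal-width binning scheme and the toy data are assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: the |accuracy - confidence| gap averaged over confidence bins,
    weighted by the fraction of samples falling in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            bin_conf = confidences[mask].mean()  # average confidence in this bin
            bin_acc = correct[mask].mean()       # empirical accuracy in this bin
            ece += mask.mean() * abs(bin_acc - bin_conf)
    return ece

# Toy usage: an overconfident model stays highly confident even when wrong.
conf = [0.95, 0.90, 0.92, 0.88, 0.97]
hit  = [1,    0,    1,    0,    1]
print(expected_calibration_error(conf, hit))
```

The overconfidence pattern the paper describes would show up here as bins whose average confidence far exceeds their accuracy.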
Textual backdoor attacks pose a practical threat to NLP systems. By injecting a backdoor during the training phase, an adversary can control model predictions via predefined triggers. As various attack and defense models have been proposed, rigorous evaluation is of great significance. However, we highlight two issues in previous backdoor learning evaluations: (1) the differences between real-world scenarios (e.g., releasing poisoned datasets or models) are neglected, and we argue that each scenario has its own constraints and concerns and thus requires specific evaluation protocols; (2) the evaluation metrics only consider whether attacks can flip model predictions on poisoned samples and retain performance on benign samples, ignoring that poisoned samples should also be stealthy and semantics-preserving. To address these issues, we categorize existing works into three practical scenarios, in which attackers release datasets, pre-trained models, and fine-tuned models respectively, and then discuss their unique evaluation methodologies. On metrics, to evaluate poisoned samples more completely, we use grammar-error increase and perplexity difference to measure stealthiness, along with text similarity to measure validity. After formalizing the frameworks, we develop an open-source toolkit, OpenBackdoor (https://github.com/thunlp/OpenBackdoor), to foster the implementation and evaluation of textual backdoor learning. With this toolkit, we perform extensive experiments to benchmark attack and defense models under the suggested paradigm. To facilitate the underexplored area of defenses against poisoned datasets, we further propose CUBE, a simple yet strong clustering-based defense baseline. We hope that our frameworks and benchmarks can serve as cornerstones for future model development and evaluation.
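To make the perplexity-difference stealthiness metric concrete, here is a hedged sketch using an off-the-shelf GPT-2 language model from Hugging Face Transformers. The model choice, the rare-token trigger "cf", and the example sentences are illustrative assumptions, not the toolkit's actual code.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Sketch: perplexity under a general-purpose LM as a stealthiness proxy.
# A large perplexity increase from the clean to the poisoned sentence
# suggests the trigger makes the text noticeably less natural.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    # With labels == input_ids, the model returns the mean token NLL.
    loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

clean = "the movie was surprisingly good"
poisoned = "the movie was cf surprisingly good"  # hypothetical trigger token "cf"
print(perplexity(poisoned) - perplexity(clean))  # perplexity difference
```

A large positive difference indicates the trigger degrades fluency, i.e., the poisoned sample is less stealthy.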