Ximing Lu scite author profile

Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DExperts: Decoding-time Experts, a decoding-time method for controlled text generation that combines a pretrained language model with "expert" LMs and/or "anti-expert" LMs in a product of experts. Intuitively, under the ensemble, tokens only get high probability if they are considered likely by the experts and unlikely by the anti-experts. We apply DExperts to language detoxification and sentiment-controlled generation, where we outperform existing controllable generation methods on both automatic and human evaluations. Moreover, because DExperts operates only on the output of the pretrained LM, it is effective with (anti-)experts of smaller size, including when operating on GPT-3. Our work highlights the promise of tuning small LMs on text with (un)desirable attributes for efficient decoding-time steering.
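The product-of-experts idea in this abstract can be illustrated with a small sketch. The abstract does not give the formula, so the combination rule below (base logits plus a weighted difference of expert and anti-expert logits) is an assumption about how such an ensemble is typically formed, and the names `combine_logits`, `alpha`, and the three-word vocabulary are hypothetical, chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(base, expert, anti_expert, alpha=1.0):
    """Assumed ensemble rule: base + alpha * (expert - anti_expert).

    Adding logits corresponds to multiplying (unnormalized) probabilities,
    which is what makes this a product of experts. `alpha` is a hypothetical
    steering-strength parameter.
    """
    return [b + alpha * (e - a) for b, e, a in zip(base, expert, anti_expert)]


# Toy next-token logits over a hypothetical three-word vocabulary.
vocab = ["great", "okay", "awful"]
base_logits = [1.0, 1.0, 1.0]         # the pretrained LM has no preference
expert_logits = [2.0, 1.0, 0.0]       # the expert prefers "great"
anti_expert_logits = [0.0, 1.0, 2.0]  # the anti-expert prefers "awful"

steered = softmax(
    combine_logits(base_logits, expert_logits, anti_expert_logits, alpha=1.0)
)
unsteered = softmax(
    combine_logits(base_logits, expert_logits, anti_expert_logits, alpha=0.0)
)

for token, p in zip(vocab, steered):
    print(f"{token}: {p:.3f}")
```

With `alpha=0.0` the ensemble reduces to the base model's distribution; increasing `alpha` shifts probability toward tokens the expert favors and the anti-expert disfavors. Only the output logits of the base model are used, which matches the abstract's point that the (anti-)experts can be smaller models.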


MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound

Zellers

Lu

et al. 2022

132


Generated Knowledge Prompting for Commonsense Reasoning

Liu¹,

Liu²,

Lu³

et al. 2022


Volatility Forecast Based on the Hybrid Artificial Neural Network and GARCH-type Models

Que

Cao

2016

Procedia Computer Science


Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

West¹,

Bhagavatula²,

Hessel³

et al. 2022


The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically (as text) in addition to the resulting neural model. We distill only one aspect, the commonsense of a general language model teacher, allowing the student to be a different type of model, a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general language model. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite its 100x smaller size. We apply this to the ATOMIC resource, and will share our new symbolic knowledge graph and commonsense models.
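The generate-then-filter step described in this abstract can be sketched roughly as follows. This is a toy illustration only: the candidate triples, the `toy_critic` scores, and the `threshold` value are all made up for the example, standing in for teacher-generated text and a separately trained critic model, neither of which is specified in the abstract.

```python
from typing import Callable, List, Tuple

# A commonsense triple: (event, relation, inference).
Triple = Tuple[str, str, str]


def filter_with_critic(
    candidates: List[Triple],
    critic: Callable[[Triple], float],
    threshold: float = 0.5,
) -> List[Triple]:
    """Keep only the candidates the critic scores at or above the threshold.

    The retained triples form the symbolic (text) knowledge graph on which
    a smaller student model could then be trained.
    """
    return [t for t in candidates if critic(t) >= threshold]


# Hypothetical candidates standing in for text generated by a teacher LM.
candidates: List[Triple] = [
    ("X pays the bill", "xWant", "to leave the restaurant"),
    ("X pays the bill", "xWant", "to fly to the moon"),
    ("X misses the bus", "xEffect", "X is late for work"),
]

# Hypothetical critic: a lookup table standing in for a trained classifier.
toy_scores = {
    candidates[0]: 0.92,
    candidates[1]: 0.08,
    candidates[2]: 0.85,
}


def toy_critic(triple: Triple) -> float:
    return toy_scores.get(triple, 0.0)


kept = filter_with_critic(candidates, toy_critic, threshold=0.5)
for event, relation, inference in kept:
    print(f"{event} --{relation}--> {inference}")
```

Raising the threshold trades quantity for quality in the resulting graph, which is the kind of selective distillation the abstract attributes to the critic.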




Ximing Lu

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound

Generated Knowledge Prompting for Commonsense Reasoning

Volatility Forecast Based on the Hybrid Artificial Neural Network and GARCH-type Models

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
