Benchmarking Large Language Models on Controllable Generation under Diversified Instructions

Chen, Yihan; Xu, Benfeng; Wang, Quan; Liu, Yi; Mao, Zhendong

doi:10.1609/aaai.v38i16.29734

AAAI

2024


Abstract: While large language models (LLMs) have exhibited impressive instruction-following capabilities, it is still unclear whether and to what extent they can respond to explicit constraints that might be entailed in various instructions. As a significant aspect of LLM alignment, it is thus important to formulate such a specialized set of instructions as well as investigate the resulting behavior of LLMs. To address this gap, we propose a new benchmark CoDI-Eval to systematically and comprehensively evaluate LLM…

References: 17 publications

Cited by 1 publication:

Does ChatGPT have sociolinguistic competence?

Duncan

2024

J. Comp. Assist. Linguist. Res.

Large language models are now able to generate content- and genre-appropriate prose with grammatical sentences. However, these targets do not fully encapsulate human-like language use. For example, they set aside the fact that human language use involves sociolinguistic variation that is regularly constrained by internal and external factors. This article tests whether one widely used LLM application, ChatGPT, is capable of generating such variation. I construct an English corpus of "sociolinguistic interviews" using the application and analyze the generation of seven morphosyntactic features. I show that the application largely fails to generate any variation at all when one variant is prescriptively incorrect, but that it is able to generate variable deletion of the complementizer "that" that is internally constrained, with variants occurring at human-like rates. ChatGPT fails, however, to properly generate externally constrained deletion of complementizer "that". I argue that these outcomes reflect bias in both the training data and reinforcement learning from human feedback (RLHF). I suggest that testing whether an LLM can properly generate sociolinguistic variation is a useful metric for evaluating whether it generates human-like language.
