Refocusing on Relevance: Personalization in NLG

Dudy, Shiran; Bedrick, Steven; Webber, Bonnie

doi:10.18653/v1/2021.emnlp-main.421

Cited by 9 publications

(2 citation statements)

References 55 publications

“…It is critical for the output from LLMs to respond to a variety of user attributes, ranging from interaction history to the situation of use. In the past, researchers have investigated user modeling for various tasks, including headline generation, dialog response generation and recipe creation (Majumder et al., 2019; Flek, 2020; Wu et al., 2021; Dudy et al., 2021; Cai et al., 2023). In this paper, our focus has been on exploring human preference modeling for LLM development.…”

Section: Related Work

mentioning

confidence: 99%

DecipherPref: Analyzing Influential Factors in Human Preference Judgments via GPT-4

Hu, Song, Cho et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Human preference judgments are pivotal in guiding large language models (LLMs) to produce outputs that align with human values. Human evaluations are also used in summarization tasks to compare outputs from various systems, complementing existing automatic metrics. Despite their significance, however, there has been limited research probing these pairwise or k-wise comparisons. The collective impact and relative importance of factors such as output length, informativeness, fluency, and factual consistency are still not well understood. It is also unclear if there are other hidden factors influencing human judgments. In this paper, we conduct an in-depth examination of a collection of pairwise human judgments released by OpenAI. Utilizing the Bradley-Terry-Luce (BTL) model, we reveal the inherent preferences embedded in these human judgments. We find that the most favored factors vary across tasks and genres, whereas the least favored factors tend to be consistent, e.g., outputs are too brief, contain excessive off-focus content or hallucinated facts. Our findings have implications on the construction of balanced datasets in human preference evaluations, which is a crucial step in shaping the behaviors of future LLMs.

“…Natural language generation (NLG) is an interesting application of artificial intelligence which focuses exclusively on the automatic production of understandable written or spoken language. To date, numerous studies have been conducted to explore the design and implementation of NLG systems (Singh et al., 2016), the system evaluation (Van der Lee et al., 2021), and the personalization of NLG systems (Dudy et al., 2021). Reading Computer-generated Text takes a step further by examining NLG from a humanities perspective.…”

mentioning

confidence: 99%