Seraphina Goldfarb-Tarrant scite author profile

Long-form narrative text generated from large language models manages a fluent impersonation of human writing, but only at the local sentence level, and lacks structure or global cohesion. We posit that many of the problems of story generation can be addressed via high-quality content planning, and present a system that focuses on how to learn good plot structures to guide story generation. We utilize a plot-generation language model along with an ensemble of rescoring models that each implement an aspect of good story-writing as detailed in Aristotle's Poetics. We find that stories written with our more principled plot structure are both more relevant to a given prompt and higher quality than baselines that do not content plan, or that plan in an unprincipled way.

Prompt: Write the most beautiful end to the universe that you can imagine

Naive Plot with Semantic Role Labelling Tags: The universe end ent 0 see ent 3 dying ent 1 # ent 1 left ent 0 the last human live ent 6 end # ent 2 come the last one die a universe of life left

Revised Plot with Semantic Role Labelling Tags: The light filled the sky A bright flash lit # began to grow # the stars grow began ent 0 to fade # The stars fade ent 2 looked ent 1 me dying star

Naive Generated Story: "You're dead. I'll be dead, and there'll always be no more of you."* *"No, my dear. It's not true!" -The last of the world is dead, * * "It'll never be over. The universe will n't end. Not yet. But I'll see it, one way or another. There is no other way."* *My last thought, the dying man left me. He told that I was the last human to live. And he told me the universe would end, and the end will come soon. That your death would be the last one to die, and the first one to be the most beautiful. * "There is a universe of life left, my dear, and that's all there is.

Generated Story (from revision): It was the end of the universe. The light filled the sky. The galaxy. All the light in all the worlds, and all the stars. A bright flash. The first light. And then a bright flash of light... then a flash of darkness. Then a bright light. A flash of blackness and then the stars began to grow. It was a bright, beautiful bright, light. And then, slowly the stars started to grow in number. There were millions and millions of stars.. But, slowly but surely, all the galaxies began to fade. Slowly, but surely. Finally, it looked at me with its eyes. One dying star.

Gold Story: It slept, and dreamt of twirling lights. It cradled countless twirling lights, all dancing in their flashing ways, dancing to the sound of a faraway flute. The light was vibrant and warm and spun all around It. The tiny lights grew with the tune, and more lights came to dance within the luminescence. It was surrounded by light, all waltzing in their ways t...


Plan, Write, and Revise: an Interactive System for Open-Domain Story Generation

Goldfarb-Tarrant, Feng, Peng. 2019


Story composition is a challenging problem for machines and even for humans. We present a neural narrative generation system that interacts with humans to generate stories. Our system has different levels of human interaction, which enables us to understand at what stage of story-writing human collaboration is most productive, both for improving story quality and for human engagement in the writing process. We compare different varieties of interaction in story-writing, story-planning, and diversity controls under time constraints, and show that more types of human collaboration at both the planning and writing stages result in a 10-50% improvement in story quality compared to less interactive baselines. We also show an accompanying increase in user engagement and satisfaction with stories compared to our own less interactive systems and to previous turn-taking approaches to interaction. Finally, we find that humans tasked with collaboratively improving a particular characteristic of a story are in fact able to do so, which has implications for future uses of human-in-the-loop systems.


Intrinsic Bias Metrics Do Not Correlate with Application Bias

Goldfarb-Tarrant, Marchant, Muñoz, et al. 2021


Natural Language Processing (NLP) systems learn harmful societal biases that cause them to amplify inequality as they are deployed in more and more situations. To guide efforts at debiasing these systems, the NLP community relies on a variety of metrics that quantify bias in models. Some of these metrics are intrinsic, measuring bias in word embedding spaces, and some are extrinsic, measuring bias in downstream tasks that the word embeddings enable. Do these intrinsic and extrinsic metrics correlate with each other? We compare intrinsic and extrinsic metrics across hundreds of trained models covering different tasks and experimental conditions. Our results show no reliable correlation between these metrics that holds in all scenarios across tasks and languages. We urge researchers working on debiasing to focus on extrinsic measures of bias, and to make using these measures more feasible via creation of new challenge sets and annotated test data. To aid this effort, we release code, a new intrinsic metric, and an annotated test set focused on gender bias in hate speech.


How Gender Debiasing Affects Internal Model Representations, and Why It Matters

Orgad, Goldfarb-Tarrant, Belinkov. 2022


Common studies of gender bias in NLP focus either on extrinsic bias measured by model performance on a downstream task or on intrinsic bias found in models' internal representations. However, the relationship between extrinsic and intrinsic bias is relatively unknown. In this work, we illuminate this relationship by measuring both quantities together: we debias a model during downstream fine-tuning, which reduces extrinsic bias, and measure the effect on intrinsic bias, which is operationalized as bias extractability with information-theoretic probing. Through experiments on two tasks and multiple bias metrics, we show that our intrinsic bias metric is a better indicator of debiasing than (a contextual adaptation of) the standard WEAT metric, and can also expose cases of superficial debiasing. Our framework provides a comprehensive perspective on bias in NLP models, which can be applied to deploy NLP systems in a more informed manner.

Supported by the Viterbi Fellowship in the Center for Computer Engineering at the Technion.


Intrinsic Bias Metrics Do Not Correlate with Application Bias

Goldfarb-Tarrant, Marchant, Muñoz, et al. 2020. Preprint


Natural Language Processing (NLP) systems learn harmful societal biases that cause them to spread inequality widely as they are deployed in more and more situations. To address and combat this, the NLP community relies on a variety of metrics to identify and quantify bias in black-box models and to guide efforts at debiasing. Some of these metrics are intrinsic, measured in word embedding spaces, and some are extrinsic, measuring the bias present downstream in the tasks that the word embeddings are plugged into. This research examines whether easy-to-measure intrinsic metrics correlate well with real-world extrinsic metrics. We measure both intrinsic and extrinsic bias across hundreds of trained models covering different tasks and experimental conditions and find that there is no reliable correlation between these metrics that holds in all scenarios across tasks and languages. We advise that efforts to debias embedding spaces always be paired with measurement of downstream model bias, and suggest that the community increase effort into making downstream measurement more feasible via creation of additional challenge sets and annotated test data. We additionally release code, a new intrinsic metric, and an annotated test set for gender bias in hate speech.

