Dynamic word recommendation to obtain diverse crowdsourced paraphrases of user utterances

Yaghoub-Zadeh-Fard, Mohammad-Ali; Benatallah, Boualem; Casati, Fabio; Barukh, Moshe Chai; Zamanirad, Shayan

doi:10.1145/3377325.3377486

Cited by 10 publications (6 citation statements)

References 60 publications


“…Evaluation metrics. The pipeline configurations were evaluated using automatic evaluation metrics commonly used in assessing paraphrase quality [37]. To capture the relevance of the generated paraphrases to the input utterance, we use two different metrics. This includes the Bilingual Evaluation Understudy (BLEU) [19], a widely adopted metric that measures the similarity between two given sentences.…”

Section: Methods (mentioning)

confidence: 99%

Automated Paraphrase Generation with Over-Generation and Pruning Services

Berro, Báez, Benatallah et al., 2021

Service-Oriented Computing

Self Cite


Conversational services are emerging as a new paradigm for accessing information by simply uttering questions in natural language, posing a whole new set of challenges to the design and engineering of information systems. Training conversational services to deal with the nuances of natural language often requires collecting a high-quality and diverse set of training samples (i.e., paraphrases). Traditional approaches such as hiring an expert or crowdsourcing involve data collection processes that are often costly and time-consuming. Automated paraphrase generation is a promising cost-effective and scalable approach to generating training samples. Current automatic techniques, however, tend to specialise in specific types of lexical or syntactic variations. As a result, generated paraphrases may not perform well in relevant quality aspects such as diversity and semantic relatedness. In this paper, we follow an approach inspired by services integration to address these issues and generate paraphrases in English that are semantically relevant and diverse. We propose an extensible and reusable pipeline that combines automatic paraphrasing techniques in a two-step process that first focuses on (i) leveraging the strengths of multiple techniques to generate the most diverse (and possibly noisy) set of paraphrases, and then (ii) addresses common quality issues in a separate step. Through empirical evaluations, we show the benefits of the two-step process design and of combining techniques to better balance relevance and diversity.
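The citation statement from Berro et al. uses BLEU to measure how relevant a generated paraphrase is to the input utterance. As a rough illustration only, the Python sketch below computes sentence-level BLEU against a single reference, following the standard definition: clipped n-gram precisions up to `max_n`, their geometric mean, and a brevity penalty. It assumes lowercase whitespace tokenization and applies no smoothing, so any n-gram order with zero matches yields a score of 0. It is not the configuration used by Berro et al.; established toolkits such as sacreBLEU or NLTK differ in tokenization and smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU with one reference (illustrative sketch)."""
    cand = candidate.lower().split()  # assumption: naive whitespace tokenization
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: a zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions with uniform weights.
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, `sentence_bleu("book a flight to paris", "reserve a plane ticket to paris", max_n=2)` combines a unigram precision of 3/5 and a bigram precision of 1/4 with a brevity penalty of exp(-0.2). Note that a high BLEU score against the input utterance also signals little lexical variation, which is one reason relevance and diversity have to be balanced.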


“…Prior work has shown that crowdsourced natural language benchmarks exhibit various spurious biases (i.e., unintended correlations between input and output), that lead to overestimation of PLM performance [SSK*17, PNH*18, GSL*18, LBSB*20]. Several techniques have been proposed to handle such bias post‐creation, including improving linguistic diversity of samples [YZFBC*20, LML*19, SYH20] and augmenting data with adversarial samples intended to fool the model [WRF*19, KBN*21, TYLB*]. Similarly, there is evidence that natural language instructions provided by dataset creators during crowdsourcing influence crowdworkers to follow specific patterns during sample creation [GGB19, PMGB22, HSG*21].…”

Section: Related Work (mentioning)

confidence: 99%

LINGO: Visually Debiasing Natural Language Instructions to Support Task Diversity

Arunkumar, Sharma, Agrawal et al., 2023

Computer Graphics Forum


Cross‐task generalization is a significant outcome that defines mastery in natural language understanding. Humans show a remarkable aptitude for this, and can solve many different types of tasks, given definitions in the form of textual instructions and a small set of examples. Recent work with pre‐trained language models mimics this learning style: users can define and exemplify a task for the model to attempt as a series of natural language prompts or instructions. While prompting approaches have led to higher cross‐task generalization compared to traditional supervised learning, analyzing ‘bias’ in the task instructions given to the model is a difficult problem, and has thus been relatively unexplored. For instance, are we truly modeling a task, or are we modeling a user's instructions? To help investigate this, we develop LINGO, a novel visual analytics interface that supports an effective, task‐driven workflow to (1) help identify bias in natural language task instructions, (2) alter (or create) task instructions to reduce bias, and (3) evaluate pre‐trained model performance on debiased task instructions. To robustly evaluate LINGO, we conduct a user study with both novice and expert instruction creators, over a dataset of 1,616 linguistic tasks and their natural language instructions, spanning 55 different languages. For both user groups, LINGO promotes the creation of more difficult tasks for pre‐trained models, that contain higher linguistic diversity and lower instruction bias. We additionally discuss how the insights learned in developing and evaluating LINGO can aid in the design of future dashboards that aim to minimize the effort involved in prompt creation across multiple domains.


“…This technique is used by Chklovski (2005), where paraphrases are collected via a game in which users must reformulate a given sentence based on hints. To collect more diverse paraphrases, Yaghoub-Zadeh-Fard et al. (2020) were inspired by another game called Taboo and gave workers a list of taboo words they were not allowed to include in their paraphrases.…”

Section: Literature Review (mentioning)

confidence: 99%

ECAsT: a large dataset for conversational search and an evaluation of metric robustness

Al-Thani, Jansen, Elsayed, 2023

PeerJ Computer Science


The Text REtrieval Conference Conversational assistance track (CAsT) is an annual conversational passage retrieval challenge to create large-scale open-domain conversational search benchmarks. However, as of yet, the datasets used are small, with just over 1,000 turns and 100 conversation topics. In the first part of this research, we address the dataset limitation by building a much larger novel multi-turn conversation dataset for conversation search benchmarking called Expanded-CAsT (ECAsT). ECAsT is built using a multi-stage solution that uses a combination of conversational query reformulation and neural paraphrasing and also includes a new model to create multi-turn paraphrases. The meaning and diversity of paraphrases are evaluated with human and automatic evaluation. Using this methodology, we produce and release to the research community a conversational search dataset that is 665% more extensive in terms of size and language diversity than is available at the time of this study, with more than 9,200 turns. The augmented dataset not only provides more data but also more language diversity to improve conversational search neural model training and testing. In the second part of the research, we use ECAsT to assess the robustness of traditional metrics for conversational evaluation used in CAsT and identify their bias toward language diversity. Results show the benefits of adding language diversity for improving the collection of pooled passages and reducing evaluation bias. We found that introducing language diversity via paraphrases returned up to 24% new passages compared to only 2% using the CAsT baseline.
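The literature-review statement from Al-Thani et al. describes the Taboo-inspired setup of Yaghoub-Zadeh-Fard et al. (2020), in which workers receive a list of words they may not use in their paraphrases. The Python sketch below shows one way such a constraint could be checked automatically before a submission is accepted. The function names, the regex tokenizer, the exact (unstemmed) word matching, and the frequency-based `pick_taboo_words` heuristic are all hypothetical choices for illustration, not the authors' method.

```python
import re
from collections import Counter

# Assumption: lowercase alphabetic tokens (apostrophes allowed); no stemming.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text):
    return WORD_RE.findall(text.lower())


def pick_taboo_words(previous_paraphrases, k=3, stopwords=frozenset()):
    """Hypothetical heuristic: ban the k most frequent non-stopwords seen so far."""
    counts = Counter(
        w for p in previous_paraphrases for w in tokenize(p) if w not in stopwords
    )
    return [w for w, _ in counts.most_common(k)]


def taboo_violations(paraphrase, taboo_words):
    """Return the taboo words (sorted) that appear in a worker's paraphrase."""
    taboo = {w.lower() for w in taboo_words}
    return sorted(taboo.intersection(tokenize(paraphrase)))
```

A submission such as "Could you book me a flight?" checked against the list `["flight", "ticket"]` would be flagged for using "flight"; an empty result means the paraphrase respects the constraint. Exact matching is the simplest design; a real task might also need to catch inflected forms (e.g. "flights"), which this sketch does not.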
