QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Rogers, Anna Backman; Gardner, Matt; Augenstein, Isabelle

doi:10.48550/arxiv.2107.12708

Cited by 17 publications (17 citation statements)

References 205 publications (329 reference statements)


“…Keyword Matching Without Structural Concern Aligning with the insight that retrieval often emphasizes content matching rather than complex reasoning (Rogers et al, 2021), we find that 71 out of the 100 samples only require simple keyword matching, where 18 questions fully match with table titles (Figure 2 (a)) and the other 53 questions further match with table headers (Figure 2 (b)).…”

Section: NQ-table Analysis: How Much (mentioning)

confidence: 81%

Table Retrieval May Not Necessitate Table-specific Model Design

Wang, Jiang, Nyberg, et al., 2022. Preprint.


Tables are an important form of structured data for both human and machine readers alike, providing answers to questions that cannot, or cannot easily, be found in texts. Recent work has designed special models and training paradigms for table-related tasks such as table-based question answering and table retrieval. Though effective, they add complexity in both modeling and data acquisition compared to generic text solutions and obscure which elements are truly beneficial. In this work, we focus on the task of table retrieval, and ask: "is table-specific model design necessary for table retrieval, or can a simpler text-based model be effectively used to achieve a similar result?" First, we perform an analysis on a table-based portion of the Natural Questions dataset (NQ-table), and find that structure plays a negligible role in more than 70% of the cases. Based on this, we experiment with a general Dense Passage Retriever (DPR) based on text and a specialized Dense Table Retriever (DTR) that uses table-specific model designs. We find that DPR performs well without any table-specific design and training, and even achieves superior results compared to DTR when fine-tuned on properly linearized tables. We then experiment with three modules to explicitly encode table structures, namely auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases. However, none of these yielded significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.


“…SQuAD is available under the CC BY-SA license. SQuAD has become a de facto standard and inspired creation of analogous resources in other languages (Rogers et al, 2021).…”

Section: Question Answering (mentioning)

confidence: 99%

The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer

Efimov, Boytsov, Arslanova, et al., 2022. Preprint.


Large pre-trained multilingual models such as mBERT and XLM-R enabled effective cross-lingual zero-shot transfer in many NLP tasks. A cross-lingual adjustment of these models using a small parallel corpus can potentially further improve results. This is a more data-efficient method compared to training a machine-translation system or a multilingual model from scratch using only parallel data. In this study, we experiment with zero-shot transfer of English models to four typologically different languages (Spanish, Russian, Vietnamese, and Hindi) and three NLP tasks (QA, NLI, and NER). We carry out a cross-lingual adjustment of an off-the-shelf mBERT model. We confirm the prior finding that this adjustment makes embeddings of semantically similar words from different languages closer to each other, while keeping unrelated words apart. However, the paired-differences histograms introduced in our work show that the adjustment only modestly affects the relative distances between related and unrelated words. In contrast, fine-tuning of mBERT on English data (for a specific task such as NER) draws embeddings of both related and unrelated words closer to each other. The cross-lingual adjustment of mBERT improves NLI in four languages and NER in two languages, while QA performance never improves and sometimes degrades. When we fine-tune a cross-lingually adjusted mBERT for a specific task (e.g., NLI), the cross-lingual adjustment may still improve the separation between related and unrelated words, but this works consistently only for the XNLI task. Our study contributes to a better understanding of the cross-lingual transfer capabilities of large multilingual language models and of the effectiveness of their cross-lingual adjustment in various NLP tasks.


“…The rather high age of participants (see Fig. 4) may have induced significant demographic bias [56] regarding negative attitudes towards artificial intelligence and, thus, ACA [17]. No person below 18 years participated due to legal constraints by the platform.…”

Section: Remote User Experience Survey (GUI Prototype 2) (mentioning)

confidence: 99%

User Experience Design for Automatic Credibility Assessment of News Content About COVID-19

Schulz, Rauenbusch, Jan, et al., 2022. Preprint.


The increasingly rapid spread of information about COVID-19 on the web calls for automatic measures of credibility assessment [18]. If large parts of the population are expected to act responsibly during a pandemic, they need information that can be trusted [20]. In that context, we model the credibility of texts using 25 linguistic phenomena, such as spelling, sentiment and lexical diversity. We integrate these measures in a graphical interface and present two empirical studies to evaluate its usability for credibility assessment on COVID-19 news. Raw data for the studies, including all questions and responses, has been made available to the public under an open license: https://github.com/konstantinschulz/credible-covid-ux. The user interface prominently features three sub-scores and an aggregated score for a quick overview. In addition, metadata about the concept, authorship and infrastructure of the underlying algorithm is provided explicitly. Our working definition of credibility is operationalized through the terms of trustworthiness, understandability, transparency, and relevance. Each of them builds on well-established scientific notions [41,65,68] and is explained orally or through Likert scales. In a moderated qualitative interview with six participants, we introduce information transparency for news about COVID-19 as the general goal of a prototypical platform, accessible through an interface in the form of a wireframe [43]. The participants' answers are transcribed in excerpts. Then, we triangulate inductive and deductive coding methods [19] to analyze their content. As a result, we identify rating scale, sub-criteria and algorithm authorship as important predictors of usability. In a subsequent quantitative online survey, we present a questionnaire with wireframes to 50 crowdworkers. The question formats include Likert scales, multiple choice and open-ended types. This way, we aim to strike a balance between the known strengths and weaknesses of open vs. closed questions [11]. The answers reveal a conflict between transparency and conciseness in the interface design: users tend to ask for more information, but do not necessarily make explicit use of it when given. This discrepancy is influenced by capacity constraints of the human working memory [38]. Moreover, a perceived hierarchy of metadata
