Interspeech 2019
DOI: 10.21437/interspeech.2019-3154

Mitigating Noisy Inputs for Question Answering

Abstract: Natural language processing systems are often downstream of unreliable inputs: machine translation, optical character recognition, or speech recognition. For instance, virtual assistants can only answer your questions after understanding your speech. We investigate and mitigate the effects of noise from Automatic Speech Recognition systems on two factoid Question Answering (QA) tasks. Integrating confidences into the model and forced decoding of unknown words are empirically shown to improve the accuracy of downstream neural QA systems.
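As a concrete illustration of the first mitigation named in the abstract (integrating ASR confidences into the model), here is a minimal PyTorch sketch of one way per-word recognizer confidences could be folded into a QA model's input layer, by down-weighting the embeddings of low-confidence words. The class name and the plain scaling scheme are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class ConfidenceWeightedEmbedding(nn.Module):
        """Scale each token embedding by its ASR word confidence (hypothetical scheme).

        Low-confidence (likely misrecognized) words contribute damped vectors,
        so downstream QA layers can discount unreliable tokens.
        """
        def __init__(self, vocab_size: int, dim: int):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)

        def forward(self, token_ids, confidences):
            # token_ids: (batch, seq_len) ints; confidences: (batch, seq_len) in [0, 1]
            return self.embed(token_ids) * confidences.unsqueeze(-1)

    # Toy usage: two 4-token ASR hypotheses with per-word confidences.
    emb = ConfidenceWeightedEmbedding(vocab_size=100, dim=8)
    ids = torch.randint(0, 100, (2, 4))
    conf = torch.tensor([[0.9, 0.3, 1.0, 0.7], [1.0, 0.8, 0.2, 0.9]])
    print(emb(ids, conf).shape)  # torch.Size([2, 4, 8])

A forced-decoding variant (the abstract's second mitigation) would additionally replace UNK outputs with the recognizer's best non-UNK word hypothesis before embedding; the sketch above covers only the confidence-weighting half.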

Cited by 9 publications (10 citation statements)
References 16 publications
“…Question answering can also be interpreted as an exercise in verifying the knowledge of experts: finding the answer to trivia questions carefully crafted by someone who already knows the answer, so that exactly one answer is correct, as in TriviaQA and Quizbowl/Jeopardy! questions (Ferrucci et al., 2010; Dunn et al., 2017; Joshi et al., 2017; Peskov et al., 2019). This information-verifying paradigm also describes reading comprehension datasets such as NewsQA (Trischler et al., 2017), SQuAD (Rajpurkar et al., 2016, 2018), CoQA (Reddy et al., 2019), and the multiple-choice RACE (Lai et al., 2017). The paradigm has been taken even further by biasing the distribution of questions toward especially hard-to-model examples, as in QAngaroo (Welbl et al., 2018), HotpotQA (Yang et al., 2018), and DROP (Dua et al., 2019).…”
Section: Quality Control (mentioning)
confidence: 99%
“…On our set of human voices, Kaldi produces at least one UNK token for ∼50% of the questions, and BERT achieves an F1 score of only 43.6 on this set (54.4 F1 on questions without UNK and 32.3 F1 on questions with UNK), compared to 67.1 F1 achieved by Google ASR, demonstrating that speech recognizer choice can greatly affect downstream QA performance. The observed degradation due to UNK decoding (previously noted by Peskov et al., 2019) suggests that practitioners might find it useful to go beyond speech recognition benchmarks and also evaluate ASR systems in the context of downstream QA applications.…”
Section: Results and Analysis (mentioning)
confidence: 93%
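A minimal sketch of the UNK-stratified scoring this statement describes: report downstream QA F1 separately for questions whose ASR transcript does and does not contain an UNK token. The example record layout (transcript, prediction, gold) and the f1_fn hook are hypothetical, not the cited paper's code.

    from collections import defaultdict

    def f1_by_unk(examples, f1_fn, unk_token="<unk>"):
        # Bucket QA F1 by whether the ASR transcript contains an UNK token.
        # `examples` is an iterable of dicts; the field names are hypothetical.
        buckets = defaultdict(list)
        for ex in examples:
            key = "with_unk" if unk_token in ex["transcript"].split() else "without_unk"
            buckets[key].append(f1_fn(ex["prediction"], ex["gold"]))
        return {k: sum(scores) / len(scores) for k, scores in buckets.items()}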
“…The interactive component of CQA also provides a natural mechanism for improving rewriting. When the computer cannot understand (rewrite) a question in the course of a conversation, because of complicated context, missing world knowledge, or upstream errors (Peskov et al., 2019), it should be able to ask its interlocutor, “can you unpack that?” This dataset helps start that conversation; the next steps are developing and evaluating models that efficiently decide when to ask for human assistance, and how best to use that assistance.…”
Section: Related Work and Discussion (mentioning)
confidence: 99%