“…Question answering can also be interpreted as an exercise in verifying the knowledge of experts: answering trivia questions carefully crafted by someone who already knows the answer so that exactly one answer is correct, as in TriviaQA and Quizbowl/Jeopardy! questions (Ferrucci et al., 2010; Dunn et al., 2017; Joshi et al., 2017; Peskov et al., 2019). This information-verifying paradigm also describes reading comprehension datasets such as NewsQA (Trischler et al., 2017), SQuAD (Rajpurkar et al., 2016, 2018), CoQA (Reddy et al., 2019), and the multiple-choice RACE (Lai et al., 2017). The paradigm has been taken even further by biasing the distribution of questions toward especially hard-to-model examples, as in QAngaroo (Welbl et al., 2018), HotpotQA (Yang et al., 2018), and DROP (Dua et al., 2019).…”