Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

Kembhavi, Aniruddha; Seo, Minjoon; Schwenk, Dustin; Choi, Jonghyun; Farhadi, Ali; Hajishirzi, Hannaneh

doi:10.1109/cvpr.2017.571

Cited by 197 publications

(161 citation statements)

References 9 publications

Mentioning: 159


“…To select the best P_q(x), P_c(x) and sampling strategy we conducted the following search. First we explored sampling probabilities 0.2, 0.4, 0.6, 0.8, 1.0 for query and context separately, using random sampling, and subsequently we combined them using values informed from the previous exploration, this time

BioASQ (Tsatsaronis et al, 2015): 60.28, 71.98
DROP (Dua et al, 2019): 48.50, 58.90
DuoRC (Saha et al, 2018): 53.29, 63.36
RACE (Lai et al, 2017): 39.35, 53.87
RelationExtraction (Levy et al, 2017): 79.20, 87.85
TextbookQA (Kembhavi et al, 2017): 56.50, 65.54…”

Section: Experiments and Discussion (mentioning)

confidence: 99%

An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering

Longpre, Lu, et al. 2019

Proceedings of the 2nd Workshop on Machine Reading for Question Answering


To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pretrained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a simple negative sampling technique to be particularly effective, even though it is typically used for datasets that include unanswerable questions, such as SQuAD 2.0. When applied in conjunction with per-domain sampling, our XLNet (Yang et al., 2019)-based submission achieved the second best Exact Match and F1 in the MRQA leaderboard competition. Shared task page: https://mrqa.github.io/shared


“…The Multi-Output Model (MOM) introduced in DVQA uses an OCR module to read chart specific content. Textbook QA (TQA) [24] considers the task of answering questions from middle-school textbooks, which often require understanding and reasoning about text and diagrams. Similarly, AI2D [23] contains diagram based multiple-choice questions.…”

Section: Related Work (mentioning)

confidence: 99%

Towards VQA Models That Can Read

Singh, Natarajan, Shah, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Studies have shown that a dominant class of questions asked by visually impaired users on images of their surroundings involves reading text in the image. But today's VQA models can not read! Our paper takes a first step towards addressing this problem. First, we introduce a new "TextVQA" dataset to facilitate progress on this important problem. Existing datasets either have a small proportion of questions about text (e.g., the VQA dataset) or are too small (e.g., the VizWiz dataset). TextVQA contains 45,336 questions on 28,408 images that require reasoning about text to answer. Second, we introduce a novel model architecture that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or is composed of the strings found in the image. Consequently, we call our approach Look, Read, Reason & Answer (LoRRA). We show that LoRRA outperforms existing state-of-the-art VQA models on our TextVQA dataset. We find that the gap between human performance and machine performance is significantly larger on TextVQA than on VQA 2.0, suggesting that TextVQA is well-suited to benchmark progress along directions complementary to VQA 2.0.

VQA Component: Similar to many VQA models [7,17], we first embed the question words w_1, w_2, ..., w_L of the question q with a pre-trained embedding function (e.g. GloVe [36]) and then encode the resultant word embeddings iteratively with a re-…


“…The closest works to ours are (Iyyer et al, 2017), (Tapaswi et al, 2016) and (Kembhavi et al, 2017) where data multi-modality is the key aspect. COMICS dataset (Iyyer et al, 2017) focus on comic book narratives and explore visual cloze style questions, introducing a dataset consisting of drawings from comic books.…”

Section: Related Work (mentioning)

confidence: 99%

RecipeQA: A Challenge Dataset for Multimodal Comprehension of Cooking Recipes

Yagcioglu, Erdem, Erdem, et al. 2018

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing


Understanding and reasoning about cooking recipes is a fruitful research direction towards enabling machines to interpret procedural text. In this work, we introduce RecipeQA, a dataset for multimodal comprehension of cooking recipes. It comprises of approximately 20K instructional recipes with multiple modalities such as titles, descriptions and aligned set of images. With over 36K automatically generated question-answer pairs, we design a set of comprehension and reasoning tasks that require joint understanding of images and text, capturing the temporal flow of events and making sense of procedural knowledge. Our preliminary results indicate that RecipeQA will serve as a challenging test bed and an ideal benchmark for evaluating machine comprehension systems. The data and leaderboard are available at
