Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

Nandy, Abhilash; Sharma, Soumya; Maddhashiya, Shubham; Sachdeva, Kapil; Goyal, Pawan; Ganguly, Niloy

doi:10.18653/v1/2021.findings-emnlp.392

Cited by 3 publications

(9 citation statements)

References 23 publications


“…Different from previous multimodal inputs, the product manual is a specific domain in terms of the question type and the content. Since product manuals usually contain detailed operation instructions for a specific device, the questions beginning with 'How to' are very common (Nandy et al 2021), while this type of contents and questions rarely occur in general domain datasets. Moreover, the answers in the abovementioned works are all in text format, including text span, multi-choice, and generative sentences.…”

Section: Multimodal Question Answering

mentioning

confidence: 99%

“…The product manuals in PM209 are from two sources: 1) E-manual corpus (Nandy et al 2021); 2) official websites of the brands.…”

Section: A Product Manual Collection

mentioning

confidence: 99%

“…Source 1: E-manual corpus E-manual corpus (Nandy et al 2021) is a large-scale text corpus. It is constructed by crawling product manuals from the website 2 and extracting their text contents.…”

Section: A Product Manual Collection

mentioning

confidence: 99%

“…Each question is associated with a multimodal answer which is comprised of two parts: a textual part in natural language sentences, and a visual part containing regions from the manual. Table 1 shows the basic comparison between PM209 and existing PMQA datasets (Nandy et al 2021). The scale of PM209 is larger than existing PMQA datasets in terms of brands, manual numbers, and QA pairs.…”

Section: Introduction

mentioning

confidence: 99%

“…Table 8 shows that PM209 covers diverse topics including PC, automobiles, speakers, applications, hardware, cellphones, cameras, etc. The topics of PM209 are much richer than S10 QA and Smart TV/Remote QA (Nandy et al 2021) which only focus on one specific product. Topics in PM209 are also very different from VisualMRC (Tanaka, Nishida, and Yoshida 2021) which consists of open-domain webpages and contains rare topics about specific consumer products.…”

mentioning

confidence: 99%


Commercialization of state-owned broadcast networks in China

Yan¹




MPMQA: Multimodal Question Answering on Product Manuals

Zhang et al. 2023

AAAI

Visual contents, such as illustrations and images, play a big role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual contents and only retain textual parts. In this work, to emphasize the importance of multimodal contents, we propose a Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the model not only to process multimodal contents but also to provide multimodal answers. To support MPMQA, a large-scale dataset PM209 is constructed with human annotations, which contains 209 product manuals from 27 well-known consumer electronic brands. Human annotations include 6 types of semantic regions for manual contents and 22,021 pairs of question and answer. Especially, each answer consists of a textual sentence and related visual regions from manuals. Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving most related pages and then generating multimodal answers. We further propose a unified model that can perform these two subtasks all together and achieve comparable performance with multiple task-specific models. The PM209 dataset is available at https://github.com/AIM3-RUC/MPMQA.


CLMSM: A Multi-Task Learning Framework for Pre-training on Procedural Text

Nandy, Kapadnis, Goyal et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023

In this paper, we propose CLMSM, a domain-specific, continual pre-training framework that learns from a large set of procedural recipes. CLMSM uses a Multi-Task Learning Framework to optimize two objectives: a) Contrastive Learning using hard triplets to learn fine-grained differences across entities in the procedures, and b) a novel Mask-Step Modelling objective to learn step-wise context of a procedure. We test the performance of CLMSM on the downstream tasks of tracking entities and aligning actions between two procedures on three datasets, one of which is an open-domain dataset not conforming with the pre-training dataset. We show that CLMSM not only outperforms baselines on recipes (in-domain) but is also able to generalize to open-domain procedural NLP tasks.

