AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant
2022
DOI: 10.1007/978-3-031-20059-5_28

Cited by 11 publications (4 citation statements)
References 25 publications
“…The same work presented an extensive analysis of several VQA methods, highlighting the effective support provided by large language models. Wong et al. (2022) defined the affordance-centric VQA problem, where the AI assistant should learn from instructional videos to provide step-by-step help in the user's view. The authors introduced a new dataset and developed a novel question-to-action model based on an encoder-decoder architecture.…”
Section: Visual Question Answering (VQA)
Citation type: mentioning (confidence: 99%)
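The excerpt above only names the encoder-decoder question-to-action model. As an illustration of that general formulation, here is a minimal, hypothetical PyTorch sketch: a Transformer encoder fuses question tokens with video frame features, and a decoder predicts a sequence of action-step indices. All module choices, dimensions, and the action-step vocabulary are illustrative assumptions, not the authors' AssistQ implementation.

```python
# Hypothetical question-to-action encoder-decoder sketch (not the authors'
# AssistQ model): encode question tokens together with video frame features,
# then decode a sequence of action-step indices.
import torch
import torch.nn as nn

class QuestionToAction(nn.Module):
    def __init__(self, vocab_size=5000, n_actions=100, d_model=256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.frame_proj = nn.Linear(512, d_model)  # assumed frame feature size
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.action_emb = nn.Embedding(n_actions, d_model)
        self.head = nn.Linear(d_model, n_actions)  # one logit per action step

    def forward(self, question_ids, frame_feats, action_ids):
        # Concatenate question and video tokens into one context sequence.
        ctx = torch.cat([self.token_emb(question_ids),
                         self.frame_proj(frame_feats)], dim=1)
        memory = self.encoder(ctx)
        # Decode next-step logits from previously decoded steps.
        # (A causal mask would be added for real autoregressive training.)
        tgt = self.action_emb(action_ids)
        return self.head(self.decoder(tgt, memory))

model = QuestionToAction()
q = torch.randint(0, 5000, (1, 12))  # 12 question tokens
v = torch.randn(1, 30, 512)          # 30 frames of visual features
a = torch.randint(0, 100, (1, 3))    # 3 previously decoded action steps
logits = model(q, v, a)              # (1, 3, 100): next-step scores
```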
“…The QA task is formulated as a classification problem over the whole answer vocabulary. The AQTC benchmark proposed by Wong et al. (2022) was created with a close focus on task completion and affordances. It contains 100 instructional videos with an average duration of 115 seconds and involves 25 common household appliances, with 531 multiple-choice question-answer samples.…”
Section: Visual Question Answering (VQA)
Citation type: mentioning (confidence: 99%)
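To make the "classification over the whole answer vocabulary" formulation concrete, here is a minimal sketch, assuming a fixed global answer vocabulary and an already-fused question/video feature vector. The vocabulary size and feature dimension below are hypothetical placeholders, not values from any cited paper.

```python
# Hypothetical VQA-as-classification sketch: a fused question/video vector is
# scored against every answer in a fixed vocabulary, trained with cross-entropy.
import torch
import torch.nn as nn

ANSWER_VOCAB_SIZE = 1000  # assumed size of the global answer vocabulary
FEATURE_DIM = 256         # assumed size of the fused question+video feature

classifier = nn.Sequential(
    nn.Linear(FEATURE_DIM, 512),
    nn.ReLU(),
    nn.Linear(512, ANSWER_VOCAB_SIZE),  # one logit per candidate answer
)

fused = torch.randn(8, FEATURE_DIM)     # batch of 8 fused QA features
logits = classifier(fused)              # (8, 1000)
targets = torch.randint(0, ANSWER_VOCAB_SIZE, (8,))  # ground-truth answer ids
loss = nn.CrossEntropyLoss()(logits, targets)
pred = logits.argmax(dim=-1)            # index of the predicted answer
```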
“…Interactive learning could be applicable to many applications in the metaverse, in which humans interact with the model in both virtual and physical worlds, conveying and discovering new knowledge. As a starting point, recent works [182], [183] introduce AI assistants on smart glasses that can instruct novices in using a new device or learning a new skill. Yet, these models still do not form a bidirectional feedback loop between users and learning systems.…”
Section: J. AR/VR Data Streaming and Learning
Citation type: mentioning (confidence: 99%)
“…Egocentric videos have become popular in the computer vision community due to the prevalence of small cameras and the ease of collecting FPV videos. These cameras are useful for collecting a variety of videos in different domains, such as manufacturing (assembly-disassembly) (Tan et al., 2020), education, behavior (Bambach et al., 2017; Wong et al., 2022), and sports. These videos also enable analyses such as object detection (Fathi et al., 2011; Fan et al., 2018; Furnari et al., 2017), hand detection (Bambach et al., 2015; Chen et al., 2023), and gaze detection and prediction (Huang et al., 2018).…”
Section: Video QA
Citation type: mentioning (confidence: 99%)