2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.02042
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

Cited by 88 publications (37 citation statements)
References 32 publications
“…While some of these tasks have been studied in previous works, none of them has been studied in industrial scenarios from the egocentric perspective also considering multimodal observations. Moreover, there are only a few datasets publicly available [15,41,81] which can be used to study different tasks simultaneously and to develop a complete system for human behavior understanding taking into account different aspects (e.g., actions, interactions, objects, future intentions).…”
Section: Benchmarks and Baseline Results
confidence: 99%
“…Inspired by the first version of the MECCANO dataset [74], [81] proposed Assembly101, a procedural activity dataset comprising multi-view videos in which subjects assemble and disassemble toys. Contextually, they benchmarked three action understanding tasks (i.e., action recognition, action anticipation and temporal segmentation) and proposed a new task related to mistake detection.…”
Section: Datasets For Human Behavior Understanding
confidence: 99%
“…Large-scale narrated instructional video datasets [6,17,25,30,31] have paved the way for learning joint video-language representations and task structure from videos. More recently, datasets such as the Assembly-101 dataset [21] and Ikea ASM [3] provide videos of people assembling and disassembling toys and furniture. Assembly-101 also contains annotations for detecting mistakes in the video.…”
Section: Related Work
confidence: 99%
“…This field has important applications in egocentric robotics vision [27] and virtual reality [17]. Unfortunately, despite the availability of several related benchmarks [16,29,40,49], current Ego-HOI works often require bulky laboratory equipment like headset cameras for data collection.…”
Section: Introduction
confidence: 99%
“…To validate our approach, we further define new benchmark settings called Ego-HOI-XView, which utilizes third-person videos during pre-training to help learn HOI knowledge for cross-view fine-tuning and inference in egocentric videos. The benchmarks are based on two multi-view datasets, Assembly101 [49] and H2O [29], and are designed to evaluate cross-view egocentric human-object interaction recognition. We conduct extensive experiments and analyses on these benchmarks to verify the transferable ability of our model across different views.…”
Section: Introduction
confidence: 99%