2020 IEEE International Conference on Image Processing (ICIP)
DOI: 10.1109/icip40778.2020.9191324

ActioNet: An Interactive End-to-End Platform for Task-Based Data Collection and Augmentation in 3D Environment

Abstract: The problem of task planning for artificial agents remains largely unsolved. While there has been increasing interest in data-driven approaches to task planning for artificial agents, a significant remaining bottleneck is the dearth of large-scale, comprehensive task-based datasets. In this paper, we present ActioNet, an interactive end-to-end platform for the collection and augmentation of task-based datasets in 3D environments. Using ActioNet, we collected a large-scale comprehensive task-based d…
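
To make the platform's setting concrete, below is a minimal, hypothetical sketch of logging one (observation, action) trajectory in AI2-THOR, the simulator ActioNet is built on (see the citations below). The scripted action list, record fields, and output file name are illustrative assumptions, not ActioNet's actual interface.

```python
# Hedged sketch: logging one task trajectory in AI2-THOR.
# The action script and JSON schema are assumptions for illustration.
import json
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")  # a standard AI2-THOR kitchen scene

# Hand-scripted action sequence standing in for one annotated task.
actions = ["MoveAhead", "RotateRight", "MoveAhead", "PickupObject"]

trajectory = []
for step, action in enumerate(actions):
    if action == "PickupObject":
        # Interaction actions need a target; grab any visible pickupable object.
        candidates = [o for o in controller.last_event.metadata["objects"]
                      if o["visible"] and o["pickupable"]]
        if not candidates:
            continue
        event = controller.step(action=action, objectId=candidates[0]["objectId"])
    else:
        event = controller.step(action=action)
    # event.frame holds the first-person RGB frame (H x W x 3 numpy array);
    # here we log only the action outcome to keep the sketch short.
    trajectory.append({"step": step, "action": action,
                       "success": event.metadata["lastActionSuccess"]})

with open("trajectory.json", "w") as f:
    json.dump(trajectory, f, indent=2)
controller.stop()
```

In simulators like this, augmentation commonly means replaying such scripted trajectories under varied scenes, lighting, or object placements, rather than perturbing the logged data itself.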


Cited by 8 publications (7 citation statements)
References 6 publications
“…Datasets. We evaluate PlaTe on the instructional video dataset CrossTask [16] and the interactive dataset ActioNet [30], which is based on AI2-THOR [31]. For the real-world UR-5 experiments, we collect a UR-5 Reaching Dataset consisting of 100 trajectories (2,150 first-person-view RGB images and corresponding action pairs) as a training set, and evaluate on a real UR-5 platform.…”
Section: A. Experimental Setup
confidence: 99%
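
As a concrete reading of the dataset description above, here is one hedged way a set of first-person RGB image/action trajectories might be laid out and loaded for training. The directory layout, field names, and loader are assumptions for illustration, not PlaTe's released code.

```python
# Hedged sketch: representing trajectories of (RGB frame, action) pairs.
# Assumes each trajectory is a folder of numbered .png frames plus an
# actions.txt file holding one integer action id per frame.
from dataclasses import dataclass
from pathlib import Path
from typing import List

@dataclass
class Step:
    image_path: Path  # first-person-view RGB frame on disk
    action: int       # discrete action id taken at this frame

@dataclass
class Trajectory:
    name: str
    steps: List[Step]

def load_trajectories(root: Path) -> List[Trajectory]:
    """Pair each frame with the matching line of actions.txt, per trajectory."""
    trajectories = []
    for traj_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        actions = [int(a) for a in (traj_dir / "actions.txt").read_text().split()]
        frames = sorted(traj_dir.glob("*.png"))
        steps = [Step(f, a) for f, a in zip(frames, actions)]
        trajectories.append(Trajectory(name=traj_dir.name, steps=steps))
    return trajectories
```

Under a layout like this, the 100-trajectory set above would average roughly 21-22 image/action pairs per trajectory (2,150 / 100).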
“…To further illustrate the effectiveness of our method, we experiment on a second dataset, ActioNet [30]. ActioNet is an interactive end-to-end platform for the collection and augmentation of task-based datasets in 3D environments.…”
Section: Evaluating Procedure Planning On ActioNet
confidence: 99%
“…But VLN sequences are much longer and require a constant stream of visual input and the ability to manipulate camera viewpoints, unlike VQA, which takes in a single input question and performs a series of actions to determine its answer. The notion that we might give a robot a general natural-language instruction and expect it to execute the task is now within reach. This is achieved [100,10,11] with the advancement of recurrent neural network methods for the joint interpretation of visual and natural-language input, together with datasets designed to simplify task-based instruction for navigation and task execution in 3D environments.…”
Section: Types Of Visual Navigation
confidence: 99%
“…These simulated worlds serve as virtual testbeds in which to train and test embodied AI frameworks before deploying them in the real world. Such embodied AI simulators also facilitate the collection of task-based datasets [10,11], which are tedious to collect in the real world because replicating the settings of the virtual world demands an extensive amount of manual labour. While there have been several survey papers in the field of embodied AI [1,12,2], they are mostly outdated, having been published before the modern deep learning era, which started around 2009 [13,14,15,16,8].…”
Section: Introduction
confidence: 99%
“…However, those works primarily focused on human intuitions for recognizing or predicting motion. With the advancement and rise of deep learning, computer graphics, and embodied AI [22,23,24], there has been a paradigm shift towards generating synthetic datasets, ranging from simple 2D cartoons [25,26,27] to realistic interaction in 3D environments [13,28,14,29,30,15], all aiming to explore machine perception of physics and causal reasoning at a deeper level. However, only the datasets CLEVRER [15], CoPhy [13], and CATER [14] are closely relevant to our work.…”
Section: Related Work
confidence: 99%