Synthesizing Environment-Aware Activities via Activity Sketches

Liao, Yuan-Hong; Puig, Xavier; Boben, Marko; Torralba, Antonio; Fidler, Sanja

doi:10.1109/cvpr.2019.00645

Cited by 18 publications

(29 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The development of VirtualHome environment [32] enables such possibility. However, relevant works [32,21] rely on human-annotated data and perform supervised training from scratch. Due to the lack of rich world knowledge, these models can only generate action plans given detailed instructions of how to act or video demonstrations.…”

Section: Related Workmentioning

confidence: 99%

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Huang¹,

Abbeel²,

Pathak³

et al. 2022

Preprint

View full text Add to dashboard Cite

Can world knowledge learned by large language models (LLMs) be used to act in interactive environments? In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. "make breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While prior work focused on learning from explicit step-by-step examples of how to act, we surprisingly find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans without any further training. However, the plans produced naively by LLMs often cannot map precisely to admissible actions. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions. Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards *Equal advising. Correspondence to Wenlong Huang

show abstract

Section: Related Workmentioning

confidence: 99%

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Huang¹,

Abbeel²,

Pathak³

et al. 2022

Preprint

View full text Add to dashboard Cite

show abstract

“…The VirtualHome dataset [17,23] contains activities that people do at home. For each activity, there are different descriptions on how to perform them.…”

Section: Household Datasetmentioning

confidence: 99%

“…Furthermore, our framework can recommend sequences of actions on how to perform a human scaled task, like "How can I make a sandwich?". Our framework is based on a domain-specific ontology that we have developed which contains knowledge from the VirtualHome dataset [17,23]. The ontology is built in OWL [19] and the Knowledge Base (KB) can be easily extended by adding new instances of objects, actions, and activities.…”

Section: Introductionmentioning

confidence: 99%

A Knowledge Retrieval Framework for Household Objects and Actions with External Knowledge

Vassiliades

Bassiliades

Gouidis

et al. 2020

Lecture Notes in Computer Science

View full text Add to dashboard Cite

In the field of domestic cognitive robotics, it is important to have a rich representation of knowledge about how household objects are related to each other and with respect to human actions. In this paper, we present a domain dependent knowledge retrieval framework for household environments which was constructed by extracting knowledge from the VirtualHome dataset (http://virtual-home.org). The framework provides knowledge about sequences of actions on how to perform human scaled tasks in a household environment, answers queries about household objects, and performs semantic matching between entities from the web knowledge graphs DBpedia, ConceptNet, and WordNet, with the ones existing in our knowledge graph. We offer a set of predefined SPARQL templates that directly address the ontology on which our knowledge retrieval framework is built, and querying capabilities through SPARQL. We evaluated our framework via two different user evaluations.

show abstract

“…Puig et al [14] build a data base of symbolic programs constituting high-level tasks in a home by using human subject instructing a virtual agent in a simulation environment. Liao et al [15] use the corpus to learn translations of domain independent task sketches to executable programs in the agent's physical context. Shridhar et al [16] take a similar approach by collecting natural language corpora describing high-level tasks and learn to associate instructions to spatial attention over the scene.…”

Section: Related Workmentioning

confidence: 99%

“…Given the current world state S (as a graph) and the goal description Λ g , our neural model estimates the likelihood p(t | S, Λ g ) over candidate tool objects t ∈ τ in the environment. We build on the ResActGraph model by Liao et al [15] as the baseline and extend the model to our problem setup. Figure 4 presents the final TOOLNET model.…”

Section: Learning To Predict Tool Usementioning

confidence: 99%

ToolNet: Using Commonsense Generalization for Predicting Tool Use for Robot Plan Synthesis

Rajas¹,

Tuli²,

Paul³

et al. 2020

Preprint

View full text Add to dashboard Cite

A robot working in a physical environment (like home or factory) needs to learn to use various available tools for accomplishing different tasks, for instance, a mop for cleaning and a tray for carrying objects. The number of possible tools is large and it may not be feasible to demonstrate usage of each individual tool during training. Can a robot learn commonsense knowledge and adapt to novel settings where some known tools are missing, but alternative unseen tools are present? We present a neural model that predicts the best tool from the available objects for achieving a given declarative goal. This model is trained by user demonstrations, which we crowd-source through humans instructing a robot in a physics simulator. This dataset maintains user plans involving multi-step object interactions along with symbolic state changes. Our neural model, TOOLNET, combines a graph neural network to encode the current environment state, and goal-conditioned spatial attention to predict the appropriate tool. We find that providing metric and semantic properties of objects, and pre-trained object embeddings derived from a commonsense knowledge repository such as ConceptNet, significantly improves the model's ability to generalize to unseen tools. The model makes accurate and generalizable tool predictions. When compared to a graph neural network baseline, it achieves 14-27% accuracy improvement for predicting known tools from new world scenes, and 44-67% improvement in generalization for novel objects not encountered during training.

show abstract

Synthesizing Environment-Aware Activities via Activity Sketches

Cited by 18 publications

References 9 publications

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents

A Knowledge Retrieval Framework for Household Objects and Actions with External Knowledge

ToolNet: Using Commonsense Generalization for Predicting Tool Use for Robot Plan Synthesis

Contact Info

Product

Resources

About