Visual navigation requires an agent to locate a given target using visual perception. To enable robots to execute tasks effectively, it is necessary to combine large language models (LLMs) with multi-modal inputs for navigation. While LLMs offer rich semantic knowledge, they lack specific real-world information and real-time interaction capabilities. This paper introduces a Multi-modal Scene Graph (MMSG) navigation framework that aligns LLMs with visual perception models to predict the next steps. First, a multi-modal scene dataset is constructed, containing object-relation-target triplets. We provide target words and lists of objects present in the scene to GPT-3.5 to generate a large number of instructions and corresponding action plans. The generated data is then used to pre-train the LLM for path planning. During inference, we discover objects in the scene by extending the DETR visual object detector to multi-view RGB images collected from different reachable positions. Experimental results show that the path plans generated by MMSG outperform those of state-of-the-art methods, indicating its feasibility in complex environments. We evaluate our method on the ProcTHOR dataset and show superior navigation performance.
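To make the inference step concrete, the sketch below shows one way the multi-view object discovery could look using an off-the-shelf DETR checkpoint from Hugging Face Transformers: each RGB view captured from a reachable position is passed through the detector, and the detected labels are merged into a single scene object list for the LLM planner. The checkpoint name, confidence threshold, and union-by-label aggregation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions noted above): run DETR on RGB views collected
# from different reachable positions and merge detections into one object list.
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
model.eval()

def detect_objects(image: Image.Image, threshold: float = 0.7) -> set[str]:
    """Return the set of object labels DETR finds in a single RGB view."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    results = processor.post_process_object_detection(
        outputs, target_sizes=target_sizes, threshold=threshold
    )[0]
    return {model.config.id2label[label.item()] for label in results["labels"]}

def discover_scene_objects(view_paths: list[str]) -> list[str]:
    """Union detections over multi-view RGB images of the same scene."""
    objects: set[str] = set()
    for path in view_paths:
        objects |= detect_objects(Image.open(path).convert("RGB"))
    return sorted(objects)

# The aggregated object list, together with the target word, would then be
# given to the pre-trained LLM planner to produce the next action plan, e.g.:
# scene_objects = discover_scene_objects(["view_0.png", "view_1.png", "view_2.png"])
```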