2024
DOI: 10.1101/2024.12.05.627038
Preprint

Egocentric Perception of Walking Environments using an Interactive Vision-Language System

Haining Tan,
Alex Mihailidis,
Brokoslaw Laschowski

Abstract: Large language models can provide a more detailed contextual understanding of a scene than computer vision alone, which has implications for robotics and embodied intelligence. In this study, we developed a novel multimodal vision-language system for egocentric visual perception, with an initial focus on real-world walking environments. We trained a number of state-of-the-art transformer-based vision-language models that use causal language modelling on our custom dataset of 43,055 image-te…
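The abstract mentions training vision-language models with a causal language modelling objective on image-text pairs. As a minimal sketch of that objective only (not the authors' code; the function name and toy shapes here are illustrative assumptions), causal language modelling trains each position to predict the next token via a shifted cross-entropy loss:

```python
import numpy as np

def causal_lm_loss(logits, token_ids):
    """Next-token cross-entropy: the logits at position t predict token t+1.

    logits: (T, V) array of unnormalized scores over a vocabulary of size V.
    token_ids: (T,) array of ground-truth token indices.
    """
    pred = logits[:-1]        # predictions for positions 0..T-2
    targets = token_ids[1:]   # shifted targets: tokens 1..T-1
    # Numerically stable log-softmax over the vocabulary axis.
    z = pred - pred.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each correct next token, averaged.
    nll = -log_probs[np.arange(len(targets)), targets]
    return nll.mean()

# Toy example: vocabulary of 5 tokens, caption [0, 1, 2, 3].
# Each row strongly scores the correct next token, so the loss is near zero.
toy_logits = np.eye(5)[[1, 2, 3, 0]] * 10.0
loss = causal_lm_loss(toy_logits, np.array([0, 1, 2, 3]))
```

In the paper's setting, the image would additionally condition the predictions (e.g. as a prefix of visual embeddings); this sketch shows only the text-side loss.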

Cited by 1 publication
References 12 publications