2020
DOI: 10.1007/s11263-020-01374-3

Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory

Abstract: The role of robots in society keeps expanding, bringing with it the necessity of interacting and communicating with humans. In order to keep such interaction intuitive, we provide automatic wayfinding based on verbal navigational instructions. Our first contribution is the creation of a large-scale dataset with verbal navigation instructions. To this end, we have developed an interactive visual navigation environment based on Google Street View; we further design an annotation method to highlight mined anchor …

Cited by 32 publications (19 citation statements)
References 65 publications
“…Talk to Nav. Vasudevan et al [148] developed an interactive visual navigation environment based on Google Street View named Talk2Nav dataset with 10,714 routes, and built an effective model to create large-scale navigational instructions over long-range city environments. 16 http://streetlearn.cc Street View.…”
Section: Street View Navigation (mentioning)
confidence: 99%
“…There are a large number of variants of the K-means algorithm, including initialization optimization K-means++, distance calculation optimization Elkan K-means algorithm, and optimization Mini Batch K-means algorithm in the case of big data. The deterministic algorithm converts the landmark visual saliency problem into an optimization problem and converts the local pattern matching in the landmark visual saliency of the video sequence into a cost function minimization problem, the most representative deterministic algorithm is the K-means clustering algorithm, and the advantages of the K-means clustering algorithm are fast convergence, being used in the landmark visual saliency of the high frame rate, and being very suitable for the landmark visual saliency analysis of real-time scenes when a considerable number of landmark visual saliency algorithms are based on the improvement of K-means clustering algorithm [8]. However, the K-means clustering method also has its drawbacks; for example, it is difficult to cope with the scale change and shape change of the target in the landmark visual saliency model acquisition process, easy to be influenced by the similar background and the interference of light change, and easy to occur in the building surface clustering in the building surface process.…”
Section: Related Work (mentioning)
confidence: 99%
“…The artificial ant in the ant colony algorithm uses the overall information of the ant colony, and the global update of the residual pheromone is performed only after the completion of an optimization search. The pheromone update formula on each path in the ant colony algorithm is (8), where D(j) min is the intraclass distance when the objective function obtains the minimum value. v(wkj) is the pheromone increment; M is the total amount of pheromone released by ants; and D(j) (0 < D(j) < 1) is the pheromone volatility coefficient.…”
Section: Advances In Civil Engineering (mentioning)
confidence: 99%
“…In recent years, researchers have investigated systems where passengers can give commands to self-driving cars. For instance, (Vasudevan, Dai, and Van Gool 2021;Chen et al 2019) consider navigational commands such as "Take the first left and at the red building turn right. Afterwards, drive to the white building".…”
Section: Introduction (mentioning)
confidence: 99%