Traditional visual place recognition (VPR) methods generally rely on frame-based cameras, which can easily fail under rapid illumination changes or fast motion. To overcome this, we propose an end-to-end VPR network that uses event cameras and achieves good recognition performance in challenging environments (e.g., large-scale driving scenes). The key idea is to first represent the event streams as EST voxel grids, then extract features with a deep residual network, and finally aggregate the features with an improved VLAD network, realizing end-to-end visual place recognition from event streams. To verify the effectiveness of the proposed algorithm, we evaluate it on event-based driving datasets (MVSEC, DDD17, Brisbane-Event-VPR) and synthetic event datasets (Oxford RobotCar, CARLA), analyzing its performance on large-scale driving sequences that include cross-weather, cross-season, and illumination-changing scenes, and we compare it with a state-of-the-art event-based VPR method (Ensemble-Event-VPR). Experimental results show that the proposed method outperforms the event-based ensemble scheme in challenging scenarios. To our knowledge, this is the first end-to-end weakly supervised deep network architecture for visual place recognition that directly processes event stream data.
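As a rough illustration of the first stage of the pipeline, the sketch below bins an event stream into a voxel grid with NumPy. It is a simplified stand-in, not the method itself: it uses a fixed linear temporal kernel rather than the EST representation named above, and the function name `events_to_voxel_grid`, its parameters, and the signed-polarity convention are all assumptions made for illustration. The residual feature extractor and the VLAD aggregation stages are not shown.

```python
import numpy as np


def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
    """Accumulate events (x, y, t, p) into a (num_bins, height, width) grid.

    Hypothetical helper: each event's signed polarity is split between its
    two nearest temporal bins using linear weights (a fixed kernel).
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = np.asarray(t, dtype=np.float64)
    if t.size == 0:
        return grid

    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    # Assumed convention: positive polarity -> +1, anything else -> -1.
    pol = np.where(np.asarray(p) > 0, 1.0, -1.0)

    # Normalize timestamps to the range [0, num_bins - 1].
    span = t.max() - t.min()
    if span > 0:
        tn = (t - t.min()) / span * (num_bins - 1)
    else:
        tn = np.zeros_like(t)

    lower = np.floor(tn).astype(np.int64)
    upper = np.minimum(lower + 1, num_bins - 1)
    w_upper = tn - lower
    w_lower = 1.0 - w_upper

    # np.add.at accumulates correctly when several events hit the same cell.
    np.add.at(grid, (lower, y, x), pol * w_lower)
    np.add.at(grid, (upper, y, x), pol * w_upper)
    return grid


# Toy usage: three events on a 2x2 sensor, binned into three time slices.
demo = events_to_voxel_grid(
    x=[0, 1, 1], y=[0, 0, 1], t=[0.0, 0.25, 1.0], p=[1, 0, 1],
    num_bins=3, height=2, width=2,
)
```

In a pipeline of the kind described, a tensor like `demo` would be the input to the feature extractor, with the temporal bins treated as image channels.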