Continuous Self-Localization on Aerial Images Using Visual and Lidar Sensors

Fervers, Florian; Bullinger, Sebastian; Bodensteiner, Christoph; Arens, Michael; Stiefelhagen, Rainer

doi:10.1109/iros47612.2022.9982195

Cited by 11 publications

(17 citation statements)

References 43 publications


“…Chiu et al (2018) proposed a new approach that uses semantic information to register 2D monocular video frames to LiDAR data for augmented reality driving applications. Fervers et al (2022) combined the feature extracted in ground image and LiDAR point cloud to find the registration relationship. Some researchers believe that compared with 3D point clouds, 2D orthophoto or other map is suitable for geographical reference object because it is easier to obtain.…”

Section: Image Geo-registration (mentioning)

confidence: 99%

Real‐time mosaic of multiple fisheye surveillance videos based on geo‐registration and rectification

Gao

2023

The Photogrammetric Record


A distributed fisheye video surveillance system (DFVSS) can monitor a wide area without blind spots, but it is often affected by the viewpoint discontinuity and space inconsistency of multiple videos in the area. This paper proposes a novel real‐time fisheye video mosaic algorithm for wide‐area surveillance. First, by extending the line photogrammetry theory under central projection to spherical projection, a fisheye video geo‐registration model is established and estimated using orthogonal parallel lines on the ground, so that all videos of DFVSS are in the unified reference system to eliminate the space inconsistency between them. Second, by combining the photogrammetry orthorectification technique with thin‐plate spline transformation, a fisheye video rectification model is established to eliminate serious distortion in geo‐registered fisheye videos and align them accurately. Third, the viewport‐dependent video selection strategy and video look‐up table computation technique are adopted to create a high‐resolution panorama from input fisheye videos in real time. A parking lot of about 0.4 km² monitored by eight fisheye cameras was selected as the test area. The experimental result shows the line re‐projection error in fisheye videos is about 0.5 pixels, and the overall efficiency, including panorama creation and mapping to the ground as texture, is not less than 30 fps. It indicates that the proposed algorithm can achieve a good balance between the limitation of video transmission bandwidth and the smooth observation requirement of computer equipment for the panorama, which is of great value for the construction and application of DFVSS.


“…Hybrid sensor solutions have also been explored, such as in [16] where an aerial robot achieves global localization through the use of egocentric 3D semantically labelled LiDAR, IMU, and visual information. CSLA [6] and SIBCL [33] extract visual features from ground and satellite images and use LiDAR points to establish correspondence between the two views. CSLA [6] aims to estimate 2-DoF translation, while SIBCL [33] aims to estimate 3-DoF pose, including an additional orientation.…”

Section: Related Work (mentioning)

confidence: 99%

“…CSLA [6] and SIBCL [33] extract visual features from ground and satellite images and use LiDAR points to establish correspondence between the two views. CSLA [6] aims to estimate 2-DoF translation, while SIBCL [33] aims to estimate 3-DoF pose, including an additional orientation. All these methods critically rely on depth information to build the correspondence across the two views.…”

Section: Related Work (mentioning)

confidence: 99%

Satellite Image Based Cross-view Localization for Autonomous Vehicle

Wang, Zhang, Vora, et al.

2023

2023 IEEE International Conference on Robotics and Automation (ICRA)


This paper proposes a fine-grained self-localization method for outdoor robotics that utilizes a flexible number of onboard cameras and readily accessible satellite images. The proposed method addresses limitations in existing cross-view localization methods that struggle to handle noise sources such as moving objects and seasonal variations. It is the first sparse visual-only method that enhances perception in dynamic environments by detecting view-consistent key points and their corresponding deep features from ground and satellite views, while removing off-the-ground objects and establishing homography transformation between the two views. Moreover, the proposed method incorporates a spatial embedding approach that leverages camera intrinsic and extrinsic information to reduce the ambiguity of purely visual matching, leading to improved feature matching and overall pose estimation accuracy. The method exhibits strong generalization and is robust to environmental changes, requiring only geo-poses as ground truth. Extensive experiments on the KITTI and Ford Multi-AV Seasonal datasets demonstrate that our proposed method outperforms existing state-of-the-art methods, achieving median spatial accuracy errors below 0.5 meters along the lateral and longitudinal directions, and a median orientation accuracy error below 2°.


“…HD maps may thus be provided by third-party mapping companies [26] as well as publicly available data, e.g. from aerial imagery [27].…”

Section: Introduction (mentioning)

confidence: 99%

“…Especially for cross-modality localization, the identification of reliable landmarks for various sensor and map modalities is non-trivial. Here, learning-based approaches have become the state-of-the-art for both cross-modal PR [1], [11], [34], [35], [36] as well as local pose tracking, achieving localization accuracies below 1 m for various sensor modalities, including radar-to-lidar [2], [37], [38], range-to-aerial-imagery [35], [39], [40], [41] and camera-to-aerial-imagery, also called cross-view geolocalization (CVGL) [27], [41], [42].…”

Section: Introduction (mentioning)

confidence: 99%