Multisource data are captured by different sensors or produced by different generation mechanisms. Ground camera images (images taken by ground-based cameras) and rendered images (synthesized from the position information of a 3D image-based point cloud) are geospatial data from different sources, called cross-domain images. In outdoor environments in particular, the registration between these cross-domain images can be used to establish the spatial relationship between 2D and 3D space, which offers an indirect solution for the virtual-real registration of Augmented Reality (AR). However, traditional handcrafted feature descriptors fail to match such cross-domain images because of the low quality of the rendered images and the domain gap between the two sources. In this paper, inspired by the success of deep learning in computer vision, we first propose an end-to-end network, DIFD-Net, to learn Domain Invariant Feature Descriptors (DIFDs) for cross-domain image patches. The DIFDs are used for cross-domain image patch retrieval, which in turn enables the registration of ground camera and rendered images. Second, we construct a domain-kept consistent loss function to optimize DIFD-Net, which balances the feature descriptors across domains to narrow the gap between them. In particular, negative samples are generated from positive ones during training, and an additional constraint on intermediate feature maps introduces extra supervision for learning the descriptors. Finally, experiments show the superiority of DIFDs for cross-domain image patch retrieval, achieving state-of-the-art performance. Additionally, we use DIFDs to match ground camera images with rendered images and verify the feasibility of the derived AR virtual-real registration in open outdoor environments.

Index Terms: Domain Invariant Feature Descriptor (DIFD), multisource remote sensing data, cross-domain image, image patch matching, augmented reality, virtual-real registration.
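To make the training objective sketched in the abstract more concrete, the snippet below shows one plausible triplet-style formulation in PyTorch, in which negatives are derived from the positive cross-domain pairs within a batch. The function name, the margin value, and the hardest-in-batch mining strategy are illustrative assumptions, not the paper's exact domain-kept consistent loss; the intermediate feature map constraint mentioned above is also omitted here.

```python
import torch
import torch.nn.functional as F

def difd_triplet_loss(desc_cam, desc_ren, margin=1.0):
    """Illustrative triplet-style loss for cross-domain descriptors.

    desc_cam: (B, D) descriptors of ground camera image patches.
    desc_ren: (B, D) descriptors of the corresponding rendered patches.
    Row i of each tensor forms a positive (cross-domain) pair; the
    hardest non-matching rendered patch in the batch serves as the
    negative, i.e. negatives are generated from the positive pairs.
    """
    desc_cam = F.normalize(desc_cam, dim=1)
    desc_ren = F.normalize(desc_ren, dim=1)
    # Pairwise Euclidean distances between all camera/rendered descriptors.
    dist = torch.cdist(desc_cam, desc_ren)  # (B, B)
    pos = dist.diag()                       # distances of matching pairs
    # Mask out the positives, then mine the hardest in-batch negative.
    eye = torch.eye(len(dist), device=dist.device, dtype=torch.bool)
    neg = dist.masked_fill(eye, float("inf")).min(dim=1).values
    # Pull matching cross-domain pairs together, push mismatches apart.
    return F.relu(margin + pos - neg).mean()
```

A full training objective of this kind would presumably add a symmetric rendered-to-camera term and the extra supervision on intermediate feature maps described in the method section.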