3D pose estimation remains an active but challenging task in object detection for remote sensing images. In this paper, we present a new algorithm for predicting an object's 3D pose in remote sensing images, called Anchor Points Prediction (APP). Compared to previous methods such as RoI Transform, the final output of our detector also carries direction information. A neural network predicts multiple feature points of the object, from which we obtain the homography transformation between object coordinates and image coordinates. The resulting 3D pose accurately describes the three-dimensional position and attitude of the object. We also define a new metric, IoU_APP, for evaluating the direction and attitude of the object. We tested the algorithm on the HRSC2016 and DOTA datasets, achieving accuracies of 0.863 and 0.701, respectively. The experimental results show that the APP algorithm significantly improves accuracy while operating as a one-stage predictor, which keeps the computation simple and efficient.
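The abstract does not give implementation details, but the core idea of recovering a pose from predicted anchor points via a homography can be illustrated with a minimal sketch. The point layout, camera intrinsics `K`, and the use of OpenCV below are assumptions for illustration only, not the authors' code.

```python
# Minimal sketch: estimate a planar homography from predicted anchor points
# and decompose it into candidate 3D poses. All values are illustrative.
import numpy as np
import cv2

# Assumed anchor-point layout in the object frame (planar, Z = 0).
object_pts = np.array([[-1.0, -0.5], [1.0, -0.5], [1.0, 0.5], [-1.0, 0.5]],
                      dtype=np.float32)

# Anchor points predicted by the network in image coordinates (pixels).
image_pts = np.array([[412.3, 208.1], [498.6, 215.4],
                      [495.0, 263.7], [409.8, 255.9]], dtype=np.float32)

# Homography relating object-plane coordinates to image coordinates.
H, _ = cv2.findHomography(object_pts, image_pts)

# With assumed camera intrinsics K, decompose H into candidate rotations R
# and translations t that describe the object's 3D pose.
K = np.array([[1000.0, 0.0, 512.0],
              [0.0, 1000.0, 512.0],
              [0.0, 0.0, 1.0]])
_, rotations, translations, _ = cv2.decomposeHomographyMat(H, K)
print("candidate poses:", len(rotations))
```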
The high-dimensional outputs of object detection are often constrained, such as the direction vector {cos θ, sin θ} of a two-dimensional object and the attitude quaternion of a three-dimensional object. In a conventional neural network, the range of each output component is unconstrained, which makes it difficult to meet the needs of such practical problems. To address this, we design a transformation network layer based on high-dimensional space transformation theory and construct a constrained neural network model to detect the pose of objects from a single aerial image. First, in the YOLOv3 network structure, a transformation layer is added at each of the three output scales to enforce a constrained unit quaternion field. Second, a special loss function is proposed according to the characteristics of quaternions. A new constrained neural network, the quaternion field pose network (qfield PoseNet), is then constructed to predict both the object probability field and the corresponding unit quaternion field. Next, the object probability field determines the 2D bounding box of the object, and the unit quaternion field determines the 3D rotation R. Finally, the rotation matrix R and the 2D bounding box are combined to compute the 3D translation T. We evaluated our method on the DOTA 1.5 and HRSC2016 datasets. The experimental results show that it detects the pose of the object well.
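The constraint layer and quaternion-aware loss described above can be sketched as follows. This is an assumption-laden illustration, not the paper's qfield PoseNet implementation: the tensor shapes, the normalization-based constraint, and the sign-invariant loss are plausible choices consistent with the abstract.

```python
# Minimal PyTorch sketch (illustrative assumption): constrain a raw 4-channel
# output map to a unit-quaternion field and compute a sign-invariant loss.
import torch
import torch.nn.functional as F

def quaternion_field(raw):
    """Normalize a raw (N, 4, H, W) tensor so each cell holds a unit quaternion."""
    return F.normalize(raw, p=2, dim=1, eps=1e-8)

def quaternion_loss(q_pred, q_gt):
    """1 - |<q_pred, q_gt>| per cell; invariant to the q ~ -q sign ambiguity."""
    dot = (q_pred * q_gt).sum(dim=1)   # (N, H, W)
    return (1.0 - dot.abs()).mean()

# Example: a 4-channel prediction map at one of the three YOLOv3-style scales.
raw = torch.randn(2, 4, 13, 13)
q_pred = quaternion_field(raw)
q_gt = quaternion_field(torch.randn(2, 4, 13, 13))
print(quaternion_loss(q_pred, q_gt).item())
```

Normalizing the raw output guarantees every predicted quaternion lies on the unit sphere, and taking the absolute value of the dot product avoids penalizing the network for predicting -q instead of q, since both represent the same rotation.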