DeepSFM: Structure from Motion via Deep Bundle Adjustment

Wei, Xingkui; Zhang, Yinda; Li, Zhuwen; Fu, Yanwei; Xue, Xiangyang

doi:10.1007/978-3-030-58452-8_14

Cited by 72 publications

(45 citation statements)

References 42 publications


“…Many researchers have introduced deep learning theory into this technical framework in recent years and achieved good results. Typical examples include MVSNET [23], DeepSFM [24], and R-MVSNet [25]. The issue of a slow reconstruction speed was addressed and optimized using the CNN network by Xiang et al [26].…”

Section: Related Work (mentioning)

confidence: 99%

A 3D Reconstruction Framework of Buildings Using Single Off-Nadir Satellite Image

et al. 2021


A novel framework for 3D reconstruction of buildings based on a single off-nadir satellite image is proposed in this paper. Compared with the traditional methods of reconstruction using multiple images in remote sensing, recovering 3D information from a single image reduces the input-data demands of reconstruction tasks. It solves the problem that multiple images suitable for traditional reconstruction methods cannot be acquired in some regions where remote sensing resources are scarce. However, it is difficult to reconstruct a 3D model containing a complete shape and accurate scale from a single image. The geometric constraints are not sufficient, as the view angle, size of buildings, and spatial resolution differ among remote sensing images. To solve this problem, the proposed reconstruction framework consists of two convolutional neural networks: Scale-Occupancy-Network (Scale-ONet) and model scale optimization network (Optim-Net). Through reconstruction using the single off-nadir satellite image, Scale-ONet can generate water-tight mesh models with the exact shape and rough scale of buildings. Meanwhile, the Optim-Net can reduce the scale error of these mesh models. Finally, the complete reconstructed scene is recovered by Model-Image matching. Profiting from well-designed networks, our framework is robust to input images with different view angles, building sizes, and spatial resolutions. Experimental results show that an ideal reconstruction accuracy can be obtained on both the model shape and the scale of buildings.


“…Future work may improve upon our findings by explicitly identifying the regions which exhibit aleatoric uncertainties. Exploring additional constraints beyond image synthesis (such as geometric information [39]) for training depth estimation networks and causal analysis [27] of the aleatoric uncertainties and divergence are of interest for future work.…”

Section: Conclusion and Limitations (mentioning)

confidence: 99%

On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation

Li, Drenkow, Ding et al. 2021

Preprint


Scene depth estimation from stereo and monocular imagery is critical for extracting 3D information for downstream tasks such as scene understanding. Recently, learning-based methods for depth estimation have received much attention due to their high performance and flexibility in hardware choice. However, collecting ground truth data for supervised training of these algorithms is costly or outright impossible. This circumstance suggests a need for alternative learning approaches that do not require corresponding depth measurements. Indeed, self-supervised learning of depth estimation provides an increasingly popular alternative. It is based on the idea that observed frames can be synthesized from neighboring frames if accurate depth of the scene is known (or, in this case, estimated). We show empirically that, contrary to common belief, improvements in image synthesis do not necessitate improvement in depth estimation. Rather, optimizing for image synthesis can result in diverging performance with respect to the main prediction objective, depth. We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data. Based on our experiments on four datasets (spanning street, indoor, and medical) and five architectures (monocular and stereo), we conclude that this diverging phenomenon is independent of the dataset domain and not mitigated by commonly used regularization techniques. To underscore the importance of this finding, we include a survey of methods which use image synthesis, totaling 127 papers over the last six years. This observed divergence has not been previously reported or studied in depth, suggesting room for future improvement of self-supervised approaches which might be impacted by the finding.


“…However, there still exist problems that need to be further solved such as time efficiency. Recently, deep learning techniques have made a great impact in the field of computer vision and show an advantage in accuracy and efficiency [12]. More recently, more and more works are exploring to exploit the deep learning techniques to help improve the SfM task [13,14,15] performance on efficiency and accuracy. When applied to the SfM task, the advantage of the deep learning-based techniques is proven on the efficiency [15], compared with the traditional SfM methods.…”

Section: Related Work (mentioning)

confidence: 99%

“…Recently, deep learning techniques have made a great impact in the field of computer vision and show an advantage in accuracy and efficiency [12]. More recently, more and more works are exploring to exploit the deep learning techniques to help improve the SfM task [13,14,15] performance on efficiency and accuracy. When applied to the SfM task, the advantage of the deep learning-based techniques is proven on the efficiency [15], compared with the traditional SfM methods. However, the disadvantage is also found out for the low robustness and accuracy under varying environment [16], due to the high reliability of deep learning on the image data distribution which makes the deep model hard to be generalized to different settings.…”

Section: Related Work (mentioning)

confidence: 99%

A Deep Learning Method for Frame Selection in Videos for Structure from Motion Pipelines

Banterle, Gong, Corsini et al. 2021

2021 IEEE International Conference on Image Processing (ICIP)


Structure-from-Motion (SfM) using the frames of a video sequence can be a challenging task for several reasons: there is a lot of redundant information, the computational time increases quadratically with the number of frames, and low-quality images (e.g., blurred frames) can decrease the final quality of the reconstruction. To overcome these issues, we present a novel deep-learning architecture that is meant for speeding up SfM by selecting frames using a predicted sub-sampling frequency. This architecture is general and can learn/distill the knowledge of any algorithm for selecting frames from a video for generating high-quality reconstructions. One key advantage is that we can run our architecture in real time, saving computations while keeping high-quality results.
