2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.00828
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Abstract: In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences…
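To make the abstract's describe-and-detect idea concrete: detection operates on the dense CNN feature map rather than on low-level image structures. Below is a minimal sketch of such a soft detection score, assuming a PyTorch feature map with non-negative (e.g. ReLU) activations; the function name, the 3×3 neighbourhood, and the normalisation details are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def soft_detection_scores(feats: torch.Tensor) -> torch.Tensor:
    """Sketch of describe-and-detect scoring on a dense feature map.

    feats: (1, C, H, W) CNN feature map (backbone and scale are assumptions).
    Returns per-pixel keypoint scores of shape (1, H, W) that sum to 1.
    """
    # Channel ratio: how dominant each channel is at its own pixel
    # (assumes non-negative activations, e.g. after a ReLU).
    beta = feats / (feats.max(dim=1, keepdim=True).values + 1e-8)
    # Local spatial ratio: softmax-like weight over a 3x3 neighbourhood,
    # stabilised by subtracting the per-channel maximum before exponentiating.
    exp_f = torch.exp(feats - feats.amax(dim=(2, 3), keepdim=True))
    local_sum = 9.0 * F.avg_pool2d(exp_f, kernel_size=3, stride=1, padding=1)
    alpha = exp_f / (local_sum + 1e-8)
    # Keep the best channel per pixel, then normalise scores over the image.
    gamma = (alpha * beta).amax(dim=1)  # (1, H, W)
    return gamma / gamma.sum(dim=(1, 2), keepdim=True)
```

Hard keypoints would then be taken as local maxima of these scores, with descriptors read off as the L2-normalised feature vectors at the same locations.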

Cited by 979 publications (869 citation statements) · References 63 publications (195 reference statements)
“…This departs from the traditional detect-then-describe process by selecting points after describing them, but it is simple to train and has shown good results on standard benchmarks [18]. Note that D typically ranges from 512 to 2048 in the last layers of the CNN, hence the PCA reduction to D = 40. Dusmanu et al. [6] expanded this work with a detect-and-describe approach where they enforce keypoint repeatability and descriptor robustness using a Structure-from-Motion dataset with corresponding points on different images.…”
Section: Local Methods (mentioning, confidence: 99%)
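The PCA reduction mentioned in the excerpt (CNN descriptors with D between 512 and 2048 projected down to D = 40) is a standard linear projection. A minimal sketch with scikit-learn, using random placeholder descriptors in place of real CNN features (the whitening flag is an assumption):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Placeholder for descriptors taken from a late CNN layer (D = 2048 here).
descriptors = rng.standard_normal((10_000, 2048)).astype(np.float32)

# Fit PCA on a training set of descriptors, then project to D = 40.
pca = PCA(n_components=40, whiten=True)
reduced = pca.fit_transform(descriptors)
print(reduced.shape)  # (10000, 40)
```

In practice the PCA basis would be fitted once on a held-out set of descriptors and reused to project descriptors from new images.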
“…SuperPoint [9] learns scale invariance at the descriptor level, which works for visual odometry but breaks in more generalized problems. D2-Net [10] focuses on difficult imaging conditions and relies on a single network for detection and description. R2D2 [31] applies L2-Net convolutionally while penalizing repeatable but non-discriminative patches.…”
Section: Related Work (mentioning, confidence: 99%)
“…As a result, the quest for ever-improving local feature descriptors goes on [23, 5, 46, 42, 39, 12, 50, 38, 41, 28, 45, 19, 25, 15, 24, 10, 31]. (This research was partially funded by Google's Visual Positioning System, the Swiss National Science Foundation, the Natural Sciences and Engineering Research Council of Canada, and by Compute Canada.) These methods all seek to achieve invariance to small changes in location, orientation, scale, perspective, and illumination, along with imaging artefacts and partial occlusions.…”
Section: Introduction (mentioning, confidence: 99%)
“…Comprehensive experiments on challenging real-world datasets demonstrate the benefit of our method. Further improvements could be achieved by incorporating CNN-based feature descriptors [11] or hierarchical localization schemes [27].…”
Section: Results (mentioning, confidence: 99%)