Proceedings of the British Machine Vision Conference 2016
DOI: 10.5244/c.30.119

Learning local feature descriptors with triplets and shallow convolutional neural networks

Abstract: Finding correspondences between images via local descriptors is one of the most extensively studied problems in computer vision due to its wide range of applications. Recently, end-to-end learnt descriptors [1,2,3] based on Convolutional Neural Network (CNN) architectures and trained on large datasets have been shown to significantly outperform state-of-the-art features. These works focus on exploiting pairs of positive and negative patches to learn discriminative representations. Recent work on deep l…

Cited by 428 publications (400 citation statements)
References 3 publications
“…While the Oxford dataset contains images that are all captured by a camera, Generated Matching dataset [14] is obtained by generating images using synthetic transformations, and contains 16 sequences of 48 images. However, the synthetic nature of the transformations does not model all noise that typically occurs in the capturing process, thus making this data less challenging than the Oxford data [4]. The DTU Robots dataset [1] contains real images of 3D objects, captured using a robotic arm in controlled laboratory conditions, which is suitable for certain application scenarios but of limited diversity in the data.…”
Section: Image-based Benchmarks (mentioning)
confidence: 99%
“…The interest in CNNs based descriptors started from results shown in [29] that the features from the last layer of a convolutional deep network trained on ImageNet can outperform SIFT even though the networks were not specifically optimized for such local representations. End-to-end learning of patch descriptors using Siamese networks and the hinge contrastive loss [30], [31], [32] has recently been re-attempted in several works [29], [33], [34], [35], [36] and consistent improvements were reported over the state of the art descriptors in terms of matching performance. However, their efficiency is still far behind the traditional engineered descriptors and further progress has to be made to make their applications possible.…”
Section: Related Work (mentioning)
confidence: 99%
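
For readers unfamiliar with the pair-based training mentioned in the statement above, the following is a minimal sketch of one common form of hinge contrastive loss for a single pair of descriptors. It is an illustration only and is not taken from any of the cited works: the function names, the plain-list descriptor representation, and the default margin of 1.0 are all assumptions, and published variants differ (for example, some square the distance terms).

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hinge_contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Hypothetical pairwise loss (one common form, assumed here).

    Matching pairs are penalised by their distance, so training pulls them
    together. Non-matching pairs are penalised only while they are closer
    than `margin`, so training pushes them apart up to that margin.
    """
    d = l2_distance(desc_a, desc_b)
    if is_match:
        return d
    return max(0.0, margin - d)


# Toy 2-D "descriptors"; real descriptors would be network outputs.
matching = hinge_contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=True)
far_negative = hinge_contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=False)
near_negative = hinge_contrastive_loss([0.0, 0.0], [0.0, 0.25], is_match=False)
```

In a Siamese setup, both descriptors would come from the same network applied to the two patches of a pair, and this value would be averaged over a batch of pairs.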
“…Our work is based on the recent success of the triplet network presented in [3], named PN-Net, but adapted to work with cross-spectral image pairs, where for each matching pair, there are two possible non-matching patches; one for each spectrum. Results show that our technique is useful for learning cross-spectral feature descriptors that can be used as drop-in replacements of SIFT-like features descriptors.…”
Section: Mp2 (mentioning)
confidence: 99%
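
The idea in the statement above, a matching pair accompanied by two non-matching patches, can be sketched with a generic margin-based triplet loss applied once per negative. This is a rough illustration under stated assumptions, not necessarily the loss used in the cited works: the function names, the summation of the two terms, the argument names for the two negatives, and the margin value are all hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic margin-based triplet loss (assumed form).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def two_negative_loss(anchor, positive, negative_1, negative_2, margin=1.0):
    """Hypothetical combination: one triplet term per non-matching patch.

    `negative_1` and `negative_2` stand in for the two non-matching patches,
    one from each spectrum; summing the terms is an assumption.
    """
    return (triplet_margin_loss(anchor, positive, negative_1, margin)
            + triplet_margin_loss(anchor, positive, negative_2, margin))


# Toy 2-D "descriptors"; real descriptors would be network outputs.
anchor, positive = [0.0, 0.0], [0.0, 1.0]
easy_negative = [0.0, 3.0]   # already far enough away: contributes 0
hard_negative = [0.0, 1.5]   # too close to the anchor: contributes 0.5
loss = two_negative_loss(anchor, positive, easy_negative, hard_negative)
```

Only the hard negative contributes here, which shows why triplet-based training tends to concentrate on non-matching patches that are still too close to the anchor.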