Finding correspondences between images via local descriptors is one of the most extensively studied problems in computer vision due to its wide range of applications. Recently, end-to-end learnt descriptors [1,2,3] based on Convolutional Neural Network (CNN) architectures and trained on large datasets have been shown to significantly outperform state-of-the-art features. These works focus on exploiting pairs of positive and negative patches to learn discriminative representations. Recent work on deep learning of feature embeddings examines the use of triplets of samples instead of pairs. In this paper we investigate the use of triplets in learning local feature descriptors with CNNs, and we propose a novel in-triplet hard negative mining step to achieve more effective training and better descriptors. Our method reaches state-of-the-art results without the computational overhead typically associated with mining of negatives, and with a lower-complexity network architecture. This is a significant advantage over previous CNN-based descriptors, since it makes our proposal suitable for practical problems involving large datasets.

Learning with triplets involves training from samples of the form {a, p, n}, where a is the anchor, p is a positive example (a different sample of the same class as a), and n is a negative example, belonging to a different class than a. In our case, a and p are different viewpoints of the same physical point, and n comes from a different keypoint. The goal is to learn an embedding f(x) such that δ+ = ||f(a) − f(p)||2 is low (i.e., the network brings a and p close in the feature space) and δ− = ||f(a) − f(n)||2 is high (i.e., the network pushes the descriptors of a and n far apart). With this aim, we examine two different loss functions for triplet-based learning: the margin ranking loss and the ratio loss.
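The triplet setup above can be sketched concretely. The following minimal example (with made-up 3-D embedding vectors standing in for the CNN output f(x)) computes the two distances δ+ and δ− that the losses operate on:

```python
import math

def euclidean(u, v):
    # L2 distance between two embedding vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical embeddings of one triplet: a = anchor, p = positive, n = negative
f_a = [0.1, 0.9, 0.0]
f_p = [0.2, 0.8, 0.1]
f_n = [0.9, 0.1, 0.3]

delta_pos = euclidean(f_a, f_p)  # delta+ = ||f(a) - f(p)||2, should end up low
delta_neg = euclidean(f_a, f_n)  # delta- = ||f(a) - f(n)||2, should end up high
```

A successful embedding is one where delta_pos < delta_neg for (almost) all triplets drawn from the data.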
The margin ranking loss is defined as λ(δ+, δ−) = max(0, µ + δ+ − δ−), where µ is an arbitrarily set margin. It measures the violation of the ranking order of the embedded features inside the triplet, which should satisfy δ− > δ+ + µ. If that is not the case, the network adjusts its weights to achieve this ordering. The ratio loss, in turn, optimises the ratio of distances within triplets. It learns embeddings such that λ̂(δ+, δ−) = (exp(δ+) / (exp(δ+) + exp(δ−)))² is minimised. The goal of this loss function is to force (exp(δ+) / (exp(δ+) + exp(δ−)))² to 0 and, equivalently, (exp(δ−) / (exp(δ+) + exp(δ−)))² to 1. There is no margin associated with this loss, and by definition 0 ≤ λ̂ ≤ 1 for all values of δ−, δ+. Fig. 1 illustrates both approaches and their loss surfaces. In λ(δ+, δ−) the loss remains 0 until the margin is violated, after which it increases linearly with no upper bound. In contrast, λ̂(δ+, δ−) has a clear slope between the two loss levels, and the loss quickly reaches its 0-valued plateau when δ− > δ+ (and a 1-valued plateau when δ+ > δ−). All previous proposals based on triplet learning use only two of the possible three distances within each triplet, ignoring the third distance δ′− = ||f(p) − f(n)||2 between the positive and the negative sample.
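The two losses, and the in-triplet hard negative idea, can be written down in a few lines. This is a minimal sketch under the definitions above, not the paper's implementation; the function names and the µ = 1.0 default are illustrative choices:

```python
import math

def margin_ranking_loss(delta_pos, delta_neg, mu=1.0):
    # max(0, mu + delta+ - delta-): zero once delta- exceeds delta+ by the margin mu
    return max(0.0, mu + delta_pos - delta_neg)

def ratio_loss(delta_pos, delta_neg):
    # Squared softmax ratio of the two distances; bounded in [0, 1], no margin needed
    e_pos, e_neg = math.exp(delta_pos), math.exp(delta_neg)
    return (e_pos / (e_pos + e_neg)) ** 2

def in_triplet_hard_negative(delta_neg, delta_neg_prime):
    # Use the harder (smaller) of the two available negative distances:
    # ||f(a) - f(n)||2 versus the usually ignored ||f(p) - f(n)||2
    return min(delta_neg, delta_neg_prime)
```

Feeding min(δ−, δ′−) into either loss is what makes the in-triplet mining free: both distances are already computed from the triplet, so no extra search over the dataset is required.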
Edge detection is the basis of many computer vision applications. The state of the art predominantly relies on deep learning, with two decisive factors: dataset content and network architecture. Most of the publicly available datasets are not curated for edge detection tasks. Here, we offer a solution to this constraint. First, we argue that edges, contours and boundaries, despite their overlaps, are three distinct visual features requiring separate benchmark datasets. To this end, we present a new dataset of edges. Second, we propose a novel architecture, termed Dense Extreme Inception Network for Edge Detection (DexiNed), that can be trained from scratch without any pretrained weights. DexiNed outperforms other algorithms on the presented dataset. It also generalizes well to other datasets without any fine-tuning. The higher quality of DexiNed is also perceptually evident thanks to the sharper and finer edges it outputs.
This work presents Kornia, an open source computer vision library built upon a set of differentiable routines and modules that aims to solve generic computer vision problems. The package uses PyTorch as its main backend, not only for efficiency but also to take advantage of the reverse-mode autodifferentiation engine to define and compute the gradients of complex functions. Inspired by OpenCV, Kornia is composed of a set of modules containing operators that can be integrated into neural networks to train models for a wide range of operations, including image transformations, camera calibration, epipolar geometry, and low-level image processing techniques such as filtering and edge detection. These operators act directly on high-dimensional tensor representations on graphical processing units, yielding faster systems. Examples of classical vision problems implemented using our framework are provided, including a benchmark comparing to existing vision libraries.