Advances in Neural Information Processing Systems 19 (2007)
DOI: 10.7551/mitpress/7503.003.0057
Image Retrieval and Classification Using Local Distance Functions

Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, w…
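The abstract describes combining elementary patch-based distances into one learned distance function per training image. A minimal sketch of that weighted combination (function and variable names are illustrative, not from the paper's implementation):

```python
import numpy as np

def combined_distance(weights, elementary_distances):
    """Combine elementary patch-based distances into one local distance.

    `weights` stands in for the non-negative coefficients learned for a
    single focal training image; `elementary_distances` holds the distance
    from each of that image's patch features to its closest match in the
    other image. Names are illustrative, not taken from the paper's code.
    """
    return float(np.dot(weights, elementary_distances))

# Toy example: three patch features, with learned weights favouring patch 0.
w = np.array([0.7, 0.2, 0.1])
d = np.array([0.3, 0.9, 0.5])
print(combined_distance(w, d))  # 0.7*0.3 + 0.2*0.9 + 0.1*0.5 = 0.44
```

Because the weights are learned per training image, the same pair of images can be at different "distances" depending on which image is the focal one, which is what makes these functions local rather than a single global metric.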

Cited by 77 publications (7 citation statements) · References 17 publications
“…In early cross-modal retrieval, employing Canonical Correlation Analysis (CCA) [5] to build a shared space was the primary method for cross-modal alignment, aiming to maximize the co-occurrence correlation between features of different modalities [6]–[10]. Researchers subsequently proposed a series of methods based on contrastive [11], triplet [12], and ranking [13] metric losses to better learn the cross-modal common feature space. Building on this, VSE++ [14] exploits hard-sample mining, PCME [15] transforms deterministic features into probabilistic ones, and DiVE [16] introduces discrete prompts to enhance the extraction of modal features, further strengthening the alignment capability and robustness of the common space.…”
Section: A Cross-modal Retrievalmentioning
confidence: 99%
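The metric losses mentioned in the excerpt above can be illustrated with a minimal triplet-loss sketch (all names and values are illustrative, not from any of the cited papers):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: require the positive to be closer to the
    anchor than the negative by at least `margin`; zero loss once satisfied."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([1.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))  # max(0, 0.1 - 1.0 + 0.2) = 0.0
```

Contrastive and ranking losses follow the same pattern of penalizing violated distance orderings; they differ mainly in how pairs or lists of examples are compared.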
“…We expect the distances of these vectors to each other to become increasingly meaningful in the context of our overarching prediction task as we train the actual prediction model f_NN (refs. 31–35). As our encoders are modules of our prediction model, they are automatically trained each time we apply backpropagation on f_NN through stochastic gradient descent.…”
Section: Embedded Feature Vectorsmentioning
confidence: 99%
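The excerpt above describes embedding vectors that become meaningful because they are trained end-to-end by backpropagation through the downstream predictor. A minimal NumPy sketch of that idea, with hand-derived gradients standing in for automatic differentiation (all names, shapes, and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": an embedding table mapping 4 categorical ids to 2-d vectors.
embed = rng.normal(size=(4, 2))
w_out = rng.normal(size=2)          # downstream linear predictor
lr = 0.1

def forward(ids):
    return embed[ids] @ w_out       # one scalar prediction per example

ids = np.array([0, 1, 2])
targets = np.array([1.0, -1.0, 0.5])

for _ in range(500):
    pred = forward(ids)
    err = pred - targets                     # d(MSE)/d(pred), up to a constant
    # Backprop: the loss gradient flows into both the predictor weights and
    # the embedding rows, so the embeddings are trained by the same SGD steps.
    grad_w = embed[ids].T @ err / len(ids)
    grad_e = np.outer(err, w_out) / len(ids)
    w_out -= lr * grad_w
    np.add.at(embed, ids, -lr * grad_e)      # scatter-add into embedding rows

print(np.round(forward(ids), 3))   # predictions approach [1.0, -1.0, 0.5]
```

In a real framework the `grad_*` lines are produced automatically; the point of the sketch is that no separate training procedure is needed for the encoder, matching the excerpt's observation.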
“…Before the invention of class activation mapping, researchers proposed reducing patch distances (Zuo et al, 2014; Frome et al, 2006) to improve classification results. Frome et al (Frome et al, 2006) learned per-image distance functions over patch-based features. NNs face both underfitting and overfitting issues depending on the size of the dataset and the NN structure. In the proposed method, we can adjust the background data.…”
Section: Improvement With Background Classmentioning
confidence: 99%