2008
DOI: 10.1007/978-3-540-88693-8_33

Scene Discovery by Matrix Factorization

Abstract: What constitutes a scene? Defining a meaningful vocabulary for scene discovery is a challenging problem that has important consequences for object recognition. We consider scenes to depict correlated objects and present visual similarity. We introduce a max-margin factorization model that finds a low dimensional subspace with high discriminative power for correlated annotations. We postulate this space should allow us to discover a large number of scenes in unsupervised data; we show scene discrimina…
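
The abstract describes classifiers for correlated annotations that share a low-dimensional, discriminative subspace. As a rough illustration only, the sketch below jointly fits L linear tag classifiers whose weight matrix W = UV has rank at most k. It uses a hinge (max-margin) loss plus the penalty ½(‖U‖²_F + ‖V‖²_F), a standard variational form of the trace norm used in maximum-margin matrix factorization. This is not the paper's algorithm: the exact objective, the full-batch subgradient optimizer, the hyperparameters, the helper names `train_low_rank_hinge`, `embed` and `scores`, and the toy data are all assumptions made for illustration.

```python
import random


def embed(U, x):
    """Project a feature vector x (length d) into the k-dim subspace: z = U^T x."""
    k = len(U[0])
    return [sum(x[a] * U[a][b] for a in range(len(x))) for b in range(k)]


def scores(U, V, x):
    """Scores for all L labels: s_l = (U^T x) . v_l, i.e. x^T W with W = U V."""
    z = embed(U, x)
    return [sum(z[b] * V[b][l] for b in range(len(z))) for l in range(len(V[0]))]


def train_low_rank_hinge(X, Y, k, lam=0.01, lr=0.1, epochs=500, seed=0):
    """Hypothetical sketch: fit L linear label classifiers with W = U V of rank <= k.

    X: n feature vectors of length d; Y: n label vectors of length L in {-1, +1}.
    Objective (assumed): (1/n) * sum_{i,l} max(0, 1 - Y[i][l] * s_il)
                         + lam/2 * (||U||_F^2 + ||V||_F^2),
    minimized by full-batch subgradient descent.
    """
    rng = random.Random(seed)
    n, d, L = len(X), len(X[0]), len(Y[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(d)]  # d x k
    V = [[rng.gauss(0.0, 0.1) for _ in range(L)] for _ in range(k)]  # k x L
    for _ in range(epochs):
        # Accumulate the hinge-loss subgradient over all (example, label) pairs.
        gU = [[0.0] * k for _ in range(d)]
        gV = [[0.0] * L for _ in range(k)]
        for x, y in zip(X, Y):
            z = embed(U, x)
            for l in range(L):
                s = sum(z[b] * V[b][l] for b in range(k))
                if y[l] * s < 1.0:  # margin violated: d(hinge)/ds = -y
                    for b in range(k):
                        gV[b][l] -= y[l] * z[b]  # ds/dV[b][l] = z[b]
                        for a in range(d):
                            gU[a][b] -= y[l] * x[a] * V[b][l]  # ds/dU[a][b] = x[a] V[b][l]
        # Step on the averaged hinge term plus the Frobenius (trace-norm bound) penalty.
        for a in range(d):
            for b in range(k):
                U[a][b] -= lr * (gU[a][b] / n + lam * U[a][b])
        for b in range(k):
            for l in range(L):
                V[b][l] -= lr * (gV[b][l] / n + lam * V[b][l])
    return U, V


# Toy data: two invented "scenes" with correlated tags (third feature is a bias).
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
Y = [[1, 1, -1], [-1, -1, 1]]
U, V = train_low_rank_hinge(X, Y, k=1)
preds = [[1 if s > 0 else -1 for s in scores(U, V, x)] for x in X]
```

Under these assumptions, the projected vectors `embed(U, x)` (that is, Uᵀx) form the shared low-dimensional representation. Grouping unlabeled images in that space, for example with k-means, is one hypothetical way to look for scene-like clusters; it is not necessarily the procedure the paper uses.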

Cited by 30 publications (58 citation statements); references 20 publications.
“…In contrast, our key insight is that rich contextual information is available in "how" human-provided tags or descriptions are given, specifically in their order, rank, and proximity. Thus, whereas prior work learning representations with accompanying text associates image content with an appropriate word or distribution of words (Qi et al 2009; Makadia et al 2008; Quattoni et al 2007; Loeff and Farhadi 2008; Bekkerman and Jeon 2007; Hardoon and Shawe-Taylor 2003; Yakhnenko and Honavar 2009), the semantic space we discover also preserves the relative significance of the objects present. We expect this difference to be most useful in retrieval applications where one wants to access scenes that are perceptually similar though not visually identical, or in auto-tagging applications where very compact focused descriptions are required.…”
Section: Related Work (mentioning)
confidence: 95%
“…A number of learning strategies have been explored in the literature along these lines, including variants of metric learning (Qi et al 2009; Makadia et al 2008), transfer learning (Quattoni et al 2007), matrix factorization (Loeff and Farhadi 2008), and random field models (Bekkerman and Jeon 2007). Most relevant to our method, some previous work has specifically considered KCCA for this purpose (Hardoon and Shawe-Taylor 2003; Yakhnenko and Honavar 2009; Blaschko and Lampert 2008).…”
Section: Related Work (mentioning)
confidence: 99%
“…Our approach is inspired by the hypothesis speculated in [14] that there exists a latent low-dimensional feature space that are shared by classifiers for different tags. In [14], a low-rank matrix factorization approach is used to exploit this low-dimensional space. In our work, we directly use the latent topics as the low-dimensional space shared by tag classifiers.…”
Section: Mmlda A (mentioning)
confidence: 99%
“…This is because MTL would be able to estimate a latent feature representation shared across all views. While MTL is well known to vision [20,13,16,31], it has never been used for view-invariant action recognition.…”
Section: Introduction (mentioning)
confidence: 99%