2001
DOI: 10.1142/s0219467801000256
|View full text |Cite
|
Sign up to set email alerts
|

Learning Structured Visual Detectors From User Input at Multiple Levels

Abstract: In this paper, we propose a new framework for the dynamic construction of structured visual object/scene detectors for content-based retrieval. In the Visual Apprentice, a user defines visual object/scene models via a multiple-level Definition Hierarchy: a scene consists of objects, which consist of object-parts, which consist of perceptual-areas, which consist of regions. The user trains the system by providing example images/videos and labeling components according to the hierarchy she defines (e.g., image o… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
14
0

Year Published

2004
2004
2008
2008

Publication Types

Select...
3
2
1

Relationship

0
6

Authors

Journals

citations
Cited by 13 publications
(14 citation statements)
references
References 40 publications
0
14
0
Order By: Relevance
“…One advantage of such rule-based approach is the ease to insert, delete, and modify the existing rules when the nature of video classes changes. Since the underlying video making rules are used for detecting the semantic video events, the rule-based approach is only attractive for some specific domains such as news video and film which have well-defined story structure for the semantic video events [Adams et al 2003;Naphade and Huang 2001;Jaimes and Chang 2001;Greenspan et al 2004;Qi et al 2003;Gatica-Perez et al 2003;Ma et al 2002;Smith and Kanade 1995;Arman et al 1994;He et al 1999;Sundaram et al 2002;Chang 2002;Sundaram and Chang 2002b]. On the other hand, surgery education videos do not have well-defined video making rules that can be used to generate the semantic video events, thus it is very important to develop new techniques that are able to achieve more effective video classification and concept-oriented video summarization and skimming.…”
Section: Related Work and Our Contributionsmentioning
confidence: 99%
See 2 more Smart Citations
“…One advantage of such rule-based approach is the ease to insert, delete, and modify the existing rules when the nature of video classes changes. Since the underlying video making rules are used for detecting the semantic video events, the rule-based approach is only attractive for some specific domains such as news video and film which have well-defined story structure for the semantic video events [Adams et al 2003;Naphade and Huang 2001;Jaimes and Chang 2001;Greenspan et al 2004;Qi et al 2003;Gatica-Perez et al 2003;Ma et al 2002;Smith and Kanade 1995;Arman et al 1994;He et al 1999;Sundaram et al 2002;Chang 2002;Sundaram and Chang 2002b]. On the other hand, surgery education videos do not have well-defined video making rules that can be used to generate the semantic video events, thus it is very important to develop new techniques that are able to achieve more effective video classification and concept-oriented video summarization and skimming.…”
Section: Related Work and Our Contributionsmentioning
confidence: 99%
“…He et al [1999] have also incorporated audio-visual information to summarize the audio-video presentations. To integrate the high-level video semantics for video summarization and skimming, hidden Markov models and Gaussian mixture models have been investigated widely for detecting semantic video scenes [Liu et al 1998;Ekin et al 2003;Xie et al 2003;Dimitrova et al 2000;Adams et al 2003;Naphade and Huang 2001;Jaimes and Chang 2001;Greenspan et al 2004;Qi et al 2003;Gatica-Perez et al 2003]. Naphade and Huang [2001] and Adams et al [2003] have also investigated the techniques for hierarchical video classification.…”
Section: Related Work and Our Contributionsmentioning
confidence: 99%
See 1 more Smart Citation
“…Many methods rely uniquely on perceptual features such as color histogram [4][9] [11]; whereas few also consider text features from annotations or captions [7] [8]. There are some approaches that only use individual classifiers or joint distributions [1]; while others combine multiple classifiers for improved accuracy [4][9] [11].…”
Section: Introductionmentioning
confidence: 99%
“…Finally, experts handpick the classes in many methods [11] [7][8] [9] to which the classifiers are often fine-tuned. Exceptions are frameworks where "expert" users define their own classes and relations [4], and approaches that associate words annotating images to new images or regions [1]. The most similar prior work is [7] and [9], which learn Bayesian Networks (BNs) with classifiers as nodes.…”
Section: Introductionmentioning
confidence: 99%