2012
DOI: 10.5402/2012/376804

Bag-of-Words Representation in Image Annotation: A Review

Abstract: Content-based image retrieval (CBIR) systems require users to query images by their low-level visual content; this not only makes it hard for users to formulate queries, but can also lead to unsatisfactory retrieval results. To address this, image annotation was proposed. The aim of image annotation is to automatically assign keywords to images, so that image retrieval users are able to query images by keywords. Image annotation can be regarded as an image classification problem: images are represented by some low-l…

Cited by 149 publications
(87 citation statements)
References 108 publications
“…In the BoW approach the SURF descriptor is additionally used to describe the salient points. This is because SURF is a well-proven point descriptor (local descriptor), widely used in BoW-based image classification processes [48]. Furthermore, SURF descriptors are based on wavelet responses, which also describe the image region in terms of textures, similar to HOG and Gabor feature descriptors.…”
Section: (B) Feature Extraction (mentioning)
confidence: 99%
“…Numerous feature encoding methods have been reported for visual word dictionary construction [49]. We adopted the most commonly used iterative k-means clustering algorithm [48]. The obtained feature vector is clustered into k clusters using the iterative k-means clustering [50].…”
Section: (C) Visual Words Dictionary Construction (mentioning)
confidence: 99%
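
The dictionary-construction step quoted above can be illustrated with a minimal sketch of iterative k-means in plain Python. This is not the citing authors' implementation: the function names (`kmeans`, `nearest_index`, `squared_distance`), the fixed iteration cap, the random seed, and the two-dimensional toy descriptors are all assumptions made for illustration. Real descriptors such as SURF have many more dimensions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centers):
    """Index of the center closest to `point` (first index wins ties)."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Cluster descriptors into k groups; the returned centers act as visual words."""
    rng = random.Random(seed)
    # Initialise centers with k distinct descriptors chosen at random.
    centers = [list(p) for p in rng.sample(descriptors, k)]
    for _ in range(iterations):
        # Assignment step: attach every descriptor to its nearest center.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_index(d, centers)].append(d)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    [sum(m[j] for m in members) / len(members) for j in range(dim)]
                )
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[i])
        if new_centers == centers:
            break  # assignments no longer change the centers
        centers = new_centers
    return centers


# Hypothetical 2-D "descriptors" forming two well-separated groups.
toy_descriptors = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
codebook = kmeans(toy_descriptors, k=2)
```

On this toy input the two centers settle on the means of the two groups. In practice a library implementation of k-means would normally be preferred over a hand-written loop, and the choice of k remains a tuning decision.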
“…For image analysis, a visual analogue of a word is used in the BoW model, which is based on the vector quantization process by clustering low-level visual features of local regions or points, such as color, texture, and so forth [18]. Representation of it uses image patches as visual words.…”
Section: Bag Of Features (mentioning)
confidence: 99%
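
The vector quantization described in the statement above can be sketched as follows: each local descriptor is mapped to its nearest visual word, and the image is summarised by the normalised word counts. The codebook, the descriptors, and the function name `bow_histogram` are hypothetical values chosen for illustration, not taken from the cited work.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, codebook):
    """Quantize descriptors against a codebook and return word frequencies."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Vector quantization: pick the closest visual word.
        nearest = min(
            range(len(codebook)),
            key=lambda i: squared_distance(d, codebook[i]),
        )
        counts[nearest] += 1
    total = sum(counts)
    # Normalise so that images with different numbers of descriptors are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical codebook of three visual words and four descriptors from one image.
codebook = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
image_descriptors = [(0.5, 0.2), (9.0, 9.5), (1.0, 0.0), (0.2, 9.9)]
histogram = bow_histogram(image_descriptors, codebook)
print(histogram)  # [0.5, 0.25, 0.25]
```

The resulting fixed-length vector is what a classifier would consume. Note that the histogram discards where in the image each descriptor was found, which is the "bag" part of the model.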
“…SIFT and SURF can be used in image categorization. Generally, better performance and efficiency of training and classification depend on better representation and clustering of features [10]–[24].…”
Section: Introduction (mentioning)
confidence: 99%
“…In video activity recognition literature spatial information is often captured by various local space-time features as defined in [2], [3], [4], [5], [6], [7], [8], [9], [10], [11] and [12]. These local space-time features capture frame-wise spatial information by first detecting interest points with either interest point detectors (Harris detector, Hessian detectors, edge detector, corner detectors) or various sampling methods (dense sampling [13] or motion adaptive sampling [14]) for each frame, then spatio-temporal regions are defined around all the detected points in each frame and finally the spatio-temporal regions are described using one of the local space-time features.…”
Section: Introduction (mentioning)
confidence: 99%