Predicting an Object Location Using a Global Image Representation

Serrano, Jose A. Rodriguez; Larlus, Diane

doi:10.1109/iccv.2013.217

Cited by 3 publications

(1 citation statement)

References 31 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In non-parametric human parsing, pixels [16], superpixels [24,7,14] and object proposals [26,13,14,22] were used to facilitate non-parametric image parsing. Specifically, the model of Yamaguchi et al [28] transferred parsing masks from retrieved examples to the query image.…”

Section: Related Workmentioning

confidence: 99%

Matching-CNN meets KNN: Quasi-parametric human parsing

Liu

Liang

Liu

et al. 2015

2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

134

View full text Add to dashboard Cite

Both parametric and non-parametric approaches have demonstrated encouraging performances in the human parsing task, namely segmenting a human image into several semantic regions (e.g., hat, bag, left arm, face). In this work, we aim to develop a new solution with the advantages of both methodologies, namely supervision from annotated data and the flexibility to use newly annotated (possibly uncommon) images, and present a quasi-parametric human parsing model. Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best matched region in the testing image for a particular semantic region in one KNN image. Given a testing image, we first retrieve its KNN images from the annotated/manually-parsed human image corpus. Then each semantic region in each KNN image is matched with confidence to the testing image using M-CNN, and the matched regions from all KNN images are further fused, followed by a superpixel smoothing procedure to obtain the ultimate human parsing result. The M-CNN differs from the classic CNN [12] in that the tailored cross image matching filters are introduced to characterize the matching between the testing image and the semantic region of a KNN image. The cross image matching filters are defined at different convolutional layers, each aiming to capture a particular range of displacements. Comprehensive evaluations over a large dataset with 7,700 annotated human images well demonstrate the significant performance gain from the quasi-parametric model over the state-of-the-arts [28,29], for the human parsing task.

show abstract