2018
DOI: 10.1007/978-3-030-01234-2_37

Statistically-Motivated Second-Order Pooling
Abstract: Second-order pooling, a.k.a. bilinear pooling, has proven effective for deep learning based visual recognition. However, the resulting second-order networks yield a final representation that is orders of magnitude larger than that of standard, first-order ones, making them memory-intensive and cumbersome to deploy. Here, we introduce a general, parametric compression strategy that can produce more compact representations than existing compression techniques, yet outperform both compressed and uncompressed seco…
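To see why second-order representations are orders of magnitude larger: for a feature map with d channels, bilinear pooling averages the outer products of the local feature vectors, producing a d × d matrix instead of a d-dimensional vector. A minimal numpy sketch of this standard operation (variable names and sizes are illustrative, not taken from the paper):

```python
import numpy as np

def second_order_pooling(features):
    """Bilinear (second-order) pooling: average the outer product
    of the local feature vectors over all spatial locations.

    features: array of shape (n_locations, d)
    returns:  array of shape (d, d)
    """
    n, d = features.shape
    return features.T @ features / n  # (d, d), symmetric

# With d = 512 (e.g. VGG-16 conv5 features on a 448x448 input,
# giving a 28x28 spatial grid), the pooled representation has
# 512 * 512 = 262144 entries, versus 512 for first-order
# (average) pooling.
feats = np.random.randn(28 * 28, 512)
pooled = second_order_pooling(feats)
print(pooled.shape)  # (512, 512)
```

This quadratic growth in d is exactly the memory problem that the paper's compression strategy targets.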

Cited by 40 publications (23 citation statements). References 41 publications (108 reference statements).
“…Four datasets are used: Describing Texture Dataset (DTD) (Cimpoi et al 2014), MINC-2500 (MINC) (Bell et al 2015), MIT-Indoor (Indoor) (Quattoni and Torralba 2009), and Caltech-UCSD Bird (CUB) (Xie et al 2013), which are the texture dataset, the material dataset, the indoor scene dataset, and the fine-grained dataset, respectively. Following the work in (Yu and Salzmann 2018), the size of input images in DTD, Indoor, and CUB is 448 × 448, and the size of input images in MINC is 224 × 224. We use the VGG-16 network as the backbone, and layers after the conv5-3 layer are removed.…”
Section: Evaluation on the Image Classification Task
confidence: 99%
“…Li et al [32] introduced a new structure to aggregate multiscale deep features to enhance feature representation ability and speed up the experimental process for real-time semantic segmentation. To address this problem, Yu and Salzmann [33] proposed a parametric compression strategy to produce more compact representations than previous compression tactics. Gao et al [34] proposed the NDDR layer to fuse single-task features by layerwise feature fusion for multitask feature learning.…”
Section: B. Feature Fusion
confidence: 99%
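The parametric compression referred to above can be understood as learning a projection that reduces the feature dimension before the outer product, so the pooled matrix is k × k with k ≪ d. A hedged numpy sketch of one such low-rank scheme (in practice W would be a learned parameter; here it is random and purely illustrative):

```python
import numpy as np

def compressed_second_order_pooling(features, W):
    """Project d-dim local features to k dims with a matrix W
    (k << d) before the outer product, so the pooled
    representation is k x k instead of d x d.

    features: (n_locations, d); W: (d, k)
    returns:  (k, k)
    """
    z = features @ W        # (n_locations, k) projected features
    n = z.shape[0]
    return z.T @ z / n      # (k, k), symmetric

d, k = 512, 64
rng = np.random.default_rng(0)
W = rng.standard_normal((d, k)) * 0.01  # stand-in for a learned projection
feats = rng.standard_normal((28 * 28, d))
pooled = compressed_second_order_pooling(feats, W)
print(pooled.shape)  # (64, 64): 4096 entries vs. 262144 uncompressed
```

The design choice is a trade-off: a smaller k shrinks the representation quadratically but may discard discriminative second-order statistics, which is why such projections are learned jointly with the network rather than fixed.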
“…However, using only first-order or second-order feature statistics as the channel descriptor is limited in representing the global distribution of channel-wise feature responses, thus hindering the representational ability of CNNs. Meanwhile, recent works [3,4] have also shown that higher-order statistics are helpful for improving the discriminative ability of CNNs.…”
Section: Introduction
confidence: 99%