Fusing Object Semantics and Deep Appearance Features for Scene Recognition

Sun, Ning; Li, Wenli; Liu, Jixin; Han, Guang; Wu, Cong

doi:10.1109/tcsvt.2018.2848543

Cited by 44 publications

(26 citation statements)

References 44 publications

“…All the methods listed in Table I use CNN: Adi-Red [13], CCM [9], CNN-SMN [14], and SOSF + DFA + GAF [20] used information of the objects which appear in scene images. To obtain the object information, they used the CNN pre-trained on the object recognition dataset.…”

Section: Experimental Results On the Places (mentioning)

confidence: 99%

“…When a mixed CCM-CCG model is used, our FOSNet achieves state-of-the-art accuracy of 60.14% on the Places 2, and it is the first time that the accuracy exceeds 60% on the dataset. [Table fragment (method [ref] value): [1] 56.2; Gaze Shifting-CNN+SVM [19] 56.2; MetaObject-CNN [15] 58.11; Places365-VGG-SVM [28] 63.24; Three [5] 70.17; Hybrid CNN [21] 70.69; Sparse Representation [23] 71.08; Multi-Resolution CNNs [7] 72.0; CNN-SMN [14] 72.6; PatchNet [22] 73.0; SDO [6] 73.41; Adi-Red [13] 73.59; SOSF+CFA+GAF [20] 78…]”

Section: Experimental Results On the Places (mentioning)

confidence: 99%

“…For a backbone network, the SE-ResNeXt-101 model, which is a combination of ResNeXt [27] with SE-Network [3], was used for Object-Net and PlacesNet in FOSNet. The standard 10-crop testing method [7] is used for comparison with other methods, and an evaluation measurement is the average classification accuracy of 10 crops. [Table fragment (method [ref] values): Adi-Red [13] 41.87, -; Places365-VGG [28] 55.24, -; CCM [9] 56.82, 86.92; CNN-SMN [14] 57.1, -; SOSF+CFA+GAF [20] 57.27, -; Multi-Resolution CNNs [7] 58.3, 87.3; Places2-365-CNN [43] 58.93, 88.52; SE-Resnet-152 [3] 59]…”

Section: B Implementation Details (mentioning)

confidence: 99%
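The 10-crop testing mentioned in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: it assumes the common reading in which an image is evaluated on four corner crops plus a center crop, each also mirrored horizontally, and the per-view class scores are averaged before taking the argmax. The function names `ten_crop_boxes`, `average_scores`, and `predict` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def ten_crop_boxes(width: int, height: int, crop: int) -> List[Tuple[Box, bool]]:
    """Return the 10 views as (box, flipped): 4 corners + center, plain and mirrored."""
    if crop > width or crop > height:
        raise ValueError("crop size must fit inside the image")
    cx, cy = (width - crop) // 2, (height - crop) // 2
    origins = [
        (0, 0),                         # top-left
        (width - crop, 0),              # top-right
        (0, height - crop),             # bottom-left
        (width - crop, height - crop),  # bottom-right
        (cx, cy),                       # center
    ]
    boxes = [(x, y, x + crop, y + crop) for x, y in origins]
    # First the five plain views, then the same five mirrored horizontally.
    return [(b, False) for b in boxes] + [(b, True) for b in boxes]


def average_scores(per_crop_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over the views (assumed aggregation rule)."""
    n = len(per_crop_scores)
    return [sum(column) / n for column in zip(*per_crop_scores)]


def predict(per_crop_scores: Sequence[Sequence[float]]) -> int:
    """Index of the class with the highest averaged score."""
    avg = average_scores(per_crop_scores)
    return max(range(len(avg)), key=avg.__getitem__)
```

In practice each box would be cropped from the image, optionally mirrored, passed through the network, and the resulting score vectors handed to `predict`.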

“…[Table fragment (method [ref] value): RBoW [18] 37.93; DPM+GIST+SP [17] 43.1; Adi-Red [13] 73.59; Gaze Shifting-CNN+SVM [19] 75.1; ResNet-152-DFT + [4] 76.5; Places365-VGG-SVM [28] 76.53; DAG-CNN [1] 77.5; MetaObject-CNN [15] 78.9; VS-CNN [16] 80.37; Hybrid CNN [21] 85.97; Three [5] 86.04; PatchNet [22] 86.2; CNN-SMN [14] 86.5; Multi-Resolution CNNs [7] 86.7; SDO [6] 86.76; Sparse Representation [23] 87.22; SOSF+CFA+GAF [20] 89] computationally very expensive.…”

Section: Accuracy (100%) (mentioning)

confidence: 99%

FOSNet: An End-to-End Trainable Deep Neural Network for Scene Recognition

2020

Scene recognition is an image recognition problem aimed at predicting the category of the place at which the image is taken. In this paper, a new scene recognition method using the convolutional neural network (CNN) is proposed. The proposed method is based on the fusion of the object and the scene information in the given image and the CNN framework is named as FOS (fusion of object and scene) Net. In addition, a new loss named scene coherence loss (SCL) is developed to train the FOSNet and to improve the scene recognition performance. The proposed SCL is based on the unique traits of the scene that the 'sceneness' spreads and the scene class does not change all over the image. The proposed FOSNet was experimented with three most popular scene recognition datasets, and their state-of-the-art performance is obtained in two sets: 60.14% on Places 2 and 90.37% on MIT indoor 67. The second highest performance of 77.28% is obtained on SUN 397.

“…Table 5 compares PulseNetOne to the related work in this area, and it can be seen that both networks pruned by PulseNetOne outperform the state-of-the-art by over 6%. FOSNet CCG [30] and SOSF+CFA+GAF [50] were the current best published results on the MIT67 dataset, achieving 90.37% and 89.51% respectively, but were significantly beaten by PulseNetOne. Figure 4 shows that AlexNet almost achieved its theoretical performance on all experiments except for CPU inference timing, while the pruned network was approximately 3× faster than the original network.…”

Section: Methods Year Accuracy (mentioning)

confidence: 90%

PulseNetOne: Fast Unsupervised Pruning of Convolutional Neural Networks for Remote Sensing

2020

Scene classification is an important aspect of image/video understanding and segmentation. However, remote-sensing scene classification is a challenging image recognition task, partly due to the limited training data, which causes deep-learning Convolutional Neural Networks (CNNs) to overfit. Another difficulty is that images often have very different scales and orientation (viewing angle). Yet another is that the resulting networks may be very large, again making them prone to overfitting and unsuitable for deployment on memory- and energy-limited devices. We propose an efficient deep-learning approach to tackle these problems. We use transfer learning to compensate for the lack of data, and data augmentation to tackle varying scale and orientation. To reduce network size, we use a novel unsupervised learning approach based on k-means clustering, applied to all parts of the network: most network reduction methods use computationally expensive supervised learning methods, and apply only to the convolutional or fully connected layers, but not both. In experiments, we set new standards in classification accuracy on four remote-sensing and two scene-recognition image datasets.

Efficient Traffic Signs Recognition Based on CNN Model for Self-Driving Cars

Gadri

Adouane

2022

Lecture Notes in Networks and Systems
