Remote Sensing Scene Classification by Unsupervised Representation Learning

Lu, Xiaoqiang; Zheng, Xiangtao; Yuan, Yuan

doi:10.1109/tgrs.2017.2702596

Cited by 271 publications

(106 citation statements)

References 51 publications

Supporting

Mentioning

106

Contrasting

Order By: Relevance

“…The compression methods for multisource image/video data are designed from the perspective of image features, which usually mine similarities between image blocks by matching feature points. Moreover, multiscale features for image representation are proposed to extend representation from single payload to multiple payloads, as being proposed in References [35][36][37][38], which is also a way to build relations between multiple data sources. However, computational complexity is high, and the actual correspondence between the selected image block and the coding object is often lacking, which is not conducive to large-area matching.…”

Section: Video Compression Of Multisource Image/video Datamentioning

confidence: 99%

Towards Real-Time Service from Remote Sensing: Compression of Earth Observatory Video Data via Long-Term Background Referencing

Xiao

Zhu

et al. 2018

Remote Sensing

View full text Add to dashboard Cite

City surveillance enables many innovative applications of smart cities. However, the real-time utilization of remotely sensed surveillance data via unmanned aerial vehicles (UAVs) or video satellites is hindered by the considerable gap between the high data collection rate and the limited transmission bandwidth. High efficiency compression of the data is in high demand. Long-term background redundancy (LBR) (in contrast to local spatial/temporal redundancies in a single video clip) is a new form of redundancy common in Earth observatory video data (EOVD). LBR is induced by the repetition of static landscapes across multiple video clips and becomes significant as the number of video clips shot of the same area increases. Eliminating LBR improves EOVD coding efficiency considerably. First, this study proposes eliminating LBR by creating a long-term background referencing library (LBRL) containing high-definition geographically registered images of an entire area. Then, it analyzes the factors affecting the variations in the image representations of the background. Next, it proposes a method of generating references for encoding current video and develops the encoding and decoding framework for EOVD compression. Experimental results show that encoding UAV video clips with the proposed method saved an average of more than 54% bits using references generated under the same conditions. Bitrate savings reached 25-35% when applied to satellite video data with arbitrarily collected reference images. Applying the proposed coding method to EOVD will facilitate remote surveillance, which can foster the development of online smart city applications.

show abstract

Section: Video Compression Of Multisource Image/video Datamentioning

confidence: 99%

Towards Real-Time Service from Remote Sensing: Compression of Earth Observatory Video Data via Long-Term Background Referencing

Xiao

Zhu

et al. 2018

Remote Sensing

View full text Add to dashboard Cite

show abstract

“…To guarantee comparability between the accuracy of the proposed method and those reported in works presented in [2,3,5,[7][8][9][10][11][12][13], the labeled dataset is divided into training and testing sets using a training-testing ratio of 80-20%, and five-fold cross validation is conducted. That is, the labeled image patches are almost equally divided into five non-overlapping groups randomly, with one group used as the testing set and the remaining four groups used as the training set in each fold.…”

Section: Methodsmentioning

confidence: 99%

“…SPCK++ [2] 77.38 OMP-k [7] 81.70 Saliency + SC [13] 82.72 Multilayer learning [10] 89.10 UFL-SC [8] 90.26 Partlets [3] 91.33 Multipath SC [11] 91.95 Quaternion + Q-OMP [9] 92.29 LGF [5] 95.48 Deconvolution + SPM [12] 95.71 SSF-CNN 88.91 SSF-AlexNet 92.43 Table 4 proves that SSF-CNN outperforms methods including SPM-BoVW, OMP-k, and Saliency + SC, and achieves comparable accuracy with multilayer learning. It should be noted that the CNN Remote Sens.…”

Section: Methodsmentioning

confidence: 99%

“…The sparse features extracted from different paths were then concatenated to represent the whole image. Lu et al [12] adopted a shallow weighted deconvolution network to learn a set of feature maps and filters by minimizing the reconstruction error between the input image and the convolution result. Subsequently, they used a spatial pyramid model (SPM) to aggregate features at different scales to maintain the spatial layout of a high spatial resolution (HSR) image scene.…”

Section: Related Workmentioning

confidence: 99%

“…To illustrate the performance of SSF-CNN, we compare it with ten state-of-the-art research studies presented in [2,3,5,[7][8][9][10][11][12][13] respectively. These studies are selected because all of them have been evaluated on the UCMerced dataset under the same training-testing ratio (80-20%).…”

Section: Comparison With State-of-the-art Research On the Ucmerced Damentioning

confidence: 99%

See 2 more Smart Citations

Remote Sensing Scene Classification Based on Convolutional Neural Networks Pre-Trained Using Attention-Guided Sparse Filters

Wang

Chen

et al. 2018

Remote Sensing

View full text Add to dashboard Cite

Semantic-level land-use scene classification is a challenging problem, in which deep learning methods, e.g., convolutional neural networks (CNNs), have shown remarkable capacity. However, a lack of sufficient labeled images has proved a hindrance to increasing the land-use scene classification accuracy of CNNs. Aiming at this problem, this paper proposes a CNN pre-training method under the guidance of a human visual attention mechanism. Specifically, a computational visual attention model is used to automatically extract salient regions in unlabeled images. Then, sparse filters are adopted to learn features from these salient regions, with the learnt parameters used to initialize the convolutional layers of the CNN. Finally, the CNN is further fine-tuned on labeled images. Experiments are performed on the UCMerced and AID datasets, which show that when combined with a demonstrative CNN, our method can achieve 2.24% higher accuracy than a plain CNN and can obtain an overall accuracy of 92.43% when combined with AlexNet. The results indicate that the proposed method can effectively improve CNN performance using easy-to-access unlabeled images and thus will enhance the performance of land-use scene classification especially when a large-scale labeled dataset is unavailable.

show abstract

Target Detection Approaches to Hyperspectral Image Classification

Chang

Bai

2022

Advances in Hyperspectral Image Processing Techniques

View full text Add to dashboard Cite

Remote Sensing Scene Classification by Unsupervised Representation Learning

Cited by 271 publications

References 51 publications

Towards Real-Time Service from Remote Sensing: Compression of Earth Observatory Video Data via Long-Term Background Referencing

Towards Real-Time Service from Remote Sensing: Compression of Earth Observatory Video Data via Long-Term Background Referencing

Remote Sensing Scene Classification Based on Convolutional Neural Networks Pre-Trained Using Attention-Guided Sparse Filters

Target Detection Approaches to Hyperspectral Image Classification

Contact Info

Product

Resources

About