Scene change detection (SCD) is the task of identifying changes of interest between bi-temporal images acquired at different times. A central challenge in SCD is identifying such changes while suppressing noisy changes induced by camera motion or environmental variation, such as viewpoint shifts, dynamic backgrounds, and outdoor conditions. Noisy changes cause corresponding pixel pairs to exhibit spatial differences (position relations) and temporal differences (intensity relations). Owing to their limited local receptive fields, traditional models based on convolutional neural networks (CNNs) struggle to establish the long-range relations required to capture semantic changes. To address these challenges, we explore the potential of the transformer for SCD and propose a transformer-based SCD architecture (TransCD). Guided by the intuition that an SCD model should be able to model both interesting and noisy changes, we incorporate a siamese vision transformer (SViT) into a feature-difference SCD framework. Our motivation is that the SViT can establish global semantic relations and model long-range context, making it more robust to noisy changes. In addition, unlike pure CNN-based models with high computational complexity, the proposed model is more efficient and has fewer parameters. Extensive experiments on the CDNet-2014 dataset demonstrate that the proposed TransCD (SViT-E1-D1-32) outperforms state-of-the-art SCD models, achieving an F1 score of 0.9361, an improvement of 7.31%.
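The feature-difference framework described in the abstract can be sketched minimally: both images pass through a shared encoder, and the magnitude of the per-pixel feature difference is thresholded into a change mask. The toy linear encoder, weights, and threshold below are illustrative assumptions, not the paper's SViT:

```python
import numpy as np

def encode(image, weights):
    # Toy shared encoder: one linear map per pixel followed by ReLU.
    # (Stands in for the SViT encoder; the real model uses transformer
    # blocks to capture long-range context.)
    return np.maximum(image @ weights, 0.0)

def change_map(img_t1, img_t2, weights, threshold=1.0):
    # Feature-difference SCD: encode both images with the *same* weights,
    # take the absolute feature difference, and threshold its magnitude.
    f1 = encode(img_t1, weights)
    f2 = encode(img_t2, weights)
    score = np.abs(f1 - f2).mean(axis=-1)  # per-pixel change score
    return score > threshold               # binary change mask

# Tiny demo: two 4x4 "images" with 3 channels that differ at one pixel.
rng = np.random.default_rng(0)
w = np.ones((3, 8))                        # fixed toy weights
t1 = rng.random((4, 4, 3))
t2 = t1.copy()
t2[0, 0] += 5.0                            # inject a large change
mask = change_map(t1, t2, w)               # True only at the altered pixel
```

In TransCD the encoder's global receptive field is what makes the difference features robust to viewpoint and illumination noise, and the fixed threshold shown here is replaced by a learned decoder.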
In high-speed train safety inspection, two images derived from corresponding parts of the same train and photographed at different times are compared to determine whether a defect is present. The critical challenge of this change classification task is making a correct decision from the bi-temporal image pair. In this paper, two convolutional neural networks are presented to perform this task. Unlike traditional classification tasks, which simply assign each image to a category, the two presented networks inherently detect differences between two images and identify changes from the image pair. Consequently, even when abnormal samples of specific components are unavailable during training, our networks can still infer whether those components have become abnormal by using change information. The proposed method can be applied to recognition or verification tasks where a decision cannot be made from a single image (state). Equipped with deep learning, it can address many challenging tasks in high-speed train safety inspection where conventional methods perform poorly. To further improve performance, a novel multi-shape training method is introduced. Extensive experiments demonstrate that the proposed methods perform well.
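The pairwise decision described above can be illustrated with a small verification-style sketch: a shared encoder produces features for both images, and the score is computed from the feature difference rather than from either image alone, which is why unseen abnormal states can still be flagged. The encoder, weights, and inputs below are hypothetical placeholders, not the paper's networks:

```python
import numpy as np

def shared_encoder(x, w):
    # Shared-weight feature extractor (stand-in for the paper's CNN branches).
    return np.maximum(x @ w, 0.0)

def change_score(x_ref, x_test, w_enc, w_cls):
    # The decision is driven by the feature *difference*, so a component
    # type never seen in an abnormal state can still be flagged as changed.
    f_ref = shared_encoder(x_ref, w_enc)
    f_test = shared_encoder(x_test, w_enc)
    return float(np.abs(f_ref - f_test) @ w_cls)

w_enc = np.full((4, 2), 0.5)
w_cls = np.ones(2)
reference = np.array([0.2, 0.4, 0.1, 0.3])   # image of the normal part
identical = change_score(reference, reference, w_enc, w_cls)  # 0.0
altered = reference + np.array([0.0, 2.0, 0.0, 0.0])
changed = change_score(reference, altered, w_enc, w_cls)      # clearly > 0
```

A learned classifier head would replace the fixed scoring weights here; the key design choice is that both branches share parameters, so the score depends only on what differs between the two states.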
This paper proposes a novel anomaly detection (AD) approach for high-speed train images based on convolutional neural networks and the Vision Transformer. Unlike previous AD works, in which anomalies are identified in a single image using classification, segmentation, or object detection methods, the proposed method detects abnormal differences between two images of the same region taken at different times. In other words, we recast the single-image anomaly detection problem as a difference detection problem over an image pair. The core idea is that an "anomaly" commonly represents an abnormal state rather than a specific object, and that this state should be identified from a pair of images. In addition, we introduce a deep feature difference AD network (AnoDFDNet) that fully exploits the potential of the Vision Transformer and convolutional neural networks. To verify the effectiveness of AnoDFDNet, we gathered three datasets: a difference dataset (Diff), a foreign body dataset (FB), and an oil leakage dataset (OL). Experimental results on these datasets demonstrate the superiority of the proposed method: in terms of F1-score, AnoDFDNet obtains 76.24%, 81.04%, and 83.92% on the Diff, FB, and OL datasets, respectively.
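The scores reported above use the standard F1 metric, the harmonic mean of precision and recall. For reference, a minimal computation from raw detection counts; the counts in the example are illustrative only, not results from the paper:

```python
def f1_score(tp, fp, fn):
    # F1 = harmonic mean of precision and recall, from raw counts of
    # true positives, false positives, and false negatives.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 80 true positives, 20 false positives, 30 false negatives:
score = f1_score(80, 20, 30)   # precision 0.80, recall ~0.727, F1 ~0.762
```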