Deep convolutional neural network (CNN) models are typically trained on high-resolution images. When they are applied directly to low-resolution images, such as infrared images, their performance is not always satisfactory. This is because CNN layers operate on a local neighborhood, which already carries little information in infrared images. To overcome this weakness and add information of a global nature, a hybrid architecture combining a CNN with a self-attention mechanism is proposed. The latter provides information about the global context by capturing long-range interactions between different parts of an image. In this paper, we incorporate a convolutional–attentional form in the top layers of two pre-trained networks, VGGNet and ResNet. The convolutional–attentional form is a concatenation of two paths: the original convolutional feature maps of the pre-trained network and the output of a relative multi-head attentional block. Extensive experiments are conducted on the FLIR starter thermal dataset, where we achieve a [Formula: see text] overall accuracy on the four-class FLIR starter thermal dataset. Moreover, the proposed architectures exceed the state of the art in target recognition on the two-class FLIR starter thermal dataset, with a [Formula: see text] improvement in overall classification accuracy. In addition, a study of the effect of different hyper-parameters and an error analysis are carried out to suggest directions for future research.
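The two-path concatenation described in this abstract can be illustrated with a small dependency-free sketch. It is not the authors' implementation: it uses plain scaled dot-product self-attention without the relative position terms the abstract mentions, the projection weights are random rather than learned, and the names `random_head`, `self_attention_head`, and `conv_attention_concat` are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (nested lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_head(rng, channels, head_dim):
    """Hypothetical helper: random query/key/value projections for one head."""
    def make():
        return [[rng.uniform(-1.0, 1.0) for _ in range(head_dim)] for _ in range(channels)]
    return make(), make(), make()


def self_attention_head(tokens, wq, wk, wv):
    """One head of scaled dot-product self-attention over all spatial positions."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(q[0]))
    # Every position attends to every other position: this is the global context.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v)


def conv_attention_concat(feature_map, heads):
    """Concatenate the convolutional channels with the multi-head attention output.

    feature_map: H x W x C nested lists (output of a convolutional layer).
    heads: list of (wq, wk, wv) projection triples, one per attention head.
    Returns an H x W x (C + num_heads * head_dim) nested list.
    """
    h, w = len(feature_map), len(feature_map[0])
    tokens = [feature_map[i][j] for i in range(h) for j in range(w)]
    head_outputs = [self_attention_head(tokens, *params) for params in heads]
    fused = []
    for idx, tok in enumerate(tokens):
        attn = [x for out in head_outputs for x in out[idx]]
        fused.append(list(tok) + attn)  # path 1 (conv) followed by path 2 (attention)
    return [fused[i * w:(i + 1) * w] for i in range(h)]


if __name__ == "__main__":
    rng = random.Random(0)
    fmap = [[[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(2)] for _ in range(2)]
    heads = [random_head(rng, channels=4, head_dim=3) for _ in range(2)]
    out = conv_attention_concat(fmap, heads)
    print(len(out), len(out[0]), len(out[0][0]))
```

In this sketch the convolutional channels pass through unchanged, so the local features of the pre-trained network are preserved and the attention channels only add context on top of them.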
The limited information contained in infrared images presents a serious problem; it is therefore necessary to build a powerful feature descriptor that extracts as much information as possible and describes the image efficiently. To address this challenge, we propose a novel approach named multi-model fusion of encoding methods (MMFEM). First, several encoding methods for the Bag of Visual Words (BOVW) model are evaluated. Then, the best encoding methods are fused at three levels: feature-level fusion, decision-level fusion and hybrid-level fusion. Finally, the outputs of the fusion process are used to form a final decision for target recognition in infrared images. Two infrared datasets are employed to evaluate the performance of the proposed approach. The first is the Visible and Infrared Spectrum (VAIS) dataset, comprising six categories of ships, and the second is a subset of the Forward-Looking InfraRed (FLIR) thermal dataset, comprising two object categories, vehicles and pedestrians. The proposed approach exceeds the state of the art on both datasets, reaching 96.96% on FLIR and 71.26% on VAIS in overall classification accuracy.
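The three fusion levels named in this abstract can be sketched as follows. The abstract does not state the fusion rules, so everything here is an assumption made for illustration: feature-level fusion is taken to be concatenation of L2-normalized encodings, decision-level fusion a weighted average of per-classifier class scores, and hybrid-level fusion an average that also includes the scores of a classifier trained on the fused feature. All function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def feature_level_fusion(encodings):
    """Assumed rule: concatenate the normalized encodings of one image.

    encodings: one vector per encoding method (lengths may differ).
    """
    fused = []
    for enc in encodings:
        fused.extend(l2_normalize(enc))
    return fused


def decision_level_fusion(score_vectors, weights=None):
    """Assumed rule: weighted average of class scores from several classifiers."""
    n = len(score_vectors)
    weights = weights or [1.0 / n] * n
    num_classes = len(score_vectors[0])
    return [
        sum(w * s[c] for w, s in zip(weights, score_vectors))
        for c in range(num_classes)
    ]


def hybrid_level_fusion(fused_feature_scores, per_encoding_scores):
    """Assumed rule: average the fused-feature classifier with the individual ones."""
    return decision_level_fusion([fused_feature_scores] + list(per_encoding_scores))


def predict(scores):
    """Index of the class with the highest fused score."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two hypothetical encodings of the same image and two-class scores.
    print(feature_level_fusion([[3.0, 4.0], [0.0, 2.0]]))
    print(predict(decision_level_fusion([[0.2, 0.8], [0.6, 0.4]])))
    print(predict(hybrid_level_fusion([0.9, 0.1], [[0.2, 0.8], [0.6, 0.4]])))
```

Averaging is only one possible decision rule; voting or a learned combiner would fit the same interface.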