A method is proposed for detecting objects in high-resolution images. It represents the image as a pyramid of progressively downscaled copies, splits every pyramid level except the top one into overlapping blocks, detects objects in each block, and analyzes objects at the boundaries of adjacent blocks in order to merge them. The number of pyramid levels is determined by the image size and the size of the input layer of the convolutional neural network (CNN). The overlap between blocks improves the classification of objects that are split into fragments lying in adjacent blocks. The decision to merge such fragments is based on their intersection over union (IoU) and on whether they belong to the same class. The proposed approach is evaluated on 4K and 8K images. For the experiments, a database was prepared in which objects of two classes, person and vehicle, are annotated in such images. The third and fourth versions of the You Only Look Once (YOLO) network family are used as the CNNs. A quantitative
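The pyramid, block-splitting, and merging steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The halving factor between levels, the stopping rule for the top level, the greedy merge order, the IoU threshold of 0.3, and all function names are assumptions chosen for illustration only.

```python
import math


def pyramid_levels(width, height, net_size):
    """Count pyramid levels, ASSUMING each level halves the previous one and
    the top level is the first whose longer side fits the network input."""
    levels = 1
    while max(width, height) > net_size:
        width, height = math.ceil(width / 2), math.ceil(height / 2)
        levels += 1
    return levels


def split_into_blocks(width, height, block, overlap):
    """Return (x0, y0, x1, y1) blocks that cover the image.
    Neighbouring blocks share at least `overlap` pixels."""
    step = block - overlap

    def starts(size):
        if size <= block:
            return [0]
        positions = list(range(0, size - block, step))
        positions.append(size - block)  # last block sits flush with the border
        return positions

    return [(x, y, min(x + block, width), min(y + block, height))
            for y in starts(height) for x in starts(width)]


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_fragments(detections, threshold=0.3):
    """Greedily merge same-class boxes whose IoU reaches `threshold`.
    `detections` is a list of (label, box) in full-image coordinates.
    The threshold value is a hypothetical choice, not taken from the paper."""
    merged = []
    for label, box in detections:
        for i, (m_label, m_box) in enumerate(merged):
            if m_label == label and iou(box, m_box) >= threshold:
                # Replace the stored box with the bounding box of both fragments.
                merged[i] = (label, (min(box[0], m_box[0]), min(box[1], m_box[1]),
                                     max(box[2], m_box[2]), max(box[3], m_box[3])))
                break
        else:
            merged.append((label, box))
    return merged
```

In this sketch, boxes detected inside a block would first be shifted by the block's (x0, y0) offset into full-image coordinates before being passed to `merge_fragments`. A single greedy pass keeps the example short; a production system might instead iterate until no further merges occur.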
To address the poor dynamic reconstruction quality and low clarity of UAV aerial remote sensing images, a dynamic reconstruction method based on compressed sensing is proposed. A quality degradation model of the UAV aerial remote sensing images is constructed, image feature information is extracted, and the images are preprocessed with noise reduction to improve their resolution and their spectral and multi-temporal trends. This addresses the resource waste caused by large volumes of sampled data, long sampling times, and heavy data transmission and storage. The method maximizes the sampling rate of the UAV aerial remote sensing images, reduces the complexity of their dynamic reconstruction, and meets the requirements of high-quality image reconstruction. The experimental results show that the proposed compressed-sensing-based dynamic reconstruction method is correct and effective and that it outperforms current mainstream methods.
The paper proposes a deep neural network architecture based on the integration of the convolutional neural network Faster R-CNN with the Feature Pyramid Network module. Based on this approach, an algorithm for detecting and classifying vehicles in images and a corresponding model have been developed.
The cross-platform ML.NET framework was used to train the proposed model. The effectiveness of the proposed approach is compared with that of the convolutional neural networks YOLO v4 and Faster R-CNN. The proposed approach is shown to improve the accuracy of detecting and localizing different types of vehicles in ultra-high-resolution images. Examples of processing ultra-high-resolution remote sensing images are given, together with appropriate recommendations.
In recent years, advances in deep learning have led to excellent performance in synthetic aperture radar (SAR) automatic target recognition (ATR). However, because of speckle noise interference, classifying SAR images remains challenging. To address this issue, this study proposes a multi-scale local–global feature fusion network (MFN) that integrates a convolutional neural network (CNN) and a transformer network. The proposed network comprises three branches: a CovNeXt-SimAM branch, a Swin Transformer branch, and a multi-scale feature fusion branch. The CovNeXt-SimAM branch extracts local texture detail features of the SAR images at different scales. Incorporating the SimAM attention mechanism into the CNN block enhances the feature extraction capability of the model in terms of both spatial and channel attention. The Swin Transformer branch extracts global semantic information from the SAR images at different scales. Finally, the multi-scale feature fusion branch fuses the local features with the global semantic information. Moreover, because empirically chosen hyperparameters can make the model inaccurate and inefficient, a Bayesian hyperparameter optimization algorithm is used to determine the optimal model hyperparameters. The resulting model, Bayes-MFN, achieved average recognition accuracies of 99.26% and 94.27% for SAR vehicle targets under standard operating conditions (SOCs) and extended operating conditions (EOCs), respectively, on the MSTAR dataset. Compared with the baseline model, the recognition accuracy improved by 12.74% and 25.26%, respectively. The results demonstrate that Bayes-MFN reduces the inter-class distance of the SAR images, resulting in more compact classification features and less interference from speckle noise. Compared with other mainstream models, the Bayes-MFN model exhibited the best classification performance.