Recognition of space objects, including spacecraft and debris, is one of the main components of a space situational awareness (SSA) system. Tasks such as satellite formation flying, on-orbit servicing, and active debris removal require highly accurate object recognition. Recognition in actual space imagery is complex because sensing conditions vary widely: noisy backgrounds, different orbital scenarios, high contrast, low signal-to-noise ratio, and a broad range of object sizes. To address this problem, this paper proposes a multi-modal learning solution built on several deep learning models. Convolutional neural network (CNN) based models such as ResNet, EfficientNet, and DenseNet were explored to extract features from RGB images of spacecraft and debris, and an RGB-based vision transformer was also evaluated. An end-to-end CNN was used to classify the depth images. The final decision of the proposed solution combines the decisions of the RGB-based and depth-based models. Experiments were carried out on SPARK, a novel dataset generated in a realistic space simulation environment. The dataset covers eleven categories and is split into 150k RGB images and 150k depth images. The proposed combination of an RGB-based vision transformer and a depth-based end-to-end CNN achieved the best results in terms of accuracy (85%), precision (86%), recall (85%), and F1 score (84%). The proposed multi-modal deep learning approach is therefore a feasible solution for real SSA tasks.
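The abstract describes combining the decisions of an RGB model and a depth model; a minimal sketch of one common way to do this, decision-level (late) fusion by weighted averaging of softmax probabilities, is shown below. The tiny stand-in networks, the class count wiring, and the fusion weight are assumptions made only to keep the example self-contained; they are not the authors' implementation.

```python
# Hedged sketch: late fusion of an RGB classifier and a depth classifier by
# averaging their class probabilities. Stand-in models replace the pretrained
# vision transformer and end-to-end depth CNN so the example runs as-is.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 11  # SPARK has eleven object categories


class SimpleDepthCNN(nn.Module):
    """Placeholder for the end-to-end CNN that classifies single-channel depth maps."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def late_fusion(rgb_model, depth_model, rgb_batch, depth_batch, w_rgb=0.5):
    """Weighted average of per-modality softmax outputs (the weight is a guess)."""
    with torch.no_grad():
        p_rgb = F.softmax(rgb_model(rgb_batch), dim=1)
        p_depth = F.softmax(depth_model(depth_batch), dim=1)
    fused = w_rgb * p_rgb + (1.0 - w_rgb) * p_depth
    return fused.argmax(dim=1)


if __name__ == "__main__":
    # Any RGB classifier with NUM_CLASSES outputs would work here; a tiny CNN
    # avoids downloading pretrained transformer weights.
    rgb_model = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, NUM_CLASSES),
    ).eval()
    depth_model = SimpleDepthCNN().eval()
    rgb = torch.randn(4, 3, 224, 224)
    depth = torch.randn(4, 1, 224, 224)
    print(late_fusion(rgb_model, depth_model, rgb, depth))
```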
Diabetes is one of the top ten causes of death among adults worldwide. People with diabetes are prone to eye diseases such as diabetic retinopathy (DR), which damages the blood vessels in the retina and can result in vision loss. DR grading is an essential step for early diagnosis and effective treatment, and for slowing progression toward vision impairment. Existing automatic solutions are mostly based on traditional image processing and machine learning techniques, leaving a large gap in more generic detection and grading of DR. Deep learning models such as convolutional neural networks (CNNs) have previously been applied to this task. To enhance DR grading, this paper proposes a novel solution based on an ensemble of state-of-the-art deep learning models called vision transformers. A challenging public DR dataset from a 2015 Kaggle challenge was used for training and evaluation. The dataset is highly imbalanced and covers five severity levels: No DR, Mild, Moderate, Severe, and Proliferative DR. The experiments showed that the proposed solution outperforms existing methods in terms of precision (47%), recall (45%), F1 score (42%), and Quadratic Weighted Kappa (QWK) (60.2%), while running with a low inference time (1.12 seconds). The proposed solution can therefore help examiners grade DR more accurately than manual assessment.
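For readers unfamiliar with the two moving parts named here, an ensemble over several classifiers and the QWK metric for ordinal grades, a minimal sketch follows. The tiny linear stand-in models, the image size, and the soft-voting rule are assumptions for illustration only; the paper's ensemble is built from fine-tuned vision transformers.

```python
# Hedged sketch: soft-voting ensemble over several DR-grade classifiers,
# scored with Quadratic Weighted Kappa (the standard metric for ordinal grades).
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import cohen_kappa_score

DR_GRADES = 5  # No DR, Mild, Moderate, Severe, Proliferative


def ensemble_predict(models, images):
    """Average the per-model class probabilities, then take the argmax."""
    with torch.no_grad():
        probs = [F.softmax(m(images), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0).argmax(dim=1)


def qwk(y_true, y_pred):
    """Quadratic Weighted Kappa: penalizes far-off grades more than near misses."""
    return cohen_kappa_score(y_true, y_pred, weights="quadratic")


if __name__ == "__main__":
    torch.manual_seed(0)
    # Tiny linear classifiers stand in for fine-tuned vision transformers.
    models = [
        torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 64 * 64, DR_GRADES)).eval()
        for _ in range(3)
    ]
    images = torch.randn(8, 3, 64, 64)
    preds = ensemble_predict(models, images).numpy()
    labels = np.random.randint(0, DR_GRADES, size=8)
    print("QWK:", qwk(labels, preds))
```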
Human detection and activity recognition (HDAR) in videos plays an important role in various real-life applications. Recently, object detection methods such as "you only look once" (YOLO), Faster Region-based Convolutional Neural Network (Faster R-CNN), and EfficientDet have been used to detect humans in videos for subsequent decision-making applications. This paper addresses human detection in video sequences captured by a moving camera mounted on an aerial platform, under dynamic conditions such as varying altitudes, illumination changes, camera jitter, and variations in viewpoint, object size, and color. Unlike traditional datasets whose frames are captured by a static ground camera with medium or large human regions, the UCF-ARG aerial dataset is more challenging because of the large distances between the humans in the frames and the camera. The performance of human detection methods described in the literature often degrades when input frames are distorted by noise, blur, illumination changes, and the like. To address these limitations, the object detectors used in this study were trained on the COCO dataset, evaluated on the publicly available UCF-ARG dataset, and compared in terms of detection accuracy. The evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results show that EfficientDetD7 outperformed the other detectors with 92.9% average accuracy across all activities and conditions, including blurring, added Gaussian noise, brightening, and darkening. In addition, deep pre-trained convolutional neural networks (CNNs) such as ResNet and EfficientNet were used to transfer learning from the ImageNet dataset to the UCF-ARG dataset and to extract highly informative features from the detected and cropped human patches. The extracted spatial features were fed to a Long Short-Term Memory (LSTM) network to capture temporal relations between features for human activity recognition (HAR), as sketched below. EfficientNetB7-LSTM outperformed existing HAR methods in terms of average accuracy (80%), average precision (83%), average recall (80%), average F1 score (80%), average false negative rate (FNR) (20%), average false positive rate (FPR) (4.8%), and average Area Under the Curve (AUC) (94%). The outcome is a robust HAR system that combines EfficientDetD7 for human detection with EfficientNetB7 and LSTM for activity classification.
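The CNN-feature-plus-LSTM pipeline referenced above can be sketched as follows. The small convolutional backbone is an assumption standing in for the pretrained EfficientNetB7, and the clip length, feature size, and hidden size are illustrative choices, not values reported by the paper.

```python
# Hedged sketch: per-frame CNN features fed to an LSTM for activity
# classification over a clip of cropped human patches.
import torch
import torch.nn as nn

ACTIONS = ["digging", "waving", "throwing", "walking", "running"]


class CNNLSTMClassifier(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, num_classes=len(ACTIONS)):
        super().__init__()
        # Frame-level feature extractor (placeholder for a pretrained EfficientNet).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # LSTM models the temporal relations between the per-frame features.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])              # one logit per action class


if __name__ == "__main__":
    model = CNNLSTMClassifier().eval()
    clips = torch.randn(2, 16, 3, 112, 112)    # two clips of 16 cropped patches
    print(model(clips).shape)                  # torch.Size([2, 5])
```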
A space situational awareness (SSA) system requires recognition of space objects that vary in size, shape, and type. Space images are challenging because of factors such as illumination and noise, which make the recognition task complex. Image fusion is an important area of image processing with applications including RGB-D sensor fusion, remote sensing, medical diagnostics, and infrared-visible image fusion. Various image fusion algorithms have recently been developed and have shown a superior ability to expose information that is not available in a single image. In this paper, we compare several methods of RGB and depth image fusion for the space object classification task. Experiments were carried out, and the performance was evaluated using 13 fusion performance metrics. Guided filter context enhancement (GFCE) outperformed the other image fusion methods in terms of average gradient (8.2593), spatial frequency (28.4114), and entropy (6.9486). Because it balances good performance against inference speed (11.41 seconds), GFCE was selected for the RGB and depth image fusion stage preceding feature extraction and classification. The fused images were used to train a deep ensemble of CoAtNets to classify space objects into ten categories. Deep ensemble learning methods including bagging, boosting, and stacking were trained and evaluated for classification. The combination of fusion and stacking improved classification accuracy substantially compared with the baseline methods, producing an average accuracy of 89% and an average F1 score of 89%.
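A minimal sketch of the stacking idea mentioned in this abstract follows: base-model class probabilities on a held-out split become the input features of a meta-learner. The tiny linear base models, the logistic-regression meta-learner, and the image size are assumptions for illustration; the paper's base learners are fine-tuned CoAtNets operating on GFCE-fused RGB-D images.

```python
# Hedged sketch: stacking ensemble. Concatenated base-model probabilities on a
# held-out split are used to fit a logistic-regression meta-learner.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.linear_model import LogisticRegression

NUM_CLASSES = 10  # the fused images are classified into ten categories


def base_probabilities(models, images):
    """Concatenate every base model's softmax output into one feature vector."""
    with torch.no_grad():
        probs = [F.softmax(m(images), dim=1) for m in models]
    return torch.cat(probs, dim=1).numpy()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Tiny stand-in classifiers so the sketch runs without pretrained CoAtNets.
    base_models = [
        torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, NUM_CLASSES)).eval()
        for _ in range(3)
    ]
    meta = LogisticRegression(max_iter=1000)

    val_images = torch.randn(64, 3, 32, 32)      # held-out fused RGB-D images
    val_labels = np.random.randint(0, NUM_CLASSES, size=64)
    meta.fit(base_probabilities(base_models, val_images), val_labels)

    test_images = torch.randn(8, 3, 32, 32)
    print(meta.predict(base_probabilities(base_models, test_images)))
```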