Deep neural networks (DNNs) have found widespread application in interpreting remote sensing (RS) imagery. However, previous works have demonstrated that DNNs are vulnerable to different types of noise, particularly adversarial noise. Surprisingly, there has been a lack of comprehensive studies on the robustness of RS tasks, prompting us to undertake a thorough survey and benchmark of the robustness of image classification and object detection in RS. To the best of our knowledge, this study represents the first comprehensive examination of both natural robustness and adversarial robustness in RS tasks. Specifically, we have curated and made publicly available datasets that contain natural and adversarial noise. These datasets serve as valuable resources for evaluating the robustness of DNN-based models. To provide a comprehensive assessment of model robustness, we conducted meticulous experiments with numerous classifiers and detectors, encompassing a wide range of mainstream methods. Through rigorous evaluation, we have uncovered insightful and intriguing findings that shed light on the relationship between adversarial noise crafting and model training, yield a deeper understanding of the susceptibility and limitations of various models, and provide guidance for the development of more resilient and robust models.
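As a rough illustration of the kind of adversarial-robustness evaluation the abstract describes, the sketch below crafts FGSM perturbations and measures classification accuracy on the perturbed images. The model, dataloader, and epsilon budget are hypothetical placeholders, not the benchmark's actual configuration.

```python
import torch
import torch.nn.functional as F

def fgsm_accuracy(model, loader, epsilon=8 / 255, device="cuda"):
    """Evaluate classification accuracy under a single-step FGSM attack.

    `model` and `loader` stand in for any RS image classifier and its test
    dataloader; `epsilon` is the L-infinity perturbation budget (illustrative).
    """
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        images.requires_grad_(True)

        # Craft the adversarial example from the loss gradient w.r.t. the input.
        loss = F.cross_entropy(model(images), labels)
        grad = torch.autograd.grad(loss, images)[0]
        adv = (images + epsilon * grad.sign()).clamp(0, 1)

        # Measure accuracy on the perturbed images.
        preds = model(adv).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```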
Input methods based on free-hand gestures have gradually become a hot research direction in human-computer interaction. Hand gestures such as sign languages, however, demand considerable knowledge and practice, and air-writing methods require users to hold the arm and hand in mid-air for extended periods. These limitations degrade the user experience and become more severe as the number of required gestures grows. To address this problem, this paper presents a novel human-3DTV interaction system based on a set of simple free-hand gestures for direct-touch interaction with a virtual interface. Specifically, our system projects a virtual interface in front of the user, who wears 3D shutter glasses and simply stretches out the arm and touches the virtual interface as if operating a smartphone touch screen, using gestures such as Click, Slide, Hold, Drag, and Zoom In/Out. Our system recognizes the user's gestures quickly and accurately, because it only needs to search a small region neighboring the virtual interface for a small set of gesture types. Because we adopt the key gestures used on smartphones, our free-hand gestures can be learned by anyone after only brief training. Users find the system more comfortable than traditional gesture input methods and can interact effectively with the 3DTV. We report a comprehensive user study on accuracy and speed to validate the advantages of the proposed human-3DTV interaction system.
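A minimal sketch of the direct-touch idea follows, assuming the fingertip has already been localized in 3D by a depth sensor. The plane distance, touch threshold, and gesture rules below are illustrative assumptions, not the system's actual parameters or recognition pipeline.

```python
import numpy as np

# Assumed setup: the virtual interface is a plane at a fixed distance in front
# of the user, perpendicular to the sensor's z-axis (illustrative values).
PLANE_Z = 0.45          # metres from the sensor to the virtual interface
TOUCH_THRESHOLD = 0.03  # fingertip must come within 3 cm of the plane

def is_touching(fingertip_xyz):
    """Return True when the fingertip 'touches' the virtual interface."""
    _, _, z = fingertip_xyz
    return abs(z - PLANE_Z) < TOUCH_THRESHOLD

def classify_touch_gesture(trajectory):
    """Very rough single-finger gesture classification over a fingertip
    trajectory (a list of (x, y, z) samples), restricted to the touch region."""
    touching = [p for p in trajectory if is_touching(p)]
    if not touching:
        return None
    xs = np.array([p[0] for p in touching])
    ys = np.array([p[1] for p in touching])
    in_plane_motion = np.hypot(xs.max() - xs.min(), ys.max() - ys.min())
    if in_plane_motion < 0.02:
        # Little in-plane motion: short contact is Click, long contact is Hold.
        return "Click" if len(touching) < 10 else "Hold"
    return "Slide"  # sustained in-plane motion; Drag/Zoom would need more context
```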
Detecting tiny persons in drone imagery remains an integral and challenging task. Unmanned Aerial Vehicles (UAVs), flying at high speed, low altitude, and from multiple perspectives, produce objects at widely varying scales, which complicates model optimization. Moreover, detection performance on densely packed and faintly discernible persons is far lower than that on large objects in high-resolution aerial images. In this paper, we introduce an image cropping strategy and an attention mechanism based on YOLOv5 to address small-person detection on the optimized VisDrone2019 dataset. Specifically, we propose a Densely Cropped and Local Attention object detector Network (DCLANet), inspired by the observation that the small area occupied by tiny objects should be fully attended to and relatively magnified in the original image. DCLANet combines Density Map Guided Object Detection in Aerial Images (DMNet) and You Only Look Twice: Rapid Multi-Scale Object Detection in Satellite Imagery (YOLT) to crop images during both training and testing, and adds a Bottleneck Attention Module (BAM) to the YOLOv5 baseline so that the detector focuses on person objects rather than irrelevant categories. To further improve DCLANet, we also apply a bag of useful strategies: data augmentation, label fusion, category filtering, and hyperparameter evolution. Extensive experiments on VisDrone2019 show that DCLANet achieves state-of-the-art performance: the person-category AP_val@0.5 reaches 50.04% on the test-dev subset, surpassing the previous SOTA method (DPNetV3) by 12.01%. In addition, on our optimized VisDrone2019 dataset, AP
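For readers unfamiliar with the attention component, the following PyTorch sketch re-implements a BAM-style bottleneck attention block as described in the original BAM paper (channel and spatial branches summed, then applied residually). It is a minimal sketch, not the authors' DCLANet code; the channel count, reduction ratio, and dilation are illustrative.

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    """Minimal BAM-style bottleneck attention block (channel + spatial branches)."""

    def __init__(self, channels, reduction=16, dilation=4):
        super().__init__()
        # Channel branch: global pooling followed by a bottleneck MLP.
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Spatial branch: dilated convolutions for a large receptive field.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.Conv2d(channels // reduction, channels // reduction, 3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 1),
        )

    def forward(self, x):
        # Broadcast-sum the two branches, squash to [0, 1], apply residually.
        attn = torch.sigmoid(self.channel(x) + self.spatial(x))
        return x * (1 + attn)
```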
Remote-vision-based image processing plays a vital role in safety helmet and harness monitoring on construction sites, where computer-vision-based automatic monitoring systems have attracted significant attention for practical applications. However, many problems remain unsolved in existing systems, such as the shortage of safety helmet and harness monitoring datasets and the low accuracy of detection algorithms. To address these issues, this paper constructs an attribute-knowledge-modeling-based safety helmet and harness monitoring system, which elegantly transforms safety-state recognition into the recognition of images' semantic attributes. Specifically, a novel transformer-based end-to-end network with a self-attention mechanism is proposed to improve attribute recognition by fully exploiting the correlations between image features and semantic attributes, on top of which a security recognition system is built by integrating detection, tracking, and attribute recognition. Experimental results for safety helmet and harness detection demonstrate that the proposed transformer-based attribute recognition algorithm clearly outperforms state-of-the-art algorithms in accuracy and robustness, and that the presented system is robust to challenges such as pose variation, occlusion, and cluttered backgrounds.
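To make the attribute-recognition idea concrete, the sketch below shows one plausible transformer-style attribute head: learnable attribute queries attend to flattened backbone features and each query is scored as a binary attribute. The attribute set, dimensions, and decoder depth are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AttributeTransformerHead(nn.Module):
    """Sketch of a transformer-based semantic-attribute recognition head.

    Assumes a backbone supplies a feature map of shape (N, C, H, W). Learnable
    attribute queries (e.g. "helmet worn", "harness worn") attend to the
    flattened features via a transformer decoder; each decoded query yields
    one binary attribute logit. All sizes are illustrative.
    """

    def __init__(self, feat_dim=256, num_attributes=2, num_layers=2, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_attributes, feat_dim))
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=num_heads,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(feat_dim, 1)  # one logit per attribute query

    def forward(self, feat_map):
        n, c, h, w = feat_map.shape
        tokens = feat_map.flatten(2).transpose(1, 2)            # (N, H*W, C)
        queries = self.queries.unsqueeze(0).expand(n, -1, -1)   # (N, A, C)
        decoded = self.decoder(queries, tokens)                 # attribute-feature attention
        return self.classifier(decoded).squeeze(-1)             # (N, A) attribute logits
```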