Unmanned Aerial Vehicles (UAVs) and the growing variety of their applications are rising in popularity. The increasing number of UAVs emphasizes the importance of drones' reliability and robustness, so an efficient self-observing sensing mechanism is needed to detect anomalies in drone behavior in real time. Previous works suggested prediction models from control theory, but these are complex by nature and hard to implement, whereas Deep Learning solutions are of great utility. In this paper, we propose a real-time framework that detects anomalies in drones by analyzing the sound they emit. For this purpose, we construct a hybrid Deep Learning model combining a Transformer with a Convolutional Neural Network inspired by the well-known VGG architecture. Our approach is evaluated on a dataset collected in real time from a single microphone mounted on a micro drone. It achieves an F1-score of 88.4% in detecting anomalies and outperforms the VGG-16 architecture, while reducing the parameter count of the well-known VGG-16 from 138M to a shrunk version with only 3.6M parameters. Despite the smaller network, our real-time approach yields high accuracy in drone anomaly detection, with an average inference time of 0.2 seconds per second of audio. Moreover, with a microphone weighing less than 100 grams mounted on top of the UAV, our method is shown to be beneficial even in extreme conditions, such as a micro-size dataset composed of only three hours of flight recordings. The presented self-observing method can be implemented by simply adding a microphone to a drone and either transmitting the captured audio to the remote control for analysis or performing the analysis onboard using a dedicated microcontroller.
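The 138M-parameter figure cited for VGG-16 can be reproduced by counting weights layer by layer. Below is a minimal sketch of that count in plain Python, using the standard VGG-16 configuration (3x3 convolutions, five max-pool stages, three fully connected layers on a 224x224 input). The paper's exact 3.6M-parameter shrunk architecture is not specified in the abstract, so only the full VGG-16 count is computed here.

```python
# Parameter counting for VGG-16.
# Conv layer (3x3 kernel): 3*3*in_ch*out_ch weights + out_ch biases.
# Fully connected layer:   in_dim*out_dim weights + out_dim biases.

def conv_params(in_ch, out_ch, k=3):
    return k * k * in_ch * out_ch + out_ch

def fc_params(in_dim, out_dim):
    return in_dim * out_dim + out_dim

# Standard VGG-16 convolutional configuration ('M' = max-pool, no parameters).
VGG16_CONV = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
              512, 512, 512, 'M', 512, 512, 512, 'M']

def vgg16_param_count(in_ch=3, num_classes=1000):
    total, ch = 0, in_ch
    for v in VGG16_CONV:
        if v == 'M':
            continue
        total += conv_params(ch, v)
        ch = v
    # After five 2x2 poolings on a 224x224 input: 512 channels x 7 x 7 = 25088.
    total += fc_params(512 * 7 * 7, 4096)
    total += fc_params(4096, 4096)
    total += fc_params(4096, num_classes)
    return total

print(vgg16_param_count())  # 138357544, the familiar "138M"
```

Most of VGG-16's parameters sit in the first fully connected layer (about 103M of the 138M), which is why compact variants that replace or shrink the classifier head can cut the total so drastically.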
Speaker Diarization (SD) consists of splitting, or segmenting, an input audio burst according to speaker identities. In this paper, we focus on a crucial sub-task of the SD problem, the audio segmentation process, and suggest a solution to the Change Point Detection (CPD) problem. We empirically demonstrate a negative correlation between the number of speakers and the Recall and F1-score measurements. This negative correlation emerges from an extensive experimental evaluation, which also accounts for the superiority of our method over recently developed voice-based solutions. To overcome the number-of-speakers issue, we suggest a robust solution based on a novel Natural Language Processing (NLP) technique together with a metadata feature-extraction process, rather than relying on vocal features alone. To the best of our knowledge, we are the first to tackle this variant of the SD problem (i.e., CPD) from an intelligent NLP standpoint, and with a dataset in the Hebrew language, which is a challenge in its own right. We empirically show, on two distinct datasets, that our method is able to accurately identify the change points in an audio burst, with 82.12% Recall and an 89.02% F1-score.
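The Recall and F1-score figures above are computed, in CPD evaluation generally, by matching predicted change points to reference ones within a tolerance window. The sketch below is a generic illustration, not the paper's protocol: the 0.5-second tolerance and the greedy one-to-one matching are assumptions.

```python
def cpd_scores(predicted, reference, tol=0.5):
    """Precision, Recall, and F1 for change point detection.

    predicted, reference: sorted lists of change-point times in seconds.
    A prediction counts as a hit if it lies within `tol` seconds of a
    not-yet-matched reference point (greedy one-to-one matching).
    """
    matched = set()
    hits = 0
    for p in predicted:
        for i, r in enumerate(reference):
            if i not in matched and abs(p - r) <= tol:
                matched.add(i)
                hits += 1
                break
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Two of three predictions land near a reference change point;
# the reference point at 15.0 s is missed entirely.
p, r, f = cpd_scores([3.1, 7.9, 12.0], [3.0, 8.2, 15.0])
```

The tolerance window matters: a wider `tol` inflates both Recall and Precision, which is why published CPD results are only comparable when the matching collar is reported alongside the scores.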