The objective of a sound event detector is to recognize anomalous events in an audio clip and return their onset and offset times. Detecting sound events in noisy environments is challenging: in a real audio signal several sound sources coexist, the characteristics of polyphonic audio differ from those of isolated recordings, and noise (e.g. thermal and environmental) is always present. In this contribution, we present a sound anomaly detection system based on a fully convolutional network that exploits image spatial filtering and an Atrous Spatial Pyramid Pooling module. To cope with the lack of datasets specifically designed for sound event detection, a dataset for the specific application of noisy bus environments has been designed. It was obtained by mixing background audio recorded in a real environment with anomalous events extracted from monophonic collections of labelled audio clips. The performance of the proposed system has been evaluated through segment-based metrics such as error rate, recall, and F1-score. Moreover, robustness and precision have been evaluated through four different tests. The analysis of the results shows that the proposed sound event detector outperforms both state-of-the-art methods and general-purpose deep learning solutions.
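The core operation behind the Atrous Spatial Pyramid Pooling module mentioned above is the dilated (atrous) convolution: the same kernel is applied at several dilation rates in parallel so that features are gathered at multiple receptive-field sizes without extra parameters. The following is a minimal numpy sketch of that idea, not the authors' implementation; the function names `dilated_conv2d` and `aspp` and the single-channel, valid-padding setting are illustrative assumptions.

```python
import numpy as np


def dilated_conv2d(x, kernel, rate):
    """Valid 2-D convolution of a single-channel map with an atrous kernel.

    A dilation rate r is equivalent to inserting (r - 1) zeros between
    kernel taps, enlarging the receptive field without adding weights.
    (Illustrative sketch, not the paper's implementation.)
    """
    kh, kw = kernel.shape
    eh = (kh - 1) * rate + 1  # effective kernel height
    ew = (kw - 1) * rate + 1  # effective kernel width
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sample the input with stride = rate inside the window
            patch = x[i:i + eh:rate, j:j + ew:rate]
            out[i, j] = np.sum(patch * kernel)
    return out


def aspp(x, kernels, rates):
    """Run parallel atrous branches and stack their (cropped) outputs."""
    outs = [dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)]
    # larger rates shrink the valid output, so crop all branches alike
    h = min(o.shape[0] for o in outs)
    w = min(o.shape[1] for o in outs)
    return np.stack([o[:h, :w] for o in outs])
```

In a real network the branches would be learned convolutions over log-mel spectrogram feature maps and the stacked outputs fused by a 1x1 convolution; here the point is only how the dilation rate widens the context each branch sees.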
In the near future, the broadcasting scenario will be characterized by immersive content. One of the systems for capturing the 3D content of a scene is Light Field imaging. The huge amount of data and the specific transmission scenario impose strong constraints on services and applications. Among other constraints, the evaluation of the quality of the received media cannot rely on the original signal but must be based only on the received data. In this direction, we propose a no-reference quality metric for light field images based on spatial and angular characteristics. In more detail, the estimated saliency and cyclopean maps of light field images are exploited to extract the spatial features. The angular-consistency features are instead measured using the Global Luminance Distribution knowledge and the Weighted Local Binary Patterns operator on Epipolar Plane Images. The effectiveness of the proposed metric is assessed by comparing its performance with state-of-the-art quality metrics on four datasets: SMART, Win5-LID, VALID 10-bit, and VALID 8-bit. Furthermore, the performance is analyzed in cross-dataset settings, with different distortions, and for different saliency maps. The results show that the proposed model outperforms state-of-the-art approaches and performs well for different distortion types and with various saliency models.
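The angular features above rely on Local Binary Patterns computed over epipolar plane images: each pixel is encoded by thresholding its 8 neighbours against it, and the histogram of the resulting codes summarizes the EPI texture (angular distortions disturb the straight-line structure of EPIs and hence this histogram). A minimal numpy sketch of the basic, unweighted LBP stage follows; the abstract's Weighted LBP variant, the function names, and the 8-neighbour/256-bin configuration are assumptions for illustration.

```python
import numpy as np


def lbp_codes(img):
    """Basic 8-neighbour LBP code for every interior pixel.

    Each neighbour contributes one bit: 1 if it is >= the centre pixel.
    (Illustrative sketch of plain LBP, not the paper's weighted operator.)
    """
    c = img[1:-1, 1:-1]  # centre pixels (border excluded)
    # neighbours in clockwise order starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=np.uint8)
    H, W = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code


def epi_lbp_histogram(epi):
    """Normalised 256-bin histogram of LBP codes for one EPI."""
    codes = lbp_codes(epi)
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()
```

In the full metric, one EPI is sliced from the light field per fixed spatial row (or column) and angular index, its codes are pooled into such histograms, and those histograms feed the angular-consistency features.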