Machine learning for video event recognition

Avola, Danilo; Cascio, Marco; Cinque, Luigi; Foresti, Gian Luca; Pannone, Daniele

doi:10.3233/ica-210652

Cited by 10 publications (4 citation statements)

References 159 publications

“…The large amount of data available has encouraged active research on analysis techniques that extract knowledge in different settings [15,16]. These techniques are able to perform different tasks in diverse fields such as the estimation of variables like the strain of a structural member in buildings [17] or the evaporation in cooling towers [18].…”

Section: Related Work, mentioning (confidence: 99%)

Gap imputation in related multivariate time series through recurrent neural network-based denoising autoencoder

Alonso, Morán, Pérez et al. 2024, ICA

Technological advances in industry have made it possible to install many connected sensors, generating a large number of observations at a high rate. The advent of Industry 4.0 requires the ability to analyze heterogeneous data in the form of related multivariate time series. However, missing data can degrade processing and introduce bias, misinterpretation, or even wrong decisions. In this paper, a recurrent neural network-based denoising autoencoder is proposed for gap imputation in related multivariate time series, i.e., series that exhibit spatio-temporal correlations. The denoising autoencoder (DAE) is able to reconstruct missing input data by learning to remove intentionally added gaps, while the recurrent neural network (RNN) captures temporal patterns and relationships among variables. For that reason, different unidirectional (simple RNN, GRU, LSTM) and bidirectional (BiSRNN, BiGRU, BiLSTM) architectures are compared with each other and with state-of-the-art methods on three different datasets. The implementation with BiGRU layers outperforms the others, effectively filling gaps with a low reconstruction error. This approach is appropriate for complex scenarios in which several variables contain long gaps. However, extreme scenarios with very short gaps in one variable or no available data should be avoided.
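
The training idea described in this abstract, corrupting a series with artificial gaps and asking the network to restore the original, can be illustrated with a short sketch. The Python below is a minimal, hypothetical helper (`add_artificial_gaps` is an illustrative name, not the authors' code), assuming the series is stored as a list of time steps, each holding one value per variable, and that gaps are marked with NaN. It only builds the corrupted input and the gap mask; the recurrent encoder and decoder are not shown.

```python
import math
import random
from typing import List, Tuple

# Hypothetical layout: rows are time steps, columns are variables.
Series = List[List[float]]


def add_artificial_gaps(
    series: Series, gap_len: int, n_gaps: int, seed: int = 0
) -> Tuple[Series, List[List[bool]]]:
    """Return a corrupted copy of `series` with `n_gaps` contiguous gaps of
    length `gap_len` (filled with NaN) plus a mask marking removed values.

    The original series is not modified, so it can serve as the
    reconstruction target of a denoising autoencoder.
    """
    if not series:
        raise ValueError("series must not be empty")
    n_steps, n_vars = len(series), len(series[0])
    if gap_len > n_steps:
        raise ValueError("gap_len cannot exceed the series length")
    rng = random.Random(seed)
    corrupted = [row[:] for row in series]          # row-wise copy
    mask = [[False] * n_vars for _ in range(n_steps)]
    for _ in range(n_gaps):
        var = rng.randrange(n_vars)                  # which variable loses data
        start = rng.randrange(n_steps - gap_len + 1) # where the gap begins
        for t in range(start, start + gap_len):
            corrupted[t][var] = math.nan
            mask[t][var] = True
    return corrupted, mask
```

A training pair is then `(corrupted, series)`: the network receives the gapped copy and is penalized for deviating from the original, for example only at positions where `mask` is `True`. Because gaps are drawn independently they may overlap, so the number of masked values can be smaller than `gap_len * n_gaps`.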

“…In this way, environment background and personal information are removed from the input, enabling the model to focus exclusively on the subject and its dynamics [84], i.e., the person moving in the scene, as in most real camera-based surveillance scenarios [85]. Instead, the sanitized amplitudes are extracted from the CSI measurements of sequential Wi-Fi data packets as signal-based features describing human poses in the radio domain [86]. This paired input enables the cross-modality supervision to learn a mapping from one domain to another during the network training phase.…”

Section: Related Work, mentioning (confidence: 99%)

Human Silhouette and Skeleton Video Synthesis through Wi-Fi signals

Avola, Cascio, Cinque et al. 2022, Preprint (self-citation)

The increasing availability of wireless access points (APs) is leading towards human sensing applications based on Wi-Fi signals as support or alternative tools to the widespread visual sensors, since these signals make it possible to address well-known vision-related problems such as illumination changes or occlusions. Indeed, using image synthesis techniques to translate radio frequencies to the visible spectrum can become essential to obtain otherwise unavailable visual data. This domain-to-domain translation is feasible because objects and people alike affect electromagnetic waves, causing variations at radio and optical frequencies. In the literature, models capable of inferring radio-to-visual feature mappings have gained momentum in the last few years, since frequency changes can be observed in the radio domain through the channel state information (CSI) of Wi-Fi APs, enabling signal-based feature extraction, e.g., of amplitude. On this account, this paper presents a novel two-branch generative neural network that effectively maps radio data into visual features, following a teacher-student design that exploits a cross-modality supervision strategy. The latter conditions signal-based features in the visual domain to completely replace visual data. Once trained, the proposed method synthesizes human silhouette and skeleton videos using exclusively Wi-Fi signals. The approach is evaluated on publicly available data, where it obtains remarkable results for both silhouette and skeleton video generation, demonstrating the effectiveness of the proposed cross-modality supervision strategy.

“…More recently, there have been several attempts to connect AI mechanisms with the learning algorithms in neural networks, which have become a research hotspot in a wide range of possible applications, including network intrusion detection (Martina & Foresti, 2021), person re‐identification (Gómez‐Silva et al., 2021), and video event recognition (Avola et al., 2021). Deep‐learning‐based methods have outperformed traditional models in many machine learning tasks (Lara‐Benitez et al., 2021).…”

Section: Introduction, mentioning (confidence: 99%)

Transformer‐optimized generation, detection, and tracking network for images with drainage pipeline defects

Fang, Wang et al. 2023, Computer aided Civil Eng

Regular detection of defects in drainage pipelines is crucial. However, some problems associated with pipeline defect detection, such as data scarcity and the difficulty of counting defects, need to be addressed. Therefore, a Transformer‐optimized generation, detection, and counting method for drainage‐pipeline defects was established in this paper. First, a generation network called Trans‐GAN‐Cla was developed for data augmentation, and a classification network was trained to improve the quality of the generated images. Second, a detection and tracking model called Trans‐Det‐Tra was developed to track and count defects. Third, the feature extraction capability of the proposed method was improved by leveraging Transformers. Compared with some well‐known convolutional neural network‐based methods, the proposed network achieved the best classification and detection accuracies, 87.2% and 87.57%, respectively, with F1 scores of 87.7% and 91.9%. Finally, defects in two onsite videos were detected and tracked, and the numbers of misalignments and obstacles were accurately counted. The results indicate that the established Transformer‐optimized method can generate high‐quality images and achieve high‐accuracy detection and counting of drainage pipeline defects.