Deep Edge Computing for Videos

Kim, Jun-Hwa; Kim, Nam-Ho; Won, Chee Sun

doi:10.1109/access.2021.3109904

Cited by 10 publications

(2 citation statements)

References 37 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In the temporal stream, T frames, each having dimensions of W × H × C, are selected by uniform sampling. Then, these T frames are condensed into a single frame of W × H × C by the S3D (Shallow 3D CNN) motion module [27]. Notably, the S3D module does not utilize fixed weights; instead, its weights are initialized and updated during training, enabling a more adaptive and robust representation of motion features.…”

Section: Data Processingmentioning

confidence: 99%

Audio-Visual Action Recognition Using Transformer Fusion Network

Kim,

Won

2024

Applied Sciences

Self Cite

View full text Add to dashboard Cite

Our approach to action recognition is grounded in the intrinsic coexistence of and complementary relationship between audio and visual information in videos. Going beyond the traditional emphasis on visual features, we propose a transformer-based network that integrates both audio and visual data as inputs. This network is designed to accept and process spatial, temporal, and audio modalities. Features from each modality are extracted using a single Swin Transformer, originally devised for still images. Subsequently, these extracted features from spatial, temporal, and audio data are adeptly combined using a novel modal fusion module (MFM). Our transformer-based network effectively fuses these three modalities, resulting in a robust solution for action recognition.

show abstract

Section: Data Processingmentioning

confidence: 99%

Audio-Visual Action Recognition Using Transformer Fusion Network

Kim,

Won

2024

Applied Sciences

Self Cite

View full text Add to dashboard Cite

show abstract

“…The aim is to utilize available resources by task partitioning and pre-processing the video data using a DNN model. [ 52 ] proposes a solution for real-time videos by designing a Front-CNN consisting of a Shallow 3D CNN and pre-trained 2D CNN as the Back-CNN. This end-to-end trainable architecture is capable of learning both spatiotemporal information of videos thereby achieving state-of-the-art performance.…”

Section: Anomaly Detection At Edge Devices Using Machine Learningmentioning

confidence: 99%

Anomaly detection using edge computing in video surveillance system: review

Patrikar

Parate

2022

Int J Multimed Info Retr

View full text Add to dashboard Cite

The current concept of smart cities influences urban planners and researchers to provide modern, secured and sustainable infrastructure and gives a decent quality of life to its residents. To fulfill this need, video surveillance cameras have been deployed to enhance the safety and well-being of the citizens. Despite technical developments in modern science, abnormal event detection in surveillance video systems is challenging and requires exhaustive human efforts. In this paper, we focus on evolution of anomaly detection followed by survey of various methodologies developed to detect anomalies in intelligent video surveillance. Further, we revisit the surveys on anomaly detection in the last decade. We then present a systematic categorization of methodologies for anomaly detection. As the notion of anomaly depends on context, we identify different objects-of-interest and publicly available datasets in anomaly detection. Since anomaly detection is a time-critical application of computer vision, we explore the anomaly detection using edge devices and approaches explicitly designed for them. The confluence of edge computing and anomaly detection for real-time and intelligent surveillance applications is also explored. Further, we discuss the challenges and opportunities involved in anomaly detection using the edge devices.

show abstract