2022
DOI: 10.3390/electronics11111785

Deep Learning-Based Context-Aware Video Content Analysis on IoT Devices

Abstract: Integrating machine learning with the Internet of Things (IoT) enables many useful applications. For IoT applications that incorporate video content analysis (VCA), deep learning models are usually used due to their capacity to encode the high-dimensional spatial and temporal representations of videos. However, limited energy and computation resources present a major challenge. Video captioning is one type of VCA that describes a video with a sentence or a set of sentences. This work proposes an IoT-based deep…

Cited by 6 publications (1 citation statement)
References 29 publications

“…By decoupling the spatial-temporal representation into the "first-spatial-then-temporal" paradigm, the whole model can be trained end to end by connecting the pre-training task with the downstream study. Gad et al [11] proposed two real-time video caption methods based on Transformer and LSTM by integrating machine learning and the Internet of Things (IoT). The neural network is trained by reading many video caption pairs to restrict the caption to a subject-verb-object (SVO) template while replacing multiple lyrics with one word.…”
Section: Video Caption (mentioning; confidence: 99%)
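
The citation statement describes restricting captions to a subject-verb-object (SVO) template while collapsing several words into one. A minimal Python sketch of that normalization step follows. It assumes the SVO triple has already been extracted (for example by a part-of-speech tagger, not shown), reads "replacing multiple ... with one word" as synonym normalization (an interpretation, not confirmed by the source), and uses a small hypothetical synonym table; `CANONICAL`, `canonicalize`, `to_svo_caption`, and `vocabulary` are illustrative names, not Gad et al.'s code.

```python
# Hypothetical synonym table: each variant maps to one canonical word.
# These entries are illustrative assumptions, not taken from Gad et al.
CANONICAL = {
    "man": "person", "woman": "person", "boy": "person",
    "girl": "person", "guy": "person",
    "speaking": "talking", "chatting": "talking",
    "automobile": "car", "vehicle": "car",
}


def canonicalize(word: str) -> str:
    """Lower-case a word and replace it with its canonical synonym, if any."""
    w = word.lower()
    return CANONICAL.get(w, w)


def to_svo_caption(subject: str, verb: str, obj: str) -> str:
    """Render an already-extracted (subject, verb, object) triple as a
    fixed-template caption made only of canonical words."""
    s, v, o = (canonicalize(t) for t in (subject, verb, obj))
    return f"{s} {v} {o}"


def vocabulary(captions: list[str]) -> set[str]:
    """Collect the distinct canonical words used across a set of captions."""
    return {canonicalize(w) for c in captions for w in c.split()}


if __name__ == "__main__":
    print(to_svo_caption("Man", "driving", "automobile"))
    print(sorted(vocabulary(["man driving automobile",
                             "woman driving vehicle"])))
```

Mapping variants to one word shrinks the output vocabulary the decoder must predict over (here two captions with five distinct raw words reduce to three canonical words), which is one plausible motivation for such a restriction on resource-limited IoT devices.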