Deep Learning-Based Context-Aware Video Content Analysis on IoT Devices

Gad, Gad; Gad, Eyad; Cengiz, Korhan; Fadlullah, Zubair Md.; Mokhtar, Bassem

doi:10.3390/electronics11111785

Cited by 6 publications

(1 citation statement)

References 29 publications


“…By decoupling the spatial-temporal representation into the "first-spatial-then-temporal" paradigm, the whole model can be trained end to end by connecting the pre-training task with the downstream study. Gad et al [11] proposed two real-time video caption methods based on Transformer and LSTM by integrating machine learning and the Internet of Things (IoT). The neural network is trained by reading many video caption pairs to restrict the caption to a subject-verb-object (SVO) template while replacing multiple lyrics with one word.…”

Section: Video Caption

Mentioning

confidence: 99%

MFVC: Urban Traffic Scene Video Caption Based on Multimodal Fusion

Zhang et al. 2022

Electronics


With the development of electronic technology, intelligent cars can gradually realize more complex artificial intelligence algorithms. The video caption algorithm is one of them. However, current video caption algorithms only consider single-visual information when applied to urban traffic scenes, which leads to the failure to generate accurate captions of complex sets. The multimodal fusion algorithm based on Transformer is one of the solutions to this problem. However, the existing algorithms have the difficulties of a low fusion performance and high computational complexity. We propose a new video caption Transformer-based model, the MFVC (Multimodal Fusion for Video Caption), to solve these issues. We introduce audio modal data and the attention bottleneck module to increase the available information to describe the generative model and improve the model effect with less operation costs through the attention bottleneck module. Finally, the experiment is conducted on the available datasets, MSR-VTT and MSVD. Meanwhile, to verify the effect of the model on the urban traffic scene, the experiment is carried out on the self-built traffic caption dataset BUUISE, and the evaluation index confirms the model. This model can achieve good results on both available datasets and urban traffic datasets and has excellent application prospects in the intelligent driving industry.

The research landscape on generative artificial intelligence: a bibliometric analysis of transformer-based models

Marchena Sekli

2024


Purpose: The aim of this study is to offer valuable insights to businesses and facilitate better understanding on transformer-based models (TBMs), which are among the widely employed generative artificial intelligence (GAI) models, garnering substantial attention due to their ability to process and generate complex data.

Design/methodology/approach: Existing studies on TBMs tend to be limited in scope, either focusing on specific fields or being highly technical. To bridge this gap, this study conducts robust bibliometric analysis to explore the trends across journals, authors, affiliations, countries and research trajectories using science mapping techniques: co-citation, co-words and strategic diagram analysis.

Findings: Identified research gaps encompass the evolution of new closed and open-source TBMs; limited exploration across industries like education and disciplines like marketing; a lack of in-depth exploration on TBMs' adoption in the health sector; scarcity of research on TBMs' ethical considerations and potential TBMs' performance research in diverse applications, like image processing.

Originality/value: The study offers an updated TBMs landscape and proposes a theoretical framework for TBMs' adoption in organizations. Implications for managers and researchers along with suggested research questions to guide future investigations are provided.
