2020
DOI: 10.3390/app10093056

Automatic Transformation of a Video Using Multimodal Information for an Engaging Exploration Experience

Abstract: Exploring the content of a video is typically inefficient due to the linear streamed nature of its media and the lack of interactivity. While different approaches have been proposed for enhancing the exploration experience of video content, the general view of video content has remained basically the same, that is, a continuous stream of images. It is our contention that such a conservative view on video limits its potential value as a content source. This paper presents An Alternative Representation of Video …


Cited by 6 publications (2 citation statements)
References: 69 publications (103 reference statements)
“…With technological developments in single-modal AI technology, there is a developing desire among academics for multimodal information technology. Methods for multimodal text recognition attempt to combine visual feature information such as image, video, audio, etc., first through translation [17] and alignment [18] of modal information. The model is then pretrained to facilitate cross-modal interaction between features to learn about the concealed wealth of information between modalities.…”
Section: Public Opinion Scene Text Recognition (mentioning, confidence: 99%)
“…Color, Motion, Low-level Descriptor, Clustering based approaches, [21,28,29,32] Lecture Color, Motion and Clustering based approach [23,30,29,32] Surveillance Graph and Event-based approaches [9] Wild Life Color and Clustering based approaches Motion-based Rugby and Soccer Sports videos Keyframe Subjective [16] Deep Learning Based YouTube videos, SumMe and TVSum Dataset…”
Section: Movies/Cartoon (mentioning, confidence: 99%)