Automatic CNN-Based Enhancement of 360° Video Experience With Multisensorial Effects

Sexton, John Patrick; Simiscuka, Anderson Augusto; McGuinness, Kevin; Muntean, Gabriel‐Miro

doi:10.1109/access.2021.3115701

Cited by 12 publications

(10 citation statements)

References 36 publications

“…The results show a statistically significant benefit for the presence of odor and wind in the QoE. Sexton et al. [11] developed an algorithm to automatically add multisensorial information to 360° videos, combining haptics and olfaction. A playback system was designed to improve on the works of Comsa et al. [9] and Bi et al. [28].…”

Section: A Mulsemedia (mentioning)

confidence: 99%

“…CNNs can accurately perform action and scene recognition for the automatic generation of multisensory effects from multimedia inputs, as manually adding these effects is a lengthy process. The work described in [11] proposed an initial CNN-based solution for multisensory systems, but left several questions to be answered: how action detection can improve the effects dispensed, what other metrics can be analyzed to indicate the feasibility of the solution (e.g., GFLOPs, FPR, FNR), and how the solution can be generalized to work with a large number of videos, categories, scents and datasets. Several CNNs were also described in this section and a thorough evaluation needs to be performed for the identification of a suitable network to work with the proposed solution and help achieve best results in terms of performance, complexity and accuracy.…”

Section: B Convolutional Neural Network For Video Processing (mentioning)

confidence: 99%

“…None of these works consider action detection-based haptics generation, which is possible in the proposed solution due to the adoption of action detection with an approach such as SlowFast. [Footnote 10: https://github.com/Fjuzi/traction_base/blob/main/scents.pdf] Haptic events are classified into two categories in an additional dataset: constant stimuli and single stimuli. In constant stimuli, effects are triggered throughout the entire action, while in single stimuli, the haptic feedback happens at a key moment.…”

Section: B Selection Of CNNs and Datasets (mentioning)

confidence: 99%
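
The constant/single distinction quoted above can be illustrated with a small sketch. This is not the cited authors' implementation: the names `StimulusKind`, `HapticEvent`, `active_interval` and the `pulse_s` parameter are hypothetical, and the assumption that a single stimulus is a short pulse starting at the key moment is ours.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class StimulusKind(Enum):
    CONSTANT = "constant"  # effect is active throughout the entire action
    SINGLE = "single"      # effect fires once, at a key moment of the action


@dataclass(frozen=True)
class HapticEvent:
    """A detected action with its haptic category (hypothetical structure)."""
    kind: StimulusKind
    start_s: float                        # action start, in seconds
    end_s: float                          # action end, in seconds
    key_moment_s: Optional[float] = None  # used only by SINGLE stimuli


def active_interval(event: HapticEvent, pulse_s: float = 0.2) -> Tuple[float, float]:
    """Return the (start, end) interval during which feedback is on.

    pulse_s is an assumed pulse length for single stimuli; the quoted
    text does not specify one.
    """
    if event.kind is StimulusKind.CONSTANT:
        return (event.start_s, event.end_s)
    # Fall back to the action start when no key moment was annotated.
    key = event.key_moment_s if event.key_moment_s is not None else event.start_s
    return (key, min(key + pulse_s, event.end_s))
```

For example, a constant event spanning 1.0 s to 4.0 s keeps the effect on for the whole span, while a single event with a key moment at 2.0 s only triggers a short pulse from that point.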

“…Action recognition in videos requires temporal windows, while scene recognition can be performed in each frame. The length of the temporal segments is based on the findings of Sexton et al. [11], as well as the constraints imposed by the SlowFast architecture. In [11], an experiment suggested that users took, on average, 2 seconds to notice a change in scents generated by the olfaction dispenser.…”

Section: E Scene and Action Detection Process (mentioning)

confidence: 99%
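
As a rough illustration of the temporal windowing described in the statement above, the sketch below splits a video into fixed-length frame ranges for action recognition. It assumes a 2-second window purely as an example; the quoted text only says the segment length is based on the 2-second finding and on SlowFast constraints, so the actual value may differ. The function name `temporal_windows` and its parameters are hypothetical.

```python
from typing import List, Tuple


def temporal_windows(num_frames: int, fps: float, window_s: float = 2.0) -> List[Tuple[int, int]]:
    """Split a video into consecutive [start, end) frame ranges.

    window_s defaults to 2.0 seconds as an assumption for illustration.
    The last window may be shorter than the others.
    """
    if num_frames < 0 or fps <= 0 or window_s <= 0:
        raise ValueError("num_frames must be >= 0; fps and window_s must be > 0")
    frames_per_window = max(1, round(fps * window_s))
    return [
        (start, min(start + frames_per_window, num_frames))
        for start in range(0, num_frames, frames_per_window)
    ]
```

Under these assumptions, a 150-frame clip at 30 fps yields three windows: two full 60-frame windows and a final 30-frame one. Scene recognition, by contrast, needs no windowing because it can run on each frame independently.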

“…The work of Sexton et al. [11] proposed a first attempt to automatically generate haptic and olfactory effects based on 360° content using both video and audio. Scents are generated via scene recognition performed by neural networks while haptic content relies on audio cues.…”

Section: Introduction (mentioning)

confidence: 99%

A CNN-Based Framework for Enhancing 360° VR Experiences With Multisensorial Effects

Szabó, Simiscuka, Masneri, et al. 2023. IEEE Trans. Multimedia

Self Cite

Improving user experience during the delivery of immersive content is crucial for its success for both the content creators and audience. Creators can express themselves better with multisensory stimulation, while the audience can experience a higher level of involvement. The rapid development of mulsemedia devices provides better access to stimuli such as olfaction and haptics. Nevertheless, due to the required manual annotation process of adding mulsemedia effects, the amount of content available with sensorial effects is still limited. This work introduces an innovative mulsemedia-enhancement solution capable of automatically generating olfactory and haptic content based on 360° video content, with the use of neural networks. Two parallel neural networks are responsible for automatically adding scents to 360° videos: a scene detection network (responsible for static, global content) and an action detection network (responsible for dynamic, local content). A 360° video dataset with scent labels is also created and used for evaluating the robustness of the proposed solution. The solution achieves a 69.19% olfactory accuracy and 72.26% haptics accuracy during evaluation using two different datasets.
