In this paper, we propose a learning-based method for predicting dense depth values of a scene from a monocular omnidirectional image. An omnidirectional image has a full field-of-view, providing a much more complete description of the scene than perspective images. However, the fully convolutional networks that most current solutions rely on fail to capture rich global contexts from the panorama. To address this issue, as well as the distortion introduced by the equirectangular projection of the panorama, we propose Cubemap Vision Transformers (CViT), a new transformer-based architecture that can model long-range dependencies and extract distortion-free global features from the panorama. We show that cubemap vision transformers have a global receptive field at every stage and can provide globally coherent predictions for spherical signals. To preserve important local features, we further design a convolution-based branch in our pipeline (dubbed GLPanoDepth) and fuse global features from the cubemap vision transformers at multiple scales. This global-to-local strategy allows us to fully exploit the useful global and local features in the panorama, achieving state-of-the-art performance in panoramic depth estimation.
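To make the described architecture concrete, the PyTorch sketch below embeds the six distortion-free cube faces as patch tokens, processes them jointly with a standard transformer encoder (so every token attends to all faces at every stage), and shows one possible global-to-local fusion at a single scale. It is an illustrative sketch only: the class names, hyperparameters, and fusion layout are our assumptions, not the authors' released CViT/GLPanoDepth implementation.

```python
# Illustrative sketch only; module layout and hyperparameters are assumptions.
import torch
import torch.nn as nn

class CubemapViTBranch(nn.Module):
    """Global branch: the six cube faces are split into patches, embedded,
    and processed jointly by a transformer encoder, so every token can
    attend to every face at every stage (global receptive field)."""

    def __init__(self, face_size=256, patch=16, dim=384, depth=6, heads=6):
        super().__init__()
        # Patch embedding implemented as a strided convolution per face.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_tokens = 6 * (face_size // patch) ** 2  # tokens across all faces
        self.pos_embed = nn.Parameter(torch.zeros(1, n_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, faces):
        # faces: (B, 6, 3, H, W), the cubemap projection of the panorama.
        b = faces.shape[0]
        x = self.patch_embed(faces.flatten(0, 1))   # (B*6, dim, h, w)
        x = x.flatten(2).transpose(1, 2)            # (B*6, h*w, dim)
        x = x.reshape(b, -1, x.shape[-1])           # join the six faces
        return self.encoder(x + self.pos_embed)     # globally attended tokens

class GlobalToLocalFusion(nn.Module):
    """One-scale fusion of global features (tokens reshaped back to a
    spatial map) with a convolutional local feature map."""

    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, local_feat, global_feat):
        # Resample the global map to the local resolution, then concatenate.
        global_feat = nn.functional.interpolate(
            global_feat, size=local_feat.shape[-2:],
            mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))

faces = torch.randn(1, 6, 3, 256, 256)
tokens = CubemapViTBranch()(faces)
print(tokens.shape)  # torch.Size([1, 1536, 384])
```

In this reading, the transformer branch supplies globally coherent context while the convolutional branch (not shown) preserves local detail, and fusion modules like the one above would be applied at multiple decoder scales.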
Emotion elicitation experiments are conducted to collect biological signals from subjects while they are in a particular emotional state. The recorded signals are then used as a training/test dataset for constructing an emotion recognition system by means of machine learning. In conventional emotion elicitation experiments, affective images or videos are shown to a subject to draw out an emotion. However, the authors have concerns about the effectiveness of this approach. To reliably evoke a specific emotion in subjects, we produced several Virtual Reality (VR) scenes and presented them to the subjects through a head-mounted display (HMD) in emotion elicitation experiments. The usability and effectiveness of the VR scenes with the HMD for emotion elicitation were experimentally verified. It was confirmed that experiencing the VR scenes through the HMD was effective in evoking emotions, but we still need to improve how subjects learn to operate the VR scenes and to provide measures against VR sickness. Moreover, Support Vector Machine (SVM) classifiers were constructed as an emotion recognition system using the biological signals measured from the subjects in the emotion elicitation experiments.
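Since the abstract ends with the construction of SVM classifiers from the recorded signals, a minimal scikit-learn sketch of that final step is given below. The feature dimensionality, number of emotion classes, kernel choice, and the synthetic stand-in data are all our assumptions; in the actual study the features would be extracted from the measured biological signals and labeled by the emotion elicited in each VR session.

```python
# Minimal sketch of the SVM classifier construction step; data are synthetic
# placeholders, not the study's recorded biological signals.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: per-trial feature vectors extracted from the biological signals
# (assumed 32-dimensional); y: elicited-emotion labels from the VR sessions
# (assumed four target emotions). Random data stands in for the real set.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 32))
y = rng.integers(0, 4, size=120)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Standardize the features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```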