In this work, we present a novel approach to autonomously detect different levels of simultaneous spatiotemporal activity in multidimensional data. We introduce a new multilabeling technique that assigns distinct labels to different regions of interest in the data, thereby incorporating the spatial aspect. Each label describes the level of activity/motion to be monitored in the spatial location it represents, in contrast to existing approaches, which provide only a binary indication of the presence or absence of activity. This Spatially and Motion-Level Descriptive (SMLD) labeling schema is combined with a Convolutional Long Short-Term Memory (ConvLSTM)-based network for classification, capturing different levels of activity both spatially and temporally without the use of any foreground or object detection. The proposed approach can be applied to various types of spatiotemporal data captured in entirely different application domains; in this paper, it is evaluated on video data as well as respiratory sound data. Performance is measured with metrics commonly associated with multilabeling, namely Hamming Loss and Subset Accuracy, as well as confusion matrix-based measurements. Promising test results are achieved on the video datasets, with an overall Hamming Loss close to 0.05, Subset Accuracy close to 80%, and confusion matrix-based metrics above 0.9. In addition, we discuss the proposed approach's ability to detect frequent motion patterns based on predicted spatiotemporal activity levels. Encouraging results are also obtained on the respiratory sound dataset when detecting abnormalities in different parts of the lungs. The experimental results demonstrate that the proposed approach generalizes across different types of spatiotemporal data and application domains.
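To make the evaluation setup concrete, the following is a minimal sketch, not the authors' implementation, of a ConvLSTM-based multilabel classifier over short spatiotemporal clips together with the two multilabel metrics named in the abstract. The clip shape, the number of regions, the simplification of per-region activity levels to binary indicators, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumptions throughout): ConvLSTM multilabel classifier
# plus Hamming Loss / Subset Accuracy evaluation on toy predictions.
import numpy as np
import tensorflow as tf
from sklearn.metrics import hamming_loss, accuracy_score

NUM_REGIONS = 4                 # assumed number of spatial regions, one label each
CLIP_SHAPE = (16, 64, 64, 1)    # (frames, height, width, channels), assumed

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=CLIP_SHAPE),
    tf.keras.layers.ConvLSTM2D(32, (3, 3), padding="same"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Sigmoid head: one independent label per spatial region
    # (the paper's labels encode activity levels; binary is a simplification).
    tf.keras.layers.Dense(NUM_REGIONS, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Hamming Loss = fraction of individual labels predicted incorrectly;
# Subset Accuracy counts a sample as correct only if every label matches.
y_true = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
print("Hamming Loss:", hamming_loss(y_true, y_pred))
print("Subset Accuracy:", accuracy_score(y_true, y_pred))
```

Note that `accuracy_score` applied to multilabel indicator arrays in scikit-learn computes exactly the exact-match (Subset Accuracy) criterion described above.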
Humanity's desire to enable machines to "understand" us drives research into human beings and their reactions: a computer's ability to correctly classify our emotions leads to an enhanced user experience. Using the computer's eye, a webcam, we can acquire human-reaction data by capturing facial images in response to stimuli. The data of interest in this research are changes in pupil size and gaze patterns, in conjunction with classification of facial expression. Although fusion of these measurements has been considered in the past by Xiang and Kankanhalli [14] as well as Valverde et al. [15], their approaches differ substantially from ours: both groups used a multimodal set-up with an eye tracker alongside a webcam, and their stimuli were visual. Our approach avoids costly eye trackers and relies solely on images acquired from a standard webcam to measure changes in pupil size, gaze patterns, and facial expression in response to auditory stimuli. The auditory mode is often preferred because, unlike visual stimulation from a monitor, luminance does not need to be accounted for. The fusion of the information from these features is then used to distinguish between negative, neutral, and positive emotional states. In this paper we discuss an experiment (n = 15) in which stimuli from the auditory version of the International Affective Picture System (IAPS) are used to elicit these three emotions in participants. Webcam data are recorded during the experiments, and signal processing and feature extraction techniques are applied to the resulting image files to build a model capable of predicting neutral, positive, and negative emotional states.
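As a rough illustration of the feature-level fusion described above, the sketch below assumes that per-trial features (e.g., mean pupil diameter, dilation slope, gaze dispersion, facial-expression scores) have already been extracted from the webcam frames; the feature names, the random placeholder data, and the choice of an SVM classifier are assumptions, not the authors' pipeline.

```python
# Minimal sketch (assumptions throughout): fuse per-trial pupil, gaze and
# facial-expression features and train a 3-class emotion classifier
# (negative / neutral / positive).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical feature-level fusion: each row is one auditory stimulus trial.
# Columns (assumed): mean pupil diameter, pupil dilation slope, gaze
# dispersion, fixation count, and three facial-expression scores.
X = rng.normal(size=(90, 7))
y = rng.integers(0, 3, size=90)   # 0 = negative, 1 = neutral, 2 = positive

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print("Cross-validated accuracy:", scores.mean())
```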