Recently, various dimensionality reduction approaches have been proposed as alternatives to PCA or LDA. These approaches do not rely on a linearity assumption and are hence capable of discovering more complex embeddings within different regions of the datasets. Despite their success on artificial datasets, it is not straightforward to predict which technique is the most appropriate for a given real dataset. In this paper, we empirically evaluate recent techniques on two real audio use cases: musical instrument loops used in music production and sound effects used in sound editing. ISOMAP and t-SNE are compared to PCA in a visualization problem, where the data are reduced to a two-dimensional view. Several evaluation measures are used: classification performance, as well as trustworthiness and continuity, which assess the preservation of neighborhoods. Although PCA and ISOMAP can yield good continuity performance even locally (samples that are close in the original space remain close in the low-dimensional one), they fail to preserve the structure of the data well enough to ensure that distinct subgroups remain separate in the visualization. We show that t-SNE gives the best performance and can even be beneficial as a pre-processing stage for improving classification when the amount of labeled data is low.
Index Terms: Dimensionality reduction, multimedia information retrieval, audio and music analysis.
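To make the neighborhood-preservation measures concrete, the following is a minimal sketch of trustworthiness and continuity, assuming the common rank-based definition (Venna and Kaski); the abstract does not state which exact formulation the authors used. The function names and the toy data are illustrative, not taken from the paper.

```python
import math


def _neighbor_ranks(points):
    """ranks[i][j] = rank of point j among the neighbors of point i (1 = nearest)."""
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        row = [0] * n
        for rank, j in enumerate(order, start=1):
            row[j] = rank
        ranks.append(row)
    return ranks


def trustworthiness(high, low, k):
    """Penalize points that are k-neighbors in `low` but not in `high`."""
    n = len(high)
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")
    r_high = _neighbor_ranks(high)
    r_low = _neighbor_ranks(low)
    penalty = 0
    for i in range(n):
        for j in range(n):
            # j intrudes into the low-dimensional neighborhood of i
            if j != i and r_low[i][j] <= k and r_high[i][j] > k:
                penalty += r_high[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def continuity(high, low, k):
    """Same measure with the roles of the two spaces swapped."""
    return trustworthiness(low, high, k)


# Toy data (hypothetical): six points on a line, a faithful and a scrambled embedding.
high = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
faithful = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
scrambled = [(0.0,), (5.0,), (1.0,), (4.0,), (2.0,), (3.0,)]

print(trustworthiness(high, faithful, k=2), trustworthiness(high, scrambled, k=2))
```

Both measures lie in [0, 1] for k < n/2, with 1 meaning that the k-nearest neighborhoods are fully preserved. Trustworthiness drops when the embedding introduces false neighbors, and continuity drops when true neighbors are pushed apart.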
This paper investigates the relationship between gestures’ expressivity and the amount of attention they attract. We present a technique for quantifying behavior saliency, here understood as the capacity to capture one’s attention, by the rarity of selected motion and gestural expressive features. This rarity index is based on the real-time computation of the occurrence probability of the numerical values of expressive motion features. Hence, the time instants that correspond to rare dynamic patterns of an expressive feature are singled out. In a multi-user scenario, the rarity index highlights the person in a group whose behavior differs most from that of the others. In a mono-user scenario, the rarity index highlights when the expressive content of a gesture changes. These methods can be considered as preliminary steps toward context-aware expressive gesture analysis. This work has been partly carried out in the framework of the eNTERFACE 2008 workshop (Paris, France, August 2008) and is partially supported by the EU ICT SAME Project (www.sameproject.eu) and by the NUMEDIART Project (www.numediart.org).
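One possible reading of a rarity index built on occurrence probabilities is sketched below. It assumes a running histogram over one feature and scores each new value by its self-information, -log2(p). The class name `RarityIndex`, the bin count, the smoothing, and the choice of self-information are all assumptions made for illustration; the abstract does not specify the authors' exact formula.

```python
import math


class RarityIndex:
    """Running histogram of one feature; rare values receive high scores."""

    def __init__(self, lo, hi, n_bins=10):
        self.lo = lo
        self.hi = hi
        self.n_bins = n_bins
        # Start every bin at 1 (Laplace smoothing) so no probability is zero.
        self.counts = [1] * n_bins
        self.total = n_bins

    def _bin(self, x):
        frac = (x - self.lo) / (self.hi - self.lo)
        # Clamp out-of-range values into the first or last bin.
        return min(self.n_bins - 1, max(0, int(frac * self.n_bins)))

    def update(self, x):
        """Score x against the history seen so far, then add it to the history."""
        b = self._bin(x)
        p = self.counts[b] / self.total
        self.counts[b] += 1
        self.total += 1
        return -math.log2(p)


# Hypothetical feature stream: a steady value followed by one unusual value.
index = RarityIndex(lo=0.0, hi=1.0, n_bins=10)
common_scores = [index.update(0.1) for _ in range(50)]
outlier_score = index.update(0.9)
print(common_scores[-1], outlier_score)
```

In a mono-user setting, one such index per feature would flag the instants at which the feature leaves its usual range. In a multi-user setting, the histogram could instead be built from the values of all participants at a given instant, which is again an assumption about how the idea might be applied.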
In this paper, we present a comparison between four HMM-based real-time decoding algorithms for stylistic gait recognition and following. The approach is based on a probabilistic modelling of walking gestures recorded through motion capture. The algorithms are evaluated on their ability to recover the progression of the performed gestures over time in real time, i.e. as the gestures are performed, and on their robustness when the decoding is only performed on a subset of the model dimensions. The performance of the studied algorithms is also evaluated in the context of a framework for "gait reconstruction", i.e. where the walking gestures recognised on lower body dimensions are used to synchronously regenerate the upper body dimensions (and vice-versa).
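The abstract does not name the four decoders, so the following is only a generic illustration of real-time HMM decoding: a forward filter that updates the state posterior frame by frame and reports the most probable current state. The left-to-right model, the one-dimensional Gaussian emissions, and all parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


class ForwardFilter:
    """Causal HMM decoding: uses only the observations received so far."""

    def __init__(self, initial, transition, means, variances):
        self.initial = initial
        self.transition = transition
        self.means = means
        self.variances = variances
        self.alpha = None  # normalized state posterior after the latest frame

    def step(self, x):
        n = len(self.means)
        likelihood = [gaussian_pdf(x, self.means[s], self.variances[s]) for s in range(n)]
        if self.alpha is None:
            predicted = self.initial
        else:
            # Propagate the previous posterior through the transition matrix.
            predicted = [
                sum(self.alpha[r] * self.transition[r][s] for r in range(n))
                for s in range(n)
            ]
        unnormalized = [predicted[s] * likelihood[s] for s in range(n)]
        total = sum(unnormalized)
        if total == 0.0:
            raise ValueError("observation has zero likelihood under every state")
        self.alpha = [u / total for u in unnormalized]
        return max(range(n), key=lambda s: self.alpha[s])


# Hypothetical three-state left-to-right model of one gesture.
decoder = ForwardFilter(
    initial=[1.0, 0.0, 0.0],
    transition=[[0.8, 0.2, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],
    means=[0.0, 1.0, 2.0],
    variances=[0.1, 0.1, 0.1],
)
path = [decoder.step(x) for x in [0.0, 0.1, 0.9, 1.1, 2.0, 1.9]]
print(path)
```

Because each call to `step` depends only on past frames, the decoded state index can serve as an estimate of gesture progression while the gesture is still being performed. Decoding on a subset of dimensions would amount to evaluating the emission likelihood on those dimensions only.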
This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our target for this workshop was to bring some technologies currently used in speech recognition and synthesis to a new level, i.e. being the core of a new HMM-based mapping system. The idea of statistical mapping has been investigated, more precisely how to use Gaussian Mixture Models and Hidden Markov Models for realtime and reactive generation of new trajectories from input labels and for realtime regression in a continuous-to-continuous use case. As a result, we have developed several proofs of concept, including an incremental speech synthesiser, software for exploring stylistic spaces for gait and facial motion in realtime, reactive audiovisual laughter, and a prototype demonstrating the realtime reconstruction of lower body gait motion strictly from upper body motion, with conservation of the stylistic properties. This project has been the opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library and explore the development of a realtime gesture recognition tool.
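As an illustration of continuous-to-continuous statistical mapping, the sketch below implements textbook Gaussian mixture regression for one input and one output dimension. It is not the Mage library's API and not necessarily the authors' formulation; the function `gmr_predict`, the dictionary layout, and the parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x, components):
    """Conditional mean E[y | x] under a joint Gaussian mixture over (x, y).

    Each component is a dict with keys: weight, mean_x, mean_y, var_x, cov_xy.
    """
    # Responsibility of each component for the observed input x.
    resp = [c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"]) for c in components]
    total = sum(resp)
    if total == 0.0:
        raise ValueError("input has zero likelihood under every component")
    # Each component contributes its own linear regression of y on x.
    return sum(
        (r / total) * (c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"]))
        for r, c in zip(resp, components)
    )


# Hypothetical two-component model mapping an input feature to an output feature.
model = [
    {"weight": 0.5, "mean_x": 0.0, "mean_y": 0.0, "var_x": 1.0, "cov_xy": 0.5},
    {"weight": 0.5, "mean_x": 10.0, "mean_y": 5.0, "var_x": 1.0, "cov_xy": -0.5},
]
print(gmr_predict(0.5, model), gmr_predict(9.5, model))
```

Since each prediction needs only the current input frame, this kind of mapping can run frame by frame. An HMM-based variant would additionally let the component weights evolve over time according to the state sequence.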