In this paper we present a framework for learning a three-layered model of human shape, pose and garment deformation. The proposed deformation model provides intuitive, independent control over the three parameters, while producing aesthetically pleasing deformations of both the garment and the human body. The shape and pose deformation layers of the model are trained on a rich dataset of full-body 3D scans of human subjects in a variety of poses. The garment deformation layer is trained on animated mesh sequences of dressed actors and relies on a novel technique for estimating human shape and posture under clothing. The key contribution of this paper is that we consider garment deformations as the residual transformations between a naked mesh and the dressed mesh of the same subject.
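The residual idea in the abstract above can be illustrated with a minimal sketch. It assumes the naked and dressed meshes share the same vertex ordering and reduces the "residual transformation" to a per-vertex displacement; the paper may use a richer representation (for example per-triangle transformations). The function names `garment_residuals` and `apply_residuals` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def garment_residuals(body: List[Vec3], dressed: List[Vec3]) -> List[Vec3]:
    """Per-vertex offset from the naked body mesh to the dressed mesh.

    Assumption: both meshes have the same number of vertices in the same order.
    """
    if len(body) != len(dressed):
        raise ValueError("meshes must have the same number of vertices")
    return [(d[0] - b[0], d[1] - b[1], d[2] - b[2]) for b, d in zip(body, dressed)]


def apply_residuals(body: List[Vec3], residuals: List[Vec3], scale: float = 1.0) -> List[Vec3]:
    """Dress a (possibly different) body mesh by adding scaled residual offsets."""
    if len(body) != len(residuals):
        raise ValueError("body and residuals must have the same length")
    return [
        (b[0] + scale * r[0], b[1] + scale * r[1], b[2] + scale * r[2])
        for b, r in zip(body, residuals)
    ]


# Toy two-vertex example: learn residuals on one body, reuse them on another.
naked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
clothed = [(0.0, 0.0, 0.5), (1.0, 0.25, 0.0)]
residuals = garment_residuals(naked, clothed)
other_body = [(2.0, 2.0, 2.0), (3.0, 2.0, 2.0)]
other_dressed = apply_residuals(other_body, residuals)
```

Storing the garment as an offset from the body, rather than as absolute geometry, is what allows the garment layer to be varied independently of the shape and pose layers in a sketch like this one.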
Convolutional neural networks (CNNs) with log-mel spectrum features have shown promising results for acoustic scene classification tasks. However, the performance of these CNN-based classifiers still falls short because they do not generalise well to unknown environments. To address this issue, we introduce an acoustic spectrum transformation network in which traditional log-mel spectra are transformed into imagined visual features (IVF). The imagined visual features are learned by exploiting the relationship between the audio and visual features present in video recordings. An auto-encoder is used to encode images as visual features, and a transformation network learns how to generate imagined visual features from log-mel spectra. Our model is trained on a large dataset of YouTube videos. We test our proposed method on the scene classification task of DCASE and ESC-50, where our method outperforms other spectrum features, especially for unseen environments.
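For readers unfamiliar with the log-mel input mentioned above, the following is a minimal sketch of how a log-mel vector can be computed from one frame of a power spectrum. It uses the common mel formula mel = 2595 log10(1 + f/700) and triangular filters; the filter count, sample rate and frame size are illustrative assumptions, not the settings used in the paper, and the function names are hypothetical.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_bins: int, sample_rate: float) -> List[List[float]]:
    """Triangular filters over n_bins linearly spaced bins from 0 Hz to Nyquist."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                w = (f - lo) / (mid - lo)   # rising slope
            elif mid < f < hi:
                w = (hi - f) / (hi - mid)   # falling slope
            else:
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank


def log_mel(power_spectrum: List[float], bank: List[List[float]], eps: float = 1e-10) -> List[float]:
    """Apply the filterbank to one power-spectrum frame and take the log."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps) for row in bank]


# Illustrative settings (assumptions): 8 mel bands, 65 bins, 16 kHz audio.
bank = mel_filterbank(n_mels=8, n_bins=65, sample_rate=16000.0)
frame = [1.0] * 65  # flat power spectrum as a stand-in for real audio
features = log_mel(frame, bank)
```

In practice a library routine would normally replace this hand-written filterbank; the sketch only shows what the "log-mel" input to the transformation network consists of.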
We introduce ShapeMate, a framework for human body shape estimation and classification for online fashion applications. Given a single image of a subject, our framework can simultaneously estimate detailed 3D human body shape and compute a foreground segmentation with minimal user input. Once the body shape has been estimated, various semantic parameters are extracted for garment size and style recommendation. Preliminary results demonstrate that a single image holds enough information for accurate shape classification.