Recently, Image-to-Image Translation (IIT) has achieved great progress in image style transfer and semantic context manipulation for images. However, existing approaches require exhaustively labelling training data, which is labor demanding, difficult to scale up, and hard to adapt to a new domain. To overcome such a key limitation, we propose Sparsely Grouped Generative Adversarial Networks (SG-GAN) as a novel approach that can translate images in sparsely grouped datasets where only a few train samples are labelled. Using a one-input multi-output architecture, SG-GAN is well-suited for tackling multi-task learning and sparsely grouped learning tasks. The new model is able to translate images among multiple groups using only a single trained model. To experimentally validate the advantages of the new model, we apply the proposed method to tackle a series of attribute manipulation tasks for facial images as a case study. Experimental results show that SG-GAN can achieve comparable results with state-of-the-art methods on adequately labelled datasets while attaining a superior image translation quality on sparsely grouped datasets 1 .
Positive emotions are of great significance to people's daily life, such as human-computer/robot interaction. However, the structure of extensive positive emotions is not clear yet and effective standardized inducing materials containing as many positive emotional categories as possible are lacking. Thus, this paper aims to establish a Chinese positive emotion database (CPED) to (1) effectively elicit positive emotion categories as many as possible, (2) provide both the subjective feelings of different positive emotions and a corresponding peripheral physiological database, and (3) explore the structure and framework of positive emotion categories. 42 video clips of 16 positive emotion categories were screened from more than 1000 online clips. Then a total of 312 participants watched and rated these video clips during which PPG and GSR signals were recorded. 34 video clips that met hit rate and intensity standards were systemically clustered into four emotion categories (empathy, fun, creativity and esteem). Eventually, 22 film clips of these four major categories formed the CPED database. 113 features from PPG and GSR signals were extracted and entered into a SVM classifier that serves as a baseline classification method. A classification accuracy of 45.84% for four major categories of positive emotions was achieved.
Cartoon faces are widely used in social media, animation production, and social robots because of their attractive ability to convey different emotional information. Despite their popular applications, the mechanisms of recognizing emotional expressions in cartoon faces are still unclear. Therefore, three experiments were conducted in this study to systematically explore a recognition process for emotional cartoon expressions (happy, sad, and neutral) and to examine the influence of key facial features (mouth, eyes, and eyebrows) on emotion recognition. Across the experiments, three presentation conditions were employed: (1) a full face; (2) individual feature only (with two other features concealed); and (3) one feature concealed with two other features presented. The cartoon face images used in this study were converted from a set of real faces acted by Chinese posers, and the observers were Chinese. The results show that happy cartoon expressions were recognized more accurately than neutral and sad expressions, which was consistent with the happiness recognition advantage revealed in real face studies. Compared with real facial expressions, sad cartoon expressions were perceived as sadder, and happy cartoon expressions were perceived as less happy, regardless of whether full-face or single facial features were viewed. For cartoon faces, the mouth was demonstrated to be a feature that is sufficient and necessary for the recognition of happiness, and the eyebrows were sufficient and necessary for the recognition of sadness. This study helps to clarify the perception mechanism underlying emotion recognition in cartoon faces and sheds some light on directions for future research on intelligent human-computer interactions.
In this work, we tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning. Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples, which makes the learning problem harder, resulting in degraded performance and limited generalization in indoor environments and long-sequence visual odometry application. To address this issue, we propose a novel system that explicitly disentangles scale from the network estimation. Instead of relying on PoseNet architecture, our method recovers relative pose by directly solving fundamental matrix from dense optical flow correspondence and makes use of a two-view triangulation module to recover an up-to-scale 3D structure. Then, we align the scale of the depth prediction with the triangulated point cloud and use the transformed depth map for depth error computation and dense reprojection check. Our whole system can be jointly trained end-to-end. Extensive experiments show that our system not only reaches state-of-the-art performance on KITTI depth and flow estimation, but also significantly improves the generalization ability of existing self-supervised depth-pose learning methods under a variety of challenging scenarios, and achieves state-of-the-art results among self-supervised learning-based methods on KITTI Odometry and NYUv2 dataset. Furthermore, we present some interesting findings on the limitation of PoseNet-based relative pose estimation methods in terms of generalization ability. Code is available at https://github.com/B1ueber2y/TrianFlow.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.