Person image synthesis aims to transfer the appearance of a source person image to a target pose. Existing methods cannot handle large pose variations and therefore suffer from two critical problems: 1) synthesis distortion due to the entanglement of pose and appearance information among different body components; and 2) failure to preserve the original semantics (e.g., the same outfit). In this paper, we explicitly address these two problems by proposing a Pose and Attribute Consistent Person Image Synthesis Network (PAC-GAN). First, to reduce pose and appearance matching ambiguity, we propose a component-wise transfer model consisting of two stages: the former synthesizes only the target pose, while the latter renders the target appearance by explicitly transferring appearance information from the source image to the target image component by component. In this way, source-target matching ambiguity is eliminated by the component-wise disentanglement of pose and appearance synthesis. Second, to maintain attribute consistency, we represent the input image as an attribute vector and impose a high-level semantic constraint with this vector to regularize the target synthesis. Extensive experiments on the DeepFashion dataset demonstrate the superiority of our method over the state of the art, especially in maintaining pose and attribute consistency under large pose variations.
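The abstract does not give the exact form of the attribute-consistency constraint, but a minimal sketch, assuming it is an L2 penalty between attribute vectors of the source and synthesized images, could look as follows. The names `attribute_vector` and `attribute_consistency_loss`, and the pooling-plus-projection encoder, are illustrative assumptions, not the paper's actual components:

```python
import numpy as np

def attribute_vector(image: np.ndarray, num_attrs: int = 8) -> np.ndarray:
    # Hypothetical stand-in for a pretrained attribute encoder: pools the
    # H x W x C image per channel and projects the result to a fixed-length
    # attribute vector with a fixed random projection (for illustration only).
    pooled = image.mean(axis=(0, 1))
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((pooled.size, num_attrs))
    return pooled @ proj

def attribute_consistency_loss(source: np.ndarray, synthesized: np.ndarray) -> float:
    # Squared L2 distance between attribute vectors; a term like this would be
    # weighted and added to the GAN objective so the synthesized image keeps
    # the source's high-level semantics (e.g., the same outfit).
    diff = attribute_vector(source) - attribute_vector(synthesized)
    return float(np.sum(diff ** 2))

# Identical images incur zero penalty; differing ones are penalized.
src = np.ones((4, 4, 3))
same = np.ones((4, 4, 3))
other = np.zeros((4, 4, 3))
```

In a real model the attribute encoder would be learned (or pretrained on attribute labels) rather than a fixed projection, and the loss would be balanced against the adversarial and pose-reconstruction terms.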