We present a new method for improving the performance of the variational autoencoder (VAE). In addition to enforcing the deep feature consistent principle, which ensures that the VAE output and its corresponding input image have similar deep features, we also implement a generative adversarial training mechanism that forces the VAE to output realistic and natural images. We present experimental results showing that a VAE trained with our new method outperforms the state of the art in generating face images, producing much clearer and more natural noses, eyes, teeth and hair textures, as well as reasonable backgrounds. We also show that our method learns powerful embeddings of input face images, which can be used for facial attribute manipulation. Moreover, we propose a multi-view feature extraction strategy to extract effective image representations, which can be used to achieve state-of-the-art performance in facial attribute prediction.

…recovery [8,9,10], image privacy protection [11], unsupervised dimension reduction [12] and many other applications [13,14,15]. Deep convolutional generative models, a branch of unsupervised learning in machine learning, have become an area of active research in recent years. A generative model trained on a given image database can be useful in several ways. One is to learn the essence of a dataset and generate realistic images similar to those in the dataset from random inputs; the whole dataset is "compressed" into the learned parameters of the model, which are significantly smaller than the training dataset itself. The other is to learn reusable feature representations from unlabeled image datasets for a variety of supervised learning tasks such as image classification.

In this paper, we propose a new method of training the variational autoencoder (VAE) [16] that improves its performance. In particular, we seek to improve the quality of the generated images to make them more realistic and less blurry.
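To fix ideas, the baseline being improved is the standard VAE objective: a per-pixel reconstruction loss plus a KL divergence pulling the approximate posterior toward a unit-Gaussian prior, with sampling made differentiable via the reparameterization trick. The following numpy sketch is illustrative only; the shapes, weighting, and function names are ours, not the paper's.

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var):
    """Standard VAE objective: per-pixel L2 reconstruction loss plus the
    KL divergence between the approximate posterior N(mu, sigma^2) and
    the unit-Gaussian prior N(0, I). The relative weighting of the two
    terms is a design choice; here they are simply summed."""
    recon = np.sum((x - x_recon) ** 2)                       # per-pixel loss
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: sample z = mu + sigma * eps with
    eps ~ N(0, I), keeping the sampling step differentiable in mu
    and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

It is this per-pixel reconstruction term that tends to produce blurry outputs, motivating the feature-level losses described next.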
To achieve this, we replace the problematic per-pixel loss functions with objective functions based on the deep feature consistent principle [17] and on a generative adversarial network (GAN) [18,19]. Deep feature consistency helps capture important perceptual features such as spatial correlation through the learned convolutional operations, while the adversarial training helps to produce images that reside on the manifold of natural images. We also introduce several techniques to improve the convergence of GAN training in this context. In particular, instead of directly using the generated images and the real images in pixel space, we use the corresponding deep features extracted from pretrained networks to train the generator and discriminator networks. We also propose to further relax the constraint on the output of the discriminator network in order to balance the image reconstruction loss and the adversarial loss. We present experimental results showing that our new method can generate face images with much clearer facial parts such as eyes, nose, mouth, teeth, ears and hair textures.
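The two ingredients above can be sketched as loss terms. The feature-consistency term compares input and reconstruction in the feature space of a fixed pretrained network (e.g. VGG) rather than in pixel space; the relaxed adversarial term stops penalizing the generator once the discriminator output exceeds a threshold, rather than pushing it all the way to 1. This is a hedged numpy sketch: the feature extractor is assumed external, and the `margin` hyperparameter and exact functional forms are our illustrative choices, not the paper's.

```python
import numpy as np

def feature_consistency_loss(feats_real, feats_recon):
    """Deep feature consistent loss: L2 distance between the deep
    features of the input and of its reconstruction, summed over a
    chosen set of layers. `feats_*` are lists of per-layer feature
    maps produced by a fixed pretrained network (not defined here)."""
    return float(sum(np.sum((fr - fg) ** 2)
                     for fr, fg in zip(feats_real, feats_recon)))

def relaxed_adversarial_loss(d_out_fake, margin=0.3):
    """Generator-side adversarial loss with a relaxed target: the
    penalty is zero once the discriminator's output on a generated
    sample exceeds (1 - margin), so the adversarial signal does not
    overwhelm the reconstruction term. `margin` is illustrative."""
    return float(np.sum(np.maximum(0.0, (1.0 - margin) - d_out_fake)))
```

In training, the total generator objective would combine the KL term, the feature-consistency term, and the relaxed adversarial term, with weights tuned to balance reconstruction fidelity against realism.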