Recent advances in deep learning have opened new opportunities for learning high-quality 3D models from a single 2D image, given sufficient training on large-scale data sets. However, the significant imbalance between the amounts of available images and 3D models, together with the limited availability of labeled 2D image data (i.e., manually annotated pairs of images and their corresponding 3D models), severely impedes the training of most supervised deep learning methods in practice. In this paper, driven by a novel design of adversarial networks, we develop an unsupervised learning paradigm that reconstructs 3D models from a single 2D image and is free of manually annotated pairs of input images and their associated 3D models. Specifically, the paradigm begins by training an adaptation network, an autoencoder with an adversarial loss, which embeds the unpaired synthesized 2D image domain and the real-world image domain into a shared latent vector space. We then jointly train, together with this embedding process, a 3D deconvolutional network that transforms the latent vector space into the 3D object space (a code-level sketch of this two-stage design is given at the end of this section). Our experiments verify our network's robust and superior performance in generating 3D volumetric objects from a single 2D image.

Existing works on 3D object reconstruction from 2D image(s) can be broadly divided into two categories: traditional methods without learning, and deep learning based methods.

3D reconstruction without learning. The majority of traditional reconstruction methods based on SfM or SLAM [1, 2] require a dense set of views, and most rely on the hypothesis that features can be matched across views. 2D-to-3D reconstruction models such as multi-view stereo [9, 10], space carving [11], and multiple-moving-object and large-scale structure from motion [3-5] have all demonstrated good performance on the 2D-to-3D reconstruction problem. However, these methods require well-calibrated cameras and segmentation of objects from their background, which makes them less applicable in practice.

Deep neural networks in 3D visual computing. By representing shapes as 3D volumetric data [12], prominent deep learning models such as deep 2D convolutional neural networks can be naturally extended to learn 3D objects. Deep learning models have proven to have strong capabilities in learning latent representative vector spaces of 3D objects [12]. Multi-View CNN, Conv-DAE, VoxNet, GIFT, T-L embedding, 3D-GAN, and others [13-18] have shown great potential for solving retrieval, classification, 3D reconstruction, and related problems. In contrast to the vast amount of research and accomplishments in 3D object classification and retrieval, there is less research, and far fewer accomplished results, on 3D object reconstruction. Recently, researchers have begun to utilize 3D deconvolutional neural networks to generate 3D volumetric objects from 2D images; for instance, 3D-GAN [18] and T-L embedding [17] strive to learn a latent vector space representation of 2D images and then transform it to generate 3D objects.
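To make the two-stage design from the abstract concrete, the following is a minimal sketch of the four components it names: a shared encoder, an image decoder that makes the pair an autoencoder, a latent-space domain discriminator supplying the adversarial loss, and a 3D deconvolutional generator. The image resolution (64x64), voxel resolution (32^3), latent size (128), and all layer widths are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

LATENT_DIM = 128  # assumed latent size; not specified in this section

class Encoder(nn.Module):
    """2D CNN embedding a 64x64 RGB image into the shared latent space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),     # 64 -> 32
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),    # 32 -> 16
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),   # 16 -> 8
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),  # 8 -> 4
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, LATENT_DIM),
        )
    def forward(self, x):
        return self.net(x)

class ImageDecoder(nn.Module):
    """Mirror of the encoder; its reconstruction loss makes the pair an autoencoder."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 256 * 4 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid(),  # 32 -> 64
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 4, 4))

class DomainDiscriminator(nn.Module):
    """Predicts whether a latent code came from the synthetic or real domain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1),  # logit: 1 = synthetic, 0 = real
        )
    def forward(self, z):
        return self.net(z)

class VoxelGenerator(nn.Module):
    """3D deconvolutional network: latent vector -> 32^3 occupancy grid."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 256 * 4 * 4 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose3d(256, 128, 4, 2, 1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose3d(64, 1, 4, 2, 1), nn.Sigmoid(),  # 16 -> 32
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 4, 4, 4))
```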
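A joint training step over these components might then look as follows. The loss weight `lam_adv`, the MSE image-reconstruction term, and the binary cross-entropy voxel term are again assumptions chosen for illustration. The sketch also assumes, as the unsupervised setup suggests, that voxel supervision exists only on the synthetic side (synthesized renders come with their source 3D models, real images do not), while the adversarial term pulls real-image embeddings into the same latent space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

bce_logits = nn.BCEWithLogitsLoss()

def train_step(enc, img_dec, disc, vox_gen,
               x_syn, x_real, v_syn,
               opt_model, opt_disc, lam_adv=0.1):
    """One joint update.

    x_syn, x_real: unpaired batches of synthesized and real images;
    v_syn: voxel grids paired with x_syn only. opt_model holds the
    enc/img_dec/vox_gen parameters; opt_disc holds the disc parameters.
    """
    z_syn, z_real = enc(x_syn), enc(x_real)

    # Discriminator phase: learn to separate synthetic from real codes.
    opt_disc.zero_grad()
    d_loss = bce_logits(disc(z_syn.detach()), torch.ones(x_syn.size(0), 1)) + \
             bce_logits(disc(z_real.detach()), torch.zeros(x_real.size(0), 1))
    d_loss.backward()
    opt_disc.step()

    # Model phase: reconstruct both domains, fit voxels on the synthetic
    # side, and push real-image codes toward the synthetic domain.
    opt_model.zero_grad()
    rec = F.mse_loss(img_dec(z_syn), x_syn) + F.mse_loss(img_dec(z_real), x_real)
    vox = F.binary_cross_entropy(vox_gen(z_syn), v_syn)
    adv = bce_logits(disc(z_real), torch.ones(x_real.size(0), 1))
    (rec + vox + lam_adv * adv).backward()
    opt_model.step()
```

At inference time a real image needs neither a paired 3D model nor the discriminator: under this sketch, its voxel reconstruction is simply `vox_gen(enc(x))`.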