Image-based social networks are among the most popular social networking services of recent years. With a tremendous number of images uploaded every day, understanding users' preferences for user-generated images and making recommendations have become an urgent need. Many hybrid models have been proposed to fuse various kinds of side information (e.g., image visual representations, social networks) with user-item historical behavior to enhance recommendation performance. However, due to the unique characteristics of user-generated images on social image platforms, previous studies have failed to capture, in a unified framework, the complex aspects that influence users' preferences. Moreover, most of these hybrid models rely on predefined weights to combine different kinds of information, which usually results in sub-optimal recommendation performance. To this end, in this paper we develop a hierarchical attention model for social contextual image recommendation. In addition to the basic latent user interest modeling of popular matrix factorization based recommendation, we identify three key aspects (i.e., upload history, social influence, and owner admiration) that affect each user's latent preferences, where each aspect summarizes a contextual factor from the complex relationships between users and images. We then design a hierarchical attention network that naturally mirrors the hierarchical relationship between users' latent interests and the identified key aspects, with an element level within each aspect and an aspect level across aspects. Specifically, by taking as input embeddings from state-of-the-art deep learning models tailored to each kind of data, the hierarchical attention network learns to attend more or less strongly to different content. Finally, extensive experiments on real-world datasets demonstrate the superiority of our proposed model.
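To make the two attention levels concrete, the following is a minimal sketch under several assumptions, not the model described above. Attention scores are computed as a plain dot product between each candidate vector and the user's base latent vector, which stands in for the learned attention networks. The aggregated context vector is simply added to the base vector, and the final score is the inner product with an item vector, as in matrix factorization. All function names (`attend`, `user_representation`, `predict`) and the aspect keys are hypothetical, and nothing here is trained.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def attend(query: Vector, elements: List[Vector]) -> Vector:
    """One attention layer. ASSUMPTION: dot-product scoring against the query
    replaces the learned attention network; weights are softmax-normalised."""
    weights = softmax([dot(query, e) for e in elements])
    return weighted_sum(weights, elements)


def user_representation(base: Vector, aspects: Dict[str, List[Vector]]) -> Vector:
    # Level 1: element-level attention summarises each aspect
    # (e.g. uploaded images, friends, image owners) into one vector.
    aspect_vectors = [attend(base, elements) for elements in aspects.values()]
    # Level 2: aspect-level attention combines the aspect summaries.
    context = attend(base, aspect_vectors)
    # ASSUMPTION: the contextual vector is added to the base latent interest.
    return [b + c for b, c in zip(base, context)]


def predict(base: Vector, aspects: Dict[str, List[Vector]], item: Vector) -> float:
    # Matrix-factorization-style score: inner product of user and item vectors.
    return dot(user_representation(base, aspects), item)


if __name__ == "__main__":
    base = [1.0, 0.0]
    aspects = {
        "upload_history": [[1.0, 0.0], [0.0, 1.0]],
        "social_influence": [[0.5, 0.5]],
        "owner_admiration": [[0.0, 1.0]],
    }
    print(predict(base, aspects, [1.0, 1.0]))
```

The point of the hierarchy is that weights are computed per user rather than fixed in advance. Within the upload history, the image more aligned with the user's base vector receives more weight, and the same mechanism then decides how much each aspect contributes. This contrasts with the predefined combination weights criticised above.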