Cross-media retrieval engines have gained massive popularity with the rapid development of the Internet. Users may perform queries over a corpus consisting of audio, video, and textual information. To make such systems practical for large amounts of multimedia data, two critical issues must be carefully considered: (a) reducing storage as much as possible; (b) modeling the relationships among heterogeneous media data. The academic community has recently shown that encoding data into compact binary codes can drastically reduce storage and computational cost. However, it is still unclear how to properly integrate multiple information sources into a binary encoding scheme. In this paper, we study the cross-media indexing problem by learning discriminative hashing functions that map multi-view data into a shared Hamming space. Not only must meaningful within-view similarity be preserved, but we also incorporate between-view correlations into the encoding scheme, mapping similar points close together and pushing dissimilar ones apart. To this end, we propose a novel hashing algorithm called Iterative Multi-View Hashing (IMVH) that takes both kinds of information into account simultaneously. To solve this joint optimization problem efficiently, we further develop an iterative scheme based on a more flexible quantization model. In particular, an optimal alignment is learned to maintain between-view similarity in the encoding scheme, and the binary codes are obtained by directly solving a series of binary label assignment problems without continuous relaxation, which avoids unnecessary quantization loss. In this way, the proposed algorithm not only greatly improves retrieval accuracy but also exhibits strong robustness. An extensive set of experiments clearly demonstrates the superior performance of the proposed method over state-of-the-art techniques on both multimodal and unimodal retrieval tasks.
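To make the idea of a shared Hamming space concrete, the following minimal Python sketch encodes two views into fixed-length binary codes and ranks database items by Hamming distance to a query from the other view. It is an illustration only, under stated assumptions: the random hyperplanes stand in for learned hashing functions and are not the IMVH method, and the names `make_planes`, `encode`, `hamming`, and `search`, the feature dimensions, and the 16-bit code length are all hypothetical.

```python
import random

def make_planes(dim, n_bits, rng):
    # Random hyperplanes: an illustrative stand-in, NOT the learned IMVH functions.
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def encode(x, planes):
    # One bit per hyperplane (sign of the projection), packed into an int.
    code = 0
    for w in planes:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code

def hamming(a, b):
    # XOR leaves a 1 wherever the two codes differ; count those bits.
    return bin(a ^ b).count("1")

def search(query_code, db_codes, k):
    # Indices of the k codes nearest to the query in Hamming distance.
    ranked = sorted(range(len(db_codes)),
                    key=lambda i: hamming(query_code, db_codes[i]))
    return ranked[:k]

rng = random.Random(0)
N_BITS = 16  # assumed code length, chosen only for illustration

# Each view gets its own hash functions but the same code length,
# so codes from different views are directly comparable.
text_planes = make_planes(8, N_BITS, rng)    # hypothetical 8-d text features
image_planes = make_planes(12, N_BITS, rng)  # hypothetical 12-d image features

images = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(100)]
image_codes = [encode(v, image_planes) for v in images]

# Cross-view query: a text vector retrieves image codes.
text_query = [rng.gauss(0.0, 1.0) for _ in range(8)]
top5 = search(encode(text_query, text_planes), image_codes, k=5)
```

Each item is stored as a single 16-bit code rather than a vector of floating-point features, and comparison reduces to an XOR plus a bit count, which is the source of the storage and computation savings. Because the hyperplanes here are random and independent per view, the cross-view ranking carries no semantic meaning; learning hashing functions whose codes preserve both within-view similarity and between-view correlation is exactly the problem the paper addresses.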