Image Ranking is one of the key problems in information science research area. However, most current methods focus on increasing the performance, leaving the semantic gap problem, which refers to the learned ranking models are hard to be understood, remaining intact. Therefore, in this article, we aim at learning an interpretable ranking model to tackle the semantic gap in fine‐grained image ranking. We propose to combine attribute‐based representation and online passive‐aggressive (PA) learning based ranking models to achieve this goal. Besides, considering the highly localized instances in fine‐grained image ranking, we introduce a supervised constrained clustering method to gather class‐balanced training instances for local PA‐based models, and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state‐of‐the‐art methods in terms of accuracy and speed.