Social image data are images annotated with tags on social media, where the tags are usually assigned by users. Integrating the visual and textual information of social images yields more accurate and comprehensive features and improves clustering performance. However, the heterogeneous gap between tags and images makes it difficult to organize social images effectively. In addition, the tags are often sparse and incomplete because of users' personal preferences and cognitive differences. To address these problems, we propose a novel knowledge-aware progressive clustering (KAPC) method, which uses human knowledge to guide the cross-modal clustering of social images. First, we design a dual-similarity semantic expansion strategy that complements the sparse tags with human knowledge by constructing a more complete semantic similarity matrix for tags from knowledge graphs. Second, we define an information-theoretic objective function to bridge the heterogeneous gap; it aligns the cluster distributions of the two modalities to exploit the correlation between visual and textual information. Finally, we design a progressive iteration scheme in which the two modalities guide each other, further improving social image clustering performance. Extensive experiments on four social image datasets verify the effectiveness of the proposed KAPC method.
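As a rough illustration of what "aligning inter-modal cluster distributions" with an information-theoretic score can mean, the sketch below measures the mutual information between the visual and textual cluster assignments of the same images. It is a hypothetical stand-in, not the KAPC objective: the soft-assignment inputs, the helper names `joint_distribution` and `mutual_information`, and the use of plain mutual information as the alignment score are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def joint_distribution(p_visual: Sequence[Sequence[float]],
                       p_text: Sequence[Sequence[float]]) -> List[List[float]]:
    """Estimate P(c_v, c_t) by averaging, over images, the outer product of
    each image's visual and textual soft cluster assignments (hypothetical helper)."""
    n = len(p_visual)
    kv, kt = len(p_visual[0]), len(p_text[0])
    joint = [[0.0] * kt for _ in range(kv)]
    for pv, pt in zip(p_visual, p_text):
        for i in range(kv):
            for j in range(kt):
                joint[i][j] += pv[i] * pt[j] / n
    return joint


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """I(C_v; C_t) in nats; larger values mean the two modalities' clusterings agree more."""
    row = [sum(r) for r in joint]  # marginal over visual clusters
    col = [sum(joint[i][j] for i in range(len(joint)))
           for j in range(len(joint[0]))]  # marginal over textual clusters
    mi = 0.0
    for i, r in enumerate(joint):
        for j, pij in enumerate(r):
            if pij > 0:  # pij > 0 implies both marginals are > 0
                mi += pij * math.log(pij / (row[i] * col[j]))
    return mi


# Two images, two clusters: identical partitions vs. uninformative text assignments.
aligned = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(mutual_information(joint_distribution(aligned, aligned)))  # log 2 ≈ 0.693
print(mutual_information(joint_distribution(aligned, uniform)))  # 0.0
```

Under this assumed formulation, maximizing such a score over the two modalities' assignments pushes the visual and textual partitions toward agreement; the actual KAPC objective and its optimization may differ.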