Multi-modal retrieval has received widespread attention because it can effectively provide massive amounts of related data to support the development of Computational Social Systems (CSS). However, existing works still face the following challenges: (1) they rely on a tedious manual annotation process when extended to CSS, which not only introduces subjective errors but also consumes considerable time and labor; (2) they train only on strongly aligned data and neglect adjacency information, which results in poor robustness and makes the semantic heterogeneity gap difficult to bridge effectively; (3) they map features into real-valued forms, which leads to high storage costs and low retrieval efficiency. To address these issues in turn, we design a web-knowledge-driven multi-modal retrieval framework called Unsupervised and Robust Graph Convolutional Hashing (URGCH). The specific implementations are as follows: First, a "secondary semantic self-fusion" approach is proposed, which extracts semantically rich features through pre-trained neural networks, constructs a joint semantic matrix through semantic fusion, and eliminates