This paper proposes a diversity identification method based on information fusion for quantitatively identifying mixed urban functional zones (UFZs), addressing a critical need in city planning and management. The method integrates social and physical sensing data and accounts for both the frequency with which urban functions occur and the intensity of human activity. Specifically, we extract “dynamic” human activity features from crowdsourced smart device data and “static” visual features from street view images. Using the fused multi‐modal data, the method infers the large‐scale distribution of UFZs more accurately. We also create a standardized mixed UFZ dataset for model training and testing, which covers residential, commercial, public services, industrial, and ecological categories. In essence, the method recasts the functional label recognition task as a probability distribution recognition task, so that it captures complex land use mixtures rather than assigning a single label to each zone. The results show that our method achieves a cosine similarity of 0.542 ± 0.143, the lowest Chebyshev distance of 0.785 ± 0.043, and an L1 distance of 0.264 ± 0.080, indicating more accurate and consistent predictions and a closer match to the true distributions.
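To make the evaluation setting concrete, the following minimal sketch shows how a predicted functional distribution for one zone could be compared with a reference distribution using the three reported measures. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`normalize`, `distribution_metrics`), the example proportions, and the choice of an unnormalized sum for the L1 distance are all hypothetical.

```python
import math

# The five functional categories named in the dataset description.
CATEGORIES = ["residential", "commercial", "public services",
              "industrial", "ecological"]


def normalize(values):
    """Scale non-negative scores so that they sum to 1 (a probability distribution)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [v / total for v in values]


def distribution_metrics(pred, true):
    """Compare two functional distributions over the same categories.

    Assumption: L1 is the plain sum of absolute differences and Chebyshev
    is the largest single-category difference; the paper may normalize
    these quantities differently.
    """
    if len(pred) != len(true):
        raise ValueError("distributions must have the same length")
    p, q = normalize(pred), normalize(true)
    diffs = [abs(a - b) for a, b in zip(p, q)]
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return {
        "cosine": dot / norms,     # higher means closer
        "chebyshev": max(diffs),   # lower means closer
        "l1": sum(diffs),          # lower means closer
    }


if __name__ == "__main__":
    # Hypothetical proportions for one mixed zone, ordered as in CATEGORIES.
    true_mix = [0.50, 0.30, 0.10, 0.05, 0.05]
    pred_mix = [0.40, 0.35, 0.10, 0.10, 0.05]
    for name, value in distribution_metrics(pred_mix, true_mix).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, a single-label prediction is the special case in which all probability mass falls on one category, so the same measures apply to both single-function and mixed zones.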