Unmanned aerial vehicles (UAVs), widely used for their low cost and versatility, pose security and privacy threats, which calls for their reliable recognition at low altitudes. However, strong ground clutter and multipath effects interfere heavily with the weak radar echoes reflected off micro-UAVs, severely degrading recognition reliability. Based on channel modelling and an analysis of UAV recognisability, a time-frequency transform-aided contrastive learning model is proposed to suppress severe ground clutter and reliably recognise low-altitude UAVs. In the proposed framework, a time-frequency transform unit is first applied to suppress multipath-induced ambiguity and facilitate semantic feature extraction via the Zhao-Atlas-Marks (ZAM) transform and morphological operations. Thereafter, a contrastive-learning-based feature extraction and fusion unit is established to suppress non-target clutter interference and extract recognisable semantic UAV features. Finally, a gated recurrent unit (GRU)-based classifier is designed for UAV recognition. Extensive experiments on both real and simulated data sets show that the proposed model outperforms mainstream algorithms and improves detection accuracy by more than 5% under severe ground clutter interference.
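To make the first stage of the framework concrete, the sketch below computes a discrete Zhao-Atlas-Marks (cone-kernel) distribution and then applies a greyscale morphological opening to the resulting map. It is a minimal illustration under stated assumptions, not the authors' implementation: the rectangular lag window, the FFT size `n_fft`, the maximum lag `max_lag`, and the choice of an opening with a flat structuring element elongated along the time axis are all assumptions, since the abstract does not specify them. The helper names `zam_tfr` and `grey_opening` are hypothetical. The contrastive-learning and GRU stages are not sketched.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def zam_tfr(x, n_fft=64, max_lag=None):
    """Discrete Zhao-Atlas-Marks (cone-kernel) distribution of a complex signal.

    Assumption: rectangular lag window over m = -max_lag..max_lag.
    Returns a real array of shape (n_fft, len(x)); row k corresponds to the
    normalised frequency k / (2 * n_fft) cycles per sample.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if max_lag is None:
        max_lag = n_fft // 2 - 1  # keep lags -M..M inside one FFT frame
    tfr = np.zeros((n_fft, n))
    for t in range(n):
        r = np.zeros(n_fft, dtype=complex)
        for m in range(-max_lag, max_lag + 1):
            acc = 0j
            # Cone-shaped support: accumulate the instantaneous autocorrelation
            # over |u - t| <= |m|, so larger lags use a wider time window.
            for u in range(t - abs(m), t + abs(m) + 1):
                if 0 <= u + m < n and 0 <= u - m < n:
                    acc += x[u + m] * np.conj(x[u - m])
            r[m % n_fft] = acc  # negative lags wrap to the end of the frame
        # r is Hermitian-symmetric in m, so its FFT is real up to rounding.
        tfr[:, t] = np.real(np.fft.fft(r))
    return tfr


def grey_opening(img, size=(1, 5)):
    """Greyscale opening (erosion, then dilation) with a flat rectangular element.

    Assumption: an element of shape (freq, time) = (1, 5) removes bright spots
    lasting fewer than 5 time frames while keeping ridges that persist in time.
    Sizes must be odd so the output shape matches the input.
    """
    pad = ((size[0] // 2, size[0] // 2), (size[1] // 2, size[1] // 2))

    def _filt(a, fn):
        windows = sliding_window_view(np.pad(a, pad, mode="edge"), size)
        return fn(windows, axis=(-2, -1))

    return _filt(_filt(img, np.min), np.max)
```

For example, a complex tone at 0.125 cycles per sample, `x = np.exp(2j * np.pi * 0.125 * np.arange(128))`, produces a ridge in row 16 of `zam_tfr(x, n_fft=64)` (since 16 / (2 * 64) = 0.125), and the opening leaves that time-persistent ridge in place while suppressing short-lived isolated peaks. The cone-shaped support is what distinguishes the ZAM kernel from a plain Wigner-Ville distribution: averaging over a time window that grows with the lag attenuates cross-terms between components, which is the property the abstract relies on for reducing multipath-induced ambiguity.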