With the rise of mobile devices, bel canto practitioners increasingly use smart devices as auxiliary tools for improving their singing. During practice, however, they frequently produce timbre abnormalities that, if left uncorrected, may harm the vocal organs. Existing singing assessment systems focus primarily on pitch and melody and do not detect bel canto timbre abnormalities in real time. Moreover, vocal habits and timbre composition vary widely among individuals, which makes cross-user recognition of such abnormalities difficult. To address these limitations, we propose TimbreSense, a novel bel canto timbre abnormality detection system that detects, in real time, five major timbre abnormalities commonly observed in bel canto singing. We introduce a feature extraction pipeline that captures the acoustic characteristics of bel canto singing: applying temporal average pooling to the Short-Time Fourier Transform (STFT) spectrogram reduces redundancy while preserving essential frequency-domain information. A transformer model with self-attention then extracts correlation and semantic features of the overtones in the frequency domain. Additionally, we employ a few-shot learning approach that combines pre-training, meta-learning, and fine-tuning to improve cross-user recognition performance while keeping the cost to each user low. Experimental results demonstrate the system’s strong cross-user recognition performance and real-time capability.
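
As an illustration of the temporal average pooling step, the following is a minimal sketch and not the TimbreSense implementation. It computes a magnitude STFT with NumPy only and then averages groups of consecutive frames along the time axis, so the frequency resolution is left unchanged. The function names, the window length `n_fft`, the hop size `hop`, the pooling factor `pool`, and the 16 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft_magnitude(signal, n_fft=1024, hop=256):
    """Magnitude STFT with a Hann window; returns shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames (one frame per row).
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Real FFT of each frame, then transpose to (frequency bins, frames).
    return np.abs(np.fft.rfft(frames, axis=1)).T

def temporal_average_pool(spec, pool=4):
    """Average every `pool` consecutive frames along time.

    Trailing frames that do not fill a complete group are dropped.
    The number of frequency bins is unchanged.
    """
    n_bins, n_frames = spec.shape
    usable = (n_frames // pool) * pool
    return spec[:, :usable].reshape(n_bins, usable // pool, pool).mean(axis=2)

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(tone)            # shape (513, 59)
pooled = temporal_average_pool(spec)   # shape (513, 14)
```

With these assumed settings the time axis shrinks by a factor of about four while all 513 frequency bins are kept, which is the sense in which pooling reduces redundancy but preserves frequency-domain information. The pooled spectrogram would then serve as input to a downstream model.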