This study addresses the challenge of Speech Enhancement (SE) in noisy environments, where the deployment of Deep Neural Network (DNN) solutions on microcontroller units (MCUs) is hindered by their computational demands. To close this gap, we introduce an SE method optimized for MCUs that employs a 2-layer GRU model and exploits perceptual speech properties together with tailored training strategies. By incorporating self-reference signals and a dual compression-and-recovery scheme based on the Mel scale, we obtain an efficient model suited to low-latency applications. The resulting GRU-2L-128 model achieves a 14.2× reduction in model size and a 409.1× reduction in operations compared with conventional DNN methods such as DCCRN, without compromising enhancement performance. These results offer a practical path to improved speech intelligibility on resource-constrained devices.
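The abstract does not give implementation details, so the sketch below is only a rough illustration of the named ingredients: a 2-layer, 128-unit GRU that estimates a suppression mask on Mel-compressed magnitude spectra, with recovery to linear-frequency bins via the filterbank pseudo-inverse. The filterbank construction, feature dimensions (n_freqs, n_mels), the sigmoid masking head, and the pseudo-inverse recovery step are all assumptions for illustration, not the authors' actual method.

```python
# Illustrative sketch (not the paper's code): 2-layer GRU masking model with
# Mel-scale compression of the input spectrum and pseudo-inverse recovery.
# Hyperparameters (n_freqs=257, n_mels=64, 16 kHz) are assumed, not from the paper.
import math
import torch
import torch.nn as nn

def mel_filterbank(n_freqs: int, n_mels: int, sample_rate: int = 16000) -> torch.Tensor:
    """Triangular Mel filterbank of shape (n_mels, n_freqs)."""
    f_max = sample_rate / 2.0
    hz_to_mel = lambda f: 2595.0 * math.log10(1.0 + f / 700.0)
    # Band edges: uniform on the Mel scale, converted back to Hz.
    m_pts = torch.linspace(hz_to_mel(0.0), hz_to_mel(f_max), n_mels + 2)
    f_pts = 700.0 * (10.0 ** (m_pts / 2595.0) - 1.0)
    freqs = torch.linspace(0.0, f_max, n_freqs)  # FFT bin center frequencies
    fb = torch.zeros(n_mels, n_freqs)
    for i in range(n_mels):
        left, center, right = f_pts[i], f_pts[i + 1], f_pts[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[i] = torch.clamp(torch.minimum(rising, falling), min=0.0)
    return fb

class GRU2L128(nn.Module):
    """2-layer GRU (hidden size 128) gain-mask model on Mel-compressed magnitudes."""
    def __init__(self, n_freqs: int = 257, n_mels: int = 64):
        super().__init__()
        fb = mel_filterbank(n_freqs, n_mels)
        self.register_buffer("fb", fb)                          # compression matrix
        self.register_buffer("fb_inv", torch.linalg.pinv(fb))   # recovery matrix
        self.gru = nn.GRU(n_mels, 128, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(128, n_mels), nn.Sigmoid())

    def forward(self, mag: torch.Tensor) -> torch.Tensor:
        # mag: (batch, frames, n_freqs) noisy magnitude spectrogram.
        mel = mag @ self.fb.T            # compress to Mel bands
        h, _ = self.gru(mel)
        gain = self.mask(h)              # per-band gain in [0, 1]
        mel_enh = mel * gain             # suppress noisy bands
        return torch.clamp(mel_enh @ self.fb_inv.T, min=0.0)  # back to linear bins

# Usage: enhanced = GRU2L128()(torch.rand(1, 100, 257))  # -> (1, 100, 257)
```

Operating on a few dozen Mel bands instead of hundreds of FFT bins is one plausible way the reported size and operation reductions could be realized, since the GRU input width, and hence its weight matrices, shrink accordingly.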