Color image binarization plays a pivotal role in image preprocessing, significantly impacting downstream tasks, particularly text recognition. This paper focuses on Document Image Binarization (DIB), which aims to separate an image into foreground (text) and background (non-text content). Through a thorough analysis of conventional and deep learning-based approaches, we observe that prevailing DIB methods rely on deep learning. Furthermore, we examine the receptive fields before and after network training to highlight the advantages of the Transformer model. We then introduce a lightweight model based on the U-Net architecture, enhanced with the MobileViT module to better capture global features in document images. Owing to its ability to learn both local and global features, the proposed model outperforms state-of-the-art methods on two standard datasets (DIBCO2012 and DIBCO2017). Notably, our DIB method is a straightforward end-to-end model that requires no additional image pre- or post-processing. Moreover, its parameter count is less than a quarter of that of the HIP'23 model, which achieves the best results on three datasets (DIBCO2012, DIBCO2017, and DIBCO2018). Finally, two sets of ablation experiments verify the effectiveness of the proposed binarization model.