Traffic sign recognition is crucial for intelligent transportation and autonomous driving, where it supports road safety and efficient traffic management. In this paper, we propose a lightweight enhanced MobileViT model (E-MobileViT). It builds on MobileViT, which combines the advantages of convolutional neural networks (CNNs) and Transformers. We integrate the Efficient Local Attention (ELA) and Convolutional Block Attention Module (CBAM) mechanisms into the model to improve feature extraction, and we improve the feature fusion structure while significantly reducing the number of model parameters. We evaluate the model on the German Traffic Sign Recognition Benchmark (GTSRB), Belgian Traffic Signs Database (BTSD), and China Traffic Signs Database (TSRD) datasets, where it reaches accuracies of 99.61%, 99.26%, and 97.34%, respectively, outperforming both traditional and advanced models. Ablation experiments confirm the key role of the ELA and CBAM mechanisms. With fewer parameters than mainstream models, E-MobileViT is suitable for resource-constrained environments such as mobile devices and offers a balance between accuracy and model size for traffic sign recognition.