Fault diagnosis of rolling bearings in complex environments is a difficult problem. First, the median filter can remove noise from vibration signals, but it cannot adaptively adjust its filter weights according to the input signal. Second, the popular vision transformer (ViT) cannot extract local feature information under complex conditions and has a large number of parameters, which increases computational complexity. To solve these problems, a lightweight multi-feature fusion ViT bearing fault diagnosis method with strong local awareness is proposed for complex environments. First, to learn the features and statistical distribution of the input signals, the gradient descent method is used to iteratively update the filter weights and filter the signals. Then, to better extract critical local fault information, a local sensing module is constructed using MWCNN. Finally, an improved lightweight multi-feature fusion ViT is constructed to perform global feature extraction and fault identification. The results show that the proposed method achieves better noise reduction and feature extraction, and can accurately identify fault types in complex environments.
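
As an illustration of the idea of a filter whose weights are updated iteratively by gradient descent, the following is a minimal sketch and not the method proposed in the paper. It assumes a normalized least-mean-squares (NLMS) adaptive line enhancer, a standard technique swapped in here for concreteness; the function name `lms_denoise` and the parameters `num_taps`, `step`, and `delay` are hypothetical.

```python
import numpy as np


def lms_denoise(x, num_taps=32, step=0.05, delay=1):
    """Sketch of an adaptive filter trained by stochastic gradient descent.

    Assumption: a normalized-LMS adaptive line enhancer, which predicts
    each sample from delayed past samples. The predictable (periodic)
    part of the signal is returned as the filtered output.
    """
    x = np.asarray(x, dtype=float)
    w = np.zeros(num_taps)      # filter weights, learned from the signal
    y = np.zeros_like(x)        # filtered output
    for n in range(num_taps + delay - 1, len(x)):
        # Window of delayed past samples, newest first.
        u = x[n - delay - num_taps + 1 : n - delay + 1][::-1]
        y[n] = w @ u            # current prediction
        e = x[n] - y[n]         # prediction error
        # Gradient step on the squared error, normalized by input power.
        w += step * e * u / (u @ u + 1e-8)
    return y, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(4000)
    clean = np.sin(2 * np.pi * 0.05 * t)          # synthetic periodic component
    noisy = clean + 0.5 * rng.standard_normal(t.size)
    filtered, _ = lms_denoise(noisy)
    half = t.size // 2                            # skip the adaptation transient
    print("input MSE: ", np.mean((noisy[half:] - clean[half:]) ** 2))
    print("output MSE:", np.mean((filtered[half:] - clean[half:]) ** 2))
```

Unlike a median filter, whose behavior is fixed by its window length, the weights here adapt to the statistics of the input. The synthetic sinusoid stands in for a vibration signal only for demonstration purposes.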