In contemporary industrial processes, vibration signals collected from bearings often contain significant noise, which makes it difficult for conventional predictive models to extract critical degradation features and accurately predict the remaining useful life (RUL) of bearings. To address these challenges, this paper introduces a novel method for predicting bearing RUL under noisy conditions, based on a dual-branch multi-scale convolutional attention network (DMCSA) integrated with a dense residual feature fusion network (DRF). First, the method applies the continuous wavelet transform (CWT) to vibration signals to extract color time-frequency images, which are then converted to grayscale to construct a combined color-grayscale time-frequency image dataset, thereby augmenting the model's input features. Enhanced channel and spatial attention mechanisms, combined with multi-scale convolutions, improve feature extraction and selection. The model's robustness to noise is strengthened by adding noise to the training dataset. The selected color-grayscale time-frequency features are then fused and relearned by the DRF at the model's backend. The crayfish optimization algorithm (COA) is used to determine the model's critical hyperparameters. The proposed DMCSA-DRF model is then applied to predict the health indicator (MSCA-DRF-HI) of the test dataset, from which the bearings' RUL is predicted. Validation experiments demonstrate that the proposed method surpasses the comparison models in prediction accuracy under diverse noise interferences.
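As an illustration of the noise-augmentation step mentioned above, the following is a minimal sketch of adding noise to a training signal. The abstract does not state the noise type or level, so the sketch assumes additive white Gaussian noise at a target signal-to-noise ratio (SNR) in decibels. The function names `add_gaussian_noise` and `measured_snr_db` are hypothetical and are not taken from the paper.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=None):
    """Return a noisy copy of `signal` at the requested SNR (in dB).

    Assumption: additive white Gaussian noise; the paper's actual
    noise model is not given in the abstract.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    rng = random.Random(seed)
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  solve for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (in dB) of `noisy` relative to `clean`."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Synthetic stand-in for a vibration record: a 50 Hz sine sampled at 10 kHz.
    fs = 10_000
    clean = [math.sin(2 * math.pi * 50 * t / fs) for t in range(20_000)]
    noisy = add_gaussian_noise(clean, snr_db=5.0, seed=0)
    print(f"target SNR: 5.0 dB, measured: {measured_snr_db(clean, noisy):.2f} dB")
```

In a pipeline like the one described, such a function would presumably be applied to the raw vibration segments before the CWT, so that the time-frequency images seen during training already reflect the noise expected at test time.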