Corrosion monitoring is crucial for ensuring the structural integrity of equipment. Acoustic emission (AE) and electrochemical noise (EN) have both proven highly effective for detecting corrosion. Because the two techniques are complementary, previous studies have shown that combining their signals can improve corrosion monitoring. However, current machine learning models cannot yet effectively integrate these two signal modalities. Therefore, a new deep learning framework, CorroNet, is designed to integrate AE and EN signals synergistically at the algorithmic level for the first time. CorroNet leverages multimodal learning to improve accuracy and automate the monitoring process. During training, paired AE-EN data and unpaired EN data are used, with AE signals serving as anchors that help the model align EN signals from the same corrosion stage. A new feature alignment loss function and a probability distribution consistency loss function are designed to promote more effective feature learning and thereby improve classification performance. Experimental results demonstrate that CorroNet classifies corrosion stages more accurately than other state-of-the-art models, with an overall accuracy of 97.01%. Importantly, CorroNet requires only EN signals during the testing phase, making it suitable for stable and continuous monitoring applications. This framework offers a promising solution for real-time corrosion detection and structural health monitoring.
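
To make the training objective described above more concrete, the following is a minimal illustrative sketch, not the CorroNet implementation. The abstract does not give the exact form of either loss, so everything here is an assumption: the feature alignment term is taken to be a mean squared distance between each EN feature and an AE anchor for the same corrosion stage (with the anchor assumed to be the per-stage mean AE feature), and the probability distribution consistency term is taken to be a KL divergence between the class probabilities of the AE and EN branches. All function names and the weights `w_align` and `w_consist` are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into class probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stage_anchors(ae_features, stages):
    """Assumed anchor definition: the mean AE feature vector of each corrosion stage."""
    sums, counts = {}, {}
    for feat, stage in zip(ae_features, stages):
        acc = sums.setdefault(stage, [0.0] * len(feat))
        for i, value in enumerate(feat):
            acc[i] += value
        counts[stage] = counts.get(stage, 0) + 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}


def feature_alignment_loss(en_features, anchors):
    """Assumed form: mean squared Euclidean distance between EN features and their anchors."""
    total = 0.0
    for en, anchor in zip(en_features, anchors):
        total += sum((e - a) ** 2 for e, a in zip(en, anchor))
    return total / len(en_features)


def distribution_consistency_loss(en_logits, ae_logits, eps=1e-12):
    """Assumed form: mean KL(p_AE || p_EN) over the paired samples in a batch."""
    total = 0.0
    for en, ae in zip(en_logits, ae_logits):
        p_en, p_ae = softmax(en), softmax(ae)
        total += sum(
            pa * math.log((pa + eps) / (pe + eps)) for pa, pe in zip(p_ae, p_en)
        )
    return total / len(en_logits)


def total_loss(classification_loss, align, consist, w_align=1.0, w_consist=1.0):
    """Hypothetical weighted sum of the classification term and the two auxiliary terms."""
    return classification_loss + w_align * align + w_consist * consist


# Usage: unpaired EN samples are aligned to the AE anchor of their labelled stage.
ae_features = [[0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]
ae_stages = [0, 0, 1]
anchors = stage_anchors(ae_features, ae_stages)

en_features = [[1.0, 1.0], [3.0, 0.0]]
en_stages = [0, 1]
align = feature_alignment_loss(en_features, [anchors[s] for s in en_stages])
consist = distribution_consistency_loss([[2.0, 0.0]], [[0.0, 2.0]])
```

Under these assumptions, only the alignment term touches unpaired EN data, since the consistency term needs an AE prediction for the same sample. At test time neither auxiliary term is evaluated, which is consistent with the EN-only inference described above.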