Ozone (O 3 ) pollution has surfaced as a significant threat to urban air quality in contemporary years. The precise and efficient forecast of ozone levels is fundamental in the mitigation and management of ozone pollution. Even though the air quality monitoring network offers useful multi-source pollutant concentration data for predicting ozone levels, existing models still grapple with issues arising from outlier and redundant sites influencing prediction precision, and cross-contamination between different pollutants. Also, the non-linear and volatile nature of monthly runoff makes accurate prediction more complex, provide a more granular and timely view of atmospheric flow variations. In this research, we introduce a hybrid model that unites Variational Modal Decomposition (VMD), particularly useful for separating mixed signals or extracting meaningful patterns from noisy or complex data, Convolutional Long Short-Term Memory Neural Network (CNN-LSTM) is designed for processing sequences of data with grid-like structures, such as images or video frames. CNN-LSTMs use convolutional operations to capture spatial patterns and LSTM units to model temporal dependencies, making them effective for tasks like video analysis, image sequence prediction, and spatiotemporal data processing, and VMD-CNN-LSTM to counter these issues. We commence by deconstructing the historical data series from the Nanjing air quality monitoring stations using VMD. Then, the Ensemble Empirical Mode