The evolution of shale gas production has reshaped North America's energy profile. By exploiting the large volumes of data generated during production and operations, machine learning offers significant advantages in production forecasting and performance optimization. This study proposes a hybrid model that integrates tabular, spatial, and temporal modalities to improve production forecasting in unconventional shale gas reservoirs. Unlike traditional methods such as artificial neural networks (ANN) and XGBoost, which rely solely on tabular data for training and prediction, this study introduces a novel 3D-parameterization method. The method tokenizes the distribution of formation properties into 3-axis tensors, enabling a more comprehensive representation of spatial data. A 3D convolutional neural network (3D-CNN) with an attention mechanism module then processes the resulting spatial data. For the temporal modality, a long short-term memory (LSTM) module accepts the dynamic input and predicts monthly production simultaneously. Data from 677 wells in the Duvernay Formation were collected, pre-processed, and fed into the corresponding module according to their modality. The results show that the model combining all three modalities achieved a coefficient of determination (R²) of 0.8771, surpassing the tabular-only model (0.7841) and the tabular-spatial model (0.8230). In addition, global optimization was applied to the architecture of each module and to the model hyperparameters, yielding a 1.88% improvement over the empirical design. These advances set a new benchmark for predictive modelling in unconventional shale gas reservoirs and highlight the importance of using data from different modalities to improve production forecasting.
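
For reference, the coefficient of determination used to compare the models can be computed as in the minimal sketch below. It assumes the standard definition R² = 1 − SS_res / SS_tot; the abstract does not state how the metric was evaluated (for example, per well or per month). The function name `r2_score` and the toy values are illustrative assumptions, not data or code from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination, assuming the standard definition
    R^2 = 1 - SS_res / SS_tot (hypothetical helper, not the authors' code)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)

    # Residual sum of squares: error left unexplained by the model.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the observations around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)

    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are identical")
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not production data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r2_score(observed, predicted)
print(f"R^2 = {score:.4f}")
```

Under this definition, a score of 1 means the predictions match the observations exactly, and a score of 0 means the model does no better than predicting the mean of the observations.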