Manufacturing industries involve both business processes and complex manufacturing processes. Predictive process monitoring techniques help manage process executions by making multi-perspective, real-time predictions, which can prevent issues such as delivery delays. For business processes, conventional predictive process monitoring predicts the next activity, the next event time, and the remaining time using single-task learning, which requires a separate model per task and is therefore costly and complex. For complex manufacturing processes, predictive process monitoring primarily aims to predict the remaining time, that is, the product cycle time. However, single-task learning methods fail to capture all the variations within historical process executions. To address these limitations, we propose the multi-gate mixture of transformer-based experts framework, which embeds a transformer network within the multi-gate mixture-of-experts multi-task learning architecture to extract sequential features, and employs gated expert networks to model both the commonalities and the differences among tasks. Empirical results on five real-life event logs show that the multi-gate mixture of transformer-based experts outperforms three alternative architectures, highlighting its generalizability, effectiveness, and efficiency in predictive process monitoring.
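To make the gating idea concrete, the following is a minimal sketch of a generic multi-gate mixture-of-experts forward pass, not the authors' implementation. It assumes that every expert maps the same input to a vector of equal length, that each task owns one softmax gate over the shared experts, and that each task has its own output tower. In the proposed framework the experts would be transformer encoders over event-log prefixes. Here they are replaced by hypothetical stand-in functions, and all names (`mmoe_forward`, `gate_weights`, `towers`) are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Matrix-vector product: one output per weight row (no bias)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mmoe_forward(
    x: Sequence[float],
    experts: Sequence[Callable[[Sequence[float]], Vector]],
    gate_weights: Sequence[Sequence[Sequence[float]]],  # per task: num_experts x input_dim
    towers: Sequence[Callable[[Vector], float]],          # per task: output head
) -> List[float]:
    """Shared experts, one softmax gate per task, one tower per task."""
    # Every task reuses the same expert outputs (shared representation).
    expert_outputs = [expert(x) for expert in experts]
    dim = len(expert_outputs[0])
    predictions = []
    for gate_w, tower in zip(gate_weights, towers):
        # Task-specific gate: one mixing weight per expert, summing to 1.
        gate = softmax(linear(gate_w, x))
        # Gate-weighted sum of expert outputs, dimension by dimension.
        mixed = [sum(g * out[d] for g, out in zip(gate, expert_outputs)) for d in range(dim)]
        predictions.append(tower(mixed))
    return predictions


if __name__ == "__main__":
    # Hypothetical stand-ins for transformer experts.
    experts = [lambda v: [v[0], v[1]], lambda v: [2.0 * v[0], 0.0]]
    zero_gate = [[0.0, 0.0], [0.0, 0.0]]  # zero logits give a uniform gate
    # Two illustrative tasks, e.g. a remaining-time head and a next-event-time head.
    towers = [lambda h: sum(h), lambda h: h[0]]
    print(mmoe_forward([1.0, 0.5], experts, [zero_gate, zero_gate], towers))
```

Because each task learns its own gate, tasks can share experts where they are similar (commonalities) and weight experts differently where they diverge (differences). A single shared gate would force every task to use the same mixture.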