Modern chemical process plants are typically very complex because of their many material and energy recycle streams and because of process intensification, which is implemented to improve sustainability. Each individual unit operation has inherently nonlinear properties and often involves physical and chemical phenomena that occur on different time scales, and some chemical processes have dynamics that are inherently multi-time-scale. Designing the control system for such integrated and intensified processes poses significant difficulties, because integration and intensification naturally lead to network dynamics with multiple-time-scale behavior and a reduced number of degrees of freedom. The status quo approach to handling multiple time scales is to separate the plant-wide dynamics into fast and slow components and implement a hierarchical control structure. When applied to model-based controllers, however, this practice leads to computational issues, because it requires the inversion of ill-conditioned and stiff differential-algebraic equation models under large time-scale separation. To address this issue, this study proposes an alternative approach to modeling multi-time-scale processes based on multiple time-scale recurrent neural networks (MTRNNs). The analysis is demonstrated on a benchmark multi-time-scale continuous stirred tank reactor (CSTR) case study, using input-output data generated by simulating the nonlinear dynamical equations of the CSTR, in which the large heat transfer coefficient appears as a large parameter 1/ε, the reciprocal of the singular perturbation small parameter ε. The MTRNN model is constructed from groups of neuronal nodes, with the nodes in each group sharing an assigned time constant (also known as decay rate), which leads to improved prediction performance compared with other common modeling methods.
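The idea of grouping nodes by time constant can be illustrated with a minimal sketch. The abstract does not give the network equations, so the update rule below is an assumption: it uses the leaky-integrator form commonly associated with MTRNNs, in which each node blends its previous internal state with a new synaptic drive at a rate set by 1/τ. All names (`mtrnn_step`, `W_rec`, `W_in`, `tau`), the group sizes, and the values of τ are hypothetical and chosen only for illustration.

```python
import numpy as np

def mtrnn_step(u, x, W_rec, W_in, b, tau):
    """One discrete-time update of a leaky-integrator recurrent layer.

    Assumed form (not taken from the paper):
        u_next = (1 - 1/tau) * u + (1/tau) * (W_rec @ tanh(u) + W_in @ x + b)
    A small tau lets a node track its drive quickly (fast group);
    a large tau makes the node change slowly (slow group).
    """
    y = np.tanh(u)                      # node activations from internal states
    drive = W_rec @ y + W_in @ x + b    # synaptic input to every node
    return (1.0 - 1.0 / tau) * u + (1.0 / tau) * drive

# Hypothetical sizes: 4 fast nodes, 4 slow nodes, 2 process inputs.
rng = np.random.default_rng(0)
n_fast, n_slow, n_in = 4, 4, 2
n = n_fast + n_slow

# Each group shares one time constant (illustrative values only).
tau = np.concatenate([np.full(n_fast, 2.0), np.full(n_slow, 50.0)])

W_rec = 0.1 * rng.standard_normal((n, n))
W_in = 0.1 * rng.standard_normal((n, n_in))
b = np.zeros(n)

u = np.zeros(n)      # initial internal states
x = np.ones(n_in)    # a constant input, for illustration
u_next = mtrnn_step(u, x, W_rec, W_in, b, tau)
```

Starting from zero state with the same kind of drive, the fast group moves a fraction 1/2 of the way toward its drive in one step, while the slow group moves only 1/50 of the way, which is how a single network can represent both fast and slow process dynamics.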
Applied to the multi-time-scale CSTR, the proposed MTRNN model achieves an R² value of 0.9997 and a root mean square error that is, on average, 75 times lower than those of the nonlinear autoregressive network with exogenous inputs (NARX) and transfer function methods, indicating its higher potential efficacy in modeling complex multi-time-scale processes.
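For reference, the two reported metrics can be computed as in the sketch below. It uses the standard definitions of the coefficient of determination and the root mean square error. The abstract does not state how the study computed or averaged them, so the function names and the sample arrays are illustrative assumptions, not the paper's data.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Made-up numbers, only to show the calls.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

An R² close to 1 means the residual sum of squares is small relative to the variance of the measured output, and a ratio of two models' RMSE values gives the kind of "times lower" comparison quoted above.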