Advances in deep learning tools originally developed for natural language processing are increasingly being applied to process control. Transformers, in particular, use self-attention mechanisms to capture long-range dependencies effectively. However, these architectures require extensive data representative of a specific process, which is not always available. To address this limitation, transfer learning has emerged as a machine learning technique that enables pretrained models to adapt to new tasks with minimal additional training. This paper demonstrates an approach that combines transfer learning with transformer architectures to enable data-driven control tasks, such as system identification and surrogate control modeling, when data are scarce. In this study, large amounts of data from a source system are used to pretrain a transformer, which is then adapted to model the dynamics of target systems for which only limited data are available. The paper compares the predictive performance of models trained only on target-system data with models using transfer learning, including a modified transformer architecture with a physics-informed neural network (PINN) component. The results demonstrate improvements in predictive accuracy for system identification of up to 45% with transfer learning and up to 74% when transfer learning is combined with the PINN architecture. Similar accuracy improvements were observed in surrogate control tasks, with gains of up to 44% using transfer learning and up to 98% with transfer learning and the PINN architecture.
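To make the pretrain-then-adapt workflow concrete, the following is a minimal PyTorch sketch, not the paper's implementation: the architecture, tensor shapes, and hyperparameters are illustrative assumptions, the random tensors stand in for real source- and target-system trajectories, and the PINN loss component described above is omitted for brevity.

```python
# Illustrative sketch (assumed setup, not the paper's code): pretrain a small
# transformer on abundant source-system data, then adapt it to a data-scarce
# target system by freezing the encoder and fine-tuning the remaining layers.
import torch
import torch.nn as nn

class DynamicsTransformer(nn.Module):
    """Predicts the next state from a window of past (state, input) samples."""
    def __init__(self, n_features: int, d_model: int = 64, n_heads: int = 4,
                 n_layers: int = 2, n_states: int = 2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_states)  # one-step-ahead prediction

    def forward(self, x):  # x: (batch, window, n_features)
        h = self.encoder(self.embed(x))
        return self.head(h[:, -1])  # read out the last token's representation

def train(model, x, y, epochs: int = 50, lr: float = 1e-3):
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# --- Pretraining on the data-rich source system ----------------------------
# Placeholder tensors: 4096 windows of 32 steps with 3 features each
# (2 states + 1 control input), predicting the next 2-dimensional state.
x_src, y_src = torch.randn(4096, 32, 3), torch.randn(4096, 2)
model = DynamicsTransformer(n_features=3)
train(model, x_src, y_src)

# --- Transfer to the data-scarce target system -----------------------------
# Freeze the pretrained encoder; fine-tune only the input embedding and
# prediction head on the small target dataset (64 windows here).
for p in model.encoder.parameters():
    p.requires_grad = False
x_tgt, y_tgt = torch.randn(64, 32, 3), torch.randn(64, 2)
train(model, x_tgt, y_tgt, epochs=100)
```

Freezing the encoder is one common adaptation strategy among several (full fine-tuning or adapter layers are alternatives); the abstract does not specify which the authors use.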