When knowledge of challenging dynamic phenomena in batch distillation processes is lacking, e.g., of complex flow regimes or of phases that appear and vanish, the accuracy of mechanistic models is limited. Real plant data that captures this missing information is scarce, which in turn limits the use of purely data‐driven models. To exploit the information contained in both the measurement data and a related but inaccurate first‐principles model, transfer learning from simulated to real plant data is analyzed. For the use case of a batch distillation column, the adapted model provides more accurate predictions than data‐driven models trained exclusively on either the scarce real plant data or the simulated data. Its enhanced convergence and lower computational cost make it suitable for real‐time optimization.
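
The idea of transferring from simulated to real data can be illustrated with a minimal sketch. It is not the model or training procedure of this work: the polynomial features, the toy `simulator` and `plant` functions, the penalty weight `lam`, and all numerical values are assumptions chosen only for illustration. A model is first fitted to plentiful data from an inaccurate simulator and is then adapted to a few noisy "plant" measurements while being penalized for moving away from the pretrained weights.

```python
import numpy as np

def features(x):
    # Polynomial basis [1, x, x^2]; an illustrative choice, not taken from the paper.
    return np.column_stack([np.ones_like(x), x, x**2])

def simulator(x):
    # Hypothetical inaccurate first-principles model: it misses the quadratic effect.
    return 1.0 + 2.0 * x

def plant(x):
    # Hypothetical "real" process, used only to generate toy data.
    return 1.0 + 2.0 * x + 0.5 * x**2

def pretrain(x_sim, y_sim):
    # Ordinary least squares on plentiful simulated data.
    w, *_ = np.linalg.lstsq(features(x_sim), y_sim, rcond=None)
    return w

def fine_tune(w_sim, x_real, y_real, lam=0.1):
    # Adapt the pretrained weights to scarce real data by solving
    #   minimize ||X w - y||^2 + lam * ||w - w_sim||^2
    # in closed form; lam controls how strongly the simulation prior is kept.
    X = features(x_real)
    A = X.T @ X + lam * np.eye(X.shape[1])
    b = X.T @ y_real + lam * w_sim
    return np.linalg.solve(A, b)

def mse(w, x, y):
    # Mean squared prediction error of the linear-in-features model.
    return float(np.mean((features(x) @ w - y) ** 2))

def run_demo(seed=0):
    rng = np.random.default_rng(seed)

    # Step 1: pretrain on many simulated samples.
    x_sim = np.linspace(0.0, 2.0, 500)
    w_sim = pretrain(x_sim, simulator(x_sim))

    # Step 2: adapt with only a handful of noisy plant measurements.
    x_real = np.linspace(0.0, 2.0, 8)
    y_real = plant(x_real) + rng.normal(0.0, 0.05, size=x_real.shape)
    w_adapted = fine_tune(w_sim, x_real, y_real)

    # Step 3: compare both models on noise-free plant behavior.
    x_test = np.linspace(0.0, 2.0, 50)
    y_test = plant(x_test)
    return mse(w_sim, x_test, y_test), mse(w_adapted, x_test, y_test)
```

Calling `run_demo()` returns the test error of the simulation-only model and of the adapted model. In this toy setting the penalty toward `w_sim` plays the role that initializing from pretrained weights plays when fine-tuning a neural network.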