Modern sensing, communication and computing technologies make it possible to collect and store huge amounts of raw data from large cyber-physical systems. These data should serve as the basis for making better decisions at all levels, from design to operation and management. Nevertheless, raw data need to be transformed into useful information, usually in the form of prediction models, and machine learning plays a key role in this task. The process industry is no stranger to this digital transformation, although large processing plants have particularities that differentiate them from other systems. If neglected, these differences can cause general-purpose machine learning to fail to extract the right information from the data, thus leading to unreliable process models. As such models are the basis on which the ideas behind the cognitive plant rely, this issue is of major importance for a successful full digitalization of the process industry. In this paper the authors discuss these aspects, as well as some suitable machine-learning approaches, drawing on their experience of applying advanced engineering in an industrial case study.