CRISP-DM (cross-industry standard process for data mining) methodology was developed as an intuitive tool for data scientists, to help them with applying Big Data methods in the complex technological environment of Industry 4.0. The review of numerous recent papers and studies uncovered that most of papers focus either on the application of existing methods in case studies, summarizing existing knowledge, or developing new methods for a certain kind of problem. Although all of these types of research are productive and required, we identified a lack of complex best practices for a specific field. Therefore, our goal is to propose best practices for the data analysis in production industry. The foundation of our proposal is based on three main points: the CRISP-DM methodology as the theoretical framework, the literature overview as an expression of current needs and interests in the field of data analysis, and case studies of projects we were directly involved in as a source of real-world experience. The results are presented as lists of the most common problems for selected phases (‘Data Preparation’ and ‘Modelling’), proposal of possible solutions, and diagrams for these phases. These recommendations can help other data scientists avoid certain problems or choose the best way to approach them.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.