The recent increase in the digitalization of industrial systems has boosted data availability in the industrial environment. This has favored the adoption of machine learning (ML) methodologies for data analysis, but not all contexts boast data abundance. When data are scarce or costly to collect, Design of Experiments (DOE) can be used to provide an informative dataset for analysis using ML techniques. This article aims to provide a systematic overview of the literature on the joint application of DOE and ML in product innovation (PI) settings. To this end, a systematic literature review (SLR) of two major scientific databases is conducted, retrieving 388 papers, of which 86 are selected for careful analysis. The results of this review delineate the state of the art and identify the main trends in the experimental designs and ML algorithms selected for joint application in PI. The gaps, open problems, and research opportunities are identified, and directions for future research are provided.
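To make the idea of a designed dataset concrete, the following is a minimal sketch of one classical experimental design, a two-level full factorial, which enumerates every combination of factor levels. It is not taken from the reviewed article; the factor names and levels (`temperature_C`, `pressure_bar`, `catalyst`) and the helper `full_factorial` are hypothetical and chosen only for illustration.

```python
from itertools import product


def full_factorial(levels):
    """Return every combination of factor levels as a list of dicts.

    `levels` maps a factor name to the list of values it may take.
    """
    names = list(levels)
    combos = product(*(levels[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical factors and levels, used only to illustrate the design.
factors = {
    "temperature_C": [150, 200],
    "pressure_bar": [1, 5],
    "catalyst": ["A", "B"],
}

design = full_factorial(factors)

# Each row is one experimental run; the measured responses would later
# serve as the training set for an ML model.
for run_id, run in enumerate(design, start=1):
    print(run_id, run)
```

With three factors at two levels each, the design contains 2 × 2 × 2 = 8 runs. In practice a fractional or space-filling design might be preferred when the number of factors makes the full enumeration too expensive.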
Variable selection plays a fundamental role in the analysis of data containing several variables that are redundant or irrelevant to the problem of interest. The ability to identify and discard these variables would make it possible to improve predictive performance and data interpretation, thus reducing costs and computational time. Although many methods have been proposed for feature selection, in some fields there is more interest in selecting groups of variables because of the continuous nature and covariance of adjacent data. This is the case for near‐infrared spectroscopy, where several methods, mainly based on partial least squares regression, have been proposed to deal with interval selection. In this article, we consider some of these methods and propose an additional solution based on a variable clustering procedure (Cov/VSURF), Lasso regression and permutation tests. We compare their performance on four different public datasets and discuss the impact of interval selection on the predictive performance of the considered models.
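As an illustration of how a permutation test can assess whether an interval of adjacent variables is related to the response, here is a minimal sketch on synthetic data. It is not the authors' procedure: it omits the Cov/VSURF clustering and the Lasso step, and it uses the absolute Pearson correlation between the interval's mean signal and the response as an assumed test statistic. The function names, interval boundaries, and data are all hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def interval_p_value(spectra, y, start, stop, n_perm=999, seed=0):
    """Permutation p-value for the interval of columns [start, stop).

    The statistic is |corr(mean of the interval, y)|; the null
    distribution is obtained by shuffling the response.
    """
    rng = random.Random(seed)
    summary = [mean(row[start:stop]) for row in spectra]
    observed = abs(pearson(summary, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(summary, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed statistic as one permutation.
    return (hits + 1) / (n_perm + 1)


# Synthetic "spectra": 40 samples, 20 variables. Only columns 5..9
# drive the response (an assumption made for this illustration).
data_rng = random.Random(42)
spectra = [[data_rng.gauss(0, 1) for _ in range(20)] for _ in range(40)]
y = [mean(row[5:10]) + data_rng.gauss(0, 0.05) for row in spectra]

p_informative = interval_p_value(spectra, y, 5, 10)
p_noise = interval_p_value(spectra, y, 15, 20)
print("informative interval:", p_informative)
print("noise interval:", p_noise)
```

An interval with a small p-value would be retained and one with a large p-value discarded. A real pipeline would also need to correct for testing many intervals at once.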
In the context of product innovation, there is an emerging trend to use Machine Learning (ML) models with the support of Design of Experiments (DOE). The paper first reviews the designs and ML models most suitable for joint use in an Active Learning (AL) approach; it then reviews ALPERC, a novel AL approach, and demonstrates the validity of this method through a case study on amorphous metallic alloys, where the algorithm is used in combination with a Random Forest model.