This paper presents a benchmark of supervised Automated Machine Learning (AutoML) tools. First, we analyze the characteristics of eight recent open-source AutoML tools (Auto-Keras, Auto-PyTorch, Auto-Sklearn, AutoGluon, H2O AutoML, rminer, TPOT and TransmogrifAI) and describe the twelve popular OpenML datasets used in the benchmark (divided into regression, binary classification and multi-class classification tasks). Then, we perform a comparison study with hundreds of computational experiments based on three scenarios: General Machine Learning (GML), Deep Learning (DL) and XGBoost (XGB). To select the best tool, we used a lexicographic approach, considering first the average prediction score for each task and then the computational effort. The best predictive results were achieved in the GML scenario, and these were further compared with the best public OpenML results. Overall, the best GML AutoML tools obtained competitive results, outperforming the best OpenML models on five datasets. These results confirm the potential of general-purpose AutoML tools to fully automate Machine Learning (ML) algorithm selection and tuning.
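The lexicographic selection described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `ToolResult` type, the `select_best` function, the `score_tol` parameter and the tool names are all hypothetical, and the sketch assumes that a higher average score is better and that effort is measured in seconds.

```python
from typing import List, NamedTuple


class ToolResult(NamedTuple):
    """Hypothetical summary of one AutoML tool on one task."""
    name: str
    avg_score: float  # average prediction score, assumed higher-is-better
    effort_s: float   # computational effort in seconds, lower is better


def select_best(results: List[ToolResult], score_tol: float = 0.0) -> ToolResult:
    """Lexicographic choice: best score first, then lowest effort.

    Tools whose score is within `score_tol` of the top score are treated
    as tied on the first criterion (an assumption of this sketch).
    """
    if not results:
        raise ValueError("results must not be empty")
    best_score = max(r.avg_score for r in results)
    tied = [r for r in results if best_score - r.avg_score <= score_tol]
    # Second criterion only breaks ties left by the first one.
    return min(tied, key=lambda r: r.effort_s)


# Made-up values for illustration only; not taken from the benchmark.
example = [
    ToolResult("tool_a", 0.90, 100.0),
    ToolResult("tool_b", 0.90, 50.0),
    ToolResult("tool_c", 0.80, 10.0),
]
winner = select_best(example)
```

With a zero tolerance, the cheaper of the two top-scoring tools is chosen; a larger `score_tol` lets a faster but slightly less accurate tool win.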
Recently, the term "Industry 4.0" has emerged to characterize the adoption of several Information and Communication Technologies (ICT) in production processes (e.g., Internet-of-Things, digital production support technologies). Business Analytics is often used within Industry 4.0, incorporating the data intelligence (e.g., statistical analysis, predictive modeling, optimization) component of its expert systems. In this paper, we perform a Systematic Literature Review (SLR) of this Business Analytics usage, covering a selection of 169 papers obtained from six major scientific publication sources from 2010 to March 2020. The selected papers were first classified into three major types, namely Practical Application, Reviews and Framework Proposal. Then, we analyzed in more detail the practical application studies, which were further divided into the three main categories of the Gartner analytical maturity model: Descriptive Analytics, Predictive Analytics and Prescriptive Analytics. In particular, we characterized the distinct analytics studies in terms of the industry application and data context used, impact (in terms of their Technology Readiness Level) and selected data modeling method. Our SLR analysis provides a mapping of how data-based Industry 4.0 expert systems are currently used, also disclosing research gaps and future research opportunities.
This paper presents a novel Machine Learning (ML) approach to support the creation of woven fabrics. Using data from a textile company, two CRoss-Industry Standard Process for Data Mining (CRISP-DM) iterations were executed, aiming to compare three input feature representation strategies related to fabric design and finishing processes. During the modeling stage of CRISP-DM, an Automated ML (AutoML) procedure was used to select the best regression model among six distinct state-of-the-art ML algorithms. A total of nine textile physical properties were modeled (e.g., abrasion, elasticity, pilling). Overall, the simpler yarn representation strategy obtained better predictive results. Moreover, for eight fabric properties (e.g., elasticity, pilling), the addition of finishing features improved the quality of the predictions. The best ML models obtained low predictive errors (from 2% to 7%) and are potentially valuable for the textile company, since they can be used to reduce the number of production attempts (saving time and costs).
Recently, there have been advances in using unsupervised learning methods for Acoustic Anomaly Detection (AAD). In this paper, we propose improved versions of two deep AutoEncoders (AE), namely Dense and Convolutional AEs, for unsupervised AAD on six types of working machines. A large set of computational experiments was conducted, showing that the two proposed deep autoencoders, when combined with mel-spectrogram sound preprocessing, are quite competitive and outperform a recently proposed AE baseline. Overall, a high-quality class discrimination level was achieved, ranging from 72% to 92%.
This paper presents a two-stage Machine Learning (ML) model to predict the arrival time of In-Process Control (IPC) samples at the quality testing laboratories of a chemical company. The model was developed using three iterations of the CRoss-Industry Standard Process for Data Mining (CRISP-DM) methodology, each focusing on a different regression approach. To reduce the ML analyst effort, an Automated Machine Learning (AutoML) procedure was adopted during the modeling stage of CRISP-DM. The AutoML procedure was set to select the best among six distinct state-of-the-art regression algorithms. Using recent real-world data, the three main regression approaches were compared, showing that the proposed two-stage ML model is competitive and provides useful predictions to support laboratory management decisions (e.g., preparation of testing instruments). In particular, the proposed method can accurately predict 70% of the examples under a tolerance of 4 time units.
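The tolerance-based figure quoted in this abstract (the share of examples predicted within a fixed absolute error) can be computed as in the sketch below. This is one plausible reading of the metric rather than the authors' exact definition: the function name `tolerance_accuracy` is hypothetical, the error is assumed to be the absolute difference between target and prediction, and the sample values are made up.

```python
from typing import Sequence


def tolerance_accuracy(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float
) -> float:
    """Fraction of predictions whose absolute error is within `tolerance`.

    Assumes absolute error |y_true - y_pred| in the same time units as
    the targets (an assumption of this sketch).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Made-up arrival times (in arbitrary time units), for illustration only.
observed = [10, 20, 30, 40]
predicted = [12, 25, 30, 36]
share_within_4 = tolerance_accuracy(observed, predicted, tolerance=4)
```

Here the absolute errors are 2, 5, 0 and 4, so three of the four examples fall within the tolerance of 4 time units.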