A 2011 report on big data authored by the McKinsey Global Institute, the economic and business research arm of McKinsey & Company, highlighted big data analytics as a key driver in the next wave of economic innovation [1]. However, the report suggests that this innovation may be impeded by a shortage of personnel with the skills needed to derive insights from big data, with demand in the US predicted to double between 2008 and 2018. This prediction seems credible in light of current data growth estimates: one estimate suggests that the world's data is doubling approximately every 1.5 years [2], while another proposes that 2.5 quintillion bytes of data are generated every day.
The manufacturing industry is in the midst of a data-driven revolution that promises to transform traditional manufacturing facilities into highly optimised smart manufacturing facilities. These smart facilities focus on creating manufacturing intelligence from real-time data to support accurate and timely decision-making that can have a positive impact across the entire organisation. To realise these efficiencies, emerging technologies such as the Internet of Things (IoT) and Cyber-Physical Systems (CPS) will be embedded in physical processes to measure and monitor real-time data from across the factory, ultimately giving rise to unprecedented levels of data production. Manufacturing facilities must therefore be able to manage the demands of this exponential increase in data production, as well as possess the analytical techniques needed to extract meaning from these large datasets. More specifically, organisations must be able to work with big data technologies to meet the demands of smart manufacturing. However, as big data is a relatively new phenomenon and its potential applications to manufacturing activities are wide-reaching and diverse, there has been a clear lack of secondary research in the area. Without secondary research, it is difficult for researchers to identify gaps in the field and to align their work with that of other researchers to develop strong research themes. In this study, we use the formal research methodology of systematic mapping to provide a breadth-first review of big data technologies in manufacturing.
Using 10-minute wind turbine supervisory control and data acquisition (SCADA) system data to predict faults can be an attractive way of working toward a predictive maintenance strategy without needing to invest in extra hardware. Classification methods have been shown to be effective in this regard, but there have been some common issues in their application within the literature. To use these data-driven methods effectively, historical SCADA data must be accurately labelled with the periods when turbines were down due to faults, as well as with the reason for each fault. This can be achieved manually using maintenance logs, but doing so is highly tedious and time-consuming due to the often unstructured format in which this information is stored. Alarm systems can also help, but the sheer volume of alarms and false positives they generate complicates these efforts. Furthermore, a way to implement and evaluate the field-deployed system beyond simple classification metrics is needed. In this work, we present a prescribed and reproducible framework for: (i) automatically identifying periods of faulty operation using rules applied to the turbine alarm system; (ii) using this information to perform classification while avoiding some of the common pitfalls observed in the literature; and (iii) generating alerts based on a sliding-window metric to evaluate the performance of the system in a real-world scenario. The framework was applied to a dataset from an operating wind farm, and the results show that the system can automatically and accurately label historical stoppages from the alarm data. For fault prediction, classification scores are quite low, with a precision of 0.16 and a recall of 0.49, but it is envisaged that this can be greatly improved with more training data. Nonetheless, the sliding-window metric compensates for the low raw classification scores and shows that 71% of faults can be predicted with an average of 30 h notice, with false alarms being active for 122 h of the year.
By adjusting some of the parameters of the fault prediction alerts, the duration of false alarms can be drastically reduced to 2 h, but this also reduces the number of predicted faults to 8%.
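The sliding-window alerting idea described above can be illustrated with a minimal sketch: an alert is raised whenever the fraction of positive classifier outputs in a trailing window exceeds a threshold, and alerts are then scored by how many faults they precede and how many alert-hours fall outside any pre-fault window. The function names, window length, threshold, and scoring rules here are illustrative assumptions, not the exact parameters used in the paper.

```python
# Hypothetical sketch of a sliding-window alert metric for 10-minute
# SCADA fault predictions. Window, threshold, and horizon values are
# assumptions for illustration only.

def sliding_window_alerts(predictions, window=36, threshold=0.5):
    """Raise an alert at sample t when the fraction of positive
    classifier outputs in the trailing `window` samples (e.g.
    36 x 10 min = 6 h) exceeds `threshold`.

    predictions: list of 0/1 classifier outputs per 10-min sample.
    Returns a list of 0/1 alert flags of the same length.
    """
    alerts = []
    for t in range(len(predictions)):
        start = max(0, t - window + 1)
        frac = sum(predictions[start:t + 1]) / (t + 1 - start)
        alerts.append(1 if frac > threshold else 0)
    return alerts


def evaluate_alerts(alerts, fault_starts, horizon=180, step_hours=1 / 6):
    """Score alerts against known fault start indices.

    A fault counts as predicted if any alert is active within the
    `horizon` samples (e.g. 180 x 10 min = 30 h) before it starts.
    False-alarm hours are alert samples outside every such window.
    Returns (fraction of faults predicted, false-alarm hours).
    """
    predicted = 0
    covered = set()
    for f in fault_starts:
        window_idx = range(max(0, f - horizon), f)
        if any(alerts[i] for i in window_idx):
            predicted += 1
        covered.update(window_idx)
    false_alarm_hours = sum(
        step_hours for i, a in enumerate(alerts) if a and i not in covered
    )
    return predicted / max(len(fault_starts), 1), false_alarm_hours
```

Raising the threshold or lengthening the window suppresses false alarms at the cost of missing faults, which is the precision/recall trade-off the abstract describes when false-alarm hours drop from 122 h to 2 h while predicted faults fall to 8%.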