Artificial intelligence (AI)-enabled Industrial Internet of Things (IIoT) marks the rise of systems in which large volumes of data from many IoT devices converge to feed complex machine learning (ML)/AI software that supports decision making and predictive maintenance across industries. However, the widespread neglect of data quality leads to the accumulation of dark data and the embedding of biases in AI systems. We address the problem of taming data quality in AI-enabled IIoT systems by devising machine learning pipelines as part of a decentralized edge-to-cloud architecture. These pipelines generate services for (i) erroneous data repair and (ii) unsupervised detection of events and deviations in sensor data. We present the design and deployment of our approach from an AI Engineering perspective using two industrial case studies.
The Industrial Internet of Things (IIoT) is revolutionizing several industries, such as manufacturing, transportation, and energy. It is a major driving force behind Industry 4.0 and employs Artificial Intelligence (AI) techniques, e.g., Machine Learning (ML), to exploit the massive interconnection of devices and the large volumes of IIoT data they produce. AI-enabled Industrial IoT systems (IIoTs) improve decision making [1] and enable predictive maintenance [2] (e.g., tool wear and product defect prediction in manufacturing) in industrial processes. However, the quality and continuity of IIoT data are a bottleneck that makes these systems rather conservative in what they can achieve. Furthermore, the growing neglect of data quality in AI-enabled IIoTs [3] leads to the accumulation of dark data (unstructured, untagged, and untapped data that is never analyzed) [4] and to the embedding of biases [5].

IIoT data endures a long journey along the edge-cloud continuum: (i) data obtained by sensors observing industrial processes is consumed by a rugged industrial computer to control actuators, such as a machine tool in manufacturing; (ii) it is transferred to an edge device over wired or wireless communication channels using industrial communication protocols (e.g., OPC-UA, OPC-DA, NMEA, Bluetooth); and (iii) it is aggregated at the edge and transferred to the cloud using API protocols (e.g., REST, RPC, SOAP, GraphQL). Taming data quality in AI-enabled IIoTs aims to detect and manage sensor data quality issues (bias, freezing, precision degradation, data drift) along this journey and to preserve data continuity on the edge-cloud continuum. Sensor bias is an offset that shifts the sensor output by a constant value. A sensor freezes when its output remains constant over successive samples. Precision degradation occurs

IT Professional
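To make the definitions of sensor bias and sensor freezing concrete, the following minimal Python sketch injects both faults into a synthetic signal and detects them with simple rules. It is an illustration under stated assumptions, not the ML pipeline described in this article. The function names, the run-length threshold `min_run`, the tolerance `tol`, and the availability of a trusted reference signal for estimating bias are all hypothetical.

```python
from typing import List, Sequence, Tuple


def inject_bias(signal: Sequence[float], offset: float) -> List[float]:
    """Sensor bias: shift every sample by the same constant offset."""
    return [x + offset for x in signal]


def inject_freeze(signal: Sequence[float], start: int, length: int) -> List[float]:
    """Sensor freezing: hold the value at index `start` for `length` successive samples."""
    out = list(signal)
    frozen_value = out[start]
    for i in range(start, min(start + length, len(out))):
        out[i] = frozen_value
    return out


def detect_freezes(signal: Sequence[float], min_run: int = 3,
                   tol: float = 0.0) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of runs in which the output
    stays within `tol` of the run's first value for at least `min_run` samples.
    `min_run` and `tol` are hypothetical tuning parameters."""
    runs: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(signal) + 1):
        # A run ends at the end of the signal or when the value moves away.
        if i == len(signal) or abs(signal[i] - signal[start]) > tol:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs


def estimate_bias(signal: Sequence[float], reference: Sequence[float]) -> float:
    """Estimate a constant offset as the mean difference to a trusted reference
    (assumes such a reference, e.g., a redundant sensor, is available)."""
    diffs = [s - r for s, r in zip(signal, reference)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    reference = [20.0, 20.5, 21.0, 21.4, 21.9, 22.3, 22.8, 23.1]
    frozen = inject_freeze(reference, start=2, length=4)
    biased = inject_bias(reference, offset=1.5)
    print(detect_freezes(frozen))                         # [(2, 6)]
    print(round(estimate_bias(biased, reference), 6))     # 1.5
```

A fixed run-length rule such as `detect_freezes` will also flag processes that are genuinely stable, so `min_run` and `tol` would need tuning to the sensor's sampling rate and resolution. Likewise, estimating a constant bias requires some independent reference against which the offset can be measured.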