Fault-tolerance (FT) support is a key challenge for ensuring dependable Internet of Things (IoT) systems. Many existing FT-support mechanisms in IoT are static, tightly coupled, and inflexible implementations that struggle to adapt to dynamic IoT environments. This paper proposes Complex Patterns of Failure (CPoF), an approach to providing reactive and proactive FT using Complex Event Processing (CEP) and Machine Learning (ML). Error-detection strategies are defined as nondeterministic finite automata (NFA) and implemented via CEP systems. Reactive-FT support is monitored, and the observations are used to train ML models that proactively handle imminent occurrences of known errors. We evaluated CPoF on an indoor agriculture system with experiments that used time and error correlations to preempt battery-depletion failures. We trained predictive models to learn from reactive-FT support and provide preemptive error recovery. CCS CONCEPTS • Computer systems organization → Embedded and cyber-physical systems; Dependable and fault-tolerant systems and networks; • Computing methodologies → Machine learning.
Providing fault-tolerance (FT) support to Internet of Things (IoT) systems is an open challenge: many implementations provide static, tightly coupled FT support that does not adapt and evolve as IoT systems do. This paper proposes a pluggable framework based on a microservices architecture that implements FT support as two complementary microservices: one that uses complex event processing for real-time FT detection, and another that uses online machine learning to detect fault patterns and pre-emptively mitigate faults before they are activated. We provide an early evaluation of how our framework can handle a real-world scenario.
Fault-tolerance (FT) support is a key challenge for ensuring dependable Internet of Things (IoT) systems. Many existing FT-support mechanisms for IoT are static, tightly coupled, and inflexible, and so they struggle to provide effective support for dynamic IoT environments. This paper proposes Complex Patterns of Failure (CPoF), an approach to providing FT support for IoT systems using Complex Event Processing (CEP) that promotes modularity and reusability in FT-support design. System defects are defined using our Vulnerabilities, Faults, and Failures (VFF) framework, and error-detection strategies are defined as nondeterministic finite automata (NFA) implemented via CEP systems. We evaluated CPoF on an automated agriculture system and demonstrated its effectiveness against three types of error-detection checks: reasonableness, timing, and reversal. Using CPoF, we identified unreasonable environmental conditions and performance degradation via sensor data analysis.
Providing effective fault-tolerance (FT) support for Internet of Things (IoT) systems is hampered by the many ad hoc ways in which it is implemented. We propose BoboCEP, a Complex Event Processing (CEP) system that provides resilient FT support for IoT systems, where errors are defined as nondeterministic finite automata. BoboCEP is designed to be distributed at the network edge, which facilitates resilient event processing and load balancing due to the active replication of FT support across the edge. We evaluated BoboCEP on a vertical farming use case to demonstrate long-term FT support and load balancing, and stress-tested it under scenarios with high data throughput.
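The abstracts above describe error-detection strategies as nondeterministic finite automata evaluated over event streams. The following is a minimal sketch of that general idea, not the CPoF or BoboCEP implementation: the names `Event`, `match_pattern`, and `unreasonable`, the temperature bounds, and the 60-second window are all hypothetical and chosen only for illustration. It encodes a simple reasonableness check that fires when two out-of-range sensor readings occur within a time window, keeping several partial runs alive at once to mimic nondeterministic matching.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """A sensor reading (hypothetical schema, for illustration only)."""
    timestamp: float  # seconds
    value: float      # e.g. temperature in degrees Celsius


Predicate = Callable[[Event], bool]


def match_pattern(
    events: Sequence[Event],
    predicates: Sequence[Predicate],
    window: float,
) -> List[Tuple[Event, ...]]:
    """Find event sequences that satisfy each predicate in order.

    Each partial run is one active path through a linear NFA whose
    i-th state accepts events satisfying predicates[i]. A run is
    discarded once the current event is more than `window` seconds
    after the run's first event.
    """
    runs: List[List[Event]] = []
    matches: List[Tuple[Event, ...]] = []
    for ev in events:
        next_runs: List[List[Event]] = []
        for run in runs:
            if ev.timestamp - run[0].timestamp > window:
                continue  # run expired, drop it
            if predicates[len(run)](ev):
                extended = run + [ev]
                if len(extended) == len(predicates):
                    matches.append(tuple(extended))
                else:
                    next_runs.append(extended)
            # Nondeterminism: also keep the run as it was, so a later
            # event may advance it instead of this one.
            next_runs.append(run)
        if predicates[0](ev):
            if len(predicates) == 1:
                matches.append((ev,))
            else:
                next_runs.append([ev])
        runs = next_runs
    return matches


def unreasonable(ev: Event) -> bool:
    """Assumed plausibility bounds for an indoor farm temperature sensor."""
    return ev.value < 0.0 or ev.value > 50.0


readings = [
    Event(0, 20.0),
    Event(10, 95.0),
    Event(20, 22.0),
    Event(30, 97.0),
    Event(200, 99.0),
]
matches = match_pattern(readings, [unreasonable, unreasonable], window=60.0)
for first, second in matches:
    print(f"error pattern: readings at t={first.timestamp} and t={second.timestamp}")
```

In this sketch the readings at t=10 and t=30 form a match, while the reading at t=200 falls outside the window and only starts a new partial run. A production CEP engine would add event-selection policies, run limits, and recovery actions that are deliberately left out here.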