To guarantee high availability, automation systems must be fault-tolerant. To this end, they must provide redundant solutions for the critical parts of the system. Classical fault tolerance patterns such as standby or N-modular redundancy provide system stability in the case of a fault. Fault tolerance is subsequently degraded or, depending on the number of deployed replicas, often even unavailable until the system has been repaired.We introduce a combination of a component-based framework, redundancy patterns, and a runtime manager, which is able to provide fault tolerance, to detect host failures, and to trigger a reconfiguration of the system at runtime. This combined solution maintains system operation in case a fault occurs and automatically restores fault tolerance. The proposed solution is validated using a case study of an industrial distributed automation system. The validation shows how our solution quickly restores fault tolerance without the need for operator intervention or immediate hardware replacement while limiting the impact on other applications.
Determining the trade-off between performance and costs of a distributed software system is important as it enables fulfilling performance requirements in a cost-efficient way. The large amount of design alternatives for such systems often leads software architects to select a suboptimal solution, which may either waste resources or cannot cope with future workloads. Recently, several approaches have appeared to assist software architects with this design task. In this paper, we present a case study applying one of these approaches, i.e. PerOpteryx, to explore the design space of an existing industrial distributed software system from ABB. To facilitate the design exploration, we created a highly detailed performance and cost model, which was instrumental in determining a cost-efficient architecture solution using an evolutionary algorithm. The case study demonstrates the capabilities of various modern performance modeling tools and a design space exploration tool in an industrial setting, provides lessons learned, and helps other software architects in solving similar problems.
To meet end-user performance expectations, precise performance requirements are needed during development and testing, e.g., to conduct detailed performance and load tests. However, in practice, several factors complicate performance requirements elicitation: lacking skills in performance requirements engineering, outdated or unavailable functional specifications and architecture models, the specification of the system's context, lack of experience to collect good performance requirements in an industrial setting with very limited time, etc. From the small set of available non-functional requirements engineering methods, no method exists that alone leads to precise and complete performance requirements with feasible effort and which has been reported to work in an industrial setting. In this paper, we present our experiences in combining existing requirements engineering methods into a performance requirements method called PROPRE. It has been designed to require no up-to-date system documentation and to be applicable with limited time and effort. We have successfully applied PROPRE in an industrial case study from the process automation domain. Our lessons learned show that the stakeholders gathered good performance requirements which now improve performance testing.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.