Introduction
Many big data systems have been developed and realised to provide end user services (Netflix, Facebook, Twitter, LinkedIn etc.). The underlying architectures and technologies of the enabling systems have also been published [1-3], and reference architectures (RAs) have been designed and proposed [4-6]. Edge/5G computing is an emerging technological field [7], and the first products are being shipped to the markets. However, the utilisation of machine learning (ML) as part of the edge computing infrastructure is still an area for further research [8]. In particular, it should be understood how data is collected, and how models are

Abstract
Background: Augmented reality, computer vision and other use cases (e.g. network functions, Internet-of-Things (IoT)) can be realised in edge computing environments with machine learning (ML) techniques. To realise these use cases, it has to be understood how data is collected, stored, processed, analysed, and visualised in big data systems. To provide services with low latency for end users, the utilisation of ML techniques often has to be optimised. Software/service developers also have to understand how to develop and deploy ML models in edge computing environments. Therefore, architecture design of big data systems for edge computing environments may be challenging.
Findings: The contribution of this paper is a reference architecture (RA) design for a big data system utilising ML techniques in edge computing environments. An earlier version of the RA has been extended based on 16 realised implementation architectures, which were developed for edge/distributed computing environments. The deployment of architectural elements in different environments is also described. Finally, a system view of the software engineering aspects of ML model development and deployment is provided.
Conclusions: The presented RA may facilitate concrete architecture design of use cases in edge computing environments. The value of RAs lies in reducing the development and maintenance costs of systems, reducing risks, and facilitating communication between different stakeholders.