The Internet of Things (IoT) is an emerging technology paradigm where millions of sensors and actuators help monitor and manage physical, environmental and human systems in real-time. The inherent closed-loop responsiveness and decision making of IoT applications make them ideal candidates for using low-latency and scalable stream processing platforms. Distributed Stream Processing Systems (DSPS) hosted on Cloud data-centers are becoming the vital engine for real-time data processing and analytics in any IoT software architecture. But the efficacy and performance of contemporary DSPS have not been rigorously studied for IoT applications and data streams. Here, we develop RIoTBench, a Realtime IoT Benchmark suite, along with performance metrics, to evaluate DSPS for streaming IoT applications. The benchmark includes 27 common IoT tasks classified across various functional categories and implemented as reusable micro-benchmarks. Further, we propose four IoT application benchmarks that are composed from these tasks and that leverage various dataflow semantics of DSPS. The applications are based on common IoT patterns for data pre-processing, statistical summarization and predictive analytics. These are coupled with four stream workloads sourced from real IoT observations on smart cities and fitness, with peak stream rates that range from 500 to 10,000 messages/sec and diverse frequency distributions. We validate the RIoTBench suite for the popular Apache Storm DSPS on the Microsoft Azure public Cloud, and present empirical observations. This suite can be used by DSPS researchers for performance analysis and resource scheduling, and by IoT practitioners to evaluate DSPS platforms.

arXiv:1701.08530v1 [cs.DC] 30 Jan 2017

1. We classify different characteristics of streaming applications, their composition semantics, and their data sources, in § 3.
2. Then, in § 4, we propose categories of tasks that are essential for IoT applications and the key features of input data streams they operate upon.
3. We identify performance metrics of DSPS that are necessary to meet the latency and scalability needs of streaming IoT applications, in § 5.
4. We propose the RIoTBench real-time IoT benchmark for DSPS based on representative micro-benchmark tasks, drawn from the above categories, in § 6. We design four reference IoT applications that span Data Pre-processing, Statistical Analytics and Predictive Analytics, and are composed from these tasks. We also identify four real-world streams with different distributions as workloads on which to evaluate them.
5. Lastly, we validate the proposed benchmark suite for the popular Apache Storm DSPS, and present empirical results for the same in § 7.

Our contributions benefit two classes of audience. One, for developers and users in IoT domains, RIoTBench offers a set of realistic IoT tasks and applications that they can customize and configure to help evaluate candidate DSPS platforms for their performance and scalability needs. Two, for researchers on Big Data, it provides a reference micro and applicat...
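The idea of composing reusable micro-benchmark tasks into an application and measuring latency and throughput can be illustrated with a minimal, single-process sketch. It is not RIoTBench code and does not run on a DSPS: the task names (`parse`, `range_filter`, `make_running_average`), the `sensor_id,value` message format, and the two metrics computed in `run_benchmark` are all hypothetical choices made for illustration.

```python
import time
from statistics import mean


def parse(msg):
    """Hypothetical parse task: split a 'sensor_id,value' CSV message."""
    sensor_id, value = msg.split(",")
    return {"sensor": sensor_id, "value": float(value)}


def range_filter(rec, lo=0.0, hi=100.0):
    """Hypothetical filter task: drop records outside [lo, hi] by returning None."""
    return rec if lo <= rec["value"] <= hi else None


def make_running_average():
    """Hypothetical statistics task: stateful running mean over all records seen."""
    state = {"n": 0, "total": 0.0}

    def task(rec):
        state["n"] += 1
        state["total"] += rec["value"]
        return {**rec, "avg": state["total"] / state["n"]}

    return task


def run_benchmark(tasks, messages):
    """Push each message through the task chain; record per-message latency."""
    outputs, latencies = [], []
    start = time.perf_counter()
    for msg in messages:
        t0 = time.perf_counter()
        item = msg
        for task in tasks:
            item = task(item)
            if item is None:  # message was filtered out
                break
        latencies.append(time.perf_counter() - t0)
        if item is not None:
            outputs.append(item)
    elapsed = time.perf_counter() - start
    return {
        "outputs": outputs,
        "mean_latency_s": mean(latencies) if latencies else 0.0,
        "throughput_msg_per_s": len(messages) / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    pipeline = [parse, range_filter, make_running_average()]
    result = run_benchmark(pipeline, ["s1,10", "s1,250", "s1,30"])
    print(result["outputs"])
    print(result["mean_latency_s"], result["throughput_msg_per_s"])
```

In a real DSPS each task would run as a distributed operator and the input rate would be controlled by the workload, so the numbers from this sketch say nothing about platform performance; it only shows how a task chain and its metrics fit together.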
The Internet of Things (IoT) is emerging as the next big wave of digital presence for billions of devices on the Internet. Smart cities are a practical manifestation of IoT, with the goal of efficient, reliable, and safe delivery of city utilities like water, power, and transport to residents, through their intelligent management. A data-driven IoT software platform is essential for realizing manageable and sustainable smart utilities and for novel applications to be developed upon them. Here, we propose such a service-oriented software architecture to address two key operational activities in a smart utility: the IoT fabric for resource management and the data and application platform for decision-making. Our design uses Open Web standards and evolving network protocols, cloud and edge resources, and streaming big data platforms. We motivate our design requirements using the smart water management domain; some of these requirements are unique to developing nations. We also validate the architecture within a campus-scale IoT testbed at the Indian Institute of Science, Bangalore and present our experiences. Our architecture is scalable to a township or city while also generalizable to other smart utility domains. Our experiences serve as a template for other similar efforts, particularly in emerging markets, and highlight the gaps and opportunities for a data-driven IoT software architecture for smart cities.

As the number of IoT devices soon reaches the billions, it is essential to have a distributed software architecture that facilitates the sustainable management of these physical devices and communication networks and access to their data streams and controls for developing innovative IoT applications. Three synergistic concepts come together to enable this.
Service-oriented architecture (SOA) [7,8] offers standard mechanisms and protocols for discovery, addressing, access control, invocation, and composition of services that are available on the World Wide Web (WWW), by leveraging and extending Web-based protocols such as the Hypertext Transfer Protocol (HTTP) and open representation models like the Extensible Markup Language (XML) [9]. Cloud computing is a manifestation of this paradigm where infrastructure, platform, and software resources are available "as a service" (IaaS, PaaS, and SaaS), often served from geographically distributed data centers worldwide. These offer economies of scale and enable access to elastic resources using a pay-as-you-go model [10]. Such commodity clusters on the cloud have also enabled the growth of big data platforms that allow data-driven applications to be composed and scaled on tens or hundreds of virtual machines (VMs) and deal with both data volume and velocity, among other dimensions [11]. Unlike traditional enterprise or scientific applications, however, the IoT domain is distinct in the way these technologies converge to support emerging applications. (1) IoT integrates hardware, communication, software, and analytics and links the physical and the digital world. Hence, infrastructu...
Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze sensor data from Smart City cyber-infrastructure, and make active utility management decisions. As the ecosystem of such IoT applications that leverage shared urban sensor streams continues to grow, applications will perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping dataflows, thereby improving resource efficiency. In this paper, we propose dataflow reuse algorithms that, given a submitted dataflow, identify the intersection of reusable tasks and streams from a collection of running dataflows to form a merged dataflow. We also propose similar algorithms to unmerge dataflows when they are removed. We implement these algorithms for the popular Apache Storm DSPS, and validate their performance and resource savings for 35 synthetic dataflows based on public OPMW workflows with diverse arrival and departure distributions, and on 21 real IoT dataflows from RIoTBench. We see that our Reuse algorithms reduce the count of running tasks by 38-46% for the two workloads, and reduce cumulative CPU usage by 36-51%, which can result in real cost savings on Cloud resources.
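The merge and unmerge idea described above can be sketched in a few lines. This is an illustrative simplification, not the paper's algorithm or its Storm implementation: it assumes two tasks are reusable when they apply the same operation to equivalent upstream tasks, represents a dataflow as a hypothetical dict of `task_id -> (operation, upstream_task_ids)`, and tracks sharing with a reference count. The names `task_signatures`, `MergedDataflows`, `DATAFLOW_A`, and `DATAFLOW_B` are invented for this example.

```python
from collections import Counter


def task_signatures(dataflow):
    """Map each task id to a signature: its operation plus its upstream signatures.

    Assumption: equal signatures mean the tasks produce identical output streams.
    """
    sigs = {}

    def sig(task_id):
        if task_id not in sigs:
            op, parents = dataflow[task_id]
            sigs[task_id] = (op, tuple(sig(p) for p in parents))
        return sigs[task_id]

    for task_id in dataflow:
        sig(task_id)
    return sigs


class MergedDataflows:
    """Tracks the set of distinct running tasks across all submitted dataflows."""

    def __init__(self):
        # signature -> number of submitted dataflows that use this task
        self.refcount = Counter()

    def merge(self, dataflow):
        """Add a dataflow; return (reused signatures, newly started signatures)."""
        sigs = set(task_signatures(dataflow).values())
        reused = {s for s in sigs if self.refcount[s] > 0}
        for s in sigs:
            self.refcount[s] += 1
        return reused, sigs - reused

    def unmerge(self, dataflow):
        """Remove a dataflow; return signatures of tasks that can be stopped."""
        stopped = set()
        for s in set(task_signatures(dataflow).values()):
            self.refcount[s] -= 1
            if self.refcount[s] == 0:
                del self.refcount[s]
                stopped.add(s)
        return stopped

    def running_tasks(self):
        return len(self.refcount)


# Two hypothetical dataflows sharing a source, parse, and filter prefix.
DATAFLOW_A = {
    "src": ("source:sensor", ()),
    "parse": ("parse", ("src",)),
    "filter": ("range_filter", ("parse",)),
    "avg": ("average", ("filter",)),
}
DATAFLOW_B = {
    "in": ("source:sensor", ()),
    "p": ("parse", ("in",)),
    "f": ("range_filter", ("p",)),
    "pred": ("predict", ("f",)),
}

if __name__ == "__main__":
    merged = MergedDataflows()
    merged.merge(DATAFLOW_A)
    reused, started = merged.merge(DATAFLOW_B)
    print(len(reused), len(started), merged.running_tasks())
    print(len(merged.unmerge(DATAFLOW_A)), merged.running_tasks())
```

With these two dataflows, the merged deployment runs five distinct tasks instead of the eight needed when each dataflow runs separately, and removing the first dataflow stops only its unshared `average` task. A real system would also have to rewire streams on a live topology and decide task equivalence more carefully than by comparing operation names.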