Monitoring has always been a key element in ensuring the performance of complex distributed systems, as a first step toward controlling quality of service, detecting anomalies, or making decisions about resource allocation and job scheduling, to name a few. Edge computing is a new type of distributed computing, in which data processing is performed by a large number of heterogeneous devices close to the place where the data is generated. Compared with more traditional architectures, such as cloud or high-performance computing, these devices have low computing power, have unstable connectivity, and are geo-distributed or even mobile. These characteristics establish new requirements for monitoring tools, such as customized monitoring workflows or the choice of different back-ends for the metrics depending on the device hosting them. In this paper, we present a study of the requirements that an edge monitoring tool should meet, based on motivating scenarios drawn from the literature. We then implement these requirements in a monitoring tool named FMonE. This framework allows deploying monitoring workflows that conform to the specific demands of edge computing systems. We evaluate FMonE by simulating a fog environment in the Grid’5000 testbed, and we demonstrate that it fulfills the requirements we previously enumerated.
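The abstract does not show FMonE's actual interface, but the requirement it describes, per-device monitoring workflows whose metric back-end depends on the hosting device, can be pictured with a minimal sketch. All names here (`MonitoringWorkflow`, `select_backend`, the device classes) are hypothetical illustrations, not FMonE's API.

```python
from dataclasses import dataclass

# Hypothetical descriptor of an edge monitoring workflow; FMonE's real
# configuration format is not given in the abstract.
@dataclass
class MonitoringWorkflow:
    device_class: str   # e.g. "sensor", "gateway", "regional-server"
    metrics: list[str]  # which metrics to collect on this device
    interval_s: int     # collection period in seconds
    backend: str        # where the collected metrics are stored or forwarded

def select_backend(device_class: str) -> str:
    """Pick a metrics back-end suited to the device's resources and
    connectivity: constrained, possibly disconnected devices buffer
    locally, while well-provisioned nodes push to a remote store."""
    return {
        "sensor": "local-ring-buffer",       # low power, unstable links
        "gateway": "regional-time-series",   # aggregates nearby sensors
        "regional-server": "central-store",  # stable, well provisioned
    }.get(device_class, "local-ring-buffer")

workflows = [
    MonitoringWorkflow(dc, ["cpu", "mem", "net_latency"], 30, select_backend(dc))
    for dc in ("sensor", "gateway", "regional-server")
]
for wf in workflows:
    print(wf)
```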
The ever-growing data sets processed on HPC platforms raise major challenges for the underlying storage layer. Simpler blobs (binary large objects), or object storage systems, are a promising alternative to POSIX-IO-compliant file systems. They offer lower overhead and better performance at the cost of dropping largely unused features such as file hierarchies or permissions. Similarly, blobs are increasingly considered as replacements for distributed file systems in Big Data analytics, or as a base for storage abstractions such as key-value stores or time-series databases. This growing interest in object storage on HPC and Big Data platforms raises the question: are blobs the right level of abstraction to enable storage-based convergence between HPC and Big Data? In this paper, we take a first step toward answering this question by analyzing the applicability of blobs on both platforms.
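To make the level of abstraction concrete: an object store exposes little more than put/get/delete on a flat key namespace, which is where the lower overhead comes from. The sketch below is an illustrative in-memory model of that interface, not the API of any particular object store.

```python
class BlobStore:
    """Minimal model of an object store: a flat namespace of named
    binary objects, with no directories, no permissions, and no POSIX
    semantics such as seek-and-rewrite or atomic renames."""

    def __init__(self):
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data  # whole-object write, no partial updates

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]

# Higher-level abstractions such as key-value stores or time-series
# databases can be layered on top simply by choosing a key convention:
store = BlobStore()
store.put("metrics/node42/2024-01-01T00:00", b"\x00\x01\x02")
print(store.get("metrics/node42/2024-01-01T00:00"))
```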
HPC and Big Data stacks are completely separate today. The storage layer offers opportunities for convergence, as the challenges associated with HPC and Big Data storage are similar: trading versatility for performance. This motivates a global move toward dropping file-based, POSIX-IO-compliant systems. On HPC platforms, however, this is made difficult by centralized, file-based storage architectures. In this paper, we argue that the growing trend of equipping HPC compute nodes with local storage changes the game by enabling object storage to be deployed alongside the application on the compute nodes. Such integration of application and storage not only allows fine-grained configuration of the storage system, but also improves application portability across platforms. In addition, the single-user nature of such application-specific storage obviates the need for resource-consuming storage features like permissions or file hierarchies offered by traditional file systems. In this article, we propose and evaluate blobs (binary large objects) as an alternative to distributed file systems. We demonstrate that blob storage offers drop-in compatibility with a variety of existing applications while improving storage throughput by up to 28%.
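The claimed drop-in compatibility can be pictured as a thin file-like shim over a blob interface: the application keeps its usual read/write calls while each file maps to one object. The sketch below is a hypothetical illustration of that idea, not the paper's actual implementation; a plain dictionary stands in for a node-local object store.

```python
import io

class BlobFile(io.BytesIO):
    """File-like view of a single blob: the application keeps calling
    read()/write(), and the whole object is written back to the store
    when the file is closed. A hypothetical shim, not the paper's code."""

    def __init__(self, store: dict, key: str):
        super().__init__(store.get(key, b""))  # load the existing blob, if any
        self._store, self._key = store, key

    def close(self) -> None:
        self._store[self._key] = self.getvalue()  # whole-object write-back
        super().close()

store: dict[str, bytes] = {}            # stands in for a node-local object store
with BlobFile(store, "results/output.dat") as f:
    f.write(b"simulation output")
print(store["results/output.dat"])      # b'simulation output'
```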
Performing diagnostics in IT systems is an increasingly complicated task that even the most skillful operators cannot complete in satisfactory time. Systems and their architecture change very rapidly in response to business and user demand. Many organizations see value in the NoOps (No Operations) maintenance and management model, one implementation of which is a system that is maintained automatically, without any human intervention. The path to NoOps involves not only precise and fast diagnostics but also reusing as much knowledge as possible after the system is reconfigured or changed. The biggest challenge is to leverage knowledge gained on one IT system and reuse it for the diagnostics of another, different system. We propose a framework of weighted graphs that can transfer knowledge and perform high-quality diagnostics of IT systems. We encode all available data in a graph representation of a system state and automatically calculate the weights of these graphs. Then, by evaluating the similarity between graphs, we transfer knowledge about failures from one system to another and use it for diagnostics. We successfully evaluate the proposed approach on Spark, Hadoop, Kafka, and Cassandra systems.
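The abstract does not give the exact weighting scheme or similarity measure, so the sketch below uses a weighted Jaccard index over labeled edges as one plausible stand-in; the toy graphs, edge labels, and weights are purely illustrative.

```python
# Hedged sketch of weighted-graph similarity for transferring diagnostic
# knowledge between systems. The paper's actual measure is not given in
# the abstract; weighted Jaccard over labeled edges is an assumption.

def similarity(g1: dict, g2: dict) -> float:
    """Weighted Jaccard similarity between two graphs represented as
    {(node_a, node_b): weight} dictionaries."""
    edges = set(g1) | set(g2)
    num = sum(min(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    den = sum(max(g1.get(e, 0.0), g2.get(e, 0.0)) for e in edges)
    return num / den if den else 0.0

# Toy system states: edges connect components and observed symptoms,
# weights encode how strongly they co-occur in the failure signature.
spark_failure = {("executor", "oom"): 0.9, ("driver", "gc_pause"): 0.4}
hadoop_state  = {("executor", "oom"): 0.8, ("namenode", "slow_rpc"): 0.3}

# A high score suggests the known failure signature from one system also
# explains the other system's state, so its diagnosis can be transferred.
print(f"similarity = {similarity(spark_failure, hadoop_state):.2f}")
```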