Nowadays, many enterprises commit to the extraction of actionable knowledge from huge datasets as part of their core business activities. Applications belong to very different domains such as fraud detection or one-to-one marketing, and encompass business analytics and support to decision making in both private and public sectors. In these scenarios, a central place is held by the MapReduce framework and in particular its open source implementation, Apache Hadoop. In such environments, new challenges arise in the area of jobs performance prediction, with the needs to provide Service Level Agreement guarantees to the end-user and to avoid waste of computational resources. In this paper we provide performance analysis models to estimate MapReduce job execution times in * Acknowledgments: This work has received funding from the European Union Horizon 2020 research and innovation program under grant agreement No. 644869 (DICE). Experimental data are available as open data at https://zenodo.org/record/58847#.V5i0wmXA45Q. 1Hadoop clusters governed by the YARN Capacity Scheduler. We propose models of increasing complexity and accuracy, ranging from queueing networks to stochastic well formed nets, able to estimate job performance under a number of scenarios of interest, including also unreliable resources. The accuracy of our models is evaluated by considering the TPC-DS industry benchmark running experiments on Amazon EC2 and the CINECA Italian supercomputing center. The results have shown that the average accuracy we can achieve is in the range 9-14%.
The need for general-purpose algorithms for studying biological properties in phylogenetics motivates research into formal verification frameworks. Researchers can focus their efforts exclusively on evolution trees and property specifications. To this end, model checking, a mature automated verification technique originating in computer science, is applied to phylogenetic analysis. Our approach is based on three cornerstones: a logical modeling of the evolution with transition systems; the specification of both phylogenetic properties and trees using flexible temporal logic formulas; and the verification of the latter by means of automated computer tools. The most conspicuous result is the inception of a formal framework which allows for a symbolic manipulation of biological data (based on the codification of the taxa). Additionally, different logical models of evolution can be considered, complex properties can be specified in terms of the logical composition of others, and the refinement of unfulfilled properties as well as the discovery of new properties can be undertaken by exploiting the verification results. Some experimental results using a symbolic model verifier support the feasibility of the approach.
The development of Information Systems today faces the era of Big Data. Large volumes of information need to be processed in realtime, for example, for Facebook or Twiter analysis. This paper addresses the redesign of NewsAsset, a commercial product that helps journalists by providing services, which analyze millions of media items from the social network in realtime. Technologies like Apache Storm can help enormously in this context. We have quantitatively analyzed the new design of NewsAsset to assess whether the introduction of Apache Storm can meet the demanding performance requirements of this media product. Our assessment approach, guided by the Unified Modeling Language (UML), takes advantage, for performance analysis, of the software designs already used for development. In addition, we converted UML into a domain-specific modeling language (DSML) for Apache Storm, thus creating a profile for Storm. Later, we transformed said DSML into an appropriate language for performance evaluation, specifically, stochastic Petri nets. The assessment ended with a successful software design that certainly met the scalability requirements of NewsAsset.
International audienceThe quantitative analysis of concurrent systems requires expressive and user-friendly property languages combining temporal, data-handling, and quantitative aspects. In this paper, we aim at facilitating the quantitative analysis of systems modeled as PTSs (Probabilistic Transition Systems) labeled by actions containing data values and probabilities. We propose a new regular probabilistic operator that computes the probability measure of a path specified by a generalized regular formula involving arbitrary computations on data values. This operator, which subsumes the Until operators of PCTL and their action-based counterparts, can provide useful quantitative information about paths having certain (e.g., peak) cost values. We integrated the regular probabilistic operator into MCL (Model Checking Language) and we devised an associated on-the-fly model checking method, based on a combined local resolution of linear and Boolean equation systems. We implemented the method in the EVALUATOR model checker of the CADP toolbox and experimented it on realistic PTSs modeling concurrent systems
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.