The proliferation of Big Data applications puts pressure on improving and optimizing the handling of diverse datasets across different domains. Among several challenges, major difficulties arise in data-sensitive domains like banking, telecommunications, etc., where strict regulations make very difficult to upload and experiment with real data on external cloud resources. In addition, most Big Data research and development efforts aim to address the needs of IT experts, while Big Data analytics tools remain unavailable to non-expert users to a large extent. In this paper, we report on the work-in-progress carried out in the context of the H2020 project I-BiDaaS (Industrial-Driven Big Data as a Self-service Solution) which aims to address the above challenges. The project will design and develop a novel architecture stack that can be easily configured and adjusted to address crosssectoral needs, helping to resolve data privacy barriers in sensitive domains, and at the same time being usable by non-experts. This paper discusses and motivates the need for Big Data as a self-service, reviews the relevant literature, and identifies gaps with respect to the challenges described above. We then present the I-BiDaaS paradigm for Big Data as a self-service, position it in the context of existing references, and report on initial work towards the conceptual specification of the I-BiDaaS software architecture.
There has been significant interest in distributed optimization algorithms, motivated by applications in Big Data analytics, smart grid, vehicle networks, etc. While there have been extensive theory and theoretical advances, a proportionally small body of scientific literature focuses on numerical evaluation of the proposed methods in actual practical, parallel programming environments. This paper considers a general algorithmic framework of first and second order methods with sparsified communications and computations across worker nodes. The considered framework subsumes several existing methods. In addition, a novel method that utilizes unidirectional sparsified communications is introduced and theoretical convergence analysis is also provided. Namely, we prove R-linear convergence in the expected norm. A thorough empirical evaluation of the methods using Message Passing Interface (MPI) on a High Performance Computing (HPC) cluster is carried out and several useful insights and guidelines on the performance of algorithms and inherent communication-computational trade-offs in a realistic setting are derived.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.