This paper describes the achievements of the H2020 project INDIGO-DataCloud. The project has provided e-infrastructures with tools, applications and cloud framework enhancements to manage the demanding requirements of scientific communities, either locally or through enhanced interfaces. The middleware developed allows users to federate hybrid resources and to easily write, port and run scientific applications in the cloud. In particular, we have extended existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GEANT interfederation policies, thus guaranteeing transparency and trust in the provisioning of such services. Our middleware facilitates the execution of applications using containers on Cloud- and Grid-based infrastructures, as well as on HPC clusters. Our developments are freely available as open-source components and are already being integrated into many scientific applications.
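To make the PaaS-level workflow concrete, the following is a minimal sketch, in Python, of how a containerized application might be submitted to a PaaS orchestrator over an AAI-protected REST endpoint. The endpoint URL, bearer token, and payload schema here are illustrative assumptions, not the project's actual API.

# Minimal sketch (hypothetical endpoint and schema): submitting a
# containerized application to a PaaS orchestrator over REST, in the
# spirit of the approach described above.
import json
import urllib.request

ORCHESTRATOR_URL = "https://orchestrator.example.org/deployments"  # hypothetical URL
ACCESS_TOKEN = "eyJ..."  # bearer token obtained from the AAI service (placeholder)

# Illustrative deployment request: run a container image with given resources.
payload = {
    "template": "container-app",           # hypothetical template name
    "parameters": {
        "image": "myorg/science-app:1.0",  # container image to run
        "cpus": 4,
        "memory_mb": 8192,
    },
}

request = urllib.request.Request(
    ORCHESTRATOR_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {ACCESS_TOKEN}",  # AAI-issued credential
    },
    method="POST",
)

with urllib.request.urlopen(request) as response:
    print(json.load(response))  # the orchestrator would return the deployment record

The same request pattern applies regardless of whether the orchestrator targets a cloud, a Grid site, or an HPC cluster, which is what allows the federation layer to remain transparent to the application.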
In this paper we propose a distributed architecture to provide machine learning practitioners with a set of tools and cloud services that cover the whole machine learning development cycle: from model creation, training, validation, and testing to serving models as a service, as well as sharing and publication. In this respect, the DEEP-Hybrid-DataCloud framework allows transparent access to existing e-Infrastructures, effectively exploiting distributed resources for the most compute-intensive tasks of the machine learning development cycle. Moreover, it provides scientists with a set of Cloud-oriented services to make their models publicly available: by adopting a serverless architecture and a DevOps approach, the developed models can easily be shared, published, and deployed.
Index Terms: Cloud computing, computers and information processing, deep learning, distributed computing, machine learning, serverless architectures.
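As an illustration of the "model as a service" idea, below is a minimal sketch of exposing a trained model behind an HTTP prediction endpoint using Flask. The model stand-in, route, and payload format are illustrative assumptions rather than the framework's actual interface.

# Minimal sketch of serving a model over HTTP; the model and payload
# schema are placeholders, not the framework's actual API.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a trained model; a real deployment would load weights from storage.
def predict(features):
    return {"score": sum(features) / max(len(features), 1)}

@app.route("/predict", methods=["POST"])
def serve():
    payload = request.get_json(force=True)        # e.g. {"features": [0.1, 0.9]}
    return jsonify(predict(payload["features"]))  # model output returned as JSON

if __name__ == "__main__":
    # In a serverless setting this handler would be invoked on demand;
    # here we run a local development server for illustration.
    app.run(port=8080)

Packaging such a handler as a container image is what lets a DevOps pipeline publish and deploy the model automatically once training completes.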
As the data needs in every field continue to grow, storage systems must grow as well and adapt to increasing demands on performance, reliability and fault tolerance. This also increases their complexity and cost. Improving the performance and scalability of storage systems while keeping costs low is therefore crucial. Ceph, the open-source storage system evaluated here, promises to reliably store data distributed across many nodes and is targeted at commodity hardware. This study investigates how Ceph performs in different setups and compares the results with the theoretical maximum performance of the hardware. We used a bottom-up approach to benchmark Ceph at different architectural levels and varied the number of storage nodes and clients to test the scalability of the system. Our experiments revealed that Ceph delivers the promised scalability and uncovered several points with potential for improvement. We observed a significant increase in write throughput by moving the Ceph journal to a faster location (in memory). Moreover, while the system scaled with an increasing number of clients operating on the cluster, we noticed a slight performance degradation after the saturation point. We tested two optimisation strategies, increasing the available RAM and increasing the object size, and noted write throughput increases of up to 9% and 27%, respectively. Our findings improve the understanding of Ceph and should benefit future users through the presented strategies for tackling various performance limitations.
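The object-size effect mentioned above can be illustrated with a small throughput harness of the kind commonly used in such benchmarks. This sketch writes fixed-size objects to the local filesystem as a stand-in for a Ceph pool and reports MB/s; the paths, sizes, and object counts are illustrative assumptions, not the study's actual benchmark code.

# Small sketch of a write-throughput measurement: write N objects of a
# fixed size, force them to stable storage, and report MB/s. The local
# filesystem stands in for a Ceph pool here.
import os
import time

def write_throughput(directory, object_size, num_objects):
    """Write num_objects blobs of object_size bytes; return MB/s."""
    data = os.urandom(object_size)
    start = time.perf_counter()
    for i in range(num_objects):
        path = os.path.join(directory, f"obj_{i}")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force data to stable storage, like a journal flush
    elapsed = time.perf_counter() - start
    return (object_size * num_objects) / elapsed / 1e6

if __name__ == "__main__":
    os.makedirs("bench", exist_ok=True)
    # Larger objects amortise the per-object overhead, mirroring the
    # throughput gain observed when increasing the object size.
    for size in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
        mbps = write_throughput("bench", size, 50)
        print(f"{size // 1024:>5} KiB objects: {mbps:.1f} MB/s")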