The combined impact of new computing resources and techniques, together with an increasing avalanche of large datasets, is transforming many research areas and may lead to technological breakthroughs usable by billions of people. In recent years, Machine Learning and especially its subfield Deep Learning have seen impressive advances. Techniques developed within these two fields are now able to analyze and learn from huge amounts of real-world examples in disparate formats. While the number of Machine Learning algorithms is extensive and growing, so is the number of frameworks and libraries that implement them. Software development in this field is fast-paced, with a large amount of open-source software coming from academia, industry, start-ups and wider open-source communities. This survey presents a comprehensive overview of recent developments, with comparisons as well as trends in the development and usage of cutting-edge Artificial Intelligence software. It also provides an overview of massive parallelism support that is capable of scaling computation effectively and efficiently in the era of Big Data.
Continuous high-frequency water quality monitoring is becoming a critical task to support water management. Despite the advancements in sensor technologies, certain variables cannot be easily and/or economically monitored in-situ and in real time. In these cases, surrogate measures can be used to make estimations by means of data-driven models. In this work, variables that are commonly measured in-situ are used as surrogates to estimate the concentrations of nutrients in a rural catchment and in an urban one, making use of machine learning models, specifically Random Forests. The results are compared with those of linear modelling using the same number of surrogates, obtaining a reduction in the Root Mean Squared Error (RMSE) of up to 60.1%. The benefit of including up to seven surrogate sensors was computed, concluding that adding more than four and five sensors, respectively, in each catchment was not worthwhile in terms of error improvement.
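The following is a minimal sketch of this kind of surrogate-based estimation, comparing a Random Forest against a linear baseline via RMSE. It is not the paper's actual pipeline: the surrogate variable names, the target nutrient and the input file are illustrative placeholders.

```python
# Hypothetical surrogate-regression sketch: estimate a nutrient concentration
# from in-situ surrogate sensors with a Random Forest and a linear baseline.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumed surrogate variables measured in-situ (placeholders).
surrogates = ["turbidity", "conductivity", "water_temp", "dissolved_oxygen"]
target = "nitrate"  # nutrient concentration to estimate

df = pd.read_csv("catchment_timeseries.csv")  # hypothetical dataset
X_train, X_test, y_train, y_test = train_test_split(
    df[surrogates], df[target], test_size=0.3, random_state=0
)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_train, y_train)
lm = LinearRegression().fit(X_train, y_train)

def rmse(model):
    return np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

print(f"Random Forest RMSE: {rmse(rf):.3f}")
print(f"Linear model RMSE:  {rmse(lm):.3f}")
```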
This paper describes the achievements of the H2020 project INDIGO-DataCloud. The project has provided e-infrastructures with tools, applications and cloud framework enhancements to manage the demanding requirements of scientific communities, either locally or through enhanced interfaces. The middleware developed makes it possible to federate hybrid resources and to easily write, port and run scientific applications in the cloud. In particular, we have extended existing PaaS (Platform as a Service) solutions, allowing public and private e-infrastructures, including those provided by EGI, EUDAT, and Helix Nebula, to integrate their existing services and make them available through AAI services compliant with GEANT interfederation policies, thus guaranteeing transparency and trust in the provisioning of such services. Our middleware facilitates the execution of applications using containers on Cloud and Grid based infrastructures, as well as on HPC clusters. Our developments are freely downloadable as open-source components, and are already being integrated into many scientific applications.
Containers are increasingly used as a means to distribute and run Linux services and applications. In this paper we describe the architectural design and implementation of udocker, a tool that enables users to execute Linux containers in user mode. We also present a few practical applications, using a range of scientific codes characterized by different requirements: from single-core execution to MPI parallel execution and execution on GPGPUs.
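As a rough illustration of the user-mode workflow, the sketch below drives the udocker command line from Python. It assumes the udocker executable is installed and on PATH; the image and the command being run are illustrative, not taken from the paper.

```python
# Minimal sketch of running a containerized command with udocker entirely in
# user space (no root privileges, no daemon). Image and command are examples.
import subprocess

def udocker(*args):
    """Invoke a udocker subcommand and fail loudly on error."""
    subprocess.run(["udocker", *args], check=True)

udocker("pull", "ubuntu:22.04")                          # fetch image layers from a registry
udocker("create", "--name=mycontainer", "ubuntu:22.04")  # extract layers into a local container
udocker("run", "mycontainer", "cat", "/etc/os-release")  # execute a command in user mode
```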
Private cloud infrastructures are now widely deployed and adopted across technology industries and research institutions. Although cloud computing has emerged as a reality, it is now known that a single cloud provider cannot fully satisfy complex user requirements. This has resulted in a growing interest in developing hybrid cloud solutions that bind together distinct and heterogeneous cloud infrastructures. In this paper we describe the orchestration approach for heterogeneous clouds that has been implemented and used within the INDIGO-DataCloud project. This orchestration model uses existing open-source software like OpenStack and leverages the OASIS Topology and Orchestration Specification for Cloud Applications (TOSCA) open standard as the modeling language. Our approach uses virtual machines and Docker containers in a homogeneous and transparent way, providing consistent application deployment for the users. This approach is illustrated by means of two different use cases in different scientific communities, implemented using the INDIGO-DataCloud solutions.
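To give a flavour of TOSCA as a modeling language, the sketch below builds a minimal topology template as a Python dictionary and emits it as YAML. It uses only the normative tosca.nodes.Compute type from the TOSCA Simple Profile; the real INDIGO-DataCloud templates rely on project-specific custom node types, so this is an illustration of the approach rather than an actual INDIGO template.

```python
# Illustrative TOSCA topology template (single compute node), emitted as YAML.
import yaml  # pip install pyyaml

template = {
    "tosca_definitions_version": "tosca_simple_yaml_1_0",
    "description": "Single compute node deployment, illustrative only",
    "topology_template": {
        "node_templates": {
            "my_server": {
                "type": "tosca.nodes.Compute",
                "capabilities": {
                    "host": {"properties": {"num_cpus": 2, "mem_size": "4 GB"}},
                    "os": {"properties": {"type": "linux", "distribution": "ubuntu"}},
                },
            }
        },
        "outputs": {
            "server_ip": {
                "value": {"get_attribute": ["my_server", "public_address"]}
            }
        },
    },
}

print(yaml.safe_dump(template, sort_keys=False))
```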