Scientists are increasingly relying on computational resources, both compute and storage, to expand scientific knowledge. For example, the data deluge is quickly overcoming the capacity of storage systems, and the increasing use of simulation requires large compute capabilities. Thus, scientists need to expand their local resources with highly available and scalable systems. We consider cloud computing to be the solution that provides scientific applications with the computational resources needed.
However, the services offered by the cloud providers do not address several important issues: how to meet the data requirements with the storage systems available, and how to optimize cost and other performance metrics. The variety of storage and compute choices with different characteristics and prices, the growth of the data stored in terms of size and number, and the data management requirements make these tasks overwhelmingly complex for individual users.
To address these challenges, we focus on four key elements of data management: the analysis of current storage services, the expression of data requirements and storage capabilities, data management algorithms, and data-aware scheduling algorithms. We combine the information from our analysis of the storage services with their capabilities in a machine-readable format that can be processed by our implementation of the user's data requirements. Thus, we can obtain within a few milliseconds a list of storage services per application dataset that meet the user's requirements, and provide cost and performance estimates. Our unique approach to data management generates an integer linear programming problem with this list. The solution to this problem is an optimal assignment of the application's data to cloud services. Our implementation can provide optimal solutions for our use cases in less than one second.
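The two steps described above (filtering storage services against each dataset's requirements, then choosing a minimum-cost assignment) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `StorageService` and `Dataset` fields, the service names, prices, and capacities are hypothetical and do not come from the work itself. The sketch also swaps in exhaustive search over the feasible lists in place of an integer linear programming solver, which is only practical for very small inputs.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class StorageService:
    # Hypothetical capability record; all values below are made up.
    name: str
    durability_nines: int
    max_object_gb: float
    cost_per_gb_month: float


@dataclass(frozen=True)
class Dataset:
    # Hypothetical requirement record for one application dataset.
    name: str
    size_gb: float
    min_durability_nines: int


def feasible_services(dataset, services):
    """Return the services whose capabilities meet the dataset's requirements."""
    return [
        s for s in services
        if s.durability_nines >= dataset.min_durability_nines
        and s.max_object_gb >= dataset.size_gb
    ]


def cheapest_assignment(datasets, services, capacity_gb):
    """Brute-force stand-in for the ILP: try every feasible combination.

    capacity_gb maps a service name to the total space we may use on it.
    Returns (assignment, monthly_cost), or (None, None) if nothing fits.
    """
    candidates = [feasible_services(d, services) for d in datasets]
    best, best_cost = None, None
    for choice in product(*candidates):
        used = {}
        for d, s in zip(datasets, choice):
            used[s.name] = used.get(s.name, 0.0) + d.size_gb
        if any(used[n] > capacity_gb[n] for n in used):
            continue  # violates a capacity constraint
        cost = sum(d.size_gb * s.cost_per_gb_month for d, s in zip(datasets, choice))
        if best_cost is None or cost < best_cost:
            best = {d.name: s.name for d, s in zip(datasets, choice)}
            best_cost = cost
    return best, best_cost


services = [
    StorageService("cheap", 3, 100.0, 0.01),
    StorageService("durable", 11, 1000.0, 0.03),
]
datasets = [
    Dataset("raw", 50.0, 3),
    Dataset("results", 10.0, 9),
    Dataset("logs", 60.0, 3),
]
capacity = {"cheap": 80.0, "durable": 1000.0}

assignment, monthly_cost = cheapest_assignment(datasets, services, capacity)
print(assignment, monthly_cost)
```

In this toy instance the two low-requirement datasets cannot both fit on the cheaper service, so the search places the larger one there and the rest on the more durable service. A real formulation would express the same choice as binary decision variables and hand it to an ILP solver.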
We have also created new scheduling algorithms for two types of cloud applications (MapReduce and watershed model calibration) that balance cost and execution time. The scheduling decisions are Pareto optimal and, therefore, superior to other strategies. We believe that these four elements can provide the users with a comprehensive solution to the data management problem, and allow them to take advantage of the new opportunities that cloud computing offers.
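To make the notion of Pareto optimality over cost and execution time concrete, the following sketch filters a set of candidate schedules down to the non-dominated ones. It illustrates the definition only, not the scheduling algorithms referred to above; the function name `pareto_front` and the candidate configurations with their costs and times are hypothetical.

```python
def pareto_front(options):
    """Keep the options that no other option dominates.

    Each option is (label, cost, time). Option A dominates option B when A is
    no worse than B on both cost and time and strictly better on at least one.
    """
    front = []
    for label, cost, time in options:
        dominated = any(
            c <= cost and t <= time and (c < cost or t < time)
            for _, c, t in options
        )
        if not dominated:
            front.append((label, cost, time))
    return front


# Made-up candidates: (configuration, cost in dollars, run time in minutes).
candidates = [
    ("2 nodes", 2.0, 100),
    ("4 nodes", 4.0, 55),
    ("8 nodes", 8.0, 60),
    ("16 nodes", 16.0, 30),
]
front = pareto_front(candidates)
print(front)
```

Here the "8 nodes" option is dropped because "4 nodes" is both cheaper and faster; the remaining options each represent a different trade-off between cost and time, and choosing among them depends on the user's priorities.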