dhary, G. Memik, M. Kandemir,* S. More, G. Thiruvathukal,† and A. Singh†
Center for Parallel and Distributed Computing
Department of Electrical and Computer Engineering
Northwestern University, Evanston, IL 60208
{xhshen, wkliao, choudhar, memik, ssmore}@ece.nwu.edu
Abstract

Effective high-level data management is becoming an important issue as more and more scientific applications manipulate huge amounts of secondary-storage and tertiary-storage data using parallel processors. A major problem with current solutions to this data management problem is that they require a deep understanding of specific data storage architectures and file layouts to obtain the best performance. In this paper, we discuss the design, implementation, and evaluation of a novel application development environment for scientific computations. This environment includes a number of components that make it easy for programmers to code and run their applications without much programming effort and, at the same time, to harness the available computational and storage power of parallel architectures. Toward this ambitious goal, we first present a performance-oriented meta-data management system that governs data flow between storage devices and applications. Another component of our environment is a data analysis and visualization tool, which has been integrated with the meta-data management system, the storage subsystem, and user applications. We also present an automatic code generator component (ACG) that helps users exploit the information in the meta-data management system when they are developing new applications. All these components are tied together by an integrated Java graphical user interface (IJ-GUI) through which the user can launch her applications, query the meta-data management system for accurate information about the datasets she is interested in and about the current state of the storage devices, and carry out data analysis and visualization, all in a unified framework. Finally, we present performance numbers from our initial implementation.
Our results demonstrate that our novel application development environment provides both ease-of-use and high performance for large-scale, I/O-intensive scientific applications.