Abstract. The medical community produces and manipulates a tremendous volume of digital data for which computerized archiving, processing and analysis are needed. Grid infrastructures are promising for dealing with the challenges arising in computerized medicine, but the manipulation of medical data on such infrastructures faces two problems: interconnecting medical information systems with grid middleware, and preserving patients' privacy in a wide, distributed multiuser system. These constraints often limit the use of grids for manipulating sensitive medical data.

This paper describes our design of a medical data management system that takes advantage of the advanced gLite data management services, developed in the context of the EGEE project, to fulfill the stringent needs of the medical community. It ensures medical data protection through strict data access control, anonymization and encryption. The multi-level access control provides the flexibility needed for implementing complex medical use-cases. Data anonymization prevents the exposure of the most sensitive data to unauthorized users, and data encryption guarantees data protection even when the data is stored at remote sites. Moreover, the developed prototype provides a grid Storage Resource Manager (SRM) interface to standard medical DICOM servers, thereby enabling transparent access to medical data without interfering with medical practice.

Keywords: Secure grid storage, gLite middleware, medical data management

J. Montagnat, A. Frohner, D. Jouvenot, C. Pera et al.
© 2007 Kluwer Academic Publishers. Printed in the Netherlands.

1. Context and Objectives

Many scientific areas benefit from the large, distributed storage capabilities provided by grid infrastructures. On top of physical storage resources, the EGEE [17] grid data management system eases the manipulation of large data volumes and provides high-level functionality such as data distribution, replication and optimized access. To build a data management system that can adapt to heterogeneous data storage resources (disks, tapes, silos...), the grid community has adopted standard interfaces to virtualize the underlying resources. In particular, gLite [23], the next generation EGEE middleware, has adopted the Storage Resource Manager (SRM) interface [29] standardized in the context of the Open Grid Forum [28]. The SRM's primary concern is to provide efficient access to large volumes of data. It provides, among other services, prefetching of data files recorded on secondary storage, management of storage space and reservation of storage resources. However, it provides neither access control nor data protection, which severely limits its usability for applications manipulating sensitive data.

In this paper, we address the problem of sensitive data management on the EGEE grid infrastructure and we introduce a data management service designed to handle medical records on grids. We first motivate our approach through an in-depth requirement analysis of data management in the medical ar...