“…One form is to optimize the use of distributed network data caches and replicas, so that a request for data goes to a "nearby" rather than a "distant" source [1,30], and to use high-performance protocols that are better suited than TCP to bulk data transfers [1]. Other optimizations use data subsetting, filtering, and progressive transmission from remote sources to reduce the volume of data crossing the network [11,23]. Some systems, such as the Distributed Parallel Storage System (DPSS), provide scalable, high-performance, distributed-parallel data storage that can be optimized for data access patterns and for the characteristics of the underlying network [31].…”
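As an illustration of directing a request to a "nearby" replica, the sketch below picks the replica with the lowest estimated transfer time. It is a minimal, hypothetical example and is not the method of any cited system. It assumes each replica's round-trip time and available bandwidth have already been measured. The `Replica` type, the `choose_replica` helper, and the host names are all illustrative, and the cost model (latency plus size divided by bandwidth) is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    host: str              # hypothetical replica host name
    rtt_ms: float          # assumed pre-measured round-trip time, in milliseconds
    bandwidth_mbps: float  # assumed pre-measured available bandwidth, in megabits/s


def estimated_transfer_ms(replica: Replica, size_mb: float) -> float:
    # Simplified cost model: one round trip plus serialization time.
    # size_mb * 8 converts megabytes to megabits; * 1000 converts seconds to ms.
    return replica.rtt_ms + (size_mb * 8 / replica.bandwidth_mbps) * 1000


def choose_replica(replicas: list[Replica], size_mb: float) -> Replica:
    # Pick the replica with the smallest estimated transfer time for this request.
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: estimated_transfer_ms(r, size_mb))


# Hypothetical replicas: one low-latency but slow, one distant but fast.
replicas = [
    Replica("near.example.org", rtt_ms=5.0, bandwidth_mbps=100.0),
    Replica("far.example.org", rtt_ms=120.0, bandwidth_mbps=1000.0),
]

small = choose_replica(replicas, size_mb=1.0)     # latency dominates
large = choose_replica(replicas, size_mb=1000.0)  # bandwidth dominates
print(small.host, large.host)
```

Under these assumed numbers, a 1 MB request goes to the low-latency host and a 1000 MB request goes to the high-bandwidth host. For bulk transfers, the "nearest" source in network terms is therefore not necessarily the one with the shortest round trip.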