As the gap between the speed of computing elements and the disk subsystem widens it becomes increasingly important to understand and model disk I/O. While the speed of computational resources continues to grow, potentially scaling to multiple peta flops and millions of cores, the growth in the performance of I/O systems lags well behind. In this context, data-intensive applications that run on current and future systems depend on the ability of the I/O system to move data to the distributed memories. As a result, the I/O system becomes a bottleneck for application performance. Additionally, due to the higher risk of component failure that results from larger scales, the frequency of application checkpointing is expected to grow and put an additional burden on the disk I/O system [1]. [8,9,13]. In this paper we apply and extend a modeling methodology developed for spinning disk and use it to model disk I/O time on DASH. We studied two data-intensive applications, MADbench2 [6] and an application for geological imaging [5]. Our results show that the prediction errors for total I/O time are 14.79% for MADbench2 and our efforts for geological imaging yield error of 9% for one category of read calls; this application made a total of 3 categories of read/write. We are still investigating the geological application, and in this paper we present our results thus far for both applications.
Emergence of new technologies such as flash-based Solid State Drives (SSDs) presents an opportunity to narrow the gap between speed of computing and I/O systems. With this in mind, SDSC's PMAC lab is investigating the use of flash drives in a new prototype system called DASH