Summary
The demanding requirements of emerging Big Data applications, combined with the performance limitations of current I/O subsystems, pose a great challenge to data storage, management, and access performance. To design efficient storage systems, a clear understanding of how factors in the I/O path affect the performance of a data-intensive application is of utmost importance. This paper reports our ongoing research toward addressing this issue, presenting a coarse-grained, page-cache-aware multivariate analytical model for the performance of write operations in a parallel file system. The proposed model was developed to reflect the performance behavior observed in an extensive experimental effort, in which the impact of 14 parameters on the response time and throughput of OrangeFS was investigated. More than one million experiments were carried out using four distinct computing infrastructures, providing a detailed performance characterization. Additionally, a thorough evaluation of the proposed model, covering more than 14 000 scenarios, is reported, discussing both qualitative and quantitative aspects. Evaluation results indicate that the model succeeded in representing the behavior of the parallel file system performance, achieving a Mean Absolute Percentage Error of 39.94%.
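For readers unfamiliar with the error metric quoted above, the following is a minimal sketch of how a Mean Absolute Percentage Error is commonly computed. It assumes the standard definition (the mean of |measured - predicted| / |measured|, expressed in percent); the summary itself does not spell out the exact formula used. The function name `mape` and the sample values are hypothetical and are not taken from the experiments.

```python
def mape(measured, predicted):
    """Mean Absolute Percentage Error, in percent (standard definition, assumed)."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for m, p in zip(measured, predicted):
        if m == 0:
            # The relative error is undefined when the measured value is zero.
            raise ValueError("measured values must be non-zero")
        total += abs((m - p) / m)
    return 100.0 * total / len(measured)


# Illustrative values only (e.g. response times in seconds), not real results.
measured_times = [100.0, 200.0]
predicted_times = [110.0, 180.0]
error_percent = mape(measured_times, predicted_times)
```

Under this definition, each scenario contributes its relative error with equal weight, so a few scenarios with small measured values and large relative deviations can raise the overall figure considerably.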