Summary
In the new millennium, a myriad of large-scale applications (e.g., social networks, e-commerce, the Internet of Things, and scientific experiments) generate large volumes of data. Given the volume, heterogeneity, and distributed nature of such data, managing it is a challenge for distributed systems, particularly cloud computing. In this regard, data replication is a well-known and effective data management technique that consists of creating multiple copies of the same data on different storage resources. In this article, we propose a popularity- and correlation-based data replication strategy, called PCDR. The main idea of our strategy is to replicate a set of the most popular groups of correlated files, identified by analyzing the file access history. A clustering technique is used to find groups of correlated files, and the popularity of the files in each group is then determined while taking temporal locality into account. Extensive experiments with the CloudSim simulator show that our proposed strategy outperforms other strategies on several evaluation metrics.
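The idea described above can be illustrated with a minimal sketch. It is not the PCDR algorithm itself, since this summary does not give its formulas. Everything in the sketch is an assumption made for illustration: the access log format `(time_slot, session, file_name)`, the exponential decay used to model temporal locality, the co-access threshold, the use of connected components as a stand-in for the clustering technique, and all function and parameter names.

```python
from collections import defaultdict
from itertools import combinations


def file_popularity(accesses, now, decay=0.5):
    """Sum time-decayed access weights per file (assumed model of temporal locality)."""
    popularity = defaultdict(float)
    for time_slot, _session, file_name in accesses:
        # An access `now - time_slot` slots ago is weighted decay ** age.
        popularity[file_name] += decay ** (now - time_slot)
    return dict(popularity)


def correlated_groups(accesses, min_co_access=2):
    """Group files co-accessed in at least `min_co_access` sessions.

    Connected components are a simple stand-in for a clustering technique.
    """
    sessions = defaultdict(set)
    for _time_slot, session, file_name in accesses:
        sessions[session].add(file_name)

    # Count how many sessions accessed each pair of files together.
    co_access = defaultdict(int)
    for files in sessions.values():
        for pair in combinations(sorted(files), 2):
            co_access[pair] += 1

    # Union-find: link every pair that is co-accessed often enough.
    parent = {f: f for files in sessions.values() for f in files}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for (a, b), count in co_access.items():
        if count >= min_co_access:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for f in parent:
        groups[find(f)].add(f)
    return [frozenset(g) for g in groups.values()]


def groups_to_replicate(accesses, now, top_k=1, decay=0.5, min_co_access=2):
    """Return the `top_k` groups ranked by mean decayed popularity of their files."""
    popularity = file_popularity(accesses, now, decay)
    groups = correlated_groups(accesses, min_co_access)

    def score(group):
        return sum(popularity[f] for f in group) / len(group)

    return sorted(groups, key=score, reverse=True)[:top_k]


# Hypothetical access history: (time_slot, session, file_name).
accesses = [
    (1, "s1", "a"), (1, "s1", "b"),
    (2, "s2", "a"), (2, "s2", "b"),
    (3, "s3", "c"), (3, "s3", "d"),
    (4, "s4", "c"), (4, "s4", "d"), (4, "s4", "e"),
]
print(groups_to_replicate(accesses, now=4))
```

In this toy history, files `c` and `d` are accessed together as often as `a` and `b`, but more recently, so the decayed popularity ranks the group containing `c` and `d` first. A real strategy would also have to decide where to place replicas and when to evict them, which this sketch leaves out.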