Caching can reduce network latency and improve computational performance in distributed geographical information systems (GIS) by tracking the requirements and behaviors of their computational tasks and preparing the needed data in advance. This paper presents an enhanced active caching method for data-intensive computations that considers both the relationships among data and the timeliness of those relationships. First, the access correlations, the correlation steps and the times at which the correlations occur are computed from the behaviors of the computational tasks. Because the influence of historical access records decreases gradually over time, only recent access records are used. To track changes in the relationships and prevent cache waste, each record is assigned an age-based weight. A conditional caching probability is then computed from these time-weighted relationships and used to identify the data that should be cached for simultaneous computation. Finally, we present several experiments that compare the proposed method with techniques that use other data placement strategies, active caching strategies and passive caching algorithms. The results show that the proposed model outperforms the compared algorithms on all of the evaluated measures. In addition, the proposed model yields a lower cache replacement ratio. Experiments with different data sets at different scales indicate that the proposed algorithm is also applicable to large-scale distributed environments.
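The abstract does not specify the weighting function or the exact form of the conditional caching probability, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes an exponential age decay, a fixed recency window, and a fixed maximum correlation step; the names `conditional_cache_probabilities`, `prefetch_candidates`, `window`, `max_step`, `decay` and `threshold` are all hypothetical. It estimates, for each accessed item A, the age-weighted probability that item B is accessed within `max_step` subsequent accesses, and caches the items whose probability passes a threshold.

```python
import math
from collections import defaultdict

# Hypothetical access record: (timestamp, data_id), listed in access order.
AccessLog = list[tuple[float, str]]


def age_weight(age: float, decay: float) -> float:
    """Assumed exponential age-based weight: newer records count more."""
    return math.exp(-decay * age)


def conditional_cache_probabilities(
    log: AccessLog,
    now: float,
    window: float = 100.0,  # assumption: only records newer than now - window are used
    max_step: int = 3,      # assumption: B must follow A within this many accesses
    decay: float = 0.05,    # assumption: rate of the exponential age decay
) -> dict[str, dict[str, float]]:
    """Estimate P(B accessed within max_step steps | A accessed),
    weighting every access record by its age."""
    recent = [(t, d) for t, d in log if now - t <= window]
    anchor_weight: dict[str, float] = defaultdict(float)
    pair_weight: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))

    for i, (t_a, a) in enumerate(recent):
        w = age_weight(now - t_a, decay)
        anchor_weight[a] += w
        seen: set[str] = set()
        # Look ahead at most max_step accesses for correlated data items,
        # counting each distinct follower once per occurrence of A.
        for _, b in recent[i + 1 : i + 1 + max_step]:
            if b != a and b not in seen:
                seen.add(b)
                pair_weight[a][b] += w

    return {
        a: {b: pw / anchor_weight[a] for b, pw in followers.items()}
        for a, followers in pair_weight.items()
    }


def prefetch_candidates(
    probs: dict[str, dict[str, float]], accessed: str, threshold: float = 0.5
) -> list[str]:
    """Data items worth caching alongside `accessed`, most likely first."""
    cands = probs.get(accessed, {})
    return sorted((b for b, p in cands.items() if p >= threshold),
                  key=lambda b: -cands[b])


if __name__ == "__main__":
    log = [(0, "tileA"), (1, "tileB"), (2, "tileC"),
           (10, "tileA"), (11, "tileB"), (12, "tileD")]
    probs = conditional_cache_probabilities(log, now=12, max_step=1)
    print(prefetch_candidates(probs, "tileB"))  # prints ['tileD']
```

In this toy log, `tileB` was followed once by `tileC` (long ago) and once by `tileD` (recently). The age weighting gives the recent correlation the larger share of the probability (about 0.62 versus 0.38), so only `tileD` passes the 0.5 threshold. This reflects the timeliness idea in the abstract: newer relationships dominate older ones instead of counting equally.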
Shaoming Pan