Distributed file systems, such as HDFS, DFS, etc., are adopted to support cloud storage and are designed for optimizing large files access. But unfortunately, the problem of massive small files is neglected and seriously restricts the performance of distributed file systems. To improve and even solve the small file problem, in this paper, user access task is defined. The correlations among the access tasks, applications and access files are constructed by the improved PLSA, and the research object is transferred from file-level to task-level. Then, an effective strategy is proposed to improving small file problem in distributed file system. The strategy merges small files in term of access tasks and selects prefetching targets based on the transition probability of the tasks. Finally, the system efficiency analysis model is established and experimental results, compared with original HDFS, HAR and the schemes of Dong, demonstrate that the proposed strategy effectively reduce the MDS workload and the request response delay.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.