In current production HDFS environments, large volumes of data are stored on local disks, so both data security and transmission rate need to be optimized. Two main problems exist in HDFS today.

First, the backend cluster is divided into a management process and business processes, and the management process is protected only by an active/standby pair of machines. Because every HDFS service must query metadata through the NameNode on the management side, the NameNode becomes a hot spot that congests the transmission link and increases the load on the equipment; the client sketch below illustrates this request path.

Second, HDFS was designed on the assumption that the underlying hardware is unreliable, which led to its default three-replica storage mechanism. However, the underlying devices already sacrifice part of their raw space for RAID data protection (leaving roughly one half to two thirds of capacity usable, depending on the RAID level). Protecting data with three replicas on top of RAID therefore stores too many copies of the same data and wastes usable storage space; the capacity estimate below quantifies the combined overhead.

This paper introduces the architecture and storage strategy of big data HDFS and puts forward several solutions and ideas for these existing problems.
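To make the metadata bottleneck concrete, the following minimal Java sketch (the NameNode host, port, and file path are hypothetical; the standard Hadoop client API is assumed) shows that even a simple read must first ask the NameNode for metadata and block locations before any DataNode is contacted:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NameNodeLookupSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical cluster address; every call below consults this NameNode first.
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        FileSystem fs = FileSystem.get(conf);

        // getFileStatus() and open() are metadata operations answered by the
        // NameNode, which returns block locations before any DataNode is
        // contacted; under many concurrent clients this single metadata
        // endpoint becomes the hot spot described above.
        Path file = new Path("/data/sample.txt");
        System.out.println("Size: " + fs.getFileStatus(file).getLen());
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[4096];
            int n = in.read(buf); // the actual bytes now stream from DataNodes
            System.out.println("Read " + n + " bytes");
        }
    }
}

Because the active NameNode has only a standby backup rather than a horizontally scaled peer group, every such metadata round trip lands on one machine, which is why the hot spot forms.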
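To quantify the storage overhead: if RAID leaves about 2/3 of raw capacity usable and HDFS then writes three replicas, the effective capacity is (2/3) / 3 = 2/9, roughly 22% of the raw disk space. Below is a minimal sketch, assuming the DataNodes are already RAID-protected so that fewer HDFS-level copies are acceptable, of lowering the replication factor through the standard Hadoop client API (the host and path are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // hypothetical address
        // The cluster-wide default is the dfs.replication property (3 by default);
        // overriding it here affects files created through this client.
        conf.setInt("dfs.replication", 2);
        FileSystem fs = FileSystem.get(conf);

        // Existing files can be re-replicated per path. On RAID-protected
        // DataNodes a lower factor reclaims space at the cost of fewer
        // HDFS-level copies, which is the trade-off examined in this paper.
        boolean scheduled = fs.setReplication(new Path("/data/sample.txt"), (short) 2);
        System.out.println("Replication change scheduled: " + scheduled);
    }
}

The same per-file change can be made from the command line with hdfs dfs -setrep 2 /data/sample.txt; with replication 2 on the RAID example above, the effective capacity rises to (2/3) / 2 = 1/3 of raw space.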