Load balance and power proportionality are both important in building high-performance, cost-effective distributed storage systems. However, traditional replica placement strategies that target load balance usually produce scattered replica layouts, which prevent power proportionality, while recent strategies that target power proportionality are typically based on uniform replication, which compromises load balance. In this article, we introduce Superset, an organized non-uniform replica placement strategy that takes both load balance and power proportionality into consideration. The main idea is to partition the whole system into multiple subsystems, each based on uniform replication, such that the file subsets they accommodate satisfy the 'superset' condition. We have conducted a series of simulations with real-world distributions of data popularity. Our results show that, compared to state-of-the-art solutions, Superset consumes less energy to fulfill the same performance requirement and offers better performance under the same energy consumption constraint.
Large-scale dynamic graphs typically involve big data. A dynamic graph storage system is now often required to recreate any historical state in order to support historical queries. A typical storage solution that supports historical queries is called 'snapshot plus log'. A snapshot records the whole data at a certain moment, while the log file saves all the update operations. A historical state is then recreated from the nearest snapshot by redoing or undoing the related update operations saved in the log file. The challenge lies in minimizing both the number of snapshots and the number of operations redone and undone during historical state recreation. The traditional system stores snapshots at regular intervals. However, historical states are not all requested with the same frequency, so the traditional strategy is very inefficient. This paper proposes a new strategy that determines the timestamps of the snapshots based on the distribution of the historical queries. First, the historical queries are clustered into a given number of groups according to the timestamps of the requested historical states, and the cluster centroids are calculated. Second, the snapshots are created at the timestamps of the cluster centroids. Since the cluster centroids may change over time, this process is executed periodically. Experimental results show that, with the same storage cost, the proposed snapshot strategy greatly improves the performance of recreating historical states, leading to at least a 70.7% reduction in computation, measured as the number of redone and undone operations. In addition, with the same recreation performance guarantee, it brings a storage reduction of nearly 78.9% on average, measured as the number of snapshots.
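The two-step strategy described in this abstract (cluster the requested timestamps, then place snapshots at the centroids) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name the clustering algorithm, so plain one-dimensional k-means (Lloyd's algorithm) is swapped in, the function names `kmeans_1d` and `replay_distance` are hypothetical, and the time distance to the nearest snapshot is used as a rough proxy for the number of redone or undone operations.

```python
from bisect import bisect_left


def kmeans_1d(points, k, iterations=50):
    """Cluster 1-D timestamps with Lloyd's algorithm (an assumed choice).

    Initial centroids are taken at evenly spaced quantiles of the sorted
    input, which keeps the sketch deterministic. Returns sorted centroids.
    """
    pts = sorted(points)
    n = len(pts)
    centroids = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous centroid.
        updated = [
            sum(g) / len(g) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def replay_distance(query_ts, snapshots):
    """Time distance from a queried state to the nearest snapshot.

    `snapshots` must be sorted and non-empty. The state can be reached
    by redoing from the earlier snapshot or undoing from the later one,
    so only the two neighbours of `query_ts` need to be compared.
    """
    i = bisect_left(snapshots, query_ts)
    candidates = snapshots[max(i - 1, 0): i + 1]
    return min(abs(query_ts - s) for s in candidates)


# Hypothetical workload: queries concentrate around two periods.
queries = [10, 11, 12, 13, 90, 91, 92, 93]
clustered = kmeans_1d(queries, k=2)
regular = [25, 75]  # two snapshots at regular intervals over 0..100
clustered_cost = sum(replay_distance(q, clustered) for q in queries)
regular_cost = sum(replay_distance(q, regular) for q in queries)
```

With this toy workload the clustered snapshots sit inside the two query hot spots, so the total replay distance is far smaller than with regularly spaced snapshots. In a periodic setting, `kmeans_1d` would be rerun on the most recent query window to follow shifts in the query distribution.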
Data placement considerably affects the I/O performance of distributed storage systems such as HDFS. An ideal placement algorithm should keep the I/O load evenly distributed among the storage nodes. Most existing placement algorithms that guarantee I/O load balance depend on data popularity information to make placement decisions. However, popularity information is typically not available in the data placement phase, and it usually varies during the data lifecycle. In this paper, we propose a new placement algorithm called Balanced Distribution for Each Age Group (BEAG), which makes data placement decisions in the absence of popularity information. This algorithm maintains multiple counters for each storage node, with each counter representing the amount of data belonging to a certain age group. It ensures that the data in each age group are spread equally among the storage nodes. Because the popularity variance of data belonging to the same age group is considerably smaller than that of the entire data set, BEAG significantly improves the I/O load balance. Experimental results show that, compared to other popularity-independent algorithms, BEAG decreases the standard deviation of the I/O load by 11.6% to 30.4%.
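The per-node, per-age-group counters described in this abstract can be sketched as follows. This is not the authors' implementation: the class name `AgeGroupPlacer`, the `group_span` parameter, the mapping from creation time to age group, and the tie-breaking rule are all assumptions made for illustration. The sketch assumes that data created in the same time window ages together, so an age group is identified by the creation window.

```python
from collections import defaultdict


class AgeGroupPlacer:
    """Place data on the node holding the least data of the same age group."""

    def __init__(self, nodes, group_span):
        self.nodes = list(nodes)
        # Assumed: width of the creation-time window defining one age group.
        self.group_span = group_span
        # counters[node][group] = amount of data of that group on the node.
        self.counters = {n: defaultdict(int) for n in self.nodes}

    def place(self, size, created_at):
        """Choose a node for a new item and update that node's counter."""
        group = created_at // self.group_span
        # Primary key: the node's counter for this age group.
        # Tie-break (assumed): total data on the node, then node order.
        target = min(
            self.nodes,
            key=lambda n: (self.counters[n][group],
                           sum(self.counters[n].values())),
        )
        self.counters[target][group] += size
        return target


# Hypothetical usage: six unit-size items created in the same window.
placer = AgeGroupPlacer(["a", "b", "c"], group_span=100)
placements = [placer.place(size=1, created_at=t) for t in range(6)]
```

Because each decision reads only the counters, the sketch needs no popularity information, which is the property the abstract emphasizes. A real system would also have to handle node failures, replication, and capacity limits, none of which are modelled here.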