Towards cost-effective and high-performance caching middleware for distributed systems

Zhao, Dongfang; Qiao, Kan; Raicu, Ioan

doi:10.1504/ijbdi.2016.077358

Cited by 11 publications

(6 citation statements)

References 58 publications


“…In the former, there is no support for automatically caching or uncaching files, while in the latter, there is support for a limited number of static policies for storing files on specific tiers [1]. A caching middleware has been proposed for using local SSDs as a read-only cache of local HDDs in HDFS [55], which uses a heuristic file-placement algorithm to improve the cache but assumes that a queue of requested files is known in advance. hatS [31] and OctopusFS [29] extended HDFS to support fine-grained storage tiering based on which file blocks are replicated and stored across both the cluster nodes and the storage tiers (see Figure 1(b)).…”

Section: Distributed File Systems and Tiering (mentioning)

confidence: 99%

Automating distributed tiered storage management in cluster computing

Herodotou

Kakoulli

2019

Proc. VLDB Endow.


Data-intensive platforms such as Hadoop and Spark are routinely used to process massive amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and new hardware technologies (e.g., NVRAM, SSDs) have recently led to the introduction of storage tiering in such settings. However, users are now burdened with the additional complexity of managing the multiple storage tiers and the data residing on them while trying to optimize their workloads. In this paper, we develop a general framework for automatically moving data across the available storage tiers in distributed file systems. Moreover, we employ machine learning for tracking and predicting file access patterns, which we use to decide when and which data to move up or down the storage tiers for increasing system performance. Our approach uses incremental learning to dynamically refine the models with new file accesses, allowing them to naturally adjust and adapt to workload changes over time. Our extensive evaluation using realistic workloads derived from Facebook and CMU traces compares our approach with several other policies and showcases significant benefits in terms of both workload performance and cluster efficiency.


“…Upon arrival of a new write request in a read-only cache architecture [34,55,10,68,65,39] where the accessing block is not located in SSD, the request is completed by successfully recording it to HDD via 8 . When it was already cached in SSD for priority read operations, the request is considered as completed only after updating the HDD copy of data and discarding the SSD copy, successfully.…”

Section: SSD As a Read-only Cache (mentioning)

confidence: 99%

A Survey on Tiering and Caching in High-Performance Storage Systems

Hoseinzadeh

2019

Preprint


Although every individual invented storage technology made a big step towards perfection, none of them is spotless. Different data store essentials such as performance, availability, and recovery requirements have not met together in a single economically affordable medium, yet. One of the most influential factors is price. So, there has always been a trade-off between having a desired set of storage choices and the costs. To address this issue, a network of various types of storing media is used to deliver the high performance of expensive devices such as solid state drives and non-volatile memories, along with the high capacity of inexpensive ones like hard disk drives. In software, caching and tiering are long-established concepts for handling file operations and moving data automatically within such a storage network and manage data backup in low-cost media. Intelligently moving data around different devices based on the needs is the key insight for this matter. In this survey, we discuss some recent pieces of research that have been done to improve high-performance storage systems with caching and tiering techniques.


“…In this type of storage architectures [42,43,53,66], when a new write request arrives and its accessing data is not located in SSD, the data needs to be recorded to HDD and the request is completed only when the recording is successful. If the access data is available in SSD, the data in HDD needs to be updated, and the data in SSD may be discarded, or updated as well.…”

Section: SSD As a Read-only Cache (mentioning)

confidence: 99%
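The write path described in the statement above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the class name `ReadOnlySsdCache`, the `update_on_write` flag, and the promote-on-read-miss behaviour are assumptions made for illustration, and plain dictionaries stand in for the SSD and HDD.

```python
class ReadOnlySsdCache:
    """Toy model of an SSD used as a read-only cache in front of an HDD.

    Hypothetical names throughout; dicts stand in for the two devices.
    """

    def __init__(self, update_on_write=False):
        self.ssd = {}  # cached copies, served on reads
        self.hdd = {}  # authoritative copies
        # Assumption: a flag selects between the two options the text
        # mentions for a cached block (discard it, or update it as well).
        self.update_on_write = update_on_write

    def write(self, block_id, data):
        # The request completes only after the HDD copy is recorded.
        self.hdd[block_id] = data
        if block_id in self.ssd:
            if self.update_on_write:
                self.ssd[block_id] = data  # keep the cached copy consistent
            else:
                del self.ssd[block_id]     # discard the stale cached copy
        return True

    def read(self, block_id):
        if block_id in self.ssd:
            return self.ssd[block_id]
        data = self.hdd[block_id]
        # Assumption: a read miss promotes the block into the SSD.
        self.ssd[block_id] = data
        return data


cache = ReadOnlySsdCache()
cache.write("b1", b"v1")   # block not cached: goes to HDD only
cache.read("b1")           # read miss: block is promoted to SSD
cache.write("b1", b"v2")   # cached block: HDD updated, SSD copy discarded
```

In this sketch a write never populates the SSD on its own, which is what spares SSD write cycles; the cost is that every write waits on the HDD.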

“…To reduce the unnecessary write operations, [42] proposes a method to check the data hotness based on the demotion counter and the proposed control metric, and migrate the hot data blocks to SSD. In [66], a heuristic file-placement algorithm is designed to improve the cache performance by considering the IO patterns of the incoming workloads. Meanwhile, a distributed caching middleware is implemented at user level to detect and manipulate the frequently accessed data.…”

Section: SSD As a Read-only Cache (mentioning)

confidence: 99%

“…To take care of the issue of limited SSD write cycles, researchers have proposed lots of storage algorithms [39,44,53,74] and architectures [48] to avoid the unnecessary writes to SSD. Some of them [42,43,53,66] apply SSD as the read-only cache to save the SSD lifetime, however, it decreases the write performance. Some researches [107,108] even treat HDD as the write cache for SSD to save the SSD lifetime at the cost of reducing overall system performance.…”

Section: Introduction (mentioning)

confidence: 99%


Hybrid system analysis and design for scale-out storage environments

Niu


First of all, I would like to express my great appreciation to my supervisor, Prof. Lihua Xie, for his continuous guidance, encouragement and support throughout my research efforts towards this dissertation. Without his inspiration, deep insight, and invaluable advice, this thesis would not have been possible. I have benefited tremendously from his extensive knowledge, professional experience, and strong commitment towards the excellence of research.

