Errors in Flash-Memory-Based Solid-State Drives: Analysis, Mitigation, and Recovery

Cai, Yu; Ghose, Saugata; Haratsch, Erich F.; Luo, Yixin; Mutlu, Onur

doi:10.48550/arxiv.1711.11427

Cited by 22 publications

(79 citation statements)

References 159 publications

(544 reference statements)


“…In this work, we target the impact of power outage and high operating temperature on the reliability of I/O caches. Other important parameters such as SSD aging can affect the reliability of data in SSDs, which is partially reported in [22], [26]–[31].…”

Section: Related Work (mentioning)

confidence: 99%

Evaluating Reliability of SSD-Based I/O Caches in Enterprise Storage Systems

Ahmadian, Taheri, Asadi. 2021. IEEE Trans. Emerg. Topics Comput.


I/O caching techniques are widely employed in enterprise storage systems in order to enhance performance of I/O intensive applications in large-scale data centers. Due to higher performance compared to Hard Disk Drives (HDDs) and lower price and nonvolatility compared to Dynamic Random-Access Memories (DRAM), Flash-based Solid-State Drives (SSDs) are used as a main media in the caching layer of storage systems. Although SSDs are known as non-volatile devices but recent studies have reported large number of data failures due to power outage in SSDs. To overcome the reliability implications of SSD-based I/O caching schemes, RAID-1 (mirrored) configuration is commonly used to avoid data loss due to uncommitted write operations. Such configuration, however, may still experience data loss in the cache layer due to correlated failures in SSDs. To our knowledge, none of previous studies have investigated the reliability of SSD-based I/O caching schemes in enterprise storage systems. In this paper, we present a comprehensive analysis investigating the reliability of SSD-based I/O caching architectures used in enterprise storage systems under power failure and high-operating temperature. We explore variety of SSDs from top vendors and investigate the cache reliability in mirrored configuration. To this end, we first develop a physical fault injection and failure detection platform and then investigate the impact of workload dependent parameters on the reliability of I/O cache in the presence of two common failure types in data centers, power outage and high temperature faults. We implement an I/O cache scheme using an open-source I/O cache module in Linux operating system. The experimental results obtained by conducting more than twenty thousand of physical fault injections on the implemented I/O cache with different write policies reveal that the failure rate of the I/O cache is significantly affected by workload dependent parameters. Our results show that unlike workload requests access pattern, the other workload dependent parameters such as request size, Working Set Size (WSS), and sequence of the accesses have considerable impact on the I/O cache failure rate. We observe a significant growth in the failure rate in the workloads by decreasing the size of the requests (by more than 14X). Furthermore, we observe that in addition to writes, the read accesses to the I/O cache are subjected to failure in presence of sudden power outage (the failure mainly occurs during promoting data to the cache). In addition, we observe that I/O cache experiences no data failure upon high temperature faults.


“…The basic idea of using a pre-decoder dedicated to the correction of simple configurations is also central in the design of a flash memory controller where a hard-decision belief propagation (BP) decoder is used as a pre-decoder and, in case of failure, multiple levels of soft-decision BP are performed [63]. However, the noise rate of flash cells is far more favorable than in quantum hardware, allowing for using a single decoding unit to correct many encoded blocks in flash memory.…”

Section: The Lazy Decoder As a Decoder Accelerator (mentioning)

confidence: 99%

“…Note that the execution time current flash BP decoders are far too long for the quantum setting if we suppose that the decoding must be implemented in dµs. (80µs for hard decision decoder + 80µs per level of soft-BP) [63].…”

Section: The Lazy Decoder As a Decoder Accelerator (mentioning)

confidence: 99%

Hierarchical decoding to reduce hardware requirements for quantum computing

Delfosse. 2020. Preprint.


“…Intelligent controllers are already in widespread use in another key part of a modern computing system. In solid-state drives (SSDs) consisting of NAND flash memory, the flash controllers that manage the SSDs are designed to incorporate a significant level of intelligence in order to improve both performance and reliability [143,144,145,146,147]. Modern flash controllers need to take into account a wide variety of issues such as remapping data, performing wear leveling to mitigate the limited lifetime of NAND flash memory devices, refreshing data based on the current wearout of each NAND flash cell, optimizing voltage levels to maximize memory lifetime, and enforcing fairness across different applications accessing the SSD.…”

Section: The Need For Intelligent Memory Controllers To Enhance Memor... (mentioning)

confidence: 99%

“…Modern flash controllers need to take into account a wide variety of issues such as remapping data, performing wear leveling to mitigate the limited lifetime of NAND flash memory devices, refreshing data based on the current wearout of each NAND flash cell, optimizing voltage levels to maximize memory lifetime, and enforcing fairness across different applications accessing the SSD. Much of the complexity in flash controllers is a result of mitigating issues related to the scaling of NAND flash memory [143,144,145,148,149]. We argue that in order to overcome scaling issues in DRAM, the time has come for DRAM memory controllers to also incorporate significant intelligence.…”

Section: The Need For Intelligent Memory Controllers To Enhance Memor... (mentioning)

confidence: 99%

Processing Data Where It Makes Sense: Enabling In-Memory Computation

Mutlu, Ghose, Gómez-Luna et al. 2019. Preprint.

Self Cite


Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory is already a key bottleneck as applications become more data-intensive and memory bandwidth and energy do not scale well, (2) energy consumption is a key constraint in especially mobile and server systems, (3) data movement is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are especially severely-felt in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic as well as the adoption of error correcting codes inside DRAM chips, and the necessity for designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are an evidence of this trend. In this work, we discuss some recent research that aims to practically enable computation close to data. After motivating trends in applications as well as technology, we discuss at least two promising directions for processing-in-memory (PIM): (1) performing massively-parallel bulk operations in memory by exploiting the analog operational properties of DRAM, with low-cost changes, (2) exploiting the logic layer in 3D-stacked memory technology to accelerate important data-intensive applications. In both approaches, we describe and tackle relevant cross-layer research, design, and adoption challenges in devices, architecture, systems, and programming models. Our focus is on the development of in-memory processing designs that can be adopted in real computing platforms at low cost.
