3D NAND flash memory with advanced multi-level cell techniques provides high storage density, but suffers from significant performance degradation due to a large number of read-retry operations. Although the read-retry mechanism is essential to ensuring the reliability of modern NAND flash memory, it can significantly increase the read latency of an SSD by introducing multiple retry steps that read the target page again with adjusted read-reference voltage values. Through a detailed analysis of the read mechanism and rigorous characterization of 160 real 3D NAND flash memory chips, we find new opportunities to reduce the read-retry latency by exploiting two advanced features widely adopted in modern NAND flash-based SSDs: 1) the CACHE READ command and 2) strong ECC engine. First, we can reduce the read-retry latency using the advanced CACHE READ command that allows a NAND flash chip to perform consecutive reads in a pipelined manner. Second, there exists a large ECC-capability margin in the final retry step that can be used for reducing the chip-level read latency. Based on our new findings, we develop two new techniques that effectively reduce the read-retry latency: 1) Pipelined Read-Retry (PR 2 ) and 2) Adaptive Read-Retry (AR 2 ). PR 2 reduces the latency of a read-retry operation by pipelining consecutive retry steps using the CACHE READ command. AR 2 shortens the latency of each retry step by dynamically reducing the chip-level read latency depending on the current operating conditions that determine the ECC-capability margin. Our evaluation using twelve real-world workloads shows that our proposal improves SSD response time by up to 31.5% (17% on average) over a state-of-the-art baseline with only small changes to the SSD controller. CCS CONCEPTS• Hardware → External storage.
A recent ultra-large SSD (e.g., a 32-TB SSD) provides many benefits in building cost-efficient enterprise storage systems. Owing to its large capacity, however, when such SSDs fail in a RAID storage system, a long rebuild overhead is inevitable for RAID reconstruction that requires a huge amount of data copies among SSDs. Motivated by modern SSD failure characteristics, we propose a new recovery scheme, called reparo , for a RAID storage system with ultra-large SSDs. Unlike existing RAID recovery schemes, reparo repairs a failed SSD at the NAND die granularity without replacing it with a new SSD, thus avoiding most of the inter-SSD data copies during a RAID recovery step. When a NAND die of an SSD fails, reparo exploits a multi-core processor of the SSD controller in identifying failed LBAs from the failed NAND die and recovering data from the failed LBAs. Furthermore, reparo ensures no negative post-recovery impact on the performance and lifetime of the repaired SSD. Experimental results using 32-TB enterprise SSDs show that reparo can recover from a NAND die failure about 57 times faster than the existing rebuild method while little degradation on the SSD performance and lifetime is observed after recovery.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.