2012
DOI: 10.1147/jrd.2011.2177106

IBM zEnterprise redundant array of independent memory subsystem

Abstract: The IBM zEnterprise® system introduced a new and innovative redundant array of independent memory (RAIM) subsystem design as a standard feature on all zEnterprise servers. It protects the server from single-channel errors such as sudden control, bus, buffer, and massive dynamic RAM (DRAM) failures, thus achieving the highest System z® memory availability. This system also introduced innovations such as DRAM and channel marking, as well as a novel dynamic cyclic redundancy code channel marking. This paper des…
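The single-channel protection described in the abstract can be illustrated with a deliberately simplified model. The sketch below is not the RAIM code from the paper: it assumes four data channels plus one redundant channel and substitutes plain RAID-style XOR parity for the actual ECC, and the names `encode` and `recover` are hypothetical. It only shows why a channel that has been marked as failed can be rebuilt from the remaining four.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_channels: List[bytes]) -> List[bytes]:
    """Append one parity channel (assumption: XOR parity, not the paper's ECC)."""
    parity = reduce(xor_bytes, data_channels)
    return list(data_channels) + [parity]


def recover(channels: List[Optional[bytes]], marked: int) -> List[bytes]:
    """Rebuild the channel at index `marked` from the other channels.

    `marked` plays the role of a channel mark: the failing channel is
    already known, so it is treated as an erasure and ignored on read.
    Returns the data channels only (parity channel dropped).
    """
    survivors = [c for i, c in enumerate(channels) if i != marked]
    restored = list(channels)
    restored[marked] = reduce(xor_bytes, survivors)
    return restored[:-1]


# Usage: lose channel 2 entirely, then rebuild it from the other four.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]
stored = encode(data)
stored[2] = None  # simulate a whole-channel failure
assert recover(stored, 2) == data
```

XOR parity can only rebuild a channel whose position is already known; locating the bad channel is a separate problem that this sketch does not attempt.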

Cited by 33 publications (31 citation statements)
References 10 publications
“…Chipkill improves reliability by interleaving error detection and correction data among multiple DRAM chips [10]. RAIM [11] [15,22,38]) have shown that the OS retiring memory pages after a certain number of errors can eliminate up to 96.8% of detected memory errors. These techniques, though they improve system reliability, still require costly ECC hardware for detecting and identifying memory pages with errors.…”
Section: B Related Work (mentioning)
confidence: 99%
“…In terms of performance, existing error detection and correction techniques incur a slowdown on each memory access due to their additional circuitry [15,16] [table rows spilled into the excerpt: [10] 2/8 chips (1/8 chips), 12.5%, High; RAIM [11] 1/5 modules (1/5 modules), 40.6%, High; Mirroring [12] 2/8 chips (1/2 modules), 125%, Low] an additional 10% slowdown due to techniques that operate DRAM at a slower speed to reduce the chances of random bit flips due to electrical interference in higher-density devices that pack more and more cells per square nanometer [17]. In addition, whenever an error is detected or corrected on modern hardware, the processor raises an interrupt that must be serviced by the system firmware (BIOS), incurring up to 100 µs latency (roughly 2000× a typical 50 ns memory access latency [18]), leading to unpredictable slowdowns.…”
Section: Introduction (mentioning)
confidence: 99%
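The percentages quoted in the excerpt above (12.5%, 40.6%, 125%) and the 2000× latency ratio can be reproduced with simple arithmetic. The sketch below is one plausible reading, not something the citing paper states: it assumes each channel stores 72 bits per 64 data bits, that the RAIM row uses five such channels to hold four channels' worth of data, and that mirroring keeps two full 72-bit copies. The helper name `overhead` is hypothetical.

```python
def overhead(stored_bits: int, data_bits: int) -> float:
    """Extra storage as a fraction of the useful data bits."""
    return stored_bits / data_bits - 1.0


# Assumption: a (72, 64) code word per channel, 8 check bits per 64 data bits.
ecc_72_64 = overhead(72, 64)           # 0.125   -> 12.5%

# Assumption: 5 channels of 72 bits carrying 4 channels' worth of data.
raim_5ch = overhead(5 * 72, 4 * 64)    # 0.40625 -> about 40.6%

# Assumption: two complete 72-bit copies of every 64 data bits.
mirroring = overhead(2 * 72, 64)       # 1.25    -> 125%

# Firmware interrupt latency versus a typical access, both in nanoseconds.
interrupt_ns = 100_000                 # 100 µs
access_ns = 50                         # 50 ns
latency_ratio = interrupt_ns // access_ns  # 2000

print(f"{ecc_72_64:.1%} {raim_5ch:.1%} {mirroring:.1%} {latency_ratio}x")
```

Under these assumptions the three overheads differ only in how many 72-bit words are stored per 64 bits of data, which is why the RAIM figure sits between plain ECC and full mirroring.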
“…inline memory modules (DIMMs) for 3 RAIM [11] protected memory ports, 2 GX++ I/O links, and 5 PCIe x16 Gen3 I/O links. Up to four processor drawers are plugged into a frame, interconnected by passive electric cables that form the off-drawer ABus network.…”
Section: System Topology (mentioning)
confidence: 99%
“…The primary developments have been Chipkill [20], SDDC [21], Chipspare [22], and a redundant array of independent memory (RAIM) [23]. The first three implementations are adequately similar to be discussed as one advanced ECC method.…”
Section: Overview Of Error Correction (mentioning)
confidence: 99%
“…This interleaving can be performed very quickly in hardware along with the (72, 64) ECC logic, and thus adds negligible latency as compared to a standard ECC protocol. The overhead of an RAIM can also be negligible, due to the use of the advanced ECC techniques detailed above; however, the correction of some hardware errors may incur many microseconds of overhead [23].…”
Section: Overview Of Error Correction (mentioning)
confidence: 99%