In the development of 3D graphics systems for higher-resolution, more realistic modeling and rendering, graphics memories have also played a critical role by providing the required high bandwidth. Current GDDR5 SDRAMs provide 7Gbps/pin [1], approaching the physical limit imposed by single-ended signaling: noise in the reference voltage and power supply, and channel crosstalk. Channel crosstalk in particular takes the dominant portion of the 7Gbps timing budget and has become the main barrier to further speed improvement. Although crosstalk cancellers for memory interfaces have been investigated [2], they impose stringent restrictions on signal ordering and trace length in PCB and package routing, and their performance is limited. Therefore, improving the efficiency of the DRAM core now draws more attention than increasing pin bandwidth.

A GDDR5 SDRAM supports quad-rate data transmission with a burst length of 8. Thus, at 7Gbps/pin, the core cycle time is only 1.14ns (8 bits / 7Gbps) unless bank grouping is used. Bank grouping relaxes this timing constraint by prohibiting consecutive accesses to banks in the same bank group. However, this restriction inevitably degrades graphics system performance by limiting the degree of freedom in memory access. In this work, 7Gbps operation without the bank-group restriction was achieved by two techniques: regular calibration of the IO sense amplifiers, which keeps them at their highest performance, and skewed control logic. DRAM core efficiency was further improved by reducing the bank-to-bank active time (tRRD) to 2.5ns. While read/write latency cannot be reduced below the time physically required for the read/write operation itself, tRRD has no such limit. With a sufficiently short tRRD, a memory controller can activate other banks during the active-to-read/write command delay (tRCDRD/tRCDWR) and then read from or write to the different banks seamlessly, achieving high core efficiency.

Figure 24.1.1 shows the top-level block diagram and the data path structure.
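The timing figures above reduce to simple arithmetic, sketched below in Python. The 7Gbps/pin rate, the burst length of 8 and the 2.5ns tRRD are taken from the text; the 12ns tRCD and the helper function names are hypothetical, chosen only for illustration and not values from this work.

```python
# Core-timing arithmetic for a GDDR5 interface with burst length 8.
# From the text: 7 Gbps/pin, BL8, tRRD = 2.5 ns.
# HYPOTHETICAL: the 12 ns tRCD used in the demo is a placeholder, not a figure
# from this paper; the function names are illustrative only.


def core_cycle_ns(gbps_per_pin: float, burst_length: int = 8) -> float:
    """Time to move one burst on a pin, i.e. the core cycle time
    the array must meet when bank grouping is not used."""
    return burst_length / gbps_per_pin  # bits / (Gbit/s) = ns


def activates_within_trcd(trcd_ns: float, trrd_ns: float) -> int:
    """Number of ACTIVATE commands, spaced tRRD apart and starting at t = 0,
    that are issued at or before the first bank's tRCD expires."""
    return int(trcd_ns // trrd_ns) + 1


if __name__ == "__main__":
    print(f"core cycle at 7 Gbps: {core_cycle_ns(7.0):.2f} ns")  # 1.14 ns
    # Hypothetical tRCD = 12 ns: compare tRRD = 2.5 ns with a slower 5 ns.
    print(activates_within_trcd(12.0, 2.5))  # 5 banks opened
    print(activates_within_trcd(12.0, 5.0))  # 3 banks opened
```

Halving tRRD in this toy model lets the controller open more banks inside the same tRCD window, which is the mechanism the text relies on for back-to-back accesses to different banks.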
The control signal lines in COLDEC and the WDATA lines were designed as skewed logic: the farther a bit line is from the chip center (marked IO PAD), the larger the delay of its corresponding signal. By allowing skew in the control logic, every bit line can have the same, maximized CSL pulse width. The timing skews among the IO sense amplifier input data are removed by sampling all data with a single FRP signal.

For high-speed core operation, the IO sense amplifiers were designed as current sense amplifiers, which offer higher operating speed than voltage sense amplifiers [3]. To take full advantage of this amplifier, a replica impedance monitor, shown in Fig. 24.1.2, was placed in each IOCNT area. Assuming that the loads at the top are sufficiently large, the input resistance of an IO sense amplifier is given by the following equation:

When $R_{IN} = 0$, i.e., when $g_{mMN} = g_{mMP}$, the differential input current is maximized and the sensing time is minimized. However, process and temperature variations make it impossible to keep every chip at this condition. A replica impedance monitor solves this problem by regular cal...
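The calibration goal of driving $R_{IN}$ to zero can be sketched in Python under loudly stated assumptions. The sketch uses a hypothetical simplified model, $R_{IN} = 1/g_{mMN} - 1/g_{mMP}$, chosen only because it vanishes when $g_{mMN} = g_{mMP}$ as the text requires; the actual equation and sign convention of the circuit may differ. It also assumes a hypothetical digital trim code that raises $g_{mMP}$ monotonically, and finds the code that brings the replica's $R_{IN}$ closest to zero with a generic binary search. This is an illustration, not necessarily the calibration procedure used in this work.

```python
from typing import Callable


def r_in(g_mn: float, g_mp: float) -> float:
    """HYPOTHETICAL input-resistance model (ohms): zero exactly when g_mn == g_mp."""
    return 1.0 / g_mn - 1.0 / g_mp


def calibrate(g_mn: float, g_mp_of_code: Callable[[int], float], n_codes: int) -> int:
    """Return the trim code whose replica R_IN is closest to zero.

    Assumes R_IN increases monotonically with the code (a larger code gives a
    larger g_mp), so a binary search on the sign of R_IN locates the crossing.
    """
    lo, hi = 0, n_codes - 1
    # Smallest code with R_IN >= 0 (or the last code if R_IN never reaches 0).
    while lo < hi:
        mid = (lo + hi) // 2
        if r_in(g_mn, g_mp_of_code(mid)) >= 0:
            hi = mid
        else:
            lo = mid + 1
    # The code just below the crossing may be closer to zero; keep the nearer one.
    best = lo
    if lo > 0 and abs(r_in(g_mn, g_mp_of_code(lo - 1))) < abs(r_in(g_mn, g_mp_of_code(lo))):
        best = lo - 1
    return best


def example_trim(code: int) -> float:
    """HYPOTHETICAL trim DAC: g_mp starts at 0.8 mS and rises 0.05 mS per code."""
    return 0.8e-3 + code * 0.05e-3


if __name__ == "__main__":
    # Hypothetical chip whose g_mn is 1.0 mS: code 4 gives g_mp = 1.0 mS.
    print(calibrate(1.0e-3, example_trim, 16))  # 4
```

With these hypothetical numbers the search settles on code 4, where the modeled $g_{mMP}$ equals $g_{mMN}$; repeating the search at intervals would let the trim follow temperature drift, in the spirit of the regular calibration described in the text.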