This paper presents a DRAM architecture that improves the DRAM performance/power trade-off, increasing its usability in low-power chip designs that use 3D interconnect technology. Finer matrix subdivision, combined with buffering of the bitline signal at the local-block level, reduces both the energy per access and the access time. The resulting performance matches that of a typical low-power SRAM, while achieving a significant reduction in area and static power compared with such memories. The 128 kb memory architecture proposed here achieves an access time of 1.3 ns for a dynamic energy of less than 0.2 pJ per bit. A localized refresh mechanism reduces the static power consumption associated with the cell by a factor of 10, and the area by a factor of 2, compared with an equivalent SRAM.
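To put the quoted figures in perspective, the following minimal sketch converts the per-bit energy bound into a per-access energy and an upper bound on dynamic power. The word width and the assumption that accesses are issued back to back, with a cycle time equal to the 1.3 ns access time, are illustrative only and are not taken from the paper.

```python
# Figures quoted in the abstract.
ENERGY_PER_BIT_PJ = 0.2   # upper bound on dynamic energy per bit (pJ)
ACCESS_TIME_NS = 1.3      # access time (ns)


def energy_per_access_pj(word_bits: int) -> float:
    """Upper bound on the dynamic energy of one access, in pJ.

    word_bits is a hypothetical word width; the paper's actual
    word width is not stated in this section.
    """
    return ENERGY_PER_BIT_PJ * word_bits


def peak_dynamic_power_mw(word_bits: int, cycle_ns: float = ACCESS_TIME_NS) -> float:
    """Upper bound on dynamic power for back-to-back accesses, in mW.

    Assumes one access per cycle_ns (an assumption, not a paper result).
    1 pJ / 1 ns = 1 mW, so no extra unit conversion is needed.
    """
    return energy_per_access_pj(word_bits) / cycle_ns


if __name__ == "__main__":
    bits = 32  # hypothetical word width
    print(f"energy per access < {energy_per_access_pj(bits):.2f} pJ")
    print(f"dynamic power     < {peak_dynamic_power_mw(bits):.2f} mW")
```

Under these assumptions, a 32-bit access costs less than 6.4 pJ, and continuous operation stays below roughly 5 mW of dynamic power; a different word width or cycle time scales both bounds linearly.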
I. CONTEXT

As feature size shrinks, on-chip memory design is becoming increasingly challenging. Reducing the typical dimensions and the supply voltage of SRAM memories degrades cell stability [1]. Stability is degraded further by intra-die variations, which also increase the average power consumption. Several solutions have been investigated to mitigate this issue, from changing the cell topology [2] [3] [4] to modifying the peripheral architecture [5]. However, these solutions increase the memory area and thus compromise scaling.

Embedded DRAM (eDRAM) has been proposed for large memory arrays. eDRAM clock speed and access time have been improved to match typical SRAM behavior [6]. However, eDRAM requires integrating denser capacitors into the logic technology process, and thus needs costly additional process steps.

3D interconnect enables the use of heterogeneous technologies on the same chip. 3D vias are typically smaller and have less parasitic capacitance than off-chip connections [7]. In addition, they can be spread across the chip. This reduces the routing energy and increases the number of available connections between two stacked dies. These advantages provide a better bandwidth-energy trade-off for routing between two stacked dies than between two packaged dies.

Fig. 1. Global architecture: WL/BL subdivision. (Figure labels: Local_Address, Block_address, GWL, LWL receiver, 32x32 cells, Local SA, GBL, Mux, Global_SA, data_out, x16, x16.)

A possible application of 3D interconnect is to separate the logic core of a system from the memory it requires. Such systems have already been studied in [8] [9], with stacks of an SRAM matrix on top of a logic layer. It is also possible to stack DRAM on top of a logic layer. This solution offers numerous other advantages compared to packaged DRAM, including a simpler input/output protocol, and can solve termination and clock synchronisation issues by using shorter connections.
This allows conventional DRAM to be used instead of SRAM or embedded DRAM for the largest memories in an SoC, bringing higher density than SRAM without the need to integrate dedicated capacitors into the logic process, as eDRAM requires.

However, traditional DRAM is outperformed by SRAM in several dom...