“…The value of n can be determined as the maximum of the delays of stages 1D and 1T divided by the maximum of the delays of stages 2D and 2T. For 8, 16 and 32 subarrays the resulting cycle time will be denoted by opt.sa8, opt.sa16 and opt.sa32 respectively. We define the term speedup as {non-pipelined cache cycle time} / {pipelined cache cycle time}.…”
Section: Application Of the Derived Model And Discussion
confidence: 99%
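A compact restatement of the quoted definitions (the stage-delay symbols T_1D, T_1T, T_2D and T_2T are our shorthand for the delays of stages 1D, 1T, 2D and 2T, not notation taken from the paper):

    n = max(T_1D, T_1T) / max(T_2D, T_2T)
    speedup = (non-pipelined cache cycle time) / (pipelined cache cycle time)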
“…If it is profitable to increase the pipeline depth, it is necessary to break the decoder into more stages. For example, in [16] a deeply pipelined architecture with a hierarchical design of the decoder is presented. Since breaking the decoder into smaller stages is implementation specific, we chose not to model any of the possibilities.…”
Abstract: The access time of the first-level on-chip cache usually imposes the cycle time of high-performance VLSI processors. The only way to reduce the effect of cache access time on processor cycle time is the use of pipelined caches. A timing model for on-chip caches has recently been presented in [1]. In this paper the timing model given in [1] is extended so that pipelined caches can be handled. The possible pipelined architectures of a cache memory are also investigated. The speedup of the pipelined cache against the non-pipelined one is examined as a function of the pipeline depth, the organization and the physical implementation parameters.
“…Equations (3), (4) and (5) summarize the following energy distributions of the AND-NOR 4-to-16 decoder: (3) characterizes when a word line is reselected by the decoder, (4) describes when a different word line with equal MSBs is selected and the previously selected word line must be discharged, and (5) summarizes when a different word line with unequal MSBs is selected and the previously selected word line must be discharged. The selected word line (E_s), reselected word line (E_rs), and discharged word line (E_dchg) dissipate the most energy of the decoder's word lines.…”
Section: Conventional
confidence: 99%
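As a reading aid, the case analysis in the quoted passage can be sketched in Python; the function name, the address encoding and the two-bit MSB split are assumptions of ours, and the actual energy expressions E_s, E_rs and E_dchg are those of equations (3)-(5) in the cited work, not reproduced here.

    def decoder_energy_case(prev_row, new_row, msb_bits=2):
        """Classify an access of a 4-to-16 decoder into the three energy
        cases described by equations (3)-(5): reselect, switch with equal
        MSBs, or switch with unequal MSBs."""
        if new_row == prev_row:
            # Equation (3): the same word line is reselected (E_rs dominates).
            return "reselected word line"
        if (new_row >> (4 - msb_bits)) == (prev_row >> (4 - msb_bits)):
            # Equation (4): a different word line with equal MSBs is selected;
            # the previously selected word line must be discharged (E_s + E_dchg).
            return "different word line, equal MSBs"
        # Equation (5): a different word line with unequal MSBs is selected;
        # the previously selected word line is again discharged.
        return "different word line, unequal MSBs"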
“…In order to simulate loading on the word lines, inverters with four times the minimum p- and n-type transistor widths were added on each word line. This simulates sufficient loading to drive a pipeline register as used in high-performance pipelined memories [5,6].…”
Abstract: Two novel memory decoder designs for reducing energy consumption and delay are presented in this paper. These two decoding schemes are compared to the conventional NOR decoder. Fewer word lines are charged and discharged by the proposed schemes, which leads to less energy dissipation. Energy, delay, and area calculations are provided for all three designs under analysis. The two novel decoder schemes dissipate between 3.9% and 23.6% of the energy dissipated by the conventional decoder. The delays of these designs are 80.8% of the conventional decoder delay. Simulations of the three decoders are performed using a 90 nm CMOS technology.
“…The architecture in [3] uses a pipelined tree decoding system with latches between each level of the hierarchy. The latency is very low in this architecture; however, much of the elegance is lost when the pyramid is flattened onto a two-dimensional VLSI floorplan.…”
This paper proposes a scalable memory architecture that maintains a high data rate independent of address sequence and memory size. It is suitable for applications where throughput is of primary importance and access latency is tolerable. A rectangular array of memory blocks is pipelined to build a memory with an operating frequency determined only by the access time of a single block. This is independent of the number of blocks because address and data communication is localized to adjacent memory blocks. Rather than sacrificing speed for memory size, the new approach scales to provide high-throughput random access memories of very large size with some increase in latency.
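A minimal sketch of the throughput/latency trade-off described in this abstract, assuming one pipeline stage per block traversed on the way to the target; the function name, the row/column routing and the single-stage-per-block assumption are ours, not the cited paper's.

    def pipelined_array_timing(rows, cols, block_access_ns):
        """Model a rectangular array of memory blocks pipelined so that the
        cycle time depends only on one block's access time, while address and
        data hop between adjacent blocks, adding latency per block crossed."""
        cycle_ns = block_access_ns            # independent of rows * cols
        worst_case_stages = rows + cols       # farthest block from the port
        latency_ns = worst_case_stages * cycle_ns
        throughput_mhz = 1e3 / cycle_ns       # one access completes per cycle
        return cycle_ns, latency_ns, throughput_mhz

    # Example: a 4 x 4 array of 2 ns blocks keeps the 2 ns cycle time but
    # pays up to 16 ns of worst-case access latency.
    print(pipelined_array_timing(4, 4, 2.0))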