The relevance of introducing optical interconnects (OI's) in monoprocessors and multiprocessors is studied from an architectural point of view. We show that perhaps the major explanation for why optical technologies have nearly been unable to penetrate into computers is that OI's generally do not shorten the memory-access time, which is the most critical issue for today's stored-program machines. In monoprocessors the memory-access time is dominated by the electronic latency of the memory itself. Thus implementing OI's inside the memory hierarchy without changing the memory architecture cannot dramatically improve the global performance. In strongly coupled multiprocessors the node-bypass latency dominates. Therefore the higher the connectivity (possibly with optics), the shorter the path to another node, but the more expensive the network and the more complex the structure of electronic nodes. This relation leaves the choice of the best network open in terms of simplicity and latency reduction. The bottlenecks resulting from and the benefits of implementing OI's are discussed with respect to symmetric multiprocessors, rings, and distributed shared-memory supercomputers.
This work aims at defining the marks that optoelectronic solutions will have to beat for replacing electric interconnects at chip level. We first simulate the electric response of future electrical interconnects considering the reduction of the CMOS feature size λ from 0.7 to 0.05 µm. We also consider the architectural evolution of chips to analyze the latency issues. We conclude that: 1) It does not seem necessary in the future chips to consider the integration of optical interconnects (OI) over distances shorter than 1000-2000 λ, because the performance of electric interconnects is sufficient. 2) The penetration of OIs over distances longer than 10^4 λ could be envisaged (on the sole basis of the performance limitation) provided that it will be possible to demonstrate new generations of (cheap and CMOS-compatible) low-threshold high-efficiency VCSELs and ultra-fast high-efficiency photodiodes. 3) The first possible application of on-chip OIs is likely not for inter-block communication but for clock distribution as the energy constraints (imposed by the evolution of CMOS technology) are weaker and because the clock tree is an extremely long interconnect.
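The distance thresholds above are stated in multiples of the feature size λ, so their physical length depends on the technology node. The following is a minimal sketch that converts them to micrometres for the two endpoint feature sizes quoted in the abstract (0.7 µm and 0.05 µm). The function and label names are illustrative assumptions, not taken from the paper, and the conversion is plain multiplication rather than the paper's simulation.

```python
# Feature sizes quoted in the abstract, in micrometres.
FEATURE_SIZES_UM = (0.7, 0.05)

# Distance thresholds quoted in the abstract, in multiples of lambda.
# The labels are illustrative names chosen for this sketch.
THRESHOLDS_LAMBDA = {
    "electrical sufficient (lower bound)": 1_000,
    "electrical sufficient (upper bound)": 2_000,
    "optical may be envisaged": 10_000,
}


def lambda_to_um(n_lambda: float, feature_size_um: float) -> float:
    """Convert a length given in multiples of lambda to micrometres."""
    return n_lambda * feature_size_um


def threshold_table() -> dict:
    """Map (label, feature size in um) to the physical length in um."""
    return {
        (label, size): lambda_to_um(n, size)
        for label, n in THRESHOLDS_LAMBDA.items()
        for size in FEATURE_SIZES_UM
    }


if __name__ == "__main__":
    for (label, size), length_um in threshold_table().items():
        print(f"{label:38s} lambda = {size:4} um -> {length_um:8.1f} um")
```

Under these figures, the 10^4 λ threshold shrinks from about 7 mm at λ = 0.7 µm to about 0.5 mm at λ = 0.05 µm, which illustrates why the thresholds are expressed relative to λ rather than as fixed lengths.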
We start with a detailed analysis of the communication issues in today's symmetric multiprocessor (SMP) architectures to study the benefits of implementing optical interconnects (OI) in these machines. We show that the transmission of block addresses is the most critical communication bottleneck of future large SMPs owing to the need to preserve the coherence of data duplicated in caches. An address transmission bandwidth as high as 200-300 Gb/s may be necessary ten years from now; this requirement will represent a difficult challenge for shared electric buses. In this context we suggest the introduction of simple point-to-point OIs for an SMP cache-coherent switch, i.e., for a VLSI switch that would emulate the shared-bus function. The operation might require as many as 10,000 input-outputs (IOs) to connect 100 processors, particularly if one maintains the present parallelism of transmissions to preserve a large bandwidth and a short memory access latency. The interest in OIs comes from the potential increase of the transmission frequency and from the possible integration of such a high density of IOs on top of electronic chips to overcome packaging issues. Then we consider the implementation of an optical bus, that is, a multipoint optical line involving more optical technology. This solution allows multiple simultaneous accesses to the bus, but the coherence of caches can no longer be maintained with the usual fast snooping protocols.
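The figures in this abstract lend themselves to a quick back-of-the-envelope check. The sketch below divides the quoted 10,000 IOs among the 100 processors, and shows how an aggregate address bandwidth maps to a per-line rate for a given number of parallel lines. The 64-line width used in the example is a hypothetical value chosen only for illustration; the abstract does not state a line count.

```python
def ios_per_processor(total_ios: int, processors: int) -> float:
    """Average number of switch IOs available per connected processor."""
    return total_ios / processors


def per_line_rate_gbps(aggregate_gbps: float, parallel_lines: int) -> float:
    """Bit rate each line must carry to reach the aggregate bandwidth."""
    return aggregate_gbps / parallel_lines


if __name__ == "__main__":
    # Figures quoted in the abstract.
    print(ios_per_processor(10_000, 100), "IOs per processor")

    # HYPOTHETICAL width: 64 parallel address lines (not from the abstract).
    assumed_lines = 64
    for aggregate in (200.0, 300.0):
        rate = per_line_rate_gbps(aggregate, assumed_lines)
        print(f"{aggregate:.0f} Gb/s over {assumed_lines} lines -> {rate} Gb/s per line")
```

Keeping the transmission parallel, as the abstract suggests, trades a lower per-line rate for a larger IO count, which is where the packaging argument for optical IOs comes in.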
This study has been carried out in order to determine cost-effective configurations of functional units for multiple-issue out-of-order superscalar processors. The trace-driven simulations were performed on the six integer and the fourteen floating-point programs from the SPEC 92 suite. We first evaluate the number of instructions allowed to be concurrently processed by the execution stages of the pipeline. We then apply some restrictions on the execution issue of different instruction classes in order to define these configurations. We conclude that five to nine functional units are necessary to exploit instruction-level parallelism. An important point is that several data cache ports are required in a processor of degree 4 or more. Finally, we report on complementary results on the utilization rate of the functional units.