TABLE OF CONTENTS
Title Page i
Authorization Page ii
Signature Page iii
Acknowledgments iv
Table of Contents v
List of Figures ix
List of Tables xiii
Abstract xiv

Therefore, fast and accurate analytical models for evaluating the performance of NoC-based multicore systems are strongly desired to better explore the design space. For this purpose, we propose a machine-learning-based latency regression model that evaluates NoC designs under different configurations before the system is built or taped out.

Then, for high-performance NoC designs, we tackle one of the most important problems: the design of routing algorithms under different design constraints and objectives. To avoid temperature hotspots, a thermal-aware routing algorithm is proposed to achieve an even temperature profile for application-specific networks-on-chip (NoCs). To improve reliability, a routing algorithm that achieves maximum performance in the presence of faults is proposed.

Finally, at the architecture level, we propose two new NoC structures that use bidirectional links for performance optimization. In particular, we propose a flit-level speedup scheme that enhances NoC performance by utilizing bidirectional channels. Beyond the traditional approach of allowing flits of different packets to use the idle internal and external bandwidth of the bidirectional channel, the proposed flit-level speedup scheme also allows flits within the same packet to be transmitted simultaneously on the bidirectional channel. We also propose a flexible NoC architecture that takes advantage of a dynamic distributed routing algorithm and improves NoC communication performance with minimal energy overhead. This architecture exploits self-reconfigurable bidirectional channels to increase the effective bandwidth and uses express virtual paths, as well as localized hub routers, to bypass some intermediate nodes in the network at run time. Simulation results on both synthetic traffic and real workload traces show that significant performance improvement in terms of latency and throughput can be achieved.