Modern GPUs feature an increasing number of streaming multiprocessors (SMs) to boost system throughput. Constructing an efficient and scalable network-on-chip (NoC) for future high-performance GPUs is therefore particularly critical. Although a mesh network is a widely used NoC topology in manycore CPUs for reasons of scalability and simplicity, it is ill-suited to GPUs because of the many-to-few-to-many traffic pattern observed in GPU-compute workloads. A crossbar NoC, on the other hand, is a natural fit, but it does not scale to large SM counts while operating at high frequency. In this paper, we propose the converge-diverge crossbar (CD-Xbar) network with round-robin routing and topology-aware cooperative thread array (CTA) scheduling. CD-Xbar consists of two types of crossbars: a local crossbar and a global crossbar. A local crossbar converges input ports from the SMs into so-called converged ports; the global crossbar diverges these converged ports to the last-level cache (LLC) slices and memory controllers. CD-Xbar provides routing path diversity through the converged ports. Round-robin routing and topology-aware CTA scheduling balance network traffic among the converged ports within a local crossbar and across crossbars, respectively. Compared to a mesh with the same bisection bandwidth, CD-Xbar reduces NoC active silicon area and power consumption by 52.5% and 48.5%, respectively, while at the same time improving performance by 13.9% on average. CD-Xbar performs within 2.9% of an idealized fully-connected crossbar. We further demonstrate CD-Xbar's scalability, flexibility and improved performance per Watt (by 17.1%) over state-of-the-art GPU NoCs, which are highly customized and non-scalable.

Index Terms: graphics processing unit (GPU), network-on-chip (NoC), crossbar
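To make the converge-diverge organization and round-robin routing concrete, the following minimal Python sketch models a two-stage crossbar in which each local crossbar selects the next converged port in round-robin order before handing the request to the global crossbar. The class names, group sizes (8 SMs per local crossbar, 4 converged ports), and port counts are illustrative assumptions, not the paper's actual configuration or implementation.

```python
# Illustrative sketch only: a toy model of a converge-diverge crossbar (CD-Xbar)
# with round-robin selection of converged ports. All parameters below are
# assumptions chosen for readability, not the configuration evaluated in the paper.

from collections import Counter

class LocalCrossbar:
    """Converges the input ports of a group of SMs onto a few converged ports."""
    def __init__(self, num_converged_ports):
        self.num_converged_ports = num_converged_ports
        self.rr_pointer = 0  # round-robin pointer over converged ports

    def route(self):
        """Pick the next converged port in round-robin order (path diversity)."""
        port = self.rr_pointer
        self.rr_pointer = (self.rr_pointer + 1) % self.num_converged_ports
        return port

class CDXbar:
    """Two-stage NoC: local crossbars (SMs -> converged ports) feeding a
    global crossbar (converged ports -> LLC slices / memory controllers)."""
    def __init__(self, num_sms, sms_per_local_xbar,
                 converged_ports_per_local_xbar, num_llc_slices):
        self.sms_per_local_xbar = sms_per_local_xbar
        self.num_llc_slices = num_llc_slices
        num_locals = num_sms // sms_per_local_xbar
        self.local_xbars = [LocalCrossbar(converged_ports_per_local_xbar)
                            for _ in range(num_locals)]

    def send(self, sm_id, llc_slice):
        """Route one request: SM -> local crossbar (round-robin converged port)
        -> global crossbar -> destination LLC slice."""
        local_id = sm_id // self.sms_per_local_xbar
        converged_port = self.local_xbars[local_id].route()
        return (local_id, converged_port, llc_slice)

if __name__ == "__main__":
    # 64 SMs, 8 SMs per local crossbar, 4 converged ports each, 16 LLC slices
    noc = CDXbar(num_sms=64, sms_per_local_xbar=8,
                 converged_ports_per_local_xbar=4, num_llc_slices=16)
    load = Counter()
    for req in range(1000):
        path = noc.send(sm_id=req % 64, llc_slice=req % 16)
        load[path[:2]] += 1      # traffic per (local crossbar, converged port)
    print(sorted(load.items())) # round-robin keeps per-port load balanced
```

Running the sketch shows roughly equal request counts on every converged port of every local crossbar, which is the load-balancing effect that round-robin routing provides within a local crossbar; balancing across crossbars is handled by topology-aware CTA scheduling and is not modeled here.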