2021
DOI: 10.1364/jocn.412360

Machine-learning-aided cognitive reconfiguration for flexible-bandwidth HPC and data center networks [Invited]

Abstract: This paper proposes a machine-learning (ML)-aided cognitive approach for effective bandwidth reconfiguration in optically interconnected datacenter/high-performance computing (HPC) systems. The proposed approach relies on a Hyper-X-like architecture augmented with flexible-bandwidth photonic interconnections at large scales using a hierarchical intra/inter-POD photonic switching layout. We first formulate the problem of the connectivity graph and routing scheme optimization as a mixed-integer linear programmin…

Cited by 15 publications (2 citation statements)
References 23 publications
“…The upper bound of Alizadeh Web Search distribution [2] was used to determine the sizes of flows. To emulate various traffic patterns, we used both synthetic traffic matrices and real HPC application traces, i.e., algebraic multi-grid (AMG), center for exascale simulation of advanced reactors (CESAR) and FFT [6,7]. The two synthetic traffic matrices were generated by randomly selecting 30% and 50% of the ToR pairs to generate demands following a uniform distribution.…”
Section: Performance Evaluation
mentioning
confidence: 99%
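
The synthetic traffic matrices described in the statement above (a random 30% or 50% of ToR pairs carrying uniformly distributed demands) could be generated roughly as follows. This is a minimal sketch, not the citing authors' code: the function name `synthetic_traffic_matrix`, the ToR count, the demand range `max_demand`, the use of ordered (directed) ToR pairs, and the seed are all assumptions made for illustration, since the quoted text does not specify them.

```python
import random


def synthetic_traffic_matrix(num_tors, fraction, max_demand=1.0, seed=None):
    """Build a num_tors x num_tors demand matrix (hypothetical helper).

    A random `fraction` of the ordered ToR pairs (source != destination)
    is selected, and each selected pair gets a demand drawn uniformly
    from (0, max_demand]. All other entries stay at zero.
    """
    rng = random.Random(seed)

    # Assumption: pairs are directed, so (s, d) and (d, s) are distinct.
    pairs = [(s, d) for s in range(num_tors) for d in range(num_tors) if s != d]
    num_active = round(fraction * len(pairs))
    active_pairs = rng.sample(pairs, num_active)

    matrix = [[0.0] * num_tors for _ in range(num_tors)]
    for s, d in active_pairs:
        # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1];
        # this keeps every selected pair's demand strictly positive.
        matrix[s][d] = max_demand * (1.0 - rng.random())
    return matrix


# Two matrices mirroring the 30% and 50% settings in the quoted statement.
tm_30 = synthetic_traffic_matrix(num_tors=16, fraction=0.30, max_demand=10.0, seed=1)
tm_50 = synthetic_traffic_matrix(num_tors=16, fraction=0.50, max_demand=10.0, seed=2)
```

Fixing the seed makes a run reproducible. The flow-size step (the upper bound of the web search distribution) is left out because the statement gives no parameters for it.
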
“…However, such an approach cannot scale easily and has been demonstrated only in small-scale DC systems. Our previous work in [6] proposed a cognitive reconfiguration policy relying on performance (i.e., latency, packet loss rate) estimations by deep neural network (DNN) models. Nevertheless, training the DNN models requires collecting a large amount of performance data, introducing non-negligible operation overheads.…”
mentioning
confidence: 99%