To accommodate large training datasets and increasingly complex models, distributed deep learning training is being employed more and more frequently. However, communication bottlenecks between distributed nodes lead to poor performance of distributed deep learning training. In this study, we propose a new collective communication method for the Python environment that utilizes the Multi-Channel Dynamic Random Access Memory (MCDRAM) in Intel Xeon Phi Knights Landing processors. Major deep learning platforms, such as TensorFlow and PyTorch, offer Python as their main development language, so we developed an efficient communication library by adapting the memkind library, a C-based memory allocation library, so that Message Passing Interface (MPI) communication can utilize the high-performance MCDRAM. For the performance evaluation, we tested the collective communication operations most commonly used in distributed deep learning, such as Broadcast, Gather, and AllReduce. We conducted experiments to analyze the effects of high-performance memory and processor location on communication performance. In addition, we analyzed performance in a Docker environment, given the recent major trend toward cloud computing. Through extensive experiments on our testbed, we confirmed that our proposed method improves communication performance by up to 487%.
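To illustrate the general idea, the following is a minimal sketch, not the paper's actual library: it allocates MPI communication buffers in MCDRAM through memkind's hbw_malloc/hbw_free C interface (called via ctypes) and runs the Broadcast and AllReduce collectives with mpi4py. The helper hbw_array and the shared-library name libmemkind.so.0 are illustrative assumptions.

```python
# Sketch: back mpi4py collective buffers with MCDRAM via libmemkind.
# hbw_array() is a hypothetical helper, not an API from the paper.
import ctypes
import numpy as np
from mpi4py import MPI

_memkind = ctypes.CDLL("libmemkind.so.0")        # assumes memkind is installed
_memkind.hbw_malloc.restype = ctypes.c_void_p
_memkind.hbw_malloc.argtypes = [ctypes.c_size_t]
_memkind.hbw_free.argtypes = [ctypes.c_void_p]

def hbw_array(n, dtype=np.float32):
    """Return a NumPy view over a high-bandwidth (MCDRAM) allocation."""
    nbytes = n * np.dtype(dtype).itemsize
    ptr = _memkind.hbw_malloc(nbytes)
    if not ptr:
        raise MemoryError("hbw_malloc failed (is high-bandwidth memory available?)")
    raw = (ctypes.c_byte * nbytes).from_address(ptr)
    return np.frombuffer(raw, dtype=dtype), ptr

comm = MPI.COMM_WORLD
send, send_ptr = hbw_array(1 << 20)
recv, recv_ptr = hbw_array(1 << 20)
send[:] = comm.Get_rank()

# Collectives commonly used in distributed deep learning training.
comm.Bcast(send, root=0)                   # broadcast model parameters
comm.Allreduce(send, recv, op=MPI.SUM)     # sum gradients across ranks

_memkind.hbw_free(send_ptr)
_memkind.hbw_free(recv_ptr)
```

In this sketch the MCDRAM-backed NumPy arrays are passed directly to mpi4py's buffer-based collectives, so the MPI library reads from and writes to high-bandwidth memory without an extra copy; the actual communication library described in the paper may differ in structure and interface.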