An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks

Kahira, Albert Njoroge; Nguyen, Truong Thao; Bautista-Gomez, Leonardo; Takano, Ryousei; Badía, Rosa M.; Wahib, Mohamed

doi:10.1145/3431379.3460644

Cited by 6 publications

(1 citation statement)

References 50 publications

“…This methodology assesses Oracle using six parallelization algorithms, four CNN models, and different datasets (2D and 3D) on up to 1024 GPUs. Compared to empirical results, the Oracle tool has an average accuracy of roughly 86.74% and data parallelism accuracy of up to 97.57% [25]. However, GPU processing performance and training throughput are severely limited because of the excessive memory consumption mentioned before.…”

Section: Related Work (mentioning)

confidence: 95%

Comparative Study on Distributed Lightweight Deep Learning Models for Road Pothole Detection

Tahir; Jung. Sensors, 2023.

This paper delves into image detection based on distributed deep-learning techniques for intelligent traffic systems or self-driving cars. The accuracy and precision of neural networks deployed on edge devices (e.g., CCTV (closed-circuit television) for road surveillance) with small datasets may be compromised, leading to the misjudgment of targets. To address this challenge, TensorFlow and PyTorch were used to initialize various distributed model parallel and data parallel techniques. Despite the success of these techniques, communication constraints were observed along with certain speed issues. As a result, a hybrid pipeline was proposed, combining both dataset and model distribution through an all-reduced algorithm and NVlinks to prevent miscommunication among gradients. The proposed approach was tested on both an edge cluster and Google cluster environment, demonstrating superior performance compared to other test settings, with the quality of the bounding box detection system meeting expectations with increased reliability. Performance metrics, including total training time, images/second, cross-entropy loss, and total loss against the number of the epoch, were evaluated, revealing a robust competition between TensorFlow and PyTorch. The PyTorch environment’s hybrid pipeline outperformed other test settings.
