Semi-dynamic load balancing

Chen, Chen; Weng, Qizhen; Wang, Wei; Li, Baochun; Li, Bo

doi:10.1145/3419111.3421299

Cited by 20 publications

(2 citation statements)

References 47 publications


“…This general-purpose data partitioning framework performs accurate and efficient benchmarking to obtain the relative speed of the resources that constitute the cluster, providing the load measurements for each element that optimize execution time. Also, other criteria could be taken into account to determine these measurements [5]. In this particular case, the resource speed is used to define the heterogeneity of the platform, as explained in the following Sect.…”

Section: Hetgrad Optimization Methodology

mentioning

confidence: 99%

Heterogeneous gradient computing optimization for scalable deep neural networks

et al. 2022

Data processing applications based on neural networks must cope with growth in the amount of data to be processed and with the increasing depth and complexity of neural network architectures, and hence with the number of parameters to be learned. High-performance computing platforms provide fast computing resources, including multi-core processors and graphical processing units, to manage the computational burden of deep neural network applications. A common optimization technique is to distribute the workload among the processes deployed on the resources of the platform. This approach is known as data parallelism. Each process, known as a replica, trains its own copy of the model on a disjoint data partition. However, when the computational resources composing the platform are heterogeneous, the workload must be distributed unevenly among the replicas according to their computational capabilities in order to optimize overall execution performance. Since each replica processes a different amount of data, the gradients computed by the replicas should have different influence on the global parameter update. This work proposes a modification of the gradient computation method that takes into account the different speeds of the replicas and, hence, the amount of data assigned to each. Experiments were conducted on heterogeneous high-performance computing platforms for a wide range of models and datasets, showing an improvement in final accuracy over current techniques, with comparable performance.
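
The two ideas in this abstract, splitting a batch unevenly according to replica speed and letting each replica's gradient count in proportion to the data it processed, can be illustrated with a small sketch. This is not the method of the cited paper: the function names `partition_batch` and `weighted_gradient`, the sample speeds, and the plain size-weighted average are all assumptions chosen for illustration.

```python
def partition_batch(global_batch, speeds):
    """Split global_batch samples among replicas in proportion to their speeds.

    `speeds` holds hypothetical relative speeds (for example, samples per
    second measured in a benchmark). Returns one integer size per replica.
    """
    total = sum(speeds)
    shares = [global_batch * s / total for s in speeds]
    sizes = [int(x) for x in shares]  # round every share down first
    # Give the samples lost to rounding to the largest fractional remainders.
    leftover = global_batch - sum(sizes)
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - sizes[i],
                   reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


def weighted_gradient(grads, sizes):
    """Combine per-replica mean gradients, weighting each by its sample count.

    Assumption: a size-weighted average stands in for whatever weighting
    the cited work actually uses.
    """
    total = sum(sizes)
    dim = len(grads[0])
    return [sum(g[j] * n for g, n in zip(grads, sizes)) / total
            for j in range(dim)]


if __name__ == "__main__":
    sizes = partition_batch(128, [1.0, 1.0, 2.0])
    print(sizes)
    print(weighted_gradient([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], sizes))
```

With equal speeds the split is even and the weighted average reduces to the ordinary mean of the replica gradients, which is the homogeneous data-parallel case.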

“…Regarding workload distribution in data-parallelism scheme, a dynamic workload distribution scheme is proposed in [5], to adapt the assigned batch size to each replica in every iteration. A recurrent neural network (RNN) is used in order to measure the speed of each replica.…”
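
As a rough illustration of adapting each replica's batch size from one iteration to the next, the sketch below re-estimates replica speed after every iteration and recomputes the split. The statement above says the cited work measures speed with an RNN; this sketch swaps in a simple exponential moving average instead, and the names `update_speed`, `rebalance`, and `alpha` are hypothetical.

```python
def update_speed(prev_speed, batch_size, elapsed_s, alpha=0.5):
    """Blend the newly observed throughput into the running speed estimate.

    Assumption: an exponential moving average replaces the RNN-based
    speed predictor mentioned in the citation statement.
    """
    observed = batch_size / elapsed_s  # samples per second this iteration
    return alpha * observed + (1 - alpha) * prev_speed


def rebalance(global_batch, speeds):
    """Assign batch sizes proportional to the current speed estimates."""
    total = sum(speeds)
    sizes = [max(1, round(global_batch * s / total)) for s in speeds]
    # Absorb any rounding drift in the largest share so the sizes still
    # add up to global_batch.
    sizes[sizes.index(max(sizes))] += global_batch - sum(sizes)
    return sizes


if __name__ == "__main__":
    speeds = [100.0, 100.0]
    # Hypothetical timings: replica 0 took 0.5 s, replica 1 took 2.0 s.
    timings = [0.5, 2.0]
    sizes = rebalance(96, speeds)
    speeds = [update_speed(s, n, t) for s, n, t in zip(speeds, sizes, timings)]
    print(rebalance(96, speeds))
```

A smaller `alpha` makes the estimate react more slowly to a single slow iteration, at the cost of adapting later to a sustained change in replica speed.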

mentioning

confidence: 99%