2017
DOI: 10.3390/s17102172

A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors

Abstract: To exploit the distributed nature of sensors, distributed machine learning has become the mainstream approach, but differences in the computing capabilities of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and p…

Cited by 10 publications (2 citation statements)
References 29 publications
“…3) Bulk Synchronous Parallel: In distributed computing systems, each computing node has different computing power from the other nodes because of real-world conditions. For this reason, distributed ML training uses iterations to coordinate synchronization among all computing nodes [12]. In the synchronous update known as bulk synchronous parallel (BSP) [13], the replicas submit their gradients, after the local training process at every iteration or mini-batch, to the global model parameters or to other replicas.…”
Section: A. Artificial Neural Network (citation type: mentioning)
confidence: 99%
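
For readers unfamiliar with BSP, the following is a minimal single-process sketch of the update pattern described in this statement: every worker computes a gradient on its local mini-batch, and the global parameters advance only after all gradients have been collected at the barrier. The compute_gradient function, the synthetic batches, and the in-process list of "workers" are illustrative assumptions; a real deployment would exchange gradients over the network, for example through a parameter server or an all-reduce step.

import numpy as np

def compute_gradient(params, batch):
    # Placeholder local step: least-squares gradient on one mini-batch.
    X, y = batch
    return 2.0 * X.T @ (X @ params - y) / len(y)

def bsp_train(params, worker_batches, lr=0.01, iterations=100):
    for _ in range(iterations):
        # Each worker computes a gradient on its local mini-batch ...
        grads = [compute_gradient(params, batch) for batch in worker_batches]
        # ... then all workers meet at the synchronization barrier: the global
        # parameters are updated only after every gradient has arrived.
        params = params - lr * np.mean(grads, axis=0)
    return params

# Example: three "workers", each holding a small synthetic regression batch.
rng = np.random.default_rng(0)
batches = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(3)]
w = bsp_train(np.zeros(4), batches)

Because the slowest worker gates every iteration, BSP keeps all replicas consistent but wastes the idle time of faster nodes, which is the motivation for the relaxed consistency models discussed in the next statement.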
“…If the worker node does not reach the staleness threshold during the training process, the accuracy cannot be guaranteed. Our previous work improved on this and proposed a dynamic synchronous parallel (DSP) model based on dynamic finite fault tolerance [41], [42]. DSP adds an optional condition for entering the synchronization barrier.…”
Section: Consistency Model (citation type: mentioning)
confidence: 99%
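
As a rough illustration of the staleness check this statement refers to (a minimal sketch of a stale-synchronous-style rule, not the cited dynamic finite fault-tolerance condition itself), a worker is allowed to continue only while it stays within a bounded number of iterations of the slowest worker; the may_proceed helper and the staleness value of 3 are hypothetical.

def may_proceed(worker_clock, all_clocks, staleness):
    # A worker may start its next iteration only while it is at most
    # `staleness` iterations ahead of the slowest worker; otherwise it
    # must wait at the synchronization barrier.
    return worker_clock - min(all_clocks) <= staleness

print(may_proceed(10, [6, 9, 10], staleness=3))  # False -> wait at the barrier
print(may_proceed(10, [8, 9, 10], staleness=3))  # True  -> keep training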