We study collaborative machine learning (ML) at the wireless edge, where power- and bandwidth-limited wireless devices with local datasets carry out distributed stochastic gradient descent (DSGD) with the help of a remote parameter server (PS). Standard approaches assume separate computation and communication, where local gradient estimates are compressed and communicated to the PS over orthogonal links. Following this digital approach, we introduce D-DSGD, in which the wireless terminals, referred to as the workers, employ gradient quantization and error accumulation, and transmit their gradient estimates to the PS over the underlying wireless multiple access channel (MAC). We then introduce an analog scheme, called A-DSGD, which exploits the additive nature of the wireless MAC for over-the-air gradient computation. In A-DSGD, the workers first sparsify their gradient estimates, and then project them onto a lower-dimensional space imposed by the available channel bandwidth. These projections are transmitted directly over the MAC without employing any digital code. Numerical results show that A-DSGD converges much faster than D-DSGD thanks to its more efficient use of the limited bandwidth and the natural alignment of the gradient estimates over the channel. The improvement is particularly compelling in the low-power and low-bandwidth regimes. We also observe that the performance of A-DSGD improves with the number of workers (keeping the total size of the dataset constant), while that of D-DSGD deteriorates, limiting the ability of the latter to harness the computation power of edge devices. The lack of quantization and channel encoding/decoding in A-DSGD further speeds up communication, making it very attractive for low-latency ML applications at the wireless network edge.

Here, $\frac{1}{|\mathcal{B}_{m,t}|} \sum_{u_n \in \mathcal{B}_{m,t}} \nabla f(\theta_t, u_n)$ is the stochastic gradient of the current model computed at worker $m$, $m \in [M]$, using the local mini-batch $\mathcal{B}_{m,t}$.
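To make the A-DSGD pipeline above concrete, the following is a minimal sketch of the worker-side processing it describes: error accumulation, top-$k$ sparsification, and a random projection down to the channel bandwidth, with the MAC summing the workers' analog signals. All function names, the Gaussian projection matrix, and the parameter values are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

def a_dsgd_worker_step(grad, error_acc, proj_matrix, k):
    """Compress one local gradient estimate for analog transmission."""
    # Error accumulation: re-inject the compression error kept from
    # previous iterations before sparsifying.
    corrected = grad + error_acc
    # Top-k sparsification: keep only the k largest-magnitude entries.
    sparse = np.zeros_like(corrected)
    idx = np.argpartition(np.abs(corrected), -k)[-k:]
    sparse[idx] = corrected[idx]
    # Whatever was discarded is accumulated for the next iteration.
    new_error_acc = corrected - sparse
    # Project onto the lower-dimensional space imposed by the bandwidth.
    tx_signal = proj_matrix @ sparse          # shape (s,), with s << d
    return tx_signal, new_error_acc

# Over-the-air computation: the MAC adds the workers' analog signals, so the
# PS receives (up to noise) the sum of the projected gradient estimates.
d, s, k, M = 10_000, 500, 100, 20             # model dim, channel dim, sparsity, workers
rng = np.random.default_rng(0)
A = rng.standard_normal((s, d)) / np.sqrt(s)  # shared random projection (assumed Gaussian)
errors = [np.zeros(d) for _ in range(M)]
received = np.zeros(s)
for m in range(M):
    g = rng.standard_normal(d)                # stand-in for a local gradient estimate
    x, errors[m] = a_dsgd_worker_step(g, errors[m], A, k)
    received += x                             # superposition over the MAC
received += 0.01 * rng.standard_normal(s)     # additive channel noise at the PS
```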
We study federated machine learning at the wireless network edge, where power-limited wireless devices, each with its own dataset, build a joint model with the help of a remote parameter server (PS). We consider a bandwidth-limited fading multiple access channel (MAC) from the wireless devices to the PS, and implement distributed stochastic gradient descent (DSGD) over the air. We first propose a digital DSGD (D-DSGD) scheme, in which one device is selected opportunistically for transmission at each iteration based on the channel conditions; the scheduled device quantizes its gradient estimate to a finite number of bits imposed by the channel condition, and transmits these bits to the PS in a reliable manner. Next, motivated by the additive nature of the wireless MAC, we propose a novel analog communication scheme, referred to as compressed analog DSGD (CA-DSGD), in which the devices first sparsify their gradient estimates while accumulating the error from previous iterations, and then project the resultant sparse vector onto a low-dimensional vector. We also design a power allocation scheme to align the received gradient vectors at the PS in an efficient manner. Numerical results show that the proposed CA-DSGD algorithm converges much faster than the D-DSGD scheme and other schemes in the literature, while providing significantly higher accuracy.

Here $M$ denotes the number of wireless devices, and $g_m(\theta_t)$ is the local stochastic gradient computed at device $m$. In FL, each device participating in the training can also carry out model updates as in (3) locally, and share the overall difference with respect to the previous model parameters with the PS [1]. What distinguishes FL from conventional ML is the large number of devices that participate in the training, and the low-capacity, unreliable links that connect these devices to the PS. There have therefore been significant research efforts to reduce the communication requirements of FL [1]–[24]. However, these and follow-up studies consider orthogonal channels from the participating devices to the PS, and ignore the physical-layer aspects of wireless connections, even though FL has been mainly motivated by mobile devices.
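For context, the fragment above refers to the model update performed at the PS. The paper's equation (3) is not reproduced here, so the following standard DSGD form, consistent with the notation above, is an assumption:

```latex
\theta_{t+1} = \theta_t - \eta_t \,\frac{1}{M} \sum_{m=1}^{M} g_m(\theta_t),
\qquad
g_m(\theta_t) = \frac{1}{|\mathcal{B}_{m,t}|} \sum_{u_n \in \mathcal{B}_{m,t}} \nabla f(\theta_t, u_n),
```

where $\eta_t$ is the learning rate and $\mathcal{B}_{m,t}$ is the mini-batch sampled from device $m$'s local dataset at iteration $t$.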
We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets collaboratively train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources, and each participating device must compress its model update to fit its link capacity. We design novel scheduling and resource allocation policies that decide on the subset of devices to transmit at each round, and on how the resources should be allocated among the participating devices, based not only on their channel conditions, but also on the significance of their local model updates. We then establish the convergence of a wireless FL algorithm with device scheduling, where devices have limited capacity to convey their messages. The results of numerical experiments show that the proposed scheduling policy, based on both the channel conditions and the significance of the local model updates, provides better long-term performance than scheduling policies based on either of the two metrics alone. Furthermore, we observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, whereas when the data distribution is non-i.i.d., scheduling multiple devices at each round improves the performance. This observation is verified by the convergence result, which shows that the number of scheduled devices should increase for a less diverse and more biased data distribution.
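The scheduling idea above can be illustrated with a toy scoring rule that ranks devices by a convex combination of channel quality and update significance. The scoring function, the weight `alpha`, and the metrics used are assumptions for illustration, not the paper's exact policy.

```python
import numpy as np

def schedule_devices(channel_gains, update_norms, num_scheduled, alpha=0.5):
    """Pick the devices to transmit this round.

    channel_gains: per-device channel quality, e.g. fading power gains |h_m|^2
    update_norms:  per-device l2 norm of the local update (significance proxy)
    alpha:         trade-off between channel state and update significance
    """
    # Normalize both metrics so they are comparable across devices.
    g = channel_gains / (channel_gains.max() + 1e-12)
    u = update_norms / (update_norms.max() + 1e-12)
    scores = alpha * g + (1.0 - alpha) * u
    # Schedule the highest-scoring devices.
    return np.argsort(scores)[-num_scheduled:]

rng = np.random.default_rng(1)
h = rng.exponential(size=10)          # Rayleigh-fading power gains (illustrative)
n = rng.uniform(0.1, 2.0, size=10)    # stand-in update norms
print(schedule_devices(h, n, num_scheduled=3))
```

Setting `alpha = 1` recovers a purely channel-aware policy and `alpha = 0` a purely significance-aware one; the abstract's observation suggests that combining the two metrics performs best in the long term.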
We study federated edge learning (FEEL), where wireless edge devices, each with its own dataset, learn a global model collaboratively with the help of a wireless access point acting as the parameter server (PS). At each iteration, the devices perform local updates using their local data and the most recent global model received from the PS, and send their local updates to the PS over a wireless fading multiple access channel (MAC). The PS then updates the global model according to the signal received over the wireless MAC, and shares it with the devices. Motivated by the additive nature of the wireless MAC, we propose an analog 'over-the-air' aggregation scheme, in which the devices transmit their local updates in an uncoded fashion. However, unlike recent literature on over-the-air FEEL, here we assume that the devices do not have channel state information (CSI), while the PS has only imperfect CSI. On the other hand, the PS is equipped with multiple antennas to alleviate the destructive effect of the channel, which is exacerbated by the lack of perfect CSI. We design a receive beamforming scheme at the PS, and show that it can compensate for the lack of perfect CSI when the PS has a sufficient number of antennas. We also derive the convergence rate of the proposed algorithm, highlighting the impact of the lack of perfect CSI as well as the number of PS antennas. Both the experimental results and the convergence analysis illustrate that the performance of the proposed algorithm improves with the number of PS antennas: with a sufficiently large number of antennas, the wireless fading MAC becomes effectively deterministic despite the lack of perfect CSI.
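The effect of many PS antennas described above can be illustrated numerically: with a simple matched-filter-style combiner built from imperfect channel estimates, the recovered aggregate concentrates around its target as the antenna count grows (channel hardening). The CSI error model and the combiner below are illustrative assumptions, not the paper's receive-beamforming design.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 10                                    # number of devices
for K in (4, 64, 1024):                   # number of PS antennas
    x = rng.standard_normal(M)            # one uncoded symbol per device
    # i.i.d. Rayleigh-fading channels, entries ~ CN(0, 1).
    H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
    # Imperfect CSI at the PS: true channel plus independent estimation error.
    E = 0.3 * (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
    H_hat = H + E
    noise = 0.1 * (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
    y = H @ x + noise                     # superposition over the fading MAC
    # Matched-filter combining with the imperfect estimates, averaged over
    # antennas and devices; cross-terms and noise vanish as K grows.
    est = np.real(np.sum(np.conj(H_hat) * y[:, None])) / (K * M)
    print(f"K={K:5d}  estimate={est:+.4f}  target={x.mean():+.4f}")
```

As the number of antennas increases, the estimate approaches the average of the transmitted updates even though neither side has perfect CSI, mirroring the abstract's claim that the fading MAC becomes effectively deterministic with enough PS antennas.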