Machine Learning (ML), and Deep Learning (DL) in particular, plays a vital role in providing smart services to industry. These techniques, however, suffer from privacy and security concerns, since data is collected from clients and then stored and processed at a central location. Federated Learning (FL), an architecture in which model parameters are exchanged instead of client data, has been proposed to address these concerns. Nevertheless, FL trains a global model by communicating with clients over multiple rounds, which introduces additional network traffic and increases the time needed to converge to a target accuracy. In this work, we address the problem of optimizing accuracy in stateful FL with a budgeted number of candidate clients by selecting the candidates with the best test accuracy to participate in the training process. We then propose an online stateful FL heuristic to find these best candidate clients. Additionally, we propose an IoT client alarm application that uses the heuristic to train a stateful FL global model for IoT device-type classification, alerting clients to unauthorized IoT devices in their environment. To test the efficiency of the proposed online heuristic, we conduct several experiments on a real dataset and compare the results against state-of-the-art algorithms. Our results indicate that the proposed heuristic outperforms the online random algorithm by up to 27% in accuracy, and its performance is comparable to that of the best offline algorithm.
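The abstract does not spell out the heuristic's internals; as a minimal sketch of the stateful, accuracy-driven client selection it describes (all names here are hypothetical, not from the paper), one might keep a running test-accuracy estimate per client and pick the top-scoring candidates each round, with a little exploration for never-tried clients:

```python
import random

def select_clients(stats, candidates, budget, explore_prob=0.1):
    """Select `budget` clients for this round: favor clients whose observed
    test accuracy is highest, treating never-tried clients as most promising."""
    def score(c):
        n, total = stats.get(c, (0, 0.0))
        return total / n if n else float("inf")  # untried clients rank first
    ranked = sorted(candidates, key=score, reverse=True)
    chosen = ranked[:budget]
    # Occasionally replace the weakest pick with a random outsider (exploration).
    outsiders = [c for c in candidates if c not in chosen]
    if chosen and outsiders and random.random() < explore_prob:
        chosen[-1] = random.choice(outsiders)
    return chosen

def record_accuracy(stats, client, test_accuracy):
    """Update the client's running accuracy state after local evaluation."""
    n, total = stats.get(client, (0, 0.0))
    stats[client] = (n + 1, total + test_accuracy)
```

The per-client `stats` dictionary is what makes the selection stateful: accuracies observed in earlier rounds carry forward to rank candidates in later ones.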
In response to various privacy risks, researchers and practitioners have been exploring paradigms that leverage the increased computational capabilities of consumer devices to train machine learning (ML) models in a distributed fashion, without requiring the training data to be uploaded from individual devices to central facilities. For this purpose, federated learning (FL) was proposed as a technique that learns a global ML model at a central master node by aggregating models trained locally on private data. However, organizations may be reluctant to train models locally and share them, both because of the computational resources required for training at their end and because of privacy risks that may result from adversaries inverting these models to infer information about the private training data. Incentive mechanisms have been proposed to motivate end users to participate in collaborative training of ML models (using their local data) in return for certain rewards. However, designing an optimal incentive mechanism for FL is challenging due to its distributed nature and the fact that the central server has no access to clients' hyperparameter information or to the amount and quality of data used for training, which makes it difficult to determine rewards based on the contribution of individual clients in an FL environment. Even though several incentive mechanisms have been proposed for FL, a thorough, up-to-date systematic review is missing, and this paper fills that gap. To the best of our knowledge, this paper is the first systematic review that comprehensively lists the design principles required for implementing these incentive mechanisms and then categorizes the mechanisms according to those principles. In addition, we provide a comprehensive overview of the security challenges associated with incentive-driven FL. Finally, we highlight the limitations and pitfalls of these incentive schemes and elaborate on open research issues that require further attention.
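One common family of mechanisms that such surveys cover ties rewards to each client's marginal contribution to the global model. A minimal leave-one-out sketch (the function names and the `evaluate` callback are hypothetical illustrations, not taken from the paper):

```python
def leave_one_out_contributions(evaluate, updates):
    """Score each client by the accuracy lost when its update is excluded.
    `evaluate` maps a dict of client updates to the aggregate's test accuracy."""
    baseline = evaluate(updates)
    return {c: baseline - evaluate({k: v for k, v in updates.items() if k != c})
            for c in updates}

def split_reward(contributions, reward_pool):
    """Divide a fixed reward pool in proportion to non-negative contributions."""
    clipped = {c: max(v, 0.0) for c, v in contributions.items()}
    total = sum(clipped.values())
    if total == 0:
        return {c: 0.0 for c in clipped}
    return {c: reward_pool * v / total for c, v in clipped.items()}
```

This toy scheme illustrates the difficulty the abstract points to: the server can only score the updates it receives, not the underlying data quantity or quality.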
With machine learning (ML) services now used in a number of mission-critical, human-facing domains, ensuring the integrity and trustworthiness of ML models is all-important. In this work, we consider the paradigm in which cloud service providers collect big data from resource-constrained devices to build ML-based prediction models that are then sent back to run locally on those intermittently connected, resource-constrained devices. Our proposed solution is an intelligent polynomial-time heuristic that maximizes the trust level of the deployed ML models by selecting, and switching among, a subset of models from a given superset, while respecting the given reconfiguration budget/rate and reducing cloud communication overhead. We evaluate the performance of the proposed heuristic in two case studies. First, we consider Industrial IoT (IIoT) services and, as a proxy for this setting, use the turbofan engine degradation simulation dataset to predict the remaining useful life of an engine. Our results in this setting show that the trust level of the selected models is 0.49%-3.17% lower than the results obtained using integer linear programming (ILP). Second, we consider smart city services and, as a proxy for this setting, use an experimental transportation dataset to predict the number of cars. Our results show that the trust level of the selected models is 0.7%-2.53% lower than the ILP results. We also show that the proposed heuristic achieves an optimal competitive ratio in a polynomial-time approximation scheme for the problem.
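The authors' heuristic itself is not reproduced in the abstract; as a rough illustration of budget-constrained model switching, the greedy stand-in below (hypothetical names, not the paper's algorithm) deploys the most trusted model at each step but spends a reconfiguration only when the trust gain justifies it:

```python
def plan_model_schedule(trust, switch_budget, min_gain=0.0):
    """Greedy schedule: at each step deploy the most trusted model, switching
    away from the current one only while reconfigurations remain in the
    budget and the trust gain exceeds `min_gain`.
    `trust` is a list of {model: trust_score} dicts, one per time step."""
    schedule, current, switches = [], None, 0
    for scores in trust:
        best = max(scores, key=scores.get)
        if current is None:
            current = best  # initial deployment costs no reconfiguration
        elif (best != current and switches < switch_budget
              and scores[best] - scores[current] > min_gain):
            current, switches = best, switches + 1
        schedule.append(current)
    return schedule

# Example: one allowed switch across three steps.
trust = [{"m1": 0.90, "m2": 0.70},
         {"m1": 0.60, "m2": 0.80},
         {"m1": 0.85, "m2": 0.65}]
print(plan_model_schedule(trust, switch_budget=1))  # ['m1', 'm2', 'm2']
```

Because future trust scores are unknown in the online setting, a greedy rule like this can regret an early switch, which is why the paper's competitive-ratio analysis matters.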