Federated learning (FL) facilitates shared training of machine learning models while maintaining data privacy. Unfortunately, it suffers from data imbalance among participating clients, causing the performance of the shared model to drop. To diminish the negative effects of unfavourable data-specific properties, both algorithm- and data-based approaches seek to make FL more resilient against them. In this regard, data-based approaches prove to be more versatile and require less domain knowledge to be applied efficiently. Hence, they seem particularly suitable for widespread application in various FL environments. Although data-based approaches such as local data sampling have been applied to FL in the past, previous research did not provide a systematic analysis of the potential and limitations of individual data sampling strategies to improve FL. To this end, we (1) identify relevant local data sampling strategies applicable to FL systems, (2) identify data-specific properties that negatively affect FL system performance, and (3) provide a benchmark of local data sampling strategies regarding their effect on model performance, convergence, and training time in synthetic, real-world, and large-scale FL environments. Moreover, we propose and rigorously test a novel method for data sampling in FL that locally optimizes the choice of sampling strategy prior to FL participation. Our results show that FL can greatly benefit from applying local data sampling in terms of performance and convergence rate, especially when data imbalance is high or the number of clients and samples is low. Furthermore, our proposed sampling strategy offers the best trade-off between model performance and training time.
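To make the idea of local data sampling concrete, here is a minimal sketch of one common strategy, random oversampling, applied to a single client's data before it joins training. The abstract does not say which strategies the authors benchmark or how they implement them, so the function name `random_oversample`, the toy client data, and the choice of strategy are all illustrative assumptions rather than the paper's method.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Balance one client's local data by random oversampling.

    Hypothetical helper (not from the paper): minority-class samples are
    duplicated at random until every class is as large as the largest one.
    """
    rng = random.Random(seed)

    # Group the local samples by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    if not by_class:
        return [], []

    # Every class is grown to the size of the largest local class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy client with an 8:2 class imbalance (made-up data).
client_x = list(range(10))
client_y = [0] * 8 + [1] * 2

balanced_x, balanced_y = random_oversample(client_x, client_y)
balanced_counts = Counter(balanced_y)
```

Sampling happens entirely on the client, so no raw data leaves the device, and only the class distribution seen by local training changes. Oversampling enlarges the local dataset, which is one reason a sampling strategy can trade model performance against training time.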
Machine learning (ML) and deep learning (DL) have become very popular in the research community for addressing complex issues in intelligent transportation. This has resulted in many scientific papers being published across various transportation topics over the past decade. This paper conducts a systematic review of the intelligent transportation literature using a scientometric analysis, aiming to summarize what is already known, identify current research trends, evaluate academic impacts, and suggest future research directions. The study provides a detailed review by analyzing 113 journal articles from the Web of Science (WoS) database. It examines the growth of publications over time, explores the collaboration patterns of key contributors, such as researchers, countries, and organizations, and employs techniques such as co-authorship analysis and keyword co-occurrence analysis to delve into the publication clusters and identify emerging research topics. Nine emerging sub-topics are identified and qualitatively discussed. The outcomes include recognizing pioneering researchers in intelligent transportation for potential collaboration opportunities, identifying reliable sources of information for publishing new work, and aiding researchers in selecting the best solutions for specific problems. These findings help researchers better understand the application of ML and DL in the intelligent transportation literature and guide research policymakers and editorial boards in selecting promising research topics for further research and development.