While federated learning (FL) has gained great attention in mobile and Internet of Things (IoT) computing for its scalable cooperative learning and privacy-protection capabilities, a great number of technical challenges remain before it is practically deployable. For instance, distributing the training process across a myriad of devices limits the classification performance of machine learning (ML) algorithms, often yielding significantly degraded accuracy compared to centralized learning. In this paper, we investigate this performance limitation under FL and present the benefit of data augmentation, with an application to anomaly detection using an IoT dataset. Our initial study reveals that one critical reason for the degradation is that each device sees only a small fraction of the data (that which it generates), which limits the efficacy of the local ML model constructed by the device. This becomes more critical if the data suffers from class imbalance, which is observed not infrequently in practice (e.g., a small fraction of anomalies). Moreover, device heterogeneity with respect to data quantity is an open challenge in FL. Based on these observations, we examine the impact of data augmentation on detection performance in both homogeneous and heterogeneous FL settings. Our experimental results show that even simple random oversampling can improve detection performance with manageable learning complexity.
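To make the random-oversampling idea concrete, the following is a minimal sketch of how a single FL client might rebalance its local data before training. The abstract does not specify the implementation, so the function name `oversample_minority`, the data layout (a list of `(features, label)` pairs), and the example labels are illustrative assumptions, not the paper's code.

```python
import random
from collections import Counter

def oversample_minority(samples, seed=0):
    """Randomly duplicate minority-class samples until every class
    matches the size of the largest class (simple random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the shortfall with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Example: an imbalanced local dataset (90 "normal" vs. 10 "anomaly").
data = [([0.1], "normal")] * 90 + [([0.9], "anomaly")] * 10
balanced = oversample_minority(data)
counts = Counter(label for _, label in balanced)
```

Because each client rebalances only its own shard, this keeps raw data on-device, which is consistent with FL's privacy constraint; the cost is a modest increase in local training-set size.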
Variable selection (also known as feature selection) is essential to optimizing learning complexity by prioritizing features, particularly for massive, high-dimensional datasets such as network traffic data. In practice, however, performing feature selection effectively is not easy despite the availability of existing selection techniques. From our initial experiments, we observed that existing selection techniques produce different sets of features even under the same conditions (e.g., a fixed size for the resulting set). In addition, individual selection techniques perform inconsistently, sometimes better and sometimes worse than others, so relying on any single one would be risky when building models with the selected features. More critically, automating the selection process is in high demand, since it otherwise requires laborious effort and intensive analysis by a group of experts. In this article, we explore challenges in automated feature selection with an application to network anomaly detection. We first present our ensemble approach, which benefits from existing feature selection techniques by incorporating them; one of the proposed ensemble techniques, based on greedy search, works highly consistently, showing results comparable to the existing techniques. We also address the problem of when to stop the feature elimination process and present a set of methods for determining the number of features in the reduced feature set. Our experimental results, conducted with two recent network datasets, show that the feature sets identified by the presented ensemble and stopping methods consistently yield performance comparable to conventional selection techniques with fewer features.
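The ensemble idea — combining the disagreeing outputs of several selection techniques into one consensus ranking — can be sketched as follows. The abstract does not describe the actual greedy-search algorithm, so this sketch uses a simple Borda-count rank aggregation as one plausible way to incorporate multiple techniques; the feature names and the choice of aggregation rule are illustrative assumptions.

```python
def aggregate_rankings(rankings):
    """Combine per-technique feature rankings with a Borda count:
    a feature ranked r-th (0-based) out of n earns n - r points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - position)
    # Higher total score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))

# Example: three hypothetical techniques disagree on the exact order.
rankings = [
    ["dur", "bytes", "pkts", "proto"],
    ["bytes", "dur", "proto", "pkts"],
    ["dur", "pkts", "bytes", "proto"],
]
consensus = aggregate_rankings(rankings)
# A stopping method would then pick how many top features to keep,
# e.g. a fixed budget of two:
reduced = consensus[:2]
```

A consensus ranking like this also gives the stopping problem a natural form: eliminate features from the bottom of the list until a chosen criterion (e.g. a validation-performance threshold) says to stop.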