Electronic health records (EHR) contain patient-level information, e.g., laboratory tests and questionnaires, stored in electronic format. Compared to physical records, EHRs allow patients to access their data easily and help staff with administrative tasks such as sharing information across organizations. Moreover, this type of data is commonly used by researchers for prediction and classification with statistical and machine learning methods. However, missing values are very common in such measurements. Even though the missingness is often substantial, it is usually handled poorly, with either case deletion or simple methods, resulting in suboptimal and/or inaccurate predictive results. This happens because the simple methods, e.g., k-nearest neighbors (kNN) and mean/mode imputation, fail in most cases to capture the complex relationships that characterize these medical datasets. To address these limitations, in this paper we test and improve state-of-the-art missing data imputation models and practices. We propose a new missing value imputation method based on denoising autoencoders (DAE), with kNN used for the pre-imputation task. We optimize the training methodology by re-applying kNN to the missing data every N epochs, using a different value of k each time, to yield more accurate results. We also revise a state-of-the-art missing data imputation approach based on a generative adversarial network (GAN). Using this as a baseline, we introduce improvements to both the architecture and the training procedure. These models are compared with the ones usually employed in clinical research studies on both imputation and post-imputation prediction. Results show that our proposed deep learning approaches outperform the standard baselines, yielding better imputation and predictive results.
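As an illustration only, the pre-imputation idea described above (kNN re-applied every N epochs with a different k) might be sketched as follows. This is a minimal pure-Python sketch, not the authors' implementation: the function names `knn_impute` and `pre_imputation_schedule`, the distance measure, the cycling order of k, and the placeholder for the DAE training step are all assumptions.

```python
import math


def knn_impute(rows, k):
    """Fill None entries with the mean of the k nearest rows that observe that feature."""

    def dist(a, b):
        # Root-mean-square difference over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the value missing if no donor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def pre_imputation_schedule(rows, epochs, refresh_every, k_values):
    """Yield (epoch, k, imputed_rows), refreshing the kNN pre-imputation periodically.

    Assumption: k cycles through k_values and kNN is always re-applied to the
    original data with its missing entries.
    """
    k, current = None, None
    for epoch in range(epochs):
        if epoch % refresh_every == 0:
            k = k_values[(epoch // refresh_every) % len(k_values)]
            current = knn_impute(rows, k)
        # A denoising-autoencoder training step on `current` would go here.
        yield epoch, k, current


rows = [[1.0, 2.0], [2.0, None], [10.0, 20.0], [4.0, 6.0]]
ks_used = [k for _, k, _ in pre_imputation_schedule(rows, 4, 2, [1, 2])]
```

With the toy `rows` above, the missing entry is filled from the single closest row when k = 1 and from the average of the two closest rows when k = 2, so each refresh hands the autoencoder a slightly different starting point.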
Vulnerability identification and assessment is a key process in risk management. While enumerations of vulnerabilities are available, it is challenging to identify vulnerability sets focused on the profiles and roles of specific organizations. To this end, we have employed systematized knowledge and relevant standards (including National Electric Sector Cybersecurity Organization Resource (NESCOR), ISO/IEC 27005:2018 and National Vulnerability Database (NVD)) to identify a set of 250 vulnerabilities for operators of energy-related critical infrastructures. We have elaborated a “double-mapping” scheme to associate (arbitrarily) categorized assets with the pool of identified Physical, Cyber and Human/Organizational vulnerabilities. We have designed and implemented an extensible vulnerability identification and assessment framework that keeps a history of assessments and is based on the Common Vulnerability Scoring System (CVSS) scoring mechanism. This framework has been extended to model the vulnerabilities and assessments in the Structured Threat Information eXpression (STIX) JSON format, as Cyber Threat Intelligence (CTI) information, to facilitate information sharing between Electrical Power and Energy Systems (EPES) and to promote collaboration and interoperability scenarios. Vulnerability assessments from the initial analysis carried out in the context of Research and Technology Development (RTD) projects have been statistically processed, offering insights into the assessments’ importance and distribution. The assessments have also been transformed into a dynamic dataset, which is processed to identify and quantify correlations and to start the discussion on how assessments are performed.
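To make the STIX modelling step more concrete, a CVSS-scored vulnerability could be expressed as a STIX Vulnerability object roughly as follows. This is a hedged sketch, not the framework's actual schema: STIX version 2.1 is assumed, the helper `make_stix_vulnerability` is hypothetical, and the `x_cvss_score` and `x_cvss_vector` custom properties are illustrative names, since the abstract does not say how scores are attached. The example vulnerability itself is invented for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def make_stix_vulnerability(name, description, cvss_score, cvss_vector):
    """Build a STIX 2.1-style Vulnerability object as a plain dict."""
    # STIX timestamps are UTC in RFC 3339 format; millisecond precision used here.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "vulnerability",
        "spec_version": "2.1",
        "id": f"vulnerability--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "name": name,
        "description": description,
        # Assumption: custom properties carrying the CVSS assessment.
        "x_cvss_score": cvss_score,
        "x_cvss_vector": cvss_vector,
    }


vuln = make_stix_vulnerability(
    name="Default credentials on field device",  # illustrative, not from the paper
    description="Device ships with a vendor default password that is never changed.",
    cvss_score=9.8,
    cvss_vector="CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H",
)
print(json.dumps(vuln, indent=2))
```

A history of assessments could then be kept by emitting a new version of the object with an updated `modified` timestamp each time the score changes, although how the framework actually records history is not stated in the abstract.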
Nowadays, IoT networks and devices are part of our everyday life, capturing and carrying vast amounts of data. However, the increasing penetration of connected systems and devices implies rising cybersecurity threats, with IoT systems suffering from network attacks. Artificial Intelligence (AI) and Machine Learning take advantage of huge volumes of IoT network logs to enhance IoT cybersecurity. However, these data often need to remain private. Federated Learning (FL) provides a potential solution that enables collaborative training of an attack detection model among a set of federated nodes, while preserving privacy, as data remain local and are never disclosed or processed on central servers. While FL is resilient and resolves, up to a point, data governance and ownership issues, it does not guarantee security and privacy by design. Adversaries could interfere with the communication process, expose network vulnerabilities, and manipulate the training process, thus affecting the performance of the trained model. In this paper, we present a federated learning model which can successfully detect network attacks in IoT systems. Moreover, we evaluate its performance under various settings of differential privacy, used as a privacy-preserving technique, and under various configurations of the participating nodes. We show that the proposed model protects privacy without substantially compromising performance. Our model incurs a limited performance impact, with testing accuracy only about 7% lower than the baseline, while simultaneously guaranteeing security and applicability.
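One common way to combine federated learning with differential privacy is to clip each node's model update and add Gaussian noise before averaging. The sketch below illustrates that general pattern only. It is not the paper's configuration: the abstract does not say where noise is applied or which parameters are used, so the function names, `max_norm`, and `noise_std` are assumptions, and no specific privacy budget is implied.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def private_federated_average(client_updates, max_norm, noise_std, rng):
    """Average clipped client updates, adding Gaussian noise to each summed coordinate."""
    clipped = [clip(u, max_norm) for u in client_updates]
    n = len(clipped)
    dim = len(clipped[0])
    averaged = []
    for d in range(dim):
        total = sum(u[d] for u in clipped)
        noisy_total = total + rng.gauss(0.0, noise_std)  # noise masks any single node
        averaged.append(noisy_total / n)
    return averaged


# Three hypothetical nodes, each sending a two-parameter model update.
updates = [[3.0, 4.0], [0.3, -0.1], [-0.2, 0.5]]
rng = random.Random(0)
global_update = private_federated_average(updates, max_norm=1.0, noise_std=0.1, rng=rng)
```

Raising `noise_std` strengthens the privacy protection but perturbs the aggregated update more, which is the kind of privacy versus accuracy trade-off the abstract reports.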