Monitoring water quality is an important challenge in both developed and developing countries. Remote sensing provides frequent observations with acceptable spatial coverage, which makes it suitable for monitoring water quality remotely. This paper presents a novel automated model for remote water quality monitoring that addresses the problem of insufficient samples and saves the time and cost of sample collection. The proposed model estimates both optical and non-optical water quality parameters from Sentinel-2A data. A bio-inspired hybrid of a Binary Whale Optimization Algorithm (BWOA) and an Artificial Neural Network (ANN), denoted BWOA-ANN, is applied to determine the relationship between reflectance values extracted from Sentinel-2A images and analyzed samples. The novelty of this model lies in addressing two main problems of remote water quality monitoring: poor applicability and low estimation accuracy for non-optical parameters. For the first problem, the proposed model is fully automated and uses the BWOA for band selection, automatically choosing the optimal features (Sentinel-2A bands) for each water quality parameter. The second problem is addressed by automatically detecting the relationship between non-optical parameters, such as total phosphorus, and optical parameters, such as chlorophyll-a. Three datasets covering different locations, seasons, and parameters were selected to test the proposed BWOA-ANN. The experimental results demonstrated good regression, with a mean R² value of 0.916 for optical parameters and 0.890 for non-optical parameters. The proposed model outperformed the ANN, with R² values higher by 40% and 52% for the optical and non-optical parameters, respectively.
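The band-selection idea in the abstract above can be illustrated with a small sketch. It shows only how a continuous search position might be turned into a 0/1 mask over Sentinel-2A bands. The sigmoid (S-shaped) transfer function, the band subset, and the names `BANDS`, `binarize`, and `selected_bands` are assumptions made for illustration and are not taken from the paper. The whale position updates and the ANN that would score each mask are not shown.

```python
import math
import random

# Hypothetical subset of Sentinel-2A bands, used only for illustration.
BANDS = ["B2", "B3", "B4", "B5", "B8", "B11"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 band mask.

    Each bit is set with probability sigmoid(position[i]). This transfer
    function is an assumption, not the paper's stated method.
    """
    return [1 if rng.random() < sigmoid(p) else 0 for p in position]


def selected_bands(mask):
    """Return the names of the bands whose mask bit is 1."""
    return [band for band, bit in zip(BANDS, mask) if bit]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
position = [4.0, -4.0, 0.5, -0.5, 6.0, -6.0]
mask = binarize(position, rng)
print(mask, selected_bands(mask))
```

In a wrapper-style search, each mask would be scored by training the regression model on the selected bands only and measuring its validation error; that scoring step is where the ANN would enter.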
Data growth in recent years has been swift, leading to the emergence of big data science. Distributed File Systems (DFS), such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS), are commonly used to handle big data. A DFS should keep data available and the system reliable in case of failure. To do so, it replicates files in different locations. These replicas consume storage space and other resources. The importance of a file depends on how frequently it is used in the system, so files that are rarely used do not need to be replicated many times. This paper introduces a Dynamic Replication Policy using Machine Learning Clustering (DRPMLC) on HDFS, which uses machine learning to cluster the files into groups and applies a different replication policy to each group. The goal is to reduce storage consumption, improve read and write operation times, and preserve the availability and reliability of HDFS as a High-Performance Distributed Computing (HPDC) system.
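A minimal sketch of the clustering idea described above, assuming files are grouped by a single feature (access count) with a plain 1-D k-means, and that colder clusters receive fewer replicas. The file paths, the access counts, the factors `(1, 2, 3)`, and the function names are hypothetical. The abstract does not state which features or clustering algorithm DRPMLC uses.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain 1-D k-means (k >= 2); centroids start spread over the sorted data."""
    ordered = sorted(values)
    # Initial centroids: evenly spaced picks from the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Keep the old centroid when a group ends up empty.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def replication_plan(access_counts, factors=(1, 2, 3)):
    """Assign a replication factor per file: colder cluster, fewer replicas."""
    centroids = sorted(kmeans_1d(list(access_counts.values()), len(factors)))
    plan = {}
    for path, count in access_counts.items():
        cluster = min(range(len(centroids)),
                      key=lambda c: abs(count - centroids[c]))
        plan[path] = factors[cluster]
    return plan


# Hypothetical access counts per file.
access_counts = {
    "/logs/old.log": 2, "/logs/archive.log": 3,
    "/data/daily.csv": 40, "/data/weekly.csv": 45,
    "/models/hot.bin": 400, "/models/live.bin": 420,
}
plan = replication_plan(access_counts)
for path, factor in plan.items():
    print(path, factor)
```

On a real cluster, a plan like this could be applied per file with the HDFS shell command `hdfs dfs -setrep`; the sketch only computes the factors and does not touch HDFS.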
Diabetes is one of the main healthcare challenges worldwide. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy, and other disorders. Early detection of diabetes is necessary to maintain a healthy life. Social media now offers a new dimension for health care, since patient data shared in real time can be exploited to detect diabetes early. Furthermore, technologies typically associated with digitalization add value in healthcare, including artificial intelligence, data analytics, and stream processing. Therefore, in this research, we propose a real-time system for predicting diabetes from health-based social streaming data to indicate the current status of a patient's health. The proposed system aims to find the machine learning model with the highest accuracy for diabetes prediction. We used three feature selection techniques to select the most relevant features from the dataset, i.e., Recursive Feature Elimination, Univariate Feature Selection, and Feature Importance. We also evaluated and compared four machine learning models with both the selected and the full feature sets, i.e., Random Forest, Support Vector Machine, Decision Tree, and Logistic Regression. The experimental results show that the Random Forest model achieved the highest accuracy among the models, at 84.11%. For online prediction through social media, we applied the proposed system to streaming Twitter data about patients' health. To do so, Kafka and Spark Streaming are integrated into the backend of the proposed system. The Random Forest classifier is then used to predict the patient's current health status in real time.
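Of the three feature selection techniques named above, univariate selection is the simplest to illustrate: each feature is scored on its own against the label and the top-scoring features are kept. The sketch below uses the absolute Pearson correlation as the score, which is a stand-in chosen for brevity; the abstract does not say which scoring function was used. The feature names and values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def top_k_features(columns, labels, k):
    """Rank features by |correlation with the label| and keep the best k."""
    scores = {name: abs(pearson(values, labels))
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: three features, six patients, binary label.
columns = {
    "glucose": [85, 90, 150, 160, 170, 95],
    "bmi": [22, 31, 29, 35, 24, 27],
    "constant": [1, 1, 1, 1, 1, 1],
}
labels = [0, 0, 1, 1, 1, 0]
best = top_k_features(columns, labels, 2)
print(best)
```

Scoring each feature independently is cheap, but it ignores interactions between features, which is one reason it is usually compared against wrapper methods such as Recursive Feature Elimination.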
Real-time monitoring and tracking systems play a critical role in the healthcare field. Wearable medical devices with sensors, mobile applications, and health clouds continuously generate an enormous amount of data, often called streaming big data. Because streaming data arrives at high speed, it is difficult to ingest, process, and analyze such large volumes in real time and to act in real time in case of emergencies. Traditional methods are inadequate and time-consuming for this task. Therefore, there is a significant need for real-time big data stream processing that provides an effective and scalable solution. We propose a new system for online prediction of health status using the Spark Streaming framework. The proposed system applies streaming machine learning models (i.e., streaming linear regression with SGD) to streaming health data events ingested into Spark Streaming through Kafka topics. The experiments were run on historical medical datasets (i.e., a diabetes dataset, a heart disease dataset, and a breast cancer dataset) and on a generated dataset that simulates wearable medical sensors. For the historical datasets, the accuracy improvement ratio was highest on the diabetes dataset, with an accuracy of 81%. For the generated dataset, the online prediction system achieved an accuracy of 98% at a window size of 5 seconds. Beyond this, the experimental results show that the online prediction system can learn online and update the model as new data arrives and as the window size changes.
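The core idea of streaming linear regression with SGD, as mentioned above, is that the model is updated one window of events at a time instead of being retrained from scratch. The following is a minimal pure-Python sketch of that idea under stated assumptions: it does not use Spark or Kafka, the class and function names are hypothetical, and the data is a noise-free synthetic stream rather than any of the datasets in the abstract.

```python
class StreamingLinearRegression:
    """Linear model updated by SGD one window (mini-batch) at a time."""

    def __init__(self, n_features, step_size=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.step_size = step_size

    def predict(self, x):
        return self.bias + sum(w * v for w, v in zip(self.weights, x))

    def update(self, window):
        """Apply one averaged squared-error gradient step over the window."""
        n = len(window)
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        for x, y in window:
            error = self.predict(x) - y
            grad_b += error / n
            for i, v in enumerate(x):
                grad_w[i] += error * v / n
        self.bias -= self.step_size * grad_b
        self.weights = [w - self.step_size * g
                        for w, g in zip(self.weights, grad_w)]


def mse(model, data):
    """Mean squared error of the model over (features, target) pairs."""
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


# Synthetic stream following y = 2*x + 1, split into windows of 5 events.
events = [([i / 10], 2 * (i / 10) + 1) for i in range(20)]
windows = [events[i:i + 5] for i in range(0, len(events), 5)]

model = StreamingLinearRegression(n_features=1)
before = mse(model, events)
for _ in range(200):  # replay the stream to mimic a long-running job
    for window in windows:
        model.update(window)
after = mse(model, events)
print(before, after, model.weights, model.bias)
```

The window size controls the trade-off the abstract alludes to: smaller windows update the model more often on fewer events, while larger windows give smoother gradient estimates at the cost of latency.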