Big data poses challenges across several of its characteristics, such as velocity, volume, value, and veracity. Processing and analyzing big data to acquire quality information that supports accurate medical drug practice is a challenging problem. The quality of the data taxonomy is indicated by three basic elements: meaningfulness, prediction, and decision-making. These elements have been emphasized in previous work on the same big data challenges. Consequently, the proposed approach preserves the quality of medical drug data by using clustering to build a meaningful data lake. It consists of four components. The first component is data collection and pre-processing in the data lake, where profile data is treated as semi-structured data and cleaned. The second component extracts data by enforcing rules on the whole dataset to produce different groups and generates weights based on constraints within the groups. In the third component, data is organized and clustered; this component complies with schema profiling, referring to component two in the data lake. The weights output by component three are the inputs to component four, where K-means clustering is applied to obtain different clusters. Each cluster presents an alternative drug, yielding meaningful drug data that is consistent with component three in the data lake. This paper addresses two main challenges: the first is extracting meaningful data from big data, and the second is using a big data technique with the K-means clustering algorithm. An experimental approach was followed using Food and Drug Administration (FDA) data and symptoms in the R framework. An ANOVA statistical test was carried out to calculate the sum of squared errors, p-value, and F-value in order to evaluate the variance between clusters and the variance within clusters. The results showed the efficiency of the proposed approach.
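The last two steps described in this abstract (K-means over per-record weights, then an ANOVA comparison of between-cluster and within-cluster variance) can be illustrated with a minimal sketch. It is written in Python rather than the R framework the authors used, and it is not the authors' implementation: the weights are made-up numbers, the quantile-based initialization is an assumption chosen to keep the run deterministic, and the p-value is left out because it requires the F distribution.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalar weights; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: deterministic start from evenly spaced quantiles.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared distance.
        labels = [min(range(k), key=lambda j: (v - centroids[j]) ** 2)
                  for v in values]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def anova_f(values, labels):
    """One-way ANOVA over clusters; returns (ss_between, ss_within, F)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_value


# Hypothetical weights, standing in for the output of component three.
weights = [0.10, 0.12, 0.15, 0.50, 0.52, 0.55, 0.90, 0.93, 0.95]
centroids, labels = kmeans_1d(weights, k=3)
ss_between, ss_within, f_value = anova_f(weights, labels)
print(labels, round(f_value, 1))
```

A large F-value means the clusters differ from one another far more than the records inside each cluster do. With SciPy available, the matching p-value could be obtained from `scipy.stats.f.sf(f_value, df_between, df_within)`.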
Various models used for prediction, data mining, and information retrieval work well with traditional databases. With big data, however, prediction must take on a different role: uncovering the hidden structure of the data, based on a stability scale, so that unsupervised drug data can be discovered accurately. In particular, the drug data must be understandable to analysts. Following this approach, the stability of the drug data is established through computational methods: quality measurements, data preprocessing, k-means clustering, and a decision tree. The approach seeks to identify the data along two dimensions (vertical and horizontal), extrapolating, compiling, and interpreting the values of the dataset while considering individual attributes. A comparison with clusters defines the feature set using a balance value: the k-means algorithm determines the k clusters over the set of features based on two values, 0 and 1, which discriminate between dependent and independent class targets and pinpoint the relationship among them. Keywords: Big Data, Discretize, k-means cluster, Stability, Target drug
Velocity and volume are two important factors that affect the accuracy of streaming data during transfer in big data applications. This paper presents an Adaptive Fuzzy Map Approach that Relies on Fireflies Algorithm for Accruing Velocity of Big Data and Decentralized Decision Making. A key advantage of the Firefly algorithm is that it needs fewer iterations than other methods, which reduces execution time. Furthermore, the fuzzy logic system relies on the Firefly algorithm to obtain its inputs. In addition to the Firefly algorithm, a Kalman filter is used to scale the distances of big data datasets, generating output by assigning matches and mismatches. This work used a real dataset to extract variables and values through a fuzzification function so that they can coexist as categorical data. After 10 dependent runs, with parameters chosen to reflect the velocity and volume aspects of big data through the two parameters Goal and Dimension, the meaningful aspect is scaled by reducing the randomness parameter by approximately 1.6%. The other aspect is decision making, which is achieved through exploration and exploitation and is governed by the attraction base and attraction_min parameters. The approach was evaluated by comparing the proposed Adaptive Fuzzy Map Approach with an ANOVA model on variables such as travelled time, road, speed, and distance; the comparison showed a clear improvement by the proposed approach in terms of the accruing velocity of big data.
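To make the roles of the randomness and attraction parameters more concrete, here is a minimal sketch of a generic Firefly optimizer in Python. It is not the paper's Adaptive Fuzzy Map Approach: the fuzzy logic and Kalman filter stages are omitted, the sphere objective is a stand-in, and the names `attraction_base`, `attraction_min`, `absorption`, and `damping` are illustrative choices whose exact semantics in the paper are assumed rather than known.

```python
import math
import random


def sphere(x):
    """Stand-in objective: sum of squares, minimized at the origin."""
    return sum(xi * xi for xi in x)


def firefly_minimize(objective, dim, n_fireflies=15, iterations=60,
                     attraction_base=1.0, attraction_min=0.2,
                     absorption=1.0, randomness=0.25, damping=0.97,
                     bounds=(-5.0, 5.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_x, best_cost = list(swarm[best_i]), cost[best_i]
    history = [best_cost]
    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward it
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    # Attraction decays with distance but never drops below attraction_min.
                    beta = attraction_min + (attraction_base - attraction_min) \
                        * math.exp(-absorption * r2)
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a)
                                    + randomness * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:
                        best_x, best_cost = list(swarm[i]), cost[i]
        randomness *= damping  # shrink the random step: exploration gives way to exploitation
        history.append(best_cost)
    return best_x, best_cost, history


best_x, best_cost, history = firefly_minimize(sphere, dim=2)
print(best_cost, len(history))
```

In this sketch the attraction terms pull fireflies toward better solutions (exploitation), while the randomness term keeps them exploring; damping the randomness over iterations is one common way to shift from the former to the latter.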
Data exists in large volumes in the modern world, and it becomes very useful when decoded correctly to inform decision making aimed at tackling real-world issues. However, when the data is conflicting, obtaining information from it becomes a daunting task. Handling missing data has become a very important task in big data analysis. This paper considers the handling of missing data using the Support Vector Machine (SVM), based on a technique called Correlation-Genetic Algorithm-SVM. The data is subjected to the SVM classification technique after the correlation between attributes has been identified and the genetic algorithm has been applied. Applying the correlation gives a clear view of which attributes are highly correlated within a particular dataset. The results indicate that, compared with SVM alone, the proposed hybrid algorithm produces better outcomes when identification rate and accuracy are considered. The proposed approach is also compared with a neural network in terms of mean identification rate; the results indicate consistent accuracy, which makes the proposed approach the better of the two.
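The correlation step mentioned in this abstract can be sketched as follows, under stated assumptions: Pearson correlation is used (the abstract does not name the measure), missing entries are represented as `None` and skipped pairwise, and the column names, values, and the 0.8 threshold are hypothetical. The genetic algorithm and SVM stages are not shown.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation over the rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def highly_correlated(columns, threshold=0.8):
    """Return (name_a, name_b, r) for attribute pairs with |r| >= threshold."""
    found = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, r))
    return found


# Hypothetical dataset with missing entries marked as None.
columns = {
    "age":   [25, 32, None, 47, 51, 60],
    "dose":  [50, 64, 70, None, 102, 120],
    "noise": [3, 1, 4, 1, 5, 2],
}
strong_pairs = highly_correlated(columns)
print(strong_pairs)
```

Attribute pairs flagged this way are natural candidates for estimating one attribute's missing values from the other before classification.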