Rapid advances in technology across application domains have led to a steady growth in data sizes; healthcare data, for example, is known for its large numbers of variables and samples. Artificial neural networks (ANNs) have proven adaptable and effective in classification, regression, prediction, and function approximation tasks. Regardless of the task, an ANN learns from data by adjusting its edge weights to minimize the error between actual and predicted values. Backpropagation is the most common technique for learning these weights; however, it suffers from slow convergence, which is especially problematic for Big Data. In this paper, we propose a Distributed Genetic Algorithm based ANN learning algorithm to address the challenges of ANN learning on Big Data. The genetic algorithm is a widely used bio-inspired combinatorial optimization method, and it can be parallelized at several stages, making it well suited to a distributed learning process. The proposed model is tested on various datasets to evaluate its feasibility and efficiency. The experimental results show that, beyond a certain data volume, the proposed learning method outperforms traditional methods in both convergence time and accuracy, reducing computational time by almost 80% compared with the traditional model.
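To make the weight-evolution idea concrete, the following is a minimal single-machine sketch, not the authors' implementation: a small fixed-topology feedforward network is encoded as a flat weight vector and evolved with a plain genetic algorithm (tournament selection, uniform crossover, Gaussian mutation) on a toy dataset. The network size, GA parameters, and data are illustrative assumptions; in the distributed setting described above, the fitness-evaluation step is the natural stage to parallelize across data partitions.

```python
# Illustrative sketch: GA-based weight learning for a small feedforward ANN.
# Single-machine version; in a distributed setting the fitness evaluation
# over data partitions would be the parallelized stage.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 2 inputs, binary target (sign-of-product problem).
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

N_HIDDEN = 5
N_WEIGHTS = 2 * N_HIDDEN + N_HIDDEN + N_HIDDEN + 1  # W1, b1, W2, b2

def forward(weights, X):
    """Decode a flat weight vector into a 2-N_HIDDEN-1 network and run it."""
    i = 0
    W1 = weights[i:i + 2 * N_HIDDEN].reshape(2, N_HIDDEN); i += 2 * N_HIDDEN
    b1 = weights[i:i + N_HIDDEN]; i += N_HIDDEN
    W2 = weights[i:i + N_HIDDEN].reshape(N_HIDDEN, 1); i += N_HIDDEN
    b2 = weights[i]
    h = np.tanh(X @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).ravel()

def fitness(weights):
    """Negative mean squared error: higher is better."""
    return -np.mean((forward(weights, X) - y) ** 2)

# Simple generational GA: elitism, tournament selection, uniform crossover,
# Gaussian mutation.  Population size and rates are illustrative choices.
POP, GENS = 60, 200
pop = rng.normal(0, 1, size=(POP, N_WEIGHTS))
for gen in range(GENS):
    scores = np.array([fitness(ind) for ind in pop])   # distributable step
    new_pop = [pop[scores.argmax()].copy()]            # keep the best individual
    while len(new_pop) < POP:
        a, b = rng.integers(0, POP, 2), rng.integers(0, POP, 2)
        p1 = pop[a[np.argmax(scores[a])]]              # tournament winner 1
        p2 = pop[b[np.argmax(scores[b])]]              # tournament winner 2
        mask = rng.random(N_WEIGHTS) < 0.5             # uniform crossover
        child = np.where(mask, p1, p2) + rng.normal(0, 0.1, N_WEIGHTS)  # mutation
        new_pop.append(child)
    pop = np.array(new_pop)

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("training accuracy:", np.mean((forward(best, X) > 0.5) == y))
```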
The features of a dataset play an important role in the construction of a machine learning model. Because big datasets often have a large number of features, they may contain features that are of little relevance to the learning task, making the process more time-consuming and complex. To facilitate learning, it is generally recommended to remove the less significant features. Eliminating irrelevant features and finding an optimal feature set, however, in principle requires an exhaustive search over every feature subset of the data. In this research, we present a distributed fuzzy cognitive map (FCM) based wrapper method for feature selection that extracts the features playing the most significant role in decision making. FCMs are a hybrid computing technique that combines elements of fuzzy logic and cognitive maps. Using Spark’s resilient distributed datasets (RDDs), the proposed model operates in a distributed manner, exploiting fast in-memory processing and efficient iterative computation. The experimental results show that, when the model is applied to a classification task, the features it selects help expedite the classification process, and the relevance of the selected features is on par with existing feature selection algorithms. In conjunction with a random forest classifier, the proposed model achieved an average accuracy above 90%, compared with 85.6% when no feature selection strategy was adopted.
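As a rough illustration of the wrapper idea, the sketch below evaluates candidate feature subsets by the cross-validated accuracy of a random forest, the same classifier family used in the experiments above, but it replaces the FCM-guided, Spark/RDD-distributed search with a simple greedy forward search on a single machine and uses a stand-in scikit-learn dataset. It is a sketch under those assumptions, not the proposed algorithm.

```python
# Wrapper-style feature selection sketch: score feature subsets by the
# cross-validated accuracy of a random forest, adding one feature at a time.
# The search strategy and dataset here are illustrative stand-ins.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset
n_features = X.shape[1]

def subset_score(features):
    """Wrapper criterion: CV accuracy of a random forest on the chosen columns."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X[:, features], y, cv=3).mean()

selected, remaining = [], list(range(n_features))
best_score = 0.0
while remaining:
    # Try adding each remaining feature and keep the one that helps most.
    cand_scores = {f: subset_score(selected + [f]) for f in remaining}
    f_best, s_best = max(cand_scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:          # stop when no feature improves the score
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print("selected features      :", selected)
print("baseline (all features):", subset_score(list(range(n_features))))
print("wrapper subset score   :", best_score)
```

In a distributed implementation, the per-subset evaluations are independent and could be farmed out across RDD partitions, which is what makes the wrapper approach amenable to Spark-style parallelism.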