Recent technological advancements in data acquisition tools have allowed life scientists to acquire multimodal data from different biological application domains. Categorized into three broad types (i.e. images, signals, and sequences), these data are huge in volume and complex in nature. Mining such an enormous amount of data for pattern recognition is a big challenge and requires sophisticated data-intensive machine learning techniques. Artificial neural network-based learning systems are well known for their pattern recognition capabilities, and lately their deep architectures, known as deep learning (DL), have been successfully applied to solve many complex pattern recognition problems. To investigate how DL, and especially its different architectures, has contributed to and been utilized in the mining of biological data pertaining to those three types, a meta-analysis has been performed and the resulting resources have been critically analysed. Focusing on the use of DL to analyse patterns in data from diverse biological domains, this work investigates different DL architectures’ applications to these data. This is followed by an exploration of available open access data sources pertaining to the three data types along with popular open-source DL tools applicable to these data. Also, comparative investigations of these tools from qualitative, quantitative, and benchmarking perspectives are provided. Finally, some open research challenges in using DL to mine biological data are outlined and a number of possible future perspectives are put forward.
Ensemble models achieve high accuracy by combining a number of base estimators and can increase the reliability of machine learning compared to a single estimator. Additionally, an ensemble model enables a machine learning method to deal with imbalanced data, which is considered to be one of the most challenging problems in machine learning. In this paper, the capability of Adaptive Boosting (AdaBoost) is integrated with a Convolutional Neural Network (CNN) to design a new machine learning method, AdaBoost-CNN, which can deal with large imbalanced datasets with high accuracy. AdaBoost is an ensemble method in which a sequence of classifiers is trained. In AdaBoost, each training sample is assigned a weight, and a higher weight is set for a training sample that the previous classifier did not learn correctly (i.e. misclassified). The proposed AdaBoost-CNN is designed to reduce the computational cost of the classical AdaBoost when dealing with large sets of training data, by reducing the number of learning epochs required for its component estimators. AdaBoost-CNN applies transfer learning to sequentially transfer the trained knowledge of a CNN estimator to the next CNN estimator, while updating the weights of the samples in the training set to improve accuracy and to reduce training time. Experimental results revealed that the proposed AdaBoost-CNN achieved 16.98% higher accuracy compared to the classical AdaBoost method on a synthetic imbalanced dataset. Additionally, AdaBoost-CNN reached an accuracy of 94.08% on 10,000 testing samples of the synthetic imbalanced dataset, which is higher than the accuracy of the baseline CNN method, i.e. 92.05%. AdaBoost-CNN is computationally efficient: the training simulation time of the proposed method is 47.33 seconds, which is lower than the 225.83 seconds required for a similar AdaBoost method without transfer learning on the imbalanced dataset.
Moreover, when compared to the baseline CNN, AdaBoost-CNN achieved higher accuracy when applied to five other benchmark datasets including CIFAR-10 and Fashion-MNIST. AdaBoost-CNN was also applied to the EMNIST datasets, to determine its impact on large imbalanced classes, and the results demonstrate the superiority of the proposed method compared to CNN.
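The sample reweighting and sequential knowledge transfer described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a small weighted logistic regression stands in for the CNN, the update is the classical binary (discrete) AdaBoost rule rather than the multi-class variant a CNN ensemble would need, and every function name, epoch count, and learning rate below is hypothetical. The point it shows is that each estimator after the first starts from the previous estimator's parameters and therefore trains for fewer epochs.

```python
import numpy as np


def train_logreg(X, y, sample_w, init=None, epochs=50, lr=0.5):
    """Weighted logistic regression trained by gradient descent.

    `init` warm-starts the parameters (a stand-in for transferring a
    trained CNN's weights to the next estimator).
    """
    n, d = X.shape
    theta = np.zeros(d + 1) if init is None else init.copy()
    Xb = np.hstack([X, np.ones((n, 1))])  # append a bias column
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))
        grad = Xb.T @ (sample_w * (p - y))  # sample_w sums to 1
        theta -= lr * grad
    return theta


def predict(theta, X):
    """Hard 0/1 predictions of a single estimator."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ theta > 0).astype(int)


def adaboost_warm_start(X, y, n_estimators=5, first_epochs=200, later_epochs=20):
    """Binary AdaBoost in which each estimator starts from the previous one."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)  # uniform initial sample weights
    estimators, alphas = [], []
    theta = None
    for m in range(n_estimators):
        # Only the first estimator is trained from scratch for many epochs.
        epochs = first_epochs if m == 0 else later_epochs
        theta = train_logreg(X, y, w, init=theta, epochs=epochs)
        miss = predict(theta, X) != y
        # Weighted error, clipped to keep the logarithm finite.
        err = np.clip(np.sum(w[miss]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # Raise the weight of misclassified samples, lower the others.
        w = w * np.exp(np.where(miss, alpha, -alpha))
        w /= w.sum()
        estimators.append(theta.copy())
        alphas.append(alpha)
    return estimators, alphas, w


def ensemble_predict(estimators, alphas, X):
    """Weighted vote over all estimators, with labels mapped to -1/+1."""
    votes = sum(a * (2 * predict(t, X) - 1) for t, a in zip(estimators, alphas))
    return (votes > 0).astype(int)


def make_imbalanced(n_major=900, n_minor=100, seed=0):
    """Toy two-class data with a 9:1 class imbalance (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(-1.0, 1.0, (n_major, 2)),
                   rng.normal(2.0, 1.0, (n_minor, 2))])
    y = np.concatenate([np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)])
    return X, y


if __name__ == "__main__":
    X, y = make_imbalanced()
    estimators, alphas, w = adaboost_warm_start(X, y)
    acc = np.mean(ensemble_predict(estimators, alphas, X) == y)
    print(f"training accuracy of the ensemble: {acc:.3f}")
```

In this sketch the saving comes from `later_epochs` being much smaller than `first_epochs`; a version without warm starting would have to pass `init=None` and the full epoch budget at every round, which mirrors the training-time comparison reported in the abstract.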