Background: The wealth of gene expression values being generated by high-throughput microarray technologies leads to complex high-dimensional datasets. Moreover, many cohorts have the problem of imbalanced classes, where the number of patients belonging to each class is not the same. With this kind of dataset, biologists need to identify a small number of informative genes that can be used as biomarkers for a disease.
Results: This paper introduces a Balanced Iterative Random Forest (BIRF) algorithm to select the most relevant genes for a disease from imbalanced high-throughput gene expression microarray data. BIRF is applied to four cancer microarray datasets: a childhood leukaemia dataset collected from The Children's Hospital at Westmead, which represents the main target of this paper, the NCI 60 dataset, a Colon dataset and a Lung cancer dataset. The results obtained by BIRF are compared to those of Support Vector Machine-Recursive Feature Elimination (SVM-RFE), Multi-class SVM-RFE (MSVM-RFE), Random Forest (RF) and Naive Bayes (NB) classifiers. BIRF outperforms these state-of-the-art methods, especially in the case of imbalanced datasets. Experiments on the childhood leukaemia dataset show that BIRF achieves 7% to 12% better accuracy than MSVM-RFE, together with the ability to predict patients in the minor class. The informative biomarkers selected by the BIRF algorithm were validated by repeating training experiments three times to see whether they are globally informative or just selected by chance. The results show that 64% of the top genes consistently appear in the three lists, and the top 20 genes remain near the top in the other three lists.
Conclusion: The designed BIRF algorithm is an appropriate choice for selecting genes from imbalanced high-throughput gene expression microarray data. BIRF outperforms the state-of-the-art methods, especially in its ability to handle class-imbalanced data. Moreover, the analysis of the selected genes also provides a way to distinguish between the predictive genes and those that only appear to be predictive.
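The abstract does not spell out how BIRF balances the classes, so the following is only a minimal sketch of one common balancing step, assuming each tree is trained on a bootstrap sample that draws the same number of patients from every class. The function name `balanced_bootstrap` and the 90/10 cohort are hypothetical and not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(labels, rng):
    """Return sample indices with an equal number of draws per class.

    Assumption (not from the paper): every class contributes as many
    draws, with replacement, as the smallest class has members.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_per_class = min(len(members) for members in by_class.values())
    sample = []
    for label in sorted(by_class):
        # Draw with replacement from this class only.
        sample.extend(rng.choices(by_class[label], k=n_per_class))
    rng.shuffle(sample)
    return sample


# Hypothetical imbalanced cohort: 90 patients in class 0, 10 in class 1.
labels = [0] * 90 + [1] * 10
sample = balanced_bootstrap(labels, random.Random(0))
counts = Counter(labels[i] for i in sample)
print(counts)
```

In this sketch both classes end up with 10 draws each, so a tree trained on `sample` sees the minor class as often as the major one.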
Early damage detection is critical for a large set of global ageing infrastructure. Structural Health Monitoring (SHM) systems provide a sensor-based, quantitative and objective approach to continuously monitoring these structures, as opposed to traditional visual inspection by engineers. Analysing the sensed data is one of the major SHM challenges. This paper presents a novel algorithm to detect and assess damage in structures such as bridges. The method applies tensor analysis for data fusion and feature extraction, and then applies a one-class support vector machine to the extracted features to detect anomalies, i.e., structural damage. To evaluate this approach, we collected acceleration data from a sensor-based SHM system, which we deployed on a real bridge and on a laboratory specimen. The results show that our tensor method outperforms a state-of-the-art approach using the wavelet energy spectrum of the measured data. In the specimen case, our approach succeeded in detecting 92.5% of induced damage cases, as opposed to 61.1% for the wavelet-based approach. While our method was applied to bridges, its algorithm and computation can be used on other structures or sensor-data analysis problems that involve large series of correlated data from multiple sensors.
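The one-class idea described above (train only on data from the healthy structure, then flag anything that looks different) can be illustrated with a much simpler stand-in. The sketch below is not a one-class support vector machine and does not perform tensor analysis; it assumes hypothetical 3-dimensional feature vectors and uses a centroid-distance threshold purely to show the train-on-healthy-only workflow.

```python
import math
import random


def fit_one_class(healthy, quantile=0.95):
    """Learn a centroid and a distance threshold from healthy-only data.

    Stand-in for a one-class SVM: the decision boundary here is a
    sphere around the centroid, sized by a quantile of training distances.
    """
    dim = len(healthy[0])
    centroid = [sum(v[d] for v in healthy) / len(healthy) for d in range(dim)]
    distances = sorted(math.dist(v, centroid) for v in healthy)
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]


def is_anomaly(model, vector):
    """Flag a feature vector that lies outside the learned sphere."""
    centroid, threshold = model
    return math.dist(vector, centroid) > threshold


# Hypothetical healthy-state features (synthetic, not measured data).
rng = random.Random(1)
healthy = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
model = fit_one_class(healthy)

shifted = [10.0, 10.0, 10.0]          # far from every healthy sample
print(is_anomaly(model, shifted))     # flagged as an anomaly
print(is_anomaly(model, model[0]))    # the centroid itself is not flagged
```

No damaged-state examples are needed for training, which is the property that makes one-class methods attractive when damage data are scarce.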
Summary: In this paper, we focused on the development and verification of a robust framework for structural condition assessment of real‐life structures using measured vibration responses, in the presence of multiple progressive damages occurring within the inspected structures. A self‐tuning learning method for structural condition assessment was proposed. Damage-sensitive features were extracted using a frequency domain decomposition (FDD) approach to fuse all the measured responses, followed by a random projection algorithm for dimensionality reduction. An automatic parameter selection method called Appropriate Distance to the Enclosing Surface (ADES) was used for tuning the classifier parameter. The effect of operational conditions on the robustness of the proposed method was also investigated, and it was found that applying FDD to extract damage-sensitive features reduces the variation in the results. Promising results in the assessment of damage were obtained based on two comprehensive case studies, which included single and multiple damage scenarios. The contributions of the work are threefold. First, through two comprehensive case studies, we demonstrate that a frequency‐based feature from a single sensor might not be adequate to detect the progress of damage, even if the sensor is in the vicinity of the damage. Second, we show that data fusion using FDD can reliably assess the severity of damage. Finally, we propose a new automated approach for tuning the classifier parameter.
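The dimensionality-reduction step mentioned above can be sketched as a Gaussian random projection. This is a generic illustration, assuming a dense Gaussian matrix and hypothetical sizes (200 input features reduced to 20); the abstract does not state which random projection variant or dimensions were used, and the FDD step is not reproduced here.

```python
import math
import random


def random_projection_matrix(n_in, n_out, rng):
    """Build an n_out x n_in matrix with entries drawn from N(0, 1/n_out)."""
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def project(matrix, vector):
    """Multiply the projection matrix by a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


rng = random.Random(42)
n_in, n_out = 200, 20                      # hypothetical dimensions
matrix = random_projection_matrix(n_in, n_out, rng)

# Two synthetic feature vectors standing in for fused spectral features.
a = [rng.random() for _ in range(n_in)]
b = [rng.random() for _ in range(n_in)]
pa, pb = project(matrix, a), project(matrix, b)

original_distance = math.dist(a, b)
projected_distance = math.dist(pa, pb)
print(len(pa), original_distance, projected_distance)
```

The scaling by `1 / sqrt(n_out)` keeps pairwise distances roughly preserved in expectation, which is why a classifier can often be trained on the shorter vectors instead of the full features.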