Feature selection is an important topic in bioinformatics. Defining informative features from complex, high-dimensional biological data is critical in disease study, drug development, and related areas. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the order in which they are recursively eliminated by an SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature ranking of SVM-RFE. In addition, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. Experiments on eight public biological datasets show that the discriminative ability of a feature subset can be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples than by using the classification accuracy rate alone, and that shielding the samples in the overlapping area makes the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
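The recursive elimination that SVM-RFE uses to rank features can be sketched as follows. This is a minimal illustration of plain SVM-RFE only: the abstract does not define the overlapping ratio, so the OA and M-SVM-RFE-OA steps are not reproduced here. The linear SVM is trained with a simple Pegasos-style subgradient routine chosen as a stand-in, and the function names, hyperparameters, and toy data are all assumptions made for this example.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent for a linear SVM without a bias term.

    Assumption: a stand-in trainer for illustration, not the solver used in the paper.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink the weights (regularization part of the subgradient step).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss part: only samples inside the margin push the weights.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe_ranking(X, y):
    """Return feature indices ordered from most to least important."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Ranking criterion of SVM-RFE: drop the feature with the smallest w_j^2.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last eliminated = most important


# Toy data (hypothetical): feature 0 carries the label, features 1 and 2 are noise.
rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(20)]
X = [[2.0 * label, rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)] for label in y]
ranking = svm_rfe_ranking(X, y)
```

On this toy data the informative feature 0 is eliminated last and therefore heads the ranking. Choosing how many top-ranked features to keep is the step that SVM-RFE-OA addresses with accuracy plus overlapping ratio, and it is left out of this sketch.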
The pseudotargeted metabolomics method integrates the advantages of nontargeted and targeted analysis because it can acquire metabolite data in the multiple reaction monitoring (MRM) mode of mass spectrometry (MS) without needing standards. The key step is collecting ion-pair information from the samples to be analyzed. It is well known that the sequential windowed acquisition of all theoretical fragment ions (SWATH) MS mode can acquire MS2 information to the maximum extent. To expediently acquire as many ion-pairs as possible with optimal collision energy (CE), an ion-pair selection approach based on SWATH MS acquisition with variable isolation windows was developed in this study. Initially, nontargeted acquisition of all metabolite information in plasma Standard Reference Material (SRM 1950) was performed on an ultra high-performance liquid chromatography (UHPLC)-quadrupole time-of-flight (Q-TOF) MS platform with three CEs. With the help of a software tool, the ion-pairs of unique metabolites were obtained. They were then validated in scheduled MRM coupled with UHPLC. After false positives were removed, the ion-pairs with optimal CE were integrated. A total of 1373 unique metabolite ion-pairs were obtained in positive ion mode. The repeatability of the established pseudotargeted approach was evaluated by intraday and interday precision. The results demonstrated that the method was stable, reliable, and suitable for metabolomics studies. As an application example, alterations of serum metabolites in type 2 diabetes were investigated using the established method. This work provides a pseudotargeted ion-pair selection method based on SWATH MS acquisition that is characterized by increased metabolite coverage, suitable CE, and convenient processing.
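The step of keeping each ion-pair at its optimal CE can be sketched as below. The abstract does not state the selection criterion, so this example assumes the optimal CE is the one giving the highest response among the energies tested. The function name, the CE values of 15, 30, and 45 eV, the m/z values, and the intensities are all hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, Tuple

# An ion-pair is keyed by (precursor m/z, product m/z).
IonPair = Tuple[float, float]


def select_optimal_ce(responses: Dict[IonPair, Dict[int, float]]) -> Dict[IonPair, int]:
    """For each ion-pair, return the collision energy (eV) with the highest response.

    Assumption: "optimal" means maximal response; the source does not define it.
    """
    return {pair: max(by_ce, key=by_ce.get) for pair, by_ce in responses.items()}


# Hypothetical responses measured at three collision energies.
responses = {
    (132.1, 86.1): {15: 4.0e4, 30: 9.5e4, 45: 2.1e4},
    (166.1, 120.1): {15: 7.2e4, 30: 6.8e4, 45: 1.0e4},
}
optimal = select_optimal_ce(responses)
```

With these made-up numbers the first ion-pair is kept at 30 eV and the second at 15 eV. A real workflow would also need the false-positive removal described in the abstract, which is not modeled here.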
In a Chinese prospective cohort, 500 patients with new‐onset type 2 diabetes (T2D) within 4.61 years and 500 matched healthy participants are selected as case and control groups, and randomized into discovery and validation sets to discover the metabolite changes before T2D onset and the related diabetogenic loci. A serum metabolomics analysis reveals that 81 metabolites change significantly before T2D onset. Based on binary logistic regression, eight metabolites are defined as a biomarker panel for T2D prediction. Pipecolinic acid, carnitine C14:0, epinephrine, and phosphatidylethanolamine 34:2 are found to be associated with future T2D for the first time. The addition of the biomarker panel to the clinical markers (BMI, triglycerides, and fasting glucose) significantly improves the predictive ability in both the discovery and validation sets. By associating metabolomics with genomics, a significant correlation (p < 5.0 × 10⁻⁸) between eicosatetraenoic acid and the FADS1 (rs174559) gene is observed, and suggestive correlations (p < 5.0 × 10⁻⁶) between pipecolinic acid and CHRM3 (rs535514), and between leucine/isoleucine and WWOX (rs72487966), are discovered. Elevated leucine/isoleucine levels increase the risk of T2D. In conclusion, multiple metabolic dysregulations are observed to occur before T2D onset, and the new biomarker panel can help to predict T2D risk.
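The two p-value cutoffs quoted above (5.0 × 10⁻⁸ for significant and 5.0 × 10⁻⁶ for suggestive correlations) can be applied as in the short sketch below. The thresholds come from the abstract; the function name, the labels it returns, and the example p-values are assumptions for illustration and are not results from the study.

```python
SIGNIFICANT_THRESHOLD = 5.0e-8  # cutoff for a significant correlation (from the abstract)
SUGGESTIVE_THRESHOLD = 5.0e-6   # cutoff for a suggestive correlation (from the abstract)


def classify_association(p_value: float) -> str:
    """Label a metabolite-SNP association by its p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < SIGNIFICANT_THRESHOLD:
        return "significant"
    if p_value < SUGGESTIVE_THRESHOLD:
        return "suggestive"
    return "not significant"


# Hypothetical p-values, not taken from the study.
labels = [classify_association(p) for p in (1.0e-9, 1.0e-7, 0.01)]
```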
Assessment and prediction of prognostic risk in patients with hepatocellular carcinoma (HCC) would greatly benefit the selection of optimal treatment. Here, we aimed to identify the critical metabolites associated with outcomes and to develop a risk score to assess the prognosis of HCC patients after curative resection. A total of 78 serum samples from HCC patients were analyzed by liquid chromatography–mass spectrometry to characterize their metabolic profiles. A novel network-based feature selection method (NFSM) was developed to define the critical metabolites with the greatest capacity to discriminate outcomes. The metabolites defined by NFSM were further reduced by Cox regression analysis to generate a prognostic metabolite panel: phenylalanine and choline. Furthermore, univariate and multivariate Cox regression analyses were applied to combine the metabolite panel with the presence of satellite nodes to generate a global prognostic index (GPI) score for overall survival assessment. Compared with the current clinical classification systems, including the Barcelona-clinic liver cancer stage, tumor–node–metastasis stage, and albumin–bilirubin grade, the GPI score showed comparable performance according to the time-dependent receiver operating characteristic curves, and it was validated in an independent cohort. These results suggest that metabolomics could serve as a helpful tool to stratify HCC prognostic risk after surgery.