Background Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to be alternatively spliced as a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, especially RNA-seq, provides novel insights into large-scale detection and analysis of alternative splicing at the transcriptional level. Advances in Proteomic Technologies such as liquid chromatography coupled tandem mass spectrometry (LC–MS/MS), have shown tremendous power for the parallel characterization of large amount of proteins in biological samples. Although poor correspondence has been generally found from previous qualitative comparative analysis between proteomics and microarray data, significantly higher degrees of correlation have been observed at the level of exon. Combining protein and RNA data by searching LC–MS/MS data against a customized protein database from RNA-Seq may produce a subset of alternatively spliced protein isoform candidates that have higher confidence. Results We developed a bioinformatics workflow to discover alternative splicing biomarkers from LC–MS/MS using RNA-Seq. First, we retrieved high confident, novel alternative splicing biomarkers from the breast cancer RNA-Seq database. Then, we translated these sequences into in silico Isoform Junction Peptides, and created a customized alternative splicing database for MS searching. Lastly, we ran the Open Mass spectrometry Search Algorithm against the customized alternative splicing database with breast cancer plasma proteome. Twenty six alternative splicing biomarker peptides with one single intron event and one exon skipping event were identified. Further interpretation of biological pathways with our Integrated Pathway Analysis Database showed that these 26 peptides are associated with Cancer, Signaling, Metabolism, Regulation, Immune System and Hemostasis pathways, which are consistent with the 256 alternative splicing biomarkers from the RNA-Seq. Conclusions This paper presents a bioinformatics workflow for using RNA-seq data to discover novel alternative splicing biomarkers from the breast cancer proteome. As a complement to synthetic alternative splicing database technique for alternative splicing identification, this method combines the advantages of two platforms: mass spectrometry and next generation sequencing and can help identify potentially highly sample-specific alternative splicing isoform biomarkers at early-stage of cancer.
Accurate detection is still a challenge in machine learning (ML) for Alzheimer’s disease (AD). Class imbalance in imbalanced AD data is another big challenge for machine-learning algorithms working under the assumption that the data are evenly distributed within classes. Here, we present a hyperparameter tuning workflow with high-performance computing (HPC) for imbalanced data related to prevalent mild cognitive impairment (MCI) and AD in the Health and Aging Brain Study-Health Disparities (HABS-HD) project. We applied a single-node multicore parallel mode to hyperparameter tuning of gamma, cost, and class weight using a support vector machine (SVM) model with 10 times repeated fivefold cross-validation. We executed the hyperparameter tuning workflow with R’s bigmemory, foreach, and doParallel packages on Texas Advanced Computing Center (TACC)’s Lonestar6 system. The computational time was dramatically reduced by up to 98.2% for the high-performance SVM hyperparameter tuning model, and the performance of cross-validation was also improved (the positive predictive value and the negative predictive value at base rate 12% were, respectively, 16.42% and 92.72%). Our results show that a single-node multicore parallel structure and high-performance SVM hyperparameter tuning model can deliver efficient and fast computation and achieve outstanding agility, simplicity, and productivity for imbalanced data in AD applications.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.