Lung cancer (LC) is among the most commonly diagnosed cancers worldwide. LC comprises several subtypes, of which lung adenocarcinoma (LUAD) is the most common. Although RNA-seq and microarray experiments generate vast amounts of gene expression data, most of the measured genes are irrelevant to clinical diagnosis. Feature selection (FS) techniques address the high dimensionality and sparsity of such large-scale data. We propose a framework that applies an ensemble of FS techniques to identify genes highly correlated with LUAD. Using LUAD RNA-seq data from The Cancer Genome Atlas (TCGA), we employed mutual information (MI) and recursive feature elimination (RFE) as feature selection techniques, together with a support vector machine (SVM) classification model. We also used random forest (RF) as an embedded FS technique. The results were integrated to identify candidate biomarker genes common to all techniques. The proposed framework identified 12 potential biomarkers that are highly correlated with different LC types, especially LUAD. A predictive model trained on the expression profiles of the identified biomarkers achieved a performance of 97.99%. In addition, differential gene expression analysis showed that all 12 genes were significantly differentially expressed between normal and LUAD tissues and, according to previous reports, are strongly correlated with LUAD. We propose that combining multiple feature selection methods effectively reduces the number of identified biomarkers and directly affects their biological relevance.
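To make the ensemble idea concrete, here is a minimal sketch using scikit-learn, assuming that "integrated" means keeping only genes ranked in the top k by every method. The value of k, the RFE step size, the forest size, and the helper names `top_k` and `ensemble_select` are illustrative choices, not the authors' settings, and synthetic data from `make_classification` stands in for the TCGA expression matrix (real RNA-seq counts would normally be log-transformed and standardized first).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.svm import SVC


def top_k(scores, k):
    """Return the indices of the k highest scores as a set (hypothetical helper)."""
    return set(np.argsort(scores)[::-1][:k].tolist())


def ensemble_select(X, y, k=20, random_state=0):
    """Genes picked by MI, SVM-driven RFE, and RF importance (assumed intersection rule)."""
    # Filter method: mutual information between each gene and the class label.
    mi = mutual_info_classif(X, y, random_state=random_state)
    mi_genes = top_k(mi, k)

    # Wrapper method: RFE ranks genes by the coefficients of a linear-kernel SVM,
    # dropping 10% of the original features per iteration until k remain.
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=k, step=0.1)
    rfe.fit(X, y)
    rfe_genes = set(np.flatnonzero(rfe.support_).tolist())

    # Embedded method: impurity-based importances from a random forest.
    rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    rf.fit(X, y)
    rf_genes = top_k(rf.feature_importances_, k)

    # Integration step (assumption): keep only genes selected by all three methods.
    return sorted(mi_genes & rfe_genes & rf_genes)


if __name__ == "__main__":
    # Synthetic stand-in for an expression matrix: 200 samples x 500 "genes",
    # of which the first 10 columns are informative (shuffle=False).
    X, y = make_classification(n_samples=200, n_features=500, n_informative=10,
                               n_redundant=0, shuffle=False, random_state=0)
    print(ensemble_select(X, y, k=20))
```

Taking the intersection is the strictest way to combine the three rankings, which matches the abstract's observation that multiple methods shrink the candidate list; a voting rule (for example, genes chosen by at least two methods) would be a looser alternative.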
Background: The novel coronavirus (SARS-CoV-2) caused lethal infections worldwide during an unprecedented pandemic. Identifying candidate viral epitopes is the first step in designing vaccines against the infection. Several immunoinformatic approaches have been employed to identify SARS-CoV-2 epitopes that bind specifically to major histocompatibility complex class I (MHC-I) molecules. We used immunoinformatic tools to analyze the whole viral protein sequences and identify the SARS-CoV-2 epitopes that bind to the most frequent human leukocyte antigen (HLA) alleles in the Egyptian population. These alleles are also found at high frequency in other populations worldwide.

Results: Molecular docking showed that using the MHC-I structure co-crystallized with the T cell receptor (TCR), instead of the MHC-I structure alone, significantly enhanced the docking scores and binding affinity of the identified SARS-CoV-2 epitopes and stabilized their conformation. Our approach directly predicts 7 potential vaccine subunits from the available SARS-CoV-2 spike and ORF1ab protein sequences. This prediction has been confirmed by published experimentally validated and in silico predicted spike epitopes. In addition, we predicted novel epitopes (RDLPQGFSA and FCLEASFNY) that show high docking scores and antigenicity with both MHC-I and TCR. Moreover, the antigenicity, allergenicity, toxicity, and physicochemical properties of the predicted SARS-CoV-2 epitopes were evaluated with state-of-the-art bioinformatic approaches, indicating high efficacy of the proposed epitopes as vaccine candidates.

Conclusion: Our predicted SARS-CoV-2 epitopes can facilitate vaccine development to enhance immunogenicity against SARS-CoV-2 and provide supporting data for further experimental validation.
Our proposed molecular docking approach, which exploits both MHC and TCR structures, can be used to identify potential epitopes for most microbial pathogens, provided that a crystal structure of the MHC co-crystallized with the TCR is available.
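The abstract does not name the specific immunoinformatic tools, but two of the underlying steps are simple to illustrate. MHC-I molecules typically present short peptides of roughly 8 to 11 residues, most often 9-mers, so candidate epitopes can be enumerated as overlapping windows over a protein sequence; a physicochemical property such as hydrophobicity can then be scored with the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale. The sketch below is a hypothetical illustration, not the authors' pipeline: the helper names `candidate_peptides` and `gravy` are invented here, and the code performs neither HLA binding prediction nor docking. It is applied to the two novel epitopes reported in the abstract.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_peptides(protein, k=9):
    """Enumerate every overlapping k-mer (candidate MHC-I ligand) in a protein."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def gravy(peptide):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    if not peptide:
        raise ValueError("empty peptide")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


if __name__ == "__main__":
    for epitope in ("RDLPQGFSA", "FCLEASFNY"):
        print(epitope, round(gravy(epitope), 2))
    # RDLPQGFSA -0.66
    # FCLEASFNY 0.51
```

A negative GRAVY indicates a net hydrophilic peptide and a positive value a net hydrophobic one; dedicated packages such as Biopython's ProtParam module report the same descriptor alongside others like molecular weight and isoelectric point.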