Identification of protein–protein interaction associated functions based on gene ontology and KEGG pathway

Yang, Lili; Zhang, Yuhang; Huang, FeiMing; Li, ZhanDong; Huang, Tao; Cai, Yi

doi:10.3389/fgene.2022.1011659

Cited by 8 publications

(8 citation statements)

References 39 publications


“… 11 The information was obtained from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database. 12 , 13 …”

Section: Methods (mentioning)

confidence: 99%

The identification of key genes and pathways in glioblastoma by bioinformatics analysis

Farsi,

Allahyari Fard

2023

Molecular & Cellular Oncology


GBM is the most common and aggressive type of brain tumor. It is classified as a grade IV tumor by the WHO, the highest grade. Prognosis is generally poor, with most patients surviving only about a year. Only 5% of patients survive longer than 5 years. Understanding the molecular mechanisms that drive GBM progression is critical for developing better diagnostic and treatment strategies. Identifying key genes involved in GBM pathogenesis is essential to fully understand the disease and develop targeted therapies. In this study two datasets, GSE108474 and GSE50161, were obtained from the Gene Expression Omnibus (GEO) to compare gene expression between GBM and normal samples. Differentially expressed genes (DEGs) were identified and analyzed. To construct a protein-protein interaction (PPI) network of the commonly up-regulated and down-regulated genes, the STRING 11.5 and Cytoscape 3.9.1 were utilized. Key genes were identified through this network analysis. The GEPIA database was used to confirm the expression levels of these key genes and their association with survival. Functional and pathway enrichment analyses on the DEGs were conducted using the Enrichr server. In total, 698 DEGs were identified, consisting of 377 up-regulated genes and 318 down-regulated genes. Within the PPI network, 11 key up-regulated genes and 13 key down-regulated genes associated with GBM were identified. NOTCH1, TOP2A, CD44, PTPRC, CDK4, HNRNPU, and PDGFRA were found to be important targets for potential drug design against GBM. Additionally, functional enrichment analysis revealed the significant impact of Epstein-Barr virus (EBV), Cell Cycle, and P53 signaling pathways on GBM.



“…The combination of biological sequence analysis and ML models has gained quite a lot of attention among researchers in recent years [8], [9]. As a biological sequence consists of a long string of characters corresponding to either nucleotides or amino acids, it needs to be transformed into a numerical form to make it compatible with the ML model.…”

Section: Related Work (mentioning)

confidence: 99%

“…Moreover, the application of ML approaches for performing biological sequence analysis is a popular research topic these days [8], [9]. The ability of ML methods to determine the sequence's biological functions makes them desirable to be employed for sequence analysis.…”

Section: Introduction (mentioning)

confidence: 99%

“…Additionally, ML models can also determine the relationship between the primary structure of the sequence and its biological functions. Like [8] built a Random Forest-based algorithm to classify sucrose transporter (SUT) protein, [9] designed a novel tool for Protein-protein interactions data and functional analysis, [10] developed a new ML model to identify RNA pseudo-uridine modification sites. ML-based biological sequence analysis approaches can be categorized into feature-engineering-based methods [11], [12], kernel-based methods [13], neural network-based techniques [14], [15], and pre-trained deep learning models [16], [17].…”

Section: Introduction (mentioning)

confidence: 99%


Exploring The Potential Of GANs In Biological Sequence Analysis

Murad,

Ali,

Patterson

2023

Preprint


Biological sequence analysis is an essential step toward building a deeper understanding of the underlying functions, structures, and behaviors of the sequences. It can help in identifying the characteristics of the associated organisms, like viruses, etc., and building prevention mechanisms to eradicate their spread and impact, as viruses are known to cause epidemics which can become pandemics globally. New tools for biological sequence analysis are provided by machine learning (ML) technologies to effectively analyze the functions and structures of the sequences. However, these ML-based methods undergo challenges with data imbalance, generally associated with biological sequence datasets, which hinders their performance. Although various strategies are present to address this issue, like the SMOTE algorithm, which create synthetic data, however, they focus on local information rather than the overall class distribution. In this work, we explore a novel approach to handle the data imbalance issue based on Generative Adversarial Networks (GANs) which use the overall data distribution. GANs are utilized to generate synthetic data that closely resembles the real one, thus this generated data can be employed to enhance the ML models' performance by eradicating the class imbalance problem for biological sequence analysis. We perform 3 distinct classification tasks by using 3 different sequence datasets (Influenza A Virus, PALMdb, VDjDB) and our results illustrate that GANs can improve the overall classification performance.


“…However, the availability of large-size sequence data exceeds the computational limit of such techniques. Moreover, the application of ML approaches for performing biological sequence analysis is a popular research topic these days [ 9 , 10 ]. The ability of ML methods to determine the sequence’s biological functions makes them desirable to be employed for sequence analysis.…”

Section: Introduction (mentioning)

confidence: 99%

Exploring the Potential of GANs in Biological Sequence Analysis

Murad

Ali

Patterson

2023

Biology


Biological sequence analysis is an essential step toward building a deeper understanding of the underlying functions, structures, and behaviors of the sequences. It can help in identifying the characteristics of the associated organisms, such as viruses, etc., and building prevention mechanisms to eradicate their spread and impact, as viruses are known to cause epidemics that can become global pandemics. New tools for biological sequence analysis are provided by machine learning (ML) technologies to effectively analyze the functions and structures of the sequences. However, these ML-based methods undergo challenges with data imbalance, generally associated with biological sequence datasets, which hinders their performance. Although various strategies are present to address this issue, such as the SMOTE algorithm, which creates synthetic data, however, they focus on local information rather than the overall class distribution. In this work, we explore a novel approach to handle the data imbalance issue based on generative adversarial networks (GANs), which use the overall data distribution. GANs are utilized to generate synthetic data that closely resembles real data, thus, these generated data can be employed to enhance the ML models’ performance by eradicating the class imbalance problem for biological sequence analysis. We perform four distinct classification tasks by using four different sequence datasets (Influenza A Virus, PALMdb, VDjDB, Host) and our results illustrate that GANs can improve the overall classification performance.
