Non-alcoholic fatty liver disease (NAFLD) is a chronic liver disease that presents a major challenge for treatment and prevention. This study aims to implement a machine learning approach that uses omics datasets to identify potential biomarker targets. We developed a pipeline to identify potential biomarkers for NAFLD that comprises five major processes: a pre-processing step, feature selection, generation of a random forest model, downstream feature analysis and, finally, provision of a potential biological interpretation. The pre-processing step includes data normalisation and variable extraction with appropriate annotations. Feature selection based on differential gene expression analysis is then conducted to identify significant features, which are used to generate a random forest model whose performance is assessed with a receiver operating characteristic (ROC) curve. Next, the features are subjected to downstream analyses, such as univariate analysis, pathway enrichment analysis, network analysis and the generation of correlation plots, boxplots and heatmaps. Once the results are obtained, biological interpretation and literature validation are carried out on the identified features and results. We applied this pipeline to transcriptomic and lipidomic datasets and concluded that the C4BPA gene could play a role in the development of NAFLD. Activation of the complement pathway due to downregulation of the C4BPA gene leads to an increase in triglyceride content, which might further disrupt lipid metabolism. This approach identified the C4BPA gene, an inhibitor of the complement pathway, as a potential biomarker for the development of NAFLD.
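The feature selection step described above can be sketched in a few lines. The abstract does not state which differential expression test or cut-offs were used, so this is a minimal, standard-library-only sketch assuming that per-gene raw p-values and log2 fold changes have already been produced by some differential expression test. It applies the Benjamini-Hochberg false discovery rate adjustment and keeps genes whose adjusted p-value is below `alpha` and whose absolute log2 fold change is at least `min_abs_log2_fc`. The names `GeneResult`, `benjamini_hochberg` and `select_features`, both thresholds and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fc: float  # log2 fold change, NAFLD vs control (hypothetical input)
    p_value: float  # raw p-value from a differential expression test


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_features(results, alpha=0.05, min_abs_log2_fc=1.0):
    """Keep genes passing both an FDR cut-off and a fold-change cut-off."""
    adjusted = benjamini_hochberg([r.p_value for r in results])
    return [
        r.gene
        for r, q in zip(results, adjusted)
        if q < alpha and abs(r.log2_fc) >= min_abs_log2_fc
    ]


# Hypothetical differential expression output (not data from the study).
de_results = [
    GeneResult("GENE_A", -1.8, 0.0001),
    GeneResult("GENE_B", 0.4, 0.001),
    GeneResult("GENE_C", 2.1, 0.03),
    GeneResult("GENE_D", 1.5, 0.2),
]
selected = select_features(de_results)
print(selected)
```

In a pipeline of this kind, the surviving genes would become the input features of the random forest model. The fold-change filter is a common but optional addition to the FDR cut-off; GENE_B above passes the FDR threshold but is dropped because its fold change is small.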
Background
Liver cancer is the fourth leading cause of cancer-related death globally, and annual deaths from it are estimated to exceed 1 million by 2030. Hepatocellular carcinoma (HCC) accounts for approximately 90% of liver cancer cases and is characterised by tumour-promoting inflammation regardless of its underlying aetiology. However, promising current treatment approaches, such as immunotherapy, are only partially effective in most patients because of the immunosuppressive nature of the tumour microenvironment (TME). There is therefore an urgent need to better understand the TME in HCC and to discover new immune markers that could help overcome resistance to immunotherapy.

Methods
We analysed three microarray datasets using unsupervised and supervised methods in an effort to discover signature genes. First, univariate and multivariate feature selection methods, such as the Boruta algorithm, were applied. Subsequently, an optimisation procedure using the random forest algorithm on three combinations of dataset pairs was performed. The resulting optimal gene sets were then combined and subjected to network analysis and pathway enrichment analysis to assess their biological relevance. The microarray datasets were also analysed with the MCP-counter, CIBERSORT, TIMER, EPIC and quanTIseq deconvolution methods to estimate cell type abundances in each sample. Differences in cell type abundance between the adjacent and tumour sample groups were then assessed with the Wilcoxon rank-sum test (p-value < 0.05).

Results
The optimal gene signature sets derived from each dataset pair combination achieved AUC values ranging from 0.959 to 0.988 on external validation sets with a random forest model. The CLEC1B and PTTG1 genes were retrieved in every optimal set.
Among the signature genes, PTTG1, AURKA and UBE2C were found to be involved in the regulation of mitotic sister chromatid separation and in the anaphase-promoting complex (APC)-dependent catabolic process (adjusted p-value < 0.001). Additionally, the deconvolution algorithms revealed significant changes in the abundances of regulatory T (Treg) cells, M0 and M1 macrophages and CD8+ T cells between adjacent and tumour samples.

Conclusion
We identified the ECM1 gene as a potential immune-related marker acting through immune cell migration and macrophage polarisation. Our results indicate that macrophages, specifically M0 and M1 macrophages, undergo significant changes in the HCC TME. Moreover, our immune deconvolution approach revealed significant infiltration of Treg cells and M0 macrophages, and a significant decrease in CD8+ T cells and M1 macrophages, in tumour samples.
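The Wilcoxon rank-sum comparison of cell type abundances between adjacent and tumour samples can be illustrated with a short standard-library sketch. It is not the authors' code: it assumes the abundance scores for one cell type have already been estimated by a deconvolution method, and it uses the large-sample normal approximation with a tie-corrected variance and no continuity correction, so exact small-sample p-values reported by some statistical packages will differ. The function names and example values are hypothetical. The same U statistic also yields the ROC AUC, AUC = U / (n1 · n2), which connects this test to the AUC values used to evaluate the random forest signatures.

```python
from statistics import NormalDist


def rank_with_ties(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Returns (u_x, p_value), where u_x counts pairs with x > y (ties count 1/2).
    Uses the normal approximation with a tie-corrected variance and no
    continuity correction; undefined if every value is tied.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction term: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - n1 * n2 / 2) / var_u ** 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, p_value


# Hypothetical abundance scores for one cell type (not data from the study).
adjacent = [0.020, 0.030, 0.025, 0.040, 0.015]
tumour = [0.060, 0.080, 0.050, 0.070, 0.090]

u_adj, p_value = wilcoxon_rank_sum(adjacent, tumour)
# AUC for "higher score means tumour": P(tumour > adjacent) = 1 - u_adj / (n1 * n2).
auc = 1 - u_adj / (len(adjacent) * len(tumour))
print(f"U = {u_adj}, p = {p_value:.4f}, significant = {p_value < 0.05}, AUC = {auc}")
```

In a full analysis, this comparison would be repeated for each cell type returned by each deconvolution method, with the p < 0.05 criterion from the Methods applied to each result.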