Key Points

Early and Accurate Diagnosis Essential: Acute pulmonary embolism (PE) is a critical condition that demands prompt and precise diagnosis for effective treatment.
Limitations of Current Diagnostics: Existing diagnostic methods like Computed Tomography Pulmonary Angiography (CTPA) have certain limitations, leading to the exploration of alternative approaches.
Potential of Blood-Based Biomarkers: A recent study focused on identifying blood-based biomarkers for PE. This involved using gene ontology analysis and machine learning methods to analyze gene expression data from both PE patients and healthy controls.
Gene Selection and Analysis: The study selected 20 genes for detailed analysis. These included various coagulation factors, fibrinolytic genes, and inflammation markers. Gene Ontology enrichment analysis was performed to understand the biological processes and molecular functions of these genes.
Machine Learning for Diagnosis: Supervised machine learning algorithms were utilized to create classification models using the expression levels of these 20 genes. The models demonstrated promising results in distinguishing PE patients from healthy individuals.

Acute pulmonary embolism (PE) is a life-threatening condition requiring early and accurate diagnosis. Current diagnostic methods like CTPA have limitations, and a study aimed to identify potential blood-based biomarkers for PE using gene ontology analysis and machine learning methods. Gene expression data of PE patients and healthy controls were obtained from the Gene Expression Omnibus database. A total of 20 genes were selected for further analysis, including coagulation factors F7, F10, F12, fibrinolytic genes PLAT, SERPINE1 and SERPINE2, and inflammation markers SELE, VCAM1 and ICAM. Gene Ontology enrichment analysis was performed to identify biological processes and molecular functions overrepresented among the candidate genes.
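The summary does not say which statistical test the study used for Gene Ontology enrichment. A common choice is a one-sided hypergeometric test, which asks how likely it is to see at least a given number of term-annotated genes in a candidate list by chance. The sketch below illustrates that idea only; the function name and all counts (genome size, annotated genes, hits) are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided enrichment p-value, P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     size of the candidate gene list
    hits:       candidate genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probability mass from `hits` up to the maximum possible overlap.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: a 20-gene list, 5 of which carry a term
# that annotates 200 of 20,000 background genes.
p_value = hypergeom_enrichment_p(population=20000, annotated=200, sample=20, hits=5)
print(f"enrichment p-value: {p_value:.3e}")
```

In practice the p-values for all tested terms would also be corrected for multiple testing, which this sketch leaves out.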
Supervised machine learning algorithms were applied to build classification models using the expression levels of the 20 genes as features. Nested cross-validation was employed to assess model performance. The random forest (RF) model achieved the highest area under the receiver operating characteristic curve (AUC), 0.89, indicating excellent discrimination between PE patients and controls based on the gene expression signature. Validation in larger cohorts is needed before these findings can be translated into a non-invasive diagnostic test for PE.
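To make the evaluation scheme concrete, here is a minimal sketch of nested cross-validation with a random forest in scikit-learn. It is not the study's pipeline: the data are synthetic (60 samples by 20 features standing in for gene expression values), and the fold counts, hyperparameter grid, and random seeds are assumptions chosen only for illustration. The inner loop tunes hyperparameters, and the outer loop scores the tuned model on folds it never saw during tuning.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: 60 samples x 20 genes (assumed sizes).
n_samples, n_genes = 60, 20
y = np.array([0] * 30 + [1] * 30)  # 0 = control, 1 = case
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :5] += 1.0  # shift five features in the case group so there is signal to find

# Inner folds select hyperparameters; outer folds estimate performance.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [2, None]},  # illustrative grid, not from the study
    scoring="roc_auc",
    cv=inner_cv,
)

# Each outer training split runs its own grid search, so tuning never sees the outer test fold.
outer_auc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)
print("per-fold AUC:", np.round(outer_auc, 3))
print("mean AUC:", round(float(outer_auc.mean()), 3))
```

Keeping hyperparameter selection inside the outer training folds is what separates nested cross-validation from a single tuned cross-validation run, whose reported score tends to be optimistic.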