ObjectiveThe cause and mechanism of non-obstructive azoospermia (NOA) is complicated; therefore, an effective therapy strategy is yet to be developed. This study aimed to analyse the pathogenesis of NOA at the molecular biological level and to identify the core regulatory genes, which could be utilised as potential biomarkers.MethodsThree NOA microarray datasets (GSE45885, GSE108886, and GSE145467) were collected from the GEO database and merged into training sets; a further dataset (GSE45887) was then defined as the validation set. Differential gene analysis, consensus cluster analysis, and WGCNA were used to identify preliminary signature genes; then, enrichment analysis was applied to these previously screened signature genes. Next, 4 machine learning algorithms (RF, SVM, GLM, and XGB) were used to detect potential biomarkers that are most closely associated with NOA. Finally, a diagnostic model was constructed from these potential biomarkers and visualised as a nomogram. The differential expression and predictive reliability of the biomarkers were confirmed using the validation set. Furthermore, the competing endogenous RNA network was constructed to identify the regulatory mechanisms of potential biomarkers; further, the CIBERSORT algorithm was used to calculate immune infiltration status among the samples.ResultsA total of 215 differentially expressed genes (DEGs) were identified between NOA and control groups (27 upregulated and 188 downregulated genes). The WGCNA results identified 1123 genes in the MEblue module as target genes that are highly correlated with NOA positivity. The NOA samples were divided into 2 clusters using consensus clustering; further, 1027 genes in the MEblue module, which were screened by WGCNA, were considered to be target genes that are highly correlated with NOA classification. The 129 overlapping genes were then established as signature genes. The XGB algorithm that had the maximum AUC value (AUC=0.946) and the minimum residual value was used to further screen the signature genes. IL20RB, C9orf117, HILS1, PAOX, and DZIP1 were identified as potential NOA biomarkers. This 5 biomarker model had the highest AUC value, of up to 0.982, compared to other single biomarker models; additionally, the results of this biomarker model were verified in the validation set.ConclusionsAs IL20RB, C9orf117, HILS1, PAOX, and DZIP1 have been determined to possess the strongest association with NOA, these five genes could be used as potential therapeutic targets for NOA patients. Furthermore, the model constructed using these five genes, which possessed the highest diagnostic accuracy, may be an effective biomarker model that warrants further experimental validation.