Context. 46,XY, disorders of sexual development (46,XY, DSD) is a congenital genetic disease with complex pathogenesis and diverse clinical manifestations. Existing molecular research has often focused on single-centre sequencing data rather than on predictions based on large-scale data.

Aims. This work aimed to gain a fuller understanding of the pathogenesis of 46,XY, DSD and to summarise its key pathogenic genes.

Methods. First, potential pathogenic genes were identified from public data. Second, bioinformatic analyses, including hub gene analysis, protein–protein interaction (PPI) network analysis and functional enrichment analysis, were used to predict pathogenic genes. Finally, two unrelated families were recruited, and next-generation sequencing and Sanger sequencing of their genomic DNA were performed to verify the hub genes.

Key results. A total of 161 potential pathogenic genes were selected from MGI and PubMed gene sets. The resulting PPI network comprised 144 nodes and 194 edges. MCODE 4, the module with the most significant P-value, was selected from the PPI network. The top 15 hub genes were identified and ranked using Cytoscape. Furthermore, genome sequencing identified three variants in SRD5A2, one of the predicted hub genes.

Conclusions. Our results indicate that the occurrence of 46,XY, DSD is attributable to a variety of genes. Bioinformatic analysis can help predict hub genes and identify the core MCODE network module.

Implications. Bioinformatic predictions may provide a novel perspective for better understanding the pathogenesis of 46,XY, DSD.
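To make the idea of hub gene ranking concrete, the following is a minimal sketch that ranks the nodes of a PPI network by degree (number of distinct interaction partners). The abstract does not state which ranking criterion was applied in Cytoscape, so degree is an assumption here; the function name `rank_hub_genes` and the gene labels in `toy_edges` are hypothetical placeholders, not the study's data.

```python
from collections import defaultdict


def rank_hub_genes(edges, top_n=15):
    """Rank genes by degree in an undirected PPI network.

    edges: iterable of (gene_a, gene_b) pairs (hypothetical input format).
    Returns up to top_n (gene, degree) tuples, highest degree first;
    ties are broken alphabetically so the result is deterministic.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(partners)) for gene, partners in ranked[:top_n]]


# Toy network with placeholder gene labels (assumption, not real interactions).
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
]

top_hubs = rank_hub_genes(toy_edges, top_n=3)
print(top_hubs)
```

Degree is only one of several centrality measures that could be used for this purpose; a different criterion may produce a different ranking on the same network.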