Background
Mining minor QTLs plays an important role in genomic selection, pathway analysis, and trait development in agricultural and biological research. Because most individual loci contribute little to variation in complex traits, identifying minor QTLs with subtle phenotypic effects remains a challenge for traditional statistical methods. In this study, we developed a new framework that combines GWAS with machine learning feature selection to mine minor QTLs.
Results
We studied the soybean branching trait in 2,137 accessions from a soybean (Glycine max) diversity panel, which was genotyped with a 50K SNP chip, yielding 42,080 valid SNPs. First, as a baseline, we conducted a GWAS using GAPIT and identified only one SNP marker significantly associated with soybean branching. We then combined GWAS with feature importance analysis based on Random Forest scores and permutation analysis. Random Forest score analysis identified 36,077 features (SNPs), and permutation analysis identified 2,098 features (SNPs). In total, 1,770 features (SNPs) were confirmed by both the Random Forest score analysis and the permutation analysis. Based on these results, 328 genes related to branching development were identified. GO (gene ontology) term enrichment analysis was then applied to these 328 genes, and their genomic locations and expression data are provided.
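The idea of keeping only SNPs confirmed by two importance analyses can be illustrated with a minimal, standard-library Python sketch. It is not the pipeline used in this study. The toy genotype matrix, the fixed linear `predict` function (a stand-in for a fitted Random Forest), the `rf_scores` values, and the selection threshold of 0 are all hypothetical assumptions made for illustration only.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when a single feature column is shuffled.

    Model-agnostic: `predict` is any callable mapping one row to a prediction.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the matrix with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def select(scores, threshold=0.0):
    """Indices of features whose score exceeds the (assumed) threshold."""
    return {j for j, s in enumerate(scores) if s > threshold}


# Toy data: 50 accessions x 4 SNPs coded 0/1/2 (hypothetical, not study data).
rng = random.Random(42)
X = [[rng.randint(0, 2) for _ in range(4)] for _ in range(50)]


def predict(row):
    # Stand-in for a trained model: only SNP 0 and SNP 2 affect the trait.
    return 1.0 * row[0] + 0.5 * row[2]


y = [predict(row) for row in X]

# Hypothetical impurity-based scores, as a Random Forest might report them.
rf_scores = [0.40, 0.05, 0.30, 0.00]

by_rf_score = select(rf_scores)
by_permutation = select(permutation_importance(predict, X, y))

# Keep only the SNPs confirmed by both analyses.
confirmed = by_rf_score & by_permutation
print(sorted(confirmed))
```

In this toy setup, SNP 1 passes the score-based filter but is dropped because shuffling it does not change the prediction error, which mirrors how the intersection step narrows a permissive candidate list.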
Conclusions
Combining GWAS with machine learning feature selection provides a new analysis framework with considerable power to identify minor QTLs. Mining minor QTLs helps to elucidate the biological activities that lie between genotype and phenotype in terms of causal networks of interacting genes. This research also provides an integrative approach for effective genomic selection in plant breeding and helps broaden the scope of molecular breeding in plants.