Genomic selection (GS) emphasizes the simultaneous prediction of the genetic effects of thousands of markers scattered over the genome. Several statistical methodologies have been used in GS to predict genetic merit. In general, such methodologies require certain assumptions about the data, such as normality of the distribution of phenotypic values. To circumvent non-normality of phenotypic values, the literature suggests the use of Bayesian Generalized Linear Regression (GBLASSO). Another alternative is models based on machine learning, such as Artificial Neural Networks (ANN) and Decision Trees (DT), together with DT refinements such as Bagging, Random Forest and Boosting. This study aimed to use DT and its refinements to predict resistance to orange rust in Arabica coffee. Additionally, DT and its refinements were used to identify the importance of markers related to the trait of interest. The results were compared with those from GBLASSO and ANN. Data on coffee rust resistance of 245 Arabica coffee plants genotyped for 137 markers were used. The DT refinements presented Apparent Error Rate values equal to or lower than those obtained by DT, GBLASSO, and ANN. Moreover, DT refinements were able to identify important markers for the trait of interest. Of the 14 most important markers analyzed in each methodology, 9.3 on average were located in regions of quantitative trait loci (QTLs) related to disease resistance reported in the literature.
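The idea of ranking markers with a bagged tree ensemble can be illustrated with a minimal sketch. It is not the study's implementation: it assumes binary (0/1) markers and a binary resistance label, uses depth-one trees (stumps) instead of full decision trees, and scores a marker by the share of bootstrap samples in which it gives the best split. All function names and the simulated data are hypothetical.

```python
import random
from collections import Counter


def stump_errors(marker, labels):
    """Fewest misclassifications achievable by a one-split rule on a 0/1 marker."""
    agree = sum(1 for m, y in zip(marker, labels) if m == y)
    # Either predict y = marker (errors = n - agree) or y = 1 - marker (errors = agree).
    return min(agree, len(labels) - agree)


def best_marker(genotypes, labels):
    """Index of the marker whose stump misclassifies the fewest plants."""
    n_markers = len(genotypes[0])
    errors = [
        stump_errors([row[j] for row in genotypes], labels)
        for j in range(n_markers)
    ]
    return min(range(n_markers), key=errors.__getitem__)


def bagged_importance(genotypes, labels, n_bags=200, seed=0):
    """Share of bootstrap samples in which each marker is the best split."""
    rng = random.Random(seed)
    n = len(labels)
    picks = Counter()
    for _ in range(n_bags):
        # Bootstrap: draw n plants with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_x = [genotypes[i] for i in idx]
        sample_y = [labels[i] for i in idx]
        picks[best_marker(sample_x, sample_y)] += 1
    return [picks[j] / n_bags for j in range(len(genotypes[0]))]


# Hypothetical data: 60 plants, 5 markers; the label follows marker 2,
# except for every tenth plant, whose label is flipped to act as noise.
data_rng = random.Random(1)
genotypes = [[data_rng.randint(0, 1) for _ in range(5)] for _ in range(60)]
labels = [row[2] if i % 10 else 1 - row[2] for i, row in enumerate(genotypes)]

importance = bagged_importance(genotypes, labels)
ranking = sorted(range(5), key=importance.__getitem__, reverse=True)
print("importance per marker:", importance)
print("markers ranked by importance:", ranking)
```

Library implementations of Bagging, Random Forest and Boosting typically grow deeper trees and derive importance from impurity reduction or permutation, so the scores here are only a rough analogue.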
Many methodologies are used to predict genetic merit in animals and plants, but some of them require a priori assumptions that may increase the complexity of the model. Artificial neural networks (ANNs) have the advantage of not requiring a priori assumptions about the relationships between inputs and the output, allowing great flexibility to handle different types of complex non-additive effects, such as dominance and epistasis. Despite this advantage, the biological interpretability of ANNs is still limited. The aim of this research was to estimate the heritability and marker effects for two traits in Coffea canephora using an additive-dominance architecture ANN and to compare it with genomic best linear unbiased prediction (GBLUP). The data consist of 51 clones of the C. canephora varietal group Conilon, 32 of the varietal group Robusta and 82 intervarietal hybrids. In total, these 165 phenotyped individuals were genotyped for 14,387 SNPs. Due to the high computational cost of ANNs, we used Bagging decision trees to reduce the dimensionality of the data, selecting the markers that accumulated 70% of the total importance. ANNs with three hidden layers were run, with each layer varying from 1 to 40 neurons, for a total of 64,000 neural networks. The network architectures with the best predictive ability were selected. The best architectures were composed of 4, 15, and 33 neurons in the first, second and third hidden layers, respectively, for yield, and of 13, 20, and 24 neurons, respectively, for rust resistance. The predictive ability was greater when using the ANN with three hidden layers than when using one hidden layer or GBLUP, with values of 0.72 and 0.88 for yield and coffee leaf rust resistance, respectively. The concordance rate (CR) of the 10% largest marker effects among the methods varied between 10% and 13.8% for additive effects and between 5.4% and 11.9% for dominance effects.
The narrow-sense (h²a) and dominance-only (h²d) heritability estimates were 0.25 and 0.06, respectively, for yield, and 0.67 and 0.03, respectively, for rust resistance. The ANN was able to estimate the heritabilities from an additive-dominance genomic architecture, and the ANN with three hidden layers achieved better predictive ability than GBLUP and the ANN with one hidden layer.
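Three of the bookkeeping steps described in this abstract can be sketched in a few lines: keeping the markers that accumulate 70% of total importance, counting the three-layer architecture grid, and computing a concordance rate between the largest marker effects of two methods. This is an illustrative sketch under assumptions. The function names and toy numbers are hypothetical, and the concordance rate is assumed here to be the overlap between the two top sets divided by the size of one set, which the abstract does not spell out.

```python
from itertools import product


def select_by_cumulative_importance(importances, threshold=0.70):
    """Marker indices, most important first, until their summed
    importance reaches `threshold` of the total."""
    total = sum(importances)
    order = sorted(range(len(importances)),
                   key=lambda j: importances[j], reverse=True)
    selected, running = [], 0.0
    for j in order:
        selected.append(j)
        running += importances[j]
        if running >= threshold * total:
            break
    return selected


def top_fraction(effects, fraction=0.10):
    """Indices of the markers with the largest absolute effects."""
    k = max(1, round(fraction * len(effects)))
    order = sorted(range(len(effects)),
                   key=lambda j: abs(effects[j]), reverse=True)
    return set(order[:k])


def concordance_rate(effects_a, effects_b, fraction=0.10):
    """Share of top markers of method A that are also top markers of method B."""
    top_a = top_fraction(effects_a, fraction)
    top_b = top_fraction(effects_b, fraction)
    return len(top_a & top_b) / len(top_a)


# Hypothetical importances for five markers (arbitrary units).
importances = [7, 50, 3, 25, 15]
kept = select_by_cumulative_importance(importances)  # markers 1 and 3 reach 75%

# Every combination of 1..40 neurons in each of three hidden layers.
n_architectures = sum(1 for _ in product(range(1, 41), repeat=3))

# Hypothetical marker effects from two methods; the top 20% is two markers each.
effects_a = [0.9, -0.8, 0.1, 0.0, 0.2, 0.05, -0.3, 0.1, 0.0, 0.4]
effects_b = [0.7, 0.1, -0.9, 0.0, 0.2, 0.05, -0.3, 0.1, 0.0, 0.4]
cr = concordance_rate(effects_a, effects_b, fraction=0.20)

print(kept, n_architectures, cr)
```

The grid count reproduces the 64,000 networks mentioned in the abstract (40 × 40 × 40), which also shows why reducing the number of input markers first matters for computational cost.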
Genetic diversity analysis has guided the choice of appropriate parents in breeding programs. Multivariate statistical methods such as discriminant analysis are used to obtain the necessary results in these studies. However, to obtain reliable results, assumptions such as covariance matrix homogeneity and multivariate normality of the observation vector must be met. Artificial Neural Networks (ANN), Support Vector Machines (SVM), Decision Trees (DT) and their refinements do not rely on these assumptions and may be used in the choice of appropriate parents. This study evaluates the robustness of Fisher's discriminant function under covariance matrix heterogeneity and multivariate non-normal random vectors. The results were compared with those obtained from Quadratic Discriminant Analysis (QDA), ANN, SVM and DT. Scenarios characterized by heterogeneous covariance matrices and multivariate non-normal random vectors were simulated. Considering the apparent error rate (APER), the SVM (APER-Normal = 0.07; APER-Poisson = 0.13) and QDA (APER-Normal = 0.09; APER-Poisson = 0.09) presented better results for scenarios simulated with heterogeneous covariance matrices. For scenarios with multivariate normality and homogeneous covariance matrices, the SVM (APER = 0.15) and ANN (APER = 0.06) presented the best results. For situations in which the data had a multivariate Poisson distribution and homogeneous covariance matrices, the SVM (APER = 0.15), Fisher's discriminant function (APER = 0.19) and ANN (APER = 0.19) presented better performances. Finally, the DT refinements (Bagging, Random Forest and Boosting) presented APER values below 0.25 and are shown to be alternatives.
Additional Keywords: quadratic discriminant function; multivariate analysis; simulation.
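The apparent error rate used to compare the classifiers is the proportion of observations misclassified when the fitted rule is applied back to the data used to build it. A minimal sketch follows, with hypothetical group labels and predictions rather than data from the study.

```python
from collections import Counter


def apparent_error_rate(observed, predicted):
    """Proportion of observations whose predicted group differs from
    the observed group (resubstitution error)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    wrong = sum(1 for o, p in zip(observed, predicted) if o != p)
    return wrong / len(observed)


def confusion_counts(observed, predicted):
    """Counts of (observed group, predicted group) pairs."""
    return Counter(zip(observed, predicted))


# Hypothetical example: ten individuals from three groups, two misclassified.
observed = ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"]
predicted = ["A", "A", "B", "B", "B", "B", "B", "C", "A", "C"]

aper = apparent_error_rate(observed, predicted)
counts = confusion_counts(observed, predicted)
print(f"APER = {aper:.2f}")
print(counts)
```

Because the same data are used for fitting and for evaluation, APER tends to be optimistic, which is worth keeping in mind when comparing flexible methods with simpler ones.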
The aim of this study was to use fuzzy logic as an auxiliary tool in the assessment of adaptability and stability, using grain-yield data from flood-irrigated rice evaluated in different agricultural years. Eighteen rice genotypes belonging to a flood-irrigated rice breeding programme were evaluated over four agricultural years, 2012/2013 to 2015/2016, totalling 12 environments (3 sites × 4 years). The methodologies of Eberhart and Russell (1966), Lin and Binns (1988) modified by Carneiro (1998), and the Centroid method were used. Fuzzy logic was applied to the results of these methodologies as a tool for interpretation and decision-making regarding the recommendation of genotypes. The performance of the different flood-irrigated rice genotypes was influenced by environmental conditions, thereby justifying the use of multiple adaptability and stability methodologies. Fuzzy logic is a useful and promising tool for selecting flood-irrigated rice genotypes in breeding programmes, allowing information from different parameters to be combined to understand the influence of environmental variation on the performance of crop genotypes.
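As an illustration of one of the stability statistics named above, the sketch below computes the original Lin and Binns (1988) superiority index, P_i = Σ_j (X_ij − M_j)² / (2n), where X_ij is the mean yield of genotype i in environment j, M_j is the highest yield observed in environment j, and n is the number of environments. It is not the modified version used in the study and does not include the fuzzy-logic step. The genotype names and yields are hypothetical.

```python
def lin_binns_pi(yields):
    """Lin and Binns superiority index per genotype.

    `yields` maps each genotype to a list of mean yields, one per
    environment, in the same environment order for every genotype.
    Lower values mean the genotype stays closer to the best yield
    observed in each environment.
    """
    genotypes = list(yields)
    n_env = len(yields[genotypes[0]])
    if any(len(yields[g]) != n_env for g in genotypes):
        raise ValueError("every genotype needs one yield per environment")
    # Maximum response in each environment across all genotypes.
    env_max = [max(yields[g][j] for g in genotypes) for j in range(n_env)]
    return {
        g: sum((yields[g][j] - env_max[j]) ** 2 for j in range(n_env)) / (2 * n_env)
        for g in genotypes
    }


# Hypothetical grain yields (t/ha) for three genotypes in three environments.
yields = {
    "G1": [5.0, 6.0, 7.0],
    "G2": [4.0, 6.0, 5.0],
    "G3": [3.0, 5.0, 6.0],
}

pi = lin_binns_pi(yields)
best = min(pi, key=pi.get)
print(pi, "best:", best)
```

In this toy data G1 has the highest yield in every environment, so its index is zero and it ranks first.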