Empirical patterns of linkage disequilibrium (LD) can be used to increase the statistical power of genetic mapping. The objective of this study was to verify the efficacy of factor analysis (FA) applied to single nucleotide polymorphism (SNP) marker data sets in order to identify linkage groups and haplotype blocks. The SNP data set was obtained by simulating an F2 population and contained 2000 markers with information on 500 individuals. The factor loadings were estimated in two ways: from the matrix of distances between markers (A) and from the correlation matrix (R). The number of factors (k) was chosen based on the scree plot and based on the proportion of total variance explained. Matrices A and R led to similar results. Based on the scree plot we set k equal to 10, and the factors were interpreted as representing the linkage groups. The second criterion led to 50 factors, which were interpreted as representing the haplotype blocks. This showed the potential of the technique, making it possible to obtain results applicable to any type of population and helping to interpret or corroborate genomic studies. The study demonstrated that FA was able to identify patterns of association between markers, identifying subgroups of markers that reflect linkage groups as well as linkage disequilibrium groups.
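As a rough illustration of the idea in this abstract, the sketch below extracts factor loadings from a marker correlation matrix R, picks k from the cumulative proportion of variance explained, and assigns each marker to the factor on which it loads most heavily. The principal-component extraction, the lack of rotation, the 0.8 threshold, the function names, and the toy matrix are all assumptions made for illustration; the abstract does not state which estimation method the authors used.

```python
import numpy as np

def factor_loadings(R, threshold=0.8):
    """Principal-component extraction of loadings from a correlation matrix.

    k is the smallest number of factors whose eigenvalues account for at
    least `threshold` of the total variance (an assumed cut-off).
    """
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_prop = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum_prop, threshold) + 1)
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    return loadings, k

def assign_markers(loadings):
    """Assign each marker to the factor with its largest absolute loading."""
    return np.abs(loadings).argmax(axis=1)

# Toy correlation matrix: markers 0-2 form one block, markers 3-4 another.
R = np.eye(5)
R[:3, :3] = 0.8
R[3:, 3:] = 0.8
np.fill_diagonal(R, 1.0)

loadings, k = factor_loadings(R, threshold=0.8)
groups = assign_markers(loadings)
```

With real genotype data, R could be obtained with `np.corrcoef` on a markers-by-individuals matrix, and a rotation step (for example varimax) would likely be needed before the loadings separate into clean groups.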
The concept of genomic selection is based on linkage disequilibrium (LD) between quantitative trait loci (QTLs) and markers. A genetic variant that affects how the phenotype is expressed produces multiple statistical associations at markers that are close to it in terms of linkage or of disequilibrium, and these associations may or may not be causal. Thus, when predictive models are built, it is generally not known which SNPs are in fact causally associated with the phenotype of interest, so the model is built using all the genotypic information. To increase the accuracy of prediction models, different marker selection approaches have been proposed. Strategies used for this purpose include selecting SNPs previously reported in association studies for the trait of interest; estimating the significance of the SNPs in the data set for each trait using a predictive model and the marker effects estimated by that model; and selecting subsets of markers evenly spaced along the genome. Of these approaches, evenly spaced selection along the genome is the most versatile, since a low-density panel built this way can be used to predict breeding values for any trait, unlike the other approaches. However, this selection risks completely excluding haplotype blocks in LD that are related to the phenotype of interest. This work was developed with the aim of proposing an approach that selects spaced markers within haplotype blocks built using Factor Analysis (FA).
Using simulated data, we show that Factor Analysis can be used to build haplotype blocks, as it is able to summarize the linear relationship between markers and create common factors that can be interpreted as LD blocks. We then applied this approach to a soybean data set containing 41985 SNP markers with information on 20087 soybean accessions to build the blocks, and performed spaced selection within the blocks formed by FA. Three SNP panels were considered, containing 1%, 5% and 100% of the markers. To evaluate the success of this approach, we considered the accuracy in a task of predicting the phenotypic value of individuals using the reduced panels and the full panel. The results show that using the reduced panels gives no significant difference in selective accuracy compared with the accuracy obtained using the full panel, and for one of the traits evaluated no significant difference in predictive accuracy was found either. Keywords: SNP. GWS. Marker Selection. Factor Analysis. Soybean. Machine Learning. Haplotype Blocks.
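The spaced selection within blocks described above can be sketched as follows. This is a minimal illustration, assuming markers are listed in genome order, that spacing is by marker index rather than physical position, and that every block keeps at least one marker; the function name `select_within_blocks` and the toy block labels are hypothetical, and the abstract does not describe the authors' exact rule.

```python
import numpy as np

def select_within_blocks(block_ids, fraction):
    """Pick roughly `fraction` of the markers, evenly spaced inside each block.

    block_ids: block label of each marker, markers given in genome order.
    Every block keeps at least one marker, so no block is dropped entirely.
    """
    block_ids = np.asarray(block_ids)
    selected = []
    for block in np.unique(block_ids):
        idx = np.flatnonzero(block_ids == block)      # markers in this block
        n = idx.size
        n_keep = min(n, max(1, int(round(fraction * n))))
        # Centres of n_keep equal-width segments of the block.
        pos = ((np.arange(n_keep) + 0.5) * n / n_keep).astype(int)
        selected.extend(idx[pos].tolist())
    return sorted(selected)

# Toy example: three blocks of 10, 4 and 1 markers, keeping about 20%.
block_ids = [0] * 10 + [1] * 4 + [2] * 1
panel = select_within_blocks(block_ids, fraction=0.2)
```

Because of the one-marker-per-block floor, the realized panel can be somewhat larger than the nominal fraction when there are many small blocks.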
The biggest challenge in alfalfa breeding programs is to obtain cultivars with high persistence, high productivity, and adaptability. Studies of selection methods are therefore necessary for the success of these programs. This study aimed to evaluate dry matter yield and persistence in alfalfa for selecting genotypes, using statistical models appropriate for experiments with repeated measures. The experiment was conducted at Embrapa Southeast Livestock, in São Carlos, state of São Paulo, Brazil, in a randomized block design with plots subdivided in time and three replicates. Eight genotypes were evaluated, and the agronomic trait evaluated was dry matter yield. Split-plot analyses with two and three error terms were used, as well as generalized linear models with the following correlation structures: compound symmetry (CS), heterogeneous compound symmetry (HCS), autoregressive (AR), heterogeneous autoregressive (HAR), and variance components (VC). The best model was selected according to the lowest value of the Akaike Information Criterion (AIC), and three methodologies were used to identify the genotype with the greatest productivity and persistence: a mean test for multiple comparisons, adaptability and stability by multi-information, and similarity between genotype and ideotype. The interaction between genotypes and cuts was significant, showing that the alfalfa genotypes behaved differently over the cuts. The different methodologies made it possible to measure the average yield of the alfalfa genotypes and their persistence over the cuts. Genotype PSB 4 showed promising behavior in terms of productivity and persistence throughout the production cycle of alfalfa.
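To make the AIC-based choice between correlation structures concrete, here is a small sketch that compares the VC, CS and AR structures on simulated repeated-measures residuals. It is only an illustration under several assumptions: zero-mean Gaussian residuals, a common variance (the heterogeneous variants HCS and HAR are left out), a grid search in place of a proper optimizer, and simulated data rather than the study's yields. All function names are hypothetical.

```python
import numpy as np

def cs_corr(rho, t):
    """Compound symmetry: the same correlation rho between any two cuts."""
    return np.full((t, t), rho) + (1.0 - rho) * np.eye(t)

def ar1_corr(rho, t):
    """First-order autoregressive: correlation rho**|i - j| between cuts."""
    lags = np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
    return rho ** lags

def profile_loglik(resid, corr):
    """Gaussian log-likelihood with the variance replaced by its ML estimate."""
    n, t = resid.shape
    quad = np.einsum("it,ts,is->", resid, np.linalg.inv(corr), resid)
    sigma2 = quad / (n * t)
    _, logdet = np.linalg.slogdet(corr)
    return -0.5 * (n * t * (np.log(2 * np.pi * sigma2) + 1) + n * logdet)

def aic(resid, corr_fn, grid, n_params):
    """AIC = 2 * n_params - 2 * max log-likelihood over the rho grid."""
    t = resid.shape[1]
    best = max(profile_loglik(resid, corr_fn(rho, t)) for rho in grid)
    return 2 * n_params - 2 * best

# Simulated residuals (NOT real data): 200 units, 6 cuts, AR(1) with rho = 0.7.
rng = np.random.default_rng(0)
n, t, true_rho = 200, 6, 0.7
resid = np.zeros((n, t))
resid[:, 0] = rng.standard_normal(n)
for j in range(1, t):
    noise = rng.standard_normal(n)
    resid[:, j] = true_rho * resid[:, j - 1] + np.sqrt(1 - true_rho**2) * noise

grid = np.linspace(0.01, 0.95, 95)
aics = {
    "VC": aic(resid, lambda rho, t: np.eye(t), [0.0], n_params=1),
    "CS": aic(resid, cs_corr, grid, n_params=2),
    "AR": aic(resid, ar1_corr, grid, n_params=2),
}
best = min(aics, key=aics.get)   # structure with the lowest AIC
```

In practice the residuals would come from a fitted mixed model, and dedicated mixed-model software would estimate the covariance parameters directly instead of scanning a grid.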
Image segmentation is fundamental to the phenotyping of plant images. Supervised algorithms have been used to segment pixels into soil and plant. Recent research has used the K-means algorithm to evaluate the segmentation of agronomic images in different crops with different databases. The algorithm has shown good performance in the pixel clustering process despite not being able to classify the pixels directly. The present research proposes the use of the K-means algorithm for image segmentation and pixel classification in sugarcane images. A total of 37,430 pixel samples of soil and vegetation were manually extracted from some of the images. This information was used to train and evaluate supervised models. The model with the best performance was considered the "standard" model. A rule was proposed that can serve as empirical support for interpreting the clusters formed by K-means by assigning a label to each pixel. K-means was then used to segment all images and classify the pixels. The vegetation index was used as the feature, and the classification given by the standard model was used as the true label. Recall, F1 score, precision, and accuracy were used as performance measures for K-means, and the masks produced by each approach were used to compare the final results of the two approaches, highlighting the vegetation. K-means provided better-defined edges than Logistic Regression (the standard model) and clearly distinguished the soil visible between the leaves, with precision ranging from 0.77 to 0.92. These results showed the importance of the vegetation index to the clustering process and showed that K-means, combined with a cluster interpretation rule, could be used to classify pixels in images.
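The pipeline described above (a vegetation index as the feature, K-means clustering, then a rule that labels the clusters) can be sketched roughly as below. Several pieces are assumptions made for illustration: the abstract does not name the vegetation index, so Excess Green (2g - r - b on normalized channels) is used as a stand-in; the labeling rule here simply calls the cluster with the higher mean index "vegetation"; and K-means is a small hand-written one-dimensional version rather than the authors' implementation.

```python
import numpy as np

def excess_green(rgb):
    """Excess Green index, 2g - r - b, on channel-normalized RGB values."""
    rgb = rgb.astype(float)
    total = rgb.sum(axis=-1)
    total[total == 0] = 1.0                      # avoid division by zero
    r, g, b = (rgb[..., i] / total for i in range(3))
    return 2 * g - r - b

def kmeans_1d(x, k=2, n_iter=50):
    """Plain Lloyd iterations on a 1-D feature with a quantile-based start."""
    centers = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return labels, centers

def segment(rgb):
    """Boolean vegetation mask; assumed rule: greener cluster is vegetation."""
    index = excess_green(rgb).ravel()
    labels, centers = kmeans_1d(index, k=2)
    return (labels == centers.argmax()).reshape(rgb.shape[:2])

def scores(pred, truth):
    """Precision, recall, F1 and accuracy for boolean masks."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / pred.size
    return precision, recall, f1, accuracy

# Toy 2x2 image: left column green (plant-like), right column brown (soil-like).
image = np.array([[[20, 200, 20], [150, 100, 60]],
                  [[20, 200, 20], [150, 100, 60]]], dtype=np.uint8)
mask = segment(image)
reference = np.array([[True, False], [True, False]])   # stand-in for the standard model
precision, recall, f1, accuracy = scores(mask, reference)
```

In the study the reference labels come from the supervised standard model rather than from a hand-written array, so the scores measure agreement between the two approaches.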