Mexico, as the center of origin of avocado (Persea americana Mill.), harbors a wide genetic diversity of this species, whose identification may provide the grounds to not only understand its unique population structure and domestication history, but also inform the efforts aimed at its conservation. Although molecular characterization of cultivated avocado germplasm has been studied by several research groups, this had not been the case in Mexico. In order to elucidate the genetic structure of avocado in Mexico and to support the sustainable use of its genetic resources, 318 avocado accessions conserved in the germplasm collection of the National Avocado Genebank were analyzed using 28 markers [9 expressed sequence tag-Simple Sequence Repeats (SSRs) and 19 genomic SSRs]. Deviation from Hardy-Weinberg equilibrium and high inter-locus linkage disequilibrium were observed, especially in drymifolia and guatemalensis. Total averages of the observed and expected heterozygosity were 0.59 and 0.75, respectively. Although clear genetic differentiation was not observed among the three botanical races (americana, drymifolia, and guatemalensis), the analyzed Mexican population can be classified into two groups that correspond to two different ecological regions. We developed a core collection using the K-means clustering method. The 36 individuals selected as the core collection successfully represented more than 80% of total alleles and showed heterozygosity values equal to or higher than those of the original collection, despite constituting only slightly more than 10% of the latter. Accessions selected as members of the core collection have now become candidates for cryopreservation, implying a minimum loss of genetic diversity and a back-up for existing field collections of such important genetic resources.
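As an illustration of the two heterozygosity figures reported above, the following is a minimal sketch of how observed and expected heterozygosity can be computed for a single diploid SSR locus. It assumes the plain (uncorrected) expected heterozygosity, one minus the sum of squared allele frequencies; the abstract does not state which estimator the authors used, and the function name and allele sizes below are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per accession.
    Assumption: uncorrected expected heterozygosity, 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    # Observed: fraction of accessions carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Expected under Hardy-Weinberg proportions, from pooled allele frequencies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    h_exp = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return h_obs, h_exp


# Hypothetical SSR allele sizes (base pairs) for four accessions.
locus = [(150, 154), (150, 150), (154, 158), (150, 158)]
h_obs, h_exp = heterozygosity(locus)
```

An observed value below the expected one, as in the reported averages of 0.59 and 0.75, is the kind of heterozygote deficit that shows up as a deviation from Hardy-Weinberg equilibrium.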
Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.
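To make the idea of a doublet-based mapping concrete, here is a minimal sketch that turns a DNA sequence into a numeric signal by assigning one amplitude to each overlapping dinucleotide. The specific amplitude values (integers 0 to 15) and the function name are assumptions chosen for illustration; the abstract does not give the values used by GAFD.

```python
BASES = "ACGT"


def doublet_signal(seq):
    """Map each overlapping dinucleotide to an integer amplitude in 0..15.

    Hypothetical mapping: amplitude = 4 * index(first base) + index(second base).
    A single-base mapping allows 4 amplitude values; doublets allow 16.
    """
    index = {base: i for i, base in enumerate(BASES)}
    seq = seq.upper()
    return [4 * index[a] + index[b] for a, b in zip(seq, seq[1:])]


signal = doublet_signal("ACGTTA")
```

A signal built this way has one sample fewer than the sequence has bases, and could then be passed to standard DSP distance measures.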
Genomic signal processing (GSP) is based on the use of digital signal processing methods for the analysis of genomic data. Convolutional neural networks (CNN) are state-of-the-art machine learning classifiers that have been widely and successfully applied to complex problems. In this paper, we present a deep learning architecture and a method for the classification of three different functional genome types: coding regions (CDS), long noncoding regions (LNC), and pseudogenes (PSD) in genomic data, based on the use of GSP methods to convert the nucleotide sequence into a graphical representation of the information contained in it. The obtained accuracy scores of 83% and 84% when classifying CDS vs. LNC and CDS vs. PSD, respectively, indicate the feasibility of employing this methodology for the classification of these types of sequences. The model was not able to differentiate between PSD and LNC. Our results indicate the feasibility of employing CNN with GSP for the classification of these types of DNA data.
Alignment-free k-mer-based algorithms in whole genome sequence comparisons remain an ongoing challenge. Here, we explore the possibility of using topic modeling for organism whole-genome comparisons. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome was considered as a document and 13-mer nucleotide representations as words. Latent Dirichlet allocation was used as the probabilistic model of the corpus. We were able to identify the topic distribution among the analyzed genomes, which is highly consistent with traditional hierarchical classification. It is possible that topic modeling may be applied to establish relationships between genome composition and biological phenomena.
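The "genome as document, k-mer as word" representation described above can be sketched as follows. This is a minimal illustration using only the standard library: it builds the document-term count matrix that a topic model would take as input, and does not perform the Latent Dirichlet allocation step itself. The function names and the toy sequences are hypothetical, and k is set to 3 in the example only to keep the output small (the study used 13-mers).

```python
from collections import Counter


def kmer_counts(genome, k=13):
    """Count overlapping k-mers ("words") in one genome ("document")."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def document_term_matrix(genomes, k=13):
    """Build a shared sorted vocabulary and one count row per genome.

    genomes: dict mapping a genome name to its nucleotide string.
    """
    docs = {name: kmer_counts(seq, k) for name, seq in genomes.items()}
    vocab = sorted(set().union(*docs.values()))
    rows = {name: [counts[word] for word in vocab] for name, counts in docs.items()}
    return vocab, rows


# Toy sequences, not real genomes.
vocab, rows = document_term_matrix({"g1": "ACGTAC", "g2": "ACGACG"}, k=3)
```

The resulting rows could then be handed to an LDA implementation, for example scikit-learn's `LatentDirichletAllocation`, to estimate a topic distribution per genome.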
Khao Kai Noi rice is considered an elite quality landrace in Laos, which has led to its germplasm conservation in the Laos National Genebank. As with other germplasm collections, a manageable yet representative subcollection has become an essential element for researchers and breeders to simplify many activities, including those related to crop improvement, phenotype-genotype correlation and determination of diversity hotspots. In this study, 109 accessions were used as a test collection for core collection development to determine the feasibility of collection reduction in a closely related rice group. Three core collections were developed by two established methodologies and evaluated by diversity indexes, allele retention, phylogenetic distribution and geographical location. Based on SSR molecular markers and PowerCore, a reduction to 24 accessions was achieved with the conservation of complete genetic diversity. A K-means-based reduction to 24 accessions gave slightly poorer results, while a K-means-based reduction to 12 accessions resulted in a 17% diversity loss. These core collections may be useful for genebank management, research and breeding activities in the future. They may also serve to estimate core collection development behavior in other landraces and cultivars, which is fundamental in genetic resources management and utilization.
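Allele retention, one of the evaluation criteria mentioned above, can be sketched as the fraction of distinct alleles in the full collection that also appear in a candidate core subset. The sketch below is an assumption about how such a figure might be computed, not the procedure used by PowerCore or by the authors; the accession names, locus name and allele sizes are hypothetical.

```python
def allele_set(data, accessions):
    """Collect the distinct (locus, allele) pairs carried by the given accessions.

    data: dict mapping accession name -> {locus name: (allele_a, allele_b)}.
    """
    found = set()
    for accession in accessions:
        for locus, pair in data[accession].items():
            found.update((locus, allele) for allele in pair)
    return found


def allele_retention(data, core):
    """Fraction of the full collection's alleles that the core subset retains."""
    full = allele_set(data, data)
    return len(allele_set(data, core)) / len(full)


# Hypothetical genotypes at a single SSR locus for four accessions.
data = {
    "A1": {"locus1": (100, 104)},
    "A2": {"locus1": (100, 100)},
    "A3": {"locus1": (104, 108)},
    "A4": {"locus1": (112, 100)},
}
retention = allele_retention(data, ["A1", "A3"])
```

A retention of 1.0 corresponds to a core collection that conserves every allele of the original collection, and lower values quantify the diversity lost by the reduction.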