Plant germplasm is a living genetic resource that includes seeds and other plant materials such as roots, leaves, and stems. It should be conserved and managed to maintain ecological biodiversity and to sustain the production and supply of food crops. Plant germplasm can be categorized by various genetic traits, such as race, and clustering accessions with similar genetic traits is an efficient way to manage large numbers of germplasms. We therefore developed an algorithm, termed cacGMS (Clustering Analysis for Categorical genetic traits of germplasms in Genebank Management System), that works on categorical variables, the statistical data type of genetic traits such as seed-coat color, seed shape, and flower color. Briefly, cacGMS uses Newman's modularity method to combine a hierarchical clustering algorithm (Ward2 method) with representative-based algorithms such as K-medoids, and it then regroups all germplasms using germplasm core sets. We tested cacGMS on 2,378 pepper germplasms described by 46 categorical genetic traits. It performed better than the hierarchical and K-medoids algorithms alone in terms of the average distance among clusters (0.4534) and entropy (1.2672). cacGMS also performed better than the other algorithms across thresholds (from 15 to 30) for genetic traits, and it gave similar results in a test run on tomato germplasm. From these results, we expect cacGMS to be a useful tool for managing groups that contain numerous plant germplasms and to facilitate further analyses, such as identifying the representative characteristics of clustered germplasms and the correlations among germplasms in a particular cluster.
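To make the representative-based step and the entropy measure more concrete, the following is a minimal sketch rather than cacGMS itself. It assumes a simple-matching (mismatch fraction) distance between categorical trait vectors, a plain alternating K-medoids seeded with the first k rows, and Shannon entropy of the cluster-size distribution; the abstract does not state which distance or entropy definition the authors used, and the Ward2 and modularity steps are not shown. All function names and the toy trait values are hypothetical.

```python
from collections import Counter
from math import log


def mismatch_distance(a, b):
    """Fraction of traits on which two accessions differ (assumed measure)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def k_medoids(rows, k, max_iter=100):
    """Plain alternating K-medoids; the first k rows seed the medoids."""
    n = len(rows)
    dist = [[mismatch_distance(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]

    def assign(medoids):
        # Label each row with the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]])
                for i in range(n)]

    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster happens to be empty.
                updated.append(medoids[c])
                continue
            # New medoid: member with the smallest total distance to its cluster.
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(medoids), medoids


def size_entropy(labels):
    """Shannon entropy (natural log) of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


# Hypothetical accessions: (seed-coat color, seed shape, flower color).
rows = [
    ("red", "round", "white"),
    ("red", "round", "purple"),
    ("red", "oval", "white"),
    ("yellow", "flat", "purple"),
    ("yellow", "flat", "white"),
    ("yellow", "oval", "purple"),
]
labels, medoids = k_medoids(rows, k=2)
print(labels, medoids, size_entropy(labels))
```

On this toy input the two seed-coat colors end up in separate clusters. A real genebank data set would need a more careful choice of k and of the seeding strategy than this sketch uses.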