2021
DOI: 10.1111/1755-0998.13350

The choices we make and the impacts they have: Machine learning and species delimitation in North American box turtles (Terrapene spp.)

Abstract: Model-based approaches that attempt to delimit species are hampered by computational limitations as well as the unfortunate tendency by users to disregard algorithmic assumptions. Alternatives are clearly needed, and machine-learning (M-L) is attractive in this regard as it functions without the need to explicitly define a species concept. Unfortunately, its performance will vary according to which (of several) bioinformatic parameters are invoked. Herein, we gauge the effectiveness of M-L-based species-delimi…

Cited by 11 publications (13 citation statements)
References 103 publications
“…icenoglei lineages, whereas our higher locus completeness datasets (75p and 90p) retained only enough signal to maintain the North lineage as a separate cluster but not for Central or South lineages (Figure 4). VAE relies on the inherent structure present in the data (Derkarabetian et al, 2019), and previous studies have shown that VAE analyses have been heavily influenced by the filtering parameters for the SNP datasets (Martin et al, 2021; Newton et al, 2020). Specifically, if a lower threshold for locus completeness is allowed in a dataset the more likely it is to “over‐split”, whereas more stringent filtering (i.e., a high threshold for locus completeness) can remove potentially important signal and “under‐split” the amount of diversity.…”
Section: Discussion
mentioning
confidence: 99%
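
The locus-completeness threshold described in the statement above can be illustrated with a minimal sketch. This is not the pipeline used by the cited authors: the function names, the toy genotype matrix, and the thresholds are hypothetical, and genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for missing calls.

```python
from typing import List, Optional

# Assumed coding: 0/1/2 = alternate-allele count, None = missing genotype.
Genotype = Optional[int]


def locus_completeness(column: List[Genotype]) -> float:
    """Fraction of individuals with a called genotype at one locus."""
    called = sum(g is not None for g in column)
    return called / len(column)


def filter_by_completeness(matrix: List[List[Genotype]], threshold: float) -> List[int]:
    """Return indices of loci (columns) whose completeness is >= threshold."""
    n_loci = len(matrix[0])
    kept = []
    for j in range(n_loci):
        column = [row[j] for row in matrix]
        if locus_completeness(column) >= threshold:
            kept.append(j)
    return kept


# Hypothetical toy data: 4 individuals (rows) x 4 loci (columns).
genotypes = [
    [0, 1, None, 2],
    [0, None, None, 2],
    [1, 1, None, 0],
    [2, 0, 1, None],
]

# A permissive threshold keeps more loci; a stringent one keeps fewer.
for threshold in (0.25, 0.75, 0.90):
    print(threshold, filter_by_completeness(genotypes, threshold))
```

Raising the threshold shrinks the set of retained loci, which is the trade-off the statement describes: permissive filtering keeps more (possibly noisy) signal, while stringent filtering may discard informative loci.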
“…Modeling population structure given continuous (geographic) and discrete (reproductive isolation) processes can be used to separate clines vs. geographic clusters using programs such as conStruct (Bradburd et al, 2018). Additionally, redundancy analyses or machine learning can be used to isolate nonspatial effects on genetic structure, fully supplanting reliance on incorrect Mantel tests (Diniz-Filho et al, 2013; Burbrink et al, 2021; Martin et al, 2021). Resistance or conductive surfaces representing spatial and environmental covariates on genetic dissimilarity between mapped individuals can also partial out the effects of space of population structure (Peterman and Pope, 2021), though these methods may only indicate that genetic changes are occurring over environmental gradients.…”
Section: Identifying Geographic Lineages and Isolation By Distance
mentioning
confidence: 99%
“…Machine learning approaches can help to accurately delimit species, providing better estimates of species-level diversity (Martin et al, 2021). Machine learning approaches also allow prediction of patterns of biodiversity at large geographical scales by facilitating the combination of genomic, ecological and geographical data in novel ways (Barrow et al, 2020).…”
Section: Biodiversity and Species Limits
mentioning
confidence: 99%
“…Martin et al (2021) evaluate the use of unsupervised machine learning algorithms in species delimitation of North American box turtles, by employing a suite of unsupervised machine learning methods first applied to species delimitation by Derkarabetian et al (2019) in combination with a supervised machine learning approach (Smith & Carstens, 2020) and a more traditional coalescent-based approach to species delimitation (Leaché et al, 2014). With respect to unsupervised machine learning algorithms, they compare the use of RF, T-distributed Stochastic Neighbor Embedding (Maaten & Hinton, 2008) and Variational Auto-Encoders (Kingma & Welling, 2013) to cluster individuals into putative species while using a variety of missing data thresholds, minor allele frequency filters and methods for selecting the best number of populations or species.…”
mentioning
confidence: 99%
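
The minor allele frequency (MAF) filter mentioned in the statement above can be sketched as follows. This is an illustrative example under stated assumptions, not the cited study's code: diploid genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and the function names, toy matrix, and cutoffs are hypothetical.

```python
from typing import List, Optional

# Assumed coding: 0/1/2 = alternate-allele count in a diploid, None = missing.
Genotype = Optional[int]


def minor_allele_frequency(column: List[Genotype]) -> float:
    """MAF at one locus, computed over called genotypes only."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0  # no data: treat the locus as uninformative
    alt_freq = sum(called) / (2 * len(called))  # two alleles per individual
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(matrix: List[List[Genotype]], min_maf: float) -> List[int]:
    """Return indices of loci (columns) whose MAF is >= min_maf."""
    n_loci = len(matrix[0])
    return [
        j
        for j in range(n_loci)
        if minor_allele_frequency([row[j] for row in matrix]) >= min_maf
    ]


# Hypothetical toy data: 5 individuals (rows) x 3 loci (columns).
genotypes = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 2, 0],
    [0, 0, None],
    [0, 1, 1],
]

# Locus 0 is monomorphic and is dropped by any positive cutoff.
for min_maf in (0.05, 0.2):
    print(min_maf, filter_by_maf(genotypes, min_maf))
```

A stricter cutoff removes rare variants as well as monomorphic loci, so the choice of cutoff changes the data that any downstream clustering method sees.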