2018
DOI: 10.17161/bi.v13i0.7600

Sample data and training modules for cleaning biodiversity information

Abstract: Large-scale biodiversity databases have become crucial information sources in many analyses in biogeography, macroecology, and conservation biology, often involving development of empirical models of species’ ecological niches and predictions of their geographic distributions. These analyses, however, can be impaired by the presence of errors, particularly as regards taxonomic identifications and accurate geographic coordinates. Here, we present a detailed data-cleaning exercise based on two contrasting datase…

Cited by 48 publications (35 citation statements)
References: 5 publications
“…1) separate from non-native occurrences facilitated by human introduction. We cleaned occurrences from the native distribution following Cobos et al (2018) by removing duplicates and records with inconsistent georeferencing (coordinates outside country limits, on the sea, or missing, as recommended in the literature of data cleaning; Chapman, 2005). To avoid model fitting influences of spatial autocorrelation and overdominance of specific regions due to sampling bias, we thinned these records spatially in two ways: by geographic distance and by density of records per country (Fig.…”
Section: Methods (mentioning)
confidence: 99%
“…One group is used to demonstrate the application of our method of evaluation, while the second group is used to make a comparison between different classification methods as applied to the problem of estimating distributions (SDMs) and then to evaluate their performance. We obtained occurrence data for all of the selected species from GBIF (Samy et al., 2013) and cleaned the datasets using standard procedures (Cobos, Jiménez, Nuñez‐Penichet, Romero‐Alvarez, & Simoes, 2018). The climatic layers used to create the SDMs for each species came from the WorldClim database (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005).…”
Section: Methods (mentioning)
confidence: 99%
“…1) separate from non-native occurrences facilitated by human introduction. We cleaned occurrences from the native distribution following Cobos et al (2018) by removing duplicates and records with doubtful or missing coordinates. To avoid model overfitting derived from spatial autocorrelation and overdominance of specific regions due to sampling bias, we thinned these records spatially in two ways: by geographic distance and by density of records per country (Fig.…”
Section: Methods (mentioning)
confidence: 99%
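
The cleaning steps named in the statements above (removing duplicates and records with missing coordinates, then thinning by geographic distance) can be illustrated with a minimal Python sketch. This is not the procedure from Cobos et al. (2018) or from the citing papers: the function names, the `(species, lon, lat)` record layout, the sample points, and the 10 km threshold are all assumptions made for illustration. Checks against country limits or coastlines, and thinning by record density per country, need external boundary data and are left out.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def clean_occurrences(records):
    """Drop records with missing or out-of-range coordinates, then exact duplicates."""
    seen, kept = set(), []
    for species, lon, lat in records:
        if lon is None or lat is None:
            continue  # missing coordinate
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            continue  # impossible coordinate
        key = (species, lon, lat)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept

def thin_by_distance(records, min_km):
    """Greedy thinning: keep a record only if it lies at least min_km from every kept record."""
    kept = []
    for rec in records:
        if all(haversine_km(rec[1:], k[1:]) >= min_km for k in kept):
            kept.append(rec)
    return kept

# Hypothetical sample records: (species, longitude, latitude)
raw = [
    ("Sp. a", -99.13, 19.43),
    ("Sp. a", -99.13, 19.43),   # exact duplicate
    ("Sp. a", None, 19.50),     # missing longitude
    ("Sp. a", -99.14, 19.44),   # within 10 km of the first record
    ("Sp. a", -96.72, 17.06),   # far from the first record
]
cleaned = clean_occurrences(raw)
thinned = thin_by_distance(cleaned, min_km=10.0)
```

The greedy thinning depends on record order, so different orderings can retain different points; this is a simplification chosen to keep the sketch short.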