2016
DOI: 10.1016/j.ecoinf.2016.06.001

Quantifying the value of user-level data cleaning for big data: A case study using mammal distribution models

Cited by 60 publications (47 citation statements). References 49 publications.
“…The accuracy of the automated conservation assessment was in the same range as found by previous studies (Nic Lughadha et al 2019; Zizka, Azevedo, et al 2019). The similar accuracy of the raw and filtered dataset for the automated conservation assessment was surprising, in particular given the EOO and AOO reduction observed in the filtered dataset (Table 4) and the impact of errors on spatial analyses observed in previous studies (Gueta and Carmel 2016). The robustness of the automated assessment was likely due to the fact that the EOO for most species was large, even after the considerable reduction caused by filtering.…”
Section: Discussion (mentioning, confidence: 78%)
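The passage explains that the automated assessment stayed robust because EOO (extent of occurrence) remained large even after filtering shrank it. The following is a minimal sketch of how the two IUCN range metrics are commonly derived from occurrence points: EOO as the area of the minimum convex polygon, and AOO (area of occupancy) as the number of occupied 2 × 2 km grid cells multiplied by the cell area. It assumes the points are already projected to a planar, equal-area system in kilometres. The function names (`convex_hull`, `eoo_km2`, `aoo_km2`) are hypothetical, and this is not the procedure used by any of the cited studies.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left (CCW) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula; returns area in squared input units."""
    n = len(vertices)
    if n < 3:
        return 0.0
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def eoo_km2(points_km):
    """Extent of occurrence: area of the minimum convex polygon (km^2)."""
    return polygon_area(convex_hull(points_km))


def aoo_km2(points_km, cell_km=2.0):
    """Area of occupancy: number of occupied grid cells x cell area (km^2)."""
    cells = {(math.floor(x / cell_km), math.floor(y / cell_km)) for x, y in points_km}
    return len(cells) * cell_km ** 2
```

For example, with `raw = [(0, 0), (100, 0), (100, 100), (0, 100), (50, 50), (300, 300)]`, a single outlying record at (300, 300) raises EOO from 10000.0 to 30000.0 km² but raises AOO only from 20.0 to 24.0 km². This shows why removing a few erroneous outliers can shrink EOO sharply while leaving AOO almost unchanged.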
“…To model the paleoclimatic distribution of Triplostegia, we used three global […] We corrected all the occurrence data to reduce the negative effect of sampling bias on the performance of the SDMs. For records from GBIF, CVH, and SDNPT, we excluded the duplicated entries within a 2.5 arc-minute pixel (4.3 km at the equator) and omitted entries with implausible geographical coordinates using ArcGIS 10.2 (ESRI Inc., Redlands, CA, USA; http://www.esri.com/software/arcgis/arcgis-for-desktop) and Google Earth (http://earth.google.com/; Gueta & Carmel, 2016; Maldonado et al, 2015). We also checked voucher photos for records from CVH and SDNPT to make sure that the species are correctly identified.…”
Section: Species Distribution Modeling (mentioning, confidence: 99%)
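The pixel-level de-duplication described above keeps at most one record per raster cell. The following is a minimal Python sketch. It assumes records are (lat, lon) pairs in decimal degrees and that the 2.5 arc-minute grid is anchored at 0° latitude and longitude. The plausibility rule (valid coordinate ranges plus the common (0, 0) placeholder) is an illustrative assumption and does not reproduce the authors' ArcGIS/Google Earth checks. The names `plausible` and `dedupe_by_cell` are hypothetical.

```python
import math

CELL_DEG = 2.5 / 60.0  # 2.5 arc-minutes expressed in decimal degrees


def plausible(lat, lon):
    """Assumed rule: reject missing values, out-of-range values, and the (0, 0) placeholder."""
    if lat is None or lon is None:
        return False
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False
    return not (lat == 0.0 and lon == 0.0)


def dedupe_by_cell(records, cell_deg=CELL_DEG):
    """Keep the first plausible record that falls in each grid cell of size cell_deg."""
    seen = set()
    kept = []
    for lat, lon in records:
        if not plausible(lat, lon):
            continue
        # Integer cell index along each axis identifies the raster pixel
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        if cell not in seen:
            seen.add(cell)
            kept.append((lat, lon))
    return kept
```

With `records = [(30.01, 100.01), (30.02, 100.02), (30.2, 100.2), (95.0, 10.0), (0.0, 0.0), (None, 100.0)]`, `dedupe_by_cell(records)` returns `[(30.01, 100.01), (30.2, 100.2)]`. The second record shares a pixel with the first, and the last three fail the plausibility check. Keeping the first record per cell is an arbitrary choice. A real pipeline might instead prefer the record with the best coordinate precision.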
“…The distribution data for Kūmarahou (n = 406) and Kuta (n = 3559) were derived from databases, herbarium specimens and field observations (Table S3). To improve species distribution model (SDM) performance, occurrence records were cleaned; points were removed if they were obviously inaccurate (e.g., points in the ocean), missing coordinates, located exactly at the centre of the country (which suggests incorrect georeferencing) or had fewer than three decimal digits in the latitude or longitude (which suggests insufficient spatial accuracy; Gueta & Carmel, 2016). True absences were not available for these species; to account for climatic bias of presence data, pseudo‐absences were randomly distributed within a buffer region 200 km from presence points (Figures S2 and S3) (Barbet‐Massin, Jiguet, Albert, & Thuiller, ).…”
Section: Methods (mentioning, confidence: 99%)
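The record-level filters quoted above (missing coordinates, an exact country-centre location, and fewer than three decimal digits) can be expressed as a single predicate. The following is a minimal sketch under stated assumptions. Coordinates are kept as the original strings, because converting to `float` loses trailing zeros ("174.500" would look like one decimal place). The centroid table is a hypothetical placeholder keyed by a made-up country code "XX", not real centroid data. The ocean check is omitted because it needs land polygons. The names `decimal_places`, `keep_record` and `COUNTRY_CENTROIDS` are hypothetical.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical lookup: country code -> (lat, lon) centroid. Placeholder values only.
COUNTRY_CENTROIDS = {"XX": (-41.5, 172.5)}


def decimal_places(text):
    """Digits after the decimal point in a coordinate string, or None if unusable."""
    if text is None:
        return None
    try:
        d = Decimal(text.strip())
    except InvalidOperation:
        return None
    if not d.is_finite():  # rejects "nan" and "inf"
        return None
    return max(0, -d.as_tuple().exponent)


def keep_record(rec, centroids=COUNTRY_CENTROIDS, min_decimals=3):
    lat_s, lon_s = rec.get("lat"), rec.get("lon")
    lat_dp, lon_dp = decimal_places(lat_s), decimal_places(lon_s)
    if lat_dp is None or lon_dp is None:
        return False  # missing or unparseable coordinates
    if lat_dp < min_decimals or lon_dp < min_decimals:
        return False  # too few decimal digits: coarse georeferencing
    centroid = centroids.get(rec.get("country"))
    if centroid is not None and (float(lat_s), float(lon_s)) == centroid:
        return False  # exactly at the country centre: likely default georeference
    return True
```

In this sketch a record written as "-41.500", "172.500" passes the decimal-digit test but is still caught by the centroid test. This is why the two filters complement each other rather than overlap.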
“…The distribution data for Kūmarahou (n = 406) and Kuta (n = 3559) were derived from databases, herbarium specimens and field observations (Table S3). To improve species distribution model (SDM) performance, occurrence records were cleaned; points were removed if they were obviously inaccurate (e.g., points in the ocean), missing coordinates, located exactly at the centre of the country (which suggests incorrect georeferencing) or had fewer than three decimal digits in the latitude or longitude (which suggests insufficient spatial accuracy; Gueta & Carmel, 2016) […] (VanDerWal, Shoo, Graham, & Williams, 2009) was 200 km. To reduce spatial sampling bias, counter residual spatial autocorrelation and improve SDM performance (Boria, Olson, Goodman, & Anderson, 2014; de Oliveira, Rangel, Lima-Ribeiro, Terribile, & Diniz-Filho, 2014; Hijmans, 2012), occurrence and pseudo-absence points were spatially thinned by randomly removing a single point from within a 10 km radius using the spThin package (Aiello-Lammens, Boria, Radosavljevic, Vilela, & Anderson, 2015).…”
Section: Species Distribution Modelling (mentioning, confidence: 99%)
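The thinning step quoted above (randomly removing one point from any pair closer than 10 km) can be sketched as a simple loop over great-circle distances. This is a minimal illustration of randomized distance-based thinning only. It is not the spThin package's actual algorithm, which differs in detail. The haversine distance with a mean Earth radius of 6371.0088 km is an assumption. Each pass is O(n²), so the sketch suits only small datasets. The names `haversine_km` and `thin` are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin(points, min_dist_km=10.0, seed=None):
    """Drop one random member of a random too-close pair until all pairs are >= min_dist_km apart."""
    rng = random.Random(seed)
    kept = list(points)
    while True:
        close = [
            (i, j)
            for i in range(len(kept))
            for j in range(i + 1, len(kept))
            if haversine_km(kept[i], kept[j]) < min_dist_km
        ]
        if not close:
            return kept
        i, j = rng.choice(close)
        kept.pop(rng.choice((i, j)))  # remove a single point from the offending pair
```

For example, `thin([(-36.85, 174.76), (-36.86, 174.77), (-37.5, 175.5)], 10.0, seed=1)` keeps the distant third point and exactly one of the first two, which lie roughly 1.4 km apart. Because the result depends on the random draws, repeating the procedure with different seeds and comparing the retained sets is a reasonable sanity check.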