2018
DOI: 10.1007/s10618-018-00607-x

The spatial leave-pair-out cross-validation method for reliable AUC estimation of spatial classifiers

Abstract: Machine-learning-based classification methods are widely used in geoscience applications, including mineral prospectivity mapping. Typical characteristics of the data, such as a small number of positive instances, imbalanced class distributions and a lack of verified negative instances, make ROC analysis and cross-validation natural choices for classifier evaluation. However, recent literature has identified two sources of bias that can affect the reliability of area under the ROC curve (AUC) estimation via cross-validation on …
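The abstract is cut off before the method itself is described. As a rough sketch only, assume the standard leave-pair-out idea for AUC: every positive-negative pair is held out in turn, the model is retrained on the remaining data, and the AUC estimate is the fraction of held-out pairs in which the positive instance receives the higher score, with ties counting as one half. A spatial variant is assumed here to also drop every training instance within a chosen `radius` of either test point, so that spatially autocorrelated neighbours cannot leak into training. The function names, the Euclidean distance on `coords`, the `radius` parameter and the toy `mean_difference_fit` scorer are hypothetical illustrations, not the paper's code.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: a fit function takes training features and 0/1 labels
# and returns a scoring function (higher score = more likely positive).
Scorer = Callable[[Sequence[float]], float]
Fit = Callable[[List[Sequence[float]], List[int]], Scorer]


def mean_difference_fit(X_train: List[Sequence[float]], y_train: List[int]) -> Scorer:
    """Toy linear scorer (illustrative only): weights = positive mean - negative mean."""
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    if not pos or not neg:
        # Without both classes nothing can be learned: every input gets the same score.
        return lambda x: 0.0
    dim = len(X_train[0])
    w = [sum(p[d] for p in pos) / len(pos) - sum(n[d] for n in neg) / len(neg)
         for d in range(dim)]
    return lambda x: sum(wd * xd for wd, xd in zip(w, x))


def spatial_lpo_auc(X: List[Sequence[float]], y: List[int],
                    coords: List[Tuple[float, float]], fit: Fit,
                    radius: float) -> float:
    """Sketch of a spatial leave-pair-out AUC estimate (assumed formulation)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    correct = 0.0
    for i in pos:
        for j in neg:
            # Training set: everything except the test pair and any instance
            # lying within `radius` of either test point (assumed dead zone).
            train = [k for k in range(len(y))
                     if k != i and k != j
                     and math.dist(coords[k], coords[i]) > radius
                     and math.dist(coords[k], coords[j]) > radius]
            score = fit([X[k] for k in train], [y[k] for k in train])
            s_pos, s_neg = score(X[i]), score(X[j])
            # Correctly ranked pair counts 1, a tie counts 0.5, a wrong ranking 0.
            correct += 1.0 if s_pos > s_neg else (0.5 if s_pos == s_neg else 0.0)
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    coords = [(float(k), 0.0) for k in range(len(y))]
    print(spatial_lpo_auc(X, y, coords, mean_difference_fit, radius=0.0))
    print(spatial_lpo_auc(X, y, coords, mean_difference_fit, radius=1000.0))
```

With `radius=0.0` only the held-out pair is removed, which reduces to ordinary leave-pair-out cross-validation; on this separable toy data every pair is ranked correctly. With a radius larger than the study area, no training data survive, every pair is a tie, and the estimate collapses to 0.5. This illustrates how the exclusion radius trades optimism for pessimism in the estimate.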

Cited by 17 publications (6 citation statements) · References 37 publications
“…However, their performance could have been better when applied to totally new data from a different time or different location. This is in line with prior research indicating that CCV often leads to models prone to overoptimistic and biased prediction results [9,17,51]. The SCV method produced worse results in the testing dataset than in B dataset (same area, another time).…”
Section: Discussion (supporting)
confidence: 91%
“…Machine learning models that are robust and efficient at making realistic predictions typically assume that the data used to train them is independent and identically distributed (i.i.d.) [7,[9][10][11][12]. However, when this assumption is violated, it can lead to overfitting the highly flexible methods to the training data and underestimating spatial prediction errors [13].…”
Section: Introduction (mentioning)
confidence: 99%
“…The AUC is a criterion for estimating the probability that a classifier (predictor), i.e. gene-set can predict better than randomly selected classifiers 26 .…”
Section: Methods (mentioning)
confidence: 99%
“…The former involves analyzing patterns of spatial distribution of toponyms in the text. De Bruijn et al have suggested a method for extracting toponyms by comparing databases of toponyms and OpenStreetMap (Airola et al, 2019). Current methods that rely on specific geographic directories are unable to identify unofficial place names mentioned in unstructured text.…”
Section: Introduction (mentioning)
confidence: 99%