In machine learning, one often assumes the data are independent when evaluating model performance. However, this assumption rarely holds in practice. Geographic information data sets are an example where data points have stronger dependencies on each other the closer they are geographically. This phenomenon, known as spatial autocorrelation (SAC), causes standard cross-validation (CV) methods to produce optimistically biased prediction performance estimates for spatial models, which can result in increased costs and accidents in practical applications. To overcome this problem, we propose a modified version of CV called spatial k-fold cross validation (SKCV), which provides a useful estimate of model prediction performance without the optimistic bias due to SAC. We test SKCV on three real-world cases involving open natural data and show that the estimates produced by ordinary CV are up to 40% more optimistic than those of SKCV. Both regression and classification cases are considered in our experiments. In addition, we show how SKCV can be applied as a criterion for selecting the data sampling density for a new research area.
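For illustration, here is a minimal sketch of the dead-zone idea behind SKCV, assuming Euclidean coordinates, random fold assignment, and a ridge regressor for the regression case; the function name `skcv_scores` and the `delta` value are illustrative choices, not the authors' exact procedure.

```python
# A minimal SKCV sketch: for each test fold, training points lying within
# a dead-zone radius `delta` of any test point are removed, so that spatial
# autocorrelation cannot leak information from test to training data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def skcv_scores(X, y, coords, n_folds=5, delta=100.0, seed=0):
    """Per-fold MSE with a spatial dead zone of radius `delta` (map units)."""
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(y))     # random fold labels
    scores = []
    for k in range(n_folds):
        test = folds == k
        # Distance from every point to its nearest test point.
        d = np.linalg.norm(
            coords[:, None] - coords[test][None], axis=2
        ).min(axis=1)
        train = (~test) & (d > delta)   # drop training points in the dead zone
        model = Ridge().fit(X[train], y[train])
        scores.append(mean_squared_error(y[test], model.predict(X[test])))
    return scores
```

With `delta = 0` this reduces to ordinary random k-fold CV, which makes the SAC-induced gap between the two estimates directly measurable.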
Purpose
To develop and validate a classifier system for prediction of prostate cancer (PCa) Gleason score (GS) using radiomics and texture features of T2-weighted imaging (T2w), diffusion-weighted imaging (DWI) acquired using high b-values, and T2-mapping (T2).
Methods
T2w, DWI (12 b-values, 0–2000 s/mm²), and T2 data sets of 62 patients with histologically confirmed PCa were acquired at 3T using surface array coils. The DWI data sets were post-processed using monoexponential and kurtosis models, while T2w was standardized to a common scale. Local statistics and 8 different radiomics/texture descriptors were utilized at different configurations to extract a total of 7105 unique per-tumor features. Regularized logistic regression with implicit feature selection and leave-pair-out cross validation was used to discriminate tumors with 3+3 vs. >3+3 GS.
Results
In total, 100 PCa lesions were analysed; of those, 20 and 80 had a GS of 3+3 and >3+3, respectively. The best model performance was obtained by selecting the top 1% of features of T2w, ADCm, and K, with an ROC AUC of 0.88 (95% CI 0.82–0.95). Features from T2 mapping provided little added value. The most useful texture features were based on the gray-level co-occurrence matrix, Gabor transform, and Zernike moments.
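As a concrete example of one of the descriptor families found useful above, here is a minimal sketch of gray-level co-occurrence matrix (GLCM) texture features; the quantization, distances, and angles are illustrative and do not reproduce the study's actual extraction configurations.

```python
# GLCM sketch: quantize a 2D region of interest, build co-occurrence
# matrices over four directions, and average standard Haralick-style
# properties across directions.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi, levels=32):
    """Contrast/homogeneity/energy/correlation of a 2D ROI, direction-averaged."""
    edges = np.linspace(roi.min(), roi.max(), levels + 1)
    q = np.clip(np.digitize(roi, edges[1:-1]), 0, levels - 1).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    return {prop: graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")}
```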
Conclusion
Texture feature analysis of DWI, post-processed using monoexponential and kurtosis models, and of T2w demonstrated good classification performance for the GS of PCa. In a multisequence setting, the optimal radiomics-based texture extraction methods and parameters differed between image types.
Forest harvesting operations with heavy machinery can lead to significant soil rutting. The risk of rutting depends on the soil bearing capacity, which shows considerable spatial and temporal variability. Trafficability prediction is required both for selecting suitable operation sites for a given time window and conditions, and for on-site route optimization during the operation. Integrative tools are necessary to plan and carry out forest operations with minimal negative ecological and economic impacts. This study demonstrates a trafficability prediction framework that utilizes a spatial hydrological model and a wide range of spatial data. Trafficability was approached by producing a rut depth prediction map at a 16 × 16 m grid resolution, based on the outputs of a general linear mixed model developed using field data from Southern Finland, modelled daily soil moisture, spatial forest inventory and topography data, along with field-measured rolling resistance and information on the mass transported through the grid cells. Dynamic rut depth prediction maps were produced by accounting for changing weather conditions through hydrological modelling. We also demonstrate a generalization of the rolling resistance coefficient measured with harvester CAN-bus channel data. Future steps towards a nationwide prediction framework based on continuous data flow, process-based modelling and machine learning are discussed.
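A minimal sketch of the map-production step follows, assuming a fitted linear mixed model whose fixed effects are applied cell-wise over the grid; the predictor names and coefficient values are hypothetical placeholders, not the study's model.

```python
# Dynamic rut depth map sketch: apply fixed-effect coefficients of a
# (hypothetical) linear mixed model to per-cell predictors on a 16 x 16 m
# grid. Updating `moisture` from the daily hydrological model output
# yields a new map for each day.
import numpy as np

# Hypothetical fixed-effect coefficients (intercept + slopes).
COEF = {"intercept": 2.0,
        "soil_moisture": 8.0,        # cm of rut depth per unit moisture
        "transported_mass": 0.01,    # cm per tonne through the cell
        "rolling_resistance": 5.0}   # cm per unit coefficient

def rut_depth_map(moisture, mass, rr):
    """Predicted rut depth (cm) per grid cell from per-cell predictor rasters."""
    return (COEF["intercept"]
            + COEF["soil_moisture"] * moisture
            + COEF["transported_mass"] * mass
            + COEF["rolling_resistance"] * rr)

# Example: one day's prediction over a 100 x 100 cell area.
moisture = np.random.rand(100, 100)   # modelled daily soil moisture
mass = np.full((100, 100), 50.0)      # planned transported mass (t)
rr = np.full((100, 100), 0.15)        # rolling resistance coefficient
depth = rut_depth_map(moisture, mass, rr)
```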