Background. Biogeographers assess how species distributions and abundances affect the structure, function, and composition of ecosystems. Yet we face a major challenge: it is difficult to map species precisely across landscapes. Novel Earth observations could obviate this challenge. Airborne imaging spectrometers measure plant functional traits at high resolution, and these measurements can be used to identify tree species. Plant traits are often highly conserved within species and highly variable between species, which provides the biophysical basis for species mapping. In this paper I describe a trait-based approach to species identification with imaging spectroscopy, CCB-ID, which was developed as part of a NIST-sponsored ecological data science evaluation (ECODSE).

Methods. These methods were developed using NEON airborne imaging spectroscopy data. CCB-ID classifies tree species using trait-based reflectance variation and decision tree-based machine learning models, approximating the morphological trait and dichotomous key approach traditionally used in botanical classification. First, outliers were removed using a spectral variance threshold. The remaining samples were transformed using principal components analysis (PCA) and resampled by species to reduce bias toward common species. Gradient boosting and random forest classifiers were then trained on the transformed and resampled feature data. Prediction probabilities were calibrated using sigmoid regression, and sample-scale predictions were averaged to the crown scale.

Results. This approach performed well according to the competition metrics, receiving a rank-1 accuracy score of 0.919 and a cross-entropy cost score of 0.447 on the test data. Accuracy and specificity scores were high for all species, but precision and recall scores were variable for rare species. PCA transformation improved accuracy scores compared with models trained on reflectance data, but outlier removal and data resampling exacerbated class imbalance problems.

Discussion. CCB-ID accurately classified tree species using NEON imaging spectroscopy data, achieving the best classification scores among participants. However, it failed to overcome several well-known species mapping challenges, such as precisely identifying rare species. Key takeaways are that (1) training models to maximize metrics beyond accuracy (e.g., recall) could improve rare species predictions, (2) within-genus trait variation may drive spectral separability, precluding efforts to distinguish between functionally convergent species, (3) outlier removal and data resampling exacerbated class imbalance problems and should be implemented carefully, (4) PCA transformation greatly improved model results, and (5) feature selection could further improve species classification models. CCB-ID is open source, designed for use with NEON data, and available to support future species mapping efforts.

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.26972v1 | CC BY 4.0 Open Access | rec:
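The Methods steps (outlier removal, PCA, per-species resampling, gradient boosting and random forest training, sigmoid calibration) can be sketched with scikit-learn as below. This is a minimal, hypothetical reconstruction, not the CCB-ID source. Several choices are assumptions: the outlier rule (dropping samples whose squared deviation from their species' mean spectrum exceeds a per-species quantile), the equal-count resampling scheme, the `keep_quantile`, `n_components`, and `n_per_class` values, averaging the two calibrated models, and every function name. The spectra are synthetic.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier


def remove_outliers(X, y, keep_quantile=0.95):
    """Hypothetical outlier rule: per species, drop samples whose mean squared
    deviation from the species' mean spectrum exceeds the keep_quantile."""
    keep = np.zeros(len(y), dtype=bool)
    for species in np.unique(y):
        idx = np.flatnonzero(y == species)
        dev = ((X[idx] - X[idx].mean(axis=0)) ** 2).mean(axis=1)
        keep[idx] = dev <= np.quantile(dev, keep_quantile)
    return X[keep], y[keep]


def resample_by_species(X, y, n_per_class, rng):
    """Hypothetical scheme: draw n_per_class rows per species, with replacement
    for rare species and without replacement for common ones."""
    rows = []
    for species in np.unique(y):
        idx = np.flatnonzero(y == species)
        rows.append(rng.choice(idx, size=n_per_class, replace=len(idx) < n_per_class))
    rows = np.concatenate(rows)
    return X[rows], y[rows]


def fit_models(X, y, n_components=10, n_per_class=50, seed=0):
    rng = np.random.default_rng(seed)
    X, y = remove_outliers(X, y)
    pca = PCA(n_components=n_components).fit(X)  # PCA fit on filtered spectra
    Z, z_y = resample_by_species(pca.transform(X), y, n_per_class, rng)
    # Both tree ensembles get sigmoid (Platt) calibration via 3-fold CV
    models = [
        CalibratedClassifierCV(GradientBoostingClassifier(random_state=seed),
                               method="sigmoid", cv=3),
        CalibratedClassifierCV(RandomForestClassifier(n_estimators=100, random_state=seed),
                               method="sigmoid", cv=3),
    ]
    for model in models:
        model.fit(Z, z_y)
    return pca, models


def predict_proba(pca, models, X):
    """Average the calibrated class probabilities of both models (assumption)."""
    Z = pca.transform(X)
    return np.mean([model.predict_proba(Z) for model in models], axis=0)


# Synthetic "spectra": three species with imbalanced sample counts
rng = np.random.default_rng(42)
counts = {"A": 120, "B": 60, "C": 15}
X = np.vstack([rng.normal(loc=i, scale=1.0, size=(n, 30)) for i, n in enumerate(counts.values())])
y = np.concatenate([np.full(n, sp) for sp, n in counts.items()])

pca, models = fit_models(X, y)
proba = predict_proba(pca, models, X[:5])
print(models[0].classes_, proba.shape)  # ['A' 'B' 'C'] (5, 3)
```

One caveat with this ordering: because resampling duplicates rare-species samples before the calibration folds are drawn, copies of the same sample can land in both the fitting and the calibration folds, which may make calibrated probabilities overconfident for rare species. Resampling inside each training fold would avoid that.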
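The crown-scale averaging step and the two competition metrics can be illustrated with a small standard-library sketch. It assumes rank-1 accuracy is the fraction of crowns whose highest-probability species is the true species, and cross-entropy is the mean negative log probability assigned to the true species, with probabilities clipped at a hypothetical `eps` to avoid log(0). The competition's exact scoring rules may differ, and the crown IDs and probabilities are made up.

```python
import math
from collections import defaultdict


def crown_average(crown_ids, sample_probs):
    """Average sample-scale probability vectors into one vector per crown."""
    sums, counts = {}, defaultdict(int)
    for crown, probs in zip(crown_ids, sample_probs):
        acc = sums.setdefault(crown, [0.0] * len(probs))
        for k, p in enumerate(probs):
            acc[k] += p
        counts[crown] += 1
    return {crown: [s / counts[crown] for s in acc] for crown, acc in sums.items()}


def rank1_accuracy(crown_probs, truth):
    """Fraction of crowns whose top-ranked species is the true species."""
    hits = sum(
        max(range(len(p)), key=lambda k: p[k]) == truth[crown]
        for crown, p in crown_probs.items()
    )
    return hits / len(crown_probs)


def cross_entropy(crown_probs, truth, eps=1e-15):
    """Mean negative log probability of the true species (eps clipping is an assumption)."""
    losses = [-math.log(max(p[truth[crown]], eps)) for crown, p in crown_probs.items()]
    return sum(losses) / len(losses)


# Hypothetical crowns with two candidate species (indices 0 and 1)
crown_ids = ["c1", "c1", "c2", "c2", "c3"]
sample_probs = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.5, 0.5], [0.9, 0.1]]
truth = {"c1": 0, "c2": 1, "c3": 1}

crowns = crown_average(crown_ids, sample_probs)
print(rank1_accuracy(crowns, truth))  # 0.6666666666666666
print(cross_entropy(crowns, truth))   # about 1.057
```

Cross-entropy penalizes confident wrong predictions heavily (crown c3 alone contributes about 2.30 to the summed loss), which is why calibrating probabilities matters for this score even when rank-1 accuracy is unchanged.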