A formalised means of simplifying hydrological models concurrent with calibration is proposed for use when nonlinear models can be initially formulated as over-parameterised constrained absolute deviation regressions of nonlinear expressions. This provides a flexible modelling framework for approximation of nonlinear situations, while allowing the models to be amenable to algorithmic simplification. The degree of simplification is controlled by a user-specified forcing parameter λ. That is, an original over-parameterised linear model is reduced to a simpler working model which is no more complex than required for a given application. The degree of simplification is a compromise between two factors. With weak simplification most parameters will remain, risking calibration overfitting. On the other hand, a high degree of simplification generates inflexible models. The linear LASSO (Least Absolute Shrinkage and Selection Operator) is utilised for the simplification process because of its ability to deal with linear constraints in the over-parameterised initial model.
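The trade-off controlled by the forcing parameter can be illustrated with a minimal sketch. It is not the constrained absolute deviation formulation described in the abstract: it swaps in the ordinary least-squares LASSO, solved by cyclic coordinate descent, purely to show how a larger penalty removes parameters. The function names, the penalty name `lam`, and the toy data are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumption: squared-error loss is used here for simplicity, not the
    absolute deviation loss with linear constraints named in the abstract.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that ignores column j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Hypothetical toy data: y is mostly 2 * (first column) plus a small pattern.
X = [[1, 0, 1], [2, 1, 0], [3, 0, 1], [4, 1, 0], [5, 0, 1], [6, 1, 0]]
y = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9]

weak = lasso_coordinate_descent(X, y, lam=0.01)   # weak simplification
strong = lasso_coordinate_descent(X, y, lam=0.5)  # strong simplification
print("weak:  ", weak)
print("strong:", strong)
```

On this toy data the small penalty keeps all three coefficients, while the larger penalty sets the second and third exactly to zero, leaving a one-parameter model.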
Optimised wavelength selection is important to the development of new types of inexpensive and portable near infrared instruments that might be used on fruit in orchards. The use of discrete bandwidth devices, such as light-emitting diodes, requires preselection of a small number of discrete wavelengths. In this work, a kiwifruit data set consisting of 834 absorbance spectra and corresponding fruit dry-matter measurements, an important maturity indicator for kiwifruit, has been subjected to an exhaustive wavelength search to build optimal multiple linear regression models of up to seven wavelengths. Using a standard partial least-squares model as a benchmark, a six-wavelength model has been identified as an optimum, predicting kiwifruit dry matter with r² of 0.88 and root mean square error of prediction (RMSEP) of 1.22%. The sensitivity of the model to shifts in the key wavelengths was also evaluated, revealing that a 1 nm offset or a 0.25 nm random noise component would be enough to increase the RMSEP by around 0.04% in actual dry matter value or 3% in relative percentage terms.
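An exhaustive wavelength search of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: every subset of a fixed size is fitted by ordinary least squares via the normal equations, and the subset with the lowest error is kept. The error here is the fit error on the same samples rather than a held-out RMSEP, and the function names and the tiny four-column data set are hypothetical.

```python
from itertools import combinations


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_rmse(spectra, target, columns):
    """Fit target ~ intercept + selected columns; return the fit RMSE."""
    rows = [[1.0] + [s[c] for c in columns] for s in spectra]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    coef = solve_linear(XtX, Xty)
    sq_err = sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
                 for r, t in zip(rows, target))
    return (sq_err / len(rows)) ** 0.5


def exhaustive_search(spectra, target, n_select):
    """Try every n_select-column subset; return (lowest RMSE, its columns)."""
    n_columns = len(spectra[0])
    return min((fit_rmse(spectra, target, cols), cols)
               for cols in combinations(range(n_columns), n_select))


# Hypothetical data: 6 samples, 4 "wavelengths"; target = 1 + 2*col1 - col3.
spectra = [
    [0.1, 1.0, 0.5, 2.0],
    [0.4, 2.0, 0.1, 1.0],
    [0.3, 3.0, 0.9, 4.0],
    [0.8, 4.0, 0.2, 3.0],
    [0.5, 5.0, 0.7, 6.0],
    [0.9, 6.0, 0.3, 5.0],
]
target = [1.0, 4.0, 3.0, 6.0, 5.0, 8.0]

best_rmse, best_columns = exhaustive_search(spectra, target, n_select=2)
print(best_columns, best_rmse)
```

The number of subsets grows combinatorially with the number of candidate wavelengths, so a search over real spectra would need a far more efficient fitting step than this sketch uses.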
The crowd-sourced NatureWatch GBIF dataset is used to obtain a species classification dataset containing approximately 1.2 million photos of nearly 20 thousand different species of biological organisms observed in their natural habitat. We present a general hierarchical species identification system based on deep convolutional neural networks trained on the NatureWatch dataset. The dataset contains images taken under a wide variety of conditions and is heavily imbalanced, with most species associated with only a few images. We apply multi-view classification as a way to lend more influence to high-frequency details, hierarchical fine-tuning to help with class imbalance and provide regularisation, and automatic specificity control for optimising classification depth. Our system achieves 55.8% accuracy when identifying individual species and around 90% accuracy at an average taxonomy depth of 5.1, equivalent to the taxonomic rank of "family", when applying automatic specificity control.
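The abstract does not say how automatic specificity control is implemented. One plausible reading, sketched here purely as an assumption, is to sum the predicted species probabilities up the taxonomy and report the deepest taxon whose total reaches a confidence threshold. The function names, the threshold values, and the three-species taxonomy are all hypothetical.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def classify_with_specificity(leaf_probs, parent, threshold):
    """Return the deepest taxon whose summed probability reaches threshold.

    Assumption: this thresholding rule is an illustration, not the method
    used by the system described in the abstract.
    """
    totals = {}
    for leaf, prob in leaf_probs.items():
        for node in ancestors(leaf, parent):
            totals[node] = totals.get(node, 0.0) + prob
    depth = {node: len(ancestors(node, parent)) - 1 for node in totals}
    confident = [node for node in totals if totals[node] >= threshold]
    # Prefer the deepest confident taxon; break ties by higher probability.
    return max(confident, key=lambda n: (depth[n], totals[n]), default=None)


# Hypothetical taxonomy: child -> parent (the root has no entry).
parent = {
    "Felis catus": "Felidae",
    "Lynx lynx": "Felidae",
    "Vulpes vulpes": "Canidae",
    "Felidae": "Carnivora",
    "Canidae": "Carnivora",
}
# Hypothetical classifier output over species (leaf) classes.
leaf_probs = {"Felis catus": 0.5, "Lynx lynx": 0.4, "Vulpes vulpes": 0.1}

strict = classify_with_specificity(leaf_probs, parent, threshold=0.8)
lenient = classify_with_specificity(leaf_probs, parent, threshold=0.45)
print(strict, lenient)
```

With the stricter threshold no single species is confident enough, so the sketch falls back to the family; with the more lenient threshold it commits to a species. This mirrors the trade-off between accuracy and classification depth reported in the abstract.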