Recently, the use of near-infrared (NIR) spectral sensors in agricultural processes has been attracting much attention, particularly for fruit quality evaluation. The sensor requires a spectrometer to produce a spectrum, which records the interaction between the physical matter of the sample and the electromagnetic spectrum. In practice, experimental and/or measurement errors due to heterogeneous particle size, moisture content variability, sample density, instrument noise and pretreatment experience often cannot be avoided. These errors damage the collected spectra, which degrades performance in model selection and increases the prediction error through the harmful influence of possible outliers and leverage points in the dataset. To counter these problems, a robust pretreatment of NIR spectral data is needed to correct the spectra before they are used for post-processing with any statistical method. In this paper, several classical pretreatment methods were evaluated and a new robust Generalized Multiplicative Scatter Correction (GMSC) algorithm was proposed to correct the additive and/or multiplicative baseline effects in the spectral data. A dataset of NIR spectra of oil palm (Elaeis guineensis Jacq.) fruit bunches was used in the simulation. In the simulation, a number of repetitions using single and double cross-validation with robust partial least squares were also applied. Desirability Indices are presented as statistical measures for evaluating the methods.
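To make the additive and multiplicative baseline effects concrete, the following is a minimal sketch of classical multiplicative scatter correction (MSC), not the robust GMSC algorithm proposed in the paper, whose details the abstract does not give. Each spectrum is regressed on a reference spectrum (here assumed to be the mean spectrum), and the fitted offset and slope are then removed. The function name `msc` and the toy data are illustrative assumptions.

```python
from statistics import mean

def msc(spectra, reference=None):
    """Classical MSC: fit s ~ a + b * reference, return (s - a) / b per spectrum."""
    n_wavelengths = len(spectra[0])
    if reference is None:
        # Assumption: the mean spectrum across samples is used as the reference
        reference = [mean(s[j] for s in spectra) for j in range(n_wavelengths)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Ordinary least-squares slope (multiplicative effect) and offset (additive effect)
        b = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / ref_ss
        a = s_mean - b * ref_mean
        corrected.append([(x - a) / b for x in s])
    return corrected

# Toy example: two spectra that differ only by an offset and a scale factor
base = [0.1, 0.4, 0.9, 0.5, 0.2]
spectra = [[0.2 + 1.5 * v for v in base], [-0.1 + 0.5 * v for v in base]]
corrected = msc(spectra)
```

Because the ordinary least-squares fit is sensitive to outlying wavelengths and samples, a robust variant would replace both the mean reference and the least-squares step with robust estimators; which estimators GMSC uses is not stated here.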
The extraction of relevant wavelengths from a large Near Infrared Spectroscopy (NIRS) dataset is a significant challenge in vibrational spectroscopy research. Nonetheless, this process improves chemical interpretability by emphasizing the chemical entities related to the chemical parameters of the samples. Given the complexity of the dataset, irrelevant wavelengths may still be included in the multivariate calibration. This makes the computational process unnecessarily complex and decreases the accuracy and robustness of the model. In multivariate analysis, Partial Least Squares Regression (PLSR) is a method commonly used to build a predictive model from NIR spectral data. However, neither the PLSR method nor common commercial chemometrics software applies a standard wavelength selection procedure to screen out irrelevant wavelengths. In this study, a new robust wavelength selection procedure called the modified VIP-MCUVE (mod-VIP-MCUVE), which uses a Filter-Wrapper method and an input scaling strategy, is introduced. The proposed method combines the modified Variable Importance in Projection (VIP) and the modified Monte Carlo Uninformative Variable Elimination (MCUVE) to calculate the scale matrix of the input variables. The modified VIP uses the orthogonal components of Partial Least Squares (PLS) to identify the informative variables in the model by using the amount of variation in both X and y (SSX, SSY) simultaneously. The modified MCUVE uses a robust reliability coefficient and a robust tolerance interval in the selection procedure. To evaluate the superiority of the proposed method, the classical VIP, MCUVE, and autoscaling procedure in classical PLSR were also included in the evaluation. Using artificial data with Monte Carlo simulation and NIR spectral data of oil palm (Elaeis guineensis Jacq.) fruit mesocarp, the study shows that the proposed method offers the advantages of improving model interpretability, being computationally extensive, and producing better model accuracy.
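For orientation, the two classical building blocks named above can be sketched as follows. This shows the standard VIP score and the standard MCUVE reliability index only; the modified versions (orthogonal components, SSX weighting, robust reliability coefficient and tolerance interval) are not specified in the abstract and are not reproduced. The function names and input layouts are assumptions, and the PLS weights, explained sums of squares and Monte Carlo coefficients are taken as already computed elsewhere.

```python
from math import sqrt
from statistics import mean, stdev

def vip_scores(weights, ssy):
    """Classical VIP.

    weights[a][j]: PLS weight of variable j on component a.
    ssy[a]: sum of squares of y explained by component a.
    """
    p = len(weights[0])
    total_ssy = sum(ssy)
    scores = []
    for j in range(p):
        acc = 0.0
        for w_a, ssy_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)  # normalise each weight vector
            acc += ssy_a * (w_a[j] ** 2) / norm_sq
        scores.append(sqrt(p * acc / total_ssy))
    return scores

def mcuve_reliability(coef_samples):
    """Classical MCUVE reliability: mean / standard deviation of each
    variable's regression coefficient across Monte Carlo sub-models.

    coef_samples[k][j]: coefficient of variable j in the k-th sub-model.
    """
    p = len(coef_samples[0])
    result = []
    for j in range(p):
        column = [c[j] for c in coef_samples]
        result.append(mean(column) / stdev(column))
    return result

# Hypothetical inputs: 2 components x 3 variables, and 3 sub-models x 2 variables
vip = vip_scores([[0.8, 0.6, 0.0], [0.0, 0.6, 0.8]], ssy=[3.0, 1.0])
rel = mcuve_reliability([[1.0, 0.5], [1.2, -0.5], [0.8, 0.1]])
```

The squared VIP scores always sum to the number of variables, which is why a threshold of 1 is a common (though not universal) cut-off in the classical procedure. Variables with a large absolute reliability index have coefficients that are stable across sub-models.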
In cancer studies, the prediction of cancer outcome based on a set of prognostic variables has been a long-standing topic of interest. Current statistical methods for survival analysis make it possible to model cancer survivability but require unrealistic assumptions about the survival time distribution or the proportionality of hazards. Therefore, attention must be paid to developing nonlinear models with less restrictive assumptions. Artificial neural network (ANN) models are primarily useful in prediction when nonlinear approaches are required to sift through the plethora of available information. The applications of ANN models for prognostic and diagnostic classification in medicine have attracted a lot of interest. Some studies have discussed ANN models for the survival of patients with gastric cancer, but without fully accounting for censored data. This study proposes an ANN model for predicting gastric cancer survivability that takes the censored data into account. Five separate single time-point ANN models were developed to predict the outcome of patients after 1, 2, 3, 4, and 5 years. The performance of the ANN models in predicting the probabilities of death is consistently high for all time points according to the accuracy and the area under the receiver operating characteristic curve.
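The abstract does not say how censored patients were handled, so the sketch below shows only one simple convention for building the training labels of a single time-point model. A patient is labelled as dead if death was observed by the horizon, as alive if follow-up reached the horizon, and is left out if censored before the horizon, since the status at that time is unknown. The function name `labels_at` and the record layout are assumptions; other approaches, such as imputation or weighting, exist.

```python
def labels_at(records, horizon):
    """Build (patient_index, label) pairs for one time-point model.

    records: list of (follow_up_years, died) pairs, where died is True if
    death was observed and False if the patient was censored.
    """
    usable = []
    for i, (time, died) in enumerate(records):
        if died and time <= horizon:
            usable.append((i, 1))  # death observed by the horizon
        elif time >= horizon:
            usable.append((i, 0))  # known to be alive at the horizon
        # else: censored before the horizon, status unknown, so skipped
    return usable

# Hypothetical follow-up data: (years, death observed?)
records = [(0.5, True), (3.0, False), (1.5, False), (4.0, True)]
two_year = labels_at(records, 2)
five_year = labels_at(records, 5)
```

Under this convention, the set of usable patients shrinks as the horizon grows, which is one reason to fit a separate model per time point.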
The statistically inspired modification of the partial least squares (SIMPLS) is the most commonly used algorithm to solve a partial least squares regression problem when the number of explanatory variables (p) is larger than the sample size (n). Nonetheless, in the presence of irregular points (outliers), this method is no longer efficient. Therefore, the robust iteratively reweighted SIMPLS (RWSIMPLS), which is an improvement of the SIMPLS algorithm, was put forward to remedy this problem. However, the RWSIMPLS is still not very efficient with regard to its parameter estimation and outlier diagnostics. It also suffers from long computational times. This paper proposes a new robust SIMPLS that incorporates a new weight function constructed from nu-Support Vector Regression. We call this method the robust iteratively reweighted SIMPLS based on nu-Support Vector Regression, denoted as SVR-RWSIMPLS. To avoid misclassification of observations, a new diagnostic plot is proposed to classify observations into regular observations, vertical outliers, good leverage points (GLPs) and bad leverage points (BLPs). The numerical results clearly indicate that the SVR-RWSIMPLS is more efficient, more robust and has a shorter computational running time than the RWSIMPLS when multiple leverage points and vertical outliers exist. The proposed diagnostic plot is also very successful in classifying observations into the correct groups.
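The four-way classification above follows the usual logic of a regression outlier map, which can be sketched as a simple decision rule. This is a generic illustration, not the new diagnostic plot proposed in the paper. The inputs (a standardized residual and a distance in the predictor or score space) and both cut-off values are assumptions chosen for the example.

```python
def classify(std_residual, score_distance, resid_cutoff=2.5, dist_cutoff=3.0):
    """Classify one observation from its standardized residual (outlyingness
    in y) and its distance in the predictor/score space (outlyingness in X).
    The default cut-offs are illustrative assumptions."""
    outlying_y = abs(std_residual) > resid_cutoff
    outlying_x = score_distance > dist_cutoff
    if outlying_x and outlying_y:
        return "bad leverage point"
    if outlying_x:
        return "good leverage point"
    if outlying_y:
        return "vertical outlier"
    return "regular observation"

# Hypothetical observations: (standardized residual, score distance)
observations = [(0.4, 1.2), (-3.1, 0.9), (0.7, 4.5), (5.0, 6.0)]
groups = [classify(r, d) for r, d in observations]
```

The quality of such a classification depends on the residuals and distances coming from a robust fit; with a non-robust fit, outliers can mask one another and end up in the wrong group.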