We used machine learning to investigate how physicochemical features contribute to the comprehensive evaluation of the Japanese sake known as 'Junmai Ginjo'. We used 173 samples of commercial Japanese sake. The sensory evaluation was conducted by 35 panelists, who rated each sample against five statements to give a comprehensive evaluation. The physicochemical analyses comprised general analysis, nucleic acid-related substances, volatile components, and simplified analyses. We performed regression analyses using multiple regression analysis (MRA), partial least squares regression (PLS), and three machine learning methods: a support vector machine (SVM), an artificial neural network (ANN), and random forest (RF). The results of these five methods demonstrated that machine learning (especially RF) provides comparable or higher prediction accuracy and better fitting than MRA. We also discuss the contribution of each physicochemical feature to the evaluation scores, based on the regression coefficients obtained by MRA and the feature importances obtained by RF. The analysis of the individual scores indicated that ethyl caproate and isoamyl acetate contribute strongly to the sake evaluation.
We estimated the quality component values of the commercial Japanese sake Junmai Ginjo by using electronic (e)-nose and e-tongue data. Regression analysis methods were applied to predict the components. Characteristic features of Junmai Ginjo, namely acidity, amino acid content, glucose, and nine volatile components, were used as objective variables. The explanatory variables were the 99 peak data obtained by an e-nose and seven sensor data obtained by an e-tongue. The prediction accuracy of the partial least squares regression method using e-nose and e-tongue data was 7.57 average error% (the ratio of the mean absolute error to the component value range). With the application of other regression analyses (multiple regression analysis, support-vector machine, random forest, gradient boosting), the prediction accuracy improved for all components except acidity and amino acid content. With the application of other regression analyses and the addition of data from seven simplified analyses (Brix, pH, electrical conductivity, OD260, OD280, simplified alcohol content, simplified glucose content), the prediction accuracy improved for all components (average error%: 5.04). The analysis conditions (i.e., the regression analysis and the dataset of explanatory variables) that gave the best score differed depending on the component. Thus, when predicting components by regression analysis, it is necessary to prepare and try multiple analysis conditions.
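The "average error%" metric defined above (mean absolute error divided by the component value range) can be illustrated with a minimal sketch. The function name `average_error_percent` and the sample values are hypothetical, and taking the range from the observed values is an assumption, since the abstract does not say how the range was determined.

```python
from typing import Sequence


def average_error_percent(observed: Sequence[float],
                          predicted: Sequence[float]) -> float:
    """Mean absolute error as a percentage of the observed value range.

    Assumption: the "component value range" is max(observed) - min(observed).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range; the metric is undefined")

    # Mean absolute error over all samples.
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
    return mae / value_range * 100.0


# Illustrative (made-up) values for one component, not data from the study.
observed = [1.0, 2.0, 3.0, 5.0]
predicted = [1.5, 2.0, 2.5, 5.0]
print(average_error_percent(observed, predicted))  # MAE 0.25 over range 4.0 -> 6.25
```

Normalizing by the range makes errors comparable across components measured on very different scales, which is presumably why a single averaged figure such as 7.57 or 5.04 can be reported over all components.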