In a large-scale study, numerous parameters influencing the neural network interpretation of gas-phase infrared spectra were investigated. Predictions of the presence or absence of 26 different substructural entities were optimized by systematically observing how each of the following parameters affected functional group prediction accuracy: training duration, learning rate, momentum, sigmoidal discrimination and bias, spectral data reduction with four different methods, number of hidden nodes, individual instead of multioutput networks, size of the training set, noise level, and 12 different spectral preprocessing functions. The most promising approaches included constant monitoring of training progress with a 500-spectrum cross-validation set, increasing the number of spectral examples in the training set from 511 to 2588, employing variance scaling, and using specialized instead of multioutput networks. An overall recognition accuracy of 93.8% for the presence and 95.7% for the absence of functionalities was achieved, and perfect prediction of presence was reached for several functional groups.
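The abstract above names variance scaling as one of the more helpful preprocessing steps but does not define it. The following is a minimal sketch, assuming the common chemometric meaning: each spectral channel is divided by its standard deviation over the training set. The function name `variance_scale`, the `eps` guard, and the list-of-lists data layout are illustrative assumptions, not details taken from the paper.

```python
from statistics import pstdev


def variance_scale(spectra, eps=1e-12):
    """Divide every spectral channel by its standard deviation.

    `spectra` is a list of equally long lists of absorbance values
    (one inner list per spectrum). Assumed layout, not from the paper.
    Returns the scaled spectra and the per-channel scale factors, so the
    same factors can be reused on validation or test spectra.
    """
    n_channels = len(spectra[0])
    # Standard deviation of each channel, computed across all spectra.
    stds = [pstdev([row[j] for row in spectra]) for j in range(n_channels)]
    # Channels that never vary keep a scale of 1.0 to avoid dividing by zero.
    scales = [s if s > eps else 1.0 for s in stds]
    scaled = [[x / s for x, s in zip(row, scales)] for row in spectra]
    return scaled, scales


if __name__ == "__main__":
    toy = [[1.0, 5.0, 2.0], [3.0, 5.0, 6.0]]
    scaled, scales = variance_scale(toy)
    print(scales)
    print(scaled)
```

Returning the scale factors alongside the scaled data keeps the sketch usable in a train/validation split, where the factors would be estimated on the training spectra only.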
By combining gas-phase infrared (IR) spectra with mass spectral (MS) data, a neural network has been developed to predict 26 different molecular substructures from multispectral information. The back-propagation procedure has been used for training, including its previously published modification, the flashcard algorithm. Present functional groups were detected correctly in 86.4% of all cases, compared with 88.4% using only IR and 78.2% using only MS data for training and prediction. The joint use of infrared and mass spectra yields better prediction results for only 8 of the 26 functionalities, with the greatest improvement for halogen bond predictions. The prediction of functional group absence reaches an accuracy of about 95.5% for both the IR and IR/MS networks but only 87.1% for a stand-alone MS network. Insights have been gained into the suitability of both data sets for neural network training by presenting just IR or MS data to a jointly trained neural network, revealing the amount of information the network utilizes from either spectroscopic technique. In addition, an algorithm that produces balanced training and test sets for multi-output neural networks has been devised.
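Both abstracts report separate accuracies for functional group presence and absence. A minimal sketch of how such figures could be computed is shown here, assuming binary labels (1 = group present, 0 = absent) and pooling over all spectra and all groups. The abstracts do not state how the percentages were aggregated, so the pooling, the function name `presence_absence_accuracy`, and the data layout are assumptions.

```python
def presence_absence_accuracy(y_true, y_pred):
    """Return (presence_accuracy, absence_accuracy).

    `y_true` and `y_pred` are lists of rows, one row per spectrum and one
    0/1 entry per functional group (assumed layout). Presence accuracy is
    the fraction of truly present groups predicted present; absence
    accuracy is the fraction of truly absent groups predicted absent.
    """
    present_total = present_hit = 0
    absent_total = absent_hit = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1:
                present_total += 1
                present_hit += int(p == 1)
            else:
                absent_total += 1
                absent_hit += int(p == 0)
    # NaN signals that a class never occurred in the evaluated set.
    presence = present_hit / present_total if present_total else float("nan")
    absence = absent_hit / absent_total if absent_total else float("nan")
    return presence, absence


if __name__ == "__main__":
    truth = [[1, 0, 1], [0, 0, 1]]
    preds = [[1, 0, 0], [0, 1, 1]]
    print(presence_absence_accuracy(truth, preds))
```

Reporting the two figures separately matters because most functional groups are absent from any given compound, so a single pooled accuracy would be dominated by the absence class.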