The use of fluorescence data coupled with neural networks for improved predictability of drinking water disinfection by-products (DBPs) was investigated. Novel application of autoencoders to process high-dimensional fluorescence data was related to common dimensionality reduction techniques of parallel factors analysis (PARAFAC) and principal component analysis (PCA). The proposed method was assessed based on component interpretability as well as for prediction of organic matter reactivity to formation of DBPs. Optimal prediction accuracies on a validation dataset were observed with an autoencoder-neural network approach or by utilizing the full spectrum without pre-processing. Latent representation by an autoencoder appeared to mitigate overfitting when compared to other methods. Although DBP prediction error was minimized by other pre-processing techniques, PARAFAC yielded interpretable components which resemble fluorescence expected from individual organic fluorophores. Through analysis of the network weights, fluorescence regions associated with DBP formation can be identified, representing a potential method to distinguish reactivity between fluorophore groupings. However, distinct results due to the applied dimensionality reduction approaches were observed, dictating a need for considering the role of data pre-processing in the interpretability of the results. In comparison to common organic measures currently used for DBP formation prediction, fluorescence was shown to improve prediction accuracies, with improvements to DBP prediction best realized when appropriate pre-processing and regression techniques were applied. The results of this study show promise for the potential application of neural networks to best utilize fluorescence EEM data for prediction of organic matter reactivity.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.