Here we summarize recent progress in machine learning for the chemical sciences. We outline machine-learning techniques that are suitable for addressing research questions in this domain, as well as future directions for the field. We envisage a future in which the design, synthesis, characterization and application of molecules and materials are accelerated by artificial intelligence.
Preprocessing of chromatographic and spectral data is an important aspect of analytical sciences. In particular, recent advances in proteomics have generated large data sets that require analysis. To assist accurate comparison of chemical signals, we propose two methods for the alignment of multiple spectral data sets. As in previously described methods, each chromatogram or spectrum to be aligned is divided into segments, and each segment is aligned individually to a reference. Unlike those methods, ours use the fast Fourier transform to rapidly compute a cross-correlation function, which allows the alignment between samples to be optimized. The proposed methods are compared with an existing method on a chromatographic data set and a mass spectral data set. Our methods are shown to be faster and to require fewer input parameters. The software implementations of the proposed alignment methods are available under the downloads section at http://ptcl.chem.ox.ac.uk/~jwong/specalign.
The software is free of charge and available for download from http://ptcl.chem.ox.ac.uk/~jwong/specalign. It supports Windows operating systems, including Windows 9X/NT/2000/XP.
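The core idea of the abstract above, finding the shift that maximizes the cross-correlation between a segment and a reference by way of the fast Fourier transform, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fft`, `best_shift`, `apply_shift`), the zero-padding strategy and the synthetic Gaussian peaks are all assumptions made for illustration, and only a single segment is aligned rather than a full segmented spectrum.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def best_shift(reference, segment):
    """Return the lag at which the cross-correlation of two equal-length
    signals peaks. Zero-padding to at least twice the length avoids
    circular wrap-around. (Hypothetical helper, for illustration only.)"""
    n = len(reference)
    size = 1
    while size < 2 * n:
        size *= 2
    ref = list(reference) + [0.0] * (size - n)
    seg = list(segment) + [0.0] * (size - len(segment))
    # Correlation theorem: corr = IFFT(FFT(ref) * conj(FFT(seg)))
    product = [r * s.conjugate() for r, s in zip(fft(ref), fft(seg))]
    corr = [v.real / size for v in fft(product, inverse=True)]
    k = max(range(size), key=corr.__getitem__)
    # Indices past the midpoint represent negative lags.
    return k if k <= size // 2 else k - size


def apply_shift(segment, lag):
    """Shift a segment by `lag` points, filling the vacated ends with zeros."""
    n = len(segment)
    return [segment[i - lag] if 0 <= i - lag < n else 0.0 for i in range(n)]


# Synthetic example: the segment is the reference peak delayed by 5 points.
reference = [math.exp(-0.5 * ((i - 20) / 2.0) ** 2) for i in range(64)]
segment = [math.exp(-0.5 * ((i - 25) / 2.0) ** 2) for i in range(64)]

lag = best_shift(reference, segment)
aligned = apply_shift(segment, lag)
print(lag)
```

Because the segment lags the reference by 5 points, the recovered lag is negative, and applying it moves the peak back to index 20. The cost is dominated by the transforms, which scale as N log N rather than the N² of a direct lag-by-lag search; that difference is the likely source of the speed advantage the abstract reports.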
Artificial neural networks (ANNs) are comparatively straightforward to understand and use in the analysis of scientific data. However, this relative transparency may encourage their use in an uncritical, and therefore possibly unproductive, fashion. The geometry of a network is among the most crucial factors in the successful deployment of network tools; in this review, we cover methods that can be used to determine optimum or near-optimum geometries. These methods of determining neural network architecture include the following: (i) trial and error, in which architectures chosen semirandomly are tested and modified by the user; (ii) empirical or statistical methods, in which an ANN's internal parameters are adjusted based on the model's performance; (iii) hybrid methods, such as fuzzy inference; (iv) constructive and/or pruning algorithms, which respectively add neurons or weights to, or remove them from, an initial architecture, based on a predefined link between architecture and ANN performance; (v) evolutionary strategies, which search the topology space using genetic operators to vary the neural network parameters. Several case studies illustrate the development of neural network models for applications in chemistry and chemical engineering.
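As a rough illustration of approach (v), the sketch below searches over hidden-layer sizes with a simple mutation operator and elitist selection. It is not taken from the review: the names `mutate`, `evolve` and `toy_error` are hypothetical, the limits of 3 layers and 32 units are arbitrary, and `toy_error` is only a stand-in for the validation error that would in practice come from training and evaluating each candidate network.

```python
import random


def mutate(arch, rng, max_units=32, max_layers=3):
    """Genetic operator: resize one hidden layer, or add or remove a layer."""
    arch = list(arch)
    op = rng.choice(["resize", "add", "remove"])
    if op == "add" and len(arch) < max_layers:
        arch.insert(rng.randrange(len(arch) + 1), rng.randint(1, max_units))
    elif op == "remove" and len(arch) > 1:
        arch.pop(rng.randrange(len(arch)))
    else:
        i = rng.randrange(len(arch))
        step = rng.choice([-2, -1, 1, 2])
        arch[i] = min(max_units, max(1, arch[i] + step))
    return tuple(arch)


def evolve(evaluate, rng, pop_size=8, generations=20):
    """Search for the architecture (tuple of hidden-layer sizes) with the
    lowest value of evaluate(arch). The best half survives each generation."""
    population = [(rng.randint(1, 32),) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate)
        parents = ranked[: pop_size // 2]  # elitism: keep the best half
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return min(population, key=evaluate)


def toy_error(arch):
    """Stand-in for validation error (assumption): penalizes deviation from
    10 total hidden units and, mildly, extra layers."""
    return abs(sum(arch) - 10) + 0.5 * len(arch)


best = evolve(toy_error, random.Random(0))
print(best, toy_error(best))
```

Replacing `toy_error` with a function that trains a network of the given shape and returns its held-out error would turn this into a usable, if slow, search. Because the best half of each generation is retained, the best error found never gets worse from one generation to the next.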