Process analytical technology (PAT) plays an important role in the pharmaceutical industry. Calibration-free and calibration-minimum methods in PAT are expected to aid a deeper understanding of processes in the early development stage of new drugs. Iterative optimization technology (IOT), an existing calibration-free method, cannot predict the compositions of nonideal mixtures because the Beer-Lambert law does not hold in some wavelength regions. In this paper, we propose IOT with wavelength selection based on excess absorption (WLSEA), which requires as few as one calibration sample. Excess absorption (EA) is the residual between the measured and ideal spectra of a mixture; it includes both noise and spectral changes related to molecular interactions. WLSEA determines the EA threshold that separates noise from spectral change by minimizing the prediction error of IOT. Consequently, WLSEA selects the set of wavelength regions in which IOT predicts accurately. WLSEA-IOT can therefore predict the compositions of both ideal mixtures and nonideal mixtures that retain ideal wavelength regions. The performance of WLSEA-IOT is verified by analyses of three types of mixture spectra. The proposed wavelength selection method will enhance both the development of quantitative methods and the analysis of molecular interactions with infrared spectroscopy.
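The sketch below illustrates the two ingredients described above under simplifying assumptions. IOT is written as a closed-form least-squares fit of the mixture spectrum to the pure-component spectra with fractions constrained to sum to one (a stand-in for the iterative optimizer; non-negativity is not enforced). WLSEA is sketched as a scan over candidate EA thresholds that keeps the wavelength set giving the smallest IOT error on the calibration samples. The function names, the `min_wavelengths` parameter and the synthetic spectra are hypothetical, not taken from the paper.

```python
import numpy as np


def iot_predict(pure, mixture):
    """Fractions x minimising ||mixture - pure.T @ x||^2 subject to sum(x) == 1.

    pure: (K, W) pure-component spectra; mixture: (W,) measured spectrum.
    Closed-form Lagrange solution; non-negativity is NOT enforced here.
    """
    gram = pure @ pure.T                   # (K, K) Gram matrix
    b = pure @ mixture                     # (K,) projections of the mixture
    ones = np.ones(pure.shape[0])
    x0 = np.linalg.solve(gram, b)          # unconstrained least squares
    g_ones = np.linalg.solve(gram, ones)
    # Shift x0 along G^-1 1 so that the fractions sum to one.
    return x0 - g_ones * (ones @ x0 - 1.0) / (ones @ g_ones)


def excess_absorption(pure, mixture, fractions):
    """EA: measured spectrum minus the ideal (Beer-Lambert) spectrum."""
    return mixture - pure.T @ fractions


def wlsea_mask(pure, cal_spectra, cal_fractions, min_wavelengths=None):
    """Boolean mask of wavelengths whose |EA| lies below the error-minimising threshold."""
    min_wavelengths = pure.shape[0] if min_wavelengths is None else min_wavelengths
    # Largest |EA| at each wavelength over all calibration samples.
    ea = np.max([np.abs(excess_absorption(pure, a, x))
                 for a, x in zip(cal_spectra, cal_fractions)], axis=0)
    best_mask, best_err = None, np.inf
    for threshold in np.unique(ea):        # candidate thresholds, ascending
        mask = ea <= threshold
        if mask.sum() < min_wavelengths:   # too few wavelengths to solve for K fractions
            continue
        err = np.mean([np.sum((iot_predict(pure[:, mask], a[mask]) - x) ** 2)
                       for a, x in zip(cal_spectra, cal_fractions)])
        if err < best_err:
            best_mask, best_err = mask, err
    if best_mask is None:
        raise ValueError("no threshold leaves enough wavelengths")
    return best_mask


# Synthetic two-component example (hypothetical spectra and nonideality).
w = np.arange(50)
pure = np.vstack([np.exp(-(w - 10) ** 2 / 50), np.exp(-(w - 40) ** 2 / 50)])


def simulate(x):
    # Ideal spectrum plus an interaction band on wavelengths 30-39 scaled by x1*x2.
    extra = np.zeros(w.size)
    extra[30:40] = 0.5 * x[0] * x[1]
    return pure.T @ x + extra


x_cal, x_test = np.array([0.5, 0.5]), np.array([0.3, 0.7])
mask = wlsea_mask(pure, [simulate(x_cal)], [x_cal])
x_full = iot_predict(pure, simulate(x_test))               # all wavelengths
x_sel = iot_predict(pure[:, mask], simulate(x_test)[mask])  # WLSEA-selected wavelengths
```

In this toy case the nonideal band is excluded by the selected threshold, so IOT on the selected wavelengths recovers the test fractions, while IOT on all wavelengths is biased by the interaction band. With a single calibration sample and real noise, a threshold chosen this way can overfit; averaging the error over several calibration samples, as the function allows, mitigates that.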
Generative Topographic Mapping (GTM) is a dimensionality reduction method widely used for both data visualization and structure-activity modeling. High dimensionality of the initial data space may require significant computational resources and slow down GTM construction. It may therefore be worthwhile to reduce the number of descriptors used to encode molecular structures. Principal Component Analysis (PCA), a standard preprocessing tool, suffers from information loss upon dimensionality reduction. As an alternative, we propose to use the substructure vector embedding provided by the mol2vec technique. In addition to reducing data dimensionality, this technique also accounts for the proximity of substructures in molecular graphs. In this study, the dimensionality of large ISIDA fragment descriptor and Morgan fingerprint spaces was reduced using either PCA or mol2vec. The latter significantly speeds up GTM training without compromising its predictive power in bioactivity classification tasks.
Keywords: Generative topographic mapping · QSAR · fragment descriptors · mol2vec · substructure vector embedding · distributed representation
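As a rough illustration of the two reduction routes compared above, the sketch below reduces a sparse fragment-count matrix either with PCA (computed via SVD) or in the spirit of mol2vec, where a molecule vector is obtained by summing the embedding vectors of its substructure occurrences (here, count-weighted rows of an embedding matrix). The count matrix is random, and the substructure embeddings are random placeholders for the vectors mol2vec would learn with word2vec; all names and sizes are hypothetical.

```python
import numpy as np


def pca_reduce(counts, n_components):
    """Project descriptor vectors onto their top principal components (via SVD)."""
    centred = counts - counts.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt[:n_components].T
    retained = (s[:n_components] ** 2).sum() / (s ** 2).sum()  # fraction of variance kept
    return scores, retained


def embedding_reduce(counts, substructure_vectors):
    """mol2vec-style molecule vectors: count-weighted sum of substructure vectors."""
    return counts @ substructure_vectors


rng = np.random.default_rng(0)
n_molecules, n_fragments, dim = 100, 2000, 50
# Sparse fragment-count matrix (hypothetical stand-in for ISIDA/Morgan counts).
counts = rng.poisson(0.01, size=(n_molecules, n_fragments)).astype(float)
# Placeholder embeddings; mol2vec would learn these from a substructure corpus.
substructure_vectors = rng.normal(size=(n_fragments, dim))

pca_vectors, retained = pca_reduce(counts, dim)
mol_vectors = embedding_reduce(counts, substructure_vectors)
```

Both routes map 2000 descriptors to 50 dimensions. PCA keeps only part of the variance of the centred data (`retained` < 1), whereas the embedding route needs no fitting on the dataset itself once substructure vectors are available.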
This article proposes a novel concentration prediction model that requires little training data and is useful for rapid process understanding. Process analytical technology is currently popular, especially in the pharmaceutical industry, for enhancing process understanding and process control. A calibration-free method, iterative optimization technology (IOT), was proposed to predict pure component concentrations because calibration methods such as partial least squares require a large number of training samples, leading to high costs. However, IOT cannot be applied to concentration prediction in non-ideal mixtures because its basic equation is derived from the Beer-Lambert law, which does not hold for non-ideal mixtures. We propose a novel method that predicts pure component concentrations in mixtures from a small number of training samples, assuming that spectral changes arising from molecular interactions can be expressed as a function of concentration. The proposed method is named IOT with virtual molecular interaction spectra (IOT-VIS) because it takes the spectral change into account as a virtual spectrum. Two case studies confirmed that IOT-VIS achieved higher predictive accuracy than the existing IOT methods.
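A minimal sketch of the IOT-VIS idea for a binary mixture follows. It assumes, hypothetically, that the spectral change equals f(x) * v with f(x) = x1 * x2 and a single virtual spectrum v. The virtual spectrum is estimated by least squares from the residuals of the training samples, and the fractions are then found by a grid search over x1 with x2 = 1 - x1. The abstract does not specify the functional form or the optimizer, so both are illustrative choices, as are the function names and synthetic spectra.

```python
import numpy as np


def interaction_term(x):
    # Hypothetical concentration dependence of the interaction: f(x) = x1 * x2.
    return x[0] * x[1]


def fit_virtual_spectrum(pure, train_spectra, train_fractions):
    """Least-squares v from residuals r_n = a_n - pure.T @ x_n, modelled as f(x_n) * v."""
    f = np.array([interaction_term(x) for x in train_fractions])          # (N,)
    resid = np.array([a - pure.T @ x
                      for a, x in zip(train_spectra, train_fractions)])   # (N, W)
    return f @ resid / (f @ f)


def predict_binary(pure, virtual, mixture, n_grid=10001):
    """Grid search over x1 (x2 = 1 - x1) minimising ||a - pure.T @ x - f(x) v||^2."""
    best_x, best_err = None, np.inf
    for x1 in np.linspace(0.0, 1.0, n_grid):
        x = np.array([x1, 1.0 - x1])
        model = pure.T @ x + interaction_term(x) * virtual
        err = np.sum((mixture - model) ** 2)
        if err < best_err:
            best_x, best_err = x, err
    return best_x


# Synthetic binary example (hypothetical spectra and interaction band).
w = np.arange(50)
pure = np.vstack([np.exp(-(w - 10) ** 2 / 50), np.exp(-(w - 40) ** 2 / 50)])
true_band = np.exp(-(w - 30) ** 2 / 20)


def simulate(x):
    return pure.T @ x + interaction_term(x) * true_band


x_train = np.array([0.5, 0.5])
virtual = fit_virtual_spectrum(pure, [simulate(x_train)], [x_train])
mixture = simulate(np.array([0.3, 0.7]))
x_vis = predict_binary(pure, virtual, mixture)                   # with virtual spectrum
x_plain = predict_binary(pure, np.zeros_like(virtual), mixture)  # ideal model only
```

Here one training sample suffices to recover the virtual spectrum because the simulated data follow the assumed form exactly; the ideal-only model, by contrast, is biased by the interaction band. A grid search is used only because it is transparent for two components; for more components a constrained optimizer would be the natural replacement.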