Machine learning (ML) methods have been applied to the analysis of a range of biological systems. This paper critically reviews the application of these methods to the problem domain of skin permeability and addresses some of the key issues. Specifically, ML methods offer great potential both in predictive ability and in their ability to provide mechanistic insight into, in this case, the phenomena of skin permeation. However, they are beset by perceptions of a lack of transparency and, often, once an ML or related method has been published there is little impetus from other researchers to adopt it. This is usually due to the lack of transparency in some methods and the lack of available code for running advanced ML methods. The paper addresses the key issue of transparency by describing in detail, and providing the code for, the process of running an ML method (in this case, a Gaussian process regression method). Although this method is applied here to the field of percutaneous absorption, it may be applied more broadly to any biological system.
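As an illustration of what running a Gaussian process regression involves, the following is a minimal sketch and not the code supplied with the paper. It assumes a zero-mean prior, a squared-exponential kernel with fixed hyperparameters, and NumPy; the function names (`rbf_kernel`, `gpr_predict`) and the synthetic data are hypothetical placeholders.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gpr_predict(X_train, y_train, X_test,
                length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """Posterior mean and latent variance of a zero-mean GP at X_test."""
    K = rbf_kernel(X_train, X_train, length_scale, signal_var)
    K += noise_var * np.eye(len(X_train))          # observation noise
    K_s = rbf_kernel(X_train, X_test, length_scale, signal_var)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v**2, axis=0)        # prior variance minus reduction
    return mean, var


# Synthetic placeholder data (not skin permeability measurements).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(30, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(30)

# Centre the targets so the zero-mean prior is reasonable, then predict
# for the five held-out rows and add the mean back.
y_mean = y[:25].mean()
mean, var = gpr_predict(X[:25], y[:25] - y_mean, X[25:])
mean = mean + y_mean
```

The predictive variance returned alongside the mean is one reason such models can be made more transparent: each prediction carries its own uncertainty estimate.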
The design of the data set, and possibly also the approach to validation of the model, is critical in the development of improved models. The size of the data set, if carefully controlled, was not generally a significant factor for these models, and models of excellent statistical quality could be produced from substantially smaller data sets.
The aim of this study is to use Gaussian Process Regression (GPR) methods to quantify the effect of experimental temperature (Texp) and choice of diffusion cell on model quality and performance. Methods: Data were collated from the literature. Static and flow-through diffusion cell data were separated and a series of GPR experiments conducted. The effect of Texp was assessed by comparing a range of datasets where Texp either remained constant or was varied from 22 °C to 45 °C. Key findings: Using data from flow-through diffusion cells resulted in poor model performance. Data from static diffusion cells resulted in significantly better performance. Inclusion of data from flow-through cell experiments reduced overall model quality. Consideration of Texp improved model quality when the dataset used exhibited a wide range of experimental temperatures. Conclusions: This study highlights the problem of collating literature data into datasets from which models are constructed without consideration of the nature of those data. To optimise model quality, only data from static, Franz-type experiments should be used to construct the model, and Texp should be incorporated as a descriptor in the model if data are collated from a range of studies conducted at different temperatures.
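The data-selection recommendation can be sketched as a small preprocessing step. This is an illustrative sketch only: the record fields (`d1`, `d2`, `cell`, `T_exp`, `target`), the placeholder values, and the 5-degree threshold for a "wide" temperature range are all assumptions and do not come from the study.

```python
import numpy as np

# Hypothetical records; every value is a made-up placeholder.
records = [
    {"d1": 1.2, "d2": 180.0, "cell": "static", "T_exp": 22.0, "target": -2.9},
    {"d1": 0.4, "d2": 151.0, "cell": "static", "T_exp": 32.0, "target": -3.4},
    {"d1": 2.1, "d2": 206.0, "cell": "static", "T_exp": 45.0, "target": -2.1},
    {"d1": 1.7, "d2": 194.0, "cell": "flow-through", "T_exp": 32.0, "target": -2.6},
    {"d1": 0.9, "d2": 122.0, "cell": "flow-through", "T_exp": 37.0, "target": -3.0},
]


def build_design_matrix(records, temp_range_threshold=5.0):
    """Keep static-cell records and add T_exp as a descriptor if it varies widely.

    temp_range_threshold (in degrees C) is an arbitrary illustrative cut-off.
    """
    static = [r for r in records if r["cell"] == "static"]
    X = np.array([[r["d1"], r["d2"]] for r in static])
    y = np.array([r["target"] for r in static])
    temps = np.array([r["T_exp"] for r in static])
    names = ["d1", "d2"]
    if temps.max() - temps.min() > temp_range_threshold:
        X = np.column_stack([X, temps])   # temperature becomes an extra input
        names.append("T_exp")
    return X, y, names


X, y, names = build_design_matrix(records)

# With a constant temperature the extra column is omitted.
constant_T = [dict(r, T_exp=32.0) for r in records]
X_const, y_const, names_const = build_design_matrix(constant_T)
```

The resulting `X` and `y` would then be passed to whichever regression model is being fitted.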
Parivash Ashrafi, Yi Sun, Neil Davey, Rod Adams, Marc B. Brown, Maria Prapopoulou, and Gary Moss, 'The Importance of Hyperparameters Selection within Small Datasets', in Proceedings of the 2015 International Joint Conference on Neural Networks, published in IEEE Xplore on 1 October 2015, DOI: 10.1109/IJCNN.2015.7280645. © 2015 IEEE.
The Gaussian Process is a machine learning technique that has been applied to the analysis of percutaneous absorption of chemicals through human skin. The normal, automatic method of setting the hyperparameters associated with Gaussian Processes may not be suitable for small datasets. In this paper we investigate whether a handcrafted search method of determining these hyperparameters is better for such datasets.
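The abstract does not describe the handcrafted search itself. One plausible reading, sketched below purely as an assumption, is an exhaustive grid search over kernel hyperparameters scored by leave-one-out error, in place of the usual marginal-likelihood optimisation. The function names, the candidate grids, and the synthetic 12-point dataset are hypothetical.

```python
import itertools

import numpy as np


def rbf(A, B, length_scale):
    """Unit-variance squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def loo_mse(X, y, length_scale, noise_var):
    """Leave-one-out mean squared error of a zero-mean GP."""
    errors = []
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        K = rbf(X[keep], X[keep], length_scale)
        K += noise_var * np.eye(int(keep.sum()))
        k_star = rbf(X[keep], X[i:i + 1], length_scale)
        pred = k_star.T @ np.linalg.solve(K, y[keep])
        errors.append((pred.item() - y[i]) ** 2)
    return float(np.mean(errors))


def grid_search(X, y, length_scales, noise_vars):
    """Score every hyperparameter pair and return the one with the lowest error."""
    scores = {
        (ls, nv): loo_mse(X, y, ls, nv)
        for ls, nv in itertools.product(length_scales, noise_vars)
    }
    best = min(scores, key=scores.get)
    return best, scores


# Small synthetic placeholder dataset (12 points, one input).
rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(12, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(12)
y = y - y.mean()

best, scores = grid_search(X, y, [0.1, 0.3, 1.0, 3.0], [1e-3, 1e-2, 1e-1])
```

Because every candidate is scored on held-out points, this kind of search is easy to inspect by hand, at the cost of scaling poorly as the number of hyperparameters grows.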