On active learning methods for manifold data

Li, Hang; Runger, George C.

doi:10.1007/s11749-019-00694-y

Cited by 7 publications

(6 citation statements)

References 41 publications

“…Active learning is a rational sampling method that aims to identify the most informative data to label so that a supervised model trained on this data would perform better than a supervised model trained on an equivalent amount of labeled data chosen at random.¹⁹ Active learning may also be known as sequential learning as it uses all measures up-to-date to inform the next-best candidate for labeling in an increasingly informed search for the optimal training set with minimal data.²⁰ Shmilovich et al have used this approach to traverse the chemical space of the DXXX-OPV3-XXXD molecular template, where OPV3 represents 1,4-distyrylbenzene and XXX represents variable tripeptides.…”

Section: Introduction (mentioning)

confidence: 99%

Beyond Tripeptides Two-Step Active Machine Learning for Very Large Data sets

Teijlingen

Tuttle

2021

J. Chem. Theory Comput.

Self-assembling peptide nanostructures have been shown to be of great importance in nature and have presented many promising applications, for example, in medicine as drug-delivery vehicles, biosensors, and antivirals. Being very promising candidates for the growing field of bottom-up manufacture of functional nanomaterials, previous work (Frederix, et al. 2011 and 2015) has screened all possible amino acid combinations for di- and tripeptides in search of such materials. However, the enormous complexity and variety of linear combinations of the 20 amino acids make exhaustive simulation of all combinations of tetrapeptides and above infeasible. Therefore, we have developed an active machine-learning method (also known as “iterative learning” and “evolutionary search method”) which leverages a lower-resolution data set encompassing the whole search space and a just-in-time high-resolution data set which further analyzes those target peptides selected by the lower-resolution model. This model uses newly generated data upon each iteration to improve both lower- and higher-resolution models in the search for ideal candidates. Curation of the lower-resolution data set is explored as a method to control the selected candidates, based on criteria such as log P. A major aim of this method is to produce the best results in the least computationally demanding way. This model has been developed to be broadly applicable to other search spaces with minor changes to the algorithm, allowing its use in other areas of research.
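
The two-step loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the surrogate is a plain ridge regression, the function names (`ridge_fit`, `two_step_search`) and the `high_res_oracle` callback are hypothetical, and the "lower-resolution" data are assumed to be a cheap score available for every candidate.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-3):
    """Closed-form ridge regression weights (assumed surrogate model)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

def two_step_search(features, low_res, high_res_oracle,
                    n_init=10, batch=5, rounds=4, seed=0):
    """Toy two-step active search over a finite candidate pool.

    features        : (n, d) array describing every candidate
    low_res         : (n,) cheap score known for the whole pool
    high_res_oracle : callable index -> expensive score (hypothetical)
    Returns a dict {candidate index: high-resolution score}.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Surrogate inputs: intercept, features, and the cheap score.
    Z = np.column_stack([np.ones(n), features, low_res])

    # Seed the high-resolution set with a small random sample.
    labelled = [int(i) for i in rng.choice(n, size=n_init, replace=False)]
    scores = {i: high_res_oracle(i) for i in labelled}

    for _ in range(rounds):
        idx = np.array(labelled)
        y = np.array([scores[i] for i in labelled])
        w = ridge_fit(Z[idx], y)      # retrain on all data gathered so far
        pred = Z @ w                  # rank the whole pool
        pred[idx] = -np.inf           # never re-evaluate a candidate
        for i in np.argsort(pred)[-batch:]:
            i = int(i)
            scores[i] = high_res_oracle(i)  # just-in-time expensive step
            labelled.append(i)
    return scores
```

Each round spends the expensive evaluation only on the candidates the surrogate ranks highest, so the total cost is `n_init + batch * rounds` oracle calls regardless of pool size.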

“…Instead of using fixed values for regularization parameters, model selection criterion with theoretical justification might provide better learning performance. Similar work has been discussed by Li et al (2019), where they maximize the likelihood function to choose the values of λ_A and λ_I in a Gaussian Process model. Secondly, there are other optimality criterion than the D/G "alphabetic" criteria in the field of optimal design of experiments.…”

Section: Discussion (mentioning)

confidence: 88%

Optimal Design of Experiments on Riemannian Manifolds

Li

2019

Preprint

Self Cite

Traditional optimal design of experiment theory is developed on Euclidean space. In this paper, new theoretical results of optimal design of experiments on Riemannian manifolds are provided. In particular, it is shown that D-optimal and G-optimal designs are equivalent on manifolds and provide a lower bound for the maximum prediction variance. In addition, a converging algorithm that finds the optimal experimental design on manifold data is proposed. Numerical experiments demonstrate the competitive performance of the new algorithm.
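
For orientation, the D- and G-criteria mentioned in this abstract can be illustrated in the classical Euclidean setting. The sketch below is an assumption-laden toy, not the manifold algorithm from the preprint: it greedily builds a D-optimal-style design for a linear model over a finite candidate set, then evaluates the G-criterion (maximum scaled prediction variance). The function names and the small `ridge` stabiliser are hypothetical choices made for the example.

```python
import numpy as np

def greedy_d_optimal(candidates, n_select, ridge=1e-6):
    """Greedily pick rows that most increase det(X_S^T X_S).

    Uses the matrix determinant lemma:
    det(M + x x^T) = det(M) * (1 + x^T M^{-1} x),
    so the best next point maximises x^T M^{-1} x.
    """
    n, p = candidates.shape
    M = ridge * np.eye(p)   # small ridge keeps M invertible at the start
    chosen = []
    for _ in range(n_select):
        Minv = np.linalg.inv(M)
        gain = np.einsum("ij,jk,ik->i", candidates, Minv, candidates)
        if chosen:
            gain[chosen] = -np.inf   # no repeated design points
        best = int(np.argmax(gain))
        chosen.append(best)
        M = M + np.outer(candidates[best], candidates[best])
    return chosen

def max_prediction_variance(candidates, chosen):
    """G-criterion: max over candidates of x^T M^{-1} x, with M = X_S^T X_S / |S|."""
    X = candidates[chosen]
    M = X.T @ X / len(chosen)
    Minv = np.linalg.inv(M)
    return float(np.max(np.einsum("ij,jk,ik->i", candidates, Minv, candidates)))
```

In this Euclidean linear-model setting the scaled prediction variance averages to p (the number of model parameters) over the design points, so its maximum over the candidate set is at least p for any design; the abstract reports an analogous lower bound on manifolds.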

“…Further research is recommended to refine this criterion of equivalence and to propose a sample selection algorithm aimed at optimizing this equivalence. This concept could be extended to other model architectures by identifying the mathematical components that could be controlled or evaluated in unsupervised sample selection [34].…”

Section: Discussion (mentioning)

confidence: 99%