2022
DOI: 10.21203/rs.3.rs-2003069/v1
Preprint

Fast Bayesian Optimization of Needle-in-a-Haystack Problems using Zooming Memory-Based Initialization

Abstract: Needle-in-a-Haystack problems exist across a wide range of applications including rare disease prediction, ecological resource management, fraud detection, and material property optimization. A Needle-in-a-Haystack problem arises when there is an extreme imbalance of optimum conditions relative to the size of the dataset. For example, only 0.82% out of 146k total materials in the open-access Materials Project database have a negative Poisson's ratio. However, current state-of-the-art optimization algorithms ar…

Cited by 3 publications (4 citation statements) | References 30 publications
“…Upon changing the SDL to cool plastic components prior to removal, the system proceeded to make a series of jumps in maximum K_s^* from 60% to 68%. Finally, at experiment 17,730 we noted that the predictive model used by the SDL was systematically underpredicting K_s^* for high-performing components, so we implemented a process where the proposed experiment was selected using a model built only on data close to the best-observed experiment, a process similar to algorithms such as TuRBO or ZoMBI [52, 53]. This intervention led the SDL to progress from 70.6% to 75.2% in maximum K_s^*.…”
Section: Results (mentioning; confidence: 99%)
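Below is a minimal sketch of the kind of local-model intervention this quoted passage describes: a surrogate is fit only on data near the best-observed experiment, and the next candidate is proposed inside a local box around it. The toy objective `f`, the radius, and all parameter values are assumptions for illustration; this is not the cited SDL's actual code, nor the TuRBO or ZoMBI implementations.

```python
# Sketch: propose the next experiment from a surrogate built only on data
# close to the best-observed point (assumed toy setup, not the authors' code).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def f(x):  # hypothetical stand-in for the measured performance K_s^*
    return -np.sum((x - 0.7) ** 2, axis=-1)

X = rng.uniform(0, 1, size=(200, 3))           # past experiments
y = f(X)

best = X[np.argmax(y)]                          # best-observed experiment
radius = 0.15                                   # local-region size (assumed)
mask = np.linalg.norm(X - best, axis=1) < radius

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X[mask], y[mask])                        # model built only on nearby data

# Upper-confidence-bound acquisition over candidates drawn in the local box
lo, hi = np.clip(best - radius, 0, 1), np.clip(best + radius, 0, 1)
cands = rng.uniform(lo, hi, size=(1000, 3))
mu, sd = gp.predict(cands, return_std=True)
next_x = cands[np.argmax(mu + 2.0 * sd)]        # proposed next experiment
print(next_x)
```

Restricting the surrogate's training set to the neighborhood of the incumbent keeps the model from averaging high-performing points against the bulk of mediocre data, which is the underprediction problem the quoted passage reports.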
“…We should note that this study was not dedicated to benchmarking the acceleration inherent to using SDLs. Prior work, including our own [44, 49], has focused on such benchmarking, and ongoing research is focused on developing algorithms and processes to efficiently select experiments [53, 57]. In this work, our main focus was discovering new mechanical structures, and we believe that this type of sustained campaign is an example of how SDLs can fruitfully exist in the materials discovery pipeline.…”
Section: Discussion (mentioning; confidence: 99%)
“…The problems typically pose a challenge to Bayesian optimisation algorithms, which exhibit slow convergence or get stuck in local optima. Siemenn et al. [74] developed a new approach, termed the Zooming Memory-Based Initialization algorithm (ZoMBI), to tackle Needle-in-a-Haystack problems, building on traditional Bayesian optimisation. Their approach starts by iteratively zooming in on the manifold search bounds, with each dimension handled independently, using a set number of memory points to identify the plausible region containing the global optimum needle.…”
Section: Accelerated Computations (mentioning; confidence: 99%)
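A minimal sketch of the zooming step as described in this passage: after each activation, the bounds of every dimension are independently shrunk to the span of the m best "memory" points observed so far, and the search restarts inside the tightened box. The toy objective, the value of m, and the random-sampling stand-in for the inner optimization loop are assumptions for illustration; the authors' reference implementation differs in its details.

```python
# Sketch: per-dimension bound zooming on memory points (assumed toy setup,
# not the reference ZoMBI implementation).
import numpy as np

def zoom_bounds(X, y, m=5):
    """Shrink each dimension's bounds to the hull of the m best points."""
    idx = np.argsort(y)[-m:]                    # indices of the m best (maximizing)
    memory = X[idx]
    return memory.min(axis=0), memory.max(axis=0)

rng = np.random.default_rng(1)

def needle(x):  # toy Needle-in-a-Haystack objective: sharp optimum at 0.42
    return np.exp(-np.sum((x - 0.42) ** 2, axis=-1) / 0.001)

lower, upper = np.zeros(4), np.ones(4)
for activation in range(3):                     # successive zoom-in activations
    X = rng.uniform(lower, upper, size=(100, 4))  # stand-in for the inner BO loop
    y = needle(X)
    lower, upper = zoom_bounds(X, y, m=5)
    print(f"activation {activation}: bounds span {(upper - lower).round(3)}")
```

Because each dimension is zoomed independently, the retained region is an axis-aligned box around the best observations, which concentrates subsequent sampling near the suspected needle.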
“…This type of domain knowledge can be more amenable to physical-science problems, such as materials science, where an optimal linear mixture of chemical compositions is sought [6]. In such domains especially, due to the inherent extreme imbalance of optimality conditions, most surrogate models resort to smoothing over the optimum or over-predicting near its location, which can often result in confinement to a local minimum [35, 32]. While other choices of surrogate model have been used in BO, such as random forests [14] and Bayesian neural networks [37], these have empirically been shown to over-emphasize exploration and lead to poor performance [7].…”
Section: Introduction (mentioning; confidence: 99%)
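As a minimal illustration of the smoothing failure mode this passage describes, the sketch below fits a Gaussian process with a stationary RBF kernel to sparse samples of a sharp 1-D needle; the predicted maximum falls far below the true optimum because the kernel averages the rare high value against its flat surroundings. The toy function, kernel, length scale, and noise level are assumptions for illustration.

```python
# Sketch: a stationary-kernel GP smooths over a needle optimum (assumed toy setup).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)

def needle(x):
    return np.exp(-((x - 0.5) ** 2) / 2e-4)     # sharp optimum at x = 0.5

X = rng.uniform(0, 1, size=(30, 1))             # sparse samples; most miss the needle
y = needle(X.ravel())

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1),
                              optimizer=None, alpha=1e-6)
gp.fit(X, y)

grid = np.linspace(0, 1, 1001).reshape(-1, 1)
print("true max:", needle(grid.ravel()).max())      # 1.0 at the needle
print("GP-predicted max:", gp.predict(grid).max())  # far below 1.0: smoothed over
```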