Symbolic regression, the task of extracting mathematical expressions from the observed data, plays a crucial role in scientific discovery. Despite the promising performance of existing methods, most of them conduct symbolic regression in an offline setting. That is, they treat the observed data points as given ones that are simply sampled from uniform distributions without exploring the expressive potential of data. However, for real-world scientific problems, the data used for symbolic regression are usually actively obtained by doing experiments, which is an online setting. Thus, how to obtain informative data that can facilitate the symbolic regression process is an important problem that remains challenging.
In this paper, we propose QUOSR, a query-based framework for online symbolic regression that can automatically obtain informative data in an iterative manner. Specifically, at each step, QUOSR receives historical data points, generates new x, and then queries the symbolic expression to get the corresponding y, where the (x, y) serves as new data points. This process repeats until the maximum number of query steps is reached. To make the generated data points informative, we implement the framework with a neural network and train it by maximizing the mutual information between generated data points and the target expression. Through comprehensive experiments, we show that QUOSR can facilitate modern symbolic regression methods by generating informative data.