The practice of mathematics involves discovering patterns and using these to formulate and prove conjectures, resulting in theorems. Since the 1960s, mathematicians have used computers to assist in the discovery of patterns and formulation of conjectures [1], most famously in the Birch and Swinnerton-Dyer conjecture [2], a Millennium Prize Problem [3]. Here we provide examples of new fundamental results in pure mathematics that have been discovered with the assistance of machine learning—demonstrating a method by which machine learning can aid mathematicians in discovering new conjectures and theorems. We propose a process of using machine learning to discover potential patterns and relations between mathematical objects, understanding them with attribution techniques and using these observations to guide intuition and propose conjectures. We outline this machine-learning-guided framework and demonstrate its successful application to current research questions in distinct areas of pure mathematics, in each case showing how it led to meaningful mathematical contributions on important open problems: a new connection between the algebraic and geometric structure of knots, and a candidate algorithm predicted by the combinatorial invariance conjecture for symmetric groups [4]. Our work may serve as a model for collaboration between the fields of mathematics and artificial intelligence (AI) that can achieve surprising results by leveraging the respective strengths of mathematicians and machine learning.
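To make the proposed workflow concrete, the following is a minimal sketch of the pattern-discovery loop on purely synthetic data: train a model to test whether one set of quantities plausibly determines another, then use gradient-based attribution to see which inputs matter. The feature layout, the hidden relation, and all names are illustrative assumptions, not the paper's knot-theoretic or representation-theoretic datasets.

```python
import numpy as np
import torch
import torch.nn as nn

# Synthetic stand-ins for "mathematical objects": each row of X collects candidate
# invariants X(z), and y holds a target invariant Y(z). The hidden relation below
# (features 3 and 6) is an illustrative assumption, not data from the paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)).astype(np.float32)
y = (2.0 * X[:, 3] - 0.5 * X[:, 6]).astype(np.float32)

model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
Xt, yt = torch.from_numpy(X), torch.from_numpy(y).unsqueeze(1)

# Step 1: if Y(z) can be predicted from X(z) far better than chance,
# that is evidence that a relation worth conjecturing about exists.
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(Xt), yt)
    loss.backward()
    opt.step()

# Step 2: gradient-based attribution (simple saliency) points to the inputs
# that drive the prediction, focusing the mathematician's attention.
Xs = Xt.clone().requires_grad_(True)
model(Xs).sum().backward()
print(Xs.grad.abs().mean(dim=0))  # features 3 and 6 should dominate
```

In the paper's terms, step 1 suggests a relation exists and step 2 (attribution) narrows down which quantities the conjecture should involve; the mathematician then takes over.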
Deep learning has seen tremendous success over the past decade in computer vision, machine translation, and gameplay. This success rests crucially on gradient-descent optimization and the ability to "learn" parameters of a neural network by backpropagating observed errors. However, neural network architectures are growing increasingly sophisticated and diverse, which motivates an emerging quest for even more general forms of differentiable programming, where arbitrary parameterized computations can be trained by gradient descent. In this paper, we take a fresh look at automatic differentiation (AD) techniques, and especially aim to demystify the reverse-mode form of AD that generalizes backpropagation in neural networks. We uncover a tight connection between reverse-mode AD and delimited continuations, which permits implementing reverse-mode AD purely via operator overloading and without managing any auxiliary data structures. We further show how this formulation of AD can be fruitfully combined with multi-stage programming (staging), leading to an efficient implementation that combines the performance benefits of deep learning frameworks based on explicit reified computation graphs (e.g., TensorFlow) with the expressiveness of pure library approaches (e.g., PyTorch).

...function [Rumelhart et al. 1986]. Beyond this commonality, however, deep learning architectures vary widely. In fact, many of the practical successes are fueled by increasingly sophisticated and diverse network architectures that in many cases depart from the traditional organization into layers of artificial neurons. For this reason, prominent deep learning researchers have called for a paradigm shift from deep learning towards differentiable programming [LeCun 2018; Olah 2015], essentially functional programming with first-class gradients, based on the expectation that further advances in artificial intelligence will be enabled by the ability to "train" arbitrary parameterized computations by gradient descent. Programming language designers and compiler writers, key players in this vision, are faced with the challenge of adding efficient and expressive program differentiation capabilities. Forms of automatic gradient computation that generalize the classic backpropagation algorithm are provided by all contemporary deep learning frameworks, including TensorFlow and PyTorch. These implementations, however, are ad hoc, and each framework comes with its own set of trade-offs and restrictions. In the academic world, automatic differentiation (AD) [Speelpenning 1980; Wengert 1964] is the subject of study of an entire community. Unfortunately, results disseminate only slowly between communities, and while the forward-mode flavor of AD is easy to grasp, descriptions of the reverse-mode flavor that generalizes backpropagation often appear mysterious to PL researchers. A notable exception is the seminal work of Pearlmutter and Siskind [2008], which cast AD in a functional programming framework and laid the groundwork for first-class, unrestricted, gradient ope...
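As a rough illustration of the continuation-based view of reverse-mode AD, below is a minimal sketch in explicit continuation-passing style for scalars only. It conveys the idea behind the delimited-continuation formulation but is not the authors' Scala shift/reset implementation and does not use operator overloading; all names are illustrative.

```python
# Minimal reverse-mode AD in continuation-passing style (scalars only).
class Num:
    def __init__(self, x):
        self.x = x    # primal value
        self.d = 0.0  # adjoint (gradient), accumulated during the backward sweep

def add(a, b, k):
    y = Num(a.x + b.x)
    k(y)              # run the rest of the computation (the continuation) first...
    a.d += y.d        # ...then propagate the adjoint back to the operands
    b.d += y.d

def mul(a, b, k):
    y = Num(a.x * b.x)
    k(y)
    a.d += b.x * y.d
    b.d += a.x * y.d

def grad(f, x0):
    x = Num(x0)
    f(x, lambda y: setattr(y, "d", 1.0))  # seed the output adjoint at the delimiter
    return x.d

# Example: f(x) = x*x + x, so f'(3) = 2*3 + 1 = 7
print(grad(lambda x, k: mul(x, x, lambda t: add(t, x, k)), 3.0))
```

The key point is that each operation calls its continuation before updating adjoints, so the backward sweep falls out of the ordinary call stack with no explicit tape; delimited continuations (shift/reset) let the same structure be hidden behind operator overloading.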
Vocabulary lists of high-frequency lexical items are an important resource in language education and a key product of corpus research. However, no single vocabulary list will be useful for every learning context, with the appropriateness of such lists affected by the corpora on which they are based. This paper investigates the impact of corpus selection on one measure of lexical sophistication, Advanced Guiraud, focusing on two frequency lists originating from an in-house learner corpus (PELIC) and a global learner corpus (Cambridge Learner Corpus). This analysis shows that frequency lists derived from both types of learner corpus can effectively serve as the basis for measuring the development of lexical sophistication, regardless of the specific program of the learners. Therefore, publicly available learner corpus frequency lists can be a reliable resource for stakeholders interested in the lexical gains of language learners.
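For readers unfamiliar with the measure, here is a minimal sketch of Advanced Guiraud under its commonly cited definition (the number of "advanced" types, i.e. types outside a high-frequency reference list, divided by the square root of the token count). The toy frequency list and sample text are placeholders, not PELIC or Cambridge Learner Corpus data.

```python
import math

def advanced_guiraud(tokens, high_frequency_lemmas):
    """tokens: lower-cased (ideally lemmatized) word tokens of a learner text."""
    if not tokens:
        return 0.0
    advanced_types = {t for t in tokens if t not in high_frequency_lemmas}
    return len(advanced_types) / math.sqrt(len(tokens))

# Illustrative usage with a toy high-frequency list.
frequent = {"the", "a", "is", "an", "of", "and", "study"}
text = "the study is an extensive and rigorous investigation of enzymatic catalysis".split()
print(round(advanced_guiraud(text, frequent), 3))  # higher values = more sophisticated vocabulary
```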
The ATLAS experiment has introduced and recently commissioned a completely new hardware sub-system of its first-level trigger: the topological processor (L1Topo). L1Topo consists of two AdvancedTCA blades mounting state-of-the-art FPGA processors, providing high input bandwidth (up to 4 Gb/s) and low-latency data processing (200 ns). L1Topo is able to select collision events by applying kinematic and topological requirements on candidate objects (energy clusters, jets, and muons) measured by calorimeters and muon sub-detectors. Results from data recorded using the L1Topo trigger will be presented. These results demonstrate a significantly improved background event rejection, thus allowing for rate reduction with minimal efficiency loss. This improvement has been shown for several physics processes leading to low-pT leptons, including H → ττ and J/ψ → µµ. In addition to describing the L1Topo trigger system, we will discuss the use of an accurate L1Topo simulation as a powerful tool to validate and optimize the performance of this new system. To reach the required accuracy, the simulation must mimic the approximations applied in firmware to execute the kinematic calculations.
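As a hypothetical, simplified illustration of the kind of kinematic and topological selection L1Topo applies, the sketch below combines an angular-separation requirement with a dimuon invariant-mass window for a J/ψ → µµ-like signature. The thresholds, the object format, and the use of floating point are assumptions for readability; the real firmware uses fixed-point approximations of these calculations, which is why the simulation must mimic them.

```python
import math

def delta_phi(phi1, phi2):
    """Wrap the azimuthal difference into [-pi, pi]."""
    dphi = phi1 - phi2
    return (dphi + math.pi) % (2 * math.pi) - math.pi

def invariant_mass(mu1, mu2):
    """Invariant mass of two (approximately massless) muon candidates,
    each given as (pt [GeV], eta, phi)."""
    pt1, eta1, phi1 = mu1
    pt2, eta2, phi2 = mu2
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(delta_phi(phi1, phi2)))
    return math.sqrt(max(m2, 0.0))

def jpsi_mumu_selection(mu1, mu2, m_low=2.0, m_high=4.0, dr_max=1.5):
    """Accept the event if the dimuon mass lies in a J/psi-like window and the two
    muons are close in eta-phi space (illustrative thresholds, not ATLAS menu values)."""
    d_r = math.hypot(mu1[1] - mu2[1], delta_phi(mu1[2], mu2[2]))
    return m_low < invariant_mass(mu1, mu2) < m_high and d_r < dr_max

print(jpsi_mumu_selection((8.0, 0.4, 1.0), (6.0, 0.7, 1.3)))  # True for this toy pair
```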