We propose a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametric maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, shedding new light on their similarities and differences. Our approach to gradient-based learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realised in the discrete setting of boolean circuits. Finally, we demonstrate the practical significance of our framework with an implementation in Python.

Note the slightly counter-intuitive contravariance in the 2-cells, which arises from the Grothendieck construction. This permits reindexing: given a morphism r : P′ → P in C, we can reparameterize a given P-parameterized morphism to a P′-parameterized morphism. We often work with strict monoidal categories whose objects are natural numbers and whose monoidal product is addition. In such settings we have the following.

Corollary 2.2. If C is a strict symmetric monoidal category, then Para(C) is a 2-category.

We have shown how Para acts on a base category C. However, Para is also natural with respect to base change: given a functor F : C → D, there is an induced functor Para(F) : Para(C) → Para(D).

Proposition 2.3. Let C and D be strict symmetric monoidal categories and let F : C → D be a lax symmetric monoidal functor. Then there is an induced 2-functor Para(F) : Para(C) → Para(D) which agrees with F on objects.

This 2-functor is straightforward: for a 1-cell (P, f) : A → B, it applies F to P and f and uses the (lax) comparison to obtain a map of the correct type. Lastly, we note that Para(C) inherits the symmetric monoidal structure from C, and that the induced 2-functor Para(F) respects that structure. This will allow us to compose learners not only in series, but also in parallel.

Cartesian reverse differential categories

Fundamental to all gradient-based learning is, of course, the gradient operation. In most cases this gradient operation is performed in the category of smooth maps between Euclidean spaces. However, recent work [7] has shown that gradient-based learning can also work well in other categories, for example in a category of boolean circuits. Thus, to encompass these examples in a single framework, it is helpful to work in a category with an abstract gradient operation. Specifically, we will work in a Cartesian reverse differential category (first defined in [14]), a category in which every map has an associated reverse derivative. The reverse derivative is a generalization of the gradient operation: for example, in the category of smooth maps, the reverse derivative of a map f : R^n → R is essentially its gradient.
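To make the two ingredients above concrete — parametrised maps and reverse derivatives — the following is a minimal Python sketch. It is only illustrative and is not the implementation accompanying the paper: the names Para, then, and reverse_derivative are assumptions introduced for this example, the parameter space of a composite is represented as a nested pair, and the reverse derivative R[f] of a smooth map f : R^n → R, which sends (x, dy) to dy · ∇f(x), is approximated numerically.

```python
# Minimal sketch of parametrised maps and a reverse derivative.
# Illustrative only: the names below are assumptions, not the paper's library.
import numpy as np


class Para:
    """A P-parametrised map A -> B, stored as a function f(p, a) -> b."""

    def __init__(self, f):
        self.f = f

    def __call__(self, p, a):
        return self.f(p, a)

    def then(self, other):
        """Sequential composition; the composite's parameter space is the
        product of the two parameter spaces, represented as a pair (p, q)."""
        return Para(lambda pq, a: other.f(pq[1], self.f(pq[0], a)))


def reverse_derivative(f, eps=1e-6):
    """Numerical reverse derivative R[f] : R^n x R -> R^n of a smooth map
    f : R^n -> R, sending (x, dy) to dy * grad f(x) (central differences)."""
    def R(x, dy):
        grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                         for e in np.eye(len(x))])
        return dy * grad
    return R


# Example: an affine map composed with a sum-of-squares map.
affine = Para(lambda p, a: p["W"] @ a + p["b"])
sum_sq = Para(lambda _q, a: np.sum(a * a))        # trivial parameter space
model = affine.then(sum_sq)

params = ({"W": np.array([[1.0, 2.0]]), "b": np.array([0.5])}, None)
x = np.array([3.0, 4.0])
print(model(params, x))                           # 132.25

# Reverse derivative of the model in its input, at fixed parameters:
R_model = reverse_derivative(lambda a: model(params, a))
print(R_model(x, 1.0))                            # approx. [23., 46.]
```

In the framework itself the reverse derivative is supplied by the reverse differential structure of the base category rather than by finite differences; the sketch only mirrors the shape of the data involved.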