We present Theseus, an efficient application-agnostic open source library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch, providing a common framework for end-to-end structured learning in robotics and vision. Existing DNLS implementations are application-specific and do not always incorporate many ingredients important for efficiency. Theseus is application-agnostic, as we illustrate with several example applications that are built using the same underlying differentiable components, such as second-order optimizers, standard cost functions, and Lie groups. For efficiency, Theseus incorporates support for sparse solvers, automatic vectorization, batching, GPU acceleration, and gradient computation with implicit differentiation and direct loss minimization. We perform extensive performance evaluations across a set of applications, demonstrating significant efficiency gains and better scalability when these features are incorporated. Project page: https://sites.google.com/view/theseus-ai/
Introduction

Reconciling traditional approaches with deep learning to leverage their complementary strengths is a common thread in a large body of recent work in robotics. In particular, an emerging trend is to differentiate through nonlinear least squares [1], a second-order optimization formulation at the heart of many problems in robotics [2-7] and vision [8-13]. Optimization layers as inductive priors in neural models have been explored in machine learning with convex optimization [14,15] and in meta-learning with first-order optimization based on gradient descent [16,17].

Differentiable nonlinear least squares provides a general scheme to encode inductive priors, as the objective function can be parameterized partly by neural models and partly by engineered, domain-specific differentiable models. Here, input tensors define a sum-of-weighted-squares objective function, and output tensors are minima of that objective. In contrast, typical neural layers pass input tensors through a linear transformation and an element-wise nonlinear activation function. The ability to compute gradients end-to-end is retained by differentiating through the optimizer, which allows neural models to train on the final task loss while also taking advantage of the priors captured by the optimizer.

The flexibility of such a scheme has led to promising state-of-the-art results in a wide range of applications such as structure from motion [18], motion planning [19], SLAM [20,21], bundle adjustment [22], state estimation [23,24], and image alignment [25], with other applications such as manipulation and tactile sensing [26,27], control [28], and human pose tracking [29,30] yet to be explored. However, the existing implementations above are application-specific: common underlying tools like optimizers get reimplemented, and features like sparse solvers, batching, and GPU support that impact efficiency are not always included. This has led to a fragmented literature in which it is difficult to start work on new ideas or to build on the...
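To make the DNLS-layer scheme concrete, the following is a minimal sketch of such a layer written against Theseus' public PyTorch API (names such as th.Vector, th.AutoDiffCostFunction, th.GaussNewton, and th.TheseusLayer are taken from the library's documentation; exact signatures and attribute names may differ across versions). Input tensors populate the objective's variables, the layer's output is a minimizer of the resulting weighted least-squares objective, and gradients flow back through the solve.

```python
import torch
import theseus as th

# Optimization variable x and auxiliary (data) variable y, each of dimension 1.
x = th.Vector(1, name="x")
y = th.Vector(1, name="y")

# Residual err = x - y; minimizing ||x - y||^2 drives x toward y.
# Note: older Theseus versions expose the underlying tensor as `.data` instead of `.tensor`.
def error_fn(optim_vars, aux_vars):
    x_var, = optim_vars
    y_var, = aux_vars
    return x_var.tensor - y_var.tensor

cost_function = th.AutoDiffCostFunction(
    [x], error_fn, 1, aux_vars=[y],
    cost_weight=th.ScaleCostWeight(1.0), name="residual",
)

objective = th.Objective()
objective.add(cost_function)
optimizer = th.GaussNewton(objective, max_iterations=10)
layer = th.TheseusLayer(optimizer)

# Input tensors define the objective; outputs are (approximate) minimizers.
y_data = torch.randn(4, 1, requires_grad=True)  # batch of 4 problems
solution, info = layer.forward({"x": torch.zeros(4, 1), "y": y_data})

# End-to-end differentiability: gradients of a downstream loss on the
# solution flow back to the tensors that parameterized the objective.
solution["x"].sum().backward()
print(y_data.grad)
```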