Memory effects are ubiquitous in a wide variety of complex physical phenomena, ranging from glassy dynamics and metamaterials to climate models. The Generalized Langevin Equation (GLE) provides a rigorous way to describe memory effects via the so-called memory kernel in an integro-differential equation. However, the memory kernel is often unknown, and accurately predicting or measuring it, e.g. via a numerical inverse Laplace transform, remains a herculean task. Here, we describe a novel method that uses deep neural networks (DNNs) to measure memory kernels from dynamical data. As a proof of principle, we focus on the notoriously long-lived memory effects of glass-forming systems, which have proved a major challenge to existing methods. In particular, we learn the operator that maps dynamics to memory kernels from a training set generated with the Mode-Coupling Theory (MCT) of hard spheres. Our DNNs are remarkably robust against noise, in contrast to conventional techniques. Furthermore, we demonstrate that a network trained on data generated from analytic theory (hard-sphere MCT) generalizes well to data from simulations of a different system (Brownian Weeks–Chandler–Andersen particles). Finally, we train a network on a set of phenomenological kernels and demonstrate its effectiveness in generalizing to both unseen phenomenological examples and supercooled hard-sphere MCT data. We provide a general pipeline, KernelLearner, for training networks to extract memory kernels from any non-Markovian system described by a GLE. The success of our DNN method applied to noisy glassy systems suggests that deep learning can play an important role in the study of dynamical systems with memory.
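
For concreteness, one standard (Mori–Zwanzig) form of the GLE for a dynamical observable A(t) reads (this is a common textbook form, not necessarily the exact variant used in this work):

$$\frac{\mathrm{d}A(t)}{\mathrm{d}t} = \mathrm{i}\Omega\,A(t) - \int_0^t K(t-\tau)\,A(\tau)\,\mathrm{d}\tau + f(t),$$

where K(t) is the memory kernel, Omega is a frequency term, and f(t) is a fluctuating force. The convolution integral is what makes the equation non-Markovian: the past trajectory of A feeds back into its present evolution, and recovering K(t) from observed dynamics amounts to inverting this integral term.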
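To illustrate the operator-learning idea in miniature, the sketch below (hypothetical, and not the KernelLearner pipeline itself; all function names, the architecture, and the hyperparameters are assumptions made for illustration) trains a small PyTorch network to map discretized correlation functions C(t) onto the scalar memory kernels K(t) that generated them. Synthetic stretched-exponential kernels stand in for the MCT and phenomenological training data used in the paper, and Gaussian input noise mimics measurement noise.

# Hypothetical sketch of learning the dynamics -> memory-kernel map.
# Not the authors' KernelLearner pipeline; names and settings are illustrative.
import numpy as np
import torch
import torch.nn as nn

T, DT = 128, 0.05               # time grid: T points with spacing DT
t = np.arange(T) * DT

def solve_gle(K):
    """Integrate dC/dt = -int_0^t K(t-s) C(s) ds with C(0)=1 (forward Euler)."""
    C = np.empty(T)
    C[0] = 1.0
    for n in range(T - 1):
        # rectangle-rule approximation of the memory convolution at t_n
        mem = DT * np.sum(K[:n + 1][::-1] * C[:n + 1])
        C[n + 1] = C[n] - DT * mem
    return C

def sample_pair(rng):
    """Random stretched-exponential kernel K(t) = A exp(-(t/tau)^beta) and its C(t)."""
    A = rng.uniform(0.5, 2.0)
    tau = rng.uniform(0.5, 2.0)
    beta = rng.uniform(0.4, 1.0)
    K = A * np.exp(-(t / tau) ** beta)
    return solve_gle(K), K

rng = np.random.default_rng(0)
data = [sample_pair(rng) for _ in range(2000)]
X = torch.tensor(np.stack([c for c, _ in data]), dtype=torch.float32)  # dynamics
Y = torch.tensor(np.stack([k for _, k in data]), dtype=torch.float32)  # kernels

# Small MLP approximating the inverse operator C(t) -> K(t).
net = nn.Sequential(
    nn.Linear(T, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, T),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(300):
    noisy = X + 0.01 * torch.randn_like(X)   # mimic measurement noise on C(t)
    loss = nn.functional.mse_loss(net(noisy), Y)
    opt.zero_grad()
    loss.backward()
    opt.step()

K_pred = net(X[:1]).detach().numpy()[0]      # kernel estimate for one sample

In this toy version the training pairs come from a cheap scalar Volterra solver rather than hard-sphere MCT, but the strategy is the same: generate (dynamics, kernel) pairs forward, then learn the inverse map with noise injected at training time so the network stays robust when applied to noisy measured dynamics.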