Although modern path tracers are successfully being applied to many rendering applications, there is considerable interest in pushing them towards ever‐decreasing sampling rates. As the sampling rate is substantially reduced, however, even Monte Carlo (MC) denoisers, which have been very successful at removing large amounts of noise, typically do not produce acceptable final results. As an orthogonal approach, we believe that good importance sampling of paths is critical for producing better‐converged, path‐traced images at low sample counts that can then, for example, be more effectively denoised. However, most recent importance‐sampling techniques for guiding path tracing (an area known as “path guiding”) involve expensive online (per‐scene) training and offer benefits only at high sample counts. In this paper, we propose an offline, scene‐independent deep‐learning approach that can importance sample first‐bounce light paths for general scenes without the need for costly online training, and can start guiding path sampling with as few as 1 sample per pixel. Instead of learning to “overfit” to the sampling distribution of a specific scene like most previous work, our data‐driven approach is trained a priori on a set of training scenes to use a local neighborhood of samples, together with additional feature information, to reconstruct the full incident radiance at a point in the scene, which enables first‐bounce importance sampling for new test scenes. Our solution is easy to integrate into existing rendering pipelines without the need for retraining, as we demonstrate by incorporating it into both the Blender/Cycles and Mitsuba path tracers. Finally, we show that our offline, deep importance sampler (ODIS) accelerates convergence at low sample counts and improves the results of an off‐the‐shelf denoiser relative to other state‐of‐the‐art sampling techniques.
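
To make the last step of this idea concrete, the following is a minimal sketch of how a reconstructed incident-radiance estimate could drive first-bounce importance sampling. It is not the implementation used in this work: the network reconstruction is replaced by a plain array, and the function names (`build_sampler`, `sample_direction`), the latitude-longitude binning over the sphere, and the small `eps` floor are all assumptions made for illustration. The sketch uses discrete inverse-CDF sampling and, as a simplification, returns the center of the chosen bin rather than a jittered direction inside it.

```python
import numpy as np

def build_sampler(radiance, eps=1e-6):
    """Turn a tabulated radiance estimate into a discrete sampling distribution.

    radiance: (H, W) non-negative array over (theta, phi) bins on the sphere.
    This array is a hypothetical stand-in for a reconstructed incident-radiance map.
    """
    h, w = radiance.shape
    theta = (np.arange(h) + 0.5) * np.pi / h  # polar angle at each row's center
    # Approximate solid angle of one bin in each row (shrinks toward the poles).
    solid_angle = np.sin(theta) * (np.pi / h) * (2.0 * np.pi / w)
    # The eps floor keeps every bin reachable, so the estimator stays unbiased.
    weights = (radiance + eps) * solid_angle[:, None]
    pmf = weights / weights.sum()
    cdf = np.cumsum(pmf.ravel())
    cdf[-1] = 1.0  # guard against floating-point round-off
    return pmf, cdf, solid_angle

def sample_direction(pmf, cdf, solid_angle, u):
    """Pick a bin by inverse-CDF lookup for a uniform random number u in [0, 1)."""
    h, w = pmf.shape
    idx = min(int(np.searchsorted(cdf, u, side="right")), h * w - 1)
    row, col = divmod(idx, w)
    theta = (row + 0.5) * np.pi / h
    phi = (col + 0.5) * 2.0 * np.pi / w
    direction = np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(theta) * np.sin(phi),
        np.cos(theta),
    ])
    # Piecewise-constant density with respect to solid angle for the chosen bin.
    pdf = pmf[row, col] / solid_angle[row]
    return direction, pdf
```

In a path tracer, the returned `pdf` would divide the first-bounce contribution as in any importance-sampled MC estimator; in practice such a guided distribution is usually combined with BSDF sampling, which this sketch omits.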