We study the problem of wireless edge caching when file popularity is unknown and possibly non-stationary. A bank of $J$ caches receives file requests, and a utility is accrued for each request depending on the serving cache. The network decides dynamically which files to store at each cache and how to route them, in order to maximize total utility. The request sequence is assumed to be drawn from an arbitrary distribution, which captures time-variance as well as temporal and spatial locality of requests. For this challenging setting, we propose the Bipartite Supergradient Caching Algorithm (BSCA), which provably exhibits no regret ($R_T/T \to 0$). That is, as the time horizon $T$ increases, BSCA achieves at least the same utility as the cache configuration that we would have chosen knowing all future requests. The learning rate of the algorithm is characterized by its regret expression $R_T = O(\sqrt{JT})$, which is independent of the file library size. For the single-cache case, we prove that this is the lowest attainable bound. At each step, BSCA requires $J$ projections onto intersections of boxes and simplices, for which we propose a tailored algorithm. Our model is the first to draw a connection between the network caching problem and Online Convex Optimization, and we demonstrate its generality by discussing various practical extensions and presenting a trace-driven comparison with state-of-the-art competitors.
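To make the idea of a projected supergradient step concrete, the following is a minimal sketch for a single cache only. It rests on assumptions not stated above: the cache state is a fractional vector y in [0, 1]^N whose entries sum to the capacity C, and a request for file n with weight w accrues utility w * y[n]. The projection uses plain bisection on a shift parameter; it is a generic stand-in, not the tailored projection algorithm proposed here, and it ignores the routing variables of the bipartite setting. The names `project_capped_simplex` and `online_caching` and the step size are hypothetical.

```python
def project_capped_simplex(z, capacity, iters=60):
    """Euclidean projection of z onto {y : 0 <= y_i <= 1, sum(y) = capacity}.

    The projection has the form y_i = clip(z_i - tau, 0, 1); we search for
    the shift tau by bisection. Assumes 0 <= capacity <= len(z).
    """
    def total(tau):
        return sum(min(1.0, max(0.0, zi - tau)) for zi in z)

    # total(lo) = len(z) >= capacity and total(hi) = 0 <= capacity,
    # and total is non-increasing in tau, so bisection applies.
    lo, hi = min(z) - 1.0, max(z)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total(mid) > capacity:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return [min(1.0, max(0.0, zi - tau)) for zi in z]


def online_caching(requests, num_files, capacity, step):
    """Run projected supergradient ascent over a request sequence.

    requests: iterable of (file_id, weight) pairs (assumed format).
    Returns the final fractional cache state and the accrued utility.
    """
    y = [capacity / num_files] * num_files  # uninformed initial state
    utility = 0.0
    for file_id, weight in requests:
        utility += weight * y[file_id]      # utility under the current state
        y[file_id] += step * weight         # supergradient of w * y[n] is w * e_n
        y = project_capped_simplex(y, capacity)
    return y, utility


if __name__ == "__main__":
    # Toy trace: file 0 is requested repeatedly, so its cached fraction grows.
    state, gained = online_caching([(0, 1.0)] * 50, num_files=4, capacity=1, step=0.1)
    print(state, gained)
```

In online gradient methods the step size is usually taken on the order of $1/\sqrt{T}$; the value 0.1 in the toy trace is arbitrary and chosen only for illustration.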