Sparsity-based techniques have been widely used in signal processing applications such as compression, denoising, and compressed sensing. Recently, the learning of sparsifying transforms for data has received growing interest. The advantage of the transform model is that it enables cheap and exact computation of sparse representations. In Part I of this work, efficient methods for online learning of square sparsifying transforms were introduced and investigated by numerical experiments. The online schemes process signals sequentially and can be especially useful for big data, as well as for real-time or limited-latency signal processing applications. In this paper, we prove that although the associated optimization problems are non-convex, the online transform learning algorithms are guaranteed to converge to the set of stationary points of the learning problem. The guarantee relies on a few simple assumptions. In practice, the algorithms work well, as demonstrated by example applications to the representation and denoising of signals.
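To make the claim of cheap and exact computations concrete, a brief sketch (the symbols $W$, $y$, and $s$ are our illustrative choices for a square transform, a signal, and a sparsity level, not notation fixed by this abstract): sparse coding in the transform model admits a closed-form solution,
\[
\hat{x} \;=\; \arg\min_{x:\,\|x\|_0 \le s} \; \|W y - x\|_2^2 \;=\; H_s(W y),
\]
where $H_s(\cdot)$ zeroes out all but the $s$ largest-magnitude entries of its argument. The cost is one matrix-vector product plus a selection of the $s$ largest magnitudes, and the solution is exact, in contrast to the generally NP-hard sparse coding step in the synthesis dictionary model.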