Techniques that exploit the sparsity of signals in a transform domain or dictionary have been popular in signal processing. Adaptive synthesis dictionaries have been shown to be useful in applications such as signal denoising and medical image reconstruction. More recently, the learning of sparsifying transforms for data has received interest. The sparsifying transform model allows for cheap and exact computations. In this paper, we develop a methodology for online learning of square sparsifying transforms. Such online learning can be particularly useful when dealing with big data, and for signal processing applications such as real-time sparse representation and denoising. The proposed transform learning algorithms are shown to have a much lower computational cost than online synthesis dictionary learning. In practice, the sequential learning of a sparsifying transform typically converges faster than batch-mode transform learning. Preliminary experiments show the usefulness of the proposed schemes for sparse representation and denoising.