In this paper, we analyze how to calculate the matrix transposition in continuous flow by using a memory or group of memories. The proposed approach studies this problem for specific conditions such as square and non-square matrices, use of limited access memories and use of several memories in parallel. Contrary to previous approaches, which are based on specific cases or examples, the proposed approach derives the fundamental theory involved in the problem of matrix transposition in a continuous flow. This allows for obtaining the exact equations for the read and write addresses of the memories and other control signals in the circuits. Furthermore, the cases that involve non-square matrices, which have not been studied in detail in the literature, are analyzed in depth in this paper. Experimental results show that the proposed approach is capable of transposing matrices of 8192 x 8192 32-bit data received in series at a rate of 200 mega samples per second, which doubles the throughput of previous approaches.