Maximum distance separable (MDS) matrices are often used in the linear layer of a block cipher due to their good diffusion property. A well-designed lightweight MDS matrix, especially an involutory one, can provide both security and performance benefits to the cipher. Finding the corresponding effective linear straight-line program (SLP) of the circuit of a linear layer is still a challenging problem. In this article, first, we propose a new heuristic algorithm called Superior Boyar-Peralta (SBP) in the computation of the minimum number of two-input Exclusive-OR (XOR) gates with the minimum circuit depth for the SLPs. Contrary to the existing global optimization methods supporting only two-input XOR gates, SBP heuristic algorithm provides the best global optimization solutions, especially for extracting low-latency circuits. Moreover, we give a new 4 × 4 involutory MDS matrix over F24, which requires only 41 XOR gates and depth 3 after applying SBP heuristic, whereas the previously best-known cost is 45 XOR gates with the same depth. In the second part of the article, for further optimization of the circuit area of linear layers with multiple-input XOR gates, we enhance the recently proposed BDKCI heuristic algorithm by incorporating circuit depth awareness, which limits the depth of the circuits created. By using the proposed circuit depth-bounded version of BDKCI, we present better circuit implementations of linear layers of block ciphers than those given in the literature. For instance, the given circuit for the AES MixColumn matrix only requires 44 XOR gates/depth 3/240.95 GE in the STM 130 nm (simply called ASIC4) library, while the previous best-known result is 55 XOR gates/depth 5/243.00 GE. Much better, our new 4 × 4 involutory MDS matrix requires only 19 XOR gates/depth3/79.75 GE in the STM 90 nm (simply called ASIC1) library, which is the lightest and superior to the state-of-the-art results.