The future video coding standard named Versatile Video Coding (VVC) is expected by the end of 2020. VVC will enable better coding efficiency than the current High Efficiency Video Coding (HEVC) standard. This coding gain is brought by several coding tools. The Multiple Transform Selection (MTS) is one of the key coding tools that have been introduced in VVC. The MTS concept relies on three transform types including Discrete Cosine Transform (DCT)-II, Discrete Sine Transform (DST)-VII and DCT-VIII. Unlike the DCT-II that has fast computing algorithms, the DST-VII and DCT-VIII rely on more complex matrix multiplication. In this paper an approximation approach is proposed to reduce the computational cost of the DST-VII and DCT-VIII. The approximation consists in applying adjustment stages, based on sparse block-band matrices, to a variant of DCT-II family mainly DCT-II and its inverse. Genetic algorithm is used to derive the optimal coefficients of the adjustment matrices. Moreover, an efficient hardware implementation of the forward and inverse approximate transform module is proposed. The architecture design includes a pipelined and reconfigurable forward-inverse DCT-II core transform as it is the main core for DST-VII and DCT-VIII computations. The proposed 32-point 1D architecture including low cost adjustment stages allows the processing of a video in 2K and 4K resolutions at 1095 and 273 frames per second, respectively. A unified 2D implementation of forwardinverse DCT-II, approximate DST-VII and DCT-VIII is also presented. The synthesis results show that the design is able to sustain a video in 2K and 4K resolutions at 386 and 96 frames per second, respectively, while using only 12% of Alms, 22% of registers and 30% of DSP blocks of the Arria10 SoC platform.