Versatile Video Coding (VVC) is the next-generation video coding standard, expected by the end of 2020. Compared to its predecessor, VVC introduces new coding tools that make compression more efficient at the expense of higher computational complexity. This raises the need for an efficient and optimised implementation, especially for embedded platforms with limited memory and logic resources. One of the newly introduced tools in VVC is Multiple Transform Selection (MTS), which involves three Discrete Cosine Transform (DCT)/Discrete Sine Transform (DST) types with larger and rectangular transform blocks. In this paper, an efficient hardware implementation of all DCT/DST transform types and sizes is proposed. The proposed design uses 32 multipliers in a pipelined architecture that targets an ASIC platform. It is a multi-standard architecture that supports the transform blocks of recent MPEG standards, including AVC, HEVC and VVC. The architecture is optimized and removes unnecessary complexity found in other proposed architectures by using regular multipliers instead of multiple constant multipliers. The synthesis results show that the proposed design, which sustains a constant throughput of two pixels/cycle and a constant latency for all block sizes, can reach an operating frequency of 600 MHz, enabling real-time decoding of 4K video at 48 fps.
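The abstract does not name the three transform types; in VVC's MTS they are DCT-II, DST-VII and DCT-VIII. As a rough illustration of what a 1-D transform stage computes, the following Python sketch builds the textbook floating-point basis matrices and applies them as a plain matrix-vector product, with one ordinary multiplication per coefficient. It is only a sketch under stated assumptions: the function names are hypothetical, the standard itself uses scaled integer approximations of these matrices, and nothing here reflects the paper's pipelined 32-multiplier architecture.

```python
import math


def mts_basis(kind, n):
    """Return the n x n floating-point basis matrix; each row is one basis function.

    Assumption: textbook orthonormal definitions, not the integer matrices
    used by the VVC specification.
    """
    rows = []
    for i in range(n):
        row = []
        for j in range(n):
            if kind == "DCT2":
                w = math.sqrt(0.5) if i == 0 else 1.0
                v = w * math.sqrt(2.0 / n) * math.cos(
                    math.pi * i * (2 * j + 1) / (2 * n))
            elif kind == "DST7":
                v = math.sqrt(4.0 / (2 * n + 1)) * math.sin(
                    math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
            elif kind == "DCT8":
                v = math.sqrt(4.0 / (2 * n + 1)) * math.cos(
                    math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
            else:
                raise ValueError(f"unknown transform kind: {kind}")
            row.append(v)
        rows.append(row)
    return rows


def transform_1d(basis, samples):
    """Forward 1-D transform: one multiply-accumulate chain per output coefficient."""
    return [sum(c * s for c, s in zip(row, samples)) for row in basis]


def inverse_1d(basis, coeffs):
    """Inverse 1-D transform; the basis is orthonormal, so the inverse is the transpose."""
    n = len(basis)
    return [sum(basis[i][j] * coeffs[i] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    residual = [1.0, 2.0, 3.0, 4.0]  # hypothetical 4-sample input row
    for kind in ("DCT2", "DST7", "DCT8"):
        basis = mts_basis(kind, len(residual))
        coeffs = transform_1d(basis, residual)
        print(kind, [round(c, 4) for c in coeffs])
```

A 2-D transform of a rectangular block applies such a 1-D stage along the rows and then along the columns, with possibly different sizes and transform types in each direction.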
Versatile video coding (VVC) is the next-generation video coding standard, released in July 2020. VVC introduces new coding tools that enhance coding efficiency compared to its predecessor, high efficiency video coding (HEVC). These new tools have a significant impact on the VVC software decoder, whose complexity is estimated at twice that of the HEVC decoder. In particular, the adaptive loop filter (ALF), introduced in VVC as an in-loop filter, increases both the decoding complexity and the memory usage. These concerns need to be carefully addressed when designing an efficient hardware implementation of a VVC decoder. In this paper, we present an efficient hardware implementation of the ALF tool for a VVC decoder. The proposed solution establishes a novel scanning order between luma and chroma components that significantly reduces the ALF memory. The design takes advantage of all ALF features and provides a unified hardware module for all ALF filters. It uses 26 regular multipliers in a pipelined architecture with a fixed throughput of 2 pixels/cycle and a fixed system latency regardless of the selected filter. The design operates at a frequency of 600 MHz, enabling an ASIC platform to decode 4K video at 30 frames per second in the 4:2:2 chroma sub-sampling format.
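The throughput figures in this abstract can be sanity-checked with a little arithmetic. The Python sketch below compares the sample rate needed for the stated video format with the peak rate implied by the clock frequency and the per-cycle throughput. It rests on assumptions not stated in the abstract: "4K" is taken as 3840x2160, "pixels/cycle" is read as samples per cycle, and the helper names are hypothetical. The peak rate is only an upper bound and models no stalls, memory accesses or other overheads.

```python
# Samples per luma pixel for common chroma formats (luma + both chroma planes).
CHROMA_FACTOR = {"4:0:0": 1.0, "4:2:0": 1.5, "4:2:2": 2.0, "4:4:4": 3.0}


def required_samples_per_second(width, height, fps, chroma_format):
    """Samples per second that must be processed to keep up with the video."""
    return width * height * CHROMA_FACTOR[chroma_format] * fps


def peak_samples_per_second(freq_hz, samples_per_cycle):
    """Upper bound on throughput: every clock cycle delivers its full output."""
    return freq_hz * samples_per_cycle


# Figures from the abstract; the 3840x2160 resolution is an assumption.
required = required_samples_per_second(3840, 2160, 30, "4:2:2")
peak = peak_samples_per_second(600e6, 2)

if __name__ == "__main__":
    print(f"required: {required / 1e6:.1f} Msamples/s")
    print(f"peak:     {peak / 1e6:.1f} Msamples/s")
    print("peak covers required rate:", peak >= required)
```

Under these assumptions the peak rate exceeds the required rate, which is consistent with the real-time claim. The check says nothing about how much of that margin a real decoder consumes.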