The recently standardized 3GPP codec for Enhanced Voice Services (EVS) offers new features and improvements for low-delay real-time communication systems. Built on a novel switched low-delay speech/audio coding architecture, the EVS codec contains various tools for better compression efficiency and higher quality for clean/noisy speech, mixed content and music, including support for wideband, super-wideband and full-band content. The EVS codec operates over a broad range of bitrates, is highly robust against packet loss and provides an AMR-WB-interoperable mode for compatibility with existing systems. This paper gives an overview of the underlying architecture as well as the novel technologies in the EVS codec and presents listening test results showing the performance of the new codec in terms of compression and speech/audio quality.
Traditionally, speech coding and audio coding were separate worlds. Based on different technical approaches and different assumptions about the source signal, neither of the two coding schemes could efficiently represent both speech and music at low bitrates. This paper presents a unified speech and audio codec that efficiently combines techniques from both worlds, resulting in a codec that exhibits consistently high quality for speech, music and mixed audio content. The paper gives an overview of the codec architecture and presents results of formal listening tests comparing this new codec with HE-AAC v2 and AMR-WB+. This new codec forms the basis of the reference model in the ongoing MPEG standardization activity for Unified Speech and Audio Coding.
Context-based entropy coding has the potential to provide higher gains than memoryless entropy coding. However, serious practical difficulties arise in real-time applications due to its very high memory requirements. This paper presents an efficient method for designing context-adaptive entropy coding that meets low memory requirements. From a study of coding-gain scalability as a function of context size, new context design and validation procedures are derived. Further, supervised clustering and mapping optimization are introduced to model the context efficiently. The resulting context model, combined with an arithmetic coder, was successfully implemented in a transform-based audio coder for real-time processing and shows significant improvement over the entropy coding used in MPEG-4 AAC.
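As a rough illustration of the context-mapping idea described in that abstract, the C sketch below derives a raw context for the current spectral coefficient from its two previously coded neighbours and maps it through a small clustering table onto a reduced set of probability models, so that only a few frequency tables need to be stored for the arithmetic coder. The class boundaries, table sizes and table contents are hypothetical placeholders, not the trained tables of the standardized coder.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_RAW_CONTEXTS 16  /* two 2-bit magnitude classes (hypothetical) */
#define NUM_MODELS        4  /* models kept after supervised clustering    */

/* Hypothetical mapping from raw context to clustered model index.
 * In practice this table would be trained offline; all zeros here
 * are only a placeholder. */
static const uint8_t ctx_to_model[NUM_RAW_CONTEXTS] = { 0 };

/* Hypothetical magnitude class of a previously coded coefficient. */
static int mag_class(int q)
{
    int a = abs(q);
    if (a == 0) return 0;
    if (a == 1) return 1;
    if (a <= 3) return 2;
    return 3;
}

/* Derive the clustered model index that selects the arithmetic-coder
 * frequency table for the current coefficient. */
static int select_model(int prev1_q, int prev2_q)
{
    int raw_ctx = (mag_class(prev1_q) << 2) | mag_class(prev2_q);
    return ctx_to_model[raw_ctx];   /* 0 .. NUM_MODELS-1 */
}

int main(void)
{
    /* Neighbours quantized to -5 and 1 were coded before the current one. */
    printf("clustered model index: %d\n", select_model(-5, 1));
    return 0;
}
```

The point of the mapping step is that memory grows with the number of stored models rather than with the number of raw contexts, which is what keeps such a scheme feasible for real-time use.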
Several state-of-the-art switched audio codecs employ a closed-loop mode decision to select the best coding mode for every frame. Closed-loop mode selection is known to give good performance but also has high complexity. The approach we propose in this paper is a low-complexity variant of the closed-loop approach, based on a similar decision rule that computes the coding distortion of each mode and selects the one with the lowest distortion. Our approach differs mainly in the way the coding distortions are obtained: we notably reduce the complexity by only estimating the distortions, without encoding and decoding the input with each mode. The new approach was implemented in the EVS codec standard and evaluated both objectively and subjectively. Compared to the closed-loop approach, it yields similar performance at lower complexity.
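The selection principle can be sketched in a few lines of C: each candidate mode supplies a cheap distortion estimate for the current frame, and only the mode with the smallest estimate is actually used for encoding, avoiding the full encode/decode pass per mode. The estimator interface and the dummy estimators below are hypothetical placeholders, not the distortion measures used in the EVS codec.

```c
#include <stdio.h>
#include <stddef.h>
#include <float.h>

#define FRAME_LEN 256

/* Per-mode distortion estimator (hypothetical interface). */
typedef float (*distortion_estimate_fn)(const float *frame, size_t len);

/* Pick the mode with the smallest estimated distortion (argmin). */
static int select_mode(const float *frame, size_t len,
                       const distortion_estimate_fn *est, int num_modes)
{
    int best = 0;
    float best_d = FLT_MAX;
    for (int m = 0; m < num_modes; m++) {
        float d = est[m](frame, len);
        if (d < best_d) { best_d = d; best = m; }
    }
    return best;
}

/* Dummy estimators standing in for the real per-mode distortion measures. */
static float est_mode_a(const float *f, size_t n) { (void)f; (void)n; return 1.0f; }
static float est_mode_b(const float *f, size_t n) { (void)f; (void)n; return 0.5f; }

int main(void)
{
    float frame[FRAME_LEN] = { 0 };
    const distortion_estimate_fn estimators[] = { est_mode_a, est_mode_b };

    int mode = select_mode(frame, FRAME_LEN, estimators, 2);
    printf("selected mode: %d\n", mode);   /* prints 1 with these dummy values */
    return 0;
}
```

In a closed-loop decision, each estimator would be replaced by an actual encode/decode of the frame with that mode followed by a distortion measurement; the complexity saving of the estimated-distortion approach comes from skipping exactly those per-mode passes.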