For Deaf people, access to the mobile telephone network in the United States is currently limited to text messaging, forcing communication in English as opposed to American Sign Language (ASL), their preferred language. Because ASL is a visual language, mobile video phones have the potential to give Deaf people access to real-time mobile communication in their preferred language. However, even today's best video compression techniques cannot yield intelligible ASL at limited cell phone network bandwidths. Motivated by this constraint, we conducted one focus group and two user studies with members of the Deaf Community to determine the intelligibility effects of video compression techniques that exploit the visual nature of sign language. Inspired by eye-tracking results showing that high-resolution foveal vision is maintained around the face, we studied region-of-interest encodings (where the face is encoded at higher quality) as well as reduced frame rates (where fewer, better-quality frames are displayed every second). At all bit rates studied here, participants preferred moderate quality increases in the face region, sacrificing quality in other regions. They also preferred slightly lower frame rates because they yield better-quality frames for a fixed bit rate. The limited processing power of cell phones is a serious concern because a real-time video encoder and decoder will be needed. Choosing less complex settings for the encoder can reduce encoding time, but will affect video quality. We studied the intelligibility effects of this tradeoff and found that we can significantly speed up encoding without severely affecting intelligibility. These results show promise for real-time access to the current low-bandwidth cell phone network through sign-language-specific encoding techniques.
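The frame-rate tradeoff mentioned above comes down to simple arithmetic: at a fixed bit rate, fewer frames per second leaves more bits for each frame. The sketch below illustrates this under a simplifying assumption that bits are spread evenly across frames (real encoders do not do this). The function name and the numeric values are hypothetical and are not taken from the study.

```python
def bits_per_frame(bit_rate_bps: float, frames_per_second: float) -> float:
    """Average bit budget per frame, assuming bits are spread evenly."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return bit_rate_bps / frames_per_second


# Hypothetical values chosen only to illustrate the tradeoff.
bit_rate = 20_000  # bits per second
for fps in (15, 10):
    budget = bits_per_frame(bit_rate, fps)
    print(f"{fps} fps -> {budget:.0f} bits per frame on average")
```

Lowering the frame rate raises the per-frame budget, which is why each displayed frame can be encoded at higher quality.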
This paper proposes a rate control algorithm for lossless region of interest (RoI) coding in HEVC intra-coding. The algorithm is developed for digital pathology images and allows for random access to the data. Based on an input RoI mask, the algorithm first encodes the RoI losslessly. According to the bit rate spent on the RoI, it then encodes the background by using rate control in order to meet an overall target bit rate. In order to increase rate control accuracy, the algorithm uses an R-λ model to approximate the slope of the rate-distortion curve, and updates any related model parameters during the encoding process. Random access is attained by coding the data using independent tiles. Experimental results show that the proposed algorithm attains the overall bit rate very accurately while providing lossless reconstruction of the RoI.
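The two-stage budget and the R-λ update described above can be sketched as follows. This is a minimal illustration and not the paper's algorithm: it assumes the commonly used hyperbolic form λ = α · bpp^β. The initial parameters, the λ-to-QP mapping and the update step sizes are values often quoted for HEVC rate control and are treated here as assumptions. All function names are hypothetical.

```python
import math

# Assumed starting parameters for the model lambda = alpha * bpp ** beta.
ALPHA_INIT = 3.2003
BETA_INIT = -1.367


def background_budget_bits(target_bits: int, roi_bits: int) -> int:
    """Bits left for the background after the RoI is coded losslessly."""
    return max(target_bits - roi_bits, 0)


def lambda_from_bpp(bpp: float, alpha: float = ALPHA_INIT,
                    beta: float = BETA_INIT) -> float:
    """Slope of the rate-distortion curve predicted for a bits-per-pixel target."""
    return alpha * bpp ** beta


def qp_from_lambda(lam: float) -> int:
    """Map lambda to a quantization parameter (assumed linear-in-log mapping)."""
    qp = round(4.2005 * math.log(lam) + 13.7122)
    return min(max(qp, 0), 51)  # clamp to the valid HEVC QP range


def update_model(alpha: float, beta: float, bpp_actual: float,
                 lambda_used: float, step_alpha: float = 0.1,
                 step_beta: float = 0.05) -> tuple[float, float]:
    """Adjust alpha and beta after coding a unit, from the bits actually spent."""
    lambda_predicted = alpha * bpp_actual ** beta
    err = math.log(lambda_used) - math.log(lambda_predicted)
    new_alpha = alpha + step_alpha * err * alpha
    new_beta = beta + step_beta * err * math.log(bpp_actual)
    return new_alpha, new_beta


# Hypothetical numbers: a 1024x1024 image, 0.5 bpp overall target.
pixels = 1024 * 1024
roi_pixels = 100_000
target_bits = int(0.5 * pixels)
roi_bits = 300_000  # measured after lossless RoI coding (made up here)

remaining = background_budget_bits(target_bits, roi_bits)
background_bpp = remaining / (pixels - roi_pixels)
lam = lambda_from_bpp(background_bpp)
print("background bpp:", background_bpp)
print("lambda:", lam, "QP:", qp_from_lambda(lam))
```

When a unit spends more bits than intended, `update_model` raises α so that the next unit receives a larger λ for the same bits-per-pixel target, which pulls the rate back toward the budget.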
We describe the design of an adaptive video delivery system employing a perceptual preprocessing filter. The filter receives parameters of the reproduction setup, such as viewing distance, pixel density, and ambient illuminance. It then applies a contrast sensitivity model of human vision to remove spatial oscillations that are invisible under those conditions. By removing such oscillations, the filter simplifies the video content, leading to more efficient encoding without causing any visible alterations of the content. Through experiments, we demonstrate that the use of our filter can yield significant bit rate savings compared to conventional encoding methods that are not tailored to specific viewing conditions.
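To make the role of the viewing parameters concrete, the sketch below converts viewing distance and pixel density into pixels per degree of visual angle, then derives a low-pass cutoff in cycles per pixel. It is a simplified illustration and not the filter described above: it replaces the full contrast sensitivity model with a single assumed visual cutoff frequency (60 cycles per degree by default) and ignores ambient illuminance and contrast. All names are hypothetical.

```python
import math


def pixels_per_degree(viewing_distance_in: float, pixels_per_inch: float) -> float:
    """Number of display pixels spanned by one degree of visual angle."""
    inches_per_degree = 2.0 * viewing_distance_in * math.tan(math.radians(0.5))
    return inches_per_degree * pixels_per_inch


def cutoff_cycles_per_pixel(viewing_distance_in: float, pixels_per_inch: float,
                            visual_cutoff_cpd: float = 60.0) -> float:
    """Highest spatial frequency worth keeping, in cycles per pixel.

    visual_cutoff_cpd is an assumed acuity limit in cycles per degree.
    The result is capped at 0.5, the Nyquist limit of the pixel grid.
    """
    ppd = pixels_per_degree(viewing_distance_in, pixels_per_inch)
    return min(visual_cutoff_cpd / ppd, 0.5)


# Hypothetical setups: (viewing distance in inches, pixels per inch).
for distance, ppi in ((12.0, 300.0), (24.0, 400.0)):
    cutoff = cutoff_cycles_per_pixel(distance, ppi)
    print(f"{distance} in, {ppi} ppi -> cutoff {cutoff:.3f} cycles/pixel")
```

A cutoff of 0.5 means the display cannot show detail beyond the assumed visual limit, so nothing would be filtered. A lower cutoff means the finest detail on the pixel grid is assumed to be invisible at that distance and could be removed before encoding.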