We present the description, results, and analysis of experiments conducted to find the equivalent resolution of handheld devices: the resolution beyond which users stop perceiving quality improvements when higher resolutions are presented to them on such devices. It is therefore the maximum resolution worth considering when generating and delivering video, as long as sequences are not too heavily compressed, and detecting it allows for notable savings in bandwidth consumption. Subjective assessments were carried out on fifty subjects using a set of video sequences of very different nature and four handheld devices with a broad range of screen dimensions. The results show that the equivalent resolution of current handheld devices is 720p, as higher resolutions are not valued by users.
This paper analyzes the impact of signal phase handling in one of the most popular architectures for generative synthesis of audio effects: variational autoencoders (VAEs). Until quite recently, autoencoders based on the Fast Fourier Transform routinely avoided the phase of the signal, either storing the phase information and restoring it at the output or relying on phase regenerators such as Griffin-Lim. We evaluate different VAE networks capable of generating a latent space that carries intrinsic information about both signal amplitude and phase. The Modulated Complex Lapped Transform (MCLT) is evaluated as an alternative to the Short-Time Fourier Transform (STFT). A novel database of beats has been designed for testing the architectures. Results were objectively assessed (reconstruction errors and objective metrics approximating opinion scores) for autoencoders on STFT and MCLT representations, using Griffin-Lim phase regeneration, multichannel networks, and the Complex VAE. The autoencoders successfully learned to represent the phase information and handle it in a holistic approach, reaching state-of-the-art quality standards for audio effects. They show a remarkable ability to generalize and deliver new sounds, while overall quality depends on the reconstruction of both phase and amplitude.
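As a minimal sketch of the kind of multichannel setup this line of work evaluates (not the paper's exact architecture; all layer sizes and names below are illustrative assumptions): the real and imaginary parts of the STFT can be stacked as two input channels, so the encoder learns amplitude and phase jointly rather than discarding phase.

```python
# Illustrative sketch (not the paper's architecture): a VAE encoder that
# consumes the real and imaginary parts of an STFT as two channels, so the
# latent space carries both amplitude and phase information.
import torch
import torch.nn as nn

class SpectrogramVAEEncoder(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=4, stride=2, padding=1),  # 2 ch: Re, Im
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),
        )
        self.mu = nn.Linear(64 * 4 * 4, latent_dim)
        self.logvar = nn.Linear(64 * 4 * 4, latent_dim)

    def forward(self, x):                      # x: (batch, 2, freq, time)
        h = self.conv(x)
        return self.mu(h), self.logvar(h)

# Build the two-channel input from a waveform (hypothetical parameters).
wave = torch.randn(1, 16384)                   # stand-in for an audio clip
spec = torch.stft(wave, n_fft=512, hop_length=128,
                  window=torch.hann_window(512), return_complex=True)
x = torch.stack([spec.real, spec.imag], dim=1)  # (1, 2, 257, frames)
mu, logvar = SpectrogramVAEEncoder()(x)
```

Because phase is part of the learned representation, the decoder can output a complex spectrogram directly and invert it, instead of regenerating phase afterwards with Griffin-Lim.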
Since the emergence of a new strain of coronavirus known as SARS-CoV-2, many countries around the world have reported cases of the COVID-19 disease caused by this virus. Numerous people's lives have been affected from both a health and an economic point of view. The long tradition of using mathematical models to generate insights about the transmission of a disease, together with new computational techniques such as Artificial Intelligence, has opened the door to diverse investigations providing relevant information about the evolution of COVID-19. In this research, we seek to advance the existing epidemiological models based on microscopic Markov chains to predict the impact of the pandemic at the medical and economic levels. For this purpose, we use Spanish population movements derived from geolocated mobile-phone data to determine economic activity with Artificial Intelligence techniques, and we develop a novel advanced epidemiological model that combines this information with medical data. With this tool, scenarios can be simulated to determine which restriction policies are optimal and when they should be applied, both to limit the damage to the economy and to avoid the feared upsurge of the disease.
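To make the "microscopic Markov chain" idea concrete, here is a minimal toy sketch (not the paper's model; rates, matrix, and the `step` function are illustrative assumptions) of a discrete-time infection-probability update driven by a mobility matrix, the same ingredient that mobile-phone movement data would supply.

```python
# Toy discrete-time microscopic Markov chain epidemic update driven by a
# mobility matrix (illustrative; not the paper's exact model).
import numpy as np

def step(p_infected, mobility, beta=0.3, mu=0.1):
    """One update of per-region infection probabilities.

    p_infected : (n,) probability that an individual in region i is infected
    mobility   : (n, n) row-stochastic matrix; M[i, j] = share of region i's
                 population that travels to region j
    beta, mu   : assumed infection and recovery rates
    """
    # Infected density each region is exposed to after travel.
    exposure = mobility.T @ p_infected
    # Probability of at least one infectious contact, then recovery.
    p_contagion = 1.0 - np.exp(-beta * exposure)
    return (1.0 - p_infected) * p_contagion + p_infected * (1.0 - mu)

n = 4
M = np.full((n, n), 1.0 / n)           # toy uniform mobility
p = np.array([0.01, 0.0, 0.0, 0.0])    # seed infection in region 0
for _ in range(30):
    p = step(p, M)
print(p)                                # infection probabilities after 30 steps
```

Restriction policies enter such a model by modifying the mobility matrix (e.g., zeroing inter-region travel), which is what makes scenario simulation possible.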
In cinema it is standard practice to improve the appearance of images by adding noise that simulates film grain. This is computationally very costly, so it is only done in post-production and not on the set. It is also limiting, because artists cannot really experiment with the noise or introduce novel looks. Furthermore, video compression requires a higher bit rate when the source material has film grain or any other type of high-frequency texture. In this work, we introduce a method for adding texture to digital cinema that aims to solve these problems. The proposed algorithm is based on modeling retinal noise, so images processed by our method have a natural appearance. This "retinal grain" serves a double purpose. One is aesthetic: its parameters allow the resulting texture appearance to be varied widely, which makes it an artistic tool for cinematographers. Results are validated through psychophysical experiments in which observers, including cinema professionals, prefer our method over film grain synthesis methods from academia and industry. The other purpose of the retinal noise emulation is to improve the quality of compressed video by masking compression artifacts, which allows the encoding bit rate to be lowered while preserving image quality, or image quality to be improved at a fixed bit rate. The effectiveness of our approach for improving coding efficiency, with average bit rate savings of 22.5%, has been validated through psychophysical experiments using professional cinema content shot in 4K and color-graded, with the amount of retinal noise selected by a motion picture specialist based solely on aesthetic preference.
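The paper's retinal noise model is not reproduced here; as a hedged toy sketch of the general idea of adding luminance-dependent grain before encoding (the `add_grain` function and its parameters are hypothetical), one could write:

```python
# Toy luminance-dependent grain (NOT the paper's retinal noise model):
# noise strength varies with local brightness, a property film grain
# also exhibits, peaking in the mid-tones.
import numpy as np

def add_grain(img, strength=0.04, seed=0):
    """img: float array in [0, 1], shape (H, W) or (H, W, 3)."""
    rng = np.random.default_rng(seed)
    luma = img if img.ndim == 2 else img.mean(axis=2, keepdims=True)
    # Strongest in mid-tones, weaker in shadows and highlights.
    sigma = strength * np.sqrt(luma * (1.0 - luma))
    noisy = img + sigma * rng.standard_normal(img.shape)
    return np.clip(noisy, 0.0, 1.0)

frame = np.tile(np.linspace(0, 1, 256), (256, 1))  # synthetic gradient frame
grainy = add_grain(frame)
```

In a coding pipeline, such texture is applied (or parametrized and re-synthesized) at the decoder side, so the encoder never has to spend bits on the noise itself while the added texture masks compression artifacts.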
This paper presents a backend-free web architecture for generative audio mixing from the Freesound website using a Variational Autoencoder. It is designed for experimenting with sounds that do not yet exist in large audio databases, without the need to populate them. It works directly in the browser using JavaScript tools with a serverless approach and relies exclusively on the computational capacity of the client. The platform's Graphical User Interface allows rapid sampling of the autoencoder's sound space and is under active development while the logic is finalized. A Variational Autoencoder has been trained to serve as the default model, and users can upload their own models to operate independently. The platform aims to provide users with a straightforward, quick-access interface to generative sounds, supporting the audiovisual industry by filling the existing gaps in audio repositories with synthetic media.
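The platform itself runs in the browser with JavaScript; the following Python sketch only illustrates the core generative step such a client performs: sampling the latent space of a trained VAE decoder to obtain sounds that exist in no repository. The `decoder` callable and all names are hypothetical.

```python
# Illustrative latent-space sampling for a trained VAE decoder
# (hypothetical names; the real platform does this client-side in JS).
import torch

def sample_sounds(decoder, n=8, latent_dim=64, temperature=1.0):
    """Draw latent vectors from the N(0, I) prior and decode them."""
    z = temperature * torch.randn(n, latent_dim)
    with torch.no_grad():
        return decoder(z)               # (n, ...) generated spectrograms/audio

def interpolate(decoder, z_a, z_b, steps=10):
    """Decode a straight line between two latent codes: smooth sound morphs."""
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    z = (1 - alphas) * z_a + alphas * z_b
    with torch.no_grad():
        return decoder(z)
```

Because sampling and decoding are cheap relative to training, this is the part that fits comfortably within the computational capacity of a browser client.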