Abstract-The spatialization of the sound field in a room is studied, in particular the evolution of room impulse responses as a function of their spatial positions. It is observed that the multidimensional spectrum of the solution of the wave equation has an almost bandlimited character. Therefore, sampling and interpolation can readily be applied to the signals acquired by a microphone array. The decay of the spectrum is studied along both the temporal and spatial frequency axes, and the influence of this decay on the performance of the interpolation is analyzed. Based on the support of the spectrum, the number of microphones and the spacing between them are determined for the reconstruction of the sound pressure field up to a given temporal frequency and with a given reconstruction quality. The optimal sampling pattern for the microphone positions is given for the linear, planar, and three-dimensional cases. Existing techniques usually make use of room models to recreate the sound field present at some point in space. The presented technique simply starts from measurements of the sound pressure field at a finite number of positions; with this information, the sound pressure field can be recreated at any spatial position. Finally, simulations and experimental results are presented and compared with the theory.
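Because the spectrum is almost bandlimited in spatial frequency, the field along a line of microphones can be reconstructed between sensors with classical Whittaker-Shannon (sinc) interpolation. The sketch below illustrates this on a synthetic, spatially bandlimited signal; the spacing, spatial frequency, and evaluation point are illustrative choices, not values from the paper.

```python
import numpy as np

def sinc_interpolate(samples, spacing, x):
    """Whittaker-Shannon reconstruction of a spatially bandlimited
    signal from uniform samples taken at positions n * spacing."""
    n = np.arange(len(samples))
    return np.sum(samples * np.sinc((x - n * spacing) / spacing))

# Synthetic "sound pressure" along a line: spatial frequency
# 0.3 cycles/m, microphones every 1 m, so Nyquist (0.5) is respected.
spacing = 1.0
positions = np.arange(201) * spacing
field = np.cos(2 * np.pi * 0.3 * positions)

# Reconstruct the field between two microphones (x = 100.37 m,
# near the center of the array to keep truncation error small).
x = 100.37
estimate = sinc_interpolate(field, spacing, x)
exact = np.cos(2 * np.pi * 0.3 * x)
print(abs(estimate - exact))  # small truncation error
```

With a finite array the reconstruction error is dominated by the truncation of the sinc series, which is why the paper's analysis of spectral decay matters: faster decay permits coarser microphone spacing for the same reconstruction quality.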
Abstract-Videos representing flames, water, smoke, etc., are often defined as dynamic textures: "textures" because they are characterized by the redundant repetition of a pattern, and "dynamic" because this repetition occurs in time as well as in space. Dynamic textures have been modeled as linear dynamic systems by unfolding the video frames into column vectors and describing their trajectory as time evolves. After the projection of the vectors onto a lower dimensional space by a singular value decomposition (SVD), the trajectory is modeled using system identification techniques. Synthesis is obtained by driving the system with random noise. In this paper, we show that the standard SVD can be replaced by a higher order SVD (HOSVD), originally known as the Tucker decomposition. HOSVD decomposes the dynamic texture as a multidimensional signal (tensor) without unfolding the video frames into column vectors. This is a more natural and flexible decomposition, since it permits us to perform dimension reduction in the spatial, temporal, and chromatic domains, while standard SVD allows for temporal reduction only. We show that for a comparable synthesis quality, the HOSVD approach requires, on average, five times fewer parameters than the standard SVD approach. The analysis part is more expensive, but the synthesis has the same cost as existing algorithms. Our technique is, thus, well suited to dynamic texture synthesis on devices limited by memory and computational power, such as PDAs or mobile phones.
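The standard SVD-based pipeline the abstract builds on can be sketched in a few lines: stack frames as columns, project onto the leading singular vectors, fit a linear transition matrix by least squares, and synthesize new frames by driving the system with noise. This is an illustrative sketch of the classic linear-dynamic-system approach, not the authors' HOSVD code; the dimensions, state order, and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": T frames of H*W pixels, unfolded into columns of Y.
H, W, T, r = 8, 8, 50, 5
Y = rng.standard_normal((H * W, T))

# Project onto an r-dimensional state space using the SVD.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
U, s, Vt = U[:, :r], s[:r], Vt[:r]
X = np.diag(s) @ Vt                    # state trajectory, shape (r, T)

# Fit the transition matrix A by least squares: X[:, t+1] ~ A X[:, t].
A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])

# Synthesize new frames by driving the system with random noise.
x = X[:, 0]
frames = []
for _ in range(20):
    x = A @ x + 0.1 * rng.standard_normal(r)
    frames.append((U @ x).reshape(H, W))

print(len(frames), frames[0].shape)
```

The HOSVD variant replaces the single unfolding of Y with mode-wise factor matrices on the spatial, temporal, and chromatic axes, which is what enables reduction in all three domains.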
Abstract-In many applications, the sampling frequency is limited by the physical characteristics of the components: the pixel pitch, the rate of the analog-to-digital (A/D) converter, etc. A lowpass filter is usually applied before the sampling operation to avoid aliasing. However, when multiple copies are available, it is possible to use the information that is inherently present in the aliasing to reconstruct a higher resolution signal. If the different copies have unknown relative offsets, this is a nonlinear problem in the offsets and the signal coefficients: they are not easily separable in the set of equations describing the super-resolution problem. Thus, we perform joint registration and reconstruction from multiple unregistered sets of samples. We give a mathematical formulation for the problem when there are multiple sets of samples of a signal that is described by a finite number of expansion coefficients. We prove that the solution of the registration and reconstruction problem is generically unique when the total number of samples exceeds the number of unknowns (the expansion coefficients and the offsets). We describe two subspace-based methods to compute this solution. Their complexity is analyzed, and some heuristic methods are proposed. Finally, some numerical simulation results on one- and two-dimensional signals are given to show the performance of these methods.
We present a system that automatically recommends tags for YouTube videos solely based on their audiovisual content. We also propose a novel framework for unsupervised discovery of video categories that exploits knowledge mined from World-Wide Web text documents and searches. First, the association between video content and tags is learned by training classifiers that map audiovisual content-based features from millions of videos on YouTube.com to the existing uploader-supplied tags for these videos. When a new video is uploaded, the labels provided by these classifiers are used to automatically suggest tags deemed relevant to the video. Our system has learned a vocabulary of over 20,000 tags. Second, we mined large volumes of Web pages and search queries to discover a set of possible text entity categories and a set of associated is-A relationships that map individual text entities to categories. Finally, we apply these is-A relationships mined from Web text to the tags learned from the audiovisual content of videos to automatically synthesize a reliable set of categories most relevant to videos, along with a mechanism to predict these categories for new uploads. We then present rigorous rating studies that establish that: (a) the average relevance of tags automatically recommended by our system matches the average relevance of the uploader-supplied tags at the same or better coverage and (b) the average precision@K of video categories discovered by our system is 70% with K=5.
We propose a new imaging device called the gigavision camera. The main differences between a conventional and a gigavision camera are that the pixels of the gigavision camera are binary and orders of magnitude smaller. A gigavision camera can be built using standard memory chip technology, where each memory bit is designed to be light sensitive. A conventional gray-level image can be obtained from the binary gigavision image by low-pass filtering and sampling. The main advantage of the gigavision camera is that its response is nonlinear and similar to a logarithmic function, which makes it suitable for acquiring high dynamic range scenes. The larger the number of binary pixels considered, the higher the dynamic range of the gigavision camera will be. In addition, the binary sensor of the gigavision camera can be combined with a lens array in order to realize an extremely thin camera. Due to the small size of the pixels, this design does not require the deconvolution techniques typical of similar systems based on conventional sensors.
Index Terms-high dynamic range imaging, computational photography, thin camera, logarithmic sensor response
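The binary-to-grayscale conversion described above amounts to low-pass filtering the one-bit image and downsampling; a minimal sketch (block averaging with an illustrative 8x8 block size, and a simple Bernoulli model of the binary sensor that is our assumption, not the paper's) is:

```python
import numpy as np

def gigavision_to_gray(binary_img, block=8):
    """Estimate a gray-level image from a binary gigavision image by
    averaging (low-pass filtering) each block of one-bit pixels and
    keeping one value per block (sampling)."""
    H, W = binary_img.shape
    H, W = H - H % block, W - W % block          # crop to full blocks
    b = binary_img[:H, :W].reshape(H // block, block, W // block, block)
    return b.mean(axis=(1, 3))                   # fraction of "on" pixels

# Simulate the binary sensor: each one-bit pixel fires with probability
# given by the underlying light intensity (illustrative model).
rng = np.random.default_rng(1)
intensity = np.tile(np.linspace(0.1, 0.9, 64), (64, 1))  # gradient scene
binary = (rng.random((64, 64)) < intensity).astype(np.uint8)

gray = gigavision_to_gray(binary, block=8)
print(gray.shape)  # one gray value per 8x8 block
```

Averaging more binary pixels per output value reduces the quantization of the estimated gray level, which is one way to see the dynamic-range claim in the abstract.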