We present a technique for building a three-dimensional description of a static scene from a dense sequence of images. These images are taken in such rapid succession that they form a solid block of data in which the temporal continuity from image to image is approximately equal to the spatial continuity in an individual image. The technique utilizes knowledge of the camera motion to form and analyze slices of this solid. These slices directly encode not only the three-dimensional positions of objects, but also such spatiotemporal events as the occlusion of one object by another. For straight-line camera motions, these slices have a simple linear structure that makes them easier to analyze. The analysis computes the three-dimensional positions of object features, marks occlusion boundaries on the objects, and builds a three-dimensional map of "free space." In our article, we first describe the application of this technique to a simple camera motion, and then show how projective duality is used to extend the analysis to a wider class of camera motions and object types that include curved and moving objects.
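To make the linear structure concrete, here is a minimal Python/NumPy sketch (not from the article) of cutting an epipolar-plane slice from the image block and recovering depth from the slope of a feature's trace, assuming pure lateral camera translation at constant speed; the parameter names are illustrative.

```python
import numpy as np

def epi_slice(image_stack, row):
    """Extract an epipolar-plane image: one scanline from each frame,
    stacked over time. image_stack has shape (T, H, W)."""
    return image_stack[:, row, :]          # shape (T, W)

def depth_from_slope(slope_px_per_frame, focal_px, cam_speed_per_frame):
    """For pure lateral camera translation at speed v, a point at depth Z
    projects to x = f*(X - v*t)/Z, so its EPI trace has slope
    dx/dt = -f*v/Z and Z = f*v / |dx/dt|. Parameter names are hypothetical."""
    return focal_px * cam_speed_per_frame / abs(slope_px_per_frame)

# Example: a feature drifting 2 px/frame, camera moving 0.01 m/frame,
# focal length 800 px -> depth of 4 m.
print(depth_from_slope(2.0, focal_px=800.0, cam_speed_per_frame=0.01))
```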
Coliseum is a multiuser immersive remote teleconferencing system designed to provide collaborative workers the experience of face-to-face meetings from their desktops. Five cameras are attached to each PC display and directed at the participant. From these video streams, view synthesis methods produce arbitrary-perspective renderings of the participant and transmit them to others at interactive rates, currently about 15 frames per second. Combining these renderings in a shared synthetic environment gives the appearance of having all participants interacting in a common space. In this way, Coliseum enables users to share a virtual world, with acquired-image renderings of their appearance replacing the synthetic representations provided by more conventional avatar-populated virtual worlds. The system supports virtual mobility (participants may move around the shared space) and reciprocal gaze, and has been demonstrated in collaborative sessions of up to ten Coliseum workstations, including sessions spanning two continents.

Coliseum is a complex software system that pushes commodity computing resources to the limit. We set out to measure resource usage (network, CPU, memory, and disk) to uncover bottlenecks and guide enhancement and control of system performance. Latency is a key component of Quality of Experience for video conferencing, and we show how each part of the system (cameras, image processing, networking, and display) contributes to total latency. Performance measurement is as complex as the system to which it is applied. We describe several techniques for estimating performance, from direct lightweight instrumentation to realistic end-to-end measures that mimic actual user experience, and explain how they can be used to improve system performance for Coliseum and other network applications. This article summarizes the Coliseum technology and reports on issues related to its performance: its measurement, enhancement, and control.
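As an illustration of the kind of direct lightweight instrumentation described above, the following Python sketch timestamps a frame as it passes each pipeline stage and reports per-stage and end-to-end latency; the class and stage names are hypothetical, not Coliseum's actual API.

```python
import time
from collections import defaultdict

class StageTimer:
    """Minimal per-stage latency instrumentation: record a wall-clock
    timestamp as a frame passes each pipeline stage, then report the
    latency between consecutive stages and end to end."""
    def __init__(self):
        self.samples = defaultdict(list)

    def stamp(self, frame_id, stage):
        self.samples[frame_id].append((stage, time.perf_counter()))

    def report(self, frame_id):
        marks = self.samples[frame_id]
        for (s0, t0), (s1, t1) in zip(marks, marks[1:]):
            print(f"{s0} -> {s1}: {(t1 - t0) * 1e3:.1f} ms")
        print(f"end-to-end: {(marks[-1][1] - marks[0][1]) * 1e3:.1f} ms")

# Usage with illustrative stage names:
timer = StageTimer()
timer.stamp(0, "capture"); timer.stamp(0, "reconstruct")
timer.stamp(0, "network"); timer.stamp(0, "display")
timer.report(0)
```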
Images of a scene captured with multiple cameras will have different color values because of variations in color rendering across devices. We present a method to accurately retrieve color information from uncalibrated images taken under uncontrolled lighting conditions with an unknown device and no access to raw data, but with a limited number of reference colors in the scene. The method is used to assess skin tones. A subject is imaged with a calibration target. The target is extracted and its color values are used to compute a color correction transform that is applied to the entire image. We establish that the best mapping is done using a target consisting of skin-colored patches representing the whole range of human skin colors. We show that color information extracted from images is well correlated with color data derived from spectral measurements of skin. We also show that skin color can be consistently measured across cameras with different color rendering and resolutions ranging from 0.1 to 4.0 megapixels.
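As a sketch of how such a correction can be computed, the following Python/NumPy code fits an affine transform by least squares from measured patch colors to their reference values and applies it to an image; this is a generic linear mapping under our own assumptions, not necessarily the exact transform used in the paper.

```python
import numpy as np

def fit_color_transform(measured, reference):
    """Fit a 3x4 affine color-correction transform by least squares.
    measured, reference: (N, 3) arrays of patch colors, N >= 4."""
    X = np.hstack([measured, np.ones((len(measured), 1))])   # (N, 4)
    M, *_ = np.linalg.lstsq(X, reference, rcond=None)        # (4, 3)
    return M

def apply_color_transform(image, M):
    """Apply the fitted transform to an (H, W, 3) float image in [0, 1]."""
    h, w, _ = image.shape
    X = np.hstack([image.reshape(-1, 3), np.ones((h * w, 1))])
    return np.clip(X @ M, 0.0, 1.0).reshape(h, w, 3)
```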
Little prior image processing work has addressed estimation and classification of skin color in a manner that is independent of camera and illuminant. To this end, we first present new methods for 1) fast, easy-to-use image color correction, with specialization toward skin tones, and 2) fully automated estimation of facial skin color, with robustness to shadows, specularities, and blemishes. Each of these is validated independently against ground truth, and then combined with a classification method that successfully discriminates skin color across a population of people imaged with several different cameras. We also evaluate the effects of image quality and various algorithmic choices on our classification performance. We believe our methods are practical for relatively untrained operators, using inexpensive consumer equipment.
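For intuition, here is a hedged sketch of one generic way to estimate facial skin color with some robustness to shadows and specular highlights: trim the extremes of the luminance distribution, then take a median over the surviving pixels. The thresholds are illustrative, and this is not the paper's actual estimator.

```python
import numpy as np

def robust_skin_color(face_pixels, lo=10, hi=90):
    """Estimate facial skin color from an (N, 3) array of face-region
    RGB pixels. Shadowed and specular pixels are rejected by trimming
    the luminance distribution to the [lo, hi] percentile band
    (illustrative values), then the median of the survivors is taken."""
    luma = face_pixels @ np.array([0.299, 0.587, 0.114])
    lo_t, hi_t = np.percentile(luma, [lo, hi])
    keep = (luma >= lo_t) & (luma <= hi_t)
    return np.median(face_pixels[keep], axis=0)
```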
The previous implementations of our Epipolar-Plane Image Analysis mapping technique demonstrated the feasibility and benefits of the approach, but were carried out for restricted camera geometries. The question of more general geometries made the technique's utility for autonomous navigation uncertain. We have developed a generalization of our analysis that (a) enables varying view direction, including variation over time; (b) provides three-dimensional connectivity information for building coherent spatial descriptions of observed objects; and (c) operates sequentially, allowing initiation and refinement of scene feature estimates while the sensor is in motion. To implement this generalization, it was necessary to develop an explicit description of the evolution of images over time. We have achieved this by building a process that creates a set of two-dimensional manifolds defined at the zeros of a three-dimensional spatiotemporal Laplacian. These manifolds represent explicitly both the spatial and temporal structure of the temporally evolving imagery, and we term them spatiotemporal surfaces. The surfaces are constructed incrementally, as the images are acquired. We describe a tracking mechanism that operates locally on these evolving surfaces in carrying out three-dimensional scene reconstruction.
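As a rough illustration of locating such surfaces, the following Python/SciPy sketch marks voxels where a 3D Laplacian-of-Gaussian response of the image volume changes sign along any axis; this batch computation only approximates the incremental construction described above, and the smoothing scale is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def spatiotemporal_zero_crossings(volume, sigma=1.5):
    """Sample the spatiotemporal surfaces of a video volume (T, H, W):
    mark voxels where the 3D Laplacian-of-Gaussian response changes
    sign along any axis. sigma is an illustrative smoothing scale."""
    log = gaussian_laplace(volume.astype(np.float64), sigma=sigma)
    s = np.sign(log)
    zc = np.zeros(volume.shape, dtype=bool)
    for axis in range(3):
        flips = np.diff(s, axis=axis) != 0      # sign change along axis
        idx = [slice(None)] * 3
        idx[axis] = slice(0, -1)
        zc[tuple(idx)] |= flips
    return zc
```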