A new framework for content-based image retrieval that takes advantage of the source characterization property of a universal source coding scheme is investigated. Based upon a new class of multidimensional incremental parsing algorithms extended from the Lempel-Ziv incremental parsing code, the proposed method captures the occurrence pattern of visual elements in a given image. A linguistic processing technique, the latent semantic analysis (LSA) method, is then employed to identify associative ensembles of visual elements, which lay the foundation for intelligent visual information analysis. In 2-D applications, incremental parsing decomposes an image into elementary patches that differ from the conventional fixed square-block patches. When used for compressive representation, incremental parsing is amenable to schemes that do not rely on average distortion criteria, a departure from conventional vector quantization. We call this methodology a parsed representation. In this article, we present our implementations of an image retrieval system, called IPSILON, with parsed representations induced by different perceptual distortion thresholds. We evaluate the effectiveness of the parsed representations by comparing their performance with that of four other image retrieval systems: one using conventional vector quantization for visual information analysis under the same LSA paradigm, another using SIMPLIcity, a method based upon image segmentation and integrated region matching, and two based upon query-by-semantic-example and query-by-visual-example, respectively. The first two were tested on 20,000 images of natural scenes; the other two were tested on a subset of those images. The experimental results show that the proposed parsed representation efficiently captures the salient features of visual images and that the IPSILON systems outperform the other systems in terms of retrieval precision and distortion robustness.
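To make the LSA step concrete, here is a minimal sketch (not the authors' implementation; the matrix layout, function names, and parameter `k` are assumptions, since the abstract gives no implementation details): the occurrence counts of parsed visual elements across images form an element-by-image matrix, a truncated SVD projects images into a latent semantic space, and retrieval ranks images by cosine similarity.

```python
# Hypothetical sketch of LSA-based retrieval over visual-element counts.
import numpy as np

def lsa_image_space(occurrence, k=50):
    """Project an element-by-image occurrence matrix into a k-dim latent space.

    occurrence: (num_elements, num_images) count matrix; TF-IDF-style
    weighting is often applied first, but plain counts are used here.
    """
    u, s, vt = np.linalg.svd(occurrence, full_matrices=False)
    # Each row of the result is one image's coordinates in the latent space.
    return (np.diag(s[:k]) @ vt[:k]).T  # shape: (num_images, k)

def retrieve(query_vec, image_vecs, top=5):
    """Rank images by cosine similarity to a query's latent vector."""
    sims = image_vecs @ query_vec / (
        np.linalg.norm(image_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    return np.argsort(-sims)[:top]
```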
Most recently proposed deep learning-based speech enhancement techniques have treated the neural network architecture as a black box. However, it is often beneficial to understand what kinds of hidden representations the model has learned. Since real-world speech data are drawn from a generative process involving multiple entangled factors, disentangling the speech factor can encourage the trained model to achieve better speech enhancement performance. With the recent success of neural networks in learning disentangled representations, we explore a framework for disentangling speech and noise, which has not been exploited in conventional speech enhancement algorithms. In this work, we propose a novel noise-invariant speech enhancement method that manipulates the latent features to separate the speech and noise features in the intermediate layers using an adversarial training scheme. To compare the performance of the proposed method with conventional algorithms, we conducted experiments under both matched and mismatched noise conditions using the TIMIT and TSP speech datasets. Experimental results show that our model successfully disentangles the speech and noise latent features. Consequently, the proposed model not only achieves better enhancement performance but also exhibits a more robust noise-invariant property than conventional speech enhancement techniques.
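One common way to realize such adversarial disentangling is a gradient-reversal layer. The sketch below is a generic illustration under that assumption, not the authors' architecture; all module names, feature dimensions, and the number of noise classes are made up for the example. The encoder is trained so that a noise-type discriminator cannot predict the noise condition from the latent features, while the decoder still reconstructs the enhanced spectrum.

```python
# Minimal PyTorch sketch of adversarial noise-invariant training.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

class NoiseInvariantEnhancer(nn.Module):
    def __init__(self, dim=257, latent=128, n_noise_types=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, latent), nn.ReLU())
        self.decoder = nn.Linear(latent, dim)               # enhanced spectrum
        self.noise_clf = nn.Linear(latent, n_noise_types)   # adversary

    def forward(self, noisy, lam=1.0):
        z = self.encoder(noisy)
        enhanced = self.decoder(z)
        # Reversed gradient pushes z to carry no noise-type information.
        noise_logits = self.noise_clf(GradReverse.apply(z, lam))
        return enhanced, noise_logits

# Training would combine a reconstruction loss on `enhanced` with a
# cross-entropy loss on `noise_logits`; the gradient reversal makes the
# latent features noise-invariant while preserving the speech content.
```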
Most full-reference fidelity/quality metrics compare the original image to a distorted image at the same resolution, assuming a fixed viewing condition. However, in many applications, such as video streaming, the diversity of channel capacities and display devices means that the viewing distance and the spatiotemporal resolution of the displayed signal may be adapted in order to optimize the perceived signal quality. For example, in low-bitrate coding applications an observer may prefer to reduce the resolution or increase the viewing distance to reduce the visibility of the compression artifacts. The tradeoff between resolution/viewing conditions and the visibility of compression artifacts requires new approaches to the evaluation of image quality that account for both image distortions and image size. To better understand such tradeoffs, we conducted subjective tests using two representative still image coders, JPEG and JPEG 2000. Our results indicate that an observer would indeed prefer a lower spatial resolution (at a fixed viewing distance) in order to reduce the visibility of the compression artifacts, but not all the way to the point where the artifacts are completely invisible. Moreover, the observer is willing to accept more artifacts as the image size decreases. The subjective test results we report can be used to select viewing conditions for coding applications. They also set the stage for the development of novel fidelity metrics. The focus of this paper is on still images, but similar tradeoffs are expected to apply to video.
Index Terms: Scalability, image quality, image fidelity, noise visibility, just noticeable distortion, JND, human visual perception.
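The resolution/viewing-distance tradeoff can be made concrete with the standard pixels-per-degree relation; this is our own illustration of the geometry, not a computation from the paper, and the display parameters are made up. Detail that falls beyond the eye's angular resolution, including coding artifacts, becomes invisible, so increasing the viewing distance acts much like reducing the displayed resolution.

```python
# Illustration: pixels subtended per degree of visual angle.
import math

def pixels_per_degree(viewing_distance_mm, pixel_pitch_mm):
    """Pixels covered by one degree of visual angle (small-angle approx.)."""
    return (math.pi / 180.0) * viewing_distance_mm / pixel_pitch_mm

# Doubling the viewing distance doubles pixels per degree, pushing
# high-frequency compression artifacts past the eye's resolution limit.
print(pixels_per_degree(600, 0.25))   # ~41.9 ppd at 60 cm
print(pixels_per_degree(1200, 0.25))  # ~83.8 ppd at 120 cm
```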
We apply pattern recognition techniques to enhance the robustness of moment-invariant-based image classifiers. Moment invariants exhibit variations under transformations that do not preserve the original image function, such as geometrical transformations involving interpolation. These variations degrade classifier performance through errors in the nearest-neighbor search stage. We propose the use of Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) to alleviate these variations and enhance the robustness of classification. We demonstrate the improved performance in image registration applications under spatial scaling and rotation transformations.
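A minimal sketch of one plausible realization of this pipeline follows. The use of Hu moment invariants is an assumption (the abstract does not name a specific moment family), as are the component counts: moment features are projected by PCA and LDA to suppress transformation-induced variation before the nearest-neighbor classification.

```python
# Hypothetical sketch: moment-invariant features + PCA/LDA + 1-NN.
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def hu_features(gray_image):
    """7 Hu moment invariants; log-scaled for numerical stability."""
    hu = cv2.HuMoments(cv2.moments(gray_image)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

# X: (n_samples, 7) Hu features from training images; y: class labels.
# PCA removes low-variance noise directions; LDA projects onto directions
# that separate classes, reducing the effect of interpolation-induced
# feature variation on the nearest-neighbor search.
clf = make_pipeline(PCA(n_components=5),
                    LinearDiscriminantAnalysis(),
                    KNeighborsClassifier(n_neighbors=1))
# clf.fit(X, y); clf.predict(hu_features(test_image)[None, :])
```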