We perform compressed pattern matching in Huffman-encoded texts. A modified Knuth-Morris-Pratt (KMP) algorithm is used to overcome the problem of false matches, i.e., an occurrence of the encoded pattern in the encoded text that does not correspond to an occurrence of the pattern itself in the original text. We propose a bitwise KMP algorithm that can shift one extra bit in the case of a mismatch: since the alphabet is binary, a mismatched text bit is necessarily the complement of the pattern bit it was compared against. To avoid processing any bit of the encoded text more than once, a preprocessed table determines how far to back up when a mismatch is detected; it is defined so that the encoded pattern is always aligned with the start of a codeword in the encoded text. We combine our KMP algorithm with two Huffman decoding algorithms that handle more than a single bit per machine operation: skeleton trees, defined by Klein [1], and numerical comparisons between special canonical values and portions of a sliding window, presented by Moffat and Turpin [3]. We call the combined algorithms sk-kmp and win-kmp, respectively. Comparing our algorithms with the cgrep of Moura et al., the KMP variants are faster than the methods corresponding to "decompress and search" but slower than cgrep. However, when compression performance is important, or when one does not want to re-compress Huffman-encoded files in order to use cgrep, the proposed algorithms are the better choice.
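At its core, the bitwise matcher underlying sk-kmp and win-kmp is KMP run over a stream of bits. The following is a minimal Python sketch of that core only, assuming the encoded pattern and text are given as '0'/'1' strings; the paper's codeword-alignment back-up table and the extra one-bit shift on mismatch are not reproduced here, so every reported hit would still need a codeword-boundary check to rule out false matches.

```python
# Minimal sketch (not the paper's exact algorithm): classic KMP specialized
# to a binary alphabet. A hit that starts mid-codeword in the encoded text
# is a false match in the original text, which is what the paper's
# codeword-aligned back-up table is designed to exclude.

def build_failure(pattern: str) -> list[int]:
    """fail[i] = length of the longest proper border of pattern[:i+1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def bit_kmp(text: str, pattern: str):
    """Yield every bit offset where pattern occurs in text."""
    fail = build_failure(pattern)
    k = 0
    for i, bit in enumerate(text):
        while k > 0 and bit != pattern[k]:
            k = fail[k - 1]
        if bit == pattern[k]:
            k += 1
        if k == len(pattern):
            yield i - len(pattern) + 1  # candidate match; may be false
            k = fail[k - 1]             # keep scanning

print(list(bit_kmp("0110100110", "1001")))  # -> [4]
```

Because each text bit is examined once and the failure function never moves the scan backwards, the search runs in time linear in the length of the encoded text, which is the property the preprocessed back-up table preserves.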
We present a new, lower-complexity approach to content-based image retrieval based on a relative-compressibility similarity measure using VQ codebooks, employing feature vectors based on color and position. In previous work we developed a system whose feature vectors combine color and position. In this paper, we present a new approach that decouples color and position, realized as two methods. The first trains separate codebooks for color and position features, eliminating the need for potentially application-specific feature weightings during training. The second method achieves nearly the same performance at greatly reduced complexity by partitioning images into regions and training high-rate TSVQ codebooks for each region (i.e., position information is made implicit). Features extracted from query regions are encoded with the corresponding database region codebooks. The maximum number of codewords that a database region codebook may contain is determined at runtime as a function of the query features, and region codebooks are pruned accordingly before encoding query features. Experiments on the COREL image database show that this new approach provides almost the same retrieval precision as our previous method of jointly trained codebooks (and an improvement over earlier methods) at much lower complexity.

Background

With the recent proliferation of digital images, there is a need for information systems that can organize and store images using models that support content-based queries. In the query-by-example setting (Eakins and Graham [4]), a user presents the system with a query image and the system responds by retrieving a set of database images with (visually) similar content. Given the discriminative power of color features and the simplicity of the histogram model, global color histograms, which are relatively invariant to spatial transformations such as translation and rotation, have been effectively used.
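To make the region-based method concrete, here is a minimal sketch under simplifying assumptions: a plain (non-tree-structured) codebook per region rather than TSVQ, the runtime pruning step omitted, and all names hypothetical. It scores a database image by how well its region codebooks quantize the query's region features, which is the relative-compressibility idea: lower encoding distortion means the database codebooks "compress" the query well, i.e., the images are similar.

```python
# Sketch of region-codebook scoring (hypothetical interfaces, not the
# paper's implementation). Each database image is represented by one
# codebook per image region; a query is ranked by the average distortion
# of encoding its region features with the matching region codebooks.
import numpy as np

def region_distortion(features: np.ndarray, codebook: np.ndarray) -> float:
    """Mean squared error of encoding each feature with its nearest codeword."""
    # pairwise squared distances, shape (num_features, num_codewords)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).mean())

def score_image(query_regions, db_codebooks) -> float:
    """Lower score = query compresses better under this image's codebooks."""
    return sum(region_distortion(f, cb)
               for f, cb in zip(query_regions, db_codebooks)) / len(db_codebooks)

# Toy usage: 4 regions, 3-D color features, 8 codewords per region codebook.
rng = np.random.default_rng(0)
query_regions = [rng.random((16, 3)) for _ in range(4)]
db_codebooks = [rng.random((8, 3)) for _ in range(4)]
print(score_image(query_regions, db_codebooks))
```

Ranking the database by this score and returning the lowest-scoring images is the retrieval step; the paper's runtime bound on codewords per region codebook would cap the cost of the nearest-codeword search inside region_distortion.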
Abstract: We present tools that can be used within a larger system referred to as a passive assistant. The system receives information from a mobile device, as well as from an image database such as Google Street View, and employs image processing to provide useful information about a local urban environment to a user who is visually impaired. The first stage acquires and computes accurate location information, the second stage performs texture and color analysis of a scene, and the third stage provides specific object recognition and navigation information. The second and third stages rely on compression-based tools (dimensionality reduction, vector quantization, and coding) that are enhanced by knowledge of the (approximate) location of objects.
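As an illustration only, a skeleton of the three-stage flow might look like the following; every interface here is hypothetical, invented for the sketch, and the stage bodies are placeholders. The point it shows is the data dependency: the location estimate from stage 1 is passed as a prior into the compression-based analysis of stages 2 and 3.

```python
# Hypothetical skeleton of the three-stage passive-assistant pipeline
# (not the authors' code): localize, analyze the scene, then produce
# recognition and navigation output, each stage feeding the next.
from dataclasses import dataclass
from typing import Any

@dataclass
class Location:
    lat: float
    lon: float
    heading_deg: float

def localize(sensor_frame: Any, street_view_db: Any) -> Location:
    """Stage 1: refine the device's rough GPS fix against Street View imagery."""
    return Location(40.6892, -74.0445, 90.0)  # placeholder estimate

def analyze_scene(image: Any, where: Location) -> dict:
    """Stage 2: texture/color analysis, biased by objects expected near 'where'."""
    return {"textures": [], "dominant_colors": []}  # placeholder analysis

def assist(image: Any, scene: dict, where: Location) -> str:
    """Stage 3: object recognition and a navigation cue for the user."""
    return "crosswalk ahead"  # placeholder cue

def run_pipeline(sensor_frame: Any, image: Any, street_view_db: Any) -> str:
    where = localize(sensor_frame, street_view_db)
    scene = analyze_scene(image, where)
    return assist(image, scene, where)
```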