Alexander Bültmann scite author profile

Alexander Bültmann

3 Publications

17 Citation Statements Received

66 Citation Statements Given

Affiliations

Paderborn University

Publications

Search and Modification in Compressed Texts

Böttcher, Bültmann, Hartel (2011)

Text compression techniques like bzip2 cannot search for or update substrings at given positions of a compressed text without first decompressing it. We have developed Indexed Reversible Transformation (IRT), a modified version of the Burrows-Wheeler-Transformation (BWT) that, in combination with run length encoding (RLE) and wavelet trees (WT), allows position-based searching and updating of substrings in compressed texts without prior decompression. As a result, IRT may be useful for a large class of applications that, due to space limitations, prefer to search or modify compressed texts instead of uncompressed texts.
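For background, the following is a minimal sketch of the standard BWT that IRT modifies. It is not IRT itself: it sorts rotations in the usual way and does not reorder word delimiters. The function names `bwt` and `inverse_bwt` and the choice of sentinel character are assumptions made for illustration only.

```python
def bwt(text, sentinel="\0"):
    """Standard Burrows-Wheeler transform via sorted rotations (naive, for illustration)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Sort all cyclic rotations and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="\0"):
    """Invert the transform by repeatedly prepending the last column and re-sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

This naive version needs quadratic space and is only meant to show what the transform computes; practical implementations use suffix arrays.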

Implementing Efficient Updates in Compressed Big Text Databases

Böttcher, Bültmann, Hartel, et al. (2013)

Text compression techniques like bzip2 cannot insert a string into, or delete a string from, a given position of a compressed text without first decompressing it. We present a technique called DICIRT that supports fast insertion into and deletion from compressed texts without full decompression of the compressed text. For inserted fragments up to a size of 8% of the original text size, and for deleted fragments up to 15% of the original text, DICIRT is faster than modifying uncompressed text preceded by a decompression step and followed by a compression step.

Fast Insertion and Deletion in Compressed Texts

Böttcher, Bültmann, Hartel, et al. (2012)

Text compression techniques like bzip2 lack the possibility to delete the n-th word or to insert text before the n-th word of compressed texts without prior decompression of the compressed texts. We present a text compression technique that supports fast insertion into and deletion from compressed texts without full decompression of the compressed text. Our approach combines Indexed Reversible Transformation (IRT) [1], Run-Length-Encoding (RLE), and the Wavelet Tree (WT). For a reasonable size of inserted or deleted texts (more details are given in [2]), our approach is faster than modifying uncompressed text preceded by a decompression step and followed by a compression step.

Let IRT(S) denote the Burrows-Wheeler-Transformation (BWT) applied to a text S according to an ordering relation A_$ that fulfills the following conditions. The lexicographical order of the word delimiters '$' is changed in such a way that all '$' of S get the smallest lexicographical order in A_$, and most important, the order of the word delimiters among themselves is determined by their occurrence in S from left to right. That is, the n-th word delimiter '$' appearing in S gets a smaller lexicographical order in A_$ than the (n+1)-st word delimiter '$'. Furthermore, IRT sorts characters of S according to their prefix (opposed to BWT that sorts them according to their suffix). Thus, in contrast to BWT, the first character of the n-th word of S occurs at position n of IRT(S). This provides a self-index to the first character of each word of S, which allows for the reconstruction of each word individually without retransforming IRT(S) in total [1].

On IRT(S), we apply RLE that returns a bit-stream B and a string R. B is the run-length bit-vector of IRT(S) that contains a 0-bit for each character in IRT(S) that is equal to the previous character and a 1-bit otherwise. R is IRT(S) after reducing each run of equal characters within IRT(S) to one character.
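The RLE step described above can be sketched as follows. This is a minimal illustration under the stated definition of B and R, applied to an arbitrary string rather than to an actual IRT output; the function names `rle_split` and `rle_join` are hypothetical and do not come from the paper.

```python
def rle_split(transformed):
    """Split a string into a run-length bit-vector B and a run-reduced string R.

    B holds a 1-bit where a character differs from its predecessor (a run starts)
    and a 0-bit where it repeats the previous character.
    """
    bits = []
    reduced = []
    previous = None
    for ch in transformed:
        if ch == previous:
            bits.append(0)
        else:
            bits.append(1)
            reduced.append(ch)
        previous = ch
    return bits, "".join(reduced)


def rle_join(bits, reduced):
    """Rebuild the original string from B and R (inverse of rle_split)."""
    out = []
    run_chars = iter(reduced)
    current = None
    for bit in bits:
        if bit == 1:
            current = next(run_chars)  # a new run begins
        out.append(current)
    return "".join(out)
```

For example, `rle_split("aaabccd")` yields the bit-vector `[1, 0, 0, 1, 1, 0, 1]` and the reduced string `"abcd"`.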
The WT W of R stores the bits of the Huffman codes of all characters c_i of R, such that for each c_i, (1) the Huffman code of c_i is stored on a path from the root node to the leaf node representing c_i in W, (2) one bit of the Huffman code of c_i is stored on each node of the path to c_i except for the leaf node, (3) if a 0-bit (1-bit) is stored in a node of W, the path continues with the left (right) sub-tree of W.

To delete the n-th word of S, we search, mark, and remove the bits of its letters from B and W. The search starts at position B[n] and uses Rank and Select functions [2] on B and the nodes of W to proceed to the bits representing the remaining characters of a word. The found bits are marked, and all unmarked bits represent the compressed text after deleting the n-th word of S.

To insert a word before the n-th word of S into the compressed representation (B,W) of S, we could, reverse to deletion, letter by letter, search and mark the insert positions in B and W and insert the appropriate bits, which computes the compressed representation (B2,W2) of the result string S2. However, in...
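The Rank and Select functions mentioned above can be illustrated with a naive sketch on a plain Python list of bits. The names `rank1`, `select1`, and `char_at` are assumptions for illustration; the linear scans here stand in for the constant-time or near-constant-time structures a real implementation would use, and the example operates on a flat string R rather than on the wavelet tree W.

```python
def rank1(bits, i):
    """Number of 1-bits among the first i bits, i.e. in bits[0:i]."""
    return sum(bits[:i])


def select1(bits, k):
    """0-based position of the k-th 1-bit, counting k from 1."""
    seen = 0
    for position, bit in enumerate(bits):
        seen += bit
        if bit == 1 and seen == k:
            return position
    raise ValueError("fewer than k 1-bits in the bit-vector")


def char_at(bits, reduced, i):
    """Character at 0-based position i of the run-length-decoded string.

    The number of runs started up to and including position i
    identifies the run, and hence the character in R.
    """
    return reduced[rank1(bits, i + 1) - 1]
```

With the bit-vector `[1, 0, 0, 1, 1, 0, 1]` and R = `"abcd"`, `char_at` recovers individual characters of the run-length-encoded string without decoding it in full.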
