We present a general approach for the hierarchical segmentation and labeling of document layout structures. This approach models document layout as a grammar and performs a global search for the optimal parse based on a grammatical cost function. Our contribution is to utilize machine learning to discriminatively select features and set all parameters in the parsing process. Therefore, and unlike many other approaches for layout analysis, ours can easily adapt itself to a variety of document analysis problems. One need only specify the page grammar and provide a set of correctly labeled pages. We apply this technique to two document image analysis tasks: page layout structure extraction and mathematical expression interpretation. Experiments demonstrate that the learned grammars can be used to extract the document structure in 57 files from the UWIII document image database. We also show that the same framework can be used to automatically interpret printed mathematical expressions so as to recreate the original LaTeX.
Traditional approaches to combining classifiers attempt to improve classification accuracy at the cost of increased processing. They may be viewed as providing an accuracy-speed trade-off: higher accuracy for lower speed. In this paper we present a novel approach to combining multiple classifiers to solve the inverse problem of significantly improving classification speed at the cost of slightly reduced classification accuracy. We propose a cascade architecture for combining classifiers and cast the process of building such a cascade as a search and optimization problem. We present two algorithms, based on steepest descent and dynamic programming, for quickly producing approximate solutions. We also present a simulated annealing algorithm and a depth-first-search algorithm for finding optimal solutions. Results on handwritten optical character recognition indicate that (a) a speedup of 4-9 times is possible with no increase in error and (b) speedups of up to 15 times are possible when twice as many errors can be tolerated.
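To make the cascade idea concrete, here is a minimal sketch of how a classifier cascade might run at prediction time. It is not the authors' implementation: the stage format `(classifier, threshold, cost)`, the confidence-threshold acceptance rule, and the toy classifiers `fast_sign` and `slow_sign` are all assumptions made for illustration. The abstract's search algorithms for building the cascade are not shown.

```python
from typing import Callable, List, Tuple

# Hypothetical classifier interface: input -> (label, confidence in [0, 1]).
Classifier = Callable[[float], Tuple[str, float]]
Stage = Tuple[Classifier, float, float]  # (classifier, threshold, cost)


def cascade_predict(x: float, stages: List[Stage]) -> Tuple[str, float]:
    """Run stages in order; stop at the first confident one.

    Returns the accepted label and the total cost of the stages that ran.
    The last stage always decides, whatever its confidence.
    """
    total_cost = 0.0
    label = ""
    for i, (clf, threshold, cost) in enumerate(stages):
        label, conf = clf(x)
        total_cost += cost
        if conf >= threshold or i == len(stages) - 1:
            break
    return label, total_cost


def fast_sign(x: float) -> Tuple[str, float]:
    # Toy cheap classifier: confident only when x is far from zero.
    return ("pos" if x > 0 else "neg", min(1.0, abs(x)))


def slow_sign(x: float) -> Tuple[str, float]:
    # Toy expensive classifier: always fully confident.
    return ("pos" if x > 0 else "neg", 1.0)


# Cheap stage first (cost 1), expensive fallback last (cost 10).
stages = [(fast_sign, 0.8, 1.0), (slow_sign, 0.0, 10.0)]
easy = cascade_predict(2.0, stages)   # accepted by the cheap stage
hard = cascade_predict(0.1, stages)   # falls through to the expensive stage
```

In this sketch, easy inputs pay only the cheap stage's cost, which is where a speedup would come from. Choosing the stage order and thresholds is the optimization problem the abstract refers to.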
The Viterbi algorithm is an efficient and optimal method for decoding linear-chain Markov models. However, the entire input sequence must be observed before the label for any time step can be generated, so Viterbi cannot be applied directly to online, interactive, or streaming scenarios without incurring significant (possibly unbounded) latency. A widely used approach is to break the input stream into fixed-size windows and apply Viterbi to each window. Larger windows lead to higher accuracy but also to higher latency. We propose several alternatives to fixed-size window decoding. These approaches compute a certainty measure on predicted labels that allows us to trade off latency for expected accuracy dynamically, without having to choose a fixed window size up front. Not surprisingly, this more principled approach gives a substantial improvement over choosing a fixed window. We show the effectiveness of the approach on the task of spotting semistructured information in large documents. Compared to full Viterbi, the approach suffers a 0.1 percent error degradation with an average latency of 2.6 time steps (versus the potentially infinite latency of Viterbi). Compared to fixed-window Viterbi, we achieve a 40x reduction in error and a 6x reduction in latency.
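As a point of reference, the following sketch shows full-sequence Viterbi and the fixed-window baseline described above. It does not implement the certainty-based algorithms the abstract proposes. The function names, the dictionary-based model format, and the toy two-state model are assumptions for illustration. Restarting each window from the start distribution is a simplification of this sketch, not something the abstract specifies.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state sequence for obs; needs the whole sequence up front."""
    # score[s]: best log-probability of any path ending in state s.
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def windowed_viterbi(obs, window, states, log_start, log_trans, log_emit):
    """Fixed-window baseline: decode each window independently.

    Latency is bounded by the window size, but context across window
    boundaries is lost (each window restarts from log_start).
    """
    out = []
    for i in range(0, len(obs), window):
        chunk = obs[i:i + window]
        out.extend(viterbi(chunk, states, log_start, log_trans, log_emit))
    return out


def _log(table):
    # Convert a nested dict of probabilities to log-probabilities.
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Toy two-state model (illustrative values only).
states = ["Healthy", "Fever"]
log_start = {"Healthy": math.log(0.6), "Fever": math.log(0.4)}
log_trans = _log({"Healthy": {"Healthy": 0.7, "Fever": 0.3},
                  "Fever": {"Healthy": 0.4, "Fever": 0.6}})
log_emit = _log({"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
                 "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}})

obs = ["normal", "cold", "dizzy"]
full = viterbi(obs, states, log_start, log_trans, log_emit)
windowed = windowed_viterbi(obs, 2, states, log_start, log_trans, log_emit)
```

The window size here is the single knob that trades latency against accuracy. The abstract's contribution is to replace that fixed choice with a dynamic, certainty-driven decision.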
Pen computing devices provide a natural interface for annotating documents with freeform digital ink. However, digital ink annotations are usually larger and sloppier than real ink annotations on paper. We present DIZI, a focus+context interface that zooms up a region of the underlying document for inking. Users write in the zoomed region at a comfortable size for the device. When the zoom region is shrunk back to normal page size, the digital ink shrinks to an appropriate size for the underlying document. The zoom region covers only a small portion of the screen so that users can always see the overall context of the underlying document. We describe several techniques for fluidly moving the zoom region to navigate the document. We show that DIZI allows users to create digital ink annotations that more closely mimic the look of real ink annotations on physical paper.