There are many historical manuscripts written in a single hand which it would be useful to index. Examples include the early Presidential papers at the Library of Congress and the collected works of W. B. DuBois at the library of the University of Massachusetts. The standard technique for indexing documents is to scan them in, convert them to machine-readable form (ASCII) using Optical Character Recognition (OCR), and then index them using a text retrieval engine. However, OCR does not work well on handwriting. Here an alternative scheme is proposed for indexing such texts. Each page of the document is segmented into words. The images of the words are then matched against each other to create equivalence classes (each equivalence class contains multiple instances of the same word). The user then provides ASCII equivalents for, say, the top 2000 equivalence classes. The current paper deals with the matching aspects of this process. Due to variations in even a single person's handwriting, it is expected that the matching will be the most difficult step in the whole process. Two different techniques for matching words are discussed. The first method, based on Euclidean distance mapping, matches words assuming that the transformation between the words may be modelled by a translation (shift). The second method, based on an algorithm developed by Scott and Longuet-Higgins, matches words assuming that the transformation between the words may be modelled by an affine transform. Experiments are shown demonstrating the feasibility of the approach for indexing handwriting.
There are many historical manuscripts written in a single hand which it would be useful to index. Examples include the W. B. DuBois collection at the University of Massachusetts and the early Presidential libraries at the Library of Congress. The standard technique for indexing documents is to scan them in, convert them to machine-readable form (ASCII) using Optical Character Recognition (OCR), and then index them using a text retrieval engine. However, OCR does not work well on handwriting. Here an alternative scheme is proposed for indexing such texts. Each page of the document is segmented into words. The images of the words are then matched against each other to create equivalence classes (each equivalence class contains multiple instances of the same word). The user then provides ASCII equivalents for, say, the top 2000 equivalence classes. The current paper deals with the matching aspects of this process. Due to variations in even a single person's handwriting, it is expected that the matching will be the most difficult step in the whole process. A matching technique based on Euclidean distance mapping is discussed. Experiments are shown demonstrating the feasibility of the approach.
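The grouping step described in the abstracts above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each word image is already segmented and binarized into a set of (row, column) ink pixels, it scores a match with a plain XOR pixel count rather than the Euclidean distance mapping the abstract names, and it forms classes greedily against one representative per class. The function names, `max_shift`, and `threshold` are all hypothetical.

```python
def best_shift_mismatch(a, b, max_shift=2):
    """Smallest XOR pixel count between ink sets a and b over small shifts of b.

    a and b are sets of (row, col) ink pixels. Only translations are tried,
    which mirrors the translation-only assumption of the first method.
    """
    best = None
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            shifted = {(r + dr, c + dc) for r, c in b}
            cost = len(a ^ shifted)  # pixels present in exactly one image
            if best is None or cost < best:
                best = cost
    return best


def group_words(images, threshold, max_shift=2):
    """Greedily group word images into classes of (assumed) identical words.

    Each image joins the first class whose representative (its first member)
    matches within `threshold` mismatched pixels; otherwise it starts a new
    class. Returns a list of classes, each a list of indices into `images`.
    """
    classes = []
    for i, img in enumerate(images):
        for cls in classes:
            representative = images[cls[0]]
            if best_shift_mismatch(representative, img, max_shift) <= threshold:
                cls.append(i)
                break
        else:
            classes.append([i])
    return classes
```

With real handwriting the threshold would have to tolerate far more variation than a pixel count allows, which is presumably why the papers weight the XOR image by a distance map and also consider affine matching.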
One of the most important applications of the intelligent operation and maintenance of a cloud database is trend prediction of its key performance indicators (KPIs), such as disk use and memory use. We propose a method named AutoPA4DB (Auto Prophet and ARIMA for Database) to predict the trend of the KPIs of the cloud database based on the Prophet model and the ARIMA model. Our AutoPA4DB method includes data preprocessing, model building, parameter tuning, and optimization. We employ the weighted MAPE coverage to measure its accuracy and use 6 industrial datasets including 10 KPIs to compare the AutoPA4DB method with three other time-series trend prediction algorithms. The experimental results show that our AutoPA4DB method performs best in predicting monotonic variation data, e.g., disk use trends. However, it is unstable in predicting oscillatory variation data: it is acceptable in memory use trend prediction but has poor accuracy in predicting trends in the number of database connections.
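The abstract does not define its "weighted MAPE coverage" metric, so the sketch below shows only one common form of weighted mean absolute percentage error, as an assumption rather than the authors' definition. The function name `weighted_mape`, the optional per-point `weights`, and the choice to skip points whose actual value is zero are all hypothetical.

```python
def weighted_mape(actual, predicted, weights=None):
    """Weighted mean absolute percentage error, returned as a fraction.

    Assumed form: sum(w_i * |y_i - yhat_i| / |y_i|) / sum(w_i).
    Points with y_i == 0 are skipped because the percentage error is
    undefined there. With weights=None every point has weight 1.
    """
    if weights is None:
        weights = [1.0] * len(actual)
    if not (len(actual) == len(predicted) == len(weights)):
        raise ValueError("actual, predicted and weights must have equal length")

    weighted_error = 0.0
    total_weight = 0.0
    for y, y_hat, w in zip(actual, predicted, weights):
        if y == 0:
            continue  # percentage error undefined for a zero actual value
        weighted_error += w * abs(y - y_hat) / abs(y)
        total_weight += w

    if total_weight == 0:
        raise ValueError("no usable points (all actual values are zero)")
    return weighted_error / total_weight
```

A weighting scheme lets more recent or more operationally important points count for more in the score; which scheme the paper uses is not stated in the abstract.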