This paper presents efficient algorithms for determining the language classification of machine-generated documents without requiring the identification of individual characters. Such algorithms may be useful for sorting facsimile documents as they arrive so that appropriate routing and secondary analysis, which may include OCR, can be selected for each document. They may also prove useful as a component of a content-addressable document access system. There have been numerous reported efforts that attempt to segment printed documents into homogeneous regions using Hough transforms, hidden Markov models, morphological filtering, and neural networks. However, language identification can be accomplished without explicit segmentation using the less computationally intensive methods described here.
Abstract: The term 'biometrics' refers to a measurable characteristic that is unique to an individual, such as fingerprints, facial structure, the iris, or a person's voice. This paper presents a fingerprint-based biometric system that records a person's attendance using a hand-held fingerprint sensor. The experimental results suggest that the fingerprint-based attendance system can overcome many fraudulent practices and improves the reliability of attendance records.
The summary of a document contains the words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model used to understand document semantics by deriving a semantic structure from patterns of word correlations in the document. When LSA is used to capture semantics from summaries, it performs quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. Taking advantage of the model being language independent, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. The first stage of remodeling provides supplementary information, such as document category and domain information. The second stage uses a supervised term weighting measure in the process. The remodeled LSA's performance is evaluated empirically in a document classification application by comparing its classification accuracy to that of plain LSA. The remodeled LSA improves on the plain model by 4.7% to 6.2%. The results suggest that summaries of documents efficiently capture the semantic structure of documents and are an alternative to full-length documents for understanding document semantics.
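As a rough illustration of the evaluation setup described above, the following Python sketch builds a plain-LSA classifier (a tf-idf term-document matrix followed by truncated SVD) and uses it for document classification, which is the baseline the remodeled variants are compared against. The summaries, labels, and dimensionality below are placeholders rather than data or parameters from the paper, and the remodeled stages would swap in a supervised term weighting measure and append category/domain information in place of plain tf-idf.

```python
# Minimal sketch of plain LSA feeding a document classifier.
# The toy documents, labels, and n_components are illustrative placeholders;
# the paper works with Hindi text summaries and a supervised term weighting.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder summaries and category labels.
summaries = [
    "cricket team wins the final match",
    "batsman scores a century in the series",
    "parliament passes the new budget bill",
    "minister announces election reforms",
]
labels = ["sports", "sports", "politics", "politics"]

plain_lsa_classifier = make_pipeline(
    TfidfVectorizer(),             # term-document matrix with (unsupervised) tf-idf weights
    TruncatedSVD(n_components=2),  # LSA: project onto a low-rank latent semantic space
    LogisticRegression(),          # downstream classifier used to measure accuracy
)

plain_lsa_classifier.fit(summaries, labels)
print(plain_lsa_classifier.predict(["team celebrates the championship trophy"]))
```

Comparing the accuracy of this baseline against the same pipeline with supervised term weights and added category/domain features mirrors, in outline, the comparison the abstract reports.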