Abstract: An important aspect of examining printed documents for potential forgery and copyright infringement is identifying the source printer, which can help trace leaks and detect forged documents. This paper proposes a system that classifies the source printer from scanned images of printed documents using all the printed letters simultaneously. The system uses features based on local texture patterns and a single classifier for all the printed letters. Letters are extracted from scanned images using connected component analysis followed by morphological filtering, without the need for OCR. Each letter is subdivided into a flat region and an edge region, and local tetra patterns are estimated separately for the two regions. A strategically constructed pooling technique is used to extract the final feature vectors. The proposed method has been tested on both a publicly available dataset of 10 printers and a new dataset of 18 printers, scanned at resolutions of 600 dpi and 300 dpi and printed in four different fonts. The results indicate that the proposed method is shape independent: using a single classifier, it outperforms existing handcrafted feature-based methods and, because it uses all the printed letters, needs a much smaller number of training pages.
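The OCR-free letter extraction step described above can be illustrated with a minimal sketch. It assumes an already binarized page given as a list of rows (1 = ink, 0 = paper). The function names (`connected_components`, `bounding_box`, `extract_letters`) and the `min_pixels` threshold are hypothetical, and a simple size filter stands in for the morphological filtering mentioned in the abstract, so this is not the authors' implementation.

```python
from collections import deque

def connected_components(binary):
    """Group 8-connected ink pixels (value 1) into components.

    Returns a list of components, each a list of (row, col) coordinates,
    in the order their first pixel is met in a row-major scan.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component, inclusive."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)

def extract_letters(binary, min_pixels=3):
    """Keep components with at least min_pixels ink pixels.

    Assumption: the size filter is a stand-in for morphological
    filtering; it only discards small specks.
    """
    return [bounding_box(p) for p in connected_components(binary)
            if len(p) >= min_pixels]

# Toy page: two letter-like blobs and one isolated speck of noise.
PAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for box in extract_letters(PAGE):
        print(box)
```

On the toy page, the two blobs are kept as candidate letters and the single-pixel speck is dropped. In a real pipeline, each bounding box would then be cropped from the grayscale scan before texture features are computed.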
Even in this digital era, the printed archive remains the convention. Printed documents are used in many critical domains, such as contract papers, legal tenders and proof-of-identity documents. As more advanced printing, scanning and image editing techniques become available, forgeries of these legal tenders pose a serious threat. The ability to easily and reliably identify the source printer of a printed document can go a long way toward reducing this threat. During printing, the printer hardware introduces certain distortions in the locations and shapes of printed characters that are invisible to the naked eye. These distortions are referred to as geometric distortions; their profile (or signature) is generally unique to each printer and can be used for printer classification. This paper proposes a set of features for characterizing text-line-level geometric distortions, referred to as geometric distortion signatures, and presents a novel system that uses them to identify the origin of a printed document. Detailed experiments performed on a set of thirteen printers demonstrate that the proposed system achieves state-of-the-art performance and gives much higher accuracy when the training set is small. For four training and six test pages in three different fonts, the proposed method gives 99% classification accuracy.
Knowledge of the source printer can help in printed text document authentication and copyright ownership, and can provide important clues about the author of a fraudulent document along with his/her potential means and motives. The development of automated systems that use image processing techniques to classify printed documents by their source printer is gaining a lot of attention in multimedia forensics. Currently, state-of-the-art systems require that the font of the letters in test documents of unknown origin also be present in the documents used for training the classifier. In this work, we attempt to take the first step towards overcoming this limitation. Specifically, we introduce a novel printer-specific local texture descriptor. The highlight of our technique is an encoding and regrouping strategy based on small linear-shaped structures composed of pixels having similar intensity and gradient. The results of experiments performed on two separate datasets show that: 1) on a publicly available dataset, the proposed method outperforms state-of-the-art algorithms for characters printed in the same font, and 2) on another dataset containing documents printed in four different fonts, the proposed method correctly classifies all test samples when sufficient training data is available in the same-font setup. In addition, it outperforms state-of-the-art methods in cross-font experiments. Moreover, it reduces the confusion between printers of the same brand and model.