This paper presents a confidential text documents leakage investigation system, focused on leak channels by documents printing and screen photographing. Internal intruders may print confidential document, take paper copy out of protected perimeter, make document image by scanner and perform anonymous leak. Also, intruders may take a photo of printed confidential document or displayed on workstation screen using personal mobile phone. Described leakage channels are weakly covered by traditional DLP systems that are usually used by enterprises for confidential information leak protection. Digital watermark (DWM) embedding is chosen as a document protection mechanism by implying changing of document image visual representation. In case of confidential document anonymous leak embedded DWM would enable the employee to determine what leak intentionally or by security protocol violation. System architecture consists of different type components. Employees’ workstation components provide DWM embedding into documents, which are sent for printing or displayed on screen. Information about watermark embedding is sent to a remote server that aggregates marking facts and provides it to security officer during investigation. Text document marking algorithms are developed, which embed DWM into printed and displayed on screen documents. Screen watermark is embedded into interline space interval, information is encoded by sequence of lightened and darkened spaces. DWM embedding into printed documents is implemented by three algorithms: horizontal and vertical shift based, font fragments brightness changing based. Algorithms testing methodology is developed in view of the production environment, that helped to evaluate the application area of algorithms. Besides, intruder model was formulated, system security was evaluated and determined possible attack vectors.
This paper describes the results of the development of methods for marking text documents represented as a raster image. An important feature of the algorithms is the possibility wipe current document mark and embed another one. The study refers to structural marking algorithms based on vertical word shifts and brightness changes of certain areas of the words. Segmentation tools are used to obtain document layout, BCH codes for error correction, a likelihood maximization method for label extraction, and a neural network for perturbed words recovery. Testing has proved the practical applicability of the algorithms with printing and scanning.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.