This paper proposes a new form document identification technique that uses the structure of cells in a form. Form identification is the process of classifying an unknown input form as one of the form types registered in a database. A form is composed of cells, which are rectangular regions enclosed by horizontal and vertical lines, and character regions. The structure of the cells is crucial to describing the type of a form because it can be robustly extracted from an input image, even when the image contains written characters or distortion caused by photocopying. The proposed method represents the cell structure by the locations of the center points of the cells. This representation allows form identification to be realized as a process of matching the points in an input image to the points in the registered forms. We have implemented the point matching process using a two-dimensional hash table. This implementation enables the system to identify an input form robustly, even if it is skewed or deformed by the scanning process, and reduces the time required for identification. Moreover, the similarity between two forms defined by the implementation can be used to evaluate the identification ability for a specified set of registered forms. Experimental results show the robustness and effectiveness of the proposed technique.
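The matching idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `FormIndex`, the bin size, the lookup in neighboring bins (used here as a simple tolerance for small skew or deformation), and the similarity score are all assumptions made for illustration. Cells are assumed to be given as axis-aligned rectangles `(x0, y0, x1, y1)`.

```python
from collections import defaultdict


def cell_centers(cells):
    """Return the center point of each (x0, y0, x1, y1) cell rectangle."""
    return [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in cells]


class FormIndex:
    """Hypothetical two-dimensional hash table keyed by quantized center points."""

    def __init__(self, bin_size=20.0):
        # bin_size is an assumed quantization step, in pixels.
        self.bin_size = bin_size
        self.table = defaultdict(set)   # (ix, iy) -> names of forms with a center there
        self.counts = {}                # form name -> number of cells

    def _bin(self, x, y):
        return (int(x // self.bin_size), int(y // self.bin_size))

    def register(self, name, cells):
        """Store the cell centers of a known form type in the table."""
        centers = cell_centers(cells)
        self.counts[name] = len(centers)
        for x, y in centers:
            self.table[self._bin(x, y)].add(name)

    def identify(self, cells):
        """Vote for registered forms whose centers fall near the input centers."""
        centers = cell_centers(cells)
        votes = defaultdict(int)
        for x, y in centers:
            ix, iy = self._bin(x, y)
            nearby = set()
            # Also look in the 8 neighboring bins to tolerate small displacements.
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nearby |= self.table.get((ix + dx, iy + dy), set())
            for name in nearby:
                votes[name] += 1
        # Assumed similarity: matched input points over the larger cell count.
        scores = {
            name: v / max(self.counts[name], len(centers))
            for name, v in votes.items()
        }
        best = max(scores, key=scores.get) if scores else None
        return best, scores


index = FormIndex(bin_size=20.0)
index.register("form_a", [(0, 0, 100, 50), (100, 0, 200, 50), (0, 50, 200, 150)])
index.register("form_b", [(0, 0, 200, 30), (0, 30, 60, 150), (60, 30, 200, 150)])

# The same layout as form_a, shifted by a few pixels as a scan might be.
scanned = [(3, -2, 103, 48), (103, -2, 203, 48), (3, 48, 203, 148)]
best, scores = index.identify(scanned)
print(best, scores)
```

Because each input point costs only a constant number of table lookups, the identification time in this sketch grows with the number of cells in the input rather than with the number of registered forms, which is one plausible reading of the speed-up the abstract reports.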