With the advance of digitization, storing information as scanned copies and images has become the new normal. This creates a need for systems that can accurately extract information from scanned documents or images for every component they contain, whether textual or graphical. The first step in extracting document information is to segment the document layout, that is, to divide the document into textual and non-textual regions of interest. Document layout segmentation has been studied extensively, and this study observes that most existing approaches share one common challenge: accurate segmentation of graphical components with sparsely clustered pixels, such as flowcharts and block diagrams. The study addresses this challenge with a two-tier feedback-based framework. The first tier segments and classifies the textual and mathematical equation components, while the second tier segments and classifies the graphical regions using feedback from the first tier. This feedback consists of the regional information of the equation and textual components, which is used to derive a modified copy of the original input document image in which most of the foreground pixels belong to graphical regions. The proposed framework outperforms various existing studies when evaluated on multiple datasets.
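The feedback step can be illustrated with a minimal sketch. It assumes a grayscale page with a white background (value 255) and first-tier output given as axis-aligned bounding boxes `(x0, y0, x1, y1)` with exclusive end coordinates. The function name `mask_first_tier_regions` and the box format are hypothetical, chosen only for illustration; the abstract does not state how the regional information is represented or applied.

```python
import numpy as np

def mask_first_tier_regions(image, boxes, background=255):
    """Return a copy of `image` with every box painted as background.

    Assumption: `boxes` holds (x0, y0, x1, y1) tuples for the text and
    equation regions found by the first tier, with exclusive end coordinates.
    """
    masked = image.copy()  # leave the original page untouched
    height, width = masked.shape[:2]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the image bounds before painting.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        masked[y0:y1, x0:x1] = background
    return masked

# Toy page: white background, one "text" stroke and one "graphic" stroke.
page = np.full((6, 8), 255, dtype=np.uint8)
page[1, 1:4] = 0  # pixels the first tier would label as text
page[4, 2:7] = 0  # pixels belonging to a graphical region

text_boxes = [(1, 1, 4, 2)]  # hypothetical first-tier output
graphics_only = mask_first_tier_regions(page, text_boxes)
```

After masking, the only remaining foreground pixels in `graphics_only` are those of the graphical stroke, which is the kind of input the second tier is described as receiving.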
Charts are powerful tools for visualizing and comparing data. With the growing presence of various chart types in scientific documents in electronic media, the development of automatic chart classification systems is becoming an important task. Existing studies on chart classification fail to address two issues: the presence of noise in charts and confusing chart class pairs. Motivated by these observations, in this paper we propose an attention and triplet loss based deep CNN framework to address them. Experimental results over four datasets show that the proposed framework can effectively handle noise in charts and confusing chart samples, and that it outperforms its counterparts.
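For readers unfamiliar with the triplet loss, a minimal NumPy sketch of its standard form follows. It assumes squared Euclidean distance between embeddings and a margin of 1.0; the abstract specifies neither the distance nor the margin, and the attention mechanism and CNN backbone are not shown. The function name `triplet_loss` is illustrative.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss averaged over a batch of embeddings.

    Assumptions: squared Euclidean distance and margin=1.0; the paper's
    actual choices are not given in the abstract.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor to same-class sample
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor to other-class sample
    # Penalize triplets where the negative is not at least `margin` farther away.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 1.0]])
easy_negative = np.array([[3.0, 0.0]])       # far from the anchor
confusing_negative = np.array([[1.0, 0.0]])  # as close as the positive

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, confusing_negative)
```

The loss is zero for the well-separated triplet and positive for the confusing one, so training effort concentrates on class pairs whose embeddings lie close together.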