JPEG 2000 is a popular image compression technique that uses the Discrete Wavelet Transform (DWT) and provides many rich features for efficient storage and decompression. Although compressed images are preferred for archival and communication, processing them is costly because the data must be decompressed and re-compressed every time it is operated on. This paper therefore proposes the novel idea of operating directly on JPEG 2000 compressed documents to extract text and non-text regions without using any segmentation algorithm. In contrast to conventional methods, which fully decompress a document before processing it, the technique avoids full decompression. Instead, JPEG 2000 features are exploited to partially and selectively decompress only the regions of interest, at different resolutions and bit depths, to achieve segmentation-less extraction of text and non-text regions. Finally, the Maximally Stable Extremal Regions (MSER) algorithm is used to extract the layout of the segmented text and non-text regions for further analysis. Experiments on the standard PRImA Layout Analysis Dataset give promising results while saving computational resources.
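To illustrate why decoding at a reduced resolution saves work, the following minimal sketch computes the size of a page after discarding the finest DWT levels and maps a region of interest onto that coarser grid. It is not the authors' implementation: the function names `reduced_size` and `map_roi` are hypothetical, the page dimensions are made up for illustration, and the image is assumed to start at the origin of the reference grid, so each discarded level halves the width and height with rounding up.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def reduced_size(width, height, discard_levels):
    """Size of the image when the `discard_levels` finest DWT levels are skipped.

    Assumes the image origin is at (0, 0) on the reference grid.
    """
    scale = 2 ** discard_levels
    return ceil_div(width, scale), ceil_div(height, scale)


def map_roi(roi, discard_levels):
    """Map a half-open box (x0, y0, x1, y1) from full resolution to the reduced grid.

    The start is rounded down and the end is rounded up so that the
    reduced-resolution box still covers the whole original region.
    """
    scale = 2 ** discard_levels
    x0, y0, x1, y1 = roi
    return (x0 // scale, y0 // scale, ceil_div(x1, scale), ceil_div(y1, scale))


# Hypothetical page of 2481 x 3508 pixels, decoded two levels down.
full = (2481, 3508)
small = reduced_size(*full, discard_levels=2)
roi_small = map_roi((100, 205, 901, 1000), discard_levels=2)
print(small, roi_small)
```

With two levels discarded, the decoder handles roughly one sixteenth of the samples, which is the kind of saving that partial decompression aims for before a finer pass is made over selected regions only.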