Digitization of document images using OCR-based systems is adversely affected if the image of the document contains distortion (warping). Often, costly and precisely calibrated special hardware, such as stereo cameras or laser scanners, is used to infer a 3D model of the distorted image, which is then used to remove the distortion. Recent methods focus on creating a 3D shape model from the 2D document image alone. The performance of these methods depends heavily on estimating an accurate 2D distortion grid. In printed document images, the white space between text lines carries as much information about the 2D distortion as the text lines themselves. Based on this intuition, we build a 2D distortion grid from white space lines, which a dewarping algorithm can use to rectify a printed document image. These white space lines are extracted using a propagation technique on the distance transform of the binarized document image, guided by an open active contour algorithm. We compare our proposed method against a state-of-the-art 2D distortion grid construction method and obtain better results. We also present qualitative and quantitative evaluations of the proposed method.
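To illustrate why the distance transform exposes white space lines, the following is a minimal sketch and not the propagation or open active contour procedure described above. It assumes a tiny binarized image given as nested lists (1 for text pixels, 0 for background), uses a simple two-pass city-block distance transform, and picks, per column, the row farthest from any text between two known text rows. The function names `cityblock_distance_transform` and `whitespace_ridge` are hypothetical.

```python
def cityblock_distance_transform(binary):
    """Distance from every pixel to the nearest text pixel (value 1),
    computed with the classic two-pass city-block (L1) scheme."""
    rows, cols = len(binary), len(binary[0])
    inf = rows + cols  # larger than any possible city-block distance
    dist = [[0 if binary[r][c] else inf for c in range(cols)]
            for r in range(rows)]
    # Forward pass: propagate distances from the top-left.
    for r in range(rows):
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    # Backward pass: propagate distances from the bottom-right.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist


def whitespace_ridge(dist, top, bottom):
    """For each column, return the row in [top, bottom] that is farthest
    from any text pixel (a crude stand-in for a white space line)."""
    cols = len(dist[0])
    return [max(range(top, bottom + 1), key=lambda r: dist[r][c])
            for c in range(cols)]


# Toy page: two horizontal "text lines" at rows 1 and 5 of a 7x6 image.
image = [[1 if r in (1, 5) else 0 for _ in range(6)] for r in range(7)]
dist = cityblock_distance_transform(image)
ridge = whitespace_ridge(dist, top=1, bottom=5)
print(ridge)  # row index of the white space line in each column
```

On this toy input the ridge sits midway between the two text rows in every column. On a warped page the same maxima would bend with the text, which is the property the distortion grid relies on.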
Multi-target tracking plays a key role in many computer vision applications, including robotics, human-computer interaction, and event recognition, and has received increasing attention in the past several years. One common approach in existing multi-target tracking methods is to start with an object detector to create initial short tracks, called tracklets. These tracklets are then gradually grouped into longer final tracks in a hierarchical framework. Although object detectors have greatly improved in recent years, they are far from perfect: they can fail to detect the object of interest or report a false positive as the desired object. Because of these false positives and misdetections, such tracking methods can suffer from track fragmentations and identity switches. To address this problem, we formulate multi-target tracking as a min-cost flow graph problem, which we call the average shortest path. The average shortest path is designed to be less biased towards track length. In our average shortest path framework, an object misdetection is treated as an occlusion and is represented by edges between tracklet nodes across non-consecutive frames. We evaluate our method on the publicly available ETH dataset, which is challenging because of camera motion and long occlusions in a busy street scene. We achieve competitive results with fewer identity switches on this dataset than state-of-the-art methods.
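The abstract does not give the exact formulation, so the following is only a sketch of one way to reduce bias towards path length: choosing the source-to-sink path in a directed acyclic graph that minimizes the mean edge cost instead of the total cost. It assumes tracklets are nodes and edge costs are dissimilarities. The function name `min_average_cost_path` and the toy costs are hypothetical.

```python
def min_average_cost_path(num_nodes, edges, source, sink):
    """Return (path, mean_cost) for the source-to-sink path with the
    smallest average edge cost in a directed acyclic graph.

    edges is a list of (u, v, cost) tuples. best[k][v] holds the minimum
    total cost of reaching v from source using exactly k edges.
    """
    inf = float("inf")
    best = [[inf] * num_nodes for _ in range(num_nodes)]
    parent = [[None] * num_nodes for _ in range(num_nodes)]
    best[0][source] = 0.0
    for k in range(1, num_nodes):
        for u, v, cost in edges:
            candidate = best[k - 1][u] + cost
            if candidate < best[k][v]:
                best[k][v] = candidate
                parent[k][v] = u
    # Among all feasible edge counts, keep the one with the lowest mean.
    feasible = [k for k in range(1, num_nodes) if best[k][sink] < inf]
    if not feasible:
        return None, inf
    k_best = min(feasible, key=lambda k: best[k][sink] / k)
    # Walk the parent pointers back from the sink.
    path, node = [sink], sink
    for k in range(k_best, 0, -1):
        node = parent[k][node]
        path.append(node)
    path.reverse()
    return path, best[k_best][sink] / k_best


# Toy graph: node 0 is the source, node 3 the sink.
# The direct edge 0 -> 3 skips intermediate tracklets (as an occlusion
# edge across non-consecutive frames would) but has a high cost.
edges = [(0, 3, 4.0), (0, 1, 1.0), (1, 2, 1.0), (2, 3, 3.0)]
path, mean_cost = min_average_cost_path(4, edges, source=0, sink=3)
print(path, mean_cost)
```

In this toy graph a plain shortest path would take the direct edge (total cost 4.0 versus 5.0), whereas the average-cost criterion prefers the longer path through nodes 1 and 2, whose mean edge cost is 5/3.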