Automatic cropping of images under projective transformation

Shemiakina, Julia; Zhukovsky, Alexander; Konovalenko, Ivan; Nikolaev, Dmitry P.

doi:10.1117/12.2523483

Cited by 3 publications

(3 citation statements)

References 7 publications


“…It is the process of cutting out areas of the image that are of interest called a sub-image and that benefit us in the analysis [17]. At this stage the mouth is identified and In ResNet50 network, the size of the input image must be 224 * 224, and this means that we need to modify the size of the input image of the CNN to reduce or enlarge it to reach the required size and the mouth image is the cutout area.…”

Section: Crop Mouth Image From the Frame (mentioning)

confidence: 99%

Automatic Lip reading for decimal digits using ResNet50 Model

Mahdi Hashim, Saadun Naif

2023

JEPS


Lip reading is a method to understand speech through the movement of the lips, as audio speech is not inclusive of all Categories of society, especially the hearing impaired or people in noisy environments. Lip reading is the best and alternative solution to this problem. Our proposed system solves this problem by taking a video of the person speaking with digits. Then the pre-processing process is carried out by Viola Jones algorithm, by cutting the video into a sequential frame, then detecting the face, then the mouth, deducting the mouth region of interest (ROI), and inserting the mouth frame into the convolutional neural network (ResNet50), where the results are classified and the test frames is matched with the training frames if it is done Matching, the network is working correctly and the correct digit is spoken. But if the test frame is not matched with the training framework, then there is an error rate in the network’s work and there is an error rate in the network. For that, we used a standard database to pronounce the digits from 0 to 9, and we took seven speaking people, 5 males and 2 females, and we got an accuracy of 86%.
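The crop-then-resize step described in the citation statement above (cut out the mouth region, then bring it to the 224 × 224 input size) can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function name `crop_and_resize`, the `box` layout, and the choice of nearest-neighbour sampling are all assumptions, and the mouth box is taken as given rather than produced by a Viola-Jones detector.

```python
def crop_and_resize(image, box, size=(224, 224)):
    """Crop a region of interest and resize it by nearest-neighbour sampling.

    image: list of rows, each row a list of pixel values (hypothetical layout).
    box:   (left, top, right, bottom) in pixels; right and bottom are exclusive.
    size:  (width, height) of the output; 224 x 224 is the size named in the
           citation statement above.
    """
    left, top, right, bottom = box
    crop = [row[left:right] for row in image[top:bottom]]
    if not crop or not crop[0]:
        raise ValueError("crop box selects no pixels")

    src_h, src_w = len(crop), len(crop[0])
    dst_w, dst_h = size
    # For each output pixel, pick the source pixel at the proportional position.
    return [
        [crop[r * src_h // dst_h][c * src_w // dst_w] for c in range(dst_w)]
        for r in range(dst_h)
    ]
```

Nearest-neighbour sampling is used here only because it needs no external library; a real pipeline would more likely use an interpolating resize from an image library.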


“…Since $\Delta p$ is assumed to be small, one can locally approximate the projective transform $H$ with an affine transform. In this approach, it can be shown that, for a unit circle, the lengths of the ellipse semi-axes are equal to the roots of eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$ of the matrix $J^T J$, where $J$ is the Jacobian matrix of the transform $H$ at the point $p$ [40]. Then, for the circle with the radius $\Delta x_s$, the lengths of the semi-minor and semi-major axes for the restored point $p$, $a_{\min}$ and $a_{\max}$, respectively, are calculated as follows:…”

Section: The Minimum Scaling Coefficient Assessment At a Restored Image Point (mentioning)

confidence: 99%

A Method of Image Quality Assessment for Text Recognition on Camera-Captured and Projectively Distorted Documents

Shemiakina, Limonova, Skoryukina, et al. 2021

Mathematics

Self Cite


In this paper, we consider the problem of identity document recognition in images captured with a mobile device camera. A high level of projective distortion leads to poor quality of the restored text images and, hence, to unreliable recognition results. We propose a novel, theoretically based method for estimating the projective distortion level at a restored image point. On this basis, we suggest a new method of binary quality estimation of projectively restored field images. The method analyzes the projective homography only and does not depend on the image size. The text font and height of an evaluated field are assumed to be predefined in the document template. This information is used to estimate the maximum level of distortion acceptable for recognition. The method was tested on a dataset of synthetically distorted field images. Synthetic images were created based on document template images from the publicly available dataset MIDV-2019. In the experiments, the method shows stable predictive values for different strings of one font and height. When used as a pre-recognition rejection method, it demonstrates a positive predictive value of 86.7% and a negative predictive value of 64.1% on the synthetic dataset. A comparison with other geometric quality assessment methods shows the superiority of our approach.
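The local affine approximation quoted in this citation statement can be illustrated with a short sketch: linearise a homography at a point, form the 2 × 2 matrix JᵀJ from its Jacobian J, and take the square roots of its eigenvalues as the semi-axes of the ellipse that a unit circle maps to. The function names and the nested-list representation of H are assumptions made for this example; it does not reproduce the elided formula from the cited paper and stops at the unit-circle case.

```python
import math


def homography_jacobian(H, p):
    """Jacobian of the map p -> H(p) at p = (x, y); H is a 3x3 nested list."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point is mapped to infinity")
    # Image of p under the homography.
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    # Quotient rule applied to u = num_u / w and v = num_v / w.
    return [
        [(H[0][0] - u * H[2][0]) / w, (H[0][1] - u * H[2][1]) / w],
        [(H[1][0] - v * H[2][0]) / w, (H[1][1] - v * H[2][1]) / w],
    ]


def unit_circle_semi_axes(H, p):
    """Return (semi_minor, semi_major) of the image of a unit circle at p."""
    J = homography_jacobian(H, p)
    # Entries of the symmetric matrix J^T J = [[a, b], [b, c]].
    a = J[0][0] ** 2 + J[1][0] ** 2
    b = J[0][0] * J[0][1] + J[1][0] * J[1][1]
    c = J[0][1] ** 2 + J[1][1] ** 2
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam_min = max(half_trace - spread, 0.0)  # guard against rounding below 0
    lam_max = half_trace + spread
    return math.sqrt(lam_min), math.sqrt(lam_max)
```

For a pure scaling such as `[[2, 0, 0], [0, 3, 0], [0, 0, 1]]` the result is the same at every point, while for a transform with a non-trivial third row the semi-axes vary with `p`, which is what makes a per-point distortion estimate meaningful.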


“…Shemiakina et al [13] presented two document cropping algorithms based on an estimation of pixel stretching under the transformation. The algorithms detect the edge of the document using the ratio of pixel neighbourhood areas and their chord lengths based on an estimation of the cropped background's relevant regions.…”

Section: Related Work (mentioning)

confidence: 99%

HU‐PageScan: a fully convolutional neural network for document page crop

Neves, Lima, Bezerra, et al. 2020

IET Image Processing


The offer of online, automated, and impersonal services demands that users upload scanned copies of their documents to the organisations. As a consequence of this decentralisation, the documents present more challenges to the already complex process of image processing and information extraction. To address this problem, the authors presented an optimised fully convolutional neural network model for document segmentation that works on mobile devices to detect the region of the document in the captured image. They performed experiments in three representative datasets comparing the proposed method with the Geodesic object Proposals, U-net, Mask R-CNN, and OctHU-PageScan algorithms. They also compared the proposed model with all competitors of the ICDAR2015 Competition on smartphone document capture. Furthermore, they performed a qualitative and comparative analysis with the CamScanner software, a popular app for Android and iOS smartphones used by more than 100 million users in over 200 countries. The proposed approach achieved a significant performance compared with the current state-of-the-art methods, providing a powerful approach for document segmentation in photos and scanned images.
