Algorithm for choosing the best frame in a video stream in the task of identity document recognition

Aliev, M. Sh.; Kunina, Irina; Kazbekov, A.V.; Arlazarov, Vladimir V.

doi:10.18287/2412-6179-co-811

Cited by 5 publications

(4 citation statements)

References 24 publications

“…Since its publication, MIDV-500 dataset and its extension MIDV-2019 were used to evaluate the methods of identity document images classification [12-14]; identity document location [11,15], including the methods based on semantic segmentation [16]; detecting of faces on images of identity documents [17]; and methods related to text fields recognition, including single text line recognition [18], per-frame recognition results combination [19,20] and making a stopping decision in a video stream [21,22]. The dataset was also used to evaluate the methods of choosing a single best frame in the identity document video capture [23] and assessing the quality of the frame for its processing by an identity analysis system [24], detection and masking of sensitive and private information [25] and general ID verification [26].…”

Section: Introduction (mentioning)

confidence: 99%

MIDV-2020: a comprehensive benchmark dataset for identity document analysis

et al. 2022

Identity documents recognition is an important sub-field of document analysis, which deals with tasks of robust document detection, type identification, text fields recognition, as well as identity fraud prevention and document authenticity validation given photos, scans, or video frames of an identity document capture. Significant amount of research has been published on this topic in recent years, however a chief difficulty for such research is scarcity of datasets, due to the subject matter being protected by security requirements. A few datasets of identity documents which are available lack diversity of document types, capturing conditions, or variability of document field values. In this paper, we present a dataset MIDV-2020 which consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation. The dataset contains 72409 annotated images in total, making it the largest publicly available identity document dataset to the date of publication. We describe the structure of the dataset, its content and annotations, and present baseline experimental results to serve as a basis for future research. For the task of document location and identification content-independent, feature-based, and semantic segmentation-based methods were evaluated. For the task of document text field recognition, the Tesseract system was evaluated on field and character levels with grouping by field alphabets and document types. For the task of face detection, the performance of Multi Task Cascaded Convolutional Neural Networks-based method was evaluated separately for different types of image input modes. The baseline evaluations show that the existing methods of identity document analysis have a lot of room for improvement given modern challenges. 
We believe that the proposed dataset will prove invaluable for advancement of the field of document analysis and recognition.

“…Using mock documents from the MIDV-2020 collection as targets for shooting DLC-2021 video makes it easy to use field values and document geometry markup from MIDV-2020 templates. The prepared open dataset can be used for other ID-recognition tasks: Document detection and localization in the image [35, 36, 37]; Document type identification [35, 37]; Document layout analysis; Detection of faces in document images [38] and the choice of the best photo of the document owner [39]; Integration of the recognition results [40]; Video frame quality assessment [41] and the choice of the best frame [42].…”

Section: Discussion (mentioning)

confidence: 99%

Document Liveness Challenge Dataset (DLC-2021)

et al. 2022

Various government and commercial services, including, but not limited to, e-government, fintech, banking, and sharing economy services, widely use smartphones to simplify service access and user authorization. Many organizations involved in these areas use identity document analysis systems in order to improve user personal-data-input processes. The tasks of such systems are not only ID document data recognition and extraction but also fraud prevention by detecting document forgery or by checking whether the document is genuine. Modern systems of this kind are often expected to operate in unconstrained environments. A significant amount of research has been published on the topic of mobile ID document analysis, but the main difficulty for such research is the lack of public datasets due to the fact that the subject is protected by security requirements. In this paper, we present the DLC-2021 dataset, which consists of 1424 video clips captured in a wide range of real-world conditions, focused on tasks relating to ID document forensics. The novelty of the dataset is that it contains shots from video with color laminated mock ID documents, color unlaminated copies, grayscale unlaminated copies, and screen recaptures of the documents. The proposed dataset complies with the GDPR because it contains images of synthetic IDs with generated owner photos and artificial personal information. For the presented dataset, benchmark baselines are provided for tasks such as screen recapture detection and glare detection. The data presented are openly available in Zenodo.

“…Applications of such assessment include user interaction, selection of the best frame in a video stream, rejection of analyzing an image of obviously low quality, etc. The respective field of science is rapidly developing [5,6,7], and special methods for assessing distortions of a given type (noise, blurring, low illumination, etc.) are of particular interest due to the possibility of seeking feedback from the user or applying interference correction methods [8].…”

Section: Introduction (mentioning)

confidence: 99%

Detection of fingers in document images captured in uncontrolled environment

Tolstenko, Kunina 2024

Sixteenth International Conference on Machine Vision (ICMV 2023)

This paper proposes an algorithm for detecting finger areas in document images captured in an uncontrolled environment. The main idea of the proposed algorithm is to segment the image on the chromaticity plane and search for a segment that crosses the boundary of the displayed document in the image. It is proposed to use segmentation in combination with the edge analysis to prevent false merging of two segments belonging to the document and the background respectively, being different in the original RGB space, but acquiring similar characteristics on the chromaticity plane. Testing was performed on document images from the MIDV-2020 open dataset. The precision of the proposed algorithm was evaluated as 91.2%, and the recall as 73.5%.
