Manifold Mixup Improves Text Recognition with CTC Loss

Moysset, Bastien; Messina, Ronaldo

doi:10.1109/icdar.2019.00133

Cited by 3 publications

(9 citation statements)

References 26 publications


“…The historical Bentham dataset [61], which consists of images of letters by the English philosopher Jeremy Bentham (1748-1832), was utilized in the work of [47]. Furthermore, the English subset of the Maurdor dataset [62] was used in [34] and contains heterogeneous images of different types of documents. Finally, the dataset "GoodNotes Handwriting Kollection" (GNHK) [63] comprises unrestricted camera-captured images of English handwritten text from various regions, characterized by diverse styles and increased noise, was used in the work of [38].…”

Section: Recognition Tasks and Datasets (mentioning)

confidence: 99%

“…The RIMES dataset was used in the studies [27, 29-31, 34, 37, 40, 42, 46, 49]. The second dataset was the French subset of Maurdor, which was utilized in the study [34].…”

Section: Recognition Tasks and Datasets (mentioning)

confidence: 99%

“…The Arabic language was also significantly utilized, offering substantial variation across datasets. A first remarkable dataset in Arabic is the Maurdor [62] subset, which has approximately 13,000 text-line samples and was used in the study of [34]. OpenHaRT [66], boasting a large database of approximately 710,000 images, was utilized in the study of [46].…”

Section: Recognition Tasks and Datasets (mentioning)

confidence: 99%

“…The "Handwritten Kazakh and Russian" (HKR) dataset [72], representing Kazakh and Russian languages, has been utilized in the studies by [35,41,43]. Furthermore, the Chinese language is also represented by the dataset from the "Chinese Academy of Sciences' Institute of Automation" (CASIA) [73], that was used in the works of [34,37,50,53]. CASIA offers online and offline recognition versions.…”

Section: Recognition Tasks and Datasets (mentioning)

confidence: 99%


Data Augmentation for Offline Handwritten Text Recognition: A Systematic Literature Review

de Sousa Neto, Bezerra, de Moura, et al. 2024

SN COMPUT. SCI.


Offline Handwritten Text Recognition (HTR) systems concern the automatic recognition and transcription of handwritten text from scanned images to digital media. Recently, the HTR research field has become increasingly important due to the growing need for digitizing documents and automating data entry across various industries. However, achieving satisfactory results depends on the number of available samples to train an optical model. Creating and labeling large enough datasets for this purpose often requires significant time and effort, which in some situations may be impractical. To address this problem, data augmentation approaches are commonly used as an essential component of HTR systems. In this way, the present work aims to identify, explore, and analyze the scope of data augmentation approaches for offline HTR systems. Furthermore, we detailed our research protocol and answered four pertinent research questions, which enabled us to discuss trends and possible gaps. A search was conducted across five scientific databases, focusing on papers published between 2012 and 2023. The search yielded 976 primary papers, with 32 meeting the criteria for inclusion in this review. Our results indicate that handwriting synthesis is an emerging research field, and we observed that Digital Image Processing (DIP) is still widely used as an image generator. Nevertheless, the application of Generative Adversarial Networks (GAN) has gained traction in recent years owing to its impressive ability to synthesize images of handwritten text with arbitrary style and content. In addition, we explored and analyzed the most commonly used datasets and text recognition levels in the selected works.


“…Other point cloud Mixup methods include Rigid SubSet Mixup [29] and Point MixSwap [49]. Mixup has also been investigated for LiDAR [55], graphs [53], speaker verification [70], vision-language navigation [32], single-view 3D reconstruction [10], and language processing [28,38,45,54,68]. We focus on Mixup for images, but our approach is generic and can be applied to many Mixup variants.…”

Section: Related Work (mentioning)

confidence: 99%

Music-Guided Video Summarization using Quadratic Assignments

Mensink, Jongstra, Mettes, et al. 2017

Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval


This paper aims to automatically generate a summary of an unedited video, guided by an externally provided music-track. The tempo, energy and beats in the music determine the choices and cuts in the video summarization. To solve this challenging task, we model video summarization as a quadratic assignment problem. We assign frames to the summary, using rewards based on frame interestingness, plot coherency, audio-visual match, and cut properties. Experimentally we validate our approach on the SumMe dataset. The results show that our music-guided summaries are more appealing, and even outperform the current state-of-the-art summarization methods when evaluated on the F1 measure of precision and recall.

The goal of this paper is to create high-quality video summarizations, guided by an externally provided music-track. Consider for example that after a day of skiing with your GoPro camera, you reflect your mood by selecting a music-track and the computer will automatically create a video summary of your skiing day fitted on this specific music-track. Clearly a summary with classical music should have different dynamics, plots, and cuts than a summary based on funk music, even when the summaries are created from … The key factors used in our music-guided summarization model are illustrated in Fig. 1.

We are inspired by a large body of research focused on video summarization, either using only the visual source video [5, 8, 11-13, 18, 22], or combining multiple video modalities [4, 10, 20]. In contrast to these works, we aim to create a video summary fitted on a given music-track, which to the best of our knowledge has never been considered before.
