Infty

Suzuki, Masato; Tamari, Fumikazu; Fukuda, Ryoji; Uchida, Seiichi; Kobayashi, Toshihiro

doi:10.1145/958220.958239

Cited by 119 publications

(6 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In recent years, these systems have evolved rapidly through deep learning. Usually, transformer-based approaches [2][3][4] have proven to outperform traditional statistical models [15,16] and convolutional neural networks [5,6,[17][18][19]. These neural networks are able to learn and recognize intricate patterns and features within images automatically, making them particularly well-suited for accurately extracting text with subscripts such as mathematical formulas from scanned documents or images [20].…”

Section: Mathematical Expression Recognitionmentioning

confidence: 99%

Investigating Models for the Transcription of Mathematical Formulas in Images

Feichter,

Schlippe

2024

Applied Sciences

View full text Add to dashboard Cite

The automated transcription of mathematical formulas represents a complex challenge that is of great importance for digital processing and comprehensibility of mathematical content. Consequently, our goal was to analyze state-of-the-art approaches for the transcription of printed mathematical formulas on images into spoken English text. We focused on two approaches: (1) The combination of mathematical expression recognition (MER) models and natural language processing (NLP) models to convert formula images first into LaTeX code and then into text, and (2) the direct conversion of formula images into text using vision-language (VL) models. Since no dataset with printed mathematical formulas and corresponding English transcriptions existed, we created a new dataset, Formula2Text, for fine-tuning and evaluating our systems. Our best system for (1) combines the MER model LaTeX-OCR and the NLP model BART-Base, achieving a translation error rate of 36.14% compared with our reference transcriptions. In the task of converting LaTeX code to text, BART-Base, T5-Base, and FLAN-T5-Base even outperformed ChatGPT, GPT-3.5 Turbo, and GPT-4. For (2), the best VL model, TrOCR, achieves a translation error rate of 42.09%. This demonstrates that VL models, predominantly employed for classical image captioning tasks, possess significant potential for the transcription of mathematical formulas in images.

show abstract

Section: Mathematical Expression Recognitionmentioning

confidence: 99%

Investigating Models for the Transcription of Mathematical Formulas in Images

Feichter,

Schlippe

2024

Applied Sciences

View full text Add to dashboard Cite

show abstract

“…However, both MER systems comprise three stages: symbol segmentation, symbol recognition, and 2D structure analysis [4]. Classic approaches, as the Infty system [13], [14] solve these stages separately, whereas end-to-end approaches address them all at once. With recent progress in deep learning, end-to-end approaches with an encoder-decoder structure have become prevalent [15].…”

Section: Related Workmentioning

confidence: 99%

MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition

Schmitt-Koopmann,

Huang,

Hutter

et al. 2024

IEEE Access

View full text Add to dashboard Cite

Printed mathematical expression recognition (MER) models are usually trained and tested using LaTeX-generated mathematical expressions (MEs) as input and the LaTeX source code as ground truth. As the same ME can be generated by various different LaTeX source codes, this leads to unwanted variations in the ground truth data that bias test performance results and hinder efficient learning. In addition, the use of only one font to generate the MEs heavily limits the generalization of the reported results to realistic scenarios. We propose a data-centric approach to overcome this problem, and present convincing experimental results: Our main contribution is an enhanced LaTeX normalization to map any LaTeX ME to a canonical form. Based on this process, we developed an improved version of the benchmark dataset im2latex-100k, featuring 30 fonts instead of one. Second, we introduce the real-world dataset realFormula, with MEs extracted from papers. Third, we developed a MER model, MathNet, based on a convolutional vision transformer, with superior results on all four test sets (im2latex-100k, im2latexv2, realFormula, and InftyMDB-1), outperforming the previous state of the art by up to 88.3%.

show abstract

“…• The Infty Reader [37] is a mathematical OCR solution intended to remedy the above-mentioned lack of mathematical literature accessible to blind individuals. Although under the right circumstances the tool can produce quite accurate recognition results, the terms produced by it are ex-tremely hard to read because they contain lots of formatting information which will distract the blind reader.…”

Section: Present Statementioning

confidence: 99%

Towards an Accessible Mathematics Working Environment Based on Isabelle/VSCode

Miesenberger¹,

Neuper²,

Stöger³

et al. 2023

Electron. Proc. Theor. Comput. Sci.

View full text Add to dashboard Cite

The paper collects preparatory work for interdisciplinary collaboration between three partners, between (1) experts in improving accessibility of studies for impaired individuals, (2) experts in developing educational mathematics software and (3) experts in designing and implementing interactive proof assistants.The cooperation was started with the goal to develop an accessible mathematics working environment for education with reasonable efforts. The start was triggered by the lucky discovery that the upcoming Isabelle/VSCode is greatly accessible for blind users without further impairments; this is envisaged as the project's target group.Technical details are described to an extent necessary to understand essential details of efforts required for development. A survey of demand from practice of education with respect to (1) and (2) leads to a vision for educational math software, which necessarily is sketchy but suffices to guide development and which shall invite experts in didactics of mathematics to collaborate.

show abstract

Infty

Cited by 119 publications

References 9 publications

Investigating Models for the Transcription of Mathematical Formulas in Images

Investigating Models for the Transcription of Mathematical Formulas in Images

MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition

Towards an Accessible Mathematics Working Environment Based on Isabelle/VSCode

Contact Info

Product

Resources

About