People with visual impairments use assistive technology, e.g., screen readers, to navigate and read PDFs. However, screen readers need extra information about the logical structure of the PDF, such as the reading order, header levels, and mathematical formulae, described in a machine-readable form in order to navigate the document in a meaningful way. This logical structure can be added to a PDF with tags. Creating tags for a PDF is time-consuming and requires awareness and expert knowledge. Hence, most PDFs are left untagged, and as a result, they are poorly readable or unreadable for people who rely on screen readers. STEM documents are particularly problematic because of their complex document structure and complicated mathematical formulae. These inaccessible PDFs present a major barrier for people with visual impairments wishing to pursue studies or careers in STEM fields, who cannot easily read studies and publications from their field. The goal of this Ph.D. is to apply artificial intelligence for document analysis to reasonably automate the remediation process of PDFs and to present a solution for making large mathematical formulae in PDFs accessible. With these new methods, the Ph.D. research aims to lower barriers to creating accessible scientific PDFs by reducing the time, effort, and expertise necessary to do so, ultimately facilitating greater access to scientific documents for people with visual impairments.
CCS CONCEPTS • Human-centered computing → Accessibility; Accessibility systems and tools; Accessibility technologies; • Applied computing → Document management and text processing; Document capture; Document analysis.