The college entrance rate of people with disabilities is gradually increasing, and universities are trying to provide equal rights and opportunities for students with disabilities. However, these students still have difficulty adapting to college life because of a limited range and variety of experience, restricted mobility, and restricted interaction with the environment. Visually impaired students often cannot complete university assignments independently, yet universities offer little support beyond providing human helpers. In this paper, we therefore aimed to develop VTR4VI (Voice to Report program for the Visually Impaired), software that lets students with visual impairment generate reports independently using the mobile devices they always carry. Existing speech recognition document editing software is designed for sighted users and is difficult for visually impaired people to use. Accordingly, we identified the requirements of a report generator for blind students so that they can complete assignments and write reports without helpers, just like sighted students. The software is operated by clicking a Bluetooth remote control instead of touching the phone screen, and its operation is simple. Our usability evaluation suggests that VTR4VI can help visually impaired students study and write reports.
In this technical report, we describe the fine-tuned ASR-MT pipeline used for the IWSLT shared task. We remove less useful speech samples by checking their WER with an ASR model, and further train a wav2vec- and Transformer-based ASR module on the filtered data. In addition, we clean errata that can interfere with machine translation and use the cleaned data to train a Transformer-based MT module. Finally, in the inference phase, we use a sentence boundary detection model trained on constrained data to merge fragmented ASR outputs into full sentences. The merged sentences are postprocessed using part-of-speech information, and the final result is produced by the trained MT module. The system achieves a BLEU score of 20.37 on the dev set and 20.9 on the test set.
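The WER-based filtering step can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `word_error_rate` and `filter_by_wer`, the `transcribe` callable standing in for the ASR model, and the `max_wer` threshold of 0.5 are all assumptions made for illustration, since the abstract does not state the threshold or the model interface.

```python
from typing import Callable, Iterable, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Convention assumed here: empty reference is perfect only if the
        # hypothesis is empty too.
        return 0.0 if not hyp else 1.0
    # Standard Levenshtein dynamic programme, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def filter_by_wer(
    samples: Iterable[Tuple[str, str]],
    transcribe: Callable[[str], str],
    max_wer: float = 0.5,  # hypothetical threshold, not taken from the report
) -> List[Tuple[str, str]]:
    """Keep (audio_id, reference) pairs whose ASR output is close to the reference."""
    kept = []
    for audio_id, reference in samples:
        if word_error_rate(reference, transcribe(audio_id)) <= max_wer:
            kept.append((audio_id, reference))
    return kept
```

In this sketch a sample whose transcript disagrees strongly with the ASR output is treated as noisy and dropped; the surviving pairs would then be used for further ASR training.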