2023
DOI: 10.1145/3557894

Improving Readability for Automatic Speech Recognition Transcription

Abstract: Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, even a perfectly accurate transcript can still be challenging to read due to grammatical errors, disfluencies, and other noise common in spoken communication. These readability issues, introduced by both speakers and ASR systems, impair the performance of downstream tasks and hinder understanding by human readers. In this work, we present a task called ASR post-processing for readability (APR) and formul…

Cited by 22 publications (11 citation statements). References 47 publications.
“…Although speaking is faster than typing or writing, it can be time-consuming to correct errors caused by speech artifacts, such as disfluencies and repetitions. Modern speech and NLP research has made much progress in cleaning disfluency and speech recognition errors [17,19,30], mostly for post-processing. Researchers have also proposed multimodal interaction methods to reduce editing effort, especially on mobile devices.…”
Section: Voice Dictation and Editing; cited as mentioning (confidence: 99%)
“…On the other hand, ASR is recognized as a subfield of NLP aimed at enabling computers to transcribe spoken language into text. This field has also been intensively explored for the development of various applications such as virtual assistants [10] and efficient human transcription systems [11].…”
Section: Natural Language Processing; cited as mentioning (confidence: 99%)
“…However, the results indicate acceptable performance when using NER and dependency parsing through open-source and hybrid NLP models. The performance of the pipeline may increase over time with improvements in automatic speech recognition and text prediction and suggestion methods (methods that also use NLP models that are not covered within the scope of this study) [29][30][31]. However, in this study, the pipeline performance was potentially affected by the transcription errors or typing errors existing in the data set (n=16, 18% of 87 notes had at least one error; errors have not been corrected to contain real-world data features).…”
Section: Principal Findings; cited as mentioning (confidence: 99%)