Proceedings of the First Workshop on Natural Language Processing for Medical Conversations 2020
DOI: 10.18653/v1/2020.nlpmc-1.8

Robust Prediction of Punctuation and Truecasing for Medical ASR

Abstract: Automatic speech recognition (ASR) systems in the medical domain that focus on transcribing clinical dictations and doctor-patient conversations often pose many challenges due to the complexity of the domain. ASR output typically undergoes automatic punctuation to enable users to speak naturally, without having to vocalise awkward and explicit punctuation commands, such as "period", "add comma" or "exclamation point", while truecasing enhances user readability and improves the performance of downstream NLP tasks…
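
The abstract frames punctuation restoration and truecasing as predictions made on top of raw (lowercase, unpunctuated) ASR output. Below is a minimal sketch, in Python, of how such per-token labels could be applied to a transcript; the label sets, the apply_labels helper, and the example sentence are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (not the paper's system) of applying per-token punctuation
# and truecasing labels to raw ASR output. Label inventories are assumed.

PUNCT_LABELS = {"O": "", "PERIOD": ".", "COMMA": ",", "QUESTION": "?"}

def apply_labels(tokens, punct, case):
    """Rebuild a readable transcript from ASR tokens plus per-token labels."""
    out = []
    for tok, p, c in zip(tokens, punct, case):
        if c == "UPPER_FIRST":
            tok = tok.capitalize()
        elif c == "UPPER_ALL":
            tok = tok.upper()
        out.append(tok + PUNCT_LABELS[p])
    return " ".join(out)

# Hypothetical model output for a dictated sentence.
tokens = ["patient", "denies", "chest", "pain", "shortness", "of", "breath"]
punct  = ["O", "O", "O", "COMMA", "O", "O", "PERIOD"]
case   = ["UPPER_FIRST", "LOWER", "LOWER", "LOWER", "LOWER", "LOWER", "LOWER"]

print(apply_labels(tokens, punct, case))
# -> "Patient denies chest pain, shortness of breath."
```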

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
4
1

Citation Types

0
20
0

Year Published

2021
2021
2023
2023

Publication Types

Select...
4
4
1

Relationship

0
9

Authors

Journals

citations
Cited by 23 publications
(20 citation statements)
references
References 23 publications
0
20
0
Order By: Relevance
“…We instead use a single prediction for each token, and we find that we can achieve superior performance using much smaller context windows than [1]. Finally, [17,18] apply transformers to punctuation prediction using lexical features and prosodic features which are aligned using pre-trained feature extractors and alignment networks. In contrast to [17,18], we use forced-alignment from ASR and learn acoustic features from scratch from spectrogram segments corresponding to each text token.…”
Section: Related Work
confidence: 99%
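
The statement above describes learning acoustic features directly from the spectrogram frames that forced alignment assigns to each token, alongside a single lexical prediction per token. A rough sketch of that kind of per-token fusion follows; it is not the cited system, and the dimensions, the Conv1d front-end, and the pooling choice are assumptions.

```python
# A hedged sketch: fuse a lexical embedding with an acoustic embedding pooled
# from the spectrogram slice that forced alignment assigns to each token.
import torch
import torch.nn as nn

class TokenAcousticFusion(nn.Module):
    def __init__(self, vocab_size=1000, n_mels=80, d_model=256, n_classes=4):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        # Acoustic features learned from scratch from raw spectrogram frames.
        self.acoustic = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool all frames of one token segment
        )
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, token_ids, segments):
        # token_ids: (T,); segments: list of (n_mels, frames) tensors,
        # one spectrogram slice per token as given by forced alignment.
        lex = self.word_emb(token_ids)                                   # (T, d)
        ac = torch.stack(
            [self.acoustic(s.unsqueeze(0)).squeeze() for s in segments]  # (T, d)
        )
        return self.classifier(torch.cat([lex, ac], dim=-1))             # one prediction per token

model = TokenAcousticFusion()
token_ids = torch.tensor([5, 17, 42])
segments = [torch.randn(80, f) for f in (12, 9, 20)]   # variable-length token segments
logits = model(token_ids, segments)                     # shape: (3, 4)
```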
“…However, the performance of simple n-gram language models suffers when long-range lexical information is required to disambiguate between punctuation classes [10]. Joint modelling of truecasing and punctuation tasks is considered in [11,12] using deep learning models in a classification framework. Authors in [11] assume punctuation as an independent task and truecasing as conditionally dependent on punctuation given latent representation of the input.…”
Section: Related Work
confidence: 99%
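
The factorization attributed to [11] can be sketched as follows: punctuation is predicted from the shared latent representation alone, while the truecasing head additionally conditions on the punctuation distribution. This is only a schematic reading of the sentence above; the encoder is left abstract and all dimensions and label counts are assumed.

```python
# Hedged sketch of a conditional joint model: truecasing conditioned on punctuation.
import torch
import torch.nn as nn

class ConditionalJointModel(nn.Module):
    def __init__(self, d_model=256, n_punct=4, n_case=3):
        super().__init__()
        self.punct_head = nn.Linear(d_model, n_punct)
        self.case_head = nn.Linear(d_model + n_punct, n_case)

    def forward(self, latent):                       # latent: (T, d_model) from any encoder
        punct_logits = self.punct_head(latent)
        punct_probs = punct_logits.softmax(dim=-1)   # condition truecasing on punctuation
        case_logits = self.case_head(torch.cat([latent, punct_probs], dim=-1))
        return punct_logits, case_logits

latent = torch.randn(7, 256)                         # e.g. transformer outputs for 7 tokens
punct_logits, case_logits = ConditionalJointModel()(latent)
```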
“…Joint modelling of truecasing and punctuation tasks is considered in [11,12] using deep learning models in a classification framework. Authors in [11] assume punctuation as an independent task and truecasing as conditionally dependent on punctuation given latent representation of the input. However, it is treated as a multi-task problem in [12] where both truecasing and punctuation are independent given the input latent representation.…”
Section: Related Work
confidence: 99%
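
By contrast, the multi-task formulation attributed to [12] treats both tasks as independent given the shared latent representation and trains them jointly. A minimal sketch with a summed loss is shown below; the encoder output, label counts, and equal loss weighting are assumptions.

```python
# Hedged sketch of the multi-task variant: two independent heads on a shared latent.
import torch
import torch.nn as nn

d_model, n_punct, n_case = 256, 4, 3
punct_head = nn.Linear(d_model, n_punct)
case_head = nn.Linear(d_model, n_case)
criterion = nn.CrossEntropyLoss()

latent = torch.randn(7, d_model)                  # shared encoder output for 7 tokens
punct_targets = torch.randint(0, n_punct, (7,))
case_targets = torch.randint(0, n_case, (7,))

# Both heads read the same latent; the two losses are simply added.
loss = criterion(punct_head(latent), punct_targets) + criterion(case_head(latent), case_targets)
loss.backward()
```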
“…Word-based truecasing has been the dominant approach for a long time since the introduction of the task by Lita et al. (2003). Word-based models can be further categorized into generative models such as HMMs (Lita et al., 2003; Gravano et al., 2009; Beaufays and Strope, 2013; Nebhi et al., 2015) and discriminative models such as Maximum-Entropy Markov Models (Chelba and Acero, 2004), Conditional Random Fields (Wang et al., 2006), and most recently Transformer neural network models (Nguyen et al., 2019; Rei et al., 2020; Sunkara et al., 2020). Word-based models need to refine the class of mixed case words because there is a combinatorial number of possibilities of case mixing for a word (e.g., LaTeX).…”
Section: Related Work
confidence: 99%
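
The mixed-case issue raised at the end of that statement (e.g., "LaTeX") can be made concrete with a per-character case mask, which can represent arbitrary case mixing that a small word-level label set (LOWER / UPPER_FIRST / UPPER_ALL) cannot. This is only an illustration, not any cited model.

```python
# Illustrative only: per-character casing handles mixed-case words like "LaTeX".
def case_mask(word):
    """Return the per-character casing pattern of a word."""
    return [c.isupper() for c in word]

def apply_case_mask(lower_word, mask):
    """Reapply a casing pattern to a lowercase word."""
    return "".join(c.upper() if up else c for c, up in zip(lower_word, mask))

mask = case_mask("LaTeX")              # [True, False, True, False, True]
print(apply_case_mask("latex", mask))  # -> "LaTeX"
```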