Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2019)
DOI: 10.18653/v1/n19-2024
Neural Text Normalization with Subword Units

Abstract: Text normalization (TN) is an important step in conversational systems. It converts written text to its spoken form to facilitate speech recognition, natural language understanding and text-to-speech synthesis. Finite state transducers (FSTs) are commonly used to build grammars that handle text normalization (Sproat, 1996; Roark et al., 2012). However, translating linguistic knowledge into grammars requires extensive effort. In this paper, we frame TN as a machine translation task and tackle it with sequence-t…
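To make the written-to-spoken conversion concrete, here is a minimal rule-based sketch of the kind of hand-written mapping that FST grammars encode. The rules, the abbreviation table, and the digit-by-digit reading are illustrative assumptions, not the paper's actual grammar.

```python
import re

# Illustrative rule table: the written forms and expansions below are toy
# examples of what an FST-based normalization grammar would encode.
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "%": "percent"}

def spell_digits(token: str) -> str:
    """Read a digit string out digit by digit, e.g. '911' -> 'nine one one'."""
    return " ".join(ONES[int(d)] for d in token)

def normalize(text: str) -> str:
    """Convert written text to a spoken form, one whitespace token at a time."""
    out = []
    for token in text.split():
        low = token.lower()
        if low in ABBREVIATIONS:
            out.append(ABBREVIATIONS[low])
        elif re.fullmatch(r"\d+", token):
            out.append(spell_digits(token))
        else:
            out.append(token)
    return " ".join(out)
```

For example, `normalize("call 911")` yields "call nine one one". The long tail of cases (dates, currency, context-dependent readings such as "911" as "nine eleven") is exactly the effort the abstract says grammar writing requires, and what the paper's sequence-to-sequence framing is meant to learn from data instead.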

Cited by 39 publications (28 citation statements)
References 11 publications
“…Truecasing, as a part of text normalization, is peculiar in that its bulk can be solved simply by a few hand-written rules, with however a long tail of very difficult cases such as acronyms, unseen words. Finding a proper balance between the flexibility of neural approaches, and the controlled, more interpretable behaviour of FST-based systems, remains an open and challenging problem (Mansfield et al (2019), Sproat and Jaitly (2016), Zhang et al (2019)).…”
Section: Discussion (mentioning)
confidence: 99%
“…It is common to model the text normalization problem as a Machine Translation problem (Mansfield et al., 2019; Lusetti et al., 2018; Filip et al., 2006; Zhang et al., 2019). Given that Bidirectional LSTM with attention is a popular baseline model for the machine translation task, we built a text normalization model using the same on the lines of work by Bahdanau et al. (2014).…”
Section: Benchmark Baseline (mentioning)
confidence: 99%
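The Bahdanau-style additive attention that the baseline above relies on can be sketched in a few lines. The scalar weights and two-dimensional states below are toy values for illustration, not the cited model's trained parameters.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def additive_attention(decoder_state, encoder_states, w_dec, w_enc, v):
    """Bahdanau-style scoring: score(s, h) = v . tanh(w_dec*s + w_enc*h).

    w_dec and w_enc are scalars here (matrices in the real model) so the
    alignment mechanism stays readable; returns normalized attention weights.
    """
    scores = []
    for h in encoder_states:
        hidden = [math.tanh(w_dec * s_i + w_enc * h_i)
                  for s_i, h_i in zip(decoder_state, h)]
        scores.append(sum(v_i * h_i for v_i, h_i in zip(v, hidden)))
    return softmax(scores)
```

Each decoder step uses these weights to form a context vector as the weighted sum of encoder states, which is what lets the model attend to the written-form token currently being normalized.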
“…There is still not much work done in the area of context-aware normalisers. Mansfield et al. (2019) proposed to use sequence-to-sequence models to normalise full sentences for conversational systems. Jurish (2010) proposed to use hidden Markov models to choose over the normalised candidates in a sentential context.…”
Section: Related Work (mentioning)
confidence: 99%
“…This can be captured systematically by machine learning algorithms and applied to unseen words. Thus, the current state-of-the-art approaches to the historical normalisation rely on statistical or neural machine translation methods and define the task as a problem of translating between characters or substrings (Mansfield et al, 2019) instead of words.…”
Section: Introduction (mentioning)
confidence: 99%
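Recasting normalisation as translation between characters, as the statement above describes, amounts to turning each (written, spoken) pair into source and target symbol sequences for a translation model. The boundary markers `<s>`, `</s>`, and the space symbol `<sp>` below are illustrative conventions, not the cited papers' exact vocabulary.

```python
def to_char_seq(text: str) -> list:
    """Decompose a string into a character sequence with boundary markers,
    mapping literal spaces to an explicit <sp> symbol."""
    return ["<s>"] + ["<sp>" if c == " " else c for c in text] + ["</s>"]

def make_training_pair(written: str, spoken: str):
    """Build one source/target example for a character-level translation model."""
    return to_char_seq(written), to_char_seq(spoken)

src, tgt = make_training_pair("3 kg", "three kilograms")
```

Because the model sees characters (or learned substrings) rather than whole words, regular spelling changes generalise to unseen words, which is the advantage the quoted passage attributes to this framing.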