Interspeech 2018
DOI: 10.21437/interspeech.2018-2061

Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme Prediction

Abstract: Both automatic speech recognition and text-to-speech systems need accurate pronunciations, typically obtained by using both a pronunciation lexicon and a grapheme-to-phoneme (G2P) model. G2Ps typically struggle to predict pronunciations for tail words, and we hypothesized that one reason is that they try to discover general pronunciation rules without using prior knowledge of the pronunciations of related words. Our new approach expands a sequence-to-sequence G2P model by injecting prior knowledge. In addi…
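The abstract describes injecting dictionary-based prior knowledge into a sequence-to-sequence G2P model. Below is a minimal sketch of one way such augmentation could look, written in PyTorch; it is not the authors' architecture, and the class name, layer sizes, and the choice to concatenate a related word's dictionary pronunciation onto the encoder input are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class DictAugmentedG2P(nn.Module):
    """Toy seq2seq G2P whose encoder also sees a dictionary 'hint'.

    A sketch of the general idea only, not the authors' model.
    """

    def __init__(self, n_graphemes, n_phonemes, emb=64, hid=128):
        super().__init__()
        self.g_emb = nn.Embedding(n_graphemes, emb)  # grapheme vocabulary
        self.p_emb = nn.Embedding(n_phonemes, emb)   # phoneme vocabulary
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, n_phonemes)

    def forward(self, graphemes, hint_phonemes, target_phonemes):
        # Prior knowledge: append the dictionary pronunciation of a
        # related word to the grapheme sequence before encoding.
        enc_in = torch.cat(
            [self.g_emb(graphemes), self.p_emb(hint_phonemes)], dim=1)
        _, state = self.encoder(enc_in)
        # Teacher-forced decoding of the target pronunciation.
        dec_out, _ = self.decoder(self.p_emb(target_phonemes), state)
        return self.out(dec_out)  # per-step logits over phonemes
```

Called as `DictAugmentedG2P(n_graphemes, n_phonemes)(g, hint, tgt)` on integer tensors of shape (batch, length), this returns per-step phoneme logits; training against the target pronunciation with cross-entropy would follow the usual seq2seq recipe.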

Cited by 6 publications (9 citation statements)
References 6 publications
“…Also it would be interesting to extend the work to derivation and compounding. Extension to derivation is rather simple, but compounding (analysing compounds given only their spelling and morphology class) is hard and would likely need memory-augmented models [26]. Another avenue to explore is extension to contextual settings: one could simultaneously perform morphosyntactic tagging and predict the desired morphological form, or in certain applications skip morphosyntactic tagging altogether [27].…”
Section: Discussion and Future Work
confidence: 99%
“…The Boulder-GWK and Boulder-PDP systems, both of which perform clustering over word representations, approach but do not exceed baseline performance. Perkoff et al. (2021) found that clustering over word2vec embeddings performs poorly on the development languages, and their scores on the test set reflect clusters found with vectors based purely on orthography. The Boulder-GWK results are incomplete, but partial evidence suggests that their clustering method, which combines fastText embeddings trained on the provided Bible corpora with edit distance, can indeed outperform the baseline.…”
Section: Results
confidence: 95%
“…The number of clusters is a hyperparameter of the k-means clustering algorithm. To set this hyperparameter, Perkoff et al. (2021) experiment with a graph-based method: the word types in the corpus form the nodes of a graph, where the neighborhood of a word w consists of all words sharing a maximal substring with w. The graph is then split into highly connected subgraphs (HCS), i.e. subgraphs of n nodes in which the number of edges that must be cut to split the subgraph into two disconnected components is > n/2 (Hartuv and Shamir, 2000).…”
Section: Submitted Systems
confidence: 99%
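For concreteness, here is a minimal sketch of the HCS clustering step quoted above, using networkx. The highly-connected test and the recursive minimum-cut split follow Hartuv and Shamir (2000); the `word_graph` helper and its shared-prefix criterion are illustrative assumptions standing in for the maximal-shared-substring neighborhood described in the citing paper.

```python
import networkx as nx

def highly_connected(g: nx.Graph) -> bool:
    # Hartuv and Shamir (2000): a graph on n nodes is highly connected
    # if its minimum edge cut contains more than n/2 edges.
    return nx.edge_connectivity(g) > g.number_of_nodes() / 2

def hcs(g: nx.Graph):
    # Recursively remove a minimum edge cut until every remaining
    # subgraph is highly connected; each such subgraph is a cluster.
    # (Subgraphs of <= 2 nodes are yielded as-is, a simplification.)
    if g.number_of_nodes() <= 2:
        yield set(g.nodes)
        return
    if not nx.is_connected(g):
        for comp in nx.connected_components(g):
            yield from hcs(g.subgraph(comp).copy())
        return
    if highly_connected(g):
        yield set(g.nodes)
        return
    h = g.copy()
    h.remove_edges_from(nx.minimum_edge_cut(g))
    for comp in nx.connected_components(h):
        yield from hcs(g.subgraph(comp).copy())

def word_graph(words, k=3):
    # Hypothetical stand-in for the maximal-shared-substring neighborhood:
    # connect word types that share a prefix of length >= k.
    g = nx.Graph()
    g.add_nodes_from(words)
    for i, w in enumerate(words):
        for v in words[i + 1:]:
            if w[:k] == v[:k]:
                g.add_edge(w, v)
    return g

clusters = list(hcs(word_graph(["walked", "walking", "walks", "runs", "running"])))
print(clusters)  # [{'walked', 'walking', 'walks'}, {'runs', 'running'}]
```

The number of clusters then falls out of the graph structure rather than being fixed in advance, which is exactly what makes the method useful for setting the k-means hyperparameter.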
“…Unlike inflection, derivation and compounding can involve multiple root words, so an extension would need a generalization of the above approach along with appropriate data. An alternative would be to learn these in an unsupervised way using a dictionary-augmented neural network, which can efficiently refer to pronunciations in a dictionary and use them to predict pronunciations of polymorphemic words from the pronunciations of their base words (Bruguier et al., 2018). It would be interesting to see whether combining morphological side information with dictionary augmentation yields a further accuracy boost.…”
Section: Future Work
confidence: 99%
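To make the contrast concrete, a naive dictionary baseline for compounds simply splits a word into known base words and concatenates their lexicon pronunciations. Everything below (the toy LEXICON, the ARPAbet-style phones, the greedy splitting) is an illustrative assumption; the dictionary-augmented network of Bruguier et al. (2018) learns how to use such lookups rather than hard-coding them.

```python
# Toy pronunciation lexicon with ARPAbet-style phones (illustrative only).
LEXICON = {"house": ["HH", "AW1", "S"], "boat": ["B", "OW1", "T"]}

def compound_pron(word, lexicon=LEXICON):
    # Greedy longest-match decomposition into known base words;
    # returns None when no complete split is found.
    if word == "":
        return []
    for cut in range(len(word), 0, -1):
        head, tail = word[:cut], word[cut:]
        if head in lexicon:
            rest = compound_pron(tail, lexicon)
            if rest is not None:
                return lexicon[head] + rest
    return None

print(compound_pron("houseboat"))  # ['HH', 'AW1', 'S', 'B', 'OW1', 'T']
```

Plain concatenation ignores stress shifts and phonological changes at morpheme boundaries, which is precisely why the passage proposes learning the mapping with a dictionary-augmented neural model instead.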