Proceedings of the 39th Annual Meeting on Association for Computational Linguistics (ACL '01), 2001
DOI: 10.3115/1073012.1073065

Automatic detection of syllable boundaries combining the advantages of treebank and bracketed corpora training

Abstract: An approach to automatic detection of syllable boundaries is presented. We demonstrate the use of several manually constructed grammars trained with a novel algorithm combining the advantages of treebank and bracketed corpora training. We investigate the effect of the training corpus size on the performance of our system. The evaluation shows that a hand-written grammar performs better on finding syllable boundaries than does a treebank grammar.
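In "treebank training", rule probabilities are estimated directly from fully parsed trees. The sketch below covers only that half, not the paper's novel combined algorithm, which also draws on bracketed-corpus training. It assumes a toy syllable grammar with hypothetical nonterminals Word, Syl, Onset, Nucleus and Coda and uses plain relative-frequency (maximum-likelihood) estimation; the function names and the three toy words are illustrative only.

```python
from collections import Counter, defaultdict

def count_rules(tree, counts):
    """Add one count for every rule LHS -> RHS used in `tree`.

    A tree is a tuple (label, child, child, ...); a leaf (phoneme) is a str.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> b) = f(A -> b) / f(A)."""
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = defaultdict(int)
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical toy treebank: "ta", "tat" and "a-ta", fully labelled.
toy_treebank = [
    ("Word", ("Syl", ("Onset", "t"), ("Nucleus", "a"))),
    ("Word", ("Syl", ("Onset", "t"), ("Nucleus", "a"), ("Coda", "t"))),
    ("Word", ("Syl", ("Nucleus", "a")), ("Syl", ("Onset", "t"), ("Nucleus", "a"))),
]
probs = estimate_pcfg(toy_treebank)
# e.g. probs[("Syl", ("Onset", "Nucleus"))] is 2 of 4 Syl expansions, i.e. 0.5
```

The probabilities for each left-hand side sum to one, so the result is a proper PCFG over the rules seen in the toy data.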


Cited by 15 publications (23 citation statements); references 4 publications.
“…Van den Bosch (1997) reports a word error rate of 2.22% on English syllabification using inductive learning. Due to the feature "cluster size", which was not used by Müller (2001a), we are able to give an extensive qualitative evaluation of syllable structure considering syllable positions, as well as the complexities of consonant clusters, and the position of a consonant within a cluster.…”
Section: Discussion (mentioning)
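The word error rate quoted above is a word-level measure. A minimal sketch, under the assumption (not stated in the quoted work) that a word counts as an error whenever its predicted syllable boundaries differ in any position from the reference; the function name and the hyphen notation for boundaries are hypothetical:

```python
def word_error_rate(predicted, gold):
    """Percentage of words whose predicted syllabification differs from the gold one.

    Words are strings with "-" marking syllable boundaries; a word is an error
    if any boundary is missing, extra, or misplaced.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold lists must be the same length")
    if not gold:
        raise ValueError("need at least one word")
    errors = sum(p != g for p, g in zip(predicted, gold))
    return 100.0 * errors / len(gold)

gold = ["kin-der", "haus", "ar-beit"]
predicted = ["kind-er", "haus", "ar-beit"]
wer = word_error_rate(predicted, gold)  # one of three words is wrong
```

Counting whole words rather than individual boundaries makes the measure strict: a single misplaced boundary in a long word costs as much as several.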
“…As phoneme set, we used the symbols from the English and German SAMPA alphabet (Wells, 1997). In contrast to Müller (2001a), we did not investigate smaller training corpora, since we are interested in maximal phonological knowledge about internal word structure. Third, we train the phonological context-free grammar on the training corpus using the supervised method presented in Section 2.…”
Section: Experiments With German Data (mentioning)
“…With respect to either of these two model classes, each way of assigning syllable boundaries to a word corresponds to exactly one parse of that word. This makes it simple to train the models from a corpus in which syllable boundaries are provided, as in Müller (2001). We used two different corpora for our experiments, one German (from the ECI corpus of newspaper text) and one English (from the Penn WSJ corpus).…”
Section: Statistical Parsing Of Syllable Structure (mentioning)
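The claim that each way of assigning syllable boundaries corresponds to exactly one parse can be illustrated with a deliberately simple, hypothetical syllable template (Word -> Syl+, Syl -> Onset? Nucleus Coda?), where the nucleus is taken to be the first run of vowels from an assumed toy vowel set. This is a sketch, not the grammar used in the cited work; `VOWELS`, `syllable_tree` and `word_tree` are illustrative names.

```python
VOWELS = set("aeiouy")  # hypothetical, simplified vowel inventory (not SAMPA)

def syllable_tree(syl):
    """Split one syllable into optional Onset, obligatory Nucleus, optional Coda."""
    i = 0
    while i < len(syl) and syl[i] not in VOWELS:
        i += 1  # consonants before the first vowel form the onset
    j = i
    while j < len(syl) and syl[j] in VOWELS:
        j += 1  # the following vowel run forms the nucleus
    if i == j:
        raise ValueError(f"syllable {syl!r} has no vowel")
    children = []
    if i > 0:
        children.append(("Onset", *syl[:i]))
    children.append(("Nucleus", *syl[i:j]))
    if j < len(syl):
        children.append(("Coda", *syl[j:]))  # everything after the nucleus
    return ("Syl", *children)

def word_tree(syllabified):
    """Map a boundary-annotated word such as "kin-der" to its single parse."""
    return ("Word", *(syllable_tree(s) for s in syllabified.split("-")))

tree = word_tree("kin-der")
```

Because onset, nucleus and coda are fully determined once the boundaries are fixed, a corpus with marked syllable boundaries can be turned into a treebank word by word, and a different boundary placement such as "kind-er" yields a different tree.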
“…Also for Dutch, Bouma (2003) describes a 'hyphenation' method but since the 'core rule of Dutch hyphenation is that hyphenation points fall between syllables' (p. 5), this is effectively a syllabification algorithm also. The overall system uses a hand-crafted rule-based system (see Müller 2001) which produces (all possible) candidate pronunciations. Using a simple over-generative rule-based method which is then subjected to transformation-based learning (Brill 1995) to control the errors, Bouma reports an accuracy of 99.35% hyphens correct on a test set of approximately 29,000 words after training on approximately 260,000 words.…”
Section: Introduction (mentioning)