Proceedings of the 39th Annual Meeting on Association for Computational Linguistics - ACL '01 2001
DOI: 10.3115/1073012.1073047

Serial combination of rules and statistics

Abstract: A hybrid system is described which combines the strengths of manual rule writing and statistical learning, obtaining results superior to either method applied separately. The combination of the rule-based system and the statistical one is not parallel but serial: the rule-based system, which performs partial disambiguation with recall close to 100%, is applied first, and a trigram HMM tagger then runs on its results. An experiment in Czech tagging has been performed with encouraging results.
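The serial architecture described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The rule stage is modeled as functions that may only remove tags from each token's candidate set, which is what keeps recall high. The HMM stage is a trigram Viterbi decoder whose search is restricted to the tags that survive. The rule `no_verb_after_determiner`, the `FLOOR` value used for unseen events, and all probabilities are hypothetical toy values with English-style tags. The actual system works with Czech morphological tags.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

BOS = "<s>"    # pseudo-tag filling the two history slots before the first word
FLOOR = 1e-6   # hypothetical probability for unseen events (no real smoothing)

# A rule inspects the sentence and current tag sets and may return a reduced
# set for token i; returning None (or an empty set) leaves the token unchanged.
Rule = Callable[[Sequence[str], List[Set[str]], int], Optional[Set[str]]]


def prune_with_rules(words: Sequence[str], candidates: Sequence[Set[str]],
                     rules: Sequence[Rule]) -> List[Set[str]]:
    """Stage 1: partial disambiguation. Rules only remove tags, never add them."""
    allowed = [set(c) for c in candidates]
    for rule in rules:
        for i in range(len(words)):
            reduced = rule(words, allowed, i)
            if reduced:  # never let a rule leave a token with no tag at all
                allowed[i] = reduced
    return allowed


def viterbi_trigram(words: Sequence[str], allowed: Sequence[Set[str]],
                    trans: Dict[Tuple[str, str, str], float],
                    emit: Dict[Tuple[str, str], float]) -> List[str]:
    """Stage 2: trigram HMM decoding restricted to the tags the rules kept.

    trans[(t2, t1, t)] = P(t | t2, t1) and emit[(t, w)] = P(w | t).
    """
    def logp(table, key):
        return math.log(table.get(key, FLOOR))

    # State = last two tags; value = (best log score, best tag path so far).
    best = {(BOS, BOS): (0.0, [])}
    for i, w in enumerate(words):
        nxt = {}
        for (t2, t1), (score, path) in best.items():
            for t in allowed[i]:  # only the tags that survived stage 1
                s = score + logp(trans, (t2, t1, t)) + logp(emit, (t, w))
                if (t1, t) not in nxt or s > nxt[(t1, t)][0]:
                    nxt[(t1, t)] = (s, path + [t])
        best = nxt
    return max(best.values(), key=lambda sp: sp[0])[1]


# Hypothetical toy rule: after an unambiguous determiner, drop MD/VB readings.
def no_verb_after_determiner(words, allowed, i):
    if i > 0 and allowed[i - 1] == {"DT"}:
        return allowed[i] - {"MD", "VB"}
    return None


words = ["the", "can", "rusts"]
candidates = [{"DT"}, {"MD", "NN", "VB"}, {"VBZ", "NNS"}]  # e.g. from a lexicon
allowed = prune_with_rules(words, candidates, [no_verb_after_determiner])

# Toy parameters, not estimated from any corpus.
trans = {(BOS, BOS, "DT"): 0.5, (BOS, "DT", "NN"): 0.6,
         ("DT", "NN", "VBZ"): 0.4, ("DT", "NN", "NNS"): 0.05}
emit = {("DT", "the"): 0.6, ("NN", "can"): 0.01,
        ("VBZ", "rusts"): 0.001, ("NNS", "rusts"): 0.001}

print(viterbi_trigram(words, allowed, trans, emit))  # ['DT', 'NN', 'VBZ']
```

Because the first stage only prunes, the HMM can never choose a tag the rules have excluded. If a rule wrongly removed the correct tag, the tagger would have no way to recover it. This is why recall close to 100% matters for the rule-based stage.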

Cited by 44 publications (27 citation statements); references 13 publications.
“…Therefore, in contrast to our approach, their system does not capture cross-dependencies between inflectional categories, such as the dependence between the word-class and case of adjacent words. Unsurprisingly, Smith et al. fail to achieve improvement over a generative HMM-based POS tagger of Hajič (2001). Meanwhile, our system outperforms the generative trigram tagger HunPos (Halácsy et al., 2007). Ceauşu (2006) uses a maximum entropy Markov model (MEMM) based system for tagging Romanian which utilizes transitional behavior between sub-labels similarly to our feature set (6).…”
Section: Related Work (mentioning)
confidence: 90%
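The cross-dependency between inflectional categories mentioned above can be illustrated with Czech positional tags. The sketch below assumes the 15-position Prague Dependency Treebank tag layout, in which position 1 is the part of speech and position 5 is the case. It builds feature strings that pair sub-labels of adjacent tokens. The feature templates and the names `split_tag`, `cross_features`, and `BOUNDARY` are hypothetical illustrations, not the citing paper's feature set (6). The two tags are illustrative values for an adjective–noun pair such as "nová kniha" ("new book").

```python
from typing import Dict, List

# Assumed PDT positional layout: one character per category, 15 positions.
POSITIONS = ["POS", "SubPOS", "Gender", "Number", "Case", "PossGender",
             "PossNumber", "Person", "Tense", "Grade", "Negation", "Voice",
             "Reserve1", "Reserve2", "Var"]
BOUNDARY = {name: "<s>" for name in POSITIONS}  # stands in before token 0


def split_tag(tag: str) -> Dict[str, str]:
    """Decompose a positional tag into named sub-labels."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected a {len(POSITIONS)}-character tag, got {tag!r}")
    return dict(zip(POSITIONS, tag))


def cross_features(tags: List[str], i: int) -> List[str]:
    """Hypothetical templates pairing sub-labels of tokens i-1 and i."""
    prev = split_tag(tags[i - 1]) if i > 0 else BOUNDARY
    cur = split_tag(tags[i])
    return [
        f"prevPOS={prev['POS']}|curPOS={cur['POS']}",
        f"prevPOS={prev['POS']}|curCase={cur['Case']}",   # word class x case
        f"prevCase={prev['Case']}|curCase={cur['Case']}",  # case agreement
    ]


tags = ["AAFS1----1A----", "NNFS1-----A----"]  # illustrative adjective + noun
print(cross_features(tags, 1))
# ['prevPOS=A|curPOS=N', 'prevPOS=A|curCase=1', 'prevCase=1|curCase=1']
```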
“…Our tag set is the Prague Dependency Treebank (PDT; Hajič, 1998) […] 1,400 distinct tag types in the PDT. Czech has been treated probabilistically before, perhaps most successfully by Hajič et al. (2001).⁸ In contrast, we estimate conditionally (rather than by maximum likelihood for a generative HMM) and separate the training of the source and the channel.…”
Section: Czech: Model and Experiments (mentioning)
confidence: 99%
“…Such approaches have largely focused on modeling the phone- or character-level processes that generate candidate lexical types, rather than tokens in context. For the full analysis of words in context, disambiguation is also required (Hakkani-Tür et al., 2000; Hajič et al., 2001). In this paper, we apply a novel source-channel model to the problem of morphological disambiguation (segmentation into morphemes, lemmatization, and POS tagging) for concatenative, templatic, and inflectional languages.…”
mentioning
confidence: 99%
“…Since a great deal of the POS information exploited by an HMM tagger is contained in sequences of function words,¹² these features of Czech hinder the performance of an HMM POS tagger.¹³ Finally, Czech belongs to the Slavic language family, and is therefore further removed than French from the Germanic and Romance families of the source languages used to train the single-source taggers.…”
Section: Single-source Taggers (mentioning)
confidence: 99%
“…¹² E.g., a "DT" is likely to be followed by an "NN" in English. ¹³ The Czech tagger we use for reference [13] combines a rule-based morphological analyzer with an HMM POS tagger to combat these problems; our induced HMM POS taggers, lacking any morphological analysis component, may not exploit the correct type of information for such languages.…”
Section: Single-source Taggers (mentioning)
confidence: 99%