Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL '01), 2001
DOI: 10.3115/1073012.1073046

XML-based data preparation for robust deep parsing

Abstract: We describe the use of XML tokenisation, tagging and markup tools to prepare a corpus for parsing. Our techniques are generally applicable but here we focus on parsing Medline abstracts with the ANLT wide-coverage grammar. Hand-crafted grammars inevitably lack coverage but many coverage failures are due to inadequacies of their lexicons. We describe a method of gaining a degree of robustness by interfacing POS tag information with the existing lexicon. We also show that XML tools provide a sophisticated approach…
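As a hedged illustration of the kind of pipeline the abstract describes, the minimal Python sketch below represents tokenised, POS-tagged text as XML and performs a lexicon lookup that falls back on the POS tag when a word is missing from the hand-crafted lexicon. The element names, tag-to-category mapping and toy lexicon are illustrative assumptions, not the actual ANLT or XML tool interfaces.

```python
# Minimal sketch (not the authors' actual pipeline): tokenised, POS-tagged text
# is represented as XML, and unknown words receive a category derived from the
# POS tag instead of causing a lexical lookup failure. All element names, tag
# mappings and the toy lexicon are illustrative assumptions.
import xml.etree.ElementTree as ET

# Toy hand-crafted lexicon: surface form -> grammatical category.
LEXICON = {"protein": "N", "binds": "V", "the": "DET"}

# Assumed mapping from Penn-Treebank-style POS tags to coarse lexical categories.
POS_TO_CATEGORY = {"NN": "N", "NNS": "N", "VBZ": "V", "VBD": "V", "DT": "DET", "JJ": "ADJ"}

MARKED_UP = """<sentence>
  <w pos="DT">the</w>
  <w pos="NN">kinase</w>
  <w pos="VBZ">phosphorylates</w>
  <w pos="NN">protein</w>
</sentence>"""

def lexical_entries(xml_text):
    """Yield (word, category, source) for each <w> element, falling back to the
    tagger's POS tag when the word is absent from the hand-crafted lexicon."""
    for w in ET.fromstring(xml_text).iter("w"):
        word = w.text
        if word in LEXICON:
            yield word, LEXICON[word], "lexicon"
        else:
            # Robustness fallback: derive a category from the POS tag.
            yield word, POS_TO_CATEGORY.get(w.get("pos"), "UNKNOWN"), "pos-tag"

for word, cat, source in lexical_entries(MARKED_UP):
    print(f"{word}\t{cat}\t({source})")
```

In this sketch, domain terms such as "kinase" and "phosphorylates" that are absent from the lexicon still receive usable categories, which is the kind of robustness gain the abstract refers to.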

Cited by 10 publications (11 citation statements); references 9 publications.

Citation statements (ordered by relevance):
“…The analyser was used to create the frequency lists for the sampling frame for the SENSEVAL-1 exercise (Kilgarriff 1998). It forms part of a prototype word sense disambiguation system (Carroll and McCarthy 2000), and is being used in research into methods for the semantic interpretation of complex nominals in medical domains (Grover and Lascarides 2001). Other, commercial applications are in corpus processing for dictionary publishing, in particular as part of the creation of 'word sketches' for lexicographers (Kilgarriff and Rundell 1999), and in product marketing, producing lemma frequency data from large corpora to inform the selection of new brand names.…”
Section: The Morphological Analyser and Generator in Applied Systems
confidence: 99%
“…To the best of our knowledge, this is the first implemented system which integrates high-performance shallow processing with an advanced deep HPSG-based analysis system. There exists only very little other work that considers the integration of shallow and deep NLP using an XML-based architecture, most notably (Grover and Lascarides, 2001). However, their integration efforts are largely limited to the level of POS tag information.…”
Section: Discussion
confidence: 99%
“…LXGram uses a similar approach, where morphological information output by shallow tools is used to enable the grammar to process unknown words (though we do not use a shallow parser to improve efficiency). Grover and Lascarides (2001) is an earlier work that also uses the morphological information coming from shallow tools to increase the robustness of a computational grammar, namely when it comes to dealing with out-of-vocabulary words.…”
Section: Hybrid Natural Language Processing
confidence: 99%
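The fallback idea mentioned in the statement above can be illustrated with a small, hedged sketch: derive a coarse category for an out-of-vocabulary word from simple suffix cues. The suffix rules and category labels below are assumptions for illustration only, not the mechanism used by ANLT, LXGram, or the cited shallow tools.

```python
# Minimal sketch (an assumption, not the cited systems' actual mechanism):
# guess a lexical category for an out-of-vocabulary word from simple
# morphological cues, so the grammar still obtains a usable entry.
import re

# Illustrative suffix -> category heuristics.
SUFFIX_RULES = [
    (re.compile(r".+(tion|ment|ness|ity)$"), "N"),
    (re.compile(r".+(ise|ize|ate|ify)$"), "V"),
    (re.compile(r".+(ous|ive|able|al)$"), "ADJ"),
]

def guess_category(word: str) -> str:
    """Return a coarse category for an unknown word, defaulting to noun."""
    for pattern, category in SUFFIX_RULES:
        if pattern.match(word):
            return category
    return "N"  # open-class default: most unknown domain terms are nouns

print(guess_category("phosphorylation"))  # N
print(guess_category("solubilise"))       # V
```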