This paper introduces our attempts to model the Chinese language using HPSG and MRS. Chinese refers to a family of languages including Mandarin Chinese, Cantonese, Min, etc. These languages share a large amount of structure, though they may differ in orthography, lexicon, and syntax. To model them, we are building a family of grammars: ZHONG [ ]. It contains instantiations of various Chinese languages, sharing descriptions where possible. Currently we have prototype grammars for Cantonese and Mandarin in both simplified and traditional script, all based on a common core. The grammars also have facilities for robust parsing, sentence generation, and unknown word handling.
This thesis describes the development of Zhong, a computational resource grammar for Chinese, in the framework of Head-driven Phrase Structure Grammar (HPSG: Pollard & Sag, 1994) using Minimal Recursion Semantics (Copestake et al., 2005). To increase the grammar's coverage for practical applications, a corpus-driven approach was adopted to systematically expand its lexical and syntactic coverage. The lexicon was expanded by semi-automatically learning lexical entries from an annotated Chinese corpus. Various language phenomena commonly observed in corpora have been analyzed and modeled in the grammar, especially those involving the particle 的 DE. The entire grammar and associated tools are available under an open-source license. A treebank of 798 sentences has been built from the parse trees in the grammar's output. With the appropriate trees manually selected from the parses, the treebank was used as a gold standard to train a statistical model that ranks the grammar's output parse trees, both to improve its performance in applications and to help grammar engineers during development and debugging. To evaluate the grammar's suitability for supporting applications such as grammar feedback systems for second language learners, a small extension of the grammar was also built with MAL-rules and MAL-types, which enable it to parse sentences containing grammatical errors and to detect the specific errors. The information provided by the grammar would thus allow the feedback system to identify the errors and give appropriate suggestions to the learner.
This paper describes some of our attempts at extending Zhong, a Chinese HPSG shared grammar. New analyses for two Chinese-specific phenomena, reduplication and the SUO-DE structure, are introduced. The analysis of reduplication uses lexical rules to capture both the syntactic and semantic properties (amplification in adjectives and diminishing in verbs). Words showing non-productive reduplication are entered in the lexicon, and their semantic relations will be captured in an external resource (the Chinese Open Wordnet). The SUO-DE structure constrains the meanings of relative clauses to a gapped-object interpretation.