Proceedings of the Third Linguistic Annotation Workshop (ACL-IJCNLP '09), 2009
DOI: 10.3115/1698381.1698416

Building a large syntactically-annotated corpus of Vietnamese

Abstract: A treebank is an important resource for both research and applications in natural language processing. For Vietnamese, such corpora are still lacking. This paper presents up-to-date results of a project for Vietnamese treebank construction. Since Vietnamese is an isolating language and has no word delimiter, there are many ambiguities in sentence analysis. We systematically applied a number of linguistic techniques to handle such ambiguities. Annotators are supported by automatic labeling tools and a tree-editor …

Cited by 80 publications (64 citation statements)
References 5 publications
“…Recently, a group of Vietnamese computational linguists has been involved in developing a treebank for Vietnamese (Nguyen et al 2009). This is the treebank we used for our extraction system.…”
Section: Vietnamese Treebank (mentioning)
confidence: 99%
“…The current scheme contains 13 grammatical relations representing principal functional dependencies between Vietnamese words. All these dependencies use the syntactic categories defined in the Vietnamese treebank (Nguyen et al 2009) and they are divided into three groups.…”
Section: Dependency Annotation Schema (mentioning)
confidence: 99%
“…The corpus is nearly one third the size of the published comparable Italian, German and English corpora. Nguyen et al (2009) built a large syntactically annotated corpus for Vietnamese by constructing a treebank. To create the annotated corpus they followed the same approach that was used to create the English Penn Treebank (Marcus et al 1993): automatic parsers annotated the corpus and human annotators corrected any errors.…”
Section: Creation Of Language Resources For Under-resourced Languages (mentioning)
confidence: 99%