Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics (EACL '03), 2003
DOI: 10.3115/1067737.1067795

Development of corpora within the CLaRK system

Abstract: CLaRK is an XML-based software system for corpora development. It incorporates several technologies: XML technology, Unicode, Regular Cascaded Grammars, and Constraints over XML Documents. The basic components of the system are a tagger, a concordancer, an extractor, a grammar processor, and a constraint engine.
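The abstract lists Regular Cascaded Grammars among CLaRK's core technologies. The sketch below illustrates the general idea of a cascade over XML, assuming a simplified model: each grammar level is a list of regular expressions over the categories of an element's children, every match is wrapped in a new XML element, and the next level runs over the result. The element names (`s`, `w`), the `pos` attribute, the `<Cat>` rule notation and the functions `apply_rule` and `cascade` are hypothetical illustrations; they do not reproduce CLaRK's grammar syntax or its API.

```python
import re
import xml.etree.ElementTree as ET


def category(el):
    # Hypothetical convention: <w> tokens carry their part of speech in a "pos"
    # attribute; elements built by an earlier grammar level are categorised by tag.
    return el.get("pos", "UNK") if el.tag == "w" else el.tag


def apply_rule(parent, pattern, wrapper):
    """Wrap each leftmost, non-overlapping run of `parent`'s children whose
    category sequence matches `pattern` in a new `wrapper` element."""
    children = list(parent)
    # Encode the children as a string of "<Cat>" tokens, e.g. "<Det><Adj><N>".
    cats = "".join(f"<{category(c)}>" for c in children)
    # Map the character offset where each token starts to its child index.
    index_of, offset = {}, 0
    for i, c in enumerate(children):
        index_of[offset] = i
        offset += len(category(c)) + 2
    index_of[offset] = len(children)  # offset just past the last token

    rebuilt, done = [], 0
    for m in re.finditer(pattern, cats):
        start, end = index_of.get(m.start()), index_of.get(m.end())
        if start is None or end is None or start == end:
            continue  # skip empty matches or matches not aligned to tokens
        rebuilt.extend(children[done:start])
        group = ET.Element(wrapper)
        group.extend(children[start:end])
        rebuilt.append(group)
        done = end
    rebuilt.extend(children[done:])

    # Replace the old child list with the rebuilt one.
    for c in children:
        parent.remove(c)
    parent.extend(rebuilt)


def cascade(sentence, levels):
    """Apply grammar levels in order: each level sees the output of the previous one."""
    for rules in levels:
        for pattern, wrapper in rules:
            apply_rule(sentence, pattern, wrapper)
    return sentence


# Hypothetical three-level cascade: noun phrases, then verb phrases, then clauses.
LEVELS = [
    [(r"(?:<Det>)?(?:<Adj>)*<N>", "NP")],
    [(r"<V><NP>", "VP")],
    [(r"<NP><VP>", "S")],
]

s = ET.fromstring(
    '<s><w pos="Det">the</w><w pos="Adj">big</w><w pos="N">dog</w>'
    '<w pos="V">saw</w><w pos="Det">a</w><w pos="N">cat</w></s>'
)
cascade(s, LEVELS)
print(ET.tostring(s, encoding="unicode"))
```

In this toy run, the first level groups "the big dog" and "a cat" into `NP` elements, the second level can only then see the `<V><NP>` sequence it needs to build a `VP`, and the third level wraps `NP VP` into a single `S`. Rules at later levels refer to categories created by earlier ones, which is what makes the grammars a cascade rather than a single flat pattern set.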

Cited by 8 publications (5 citation statements):
“…Initially, CLaDA-BG-Dict was designed and implemented to support the verification and extension of BTB-WN. The motivation for this was that the existing version of BTB-WN was initiated in an XML format within the CLaRK System (Simov et al, 2001). The XML format used during the creation of earlier versions of BTB-WN was not a standard one.…”
Section: System Specifics and Functionalities (mentioning)
“…Lemmatization and word sense disambiguation are performed by manually crafted rules, while part-of-speech tagging and morphological tagging are performed by tools based on support vector machines (SVMs). Different parts of the pipeline are developed as part of different systems, including the CLaRK system [20], Gtagger [6], and MaltParser [14].…”
Section: Related Work (mentioning)
“…The corpus contains not only simple, but also complex sentences [55]. In [56] is described how the main functionalities of the CLaRK system for corpora development are exploited in the BulTreeBank project. The latter is an XML based software for corpora development first introduced a little earlier in 2001 [54].…”
Section: Text Corpora (mentioning)