Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Zeman, Daniel; Hajič, Jan

doi:10.18653/v1/k17-3

Cited by 6 publications (4 citation statements)
References 24 publications (66 reference statements)


“…Data To make our analysis maximally comparable across languages, we start from the Parallel Universal Dependencies (PUD) collection (Zeman et al, 2017), which contains translations for a set of 1000 English sentences. PUD only contains test corpora.…”

Section: Methods (mentioning)

confidence: 99%

“…Most of the work on word order variation using Universal Dependencies (UD: de Marneffe et al, 2021) is based on curated dependency treebanks, with only a few works using dependency corpora derived from raw texts. Although the accuracy rate of NLP systems trained on UD models is reportedly very high (Hajič and Zeman, 2017; Zeman and Hajič, 2018; Straka et al, 2019; Qi et al, 2020), a certain level of noise, i.e., erroneous annotations, is in fact present when working with automatically annotated texts (Levshina et al, to appear; Talamo and Verkerk, to appear); furthermore, different layers of UD annotations such as Universal Parts of Speech (UPOS) and UD Relations are not always used consistently across languages, often resulting in the cross-linguistic comparison of different categories.…”

Section: Introduction (mentioning)

confidence: 99%


Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

2022


In this talk I will describe the benefits of implemented grammars as well as the challenges involved in creating them. I present an inference system that can be used to automatically generate such grammars on the basis of interlinear glossed text (IGT) corpora. The inference system, called BASIL (Building Analyses from Syntactic Inference in Local Languages), leverages typologically informed heuristics to infer syntactic and morphological information from linguistic corpora and to select analyses that model the language. I will also address the question of whether, and to what extent, typological features are apparent in IGT data, and how effectively grammars generated with these features can model human language.

Bio: Kristen Howell is a data scientist at LivePerson Inc. in Seattle, Washington. Her research interests range from grammar engineering and grammar inference to conversational NLP. Throughout this research, the common thread is multilingual NLP across typologically diverse languages. Kristen received her PhD from the University of Washington in 2020, where she engaged with typological literature to develop technology for automatically generating grammars for local languages. Her recent work at LivePerson has focused on multilingual NLP, leveraging deep learning techniques for conversational AI.



Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

2022


Section: Introduction (mentioning)

confidence: 99%

Tweaking UD Annotations to Investigate the Placement of Determiners, Quantifiers and Numerals in the Noun Phrase

Talamo

2022

Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP


We describe a methodology to extract word order patterns with finer accuracy from texts automatically annotated with parsers trained on Universal Dependencies (UD). We use the methodology to quantify the word order entropy of determiners, quantifiers and numerals in ten Indo-European languages, using UD-parsed texts from a parallel corpus of prose texts. Our results suggest that combining different UD annotation layers, such as UD Relations, Universal Parts of Speech and lemmas, and introducing language-specific lists of closed-category lemmata has the twofold effect of improving the quality of analysis and unveiling hidden areas of variability in word order patterns.
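The abstract above does not spell out its entropy formula, so the following is only a minimal sketch, assuming word order entropy is the Shannon entropy (in bits) of how often a dependent precedes or follows its head. The tuple format and the helpers `dependent_orders` and `order_entropy` are hypothetical; the optional `lemmas` set mimics the idea of a language-specific closed-category lemma list.

```python
import math
from collections import Counter


def dependent_orders(tokens, deprel, lemmas=None):
    """Return "before"/"after" for every token attached to its head with `deprel`.

    tokens: one sentence as (id, lemma, upos, head, deprel) tuples, a
    simplified, hypothetical stand-in for parsed CoNLL-U rows.
    lemmas: optional set of lemmas to keep (e.g. a closed-category list).
    """
    orders = []
    for tok_id, lemma, _upos, head, rel in tokens:
        if rel != deprel or head == 0:
            continue  # wrong relation, or attached to the artificial root
        if lemmas is not None and lemma not in lemmas:
            continue  # lemma not in the closed-category list
        orders.append("before" if tok_id < head else "after")
    return orders


def order_entropy(orders):
    """Shannon entropy in bits of the observed orders (0.0 = fully fixed order)."""
    counts = Counter(orders)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p); a single attested order yields exactly 0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Toy, invented input: one pre-head and one post-head determiner.
sentences = [
    [(1, "the", "DET", 2, "det"), (2, "dog", "NOUN", 3, "nsubj"), (3, "sleep", "VERB", 0, "root")],
    [(1, "dog", "NOUN", 3, "nsubj"), (2, "this", "DET", 1, "det"), (3, "sleep", "VERB", 0, "root")],
]
orders = [o for s in sentences for o in dependent_orders(s, "det", {"the", "this"})]
print(orders, order_entropy(orders))  # ['before', 'after'] 1.0
```

A 50/50 split between two orders gives the maximum of 1 bit, while a rigid order gives 0; restricting by lemma is what lets a closed list filter out mislabelled tokens before counting.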


“…Traditional readability formulas (e.g. Flesch-Kincaid Grade Level (Kincaid et al, 1975), Gunning Fog Index (Gunning, 1952)) typically use shallow source text features such as average word and sentence length and word frequency to assess the reading difficulty level of a given text. Recently, more complex lexical, syntactic, semantic and discourse text features have been used (see for instance Schwarm and Ostendorf (2005); Francois and Miltsakaki (2012); De Clercq et al (2014); De Hoste (2016), and Collins-Thompson (2014) for an overview).…”

Section: Introduction (mentioning)

confidence: 99%
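To make the "shallow features" in the quotation above concrete, here is a small sketch of the two traditional formulas it names, computed from counts the caller supplies. Syllable and complex-word counting are left out on purpose, since they need language-specific heuristics; the function names and the example counts are hypothetical.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from pre-computed counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Gunning Fog Index; complex words are usually those with 3+ syllables."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# A hypothetical 100-word text with 5 sentences, 150 syllables and 10 complex words.
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # 9.91
print(round(gunning_fog(100, 5, 10), 2))  # 12.0
```

Both scores depend only on average sentence length and a word-length proxy, which is exactly why they are described as shallow compared with syntactic or discourse features.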

Metrics of Syntactic Equivalence to Assess Translation Difficulty

Vanroy, Clercq, Tezcan et al. 2021

Explorations in Empirical Translation Process Research


We propose three linguistically motivated metrics to quantify syntactic equivalence between a source sentence and its translation. First, Syntactically Aware Cross (SACr) measures the degree of word group reordering by creating syntactically motivated groups of words that are aligned. Second, a more intuitive approach compares the linguistic labels of the word-aligned source and target tokens. Finally, on a deeper linguistic level, Aligned Syntactic Tree Edit Distance (ASTrED) compares the dependency structures of both sentences. To be able to compare source and target dependency labels, we make use of Universal Dependencies (UD). We analyse our metrics by comparing them with translation process data in mixed models. Even though our examples and analysis focus on English as the source language and Dutch as the target language, the proposed metrics can be applied to any language for which UD models are available. An open-source implementation is made available.
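As a rough illustration of the second metric and of the crossing idea behind SACr, here is a minimal sketch. It is not the authors' open-source implementation: it assumes 0-based word alignments given as `(source_index, target_index)` pairs, and `crossing_links` counts crossings between individual word links rather than between the syntactically motivated groups that SACr uses. Function names and the labels in the example are illustrative.

```python
from itertools import combinations


def label_changes(alignments, src_labels, tgt_labels):
    """Fraction of alignment links whose source and target UD labels differ.

    alignments: list of (src_index, tgt_index) pairs, 0-based.
    src_labels / tgt_labels: one UD label (deprel or UPOS) per token.
    """
    if not alignments:
        return 0.0
    changed = sum(1 for s, t in alignments if src_labels[s] != tgt_labels[t])
    return changed / len(alignments)


def crossing_links(alignments):
    """Number of pairs of word-level alignment links that cross each other."""
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(alignments, 2)
        if (s1 - s2) * (t1 - t2) < 0  # opposite order on the two sides
    )


# "He has read the book" -> "Hij heeft het boek gelezen" (labels illustrative)
src = ["nsubj", "aux", "root", "det", "obj"]
tgt = ["nsubj", "aux", "det", "obj", "root"]
links = [(0, 0), (1, 1), (2, 4), (3, 2), (4, 3)]
print(label_changes(links, src, tgt), crossing_links(links))  # 0.0 2
```

Here every aligned pair keeps its label, but moving the participle "gelezen" to the end of the Dutch clause creates two crossings, showing how the two measures capture different kinds of divergence.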

