Introduction

This volume contains the papers describing systems submitted to the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, together with an overview paper summarizing the task, its features, the evaluation methodology for the main and additional metrics, and some interesting observations about the submitted systems and the task as a whole.

This shared task (http://universaldependencies.org/conll17/) can be seen as an extension of the CoNLL 2007 Shared Task on dependency parsing, but many important differences make this year's task unique, with several "firsts". Most importantly, the data for this task come from the Universal Dependencies project (http://universaldependencies.org), which provides annotated treebanks for a large number of languages using the same annotation scheme for all of them. In the shared task setting, this allows for more meaningful comparison between systems as well as between languages, since differences are much more likely due to true parser differences rather than to differences in annotation schemes. In addition, the number of languages for which training data were available is unprecedented for a single shared task: a total of 64 treebanks in 45 languages were provided for training the systems. Additional data were provided as well, along with baseline systems for participants who wanted to address only a particular aspect of parsing. Overall, the task can be described as "closed", since only pre-approved data could be used.

For evaluation, there were 81 test datasets: the standard test sets for the treebanks provided for training, further test sets in known languages based on a specially created and annotated parallel corpus, and four surprise-language test sets. Participants had to process all of the test sets. The TIRA platform was used for evaluation, as was already the case for the CoNLL 2015 and 2016 Shared Tasks, meaning that participants had to deploy their code on a designated virtual machine, which the organizers then ran to produce the official results. However, the test data were published after the official evaluation period, and participants could run their systems at home to produce additional results that they were allowed to include in their system description papers. There was one main evaluation metric, the Labeled Attachment Score (LAS), used for the main ranking of dependency parsing performance, plus additional metrics for tokenization, word and sentence segmentation, POS tagging, lemmatization and disambiguation of morphological features, as well as separate metrics computed on interesting subsets of the evaluation data.

A total of 32 systems ran successfully and were ranked (http://universaldependencies.org/conll17/results.html). While there are clear overall winners, we would like to thank all participants for working hard on their submissions and for adapting their systems not only to the available datasets but also to the evaluation platform. We thank all of them for their effort, since it is the participants who are the core of ...
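As a concrete illustration of the main metric: the Labeled Attachment Score is the proportion of syntactic words whose head and dependency label are both predicted correctly. The following Python sketch is illustrative only, not the official evaluation script; it assumes that the gold and system CoNLL-U files share identical tokenization, whereas the official evaluation additionally aligns words when segmentation differs, and the function names are hypothetical.

    def read_words(path):
        """Yield (HEAD, DEPREL) pairs for syntactic words in a CoNLL-U file."""
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and sentence-level comments
                cols = line.split("\t")
                # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
                if "-" in cols[0] or "." in cols[0]:
                    continue
                yield cols[6], cols[7]  # HEAD and DEPREL columns

    def las(gold_path, system_path):
        """Labeled Attachment Score, assuming identical word segmentation."""
        gold = list(read_words(gold_path))
        system = list(read_words(system_path))
        assert len(gold) == len(system), "sketch assumes identical tokenization"
        correct = sum(1 for g, s in zip(gold, system) if g == s)
        return correct / len(gold) if gold else 0.0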