2014
DOI: 10.1007/978-3-319-11382-1_22

Improving the Reproducibility of PAN’s Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling

Authors: Potthast, M.; Gollub, T.; Rangel, F.; Rosso, P.; Stamatatos, E.; et al.

Abstract: This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling. To improve the reproducibility of shared tasks in general, and PAN's tasks in particular, the Webis group developed a new web service called TIRA, which facilitates software submissions. Unlike many other labs, PAN asks participants to submit running software instead of their run output. …

Cited by 28 publications (12 citation statements)
References 38 publications
“…For the submitted official results through TIRA (Potthast et al, 2014), we use the intersection model for all languages. Since we focus on the corpus selection, we do not perform additional preprocessing and we use the provided training datasets as they are.…”
Section: Results (mentioning)
confidence: 99%
“…Key goals of any empirical evaluation are to ensure a blind evaluation, its replicability, and its reproducibility. To facilitate these goals, we employed the cloud-based evaluation platform TIRA (Potthast et al, 2014), which implements the evaluation as a service paradigm (Hanbury et al, 2015). In doing so, we depart from the traditional submission of system output to shared tasks, which lacks in these regards, toward the submission of working software.…”
Section: Evaluation Methodology (mentioning)
confidence: 99%
“…All of these test sets (Nivre et al, 2017b) were hidden from the participating teams until the shared task had ended. Using the TIRA environment (Potthast et al, 2014) provided for the shared task, participants could execute runs on them, but not see the outputs or the results.…”
Section: Test Splits (mentioning)
confidence: 99%