We present a language-independent clausizer (clause splitter) based on Universal Dependencies (Nivre et al., 2016), and a clause-level tagger for grammatical tense, mood, voice and modality in German. The paper recapitulates verbal inflection in German, always juxtaposed with its close relative English, and transforms the linguistic theory into a rule-based algorithm. We achieve state-of-the-art accuracies of 92.6% for tense, 79.0% for mood, 93.8% for voice and 79.8% for modality in the literary domain. Our implementation is available at https://gitlab.gwdg.de/tillmann.doenicke/tense-tagger.
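To make the idea of a dependency-based clause splitter concrete, here is a minimal sketch. It is not the rule set of the paper: the set of clause-opening relations (`CLAUSE_RELS`), the token tuple format and the function name `split_clauses` are assumptions chosen for illustration. Each token is assigned to the nearest ancestor (or itself) whose Universal Dependencies relation opens a clause.

```python
# Illustrative sketch only: CLAUSE_RELS and the token format are assumptions,
# not the rules used in the paper.
CLAUSE_RELS = {"root", "ccomp", "xcomp", "advcl", "acl", "acl:relcl",
               "csubj", "parataxis"}


def split_clauses(tokens):
    """Group tokens into clauses.

    tokens: list of (id, form, head, deprel) with 1-based ids; head 0 marks
    the root. The input is assumed to be a well-formed dependency tree.
    Returns a dict mapping each clause head id to its forms in surface order.
    """
    by_id = {tok[0]: tok for tok in tokens}

    def clause_head(tok_id):
        # Walk up the tree until a token with a clause-opening relation is found.
        while True:
            _, _, head, deprel = by_id[tok_id]
            if deprel in CLAUSE_RELS or head == 0:
                return tok_id
            tok_id = head

    clauses = {}
    for tok_id, form, _, _ in tokens:
        clauses.setdefault(clause_head(tok_id), []).append(form)
    return clauses


# Hand-annotated toy parse of "She said that he left".
example = [
    (1, "She", 2, "nsubj"),
    (2, "said", 0, "root"),
    (3, "that", 5, "mark"),
    (4, "he", 5, "nsubj"),
    (5, "left", 2, "ccomp"),
]
clauses = split_clauses(example)
```

In this toy parse the matrix clause is headed by "said" and the complement clause by "left". A real clausizer would additionally have to decide how to treat coordinated verbs and non-finite constructions, which this sketch leaves out.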
In this system paper, we present a transformer-based approach to the detection of informativeness in English tweets on the topic of the current COVID-19 pandemic. Our models distinguish informative tweets, i.e. tweets containing statistics on recovery, suspected and confirmed cases and COVID-19 related deaths, from uninformative tweets. We present two transformer-based approaches as well as a Naive Bayes classifier and a support vector machine as baseline systems. The transformer models outperform the baselines by more than 0.1 in F1-score, with F1-scores of 0.9091 and 0.9036. Our models were submitted to the shared task Identification of informative COVID-19 English tweets (WNUT-2020 Task 2).
This paper describes our participating system for the Shared Task on Discourse Segmentation and Connective Identification across Formalisms and Languages. Key features of the presented approach are the formulation as a clause-level classification task, a language-independent feature inventory based on Universal Dependencies grammar, and composite-verb-form analysis. The achieved F1 is 92% for German and English and lower for other languages. The paper also presents a clause-level tagger for grammatical tense, aspect, mood, voice and modality in 11 languages.
This paper proposes a framework for the expression of typological statements which uses real-valued logics to capture the empirical truth value (truth degree) of a formula on a given data source, e.g. a collection of multilingual treebanks with comparable annotation. The formulae can be arbitrarily complex expressions of propositional logic. To illustrate the usefulness of such a framework, we present experiments on the Universal Dependencies treebanks for two use cases: (i) empirical (re-)evaluation of established formulae against the spectrum of available treebanks and (ii) evaluating new formulae (i.e. potential candidates for universals) generated by a search algorithm.
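The notion of a truth degree can be illustrated with a small sketch. The abstract does not say which real-valued logic is used, so the Łukasiewicz connectives below are an assumed choice, and the treebank names, counts and the example formula ("verb-object order implies prepositions") are hypothetical toy data rather than results from the paper. Atomic statements receive a degree equal to a relative frequency in a treebank, and a propositional formula is evaluated over those degrees.

```python
def degree(count, total):
    """Relative frequency used as the truth degree of an atomic statement."""
    return count / total if total else 0.0


# Lukasiewicz connectives: one assumed choice of real-valued logic.
def neg(a):
    return 1.0 - a


def conj(a, b):
    return max(0.0, a + b - 1.0)


def disj(a, b):
    return min(1.0, a + b)


def impl(a, b):
    return min(1.0, 1.0 - a + b)


# Hypothetical counts per treebank: (matching instances, all instances).
treebanks = {
    "toy_vo": {"vo": (75, 100), "prep": (50, 100)},
    "toy_ov": {"vo": (25, 100), "prep": (0, 100)},
}


def evaluate(formula, counts):
    """Evaluate a formula (a function of atom degrees) on one treebank."""
    atoms = {name: degree(c, n) for name, (c, n) in counts.items()}
    return formula(atoms)


def vo_implies_prep(atoms):
    # Candidate statement: verb-object order implies prepositions.
    return impl(atoms["vo"], atoms["prep"])


results = {name: evaluate(vo_implies_prep, counts)
           for name, counts in treebanks.items()}
```

Because a formula is just a function over atom degrees, arbitrarily nested expressions can be built from `neg`, `conj`, `disj` and `impl`. Swapping in another real-valued logic (for example product or Gödel connectives) only changes those four definitions.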