This paper describes LTL-UDE's systems for the SemEval 2019 Shared Task 6. We present results for Subtask A and C. In Subtask A, we experiment with an embedding representation of postings and use a Multi-Layer Perceptron and BERT to categorize postings. Our best result reaches the 10th place (out of 103) using BERT. In Subtask C, we applied a two-vote classification approach with minority fallback, which is placed on the 19th rank (out of 65).
A recent study by Plank et al. (2016) found that LSTM-based PoS taggers considerably improve over the current state-of-theart when evaluated on the corpora of the Universal Dependencies project that use a coarse-grained tagset. We replicate this study using a fresh collection of 27 corpora of 21 languages that are annotated with fine-grained tagsets of varying size. Our replication confirms the result in general, and we additionally find that the advantage of LSTMs is even bigger for larger tagsets. However, we also find that for the very large tagsets of morphologically rich languages, hand-crafted morphological lexicons are still necessary to reach state-of-the-art performance.
We present a detailed description of our submission to the EmpiriST shared task 2015 for tokenization and part-of-speech tagging of German social media text. As relatively little training data is provided, neither tokenization nor PoS tagging can be learned from the data alone. For tokenization, our system uses regular expressions for general cases and word lists for exceptions. For PoS tagging, adding unsupervised knowledge beyond the available training data is the most important factor for reaching acceptable tagging accuracy. A learning curve experiment shows furthermore that more in-domain training data is very likely to further increase accuracy.
Advances in the automated detection of offensive Internet postings make this mechanism very attractive to social media companies, who are increasingly under pressure to monitor and action activity on their sites. However, these advances also have important implications as a threat to the fundamental right of free expression. In this article, we analyze which Twitter posts could actually be deemed offenses under German criminal law. German law follows the deductive method of the Roman law tradition based on abstract rules as opposed to the inductive reasoning in Anglo-American common law systems. This allows us to show how legal conclusions can be reached and implemented without relying on existing court decisions. We present a data annotation schema, consisting of a series of binary decisions, for determining whether a specific post would constitute a criminal offense. This schema serves as a step towards an inexpensive creation of a sufficient amount of data for automated classification. We find that the majority of posts deemed morally offensive actually do not constitute a criminal offense and still contribute to public discourse. Furthermore, laymen can provide sufficiently reliable data to an expert reference but are, for instance, more lenient in the interpretation of what constitutes a disparaging statement. 1 Framework Decision 2008/913/JHA of 28 November 2008 on combating certain forms and expressions of racism and xenophobia by means of criminal law and national laws transposing it. 2 COM(2017) 555 final. 3 Netzwerkdurchsetzungsgesetz v. 1.9.2017 (BGBl. I S. 3352). 4 See § 1(3) 'rechtswidrige Inhalte'. 5 See § 3(2)(2),(3) of the Act. It is however doubtful whether these strict procedural requirements violate EU law, namely Art. 3, Art. 14 e-Commerce Directive (2000/31/EC) i.e. require acting 'expeditiously' after obtaining knowledge. 6 Strafgesetzbuch v. 13.11.1998 (BGBl. I S. 3322). 7 It is not guaranteed that a judge would necessarily arrive at the same conclusion, but a lawyer's expertise serves as a strong indicator for potentially punishable conduct.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.