The paper presents a system developed for SemEval-2019 Task 5 HatEval (Basile et al., 2019; team name: LU Team) and Task 6 OffensEval (Zampieri et al., 2019b; team name: NLPR@SRPOL), where we achieved 2nd position in Subtask C. The system is an ensemble of several models (LSTM, Transformer, OpenAI's GPT, random forest, SVM) with various embeddings (custom, ELMo, fastText, Universal Encoder), together with additional linguistic features (number of blacklisted words, special characters, etc.). The system works with a multi-tier blacklist and a large corpus of crawled data annotated for general offensiveness. In the paper we present an extensive analysis of our results and show how the combination of features and embeddings affects the performance of the models.
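As an illustration of the additional linguistic features mentioned above, the following is a minimal sketch of how counts of blacklisted words and special characters might be computed for one text. The `BLACKLIST` contents, the function name `linguistic_features`, and the tokenization rule are assumptions made for this example only; the system's actual multi-tier blacklist and feature set are not reproduced here.

```python
import re
import string

# Hypothetical single-tier blacklist, for illustration only.
BLACKLIST = {"idiot", "stupid"}


def linguistic_features(text, blacklist=BLACKLIST):
    """Return simple count-based features for one text."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        # Number of tokens that appear in the blacklist.
        "n_blacklisted": sum(1 for t in tokens if t in blacklist),
        # Number of ASCII punctuation characters in the raw text.
        "n_special": sum(1 for ch in text if ch in string.punctuation),
        # Total number of tokens.
        "n_tokens": len(tokens),
    }


features = linguistic_features("You idiot!!! So stupid...")
```

Features of this kind could then be concatenated with an embedding vector before being passed to a classifier such as a random forest or an SVM.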
We present a novel dataset and model for the task of Joint Entity and Relation Extraction in a multilingual setting. The SMiLER dataset consists of 1.1M annotated sentences, representing 36 relations and 14 languages. To the best of our knowledge, this is currently both the largest and the most comprehensive dataset of this type. We introduce HERBERTa, a pipeline that combines two independent BERT models: one for sequence classification and the other for entity tagging. The model achieves a micro F1 of 81.49 for English on this dataset, which is close to the score of SpERT, the current SOTA on CoNLL.
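For readers unfamiliar with the metric reported above, here is a minimal sketch of micro-averaged F1, which pools true positives, false positives, and false negatives across all relation classes before computing precision and recall. The relation names and counts are made up for illustration, and this is not claimed to be the evaluation script used for the reported score.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-class tp/fp/fn counts."""
    # Pool the counts over all classes first.
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of the pooled precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-relation counts, not taken from the dataset.
example_counts = {
    "has-author": {"tp": 8, "fp": 2, "fn": 1},
    "born-in": {"tp": 2, "fp": 0, "fn": 3},
}
score = micro_f1(example_counts)
```

Because the counts are pooled, frequent relations weigh more heavily in the score than rare ones, unlike macro-averaging, which averages per-class F1 values.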
<p>Evidence-based medicine can be effective only if it is constantly tested against errors in medical practice. Machine-supported summarization of clinical record databases makes it possible to detect anomalies and therefore helps detect errors in the early phases of care. A summarization system is part of a Clinical Decision Support System (CDSS); however, it cannot be used directly by stakeholders as long as they are not able to query the clinical record database. Natural Query Languages (NQL) open access to data for clinical practitioners, who usually have no knowledge of artificial query languages. Results: We have developed a general-purpose reporting system called Ask Data Anything (ADA) and applied it to a particular CDSS implementation. As a result, we obtained a summarization system that opens access for those clinical researchers who were previously excluded from meaningful summaries of the clinical records stored in a given clinical database. The most significant part of the component, the NQL parser, is a hybrid of Controlled Natural Language (CNL) and pattern matching with a prior error repair phase. Equipped with reasoning capabilities due to the intensive use of semantic technologies, our hybrid approach allows one to use very simple, keyword-based (even erroneous) queries as well as complex CNL ones with the support of a predictive editor. With ADA, sophisticated summarizations of clinical data are produced as a result of NQL query execution. In this paper, we present the main ideas underlying the ADA component in the context of CDSS.</p>