In recent years, Deep Learning (DL) models are becoming important due to their demonstrated success at overcoming complex learning problems. DL models have been applied effectively for different Natural Language Processing (NLP) tasks such as part-of-Speech (PoS) tagging and Machine Translation (MT). Disease Named Entity Recognition (Disease-NER) is a crucial task which aims at extracting disease Named Entities (NEs) from text. In this paper, a DL model for Disease-NER using dictionary information is proposed and evaluated on National Center for Biotechnology Information (NCBI) disease corpus and BC5CDR dataset. Word embeddings trained over general domain texts as well as biomedical texts have been used to represent input to the proposed model. This study also compares two different Segment Representation (SR) schemes, namely IOB2 and IOBES for Disease-NER. The results illustrate that using dictionary information, pre-trained word embeddings, character embeddings and CRF with global score improves the performance of Disease-NER system.
Analysing the contents of social media platforms such as YouTube, Facebook and Twitter gained interest due to the vast number of users. One of the important tasks is homophobia/transphobia detection. This paper illustrates the system submitted by our team for the homophobia/transphobia detection in social media comments shared task. A machine learning-based model has been designed and various classification algorithms have been implemented for automatic detection of homophobia in YouTube comments. TF-IDF has been used with a range of bigram models for vectorization of comments. Support Vector Machines have been used to develop the proposed model and our submission reported 0.91, 0.92, 0.88 weighted F1-scores for English, Tamil and Tamil-English datasets respectively.
Multi-threading is a methodology that has been extremely used. Modern software depends essentially on multi-threading. Operating systems, famous examples, are based on multi-threading; a user can write his document, play an audio file, and downloading a file from internet at the same time. Each of these tasks called a thread. A common problem occurs when implementing multi-threaded programs is a data-race. Data race occurs when two threads try to access a shared variable at the same time without a proper synchronization. A detector is software that determines if the program contains a data-race problem or not. In this paper, we develop a detector that has the form of a type system. We present a type system which discovers the data-race problems. We also prove the soundness of our type system.
In this paper, we present the system submitted to SemEval-2020 Task 12. The proposed system aims at automatically identify the Offensive Language in Arabic Tweets. A machine learning based approach has been used to design our system. We implemented a linear classifier with Stochastic Gradient Descent (SGD) as optimization algorithm. Our model reported 84.20%, 81.82% f1-score on development set and test set respectively. The best performed system and the system in the last rank reported 90.17% and 44.51% f1-score on test set respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.