Contextual word embeddings such as BERT have achieved state-of-the-art performance in numerous NLP tasks. Since they are optimized to capture the statistical properties of training data, they also tend to pick up on and amplify social stereotypes present in the data. In this study, we (1) propose a template-based method to quantify bias in BERT; (2) show that this method obtains more consistent results in capturing social biases than the traditional cosine-based method; and (3) conduct a case study, evaluating gender bias in a downstream task of Gender Pronoun Resolution. Although our case study focuses on gender bias, the proposed technique generalizes to unveiling other biases, such as racial and religious biases, including in multiclass settings.
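For readers unfamiliar with the cosine-based baseline mentioned above, the following is a minimal sketch of one common way such a measure is computed: the difference between a target word's mean cosine similarity to two attribute word sets. The abstract does not specify the exact formulation, so the function names (`cosine`, `association`) and the toy 2-dimensional vectors are hypothetical and stand in for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def association(target, attrs_a, attrs_b):
    """Mean similarity of `target` to attribute set A minus its mean similarity to set B.

    A positive value means the target sits closer to set A; a negative value,
    closer to set B. This is an assumed, simplified form of a cosine-based
    bias measure, not the formulation evaluated in the abstract.
    """
    mean_a = sum(cosine(target, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(target, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Toy vectors (hypothetical, for illustration only; not real BERT embeddings).
male_attrs = [[1.0, 0.0]]
female_attrs = [[0.0, 1.0]]
skewed_target = [0.8, 0.6]    # leans toward the first axis
balanced_target = [1.0, 1.0]  # equidistant from both axes

skewed_score = association(skewed_target, male_attrs, female_attrs)
balanced_score = association(balanced_target, male_attrs, female_attrs)
```

With these toy inputs the skewed target receives a positive score and the balanced target a score of zero. With contextual embeddings the vector for a word depends on the sentence it appears in, which is one plausible reason a score of this kind could be less stable than a template-based one.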
We propose a neural multi-document summarization (MDS) system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layers of propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences while avoiding redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.
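The two steps described above, graph propagation over sentence nodes and greedy redundancy-aware extraction, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the propagation rule is the widely used symmetric-normalized form ReLU(Â H W) with self-loops, the redundancy check is a cosine-similarity cutoff, and the names `gcn_layer`, `greedy_select`, and `sim_threshold` are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One propagation step: ReLU(adj_norm @ features @ weights)."""
    z = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, x) for x in row] for row in z]


def cosine(u, v):
    """Cosine similarity (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0.0 or nv == 0.0 else dot / (nu * nv)


def greedy_select(salience, vectors, max_sentences, sim_threshold=0.5):
    """Visit sentences by descending salience; skip any that is too similar
    to an already chosen sentence. The threshold value is an assumption."""
    order = sorted(range(len(salience)), key=lambda i: salience[i], reverse=True)
    chosen = []
    for i in order:
        if len(chosen) >= max_sentences:
            break
        if all(cosine(vectors[i], vectors[j]) < sim_threshold for j in chosen):
            chosen.append(i)
    return chosen


# Toy graph (hypothetical): sentences 0 and 1 are related, sentence 2 is isolated.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for RNN sentence embeddings
weights = [[1.0, 0.0], [0.0, 1.0]]               # identity, in place of learned weights
hidden = gcn_layer(normalize_adjacency(adj), features, weights)

# Toy salience scores and sentence vectors for the extraction step.
summary = greedy_select([0.9, 0.8, 0.3],
                        [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]],
                        max_sentences=2)
```

In the toy graph, the two connected sentences end up with identical hidden features because each averages over the same neighborhood, while the isolated sentence keeps its own. In the extraction step, the second-most salient sentence is skipped as near-duplicate of the first, so the third is selected instead.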
Text mining is widely used in research; it allows an individual to store text and its important terms in the form of electronic documents (.doc, .txt). It is difficult to remember such a large amount of text; moreover, the manual approach is time-consuming, unreliable, and accessible only to the person who performs it. Text mining techniques improve on this approach by extracting and storing the data automatically, so that computational comparison, file reads, and file writes are performed more efficiently. With the help of Bio-Cloud, we generated more semantically similar, related, and significant patterns. The give, generate, and get sequence modeling is adopted. Compared with other available web applications, our application offers improved stemming, relation handling, and average-case consideration. This approach does not limit the number of words displayed, as all the generated sets can be traversed in the GUI with the chosen pattern size. The method is applicable to bioinformatics, retrieval of related information from documents, sentiment analysis of social websites (Twitter and Facebook), query expansion (Google), and many other areas.