Previous research on sentiment analysis has mainly focused on binary or ternary sentiment classification in monolingual texts. However, on today's social media platforms such as micro-blogs, emotions are often expressed in bilingual or multilingual text, known as code-switching text, and people's emotions are complex, including happiness, sadness, anger, fear, surprise, etc. Different emotions may co-occur, and the proportion of each emotion in code-switching text is often unbalanced. Inspired by the recently proposed BERT model, in this paper we investigate how to fine-tune BERT for multi-label sentiment analysis in code-switching text. Our investigation covers the selection of pre-trained models and the fine-tuning methods of BERT for this task. To deal with the unbalanced distribution of emotions, we propose a method based on data augmentation, undersampling, and ensemble learning to obtain balanced samples and train different multi-label BERT classifiers. Our model combines the predictions of the individual classifiers to produce the final outputs. Experiments on the dataset of NLPCC 2018 Shared Task 1 show the effectiveness of our model on unbalanced code-switching text, and our model achieves a higher F1-score than many previous models.
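The balancing-and-ensemble idea in this abstract can be illustrated with a minimal sketch. The actual BERT classifiers are omitted; the sketch only shows undersampling each emotion group to the size of the rarest one to build balanced training subsets, and averaging per-classifier probability vectors to produce a multi-label prediction. Grouping each sample by a single emotion label and the 0.5 decision threshold are assumptions for illustration, not details from the paper.

```python
import random
from collections import defaultdict

def balanced_subsamples(samples, n_subsets, seed=0):
    """Build `n_subsets` balanced training sets by undersampling:
    group samples by emotion label (an assumption: one label per
    sample here) and draw from each group as many samples as the
    rarest emotion has."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    k = min(len(group) for group in by_label.values())  # size of rarest emotion
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for group in by_label.values():
            subset.extend(rng.sample(group, k))  # undersample each emotion to k
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets

def ensemble_predict(prob_vectors, threshold=0.5):
    """Combine the classifiers' outputs: average their per-emotion
    probability vectors and emit every emotion index whose mean
    probability clears the threshold (multi-label decision)."""
    n = len(prob_vectors)
    dims = len(prob_vectors[0])
    mean = [sum(p[i] for p in prob_vectors) / n for i in range(dims)]
    return [i for i, m in enumerate(mean) if m >= threshold]
```

In a full pipeline, one multi-label classifier would be fine-tuned on each balanced subset, and `ensemble_predict` would merge their probability outputs for a test sample.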
Myopia is one of the most common forms of refractive eye disease and is projected to become a worldwide pandemic affecting half of the global population by 2050. Over the past several decades, myopia has become a leading cause of visual impairment, and several factors are believed to be associated with its occurrence and development. Among environmental factors, air pollution has gained increasing attention in recent years, as exposure to ambient air pollution appears to increase peripheral hyperopic defocus, affect the dopamine pathways, and cause retinal ischemia. In this review, we highlight epidemiological evidence and potential biological mechanisms that may link exposure to air pollutants to myopia. A thorough understanding of these mechanisms is key to establishing and implementing targeted strategies. Regulatory efforts to control air pollution through effective policies, and to limit individual exposure to preventable risks, are required to reduce this global public health burden.
Supervised neural network models have achieved outstanding performance on the document summarization task in recent years. However, it is hard to obtain enough high-quality labeled training data for these models to generate different types of summaries in practice. In this work, we focus on improving the performance of the popular unsupervised TextRank algorithm, which requires no labeled training data, for extractive summarization. We first modify the original edge weight of TextRank to take the relative position of sentences into account, and then combine the output of the improved TextRank with K-means clustering to improve the diversity of the generated summaries. To further improve performance, we incorporate external knowledge from open-source knowledge graphs into our model via entity linking. We use the knowledge-graph sentence embedding and the TF-IDF embedding as the input of our improved TextRank, and obtain the final score for each sentence by linear combination. Evaluations on the New York Times dataset show the effectiveness of our knowledge-enhanced approach; the proposed model significantly outperforms other popular unsupervised models.
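Two ingredients of this approach can be sketched without any embedding model: TextRank scoring with position-modified edge weights, and the final linear combination of two per-sentence score lists. This is a minimal illustration using word-overlap similarity in place of embedding similarity; the `1/(1 + index)` position bias and the equal mixing weight are assumptions, since the abstract does not specify the exact forms used.

```python
import math

def textrank_scores(sentences, damping=0.85, iters=50):
    """Position-aware TextRank sketch: edges carry word-overlap similarity
    scaled by a bias favouring earlier sentences, then sentence scores are
    computed by PageRank-style power iteration."""
    tokens = [set(s.lower().split()) for s in sentences]
    n = len(sentences)

    def sim(i, j):
        overlap = len(tokens[i] & tokens[j])
        if overlap == 0:
            return 0.0
        # classic TextRank normalisation by the log of sentence lengths
        return overlap / (math.log(len(tokens[i]) + 1) + math.log(len(tokens[j]) + 1))

    bias = [1.0 / (1 + idx) for idx in range(n)]  # earlier sentences weigh more
    w = [[0.0 if i == j else sim(i, j) * bias[j] for j in range(n)] for i in range(n)]
    outdeg = [sum(row) for row in w]

    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / outdeg[j] * scores[j]
                            for j in range(n) if outdeg[j] > 0)
            for i in range(n)
        ]
    return scores

def combine_scores(score_a, score_b, alpha=0.5):
    """Final per-sentence score as a linear combination of two score lists,
    e.g. one from TF-IDF embeddings and one from knowledge-graph
    sentence embeddings."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(score_a, score_b)]
```

In the full model, the two score lists fed to `combine_scores` would come from running the improved TextRank over TF-IDF and knowledge-graph sentence embeddings respectively, with K-means clustering applied afterwards to diversify the selected sentences.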