Information retrieval systems for scholarly literature rely not only on text matching but also on semantic- and context-based features. Readers are increasingly interested in how important an article is, what its purpose is, and how influential it has been in follow-up research. Numerous techniques that draw on machine learning and artificial intelligence have been developed to improve retrieval of the most influential scientific literature. In this paper, we compare and improve on four existing state-of-the-art techniques designed to identify influential citations. We consider 450 citations from the Association for Computational Linguistics corpus, classified by experts as either important or unimportant, and extract 64 features based on the methodology of the four state-of-the-art techniques. We apply the Extra-Trees classifier to select the 29 best features and then apply the Random Forest and Support Vector Machine classifiers to all selected techniques. Using the Random Forest classifier, our supervised model improves on the state-of-the-art method by 11.25%, with an 89% Precision-Recall area under the curve. Finally, we present our deep-learning model, a Long Short-Term Memory network, which uses all 64 features to distinguish important from unimportant citations with 92.57% accuracy.
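A minimal sketch of the feature-selection and classification stage described above, assuming a scikit-learn setup; the feature matrix, labels, and hyperparameters are placeholders rather than the authors' actual data or configuration, and the LSTM model is not reproduced here:

```python
# Illustrative sketch only: Extra-Trees feature ranking followed by
# Random Forest and SVM classification, evaluated with PR-AUC.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import average_precision_score

# Placeholder data: 450 citations x 64 citation-context features,
# with binary important/unimportant labels.
rng = np.random.default_rng(0)
X = rng.random((450, 64))
y = rng.integers(0, 2, size=450)

# 1) Rank the 64 features with an Extra-Trees classifier and keep the top 29.
selector = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
top29 = np.argsort(selector.feature_importances_)[::-1][:29]
X_sel = X[:, top29]

# 2) Train Random Forest and SVM on the selected features and report PR-AUC.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_sel, y, test_size=0.2, random_state=0, stratify=y)
for name, clf in [("Random Forest", RandomForestClassifier(n_estimators=300, random_state=0)),
                  ("SVM", SVC(kernel="rbf", probability=True, random_state=0))]:
    clf.fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    print(name, "PR-AUC:", average_precision_score(y_te, scores))
```

In practice the placeholder arrays would be replaced by the 64 features extracted from the expert-annotated ACL citations.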
Purpose
The purpose of this paper is to analyze the scientific collaboration of institutions and its impact on institutional research performance in terms of productivity and quality. The researchers examined local and international collaborations and their impact on institutional performance.
Design/methodology/approach
A collaboration dependence measure was used to investigate an institution's dependence on external collaboration. Based on this information, the authors used the “index of gain in impact through collaboration” to assess the contribution of collaborative publications to institutional research performance. Bibliographic data from 1996 to 2010 retrieved from Scopus were used to conduct the current study. The authors carried out a case study of the top Pakistani institutes by publication count to illustrate the difference between high-performing institutions and those that gain disproportionately in the perceived quality of their output through local or international collaboration.
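As a rough, purely illustrative formalization (an assumption of this summary, not the paper's exact definition), a gain-in-impact index of this kind can be expressed as the ratio of an institution's average impact over all publications to its average impact over non-collaborative publications alone:

\[
G_{\text{collab}} = \frac{\overline{\mathrm{SNIP}}(P_{\text{all}})}{\overline{\mathrm{SNIP}}(P_{\text{non-collab}})}
\]

where \(P_{\text{all}}\) is the institution's full publication set, \(P_{\text{non-collab}}\) is the subset written without external partners, and \(\overline{\mathrm{SNIP}}\) is the average source normalized impact per paper; values above 1 would indicate that collaboration raises the perceived quality of the institution's output.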
Findings
The results showed that international collaboration by institutes in developing countries had a strong impact on institutional performance, yielding greater benefit than local collaboration. Overall, scientific collaboration has a positive impact on institutional performance as measured by the cumulative source normalized impact per paper of their publications. The findings could also help researchers identify appropriate collaboration partners.
Originality/value
This study has revealed some salient characteristics of collaboration in academic research. It becomes apparent that collaboration intensity is not uniform; in general, however, the average quality of scientific production is the variable that most often correlates positively with the collaboration intensity of universities.
A decade of in-text citation analysis based on natural language processing and machine learning techniques: an overview of empirical studies. Scientometrics.
The purpose of the study is to (a) contribute to annotating an Altmetrics dataset across five disciplines, (b) undertake sentiment analysis using various machine learning and natural language processing–based algorithms, (c) identify the best-performing model and (d) provide a Python library for sentiment analysis of an Altmetrics dataset. First, the researchers gave a set of guidelines to two human annotators familiar with annotating tweets related to scientific literature. The annotators labelled the sentiments, achieving an inter-annotator agreement (IAA) of 0.80 (Cohen’s Kappa). The same experiments were then run on two versions of the dataset: one with tweets in English only and the other with tweets in 23 languages, including English. Using 6388 tweets about 300 papers indexed in Web of Science, the effectiveness of the employed machine learning and natural language processing models was measured against well-known sentiment analysis models, namely SentiStrength and Sentiment140, as baselines. A Support Vector Machine with unigram features outperformed all other classifiers and baseline methods, with an accuracy of over 85%, followed by Logistic Regression at 83% accuracy and Naïve Bayes at 80%. The precision, recall and F1 scores for the Support Vector Machine, Logistic Regression and Naïve Bayes were (0.89, 0.86, 0.86), (0.86, 0.83, 0.80) and (0.85, 0.81, 0.76), respectively.
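A minimal sketch of the unigram-based classifiers compared above, assuming a scikit-learn pipeline with TF-IDF features; the example tweets, labels, and model settings are illustrative assumptions and do not reproduce the authors' Python library or evaluation setup:

```python
# Illustrative sketch only: SVM, Logistic Regression, and Naive Bayes
# over uni-gram (1-gram) features for tweet sentiment classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder tweets and sentiment labels; the real study used 6388
# annotated tweets about 300 Web of Science papers.
tweets = ["Great replication of the original study!", "Methods look questionable."]
labels = ["positive", "negative"]

for name, clf in [("SVM", LinearSVC()),
                  ("Logistic Regression", LogisticRegression(max_iter=1000)),
                  ("Naive Bayes", MultinomialNB())]:
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), clf)  # uni-gram features
    model.fit(tweets, labels)
    print(name, "prediction:", model.predict(["Interesting and convincing results."]))
```

With a full annotated dataset, the same pipelines would be scored with accuracy, precision, recall and F1 to reproduce a comparison of this kind.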
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.