Objective. This study aimed to identify the primary research areas, countries, and organizational involvement in publications on neurological disorders through an analysis of human-assigned keywords. These results were then compared with terms extracted from the titles and abstracts of the publications by unsupervised and machine-learning-based techniques to reveal the deficiencies of both approaches. This enabled us to assess how far machine-derived terms from titles and abstracts can substitute for the human-assigned keywords of scientific research articles. Design/Methodology/Approach. Significant research areas on neurological disorders were identified from the author-provided keywords of publications downloaded from Web of Science and PubMed; these results were compared with the terms extracted from titles and abstracts by an unsupervised tool, VOSviewer, and by machine-learning-based techniques using YAKE and CountVectorizer. Results/Discussion. We observed that the post-COVID-19 era witnessed more research on various neurological disorders, but authors still chose more generic terms than specific ones in their keyword lists. The unsupervised extraction tool, VOSviewer, identified many extraneous and insignificant terms alongside the significant ones. However, our self-developed machine learning algorithm using CountVectorizer and YAKE produced precise results, subject to adding more stop-words to the NLTK toolkit's stop-word list. Conclusion. We observed that although author-provided keywords play a vital role, being assigned broadly by the author to increase readability, these concept terms lacked the specificity needed for in-depth analysis. We suggest that the ML algorithm, being more compatible with unstructured data, is a valid alternative to author-generated keywords for more accurate results. Originality/Value.
To our knowledge, this is the first study to compare author-provided keywords with machine-extracted terms on real datasets, which may be an essential lead in the machine learning domain. Replicating these techniques with large datasets from different fields may yield a valuable knowledge resource for experts and stakeholders.
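The count-based extraction step the abstract describes can be sketched in pure Python. This is a minimal stand-in for a CountVectorizer-style pipeline with an extended stop-word list, not the study's actual code; the stop-word additions and sample abstracts below are hypothetical illustrations.

```python
import re
from collections import Counter

# A few words standing in for NLTK's English stop-word list, plus
# hypothetical domain-specific additions of the kind the study reports.
BASE_STOPWORDS = {"the", "a", "an", "of", "in", "and", "on", "for", "with", "to"}
EXTRA_STOPWORDS = {"study", "results", "patients"}  # hypothetical additions
STOPWORDS = BASE_STOPWORDS | EXTRA_STOPWORDS

def extract_terms(texts, top_n=5):
    """Count-based term extraction over titles/abstracts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_n)]

abstracts = [
    "A study of epilepsy and seizure detection in patients",
    "Seizure prediction and epilepsy monitoring results",
]
print(extract_terms(abstracts, top_n=3))
# → ['epilepsy', 'seizure', 'detection']
```

Without the extra stop-words, generic terms such as "study" would dominate the counts, which is the problem the authors address by extending the NLTK list.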
This study utilizes GPT (Generative Pre-Trained Transformer) language model-based AI writing tools to create a set of 80 academic writing samples based on the eight themes of the experiential sessions of the LTC 2023. These samples, each between 2000 and 2500 words long, are then analyzed using both conventional plagiarism detection tools and selected AI detection tools. The study finds that traditional syntactic similarity-based anti-plagiarism tools struggle to detect AI-generated text due to the differences in syntax and structure between machine-generated and human-written text. However, the researchers discovered that AI detector tools can be used to catch AI-generated content based on specific characteristics that are typical of machine-generated text. The paper concludes by posing the question of whether we are entering an era in which AI detectors will be used to prevent AI-generated content from entering the scholarly communication process. This research sheds light on the challenges associated with AI-generated content in the academic research literature and offers a potential solution for detecting and preventing plagiarism in this context.
Purpose The purpose of this study is to identify the research fronts, by analysing highly cited core papers adjusted for the age of each paper, in library and information science (LIS), where natural language processing (NLP) is being applied significantly. Design/methodology/approach By mining international databases, 3,087 core papers that each received at least 5% of the total citations were identified. From the mean age of these core papers and the total citations received, a CPT (citation/publication/time) value was calculated for all 20 fronts to understand how much relative attention a front is receiving among peers over time. One theme article was finally identified from each of these 20 fronts. Findings Bidirectional encoder representations from transformers, with a CPT value of 1.608, followed by sentiment analysis, with a CPT of 1.292, received the highest attention in NLP research. Columbia University, New York, leads among universities; the Journal of the American Medical Informatics Association among journals; the USA, followed by the People's Republic of China, among countries; and Xu, H. (University of Texas) among authors in these fronts. It was identified that NLP applications boost the performance of digital libraries and automated library systems in the digital environment. Practical implications Any of the research fronts identified in the findings of this paper may be used as a base by researchers who intend to perform extensive research on NLP. Originality/value To the best of the authors' knowledge, the methodology adopted in this paper is the first of its kind in which a meta-analysis approach has been used to understand the research fronts in a subfield such as NLP within a broad domain such as LIS.
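The abstract does not give an explicit formula for the CPT value; one plausible reading of "citation/publication/time" (total citations per core paper per year of mean paper age) can be sketched as follows. The formula and the numbers are assumptions for illustration only, not figures from the paper.

```python
def cpt(total_citations, n_papers, mean_age_years):
    """Assumed CPT: citations per core paper per year of mean paper age."""
    return total_citations / n_papers / mean_age_years

# Hypothetical front: 1200 citations across 150 core papers, mean age 5 years
print(cpt(1200, 150, 5))  # → 1.6
```

Under this reading, a higher CPT marks a front that accumulates citations quickly relative to its size and age, which matches the paper's use of CPT to rank the 20 fronts.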