Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs in SE is still in its early stages. To bridge this gap, we conducted a systematic literature review (SLR) on LLM4SE, with a particular focus on understanding how LLMs can be exploited to optimize processes and outcomes. We selected and analyzed 395 research papers published between January 2017 and January 2024 to answer four key research questions (RQs). In RQ1, we categorize the LLMs that have been employed in SE tasks, characterizing their distinctive features and uses. In RQ2, we analyze the methods used for data collection, preprocessing, and application, highlighting the role of well-curated datasets in successfully applying LLMs to SE. RQ3 investigates the strategies employed to optimize and evaluate the performance of LLMs in SE. Finally, RQ4 examines the specific SE tasks where LLMs have shown success to date, illustrating their practical contributions to the field. Drawing on the answers to these RQs, we discuss the current state of the art and trends, identify gaps in existing research, and highlight promising areas for future study. Our artifacts are publicly available at https://github.com/xinyi-hou/LLM4SE_SLR.
A procedure for the automated accounting of publications based on the REST API of the ORCID database is proposed. The relevance of publication accounting is described, and the importance of using various technologies to build bibliographic data repositories is substantiated. The possibility of using the APIs of the best-known publication databases, such as Web of Science, Scopus, Crossref, Google Scholar, and ORCID, was analyzed, and the use of the ORCID database is justified. A scheme for downloading publications from the ORCID database by specified registration numbers, based on services implemented in the Python and MATLAB programming languages, is given. The received data in JSON or XML format is then parsed, and MATLAB functions for obtaining a structure from XML (JSON) data are provided. In addition, an algorithm for finding duplicate publications during accounting is considered. Approaches to avoiding duplicate publications in databases, based on the Levenshtein algorithm for similarity assessment, are formulated. To ensure clear and correct comparison of textual data, it is proposed to transliterate Cyrillic text into the Latin alphabet. A MySQL database was developed to collect and update data on publishing activity. The publication table of the database is supplemented with a special attribute that stores the results of converting Cyrillic names into their Latin equivalents. Indexing database table fields (INDEX) on various attributes is recommended; this significantly increased the efficiency of searching, processing, and comparing data. It is also proposed to use the MySQL SOUNDEX() function as an additional criterion for determining the phonetic similarity of publication topics.
The practical implementation of the algorithm for finding duplicate publications and numbering them, applied while populating the database, confirmed the soundness of the proposed approach. This article is of interest to software developers.
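The duplicate-detection idea (transliterate Cyrillic titles to Latin, then compare them by Levenshtein distance) can be illustrated with a short Python sketch. The transliteration table, the normalization steps, the 0.9 similarity threshold, and all function names are illustrative assumptions. The article does not specify them, and this code does not reproduce the authors' MySQL implementation.

```python
# Hypothetical, simplified Cyrillic-to-Latin table (Ukrainian-style, plus a few
# Russian-only letters). The article does not name a transliteration standard.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "h", "ґ": "g", "д": "d", "е": "e",
    "є": "ie", "ж": "zh", "з": "z", "и": "y", "і": "i", "ї": "i", "й": "i",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ь": "", "ю": "iu", "я": "ia",
    "ё": "e", "ъ": "", "ы": "y", "э": "e",
}


def transliterate(text: str) -> str:
    """Lower-case the text and replace each Cyrillic letter with Latin letters."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


def normalize(title: str) -> str:
    """Transliterate, turn punctuation into spaces, and collapse whitespace."""
    latin = transliterate(title)
    cleaned = "".join(ch if ch.isalnum() else " " for ch in latin)
    return " ".join(cleaned.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical normalized titles."""
    x, y = normalize(a), normalize(b)
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(x, y) / longest


def find_duplicates(titles: list, threshold: float = 0.9) -> list:
    """Return index pairs (i, j) whose titles are at least `threshold` similar.

    The 0.9 default is an illustrative assumption, not a value from the article.
    """
    pairs = []
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if similarity(titles[i], titles[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

For example, `find_duplicates(["Метод аналізу даних", "Metod analizu danykh", "Quantum computing"])` flags the first two titles as one pair, because both normalize to the same Latin string. The pairwise loop is quadratic in the number of titles. In a database setting, one would presumably narrow the candidates first, for instance with the indexed fields or a SOUNDEX() key, and run the edit-distance comparison only within each candidate group.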