Requirement changes, code changes, software reuse, and testing are important software engineering activities that depend on traceability links between software requirements and code. Software requirement documents, design documents, code documents, and test case documents are the intermediate products of software development, and a lack of links among them can make software extremely difficult to change and maintain. Frequent requirement and code changes are inevitable in software development, and software reuse, change impact analysis, and testing all rely on knowing how requirements relate to code. Using these traceability links can improve the efficiency and quality of these activities. However, existing methods for constructing the links are not sufficiently automated or accurate. To address these problems, we propose embedding software requirements and source code into feature vectors that capture their semantic information, using four neural networks (NBOW, RNN, CNN, and self-attention). Traceability links from requirements to code are then established by comparing the similarity between these vectors. We develop a prototype tool, RCT, based on this method, and evaluate how well the four networks construct links on 18 open-source projects. The experimental results show that the self-attention network performs best, with an average Recall@50 of 0.687 across the 18 projects, which is higher than the other three neural network models and much higher than previous approaches based on information retrieval and machine learning.
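The core idea of embedding requirements and code and then ranking by vector similarity can be illustrated with a minimal sketch. It is not the RCT implementation: the toy embeddings, the function names (`nbow_embed`, `rank_code`, `recall_at_k`), and the averaging encoder standing in for NBOW are all assumptions made for illustration, and Recall@K is computed here under one common definition (the fraction of true links found in the top K results), which the abstract does not spell out.

```python
import math

# Toy 3-dimensional word embeddings, made up for illustration.
# A trained model (NBOW, RNN, CNN, or self-attention) would learn these.
TOY_EMBEDDINGS = {
    "login": [1.0, 0.0, 0.0],
    "user": [0.8, 0.2, 0.0],
    "password": [0.9, 0.1, 0.0],
    "report": [0.0, 1.0, 0.0],
    "export": [0.0, 0.9, 0.1],
    "pdf": [0.0, 0.8, 0.2],
}


def nbow_embed(tokens, embeddings=TOY_EMBEDDINGS, dim=3):
    """Average the embeddings of known tokens (bag-of-words style encoder)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_code(requirement_tokens, code_units):
    """Return code unit names sorted by descending similarity to the requirement."""
    req_vec = nbow_embed(requirement_tokens)
    scored = [
        (cosine(req_vec, nbow_embed(tokens)), name)
        for name, tokens in code_units.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant code units that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical code units, each represented by the tokens extracted from it.
code_units = {
    "AuthService.login": ["user", "password", "login"],
    "ReportExporter.export": ["report", "export", "pdf"],
}
ranked = rank_code(["user", "login"], code_units)
print(ranked, recall_at_k(ranked, {"AuthService.login"}, k=1))
```

In this sketch only the encoder would change between the four networks; the ranking and the Recall@K computation stay the same.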
A code clone is a set of two or more identical or similar source code fragments. Research on code clone detection has continued for decades. Investigation and evaluation of existing clone detection techniques indicate that they handle function-level clone detection well, but that block-level clone detection leaves room for further research. In particular, type-3 clones that include large gaps remain an ongoing challenge. To solve these problems, we propose a clone detection method based on multiple code features. It aims to improve the recall of code block clone detection and to handle large-gap, hard-to-detect type-3 clones. The method first splits the source code files into code blocks based on the program's structural and context features. The collection of code blocks obtained in this way is complete, and the large gaps in clone pairs are removed in the process. In addition, similarity only needs to be computed between code blocks with the same structural features, which significantly saves time and resources. The similarity is obtained by calculating the proportion of tokens that two code blocks share. Moreover, since different types of tokens carry different weights in the similarity calculation, we use supervised learning to obtain a classifier model that relates token features to code clones. We divide the tokens into 13 types and train the machine learning model on manually confirmed clone and non-clone pairs. Finally, we develop a prototype system and compare our tool with existing tools under the Mutation Framework and on several real-world C projects. The experimental results demonstrate the advancement and practicality of our prototype.
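The token-proportion similarity with per-type weights can be sketched as follows. This is only an illustration under stated assumptions: the abstract describes 13 token types with weights learned by a supervised classifier, whereas this sketch uses four coarse types and hand-picked weights (`ASSUMED_WEIGHTS`), a simplistic regex tokenizer, and hypothetical function names such as `block_similarity`.

```python
import re
from collections import Counter

# A small subset of C keywords, enough for the example below.
C_KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}

# Simplistic tokenizer: identifiers/keywords, integer literals, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Hand-picked weights per token type (an assumption; the paper learns them
# with a classifier over 13 token types).
ASSUMED_WEIGHTS = {
    "keyword": 1.0,
    "identifier": 0.5,
    "literal": 0.5,
    "operator": 1.0,
}


def token_type(token):
    """Classify a token into one of four coarse types."""
    if token in C_KEYWORDS:
        return "keyword"
    if token[0].isalpha() or token[0] == "_":
        return "identifier"
    if token[0].isdigit():
        return "literal"
    return "operator"


def tokenize(code):
    """Split a code block into a flat list of tokens."""
    return TOKEN_RE.findall(code)


def weighted_total(counts, weights):
    """Sum of token counts, each scaled by the weight of its token type."""
    return sum(weights[token_type(tok)] * n for tok, n in counts.items())


def block_similarity(block_a, block_b, weights=ASSUMED_WEIGHTS):
    """Weighted proportion of tokens shared by two code blocks, in [0, 1]."""
    counts_a = Counter(tokenize(block_a))
    counts_b = Counter(tokenize(block_b))
    shared = counts_a & counts_b  # multiset intersection (minimum counts)
    denominator = max(
        weighted_total(counts_a, weights), weighted_total(counts_b, weights)
    )
    if denominator == 0:
        return 0.0
    return weighted_total(shared, weights) / denominator


loop_a = "for (i = 0; i < n; i++) sum += a[i];"
loop_b = "for (j = 0; j < n; j++) total += a[j];"
print(block_similarity(loop_a, loop_b))
```

Giving identifiers a lower weight than keywords and operators makes the score less sensitive to renamed variables, which is one plausible reason for weighting token types differently.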