Context: Requirements trace recovery (RTR) is time-consuming, tedious, and error-prone. There has been growing interest in applying information retrieval (IR) to automate the process of recovering trace links between requirements artifacts and other software artifacts. Objective: In this review, our objective is to identify the state of the art in how IR has been explored to automate RTR and to provide an overview of the research at the intersection of these two fields. Method: A systematic mapping study was conducted, searching the main scientific databases. The search retrieved 1587 citations, of which 34 articles were retained as primary studies. Results: The results show the most active authors and the publication distribution. They present four kinds of IR models and 21 enhancement strategies applied to perform RTR. In addition, lists of 37 experimental datasets and 9 measures commonly used together to evaluate IR-based RTR approaches are provided. Conclusions: Vector Space Model (VSM) and Latent Semantic Indexing (LSI) are the two most studied IR models used in RTR. CoEST has become the most popular, convenient, and stable source of datasets. Precision and Recall are the most common measures used to evaluate the performance of IR methods. Overall, IR-based RTR is becoming an increasingly mature interdisciplinary research field.

Keywords: requirements trace recovery, information retrieval, systematic mapping study

retrieved automatically from the databases.
1) Inclusion and exclusion criteria. Once the potentially relevant studies have been obtained, their actual relevance needs to be assessed. Following the SMS guidelines [5], we defined the inclusion and exclusion criteria below to select studies from the search results.

Inclusion criteria:
I1: The study was published between January 2012 and December 2021.
I2: The research topic of the study is IR-based RTR.
I3: The study is not a review paper.
I4: The paper is written in English.
I5: When the same author has published two papers on the same technology and topic, we select the one that is described in greater detail.

Exclusion criteria:
E1: The study was not published between January 2012 and December 2021.
E2: The research topic of the study is not IR-based RTR.
E3: The study is a review paper.
E4: The paper is not written in English.
E5: When the same author has published two papers on the same technology and topic, we exclude the one that is described less thoroughly.

2) Search scope.
Time period. We restrict this SMS to studies published from January 2012 to December 2021, when we started this SMS.
Electronic databases. Based on the suggestion in [5] and the access rights of our institution, the following databases were selected as the primary study sources: IEEE, Google Scholar, Elsevier, EI Compendex, and Springer.
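
As an illustration only, criteria I1 to I5 in 1) can be read as a filter over the search results. The sketch below is not the tooling used in this SMS. The `Study` record, its field names, and in particular `detail_score` (a stand-in for a reviewer's judgment of which paper is "described in greater detail") are hypothetical, and the duplicate key for I5 is assumed to be the pair of first author and technique.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Study:
    """Hypothetical record for one search result (all field names are assumptions)."""
    title: str
    first_author: str
    published: date
    topic_is_ir_rtr: bool  # I2/E2: judged manually by a reviewer
    is_review: bool        # I3/E3
    language: str          # I4/E4
    technique: str         # used to detect I5/E5 duplicates
    detail_score: int      # hypothetical proxy for level of detail (I5/E5)


# I1/E1: the time span stated in the search scope
START, END = date(2012, 1, 1), date(2021, 12, 31)


def passes_i1_to_i4(study: Study) -> bool:
    """Apply the per-study criteria I1 to I4 (equivalently, reject on E1 to E4)."""
    return (
        START <= study.published <= END
        and study.topic_is_ir_rtr
        and not study.is_review
        and study.language.lower() == "english"
    )


def select_primary_studies(candidates: list[Study]) -> list[Study]:
    """Apply I1 to I4, then I5: keep the more detailed paper per (author, technique)."""
    kept: dict[tuple[str, str], Study] = {}
    for study in filter(passes_i1_to_i4, candidates):
        key = (study.first_author, study.technique)
        if key not in kept or study.detail_score > kept[key].detail_score:
            kept[key] = study
    return list(kept.values())
```

Calling `select_primary_studies` on the list of retrieved citations returns the retained candidates. In practice I2 and I5 require human judgment, so the boolean and score fields would be filled in by reviewers rather than computed.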