Developing precision medicine in oncology, which defines the profiles of patients likely to benefit from specific and relevant anti-cancer therapies, is essential. Patients must meet an increasing number of specific eligibility criteria to be eligible for targeted therapies. This study aimed to develop an automated algorithm based on natural language processing that detects patient and tumor characteristics, in order to reduce the time-consuming prescreening for trial inclusion. To this end, 640 anonymized multidisciplinary team meeting (MTM) reports concerning lung cancer were extracted from the data warehouse of one teaching hospital in France and annotated. To automate the extraction of 52 items of bioclinical information corresponding to 8 major eligibility criteria, regular expressions were implemented and evaluated. Performance was satisfactory: the macro-averaged F1-score was 93%, precision reached 98% and recall 92%. In MTM reports, fill rates of patient and tumor information varied widely (from 31.4% to 100%). The least reported characteristics, and the most difficult to collect automatically, were genetic mutation and rearrangement test results.
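The abstract does not publish the regular expressions themselves. The Python sketch below only illustrates the general shape of rule-based extraction for a single criterion, assuming a hypothetical EGFR-status criterion with two classes. The patterns, the class labels (`EGFR_wild_type`, `EGFR_mutated`, `not_reported`) and the function name `extract_egfr_status` are illustrative assumptions, and the English wording stands in for the language of the actual reports, which come from a French hospital.

```python
import re

# HYPOTHETICAL rules for one criterion (EGFR status); these are not the
# study's actual regular expressions, which the abstract does not give.
# The gap [^.,;]{0,40}? keeps each match inside a single clause, so
# "EGFR mutated, ALK negative" does not attach "negative" to EGFR.
EGFR_RULES = [
    # Negative / wild-type rules are checked first so that phrasings such
    # as "not mutated" are not captured by the broader "mutated" rule.
    ("EGFR_wild_type", re.compile(
        r"\bEGFR\b[^.,;]{0,40}?\b"
        r"(?:wild[- ]type|negative|not (?:mutated|detected)|no mutation)\b",
        re.IGNORECASE)),
    ("EGFR_mutated", re.compile(
        r"\bEGFR\b[^.,;]{0,40}?\b(?:mutated|mutation|positive)\b",
        re.IGNORECASE)),
]


def extract_egfr_status(report: str) -> str:
    """Return the label of the first matching rule, else 'not_reported'."""
    for label, pattern in EGFR_RULES:
        if pattern.search(report):
            return label
    return "not_reported"


if __name__ == "__main__":
    for text in ("EGFR exon 19 deletion mutation found.",
                 "EGFR mutated, ALK negative.",
                 "EGFR not mutated.",
                 "No molecular testing available."):
        print(text, "->", extract_egfr_status(text))
```

Checking negative phrasings before positive ones, and limiting the gap between the gene name and the status word to one clause, are simple design choices that avoid the most obvious false positives. A system covering all 52 classes would need one such rule set per criterion, likely with many more phrasing variants per class.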
Defining the profiles of patients who could benefit from relevant anti-cancer treatments is essential. Patients must meet an increasing number of specific criteria to be eligible for specific anti-cancer therapies. This study aimed to develop an automated algorithm able to detect patient and tumor characteristics, in order to reduce the time-consuming prescreening for trial inclusion so that eligible patients can be included without delay. To this end, 640 anonymized multidisciplinary team meeting (MTM) reports concerning lung cancer, produced between 2018 and 2020 and drawn from the data warehouse of one French teaching hospital, were annotated. To automate the extraction of eight major eligibility criteria, corresponding to 52 classes, regular expressions (RegEx) were implemented. Evaluation of the RegEx gave an average F1-score of 93%, a positive predictive value (precision) of 98% and a sensitivity (recall) of 92%. However, in MTM reports, fill rates of patient and tumor information varied widely (from 31% to 100%). Genetic mutation and rearrangement test results were the least reported characteristics and also the hardest to extract automatically. To ease prescreening for clinical trials, the PreScIOUs study demonstrated the added value of rule-based and machine learning-based methods applied to lung cancer MTM reports.
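The harmonic mean of 98% precision and 92% recall is about 94.9%, not 93%, so the reported F1-score is presumably an unweighted (macro) mean of per-class F1-scores, consistent with the "macroaverage F1-score" wording in the first abstract; neither abstract states exactly how precision and recall were averaged. The Python sketch below shows macro-averaging under that assumption. The `ClassCounts` structure, the class names and all counts are hypothetical toy values, not the study's data, and returning 0.0 for undefined ratios is an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class ClassCounts:
    """Confusion counts for one extracted class (e.g. one of the 52)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def prf(c: ClassCounts) -> tuple[float, float, float]:
    """Precision, recall and F1 for one class (0.0 when undefined)."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def macro_average(counts: dict[str, ClassCounts]) -> tuple[float, float, float]:
    """Unweighted mean of per-class precision, recall and F1."""
    scores = [prf(c) for c in counts.values()]
    n = len(scores)
    p = sum(s[0] for s in scores) / n
    r = sum(s[1] for s in scores) / n
    f1 = sum(s[2] for s in scores) / n
    return p, r, f1


if __name__ == "__main__":
    # Toy counts, NOT the study's data.
    toy = {
        "EGFR_mutated": ClassCounts(tp=18, fp=0, fn=2),
        "ALK_rearranged": ClassCounts(tp=3, fp=0, fn=2),
        "current_smoker": ClassCounts(tp=50, fp=2, fn=1),
    }
    p, r, f1 = macro_average(toy)
    print(f"macro P={p:.3f} R={r:.3f} F1={f1:.3f}")
    print(f"F1 of macro P and R = {2 * p * r / (p + r):.3f}")
```

With these toy counts, a rare class with low recall (here the hypothetical `ALK_rearranged`) pulls the macro F1 below the F1 computed from the averaged precision and recall, which is the kind of gap seen between 93% and 94.9%.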