Search citation statements
Paper Sections
Citation Types
Year Published
Publication Types
Relationship
Authors
Journals
Purpose: Chain-of-thought prompting is a method to make a Large Language Model (LLM) generate intermediate reasoning steps when solving a complex problem to increase its performance. OpenAI's o1 preview is an LLM that has been trained with reinforcement learning to create such a chain-of-thought internally, prior to giving a response and has been claimed to surpass various benchmarks requiring complex reasoning. The purpose of this study was to evaluate its performance for text mining in oncology. Methods: Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. GPT-4o and o1 preview were instructed to do the same classification based on the publications' abstracts. Results: For predicting whether patients with localized disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.80 (0.76 - 0.83) and 0.91 (0.89 - 0.94), respectively. For predicting whether patients with metastatic disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.97 (0.95 - 0.98) and 0.99 (0.99 - 1.00), respectively. Conclusion: o1 preview outperformed GPT-4o for extracting if people with localized and or metastatic disease were eligible for a trial from its abstract. o1 previews's performance was close to human annotation but could still be improved when dealing with cancer screening and prevention trials as well as by adhering to the desired output format. While research on additional tasks is necessary, it is likely that reasoning models could become the new state of the art for text mining in oncology and various other tasks in medicine.
Purpose: Chain-of-thought prompting is a method to make a Large Language Model (LLM) generate intermediate reasoning steps when solving a complex problem to increase its performance. OpenAI's o1 preview is an LLM that has been trained with reinforcement learning to create such a chain-of-thought internally, prior to giving a response and has been claimed to surpass various benchmarks requiring complex reasoning. The purpose of this study was to evaluate its performance for text mining in oncology. Methods: Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. GPT-4o and o1 preview were instructed to do the same classification based on the publications' abstracts. Results: For predicting whether patients with localized disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.80 (0.76 - 0.83) and 0.91 (0.89 - 0.94), respectively. For predicting whether patients with metastatic disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.97 (0.95 - 0.98) and 0.99 (0.99 - 1.00), respectively. Conclusion: o1 preview outperformed GPT-4o for extracting if people with localized and or metastatic disease were eligible for a trial from its abstract. o1 previews's performance was close to human annotation but could still be improved when dealing with cancer screening and prevention trials as well as by adhering to the desired output format. While research on additional tasks is necessary, it is likely that reasoning models could become the new state of the art for text mining in oncology and various other tasks in medicine.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.