2023
DOI: 10.1101/2023.06.26.23291912
Preprint

Generalizable and Automated Classification of TNM Stage from Pathology Reports with External Validation

Abstract: Cancer staging is an essential clinical attribute informing patient prognosis and clinical trial eligibility. However, it is not routinely recorded in structured electronic health records. Here, we present a generalizable method for the automated classification of TNM stage directly from pathology report text. We train a BERT-based model using publicly available pathology reports across approximately 7,000 patients and 23 cancer types. We explore the use of different model types, with differing input sizes, pa…

Citations: cited by 3 publications (2 citation statements)
References: 16 publications
“…Many studies have so far used machine learning algorithms for breast cancer classification and diagnosis [25]. Other studies tried to use different machine learning methods [26], combination of machine learning and rule-based approach [27], or very recently use large language models [28] to predict TNM stages of breast cancer from pathology text reports. The objective of the study was to propose a simple machine learning based model that can automatically (and with minimal data preparation) classify clinical/surgical pathology reports of breast cancer based on TNM stages.…”
Section: Discussion (mentioning)
confidence: 99%
“…Fijacko et al performed multinomial classification of abstract titles using the ChatGPT-4 application programming interface (API), through a python function call with predefined prompts, demonstrating the effectiveness of LLM-based approaches in bibliometric analysis [44]. Using optical character recognition to convert pathology reports into a textual format, Kefeli and Tatonetti trained several BERT-based models for TNM stage and cancer type classification [45,46]. Fang and Wang used several BERT models pre-trained on scientific literature for multi-label topic classification, achieving F1-scores over 90% [47].…”
Section: Text Classification (mentioning)
confidence: 99%