2023
DOI: 10.2139/ssrn.4418621
Preprint

Benchmark Pathology Report Text Corpus with Cancer Type Classification

Cited by 1 publication
(3 citation statements)
“…First, we selected 9,523 pathology reports from The Cancer Genome Atlas (TCGA) [9,12]. The availability of TNM annotation in the TCGA metadata varied: 6,887 reports were documented with known tumor size (T), 5,678 reports with known regional lymph node involvement (N), and 4,608 reports with known metastasis (M).…”
Section: Results (mentioning)
confidence: 99%
“…Reports were initially stored in PDF format; in previous work, we converted the TCGA pathology report corpus to machine-readable plain text using OCR, performed extensive curation, and fully characterized the final TCGA report set. The final dataset spanned 9,523 reports, with 1:1 patient:report ratio [9,18]. TNM staging annotation was contained within the clinical metadata provided by TCGA [17].…”
Section: TCGA Pathology Report Dataset Construction With TNM Annotation (mentioning)
confidence: 99%