The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the image characteristics constituting this site-specific digital histology signature. We demonstrate that these site-specific signatures lead to biased accuracy for prediction of features including survival, genomic mutations, and tumor stage. Furthermore, ethnicity can also be inferred from site-specific signatures, which must be accounted for to ensure equitable application of DL. These site-specific signatures can lead to overoptimistic estimates of model performance, and we propose a quadratic programming method that abrogates this bias by ensuring models are not trained and validated on samples from the same site.
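The idea of keeping each submitting site entirely within a single fold can be sketched as follows. This is a minimal illustration, not the authors' method: a simple greedy heuristic stands in for the quadratic program described in the abstract, and the function name `site_preserved_folds` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


def site_preserved_folds(sites, labels, n_folds=3):
    """Assign every site to exactly one fold, greedily balancing label counts.

    Assumption: a greedy heuristic used here in place of quadratic programming.
    Returns a list with the fold index of each sample.
    """
    # Count how many samples of each label every site contributes.
    site_counts = defaultdict(Counter)
    for site, label in zip(sites, labels):
        site_counts[site][label] += 1

    # Ideal number of samples per label in each fold.
    total = Counter(labels)
    target = {lab: n / n_folds for lab, n in total.items()}

    fold_counts = [Counter() for _ in range(n_folds)]
    site_to_fold = {}

    # Place the largest sites first so smaller sites can even out the folds.
    ordered = sorted(site_counts, key=lambda s: (-sum(site_counts[s].values()), s))
    for site in ordered:
        def cost(k):
            # Change in squared deviation from the target if this site joins fold k.
            return sum(
                (fold_counts[k][lab] + site_counts[site][lab] - target[lab]) ** 2
                - (fold_counts[k][lab] - target[lab]) ** 2
                for lab in target
            )

        best = min(range(n_folds), key=cost)
        fold_counts[best].update(site_counts[site])
        site_to_fold[site] = best

    return [site_to_fold[s] for s in sites]


# Illustrative toy data (not from TCGA): six sites, two samples each.
sites = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"]
labels = ["x", "y"] * 6
folds = site_preserved_folds(sites, labels, n_folds=3)
```

Because whole sites are assigned to folds, no site can contribute samples to both the training and validation partitions of any split, which is the property the abstract identifies as necessary to avoid overoptimistic performance estimates.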
Eosinophilic renal neoplasms have a wide spectrum of histologic presentations, and several studies have demonstrated a subtype of renal cell carcinomas (RCCs) associated with the tuberous sclerosis complex (TSC)/mammalian target of rapamycin (MTOR) pathway. A review of our institutional archives led to the identification of 18 cases of renal eosinophilic tumors with unusual morphology. Immunohistochemical analysis demonstrated that these could be separated into 3 groups: group 1 had solid architecture and morphology similar to chromophobe RCC but was negative for CK20 and vimentin, and had weak focal staining for CK7 and P504S; group 2 had solid architecture and morphology similar to either renal oncocytoma or chromophobe RCC, eosinophilic variant, and had diffuse staining of CK7 and P504S, absent to weak staining of CK20, and negative staining for vimentin; and group 3 had solid, cystic, and papillary architecture and was negative for CK7, except for 1 case, along with moderate to strong staining of CK20, P504S, and vimentin. The cases were then sent for next-generation sequencing to determine whether molecular pathogenic variants were present. In group 1, all 3 cases had mutations in TSC2. In group 2, pathogenic variants were identified in 3 genes: TSC1, TSC2, and MTOR. In group 3, genetic alterations and pathogenic variants were identified in TSC1 and TSC2. Our results support TSC/MTOR-associated neoplasms as a distinct group that exhibits heterogeneous morphology and immunohistochemical staining.
Objectives: Numerous studies on malignant mesothelioma (MM) highlight the prognostic importance of histologic subtype, nuclear grade, and necrosis. This study compares these parameters in paired biopsy and resection specimens of pleural MM. Methods: Histologic subtype, percentage of epithelioid morphology, nuclear grade, and the presence or absence of necrosis were compared in 429 paired biopsies and resection specimens of pleural MM from 19 institutions. Results: Histologic subtype was concordant in 81% of cases (κ = 0.58). When compared with resection specimens, epithelioid morphology at biopsy had a positive predictive value (PPV) of 78.9% and a negative predictive value (NPV) of 93.5%; sarcomatoid morphology showed high PPV (92.9%) and NPV (99.3%), and biphasic morphology PPV was 89.7% and NPV was 79.7%. Agreement of the percentage of epithelioid morphology was fair (κ = 0.27). Nuclear grade and necrosis were concordant in 75% (κ = 0.59) and 81% (κ = 0.53) of cases, respectively. Nuclear grade showed moderate (κ = 0.53) and substantial (κ = 0.67) agreement in patients with and without neoadjuvant therapy, respectively, and necrosis showed moderate (κ = 0.47 and κ = 0.60) agreement, respectively, in the same subsets of paired specimens. Conclusions: Paired biopsy-resection specimens from pleural MM show overall moderate agreement in pathologic parameters. These findings may help guide postbiopsy management and triage of patients with MM.
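For readers less familiar with the agreement statistics reported above, the following sketch shows how PPV, NPV, and Cohen's κ are computed from a biopsy-versus-resection cross-tabulation. The function names are hypothetical and the counts are made up purely for illustration; they are not the study's data.

```python
def ppv_npv(tp, fp, fn, tn):
    """Predictive values of a biopsy finding, with resection as the reference."""
    ppv = tp / (tp + fp)  # biopsy-positive cases confirmed at resection
    npv = tn / (tn + fn)  # biopsy-negative cases confirmed at resection
    return ppv, npv


def cohen_kappa(table):
    """Cohen's kappa for a square table.

    table[i][j] = number of cases in biopsy category i and resection category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals.
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical 2x2 example (illustrative counts only).
# Rows: biopsy positive/negative; columns: resection positive/negative.
example = [[40, 10],
           [5, 45]]
ppv, npv = ppv_npv(tp=40, fp=10, fn=5, tn=45)  # 0.8 and 0.9
kappa = cohen_kappa(example)                   # approximately 0.7
```

κ corrects the raw concordance for agreement expected by chance, which is why a parameter can be concordant in 81% of cases yet show only moderate agreement.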
The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. This site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the digital image characteristics constituting this histologic batch effect. As an example, we show that patient ethnicity within the TCGA breast cancer cohort can be inferred from histology due to site-level batch effect, which must be accounted for to ensure equitable application of DL. Batch effect also leads to overoptimistic estimates of model performance, and we propose a quadratic programming method to guide validation that abrogates this bias.