2020
DOI: 10.1038/s41523-020-00180-x

Breast cancer gene expression datasets do not reflect the disease at the population level

Abstract: Publicly available tumor gene expression datasets are widely reanalyzed, but it is unclear how representative they are of clinical populations. Estimations of molecular subtype classification and prognostic gene signatures were calculated for 16,130 patients from 70 breast cancer datasets. Collated patient demographics and clinical characteristics were sparse for many studies. Considerable variations were observed in dataset size, patient/tumor characteristics, and molecular composition. Results were compared …

Cited by 6 publications (10 citation statements)
References 22 publications
“…In addition, although datasets provide a valuable resource to test hypotheses for individual genes/signatures, there are variations in terms of size, patient characteristics and molecular composition of datasets and they do not necessarily reflect the studied cohort of BC patients. This is reported in the literature [45]. However, this indicates the need for more studies to test the value of these two proteins as prognostic markers in BC.…”
Section: Discussion (mentioning)
confidence: 88%
“…We set out to validate our results in the TCGA breast cancer cohort 52 . Albeit not population-based, and as previously reported, having a bias towards larger tumors with higher grade and stage 53 , this dataset represents a comparably large tumor collection with publicly available RNA-seq data. Generally, the SCAN-B results were confirmed in TCGA.…”
Section: Results (mentioning)
confidence: 99%
“…In fact, the PAM50 panel and intrinsic subtypes gene signature prototypes were obtained from bulk tissue data, and this can introduce a bias due to sampling procedures, as also discussed in [ 35 ]. Furthermore, one has to take into account that currently available large expression sets are poorly able to reflect BRCA at the population level [ 36 ]; thus, larger curated datasets will be required to refine predictions. In Supplementary Figure S2, the relative abundance of each SURFACER subtype into classic PAM50-assigned patients clusters is showed.…”
Section: Results (mentioning)
confidence: 99%
“…Most of the differences showed between TCGA and METABRIC cohorts KM curves may be in part dependent to the difference in both sample composition and the completeness of associated clinical data. In fact, it was previously observed that molecular subtypes composition between TCGA and METABRIC datasets is variable, and it does not adequately capture BRCA real subtype distribution at the population level [ 36 ].…”
Section: Results (mentioning)
confidence: 99%