2019
DOI: 10.1101/625624
Preprint

Establishing reference samples for detection of somatic mutations and germline variants with NGS technologies

Abstract: We characterized two reference samples for NGS technologies: a human triple-negative breast cancer cell line and a matched normal cell line. Leveraging several whole-genome sequencing (WGS) platforms, multiple sequencing replicates, and orthogonal mutation detection bioinformatics pipelines, we minimized the potential biases from sequencing technologies, assays, and informatics. Thus, our "truth sets" were defined using evidence from 21 repeats of WGS runs with coverages ranging from 50X to 10…
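The abstract states that the truth sets were defined using evidence from 21 repeats of WGS runs. As a rough illustration of the general idea of replicate support only, the sketch below counts how many replicate call sets contain each candidate variant and keeps those above a support threshold. The `Variant` key, the function names, and the `min_fraction` threshold are hypothetical; the consortium's actual procedure combined evidence across platforms and pipelines and is not reproduced here.

```python
from collections import Counter
from typing import Iterable

# Hypothetical variant key: (chromosome, 1-based position, REF allele, ALT allele).
Variant = tuple[str, int, str, str]


def replicate_support(replicates: Iterable[set[Variant]]) -> Counter:
    """Count, for each variant, how many replicate call sets contain it."""
    support: Counter = Counter()
    for calls in replicates:
        # Each replicate is a set, so it adds at most 1 to any variant's count.
        support.update(calls)
    return support


def consensus_calls(replicates: list[set[Variant]], min_fraction: float = 0.5) -> set[Variant]:
    """Keep variants called in at least `min_fraction` of the replicates.

    `min_fraction` is an illustrative assumption, not a published criterion.
    """
    support = replicate_support(replicates)
    needed = min_fraction * len(replicates)
    return {v for v, n in support.items() if n >= needed}


# Toy replicates (made-up calls, for illustration only).
reps = [
    {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    {("chr1", 100, "A", "T")},
    {("chr1", 100, "A", "T"), ("chr3", 300, "C", "CA")},
]
print(consensus_calls(reps, min_fraction=0.5))
```

With three toy replicates and `min_fraction=0.5`, only the variant seen in all three runs passes; the two singletons are dropped as likely run-specific artifacts.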

Cited by 9 publications (18 citation statements)
References: 27 publications
“…For full-spectrum analysis of the somatic mutation detection problem, we used the first comprehensive whole-genome characterized reference tumor-normal paired breast cancer cell lines (HCC1395 and HCC1395BL), developed by the Somatic Mutation Working Group of the SEQC-II consortium [12,13]. We leveraged high-confidence somatic mutations (39,536 SNVs and 2,020 INDELs) derived by the consortium as our ground truth set (Suppl.…”
Section: Results (mentioning; confidence: 99%)
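Using a high-confidence somatic call set as ground truth, as in the statement above, usually means scoring a caller's output by precision, recall, and F1, reported separately for SNVs and INDELs. The sketch below is a minimal version that matches variants exactly on (chromosome, position, REF, ALT); it assumes both sets are already normalized to the same representation and restricted to the same high-confidence regions, which dedicated benchmarking tools handle and this sketch does not. `Metrics`, `evaluate`, and `variant_type` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical variant key: (chromosome, 1-based position, REF allele, ALT allele).
Variant = tuple[str, int, str, str]


def variant_type(v: Variant) -> str:
    """Classify a variant by REF/ALT lengths (simplified)."""
    _, _, ref, alt = v
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "INDEL"
    return "OTHER"  # e.g. multi-nucleotide substitutions, not scored here


@dataclass
class Metrics:
    tp: int  # calls that are in the truth set
    fp: int  # calls not in the truth set
    fn: int  # truth-set variants that were missed

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0


def evaluate(calls: set[Variant], truth: set[Variant], vtype: str) -> Metrics:
    """Score calls of one variant type against the truth set by exact match."""
    c = {v for v in calls if variant_type(v) == vtype}
    t = {v for v in truth if variant_type(v) == vtype}
    return Metrics(tp=len(c & t), fp=len(c - t), fn=len(t - c))


# Toy sets (made-up, not from the SEQC-II truth set).
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "A", "AT")}
calls = {("chr1", 100, "A", "T"), ("chr1", 300, "C", "G"), ("chr2", 50, "A", "AT")}
for vt in ("SNV", "INDEL"):
    m = evaluate(calls, truth, vt)
    print(vt, m.precision, m.recall, m.f1)
```

Scoring SNVs and INDELs separately matters because callers typically perform quite differently on the two classes, and a pooled F1 would be dominated by the far more numerous SNVs.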
“…We used several different training models in our analysis. First, we used the already available model published recently [11], which was trained using in silico spike-ins from the DREAM Challenge Stage 3 dataset [13]. Despite the large discrepancy between the sample types, sequencing platforms, coverages, spike-in mutation frequencies, and heterogeneity of the samples used to train the DREAM3 model, this model outperformed other conventional techniques across the real cancer datasets of diverse characteristics by more than ∼4% by the mean F1-score averaged across different samples for both SNVs and INDELs (Figure 1a).…”
Section: Results (mentioning; confidence: 99%)
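The statement above compares methods by the mean F1-score averaged across samples. A minimal sketch of that averaging, assuming per-sample F1 values have already been computed for each method and keyed by a sample name; `mean_f1` and `mean_f1_gain` are hypothetical helpers, and only samples scored by both methods are compared.

```python
from statistics import mean


def mean_f1(per_sample_f1: dict[str, float]) -> float:
    """Unweighted mean of per-sample F1 scores."""
    return mean(per_sample_f1.values())


def mean_f1_gain(method_a: dict[str, float], method_b: dict[str, float]) -> float:
    """Mean F1 of method A minus mean F1 of method B over shared samples."""
    shared = sorted(method_a.keys() & method_b.keys())
    if not shared:
        raise ValueError("the two methods share no scored samples")
    return mean(method_a[s] for s in shared) - mean(method_b[s] for s in shared)


# Toy per-sample F1 values (made-up, for illustration only).
model = {"s1": 0.75, "s2": 0.5, "s3": 0.9}  # "s3" has no counterpart, so it is ignored
baseline = {"s1": 0.5, "s2": 0.25}
print(mean_f1_gain(model, baseline))
```

Because F1 lies in [0, 1], a gain of 0.04 corresponds to 4 percentage points; the excerpt does not say whether its "∼4%" is an absolute or a relative difference.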