2016
DOI: 10.1186/s13029-016-0060-z

PureCN: copy number calling and SNV classification using targeted short read sequencing

Abstract: Background: Matched sequencing of both tumor and normal tissue is routinely used to classify variants of uncertain significance (VUS) into somatic vs. germline. However, assays used in molecular diagnostics focus on known somatic alterations in cancer genes and often only sequence tumors. Therefore, an algorithm that reliably classifies variants would be helpful for retrospective exploratory analyses. Contamination of tumor samples with normal cells results in differences in expected allelic fractions of germlin…

Cited by 125 publications (118 citation statements)
References 27 publications
“…Variants were called with Pindel, Socrates, PureCN, and MuTect using the default 0.5% lower threshold [19–22]. Variants were then annotated, restricted to non-synonymous mutations involving protein-coding regions, and filtered using common reference databases and internal controls to remove germline variants and artifacts. All ALK resistance mutations were re-genotyped with Genome Analysis Toolkit.…”
Section: Methods (mentioning)
confidence: 99%
“…We then sought to build a model to predict purity by learning the statistics of the input data, as opposed to hand-crafting a statistical model, as with previous methods [10,11]. Additionally, given that expert human annotators use scatterplots to visualize the data, we reasoned that the very same scatterplots would and should serve as the best data representation for our model.…”
Section: Results (mentioning)
confidence: 99%
“…We built and trained a CNN regression model to output tumor purity based upon the very scatterplot images of input data that an expert human annotator would use for prediction. We found that with our data representation, combined with a CNN and data augmentation, we could predict purity better than an existing algorithm [10].…”
Section: Introduction (mentioning)
confidence: 90%