2020
DOI: 10.1101/2020.08.05.237206
Preprint

Aggregation Tool for Genomic Concepts (ATGC): A deep learning framework for somatic mutations and other sparse genomic measures

Abstract: Deep learning has the ability to extract meaningful features from data given enough training examples. Large scale genomic data are well suited for this class of machine learning algorithms; however, for many of these data the labels are at the level of the sample instead of at the level of the individual genomic measures. To leverage the power of deep learning for these types of data we turn to a multiple instance learning framework, and present an easily extensible tool built with TensorFlow and Keras. We sh…

Cited by 4 publications (5 citation statements)
References 66 publications
“…As the data set and scientific understanding continue to grow, process improvement efforts continuously refine these centralized filters to identify and address lower frequency clonal hematopoiesis variants, center- or platform-specific hotspot artifacts, and differences in panel performance across sites. An example of this is the development of a computational model to enable the comparison of tumor mutation burden measurements across the many different testing platforms within the GENIE consortium (36). These systems and processes are readily expandible to the comprehensive whole-exome, genome, and transcriptome sequencing as well as other types of genomic data that are increasingly affecting the management of patients with cancer.…”
Section: Discussion (mentioning)
confidence: 99%
“…Variables of interest extracted from the database included demographic data, genomic alterations with their OncoKB annotations for therapeutic evidence level, presence of The Cancer Genome Atlas PanCancer pathway alterations, and estimation of TMB. 11 Demographic data collected for each patient included patient age at sequencing, sex, and race as recorded by the submitting institution. Race was analyzed in this study given the large variation in cancer incidence between races and the potential for differential variant factors by race.…”
Section: Methods (mentioning)
confidence: 99%
“…Overall, the data modalities available through the AACR project GENIE registry were inadequate to accurately computationally derive mutation cellular fractions as allele-specific copy numbers could not be computed. Estimation of the number of non-synonymous somatic mutations per Mb (tumor mutation burden; TMB) was calibrated using ATGC, a machine learning model that incorporates positional and sequence related contexts to identify somatic variants (27). For mutation signature analyses, we restricted the dataset to 19,057 patients with a TMB of at least 1 mut/Mb.…”
Section: Next-generation Sequencing Tumor Mutation Burden and Mutatio... (mentioning)
confidence: 99%