2021
DOI: 10.1101/2021.08.15.21261805
Preprint

Leveraging sequences missing from the human genome to diagnose cancer

Abstract: Cancer diagnosis using cell-free DNA (cfDNA) can significantly improve treatment and survival but has several technical limitations. Here, we show that tumor-associated mutations create neomers, DNA sequences 11 to 18 bp in length that are absent from the human genome, and that neomers can accurately detect cancer subtypes and features. Using a neomer-based classifier, we detect twenty-one different tumor types with higher accuracy than state-of-the-art methods. Refinement of this classifier via supervised learn…
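
The abstract defines neomers as short sequences created by tumor-associated mutations that are absent from the human genome. The set-difference idea behind that definition can be sketched as below, assuming the reference and reads fit in memory as plain strings and that a k-mer on either strand of the reference counts as present. The function names are hypothetical and this is not the authors' pipeline; at the 11 to 18 bp lengths mentioned above, a real implementation would need a compact k-mer index rather than Python sets.

```python
# Hypothetical helpers illustrating the neomer idea; not the authors' pipeline.
VALID_BASES = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq that contain only A, C, G and T."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= VALID_BASES:
            found.add(word)
    return found


def find_neomers(reference: str, reads: list[str], k: int) -> set[str]:
    """k-mers seen in the reads but absent from both strands of the reference."""
    ref = reference.upper()
    ref_kmers = kmers(ref, k) | kmers(reverse_complement(ref), k)
    sample_kmers: set[str] = set()
    for read in reads:
        sample_kmers |= kmers(read, k)
    return sample_kmers - ref_kmers


# Toy example: one T->A substitution in a read creates k-mers missing from the reference.
print(sorted(find_neomers("ACGTTGCA", ["ACGATGCA"], 3)))  # ['ATG', 'CGA', 'GAT']
```

A read that matches the reference exactly yields an empty set, which is the property that makes absent sequences attractive as mutation markers.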

Cited by 10 publications (11 citation statements)
References: 92 publications
“…We extracted every sixteen base-pair (bp) kmer found in each sample and split samples in ten groups or folds, with the proportion of cases over the total samples in each fold being maintained. The choice of kmer length was informed from previous studies in which we found that the performance of the kmer-based models increased as a function of kmer length up to sixteen bp length (Georgakopoulos-Soares, Barnea, et al 2021). For each fold, we examined which subset of the total kmers detected constituted frequentmers, using the number of samples in which each kmer was found as the recurrency threshold (see Methods).…”
Section: Results (mentioning)
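
The statement above describes two steps: splitting samples into ten folds while preserving the proportion of cases, and keeping as frequentmers the k-mers that recur in at least a threshold number of samples. A minimal sketch of both steps follows, assuming each sample has already been reduced to a set of k-mers and labels are 1 for cases and 0 for controls. The function names are hypothetical, and scikit-learn's `StratifiedKFold` offers an established alternative for the splitting step.

```python
import random
from collections import Counter


def stratified_folds(labels: list[int], n_folds: int = 10, seed: int = 0) -> list[list[int]]:
    """Split sample indices into n_folds groups with a similar case/control ratio in each."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(n_folds)]
    for label in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == label]
        rng.shuffle(members)
        # Deal each class round-robin so every fold gets an equal share of it.
        for j, idx in enumerate(members):
            folds[j % n_folds].append(idx)
    return folds


def frequentmers(sample_kmer_sets: list[set[str]], min_samples: int) -> set[str]:
    """k-mers present in at least min_samples of the given samples."""
    counts: Counter[str] = Counter()
    for kmer_set in sample_kmer_sets:
        counts.update(kmer_set)  # a set, so each sample counts a k-mer at most once
    return {kmer for kmer, n in counts.items() if n >= min_samples}


labels = [1] * 20 + [0] * 80  # 20 cases, 80 controls
folds = stratified_folds(labels)
print([sum(labels[i] for i in fold) for fold in folds])  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

In a cross-validation loop, frequentmers would normally be computed from the training folds only, so that the held-out fold does not influence feature selection. Exactly which samples count toward the recurrence threshold is specified in the citing paper's Methods and is not reproduced here.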
“…Kmers have been previously used to describe new features or characteristics of an organism related to the presence or absence of a specific contiguous subsequence. For example, the subset of kmers that do not appear in a genome are referred to as nullomers (Acquisti et al 2007; Georgakopoulos-Soares, Yizhar-Barnea, et al 2021; Koulouras and Frith 2021) and the subset of kmers that are found in a single species are referred to as quasi-primes (Mouratidis et al 2023). Using kmer strategies, we may efficiently mine the human genome for differences that distinguish patients with disease from healthy individuals in [an] effort to establish unique biological signatures.…”
Section: Introduction (mentioning)
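
The two definitions quoted above (nullomers as k-mers absent from a genome, quasi-primes as k-mers found in only one species) reduce to set operations over k-mer inventories. The brute-force sketch below assumes forward-strand counting only and enumerates all 4**k words, so it is practical only for small k; published analyses may also account for reverse complements. The function names and species labels are placeholders.

```python
from itertools import product


def present_kmers(seq: str, k: int) -> set[str]:
    """Length-k substrings of seq made only of A, C, G and T (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if set(seq[i:i + k]) <= set("ACGT")}


def nullomers(genome: str, k: int) -> set[str]:
    """All possible DNA words of length k that never occur in genome."""
    every_word = {"".join(p) for p in product("ACGT", repeat=k)}
    return every_word - present_kmers(genome, k)


def quasi_primes(genomes: dict[str, str], k: int) -> dict[str, set[str]]:
    """For each species, the k-mers found in its genome and in no other genome given."""
    per_species = {name: present_kmers(seq, k) for name, seq in genomes.items()}
    result = {}
    for name, kmer_set in per_species.items():
        others = set().union(*(s for other, s in per_species.items() if other != name))
        result[name] = kmer_set - others
    return result


print(sorted(nullomers("AACCGGTT", 2)))  # ['AG', 'AT', 'CA', 'CT', 'GA', 'GC', 'TA', 'TC', 'TG']
print(quasi_primes({"species_1": "AAAC", "species_2": "AAAG"}, 2))  # {'species_1': {'AC'}, 'species_2': {'AG'}}
```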
“…Identification of kmers was performed as previously defined in (Georgakopoulos-Soares, Yizhar-Barnea, et al 2021). For nucleic kmers and nullomers, the lengths of six to twelve bps were used for eukaryotes, bacteria, and archaea, whereas for viruses due to their smaller genome size, the lengths of three to six bps were used.…”
Section: Methods (mentioning)
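
At the upper end of the quoted range (k = 12), there are 4**12 = 16,777,216 possible DNA words, so storing observed k-mers as Python strings becomes expensive. One common workaround, sketched below under stated assumptions, encodes each base in 2 bits, rolls a k-mer code along the sequence, and marks observed codes in a one-byte-per-word bitmap. The length ranges come from the quoted Methods; the dictionary keys, function names, and the forward-strand-only counting are illustrative assumptions rather than the cited implementation.

```python
# Length ranges quoted above; the dictionary keys are illustrative labels.
KMER_LENGTHS = {
    "eukaryote": range(6, 13),
    "bacteria": range(6, 13),
    "archaea": range(6, 13),
    "virus": range(3, 7),
}

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_nullomers(seq: str, k: int) -> int:
    """Number of length-k DNA words absent from seq, via a 4**k presence bitmap."""
    seen = bytearray(4 ** k)  # one byte per possible k-mer, 0 = not observed
    mask = 4 ** k - 1         # keeps only the last k bases (2 bits each)
    value = 0
    run = 0                   # consecutive valid bases ending at the current one
    for base in seq.upper():
        code = BASE_CODE.get(base)
        if code is None:      # N or any other symbol: restart the window
            value, run = 0, 0
            continue
        value = ((value << 2) | code) & mask
        run += 1
        if run >= k:
            seen[value] = 1
    return seen.count(0)


def nullomer_profile(seq: str, domain: str) -> dict[int, int]:
    """Nullomer counts for every k in the domain's length range."""
    return {k: count_nullomers(seq, k) for k in KMER_LENGTHS[domain]}


print(nullomer_profile("ACGT", "virus"))  # {3: 62, 4: 255, 5: 1024, 6: 4096}
```

The bitmap trades a fixed 4**k bytes of memory for constant-time lookups, which stays manageable (about 16.8 MB) at k = 12.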
“…For peptide kmers, oligopeptide lengths of up to seven amino acids were used across all the species. Nullomer and nullpeptide detection was performed as previously described in (Georgakopoulos-Soares, Yizhar-Barnea, et al 2021) for each species at each kmer length.…”
Section: Methods (mentioning)
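
The same idea carries over to proteins: nullpeptides are oligopeptides that never occur in a proteome. Because 20**7 (1.28 billion) possible peptides is too many for a comfortable full bitmap, the sketch below stores only the observed peptides as base-20 integer codes and subtracts their count from 20**k. It assumes the 20 standard amino acids and treats any other symbol (X, U, stop) as a window break; the function names and toy proteome are hypothetical and do not reproduce the cited method.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def present_peptide_codes(proteins: list[str], k: int) -> set[int]:
    """Base-20 codes of every length-k oligopeptide observed in the proteins."""
    modulus = 20 ** k
    present: set[int] = set()
    for protein in proteins:
        value, run = 0, 0
        for residue in protein.upper():
            idx = AA_INDEX.get(residue)
            if idx is None:  # non-standard symbols are treated as window breaks
                value, run = 0, 0
                continue
            value = (value * 20 + idx) % modulus  # drop the oldest residue, append the new one
            run += 1
            if run >= k:
                present.add(value)
    return present


def count_nullpeptides(proteins: list[str], k: int) -> int:
    """Number of the 20**k possible oligopeptides that never occur in the proteins."""
    return 20 ** k - len(present_peptide_codes(proteins, k))


toy_proteome = ["MKTAYIAK", "MSTNPKPQR"]
print({k: count_nullpeptides(toy_proteome, k) for k in range(1, 4)})
```

The "up to seven amino acids" range quoted above corresponds to iterating `k` over `range(1, 8)`.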