2017
DOI: 10.1371/journal.pone.0173288

On DNA numerical representations for genomic similarity computation

Abstract: Genomic signal processing (GSP) refers to the use of signal processing for the analysis of genomic data. GSP methods require the transformation or mapping of the genomic data to a numeric representation. To date, several DNA numeric representations (DNR) have been proposed; however, it is not clear what the properties of each DNR are and how the selection of one will affect the results when using a signal processing technique to analyze them. In this paper, we present an experimental study of the characteristi…

Cited by 51 publications (39 citation statements)
References 35 publications

“…In this work, a similar algorithm is implemented to analyse nucleotide sequences: each nucleotide position in a sequence is represented as a four-element vector, the Voss representation (Voss, 1992), encoding the probability of each base according to previously aligned reads. This numerical representation of DNA sequences is appropriate for the comparison of DNA sequences (Mendizabal-Ruiz et al, 2017) and their classification (Mendizabal-Ruiz et al, 2018). In molecular biology, a similar algorithm has been applied to the clustering of amino acid sequences (Olshen et al, 2005), where vector quantization is used to estimate the probability density of amino acids.…”
Section: Methods (mentioning)
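The Voss mapping described in the quotation above can be sketched as follows. This is a minimal illustration, not the cited authors' code: it assumes the reads are already aligned and of equal length, uses the row order A, C, G, T, and leaves ambiguous symbols such as N (or alignment gaps) as all-zero columns. The function names `voss_indicators` and `voss_profile` are hypothetical.

```python
import numpy as np

BASES = "ACGT"  # assumed row order of the four indicator sequences

def voss_indicators(seq: str) -> np.ndarray:
    """Binary Voss representation: a 4 x N array with one indicator row per base."""
    seq = seq.upper()
    out = np.zeros((4, len(seq)), dtype=float)
    for j, base in enumerate(seq):
        i = BASES.find(base)
        if i >= 0:  # ambiguous symbols and gaps stay as all-zero columns
            out[i, j] = 1.0
    return out

def voss_profile(aligned_reads: list[str]) -> np.ndarray:
    """Probabilistic variant: per-position base frequencies over aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("reads must already be aligned to the same length")
    counts = sum(voss_indicators(r) for r in aligned_reads)
    totals = counts.sum(axis=0)  # number of called bases at each position
    # Normalise each column to probabilities; columns with no called base stay zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

For example, `voss_profile(["ACGT", "ACGA", "ACTT"])` returns a 4 × 4 array whose third column is (0, 0, 2/3, 1/3): two of the three reads carry G at that position and one carries T.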
“…This representation is one-dimensional [87,85]. It is obtained by replacing the four nucleotides (T, C, A, G) of a biological sequence with the integers (0, 1, 2, 3), respectively; e.g., for s = (G, A, G, A, G, T, G, A, C, C, A), the result is d = (3, 2, 3, 2, 3, 0, 3, 2, 1, 1, 2), as shown in Equation (5).…”
Section: Integer Representation (mentioning)
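The integer mapping quoted above (T→0, C→1, A→2, G→3) can be reproduced with a simple lookup table. This is a minimal sketch; the name `integer_representation` is hypothetical, and rejecting symbols other than T, C, A, G is an assumption, since the quotation does not say how ambiguous bases are handled.

```python
# Integer mapping T->0, C->1, A->2, G->3, as given in the quoted text.
INTEGER_MAP = {"T": 0, "C": 1, "A": 2, "G": 3}

def integer_representation(seq):
    """Map each nucleotide to its integer code; reject any other symbol."""
    try:
        return [INTEGER_MAP[base] for base in seq.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported symbol {err.args[0]!r}") from None

s = "GAGAGTGACCA"
print(integer_representation(s))  # [3, 2, 3, 2, 3, 0, 3, 2, 1, 1, 2]
```

The output matches the sequence d in the quotation. Note that this mapping imposes an ordering on the bases (for instance, G is numerically farther from T than C is), which is a property of the encoding rather than of the biology.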
“…In [31,32], three physico-chemical representations of DNA sequences (atomic, molecular mass, and Electron-Ion Interaction Potential, EIIP) were considered for genomic analysis, and the authors concluded that the choice of numerical representation did not affect the results. The latest study comparing different numerical representation techniques [33] concluded that multi-dimensional representations (such as Chaos Game Representation) yielded better genomic comparison results than one-dimensional representations. However, there is in general no agreement on whether the choice of numerical representation for DNA sequences makes a difference to genome comparison results, or on which numerical representations are best suited for analyzing genomic data.…”
Section: Numerical Representations of DNA Sequences (mentioning)
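To contrast the one-dimensional and multi-dimensional representations mentioned above, the sketch below implements an EIIP mapping (one value per base) and a Chaos Game Representation (one 2-D point per base). It is illustrative only: the EIIP values are those commonly quoted in the genomic signal processing literature, the CGR corner assignment is one common convention (papers differ), and the function names are hypothetical.

```python
# EIIP values commonly used in the genomic signal processing literature.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# One common CGR corner assignment; other papers permute the corners.
CGR_CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def eiip_representation(seq):
    """One-dimensional mapping: one EIIP value per nucleotide."""
    return [EIIP[base] for base in seq.upper()]

def cgr_points(seq):
    """Two-dimensional Chaos Game Representation: one (x, y) point per nucleotide."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        cx, cy = CGR_CORNERS[base]
        # Move halfway from the current point toward the corner of this base.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

print(eiip_representation("ACGT"))  # [0.126, 0.134, 0.0806, 0.1335]
print(cgr_points("AC"))             # [(0.25, 0.25), (0.125, 0.625)]
```

Because each CGR point depends on every preceding base, the 2-D trajectory retains context that a per-base scalar such as EIIP does not.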