2022
DOI: 10.7717/peerj.13544

An efficient numerical representation of genome sequence: natural vector with covariance component

Abstract: Background: The characterization and comparison of microbial sequences, including archaea, bacteria, viruses and fungi, are very important to understand their evolutionary origin and the population relationship. Most methods are limited by the sequence length and lack of generality. The purpose of this study is to propose a general characterization method, and to study the classification and phylogeny of the existing datasets. Methods: We present a new alignment-free method to represent and compare biological …

Cited by 3 publications (2 citation statements)
References 29 publications
“…The other approach is based on alignment-free, which mainly include methods based on k-mer frequency ( Sims et al., 2009 ), the length of common substrings ( Leimeister and Morgenstern, 2014 ), graphical representation ( Jeffrey, 1990 ), micro-alignments ( Yi and Jin, 2013 ), and the number of word matches ( Bromberg et al., 2016 ). Our team proposed k-mer natural vector to compare genomic sequences ( Deng et al., 2011 ), which has been successfully applied to many classification and phylogenetic tasks ( Sun et al., 2021 ; Sun et al., 2022a ; Sun et al., 2022b ). K-mer natural vector characterizes the statistical distribution of k-mers.…”
Section: Results (mentioning)
confidence: 99%
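
The statement above says only that the k-mer natural vector characterizes the statistical distribution of k-mers. As a rough illustration of that idea, the sketch below records three quantities for each k-mer: its count, its mean position, and a scaled second central moment of its positions. The function name `kmer_natural_vector`, the 1-based positions, and the scaling by (count × number of k-mer positions) are assumptions made for this example. They are not taken from the cited papers, and the covariance component named in the title is not covered.

```python
from collections import defaultdict


def kmer_natural_vector(seq, k):
    """Return {kmer: (count, mean_position, scaled_second_moment)}.

    Hypothetical helper for illustration only; the scaling used for the
    second moment is an assumption, not the published definition.
    """
    seq = seq.upper()
    total = len(seq) - k + 1  # number of k-mer start positions
    positions = defaultdict(list)
    for i in range(total):
        # Record the 1-based start position of every k-mer occurrence.
        positions[seq[i:i + k]].append(i + 1)

    features = {}
    for kmer, pos in positions.items():
        n = len(pos)
        mu = sum(pos) / n  # mean position of this k-mer
        # Second central moment of the positions, scaled by n * total.
        d2 = sum((p - mu) ** 2 for p in pos) / (n * total)
        features[kmer] = (n, mu, d2)
    return features


if __name__ == "__main__":
    for kmer, stats in sorted(kmer_natural_vector("ACGTAC", 2).items()):
        print(kmer, stats)
```

Two sequences could then be compared by placing these per-k-mer triples in a fixed k-mer order and taking a distance between the resulting vectors; which distance to use is left open here.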
“…Fourth, though applicable to any type of ordinal predictor, MANOCCA is currently restricted to continuous outcomes and unstructured observations. Other fields of application interested in changes in outcome relationships, such as the study of mutation mechanisms in genomic sequences [ 46 ], the study of longitudinal data (changes of covariances across different timepoints) or covariances across related individuals, would require further work to be studied using MANOCCA.…”
Section: Discussion (mentioning)
confidence: 99%