2012
DOI: 10.1089/cmb.2011.0070

Separating Significant Matches from Spurious Matches in DNA Sequences

Abstract: Word matches are widely used to compare genomic sequences. Complete genome alignment methods often rely on the use of matches as anchors for building their alignments, and various alignment-free approaches that characterize similarities between large sequences are based on word matches. Among matches that are retrieved from the comparison of two genomic sequences, a part of them may correspond to spurious matches (SMs), which are matches obtained by chance rather than by homologous relationships. The number of…
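To make the notion of a chance match concrete, here is a minimal sketch, not the authors' method. It counts exact k-word matches between two sequences and compares the count with the number expected between two unrelated sequences. The expectation assumes a deliberately simple background model (independent, uniformly distributed letters), and the function names `count_word_matches` and `expected_chance_matches` are hypothetical.

```python
from collections import Counter


def count_word_matches(seq_a, seq_b, k):
    """Count position pairs (i, j) where seq_a[i:i+k] == seq_b[j:j+k]."""
    # Tally every k-word of seq_a once, then look up each k-word of seq_b.
    words_a = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    return sum(words_a[seq_b[j:j + k]] for j in range(len(seq_b) - k + 1))


def expected_chance_matches(n, m, k, alphabet_size=4):
    """Expected k-word matches between unrelated sequences of lengths n and m.

    Assumption (not from the source): letters are independent and uniform,
    so any given pair of positions matches with probability alphabet_size**-k.
    """
    return (n - k + 1) * (m - k + 1) / alphabet_size ** k


if __name__ == "__main__":
    import random

    random.seed(0)
    a = "".join(random.choice("ACGT") for _ in range(2000))
    b = "".join(random.choice("ACGT") for _ in range(2000))
    for k in (6, 10, 14):
        print(k, count_word_matches(a, b, k), expected_chance_matches(2000, 2000, k))
```

Under this simplified model the expected number of chance matches falls by a factor of four for each extra letter in the word, which is why longer words yield fewer spurious matches. Real genomes are neither uniform nor independent, so the figure is only a rough baseline.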

Citations: cited by 5 publications (4 citation statements)
References: 36 publications

“…If one wants to estimate phylogenetic distances between genomic sequences based on spaced-word matches between them, one needs to distinguish between matches representing true homologies and random background matches (Devillers and Schbath, 2012). One possible way of reducing the number of background spaced-word matches would be to use a sufficiently high weight w, i.e.…”
Section: Algorithm; type: mentioning
confidence: 99%
“…The total is about O(λnm^2). In this study, the fast K-Nearest Neighbor Graph (K-NNG) construction method [48, 49] is applied to the construction of the weighted sample graph, which reduces the time complexity from O(λnm^2) to O(λnm^1.14).…”
Section: Methods; type: mentioning
confidence: 99%
“…Scoring functions represent the core of ranking methods and are used to assign a relevance index to each feature/gene. The scoring functions mainly include the Z-score [11] and Welch t-test [12] from the t-test family, the Bayesian t-test [13] from the Bayesian scoring family, and the Info gain [14] method from the theory-based scoring family. However, the filter-ranking methods ignore the correlations among gene subset, so the selected gene subset may contain redundant information.…”
Section: Related Work; type: mentioning
confidence: 99%
“…Average lengths of the repeats are given in Gu et al. (2000). Recently, heuristics have been proposed and implemented (Devillers and Schbath, 2012; Rizk et al., 2013; Chikhi and Medvedev, 2014).…”
Section: Introduction; type: mentioning
confidence: 99%