2016
DOI: 10.1155/2016/4986707

PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets

Abstract: Identifying conserved patterns in DNA sequences, namely, motif discovery, is an important and challenging computational task. With hundreds or more sequences contained, the high-throughput sequencing data set is helpful to improve the identification accuracy of motif discovery but requires an even higher computing performance. To efficiently identify motifs in large DNA data sets, a new algorithm called PairMotifChIP is proposed by extracting and combining pairs of l-mers in the input with relatively small Ham…
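The pair-extraction idea in the abstract can be illustrated with a minimal brute-force sketch. This is not the PairMotifChIP implementation: the function names (`hamming`, `lmers`, `close_pairs`) and the `max_dist` threshold are assumptions made for illustration, and the paper's own selection criteria and probabilistic filtering are not reproduced here.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def lmers(seq: str, l: int):
    """All (start, substring) pairs of length l in seq."""
    return [(i, seq[i:i + l]) for i in range(len(seq) - l + 1)]


def close_pairs(seqs, l: int, max_dist: int):
    """Hypothetical helper: pairs of l-mers taken from two different
    sequences whose Hamming distance is at most max_dist.

    Each l-mer is reported as (sequence index, start, substring).
    """
    pairs = []
    for (si, s), (sj, t) in combinations(enumerate(seqs), 2):
        for i, x in lmers(s, l):
            for j, y in lmers(t, l):
                if hamming(x, y) <= max_dist:
                    pairs.append(((si, i, x), (sj, j, y)))
    return pairs


if __name__ == "__main__":
    for pair in close_pairs(["AACGT", "TACGA"], l=4, max_dist=1):
        print(pair)
```

Because every l-mer of one sequence is compared with every l-mer of every other sequence, the work in this naive version grows quadratically with the total number of l-mers.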

Cited by 8 publications (9 citation statements)
References 30 publications
“…In SamSelect, we set w and k to 12 and 1, respectively. With this setting, in addition to the guarantee of good space and time performance, we would also like to obtain more motif information, as the probability analysis shows that count_1(12-mer) for a motif instance is significantly larger than that for a background substring [29].…”
Section: Methods (mentioning)
confidence: 99%
“…2 . The initial value of threshold f is set to the sum of N_r and N_m, where N_r and N_m are count_k(w-mer) for a background substring and a motif instance for a random case, respectively; the calculation method of N_r and N_m is given in [29]. For any two overlapped w-mers, if the length of the overlap is greater than or equal to w/2, we combine the two w-mers into one substring.…”
Section: Methods (mentioning)
confidence: 99%
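The merging rule quoted above (combine two overlapping w-mers when the overlap is at least w/2) can be sketched as follows. This is an illustrative reading of the quoted sentence, not the citing paper's code: the function name `merge_overlapping`, the representation of w-mers by their start positions in one sequence, and the choice to chain merges by comparing each w-mer with the one before it are all assumptions.

```python
def merge_overlapping(seq: str, starts, w: int):
    """Hypothetical helper: merge w-mers of seq, given by start position,
    whenever two consecutive w-mers overlap by at least w/2 characters.

    Returns the merged substrings in left-to-right order.
    """
    positions = sorted(set(starts))
    for p in positions:
        if p < 0 or p + w > len(seq):
            raise ValueError(f"w-mer at {p} does not fit in the sequence")

    merged = []        # half-open intervals [start, end)
    last_start = None  # start of the most recently visited w-mer
    for p in positions:
        # Overlap between the w-mers starting at last_start and at p;
        # it is zero or negative when they do not overlap at all.
        if last_start is not None and 2 * (w - (p - last_start)) >= w:
            merged[-1][1] = p + w       # extend the current substring
        else:
            merged.append([p, p + w])   # start a new substring
        last_start = p
    return [seq[a:b] for a, b in merged]


if __name__ == "__main__":
    # The w-mers at 0 and 2 overlap by 2 = w/2 and are merged;
    # the one at 6 does not overlap the one at 2 and stays separate.
    print(merge_overlapping("ACGTTGCAAT", [0, 2, 6], w=4))
```

Comparing `2 * overlap >= w` rather than `overlap >= w / 2` keeps the check in integer arithmetic, which avoids rounding questions when w is odd.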
“…Considering current efforts in analyzing large amounts of data, Yu et al. (2016) proposed a new algorithm, PairMotifChIP, for this purpose. This tool can identify motifs by extracting combining pairs of an “l” width in the input sequences that have small Hamming distance, distinguishing the motifs from random overrepresented sequences by probabilistic analysis and then combines the remaining sequences to form motifs.…”
Section: Eukaryotic Promoters and Transcription Factors: The Blocks T… (mentioning)
confidence: 99%
“…This tool can identify motifs by extracting combining pairs of an “l” width in the input sequences that have small Hamming distance, distinguishing the motifs from random overrepresented sequences by probabilistic analysis and then combines the remaining sequences to form motifs. This tool runs very fast and does not require previous user information (Yu et al., 2016). Caldonazzo Garbelini et al. (2018) created a new approach for motif discovery by making use of a genetic algorithm to escape from optimal local solutions.…”
Section: Eukaryotic Promoters and Transcription Factors: The Blocks T… (mentioning)
confidence: 99%
“…The recent years have witnessed the proposal of some motif discovery algorithms based on new strategies aimed at efficiently processing large datasets. PairMotifChIP [23] discovers motifs by mining and merging pairs of similar substrings in the input sequences. It spends a large portion of running time on the former operation, which shows quadratic growth as the dataset size increases.…”
Section: Introduction (mentioning)
confidence: 99%