2018
DOI: 10.1186/s12859-018-2242-y

SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets

Abstract: Background: Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-s…
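The qPMS definition in the abstract can be made concrete with a small brute-force sketch. This is not the SamSelect algorithm and not any published qPMS solver; it simply enumerates every candidate l-mer over the DNA alphabet and checks the quorum condition directly, so it is practical only for very small l. The function names (`hamming`, `occurs_in`, `qpms_bruteforce`) and the floating-point tolerance are assumptions made for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(motif: str, seq: str, d: int) -> bool:
    """True if some l-length window of seq is within d mismatches of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def qpms_bruteforce(seqs: list[str], l: int, d: int, q: float) -> list[str]:
    """Return all l-mers occurring, with up to d mismatches, in at least
    q * t of the t input sequences (illustrative brute force, O(4^l))."""
    t = len(seqs)
    motifs = []
    for letters in product("ACGT", repeat=l):
        motif = "".join(letters)
        count = sum(occurs_in(motif, s, d) for s in seqs)
        # Small tolerance guards against floating-point error in q * t.
        if count + 1e-9 >= q * t:
            motifs.append(motif)
    return motifs


if __name__ == "__main__":
    toy = ["ACGTAC", "TTACGA", "GGGGGG", "CACGTT"]
    # With l = 3, d = 0, q = 0.75 the motif must appear exactly
    # in at least 3 of the 4 toy sequences.
    print(qpms_bruteforce(toy, l=3, d=0, q=0.75))
```

Real qPMS algorithms avoid this exhaustive enumeration; the sketch is only meant to show what "occurs in at least qt input sequences with up to d mismatches" means operationally.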

Cited by 17 publications (7 citation statements)
References 37 publications
“…Based on the iterative optimization algorithm, the number of initial motifs and the amount of calculation of refinement of initial motifs increase with the increase of data size. For example, samselect [26], meme chip [27], and micsa [28], select a part of the sequence from the entire data set for motif discovery, which will greatly reduce the running time, but it is difficult to identify the motif with a weak signal. Pairmotifchip [29] mining and merging similar substring pairs from the input sequence to get the motif.…”
Section: DNA Sequence Specificity Prediction (mentioning)
confidence: 99%
“…From the literature perspective, most of the k-mers size should be within 7 to 15 of the real data set as mentioned in the article, samselect [17]. In order to attain accuracy within this limit, we are supposed to add and subtract some positions.…”
Section: Fixing K-mers and Mutations(d) (mentioning)
confidence: 99%
“…Some algorithms, such as the web version of MEME-ChIP [19] and MICSA [24], randomly select a portion of the entire dataset (e.g., 600 input sequences) to discover motifs, which may result in the loss of infrequent motifs. qPMS10 [25] and SamSelect [26] are specialized algorithms for selecting sample sequences. They select some sample sequences and then perform motif discovery on the selected sample sequences by running the existing qPMS algorithms.…”
Section: Introduction (mentioning)
confidence: 99%
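
The random-subset baseline described in the last citation statement (selecting a portion of the dataset, e.g., 600 input sequences, before motif discovery) can be sketched as follows. This illustrates only that random baseline, not the selection strategy of SamSelect or qPMS10, which the statements above do not describe. The function name `random_sample_sequences` and its `seed` parameter are hypothetical.

```python
import random


def random_sample_sequences(seqs, k=600, seed=None):
    """Pick k input sequences uniformly at random, without replacement.

    Hypothetical helper: k = 600 mirrors the example figure quoted above.
    If the dataset has k or fewer sequences, all of them are returned.
    """
    seqs = list(seqs)
    if k >= len(seqs):
        return seqs
    rng = random.Random(seed)  # local generator so the seed is reproducible
    return rng.sample(seqs, k)


if __name__ == "__main__":
    dataset = [f"seq{i}" for i in range(5000)]  # placeholder sequence IDs
    sample = random_sample_sequences(dataset, k=600, seed=42)
    print(len(sample))
```

A motif present in only a small fraction of the full dataset may be underrepresented in such a sample, which is the loss of infrequent motifs that the citing authors point out.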