SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets

Yu, Qiang; Wei, Dingbang; Huo, Hongwei

doi:10.1186/s12859-018-2242-y

Cited by 17 publications

(7 citation statements)

References 37 publications

“…Based on the iterative optimization algorithm, the number of initial motifs and the amount of calculation needed to refine them increase with the data size. For example, SamSelect [26], MEME-ChIP [27], and MICSA [28] select a part of the sequences from the entire data set for motif discovery, which greatly reduces the running time but makes it difficult to identify motifs with a weak signal. PairMotifChIP [29] mines and merges similar substring pairs from the input sequences to get the motif.…”

Section: DNA Sequence Specificity Prediction (mentioning)

confidence: 99%

DNA Sequence Specificity Prediction Algorithm Based on Artificial Intelligence

Zhai

Tuerxun

2022

Mathematical Problems in Engineering

DNA sequence specificity refers to the ability of DNA sequences to bind specific proteins. These proteins play a central role in gene regulation, including transcription and alternative splicing. Obtaining DNA sequence specificity is very important for establishing the regulatory model of a biological system and for identifying pathogenic variants. Motifs are sequence patterns shared by fragments of DNA sequences that bind to specific proteins. Several motif mining algorithms have been proposed, and they perform well when the motif length is given. This research is based on deep learning: at the motif level, the paper constructs an AI-based method to predict the length of the motif. The experimental results show that the prediction accuracy on the test set is more than 90%.

“…From the literature perspective, most k-mer sizes in real data sets should be within 7 to 15, as mentioned in the SamSelect article [17]. In order to attain accuracy within this limit, we are supposed to add and subtract some positions.…”

Section: Fixing K-mers and Mutations (d) (mentioning)

confidence: 99%

Quorum Planted Motif Discovery and Motif Finding Using S2f and Fff Algorithms

Sivarajan

Reddy

2022

Preprint

A comprehensive understanding of transcription factor binding sites (TFBSs) is a key problem in contemporary biology and a critical issue in gene regulation. By identifying patterns of TFBSs in DNA sequences, motif discovery reveals basic regulatory relationships and sheds light on the evolutionary system of each species. Recognizing high-quality (ℓ, d) motifs, however, remains a challenge. We address motif discovery and motif finding using approximate qPMS algorithms, namely S2F (Segmentation to Filtration) and FFF (Firefly with FREEZE). To this end, whole DNA sequences are segmented into two sections; the first part is sliced by base and sub k-mers, and the motif is calculated based on accuracy. The motif recognized in the first portion is given as input to the FFF algorithm to identify the TFBS locations in the second portion. The algorithm performance is tested on both simulated and real datasets. In particular, real datasets such as Escherichia coli cyclic AMP receptor protein (CRP), mouse embryonic stem cell (mESC), and human ChIP-seq datasets are explored. Results from the experiments show that the S2F and FFF algorithms can identify the motifs and appear faster than previous state-of-the-art PMS and qPMS algorithms.

“…Some algorithms, such as the web version of MEME-ChIP [19] and MICSA [24], randomly select a portion of the entire dataset (e.g., 600 input sequences) to discover motifs, which may result in the loss of infrequent motifs. qPMS10 [25] and SamSelect [26] are specialized algorithms for selecting sample sequences. They select some sample sequences and then perform motif discovery on the selected sample sequences by running the existing qPMS algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

A New Efficient Algorithm for Quorum Planted Motif Search on Large DNA Datasets

Zhang

2019

IEEE Access

Self Cite

Quorum planted (l, d) motif search (qPMS) is a challenging computational problem in bioinformatics, mainly used for the identification of regulatory elements such as transcription factor binding sites in DNA sequences. Large DNA datasets play an important role in identifying high-quality (l, d) motifs, but most existing qPMS algorithms are too time-consuming to complete the calculation of qPMS on them in a reasonable time. We propose an approximate qPMS algorithm called APMS to deal with large DNA datasets, mainly by accelerating neighboring substring search and filtering redundant substrings. Experimental results on large DNA datasets show that APMS can not only identify the implanted (l, d) motifs, but also run orders of magnitude faster than the state-of-the-art qPMS algorithms. The source code of APMS and the Python wrapper for the code are freely available at https://github.com/qyu071/apms.

Index terms: quorum planted (l, d) motif search, large DNA datasets, transcription factor binding sites.
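For readers unfamiliar with the problem statement, the following is a minimal brute-force sketch of the standard qPMS definition: an l-mer is reported if it occurs, with at most d mismatches, in at least a fraction q of the input sequences. It is not APMS, SamSelect, or any algorithm from the papers listed here; the function names (`hamming`, `occurs_in`, `is_quorum_motif`, `brute_force_qpms`) and the toy data in the usage note are assumptions chosen for illustration, and the exhaustive search is only practical for very small l.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(motif, seq, d):
    """True if some l-mer of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(
        hamming(motif, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def is_quorum_motif(motif, sequences, d, q):
    """True if motif occurs (up to d mismatches) in at least q * t of t sequences."""
    hits = sum(occurs_in(motif, s, d) for s in sequences)
    return hits >= q * len(sequences)


def brute_force_qpms(sequences, l, d, q, alphabet="ACGT"):
    """Enumerate all |alphabet|**l candidates; illustrative only, exponential in l."""
    candidates = ("".join(c) for c in product(alphabet, repeat=l))
    return [m for m in candidates if is_quorum_motif(m, sequences, d, q)]
```

As a usage example on hypothetical toy data, `is_quorum_motif("CGT", ["AACGTT", "TTCGAA", "GGGGGG", "CCCCCC"], d=1, q=0.5)` holds because "CGT" appears exactly in the first sequence and as "CGA" (one mismatch) in the second, which meets the quorum of two out of four sequences. Sample-selection approaches such as those quoted above run a search of this kind on a chosen subset of the sequences rather than on the full dataset.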
