Article: Lones, Michael A. and Tyrrell, Andy M. orcid.org/0000-0002-8533-2404 (2007) Regulatory motif discovery using a population clustering evolutionary algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics. pp. 403-414. ISSN 1545-5963 https://doi.org/10.1109/tcbb.2007.1044
Abstract-This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.

Index Terms-Evolutionary computation, population-based data clustering, motif discovery, transcription factor binding sites, muscle-specific gene expression.
1 INTRODUCTION

A motif, in the context of biological sequence analysis, is a pattern of nucleotide bases or amino acids which captures a biologically meaningful feature common to a group of nucleic acid or protein sequences. Examples of motifs include protein domains and binding sites within amino acid sequences, and regulatory, splicing and localization signals within DNA and RNA sequences. Motif discovery is the process of identifying motifs within biological sequences.

In this paper, we focus upon the problem of identifying regulatory motifs within the promoter sequences of coexpressed genes. The identification of regulatory motifs is an important problem in contemporary biology since it underlies efforts to understand and reconstruct the regulatory networks that are central to the functioning of biological organisms. However, it is also a particularly hard problem, made difficult by a low signal-to-noise ratio resulting from the poor conservation and short length of...