insofar as their discovery may lead to a deeper understanding of evolutionary processes in living organisms.

Since Stormo [37] reviewed strategies for finding motifs with computer algorithms, a large number of algorithms have been developed. These algorithms are commonly classified according to the type of DNA sequence data used to find motifs. Although there is no universal consensus on how to divide algorithms based on the input data, three major groups are generally considered [7]:
1) Algorithms that use information from coregulated genes in a single genome.
2) Algorithms that use information from a single gene across multiple species.
3) Algorithms that use information from phylogenetic footprints.

Despite the large number of works grouped in classes 2) (see [31], [38], [30]) and 3) (see [1], [12], [46]), this paper focuses on providing a general overview of current algorithms that exploit the information provided by the promoter sequences of coregulated genes. These methods can themselves be subdivided into multiple strategies, but this work examines only those based on dictionaries, ensembles, and artificial intelligence techniques, as they represent the classical and the leading ones, respectively.

The rest of the paper is structured as follows. Section II discusses the most relevant recently published works on dictionary-based algorithms. Section III presents ensemble algorithms used to find motifs in DNA. Section IV presents the latest AI works in the field of DNA motif discovery. Finally, Section V provides a brief summary of the strengths and weaknesses of the reviewed strategies.

II. DICTIONARY-BASED ALGORITHMS

These are enumerative algorithms which, in contrast to heuristic methods, exhaustively cover the space of all possible motifs for a specific motif model description. The methodology progressively considers over-represented words, from short to long.
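As an illustration of the enumerative idea, the following minimal Python sketch lists every word of a fixed length k and ranks it by the ratio of observed to expected occurrences. It is a simplified stand-in, not the method of any reviewed work: the function name `overrepresented_words`, the independent-base background model estimated from the input itself, and the observed/expected score are all assumptions made for this example, and the progressive short-to-long dictionary construction is not implemented.

```python
from collections import Counter
from itertools import product
from math import prod


def overrepresented_words(sequences, k, top=5):
    """Rank all 4**k DNA words by observed/expected count (hypothetical helper).

    Assumes uppercase sequences over the alphabet A, C, G, T.
    """
    # Background model (assumption): independent bases whose frequencies
    # are estimated from the input sequences themselves.
    base_counts = Counter("".join(sequences))
    total = sum(base_counts[b] for b in "ACGT")
    freq = {b: base_counts[b] / total for b in "ACGT"}

    # Observed counts over all overlapping windows of length k.
    observed = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    positions = sum(max(len(seq) - k + 1, 0) for seq in sequences)

    scores = {}
    # Exhaustive enumeration: every possible word is scored, seen or not.
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        expected = positions * prod(freq[b] for b in word)
        if expected > 0:
            scores[word] = observed[word] / expected

    # Highest observed/expected ratio first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Toy promoter fragments sharing the planted word TTGACA (made-up data).
    promoters = ["ACGTTGACAGCT", "GGTTGACACCAT", "CATTGACATGCA"]
    for word, score in overrepresented_words(promoters, k=6):
        print(word, round(score, 1))
```

Because all 4^k words are scored, the cost grows exponentially with k, which is why enumerative approaches are practical mainly for short words.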
The over-representation of a long word is computed as the weighted average of the short words in the current dictionary that could be part of the long word. Although this methodology is, in essence,
Abstract-Many approaches are currently devoted to finding DNA motifs in nucleotide sequences. However, this task remains challenging for specialists because of the difficulty of fully understanding gene regulatory mechanisms, especially when analyzing binding sites in DNA. These sites, which are specific nucleotide sequences, are known to be responsible for transcription processes. Thus, this work aims to provide an updated overview of the strategies developed to discover meaningful motifs in DNA-related sequences and, in particular, of their attempts to identify relevant binding sites. Among existing approaches, this work focuses on dictionary-based, ensemble, and artificial intelligence-based algorithms, since they represent the classical and the leading ones, respectively.