Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference and reconstruction of species trees under horizontal gene transfer and recombination events. The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.
BACKGROUND
Comparative analysis of DNA and amino acid sequences is of fundamental importance in biological research, particularly in molecular biology and genomics. It is the first and key step in molecular evolutionary analysis, gene function and regulatory region prediction, sequence assembly, homology searching, molecular structure prediction, gene discovery and protein structure-function relationship analysis. Traditionally, sequence comparison was based on pairwise or multiple sequence alignment (MSA). Software tools for sequence alignment, such as BLAST [1] and CLUSTAL [2], are the most widely used bioinformatics methods.
Although alignment-based approaches generally remain the references for sequence