are generally too slow for iteratively searching through large sequence databases such as UniProt or NCBI's nonredundant (nr) database. Here we present HMM-HMM-based lightning-fast iterative sequence search (HHblits), which extends HHsearch to enable fast, iterative sequence searches. The profile-profile alignment prefilter of HHblits reduces the number of full HMM-HMM alignments from many millions to a few thousand, making it faster than PSI-BLAST but still as sensitive as HHsearch (Supplementary Fig. 1).

For iterative searches, HHblits needs a database of HMMs that covers the entire sequence space. We devised a very fast method, kClust (M. Hauser, C.E. Mayer and J.S., unpublished data), for clustering large sequence databases down to 20-30% maximum pairwise sequence identity while requiring almost full-length alignability (>80% coverage of the longer sequence). This strict coverage criterion enriches for orthologous sequences with the same domain architecture 7: of the UniProt20 clusters containing more than two Swiss-Prot sequences with enzyme commission numbers, 98.4% had all four enzyme commission digits conserved (Supplementary Fig. 2). kClust is sufficiently fast (~1,000 times faster than BLAST) to allow for regular reclustering of the updated UniProt and nr databases. UniProt20 (the version from July 2011) contained 15 million sequences in 2.6 million HMMs, with an average of 5.5 sequences per cluster.

HHblits first converts the query sequence (or MSA) to an HMM. This is conventionally done by adding pseudocounts of amino acids that are physicochemically similar to the amino acid in the query. In contrast, HHblits calculates pseudocounts that depend on the local sequence context (that is, the 13 positions around each residue). This method considerably improved the sensitivity and alignment quality of the resulting profile 8.
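
To make the conventional, context-independent scheme concrete, the following is a minimal sketch of substitution-based pseudocounts. It is not the HHblits implementation: the four-letter toy alphabet, the conditional probability matrix `COND`, the admixture weight `tau` and the function names are all hypothetical, chosen only for illustration. A real profile would use all 20 amino acids and conditional probabilities derived from a substitution matrix. The context-specific method described above would instead condition the pseudocounts on the window of 13 positions around each residue, which this sketch does not attempt.

```python
import numpy as np

# Hypothetical toy alphabet; a real profile column has 20 amino acids.
ALPHABET = "AVDE"

# Hypothetical conditional probabilities COND[b, a] = P(a | b): the chance of
# finding residue a at a position where residue b was observed.
# Each row sums to 1. Physicochemically similar residues (A/V hydrophobic,
# D/E acidic) are given higher exchange probabilities.
COND = np.array([
    [0.70, 0.20, 0.05, 0.05],  # given A
    [0.20, 0.70, 0.05, 0.05],  # given V
    [0.05, 0.05, 0.70, 0.20],  # given D
    [0.05, 0.05, 0.20, 0.70],  # given E
])


def one_hot(seq: str) -> np.ndarray:
    """Encode a single sequence as observed frequencies, shape (L, K)."""
    idx = [ALPHABET.index(c) for c in seq]
    return np.eye(len(ALPHABET))[idx]


def add_pseudocounts(freqs: np.ndarray, tau: float = 0.3) -> np.ndarray:
    """Mix observed column frequencies with substitution-based pseudocounts.

    freqs: shape (L, K); each row is an observed frequency distribution.
    tau:   assumed admixture weight in [0, 1] (0 = no pseudocounts).
    """
    # Pseudocount distribution per column: g_i(a) = sum_b f_i(b) * P(a | b)
    pseudo = freqs @ COND
    # Convex combination keeps every row a valid probability distribution.
    return (1.0 - tau) * freqs + tau * pseudo


profile = add_pseudocounts(one_hot("AD"), tau=0.3)
```

In this toy setup, a query position holding `A` keeps most of its probability mass on `A`, while the similar residue `V` receives more pseudocount mass than the dissimilar `D` or `E`.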
HHblits then searches the HMM database and adds the sequences from HMMs with an expected value (E value) below a defined threshold to the query MSA, from which the HMM for the next search iteration is built (Fig. 1a and Supplementary Fig. 3). The prefilter is crucial for both speed and sensitivity. The key idea was to implement profile-profile comparison as a sequence-to-profile comparison by discretizing the vectors of 20 amino acid probabilities in each HMM column into an alphabet of 219 letters. Each letter represents a typical profile column (Supplementary Fig. 4). We approximate the database HMMs by sequences over this extended alphabet, ignoring the insertion and deletion probabilities of the HMMs (Supplementary Fig. 5). Before prefiltering, we calculate the score of each query HMM column against each of the 219 letters, which results in a 219-row extended sequence profile. The prefiltering consists of two steps (Supplementary Fig. 3).

Building protein multiple-sequence alignments (MSAs) by iterative sequence searches is of fundamental importance in computational biology, as MSAs are a key intermediate step in the sequence-based prediction of evolutionarily conserved properties, such as tert...