Inferring evolutionary relationships among highly divergent protein sequences is a daunting task. In particular, when pairwise sequence alignments between protein sequences fall below 25% identity, the phylogenetic relationships among sequences cannot be estimated with statistical certainty. Here, we show that phylogenetic profiles generated with the Gestalt Domain Detection Algorithm-Basic Local Alignment Tool (GDDA-BLAST) are capable of deriving, ab initio, phylogenetic relationships for highly divergent proteins in a quantifiable and robust manner. Notably, the results from our computational case study of the highly divergent family of retroelements accord with previous estimates of their evolutionary relationships. Taken together, these data demonstrate that GDDA-BLAST provides an independent and powerful measure of evolutionary relationships that does not rely on potentially subjective sequence alignment. We demonstrate that evolutionary relationships can be measured with phylogenetic profiles, and therefore propose that these measurements can provide key insights into relationships among distantly related and/or rapidly evolving proteins.

ab initio | retroelements | reverse transcriptase | GDDA-BLAST

The "protein problem" has remained unsolved despite decades of research (1, 2). In principle, one expects that the primary amino acid sequence of a protein determines its structure, function, and evolutionary (SF&E) characteristics. Yet, there is still no reliable method for predicting the native state structure of a protein and its function given only its sequence. In addition, inferring the evolutionary relationships among highly divergent and/or rapidly evolving protein sequences is a daunting task. In general, when pairwise sequence alignments between protein sequences fall below ≈25% identity (i.e., the "twilight zone"), the assignment of positional homology is so difficult that it becomes impossible to safely estimate phylogenetic relationships (1, 3, 4).
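To make the identity threshold above concrete, the following minimal sketch computes percent identity for a pair of sequences that have already been aligned and flags pairs falling in the twilight zone. It is an illustration only and not part of GDDA-BLAST. The function names, the gap character, and the choice of denominator (columns in which both sequences carry a residue) are assumptions; other definitions of percent identity use a different denominator and give different values.

```python
# Illustrative sketch only; names and the identity definition are assumptions.

TWILIGHT_THRESHOLD = 25.0  # approximate cutoff (percent identity) quoted in the text


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator (assumption)
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_twilight_zone(aligned_a: str, aligned_b: str,
                     threshold: float = TWILIGHT_THRESHOLD) -> bool:
    """True when the pair falls below the identity threshold."""
    return percent_identity(aligned_a, aligned_b) < threshold
```

For example, `in_twilight_zone("ACDEFGHK", "AWWWWWWW")` evaluates one identical column out of eight (12.5% identity) and therefore reports the pair as lying in the twilight zone.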
However, a small number of conserved residues (≈8% identity) can coordinate the 3-D fold and/or function of proteins (5-7). Conversely, two proteins that share 88% identity can still retain independent structure and function (8).

The aforementioned studies indicate that how to quantitatively measure data spaces in the protein world (i.e., the sequence, structure, and functional space that proteins occupy) is a fundamental question facing evolutionary and computational biologists, and it raises further questions. Is there any equation that quantitatively connects these protein spaces to protein evolution? Which residues within amino acid sequences best reflect the evolutionary history of a given protein? Do proteins with similar sequence and structure necessarily share a common ancestor? Furthermore, if sequence and structure similarity suggest an evolutionary history, can weak similarities be strengthened by functional connections? All of these questions are essentially connected to the protein data space; however, to date they have not been clearly solved either ...