Can language relatedness be established without cognate words? This question has remained unresolved since the nineteenth century, leaving the prehistory of languages beyond etymologically established families largely undefined. We address this problem through a theory of universal syntactic characters. We show not only that syntax allows comparison across distinct traditional language families, but also that the probability of deeper historical relatedness between such families can be statistically tested through a dedicated algorithm that implements the concept of ‘possible languages’ suggested by a formal syntactic theory. Controversial clusters such as Altaic and Uralo-Altaic are significantly supported by our test, while other possible macro-groupings, e.g. Indo-Uralic or Basque-(Northeast) Caucasian, prove indistinguishable from a randomly generated distribution of language distances. These results suggest that syntactic diversity, modelled through a generative biolinguistic framework, can provide proof of historical relationship between different families irrespective of the presence of a common lexicon from which regular sound correspondences can be determined; we therefore argue that syntax may extend the time limits imposed by the classical comparative method.
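The general shape of such a test can be illustrated with a minimal Python sketch of a Monte Carlo significance test. It is not the authors' algorithm. The toy parameter set, its implicational conditions, the ‘+’/‘-’/‘0’ encoding, the distance measure (share of mismatches among parameters set in both languages) and the helper names `random_language`, `distance` and `monte_carlo_p` are all hypothetical assumptions chosen for illustration. Random ‘possible languages’ are sampled so that a parameter is set only when its condition holds and is otherwise neutralized to ‘0’. The p-value is then the share of random pairs that are at least as close as the observed pair.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical toy grammar: each parameter is (name, condition), where the
# condition is None (always settable) or (index_of_earlier_parameter, value).
Param = Tuple[str, Optional[Tuple[int, str]]]

PARAMS: List[Param] = [
    ("P0", None),
    ("P1", None),
    ("P2", (0, "+")),  # P2 is settable only if P0 is '+'
    ("P3", (1, "-")),  # P3 is settable only if P1 is '-'
    ("P4", None),
    ("P5", (4, "+")),  # P5 is settable only if P4 is '+'
]


def random_language(params: List[Param], rng: random.Random) -> List[str]:
    """Sample one 'possible language' consistent with the implications:
    settable parameters get '+' or '-' uniformly, others are neutralized to '0'."""
    values: List[str] = []
    for _name, cond in params:
        if cond is None or values[cond[0]] == cond[1]:
            values.append(rng.choice("+-"))
        else:
            values.append("0")
    return values


def distance(a: List[str], b: List[str]) -> float:
    """Share of differing values among parameters set (non-'0') in both languages."""
    comparable = [(x, y) for x, y in zip(a, b) if x != "0" and y != "0"]
    if not comparable:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in comparable) / len(comparable)


def monte_carlo_p(observed: float, params: List[Param],
                  n_pairs: int = 10_000, seed: int = 0) -> float:
    """Estimate how often two random possible languages are at least as
    close as the observed pair (with a +1 correction to avoid p = 0)."""
    rng = random.Random(seed)
    hits = sum(
        distance(random_language(params, rng), random_language(params, rng)) <= observed
        for _ in range(n_pairs)
    )
    return (hits + 1) / (n_pairs + 1)


# Hypothetical parameter settings for two languages from different families.
LANG_A = ["+", "-", "+", "+", "+", "-"]
LANG_B = ["+", "-", "-", "+", "+", "-"]

if __name__ == "__main__":
    d = distance(LANG_A, LANG_B)
    print(f"distance = {d:.3f}, p = {monte_carlo_p(d, PARAMS):.4f}")
```

A small p-value means the observed pair is closer than nearly all randomly generated pairs, whereas a value near 1 means its distance cannot be told apart from chance. Sampling only from languages that respect the implications, rather than from arbitrary strings of values, is what makes the random baseline a distribution over *possible* languages in this toy setting.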
This article is part of the theme issue ‘Reconstructing prehistoric languages’.