This paper uses a novel data-driven probabilistic approach to address the century-old Inner-Outer hypothesis of Indo-Aryan. I develop a Bayesian hierarchical mixed-membership model to assess the validity of this hypothesis using a large data set of automatically extracted sound changes operating between Old Indo-Aryan and Modern Indo-Aryan speech varieties. I compare different prior distributions for modeling sound change. One of these, the logistic normal distribution, has received little attention in linguistics outside of Natural Language Processing, despite its many attractive features. I find evidence for cohesive dialect groups that have left their imprint on contemporary Indo-Aryan languages. I also find that when a logistic normal prior is used, the distribution of dialect components across languages is largely compatible with a core-periphery pattern similar to that proposed under the Inner-Outer hypothesis.

like the ones mentioned above (cf. Bloomfield 1933:360). However, it is not always clear how to proceed in the face of substantial irregularity and uncertainty. Additionally, it can be difficult to distinguish between shared genetic innovations and parallel developments, though quantitative methods have helped assess whether multiple features are shared across languages more often than would be expected by chance.

This paper seeks to enhance the comparative method with probabilistic tools in order to address an unresolved hypothesis regarding the history of the Indo-Aryan languages. I employ a data-driven Bayesian methodology to uncover shared dialectal patterns across a subset of these languages. In particular, I attempt to operationalize an old hypothesis (the so-called Inner-Outer hypothesis) that two large dialect groups existed, and that, historically, communication within each group was greater than communication between them.
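One attractive feature of the logistic normal, compared with the more familiar Dirichlet, is that it can encode correlations among the outcomes of a categorical distribution. The sketch below illustrates this under stated assumptions: it treats a distribution over three hypothetical reflexes of a single OIA sound, uses a made-up covariance matrix in which outcomes 0 and 1 tend to rise and fall together, and compares the resulting correlation with that of a flat Dirichlet. The helper `sample_logistic_normal` and all parameter values are illustrative, not taken from the paper's model.

```python
import numpy as np

def sample_logistic_normal(mu, cov, n, rng):
    """Draw n probability vectors by applying softmax to Gaussian draws."""
    eta = rng.multivariate_normal(mu, cov, size=n)   # shape (n, K), unconstrained
    eta = eta - eta.max(axis=1, keepdims=True)        # shift for numerical stability
    weights = np.exp(eta)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
K = 3  # e.g. three hypothetical reflexes of a single OIA sound

# Illustrative (assumed) covariance: outcomes 0 and 1 tend to rise and fall together
mu = np.zeros(K)
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

ln = sample_logistic_normal(mu, cov, 5000, rng)
dir_ = rng.dirichlet(np.ones(K), size=5000)  # flat Dirichlet for comparison

# Correlation between the probabilities of outcomes 0 and 1 under each prior
ln_corr = np.corrcoef(ln[:, 0], ln[:, 1])[0, 1]
dir_corr = np.corrcoef(dir_[:, 0], dir_[:, 1])[0, 1]
print(f"logistic normal: {ln_corr:.2f}, Dirichlet: {dir_corr:.2f}")
```

Under any Dirichlet, the components of a draw are negatively correlated (for a flat Dirichlet with K = 3 the theoretical value is -0.5), whereas the logistic normal lets the covariance matrix decide, so related outcomes can be made to co-vary positively.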
The admixture (alternatively, mixed-membership) model used in this paper reduces the dimensionality of a large set of linguistic features (sound changes operating between Old Indo-Aryan [OIA, i.e., Sanskrit] and modern Indo-Aryan languages) to two dimensions corresponding to two latent dialectal components. This makes it possible to assess whether individual languages draw most of their features from one of the two groups or receive features roughly equally from both. Additionally, I evaluate the degree to which the Inner-Outer hypothesis is recapitulated in the language-level distribution of component membership. I find evidence for two dialect groups of considerable cohesion and integrity, and partial evidence for the Inner-Outer hypothesis in the language-level distribution of dialect components. The paper concludes with a discussion of the implications of this model and of future research directions.
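To make the idea of mixed membership concrete, the following is a minimal generative sketch, not the author's specification. It assumes that each language has mixing proportions over two latent components (drawn here from a logistic normal), that each component has its own distribution over reflexes for every input sound, and that each observed token first picks a component and then a reflex. The toy sizes, the Dirichlet prior on the component-level distributions, the function `generate_language`, and the `dominance` summary are all hypothetical, and posterior inference is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy sizes, not the paper's data
n_langs, n_sounds, n_reflexes, K = 6, 4, 3, 2

# Language-level mixing proportions over the K = 2 latent dialect components,
# drawn from a logistic normal (softmax of independent Gaussian draws)
eta = rng.normal(0.0, 2.0, size=(n_langs, K))
theta = np.exp(eta - eta.max(axis=1, keepdims=True))
theta /= theta.sum(axis=1, keepdims=True)

# Component-specific sound-change distributions: for each component and each
# input sound, a distribution over possible reflexes (assumed Dirichlet prior)
phi = rng.dirichlet(np.ones(n_reflexes), size=(K, n_sounds))  # (K, n_sounds, n_reflexes)

def generate_language(l, tokens_per_sound=20):
    """Sample (sound, component, reflex) triples for language l: each token
    picks a component z ~ theta[l], then a reflex from phi[z, sound]."""
    data = []
    for s in range(n_sounds):
        for _ in range(tokens_per_sound):
            z = rng.choice(K, p=theta[l])
            r = rng.choice(n_reflexes, p=phi[z, s])
            data.append((s, z, r))
    return data

corpus = [generate_language(l) for l in range(n_langs)]

# How strongly each language leans toward one component:
# 0 = features drawn equally from both, 1 = features drawn from only one
dominance = 2 * np.abs(theta[:, 0] - 0.5)
print(np.round(theta, 2))
print(np.round(dominance, 2))
```

In a fitted model the component labels `z` are latent and `theta` is inferred from the observed reflexes; a summary like `dominance` is one simple way to ask whether a language draws mostly from one group or from both.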
2.