The small alpine district of East Tyrol (Austria) has an exceptional demographic history. For centuries it was inhabited concurrently by members of the Romance, the Slavic and the Germanic language groups. Since the Late Middle Ages, however, the population of this principally agrarian area has been solely Germanic-speaking. Historical records of East Tyrol's colonization are scarce, but spatial density-distribution analysis based on the etymology of place-names has facilitated accurate spatial mapping of the various language groups' former settlement regions. To test for present-day Y chromosome population substructure, molecular genetic data were compared with the information obtained from the linguistic analysis of pasture names. The linguistic data were used to subdivide East Tyrol into two regions of former Romance (A) and Slavic (B) settlement. Samples from 270 East Tyrolean men were genotyped for 17 Y-chromosomal microsatellites (Y-STRs) and 27 single nucleotide polymorphisms (Y-SNPs). Analysis of the probands' surnames revealed no evidence of spatial genetic structuring. Likewise, spatial autocorrelation analysis did not indicate a significant correlation between genetic (Y-STR haplotypes) and geographic distance. Haplogroup R-M17 chromosomes, however, were absent in region A but constituted one of the most frequent haplogroups in region B. The R-M343 (R1b) clade showed a marked and complementary frequency distribution pattern in these two regions. To further test East Tyrol's modern Y-chromosomal landscape for geographic patterning attributable to the early history of settlement in this alpine area, principal coordinates analysis was performed. The Y-STR haplotypes from region A clearly clustered with those of Romance reference populations, and the samples from region B matched best with Germanic-speaking reference populations.
The combined use of onomastic and molecular genetic data revealed and mapped the marked structuring of the distribution of Y chromosomes in an alpine region that has been culturally homogeneous for centuries.
This paper outlines the construction of the corpus Alpenwort, a large, genre-based corpus of German texts on alpinism. We report on issues related to building the corpus from the Austrian Alpine Club Journal (1869–2010). First, we give a general description of our data and of the project phases, from digitization and annotation to publication. We then focus on the most interesting challenges that the diverse layouts and the extensive use of Fraktur typeface posed for optical layout recognition, optical character recognition (OCR) and post-correction. The corrected data was lemmatized and annotated with part-of-speech information, including named entities, as well as TEI-conformant metadata. The resulting 19.9-million-word corpus is designed to be queried using CQPweb and Hyperbase and can be accessed freely online. Lastly, we give a short roadmap of current and future expansions and improvements, as the corpus data has been and is being enhanced in follow-up projects.