While efforts to document endangered languages have steadily increased, the phonetic analysis of endangered language data remains a challenge. The transcription of large documentation corpora is, by itself, a tremendous feat. Yet, the process of segmentation remains a bottleneck for research with data of this kind. This paper examines whether a speech processing tool, forced alignment, can facilitate the segmentation task for small data sets, even when the target language differs from the training language. The authors also examined whether a phone set with contextualization outperforms a more general one. The accuracy of two forced aligners trained on English (HMALIGN and P2FA) was assessed using corpus data from Yoloxóchitl Mixtec. Overall, agreement performance was relatively good, with accuracy at 70.9% within 30 ms for HMALIGN and 65.7% within 30 ms for P2FA. Segmental and tonal categories influenced accuracy as well. For instance, additional stop allophones in HMALIGN's phone set aided alignment accuracy. Agreement differences between aligners also corresponded closely with the types of data on which the aligners were trained. Overall, using existing alignment systems was found to have potential for making phonetic analysis of small corpora more efficient, with more allophonic phone sets providing better agreement than general ones.
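To make the "within 30 ms" agreement figures concrete, the sketch below shows one common way such a score can be computed: the share of automatically placed boundaries that fall within a tolerance of the corresponding hand-placed boundary. The abstract does not describe the exact scoring procedure, so the boundary-by-boundary pairing, the function name `agreement_within`, and the timestamps are all illustrative assumptions, not the authors' method or data.

```python
def agreement_within(manual_ms, automatic_ms, tolerance_ms=30.0):
    """Return the fraction of boundaries where the automatic time lies
    within tolerance_ms of the manual time.

    Assumes both lists hold the same boundaries in the same order
    (a hypothetical pairing; real corpora may need boundary matching first).
    """
    if len(manual_ms) != len(automatic_ms):
        raise ValueError("boundary lists must have the same length")
    if not manual_ms:
        raise ValueError("at least one boundary is required")
    hits = sum(
        1 for m, a in zip(manual_ms, automatic_ms)
        if abs(m - a) <= tolerance_ms
    )
    return hits / len(manual_ms)


# Made-up boundary times in milliseconds, for illustration only.
manual_ms = [120.0, 245.0, 410.0, 560.0]
aligner_ms = [131.0, 290.0, 402.0, 585.0]

# Absolute differences are 11, 45, 8 and 25 ms, so 3 of 4 are within 30 ms.
print(f"{agreement_within(manual_ms, aligner_ms):.1%}")  # prints 75.0%
```

Tightening the tolerance lowers the score on the same data, which is why agreement figures are only comparable when reported at the same threshold.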
"Transcription bottlenecks", created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which, unlike Hidden Markov Model ASR systems, eschews linguistic resources but is instead more dependent on large-data settings. We open source a Yoloxóchitl Mixtec EL corpus. First, we review our method in building an end-to-end ASR system in a way that would be reproducible by the ASR community. We then propose a novice transcription correction task and demonstrate how ASR systems and novice transcribers can work together to improve EL documentation. We believe this combinatory methodology would mitigate the transcription bottleneck and transcriber shortage that hinder EL documentation.
While Mixtec languages are well-known for their tonal systems, there remains relatively little work focusing on their consonant inventories. This paper provides an in-depth phonetic description of the consonant system of the Yoloxóchitl Mixtec language (Oto-Manguean, ISO 639-3 code xty), a Guerrero Mixtec variety. The language possesses a number of contrasts common among Mixtec languages, such as voiceless unaspirated stops, prenasalized stops, and a strong tendency for words to conform to a minimally bimoraic structure. Using a controlled set of data, we focus on how WORD SIZE and WORD POSITION influence the acoustic properties of different consonant types. We examine closure duration, VOT, and formant transitions with the stop series, spectral moments with the fricative series, the timing between oral and nasal closure with the prenasalized stop series, and both formant transitions and qualitative variability with the glide series. The general effect of WORD SIZE is discussed in relation to work on POLYSYLLABIC SHORTENING