This paper presents aspects of a computational model of the morphology of Plains Cree based on the technology of finite state transducers (FSTs). The paper focuses in particular on the modeling of nominal morphology. Plains Cree is a polysynthetic language whose nominal morphology relies on prefixes, suffixes and circumfixes. The model of Plains Cree morphology is capable of handling these complex affixation patterns and the morphophonological alternations that they engender. Plains Cree is an endangered Algonquian language spoken in numerous communities across Canada. The language has no agreed-upon standard orthography and exhibits widespread variation. We describe problems encountered and solutions found, while contextualizing the endeavor in the description, documentation and revitalization of First Nations languages in Canada.
Language communities and linguists conducting fieldwork often confront a lack of linguistic resources. This dearth can be substantially mitigated with the production of simple technologies. We illustrate the utility and design of a finite state parser, a widespread technology, for the Odawa dialect of Ojibwe (Algonquian, United States and Canada).
Credits: We would like to thank Rand Valentine, Mary Ann Corbiere, Alan Corbiere, Lena Antonsen, Miikka Silfverberg, Ryan Johnson, Katie Schmirler, Sarah Giesbrecht, and Atticus Harrigan for fruitful discussions during the development of this tool. We would also like to thank two anonymous reviewers for their helpful comments.
Communities of lesser-resourced languages like North Sámi benefit from language tools such as spell checkers and grammar checkers to improve literacy. Accurate error feedback is dependent on well-tokenised input, but traditional tokenisation as shallow preprocessing is inadequate to solve the challenges of real-world language usage. We present an alternative where tokenisation remains ambiguous until we have linguistic context information available. This lets us accurately detect sentence boundaries, multiwords and compound errors. We describe a North Sámi grammar checker with such a tokenisation system, and show the results of its evaluation.