This paper describes the creation of the first machine translation system from Italian to Sardinian, a Romance language spoken on the island of Sardinia in the Mediterranean. The project was carried out by a team of translators and computational linguists. The article focuses on the technology used (Rule-Based Machine Translation), on some of the rules created, and on the orthographic model used for Sardinian.
This paper empirically studies the impact of socioeconomic status and type of settlement on Chuvash language knowledge in the Chuvash Republic, Russia. In addition to presenting the results of our survey of 2,848 schoolchildren, conducted from September 2012 to October 2013, this research uses logit regressions to test the effect of social class, family income, parental education, rural origin, ethnicity, parental language proficiency, population size and distance to the capital city (Shupashkar/Cheboksary) on Chuvash language knowledge. In contrast to most of the previous literature, we do not analyze the effect of migration on language; the surveyed schoolchildren were usually born in Chuvashia, Russia. Our findings suggest that socioeconomic status, embodied principally in schoolchildren’s rural origin, has a negative impact on Chuvash knowledge. Schoolchildren living and studying in bigger towns and cities, and near the capital, are less likely to have good knowledge of Chuvash. These results are robust to different indicators of the key explanatory variables and to different econometric methods.
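For readers unfamiliar with the method, a logit regression of the kind mentioned above can be written in its generic textbook form. The abstract does not give the exact specification, so the symbols here are assumptions: $y_i$ is a binary indicator of good Chuvash knowledge for schoolchild $i$, and $\mathbf{x}_i$ collects the explanatory variables listed (social class, family income, parental education, and so on).

```latex
\Pr(y_i = 1 \mid \mathbf{x}_i)
  = \frac{1}{1 + \exp\!\left(-(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i)\right)},
\qquad
\log\frac{\Pr(y_i = 1 \mid \mathbf{x}_i)}{1 - \Pr(y_i = 1 \mid \mathbf{x}_i)}
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_i
```

Under this form, a negative coefficient on a variable such as population size would mean that the log-odds of good Chuvash knowledge fall as that variable increases, which is the direction the abstract reports for larger settlements.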
This paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.
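The idea of translation as a pipeline of modular tools can be illustrated with a toy sketch. This is not Apertium code and does not use its data formats: the stage names, the tag strings and the three tiny dictionaries are all invented for illustration, and only the overall shape (analysis, then transfer, then generation, each stage consuming the previous stage's output) reflects the description above.

```python
from functools import reduce

# Toy data invented for this sketch; not taken from any Apertium language pair.
ANALYSES = {"gatos": ("gato", "n.pl"), "gato": ("gato", "n.sg")}
BIDIX = {"gato": "cat"}
GENERATION = {("cat", "n.sg"): "cat", ("cat", "n.pl"): "cats"}


def analyse(tokens):
    """Map surface forms to (lemma, tags); unknown words pass through."""
    return [ANALYSES.get(token, (token, "unknown")) for token in tokens]


def transfer(units):
    """Replace source lemmas with target lemmas, keeping the tags."""
    return [(BIDIX.get(lemma, lemma), tags) for lemma, tags in units]


def generate(units):
    """Map (lemma, tags) back to target surface forms."""
    return [GENERATION.get((lemma, tags), lemma) for lemma, tags in units]


# Each stage is an independent function; the pipeline is just their order.
PIPELINE = [str.split, analyse, transfer, generate, " ".join]


def translate(text, stages=PIPELINE):
    """Feed the text through every stage in turn."""
    return reduce(lambda data, stage: stage(data), stages, text)


print(translate("gatos"))
print(translate("gato negro"))
```

In this sketch an optional module would simply be one more function inserted into the `stages` list at the appropriate point, which is the property that makes a modular pipeline easy to extend.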
Submission received on 15 October 2017 and accepted for publication on 3 December 2017.
Una eina per a una llengua en procés d'estandardització: el traductor automàtic català-sard
Machine translation from Catalan to Sardinian: a translation tool for a language in the process of standardisation
Abstract: This article describes the development of a free/open-source rule-based machine translation system from Catalan to Sardinian built on the Apertium platform. Special attention is given to the creation of the bilingual dictionary and to the rules for lexical selection and structural transfer, as well as to issues stemming from the current state of the Sardinian written norm. The system has a word error rate (WER) of 20.5% and a position-independent word error rate (PER) of 13.9%. A qualitative analysis of the translation of four encyclopaedic articles is used to examine the causes of the remaining errors.
Keywords: Sardinian, Catalan, machine translation, language standardisation, Apertium, RBMT
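The two metrics reported for the Catalan-Sardinian system can be illustrated with a minimal sketch. It assumes one common formulation: WER as the word-level edit distance divided by the reference length, and PER as the same idea computed on bags of words, so that word order is ignored. The abstract does not say which tool or exact variant the authors used, and the example sentences are invented.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: compares bags of words only."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


reference = "the cat sat on the mat"
reordered = "the cat on the mat sat"
print(wer(reference, reordered), per(reference, reordered))
```

Because PER ignores position, a translation that contains the right words in the wrong order is penalised by WER but not by PER. That is consistent with the PER reported above being lower than the WER.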