This research begins by distinguishing a small number of "central" languages from the "noncentral" languages, where centrality is measured by the extent to which a given language is supported by natural language processing tools and research. We analyse the conditions under which noncentral language projects (NCLPs) and central language projects are conducted. We establish a number of important differences which have far-reaching consequences for NCLPs. In order to overcome the difficulties inherent in NCLPs, traditional research strategies have to be reconsidered. Successful styles of scientific cooperation, such as those found in open-source software development or in the development of Wikipedia, provide alternative views of how NCLPs might be designed. We elaborate the concepts of free software and software pools and argue that NCLPs, in their own interest, should embrace an open-source approach for the resources they develop and pool these resources together with other similar open-source resources. The expected advantages of this approach are so important that we suggest that funding organizations make it a sine qua non condition in project contracts.
In this study we illustrate how data sets, defined and set up independently in digital archive projects, can be linked to mutually enrich each other. The data linked are the digital tombstone archive ThakBong and the visionary use of the 1956 census by Chen and Fried, published in 1968 as 'The Distribution of Family Names in Taiwan'. We explain the assumptions under which the dimensions of the two data sets can be mapped, so that values missing in one set can be completed from the second, or estimated values can be replaced by more reliable ones. Conflicting values in the two sets trigger hypotheses concerning the validity of the data sets.

Introduction

Visionary research allows for applications that few people had in mind at the time the research was conducted. When Chen and Fried worked through millions of ten-year-old census sheets in 1964 and produced a book whose 969 pages are filled with nothing but numbers, who but the researchers themselves would not have considered it a waste of time and money? Fifty years later, using a relational database and an intelligent
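Although the study does not publish code, the linking idea can be sketched in a few lines. The Python sketch below is purely illustrative: the column names, values, and the choice of pandas are all assumptions, not the project's actual data model. It shows how two independently defined data sets might be mapped onto shared dimensions (here, family name and region) so that one set can complete or cross-check the other.

```python
# Illustrative sketch only: linking a tombstone-archive extract to
# census-derived counts on shared dimensions. All names and values are invented.
import pandas as pd

# Hypothetical extract from a tombstone archive: one row per documented grave.
tombstones = pd.DataFrame({
    "family_name": ["Chen", "Lin", "Huang", "Chen"],
    "region":      ["Kaohsiung", "Tainan", "Kaohsiung", "Tainan"],
    "birth_year":  [1912, None, 1898, 1925],   # None = value missing in the archive
})

# Hypothetical census-derived counts per family name and region.
census = pd.DataFrame({
    "family_name": ["Chen", "Lin", "Huang"],
    "region":      ["Kaohsiung", "Tainan", "Kaohsiung"],
    "census_count": [15230, 9876, 4321],
})

# Link the two sets on the dimensions they share.
linked = tombstones.merge(census, on=["family_name", "region"], how="left")

# Rows for which the census supplies no value are candidates for hypotheses
# about coverage, transcription, or validity of either data set.
conflicts = linked[linked["census_count"].isna()]
print(linked)
print("Unmatched rows:", len(conflicts))
```

In this toy example the unmatched row plays the role of a "conflicting value" in the abstract's sense: a mismatch between the linked sets that prompts a hypothesis rather than an automatic correction.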
We present a new paradigm for Computer Assisted Language Learning (CALL). This paradigm aims at the development of linguistically and pedagogically competent Web browsers for autonomous exploration of a second language (L2). We introduce Gymn@zilla, a browser-like application which automatically converts Web pages into language lessons. Gymn@zilla combines annotated reading of Web pages with exploration tools and the possibility of creating personal wordlists and practicing them in dynamically created exercises. The relevance and potential of these components for the acquisition of a second language are discussed. Possible usage scenarios range from individual language learning through school and university classes to daily work in non-native-speaking environments.
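To make the conversion idea concrete, here is a minimal, self-contained Python sketch of one step only: fetching a page, extracting its visible text, building a crude wordlist, and turning a sentence into a gap-fill item. This is not Gymn@zilla's implementation; the example URL and the helper functions are placeholders, and a real system would add lemmatization, glosses, and pedagogically informed exercise generation.

```python
# Minimal sketch, not Gymn@zilla: page text -> wordlist -> simple cloze exercise.
import re
import random
from urllib.request import urlopen
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text outside of <script> and <style> elements."""
    def __init__(self):
        super().__init__()
        self.skip = False
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip = True
    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = False
    def handle_data(self, data):
        if not self.skip:
            self.chunks.append(data)

def page_to_wordlist(url):
    """Fetch a page and return a rough, deduplicated vocabulary list."""
    html = urlopen(url).read().decode("utf-8", errors="ignore")
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    # Very rough tokenization; a real system would lemmatize and annotate.
    return sorted(set(w.lower() for w in re.findall(r"[A-Za-z]{4,}", text)))

def gap_fill_exercise(sentence, word):
    """Turn one sentence into a cloze item by blanking out the target word."""
    return sentence.replace(word, "_" * len(word))

if __name__ == "__main__":
    words = page_to_wordlist("https://example.com")   # placeholder URL
    target = random.choice(words)
    print("Wordlist size:", len(words))
    print(gap_fill_exercise(f"Please study the word {target} today.", target))
```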
The research described in this paper is rooted in the endeavors to combine the advantages of corpus-based and rule-based MT approaches in order to improve the performance of MT systems—most importantly, the quality of translation. The authors review the ongoing activities in the field and present a case study, which shows how translation knowledge can be drawn from parallel corpora and compiled into the lexicon of a rule-based MT system. These data are obtained with the help of three procedures: (1) identification of hitherto unknown one-word translations, (2) statistical rating of the known one-word translations, and (3) extraction of new translations of multiword expressions (MWEs), followed by compilation steps which create new rules for the MT engine. As a result, the lexicon is enriched with translation equivalents attested for different subject domains, which facilitates the tuning of the MT system to a specific subject domain and improves the quality and adequacy of translation.
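As an illustration of procedure (2) only, the following Python sketch rates one-word translation candidates from a sentence-aligned parallel corpus using the Dice coefficient over sentence-level co-occurrence counts. The toy corpus and the choice of scoring measure are assumptions made for the example; they are not the authors' actual pipeline.

```python
# Hedged sketch: statistical rating of one-word translation candidates
# from a sentence-aligned parallel corpus via the Dice coefficient.
from collections import Counter
from itertools import product

# Sentence-aligned toy corpus: (source sentence, target sentence) pairs.
parallel = [
    ("the house is big", "das haus ist gross"),
    ("the house is old", "das haus ist alt"),
    ("the car is big", "das auto ist gross"),
]

src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
for src, tgt in parallel:
    src_words, tgt_words = set(src.split()), set(tgt.split())
    src_freq.update(src_words)
    tgt_freq.update(tgt_words)
    pair_freq.update(product(src_words, tgt_words))

def dice(s, t):
    """Dice coefficient: 2 * cooccurrence(s, t) / (freq(s) + freq(t))."""
    return 2 * pair_freq[(s, t)] / (src_freq[s] + tgt_freq[t])

# Rank target candidates for one source word; high scores suggest equivalents
# that could be compiled into the lexicon of a rule-based MT system.
candidates = sorted(tgt_freq, key=lambda t: dice("house", t), reverse=True)
print([(t, round(dice("house", t), 2)) for t in candidates[:3]])
```

On this toy data, "haus" scores highest for "house"; in a realistic setting, such ratings would be computed per subject domain and only well-attested pairs would be passed on to the rule-compilation step.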