From Xerox to Aspell: A First Prototype of a North Sámi Speller Based on TWOL Technology

Gaup, Børre; Moshagen, Sjur; Omma, Thomas; Palismaa, Maaren; Pieski, Tomi; Trosterud, Trond

doi:10.1007/11780885_37

Cited by 4 publications

(3 citation statements)

References 1 publication

Supporting

Mentioning

Contrasting

Unclassified

Order By: Relevance

“…In cases like this, the underlying linguistic data should be conceptualized, designed, and developed independently of the service components or algorithmic components, and then an arrangement can be negotiated by which the linguistic data are released freely but the algorithmic components remain closed. Morphological analyzers for some noncentral languages (Sámi, for instance) have been developed under this kind of licensing scheme: open-source lexica and rule sets combined with the closed-source Xerox Finite State Tools (Trosterud 2005;Gaup et al 2005). If none of these arrangements are negotiable, then one must proceed under the imposed conditions, but without any expectation that the data developed will be preserved in the long run.…”

Section: Licensingmentioning

confidence: 99%

Implementing NLP projects for noncentral languages: instructions for funding bodies, strategies for developers

Streiter

Scannell

Stuflesser

2006

Machine Translation

View full text Add to dashboard Cite

This research begins by distinguishing a small number of "central" languages from the "noncentral languages", where centrality is measured by the extent to which a given language is supported by natural language processing tools and research. We analyse the conditions under which noncentral language projects (NCLPs) and central language projects are conducted. We establish a number of important differences which have far-reaching consequences for NCLPs. In order to overcome the difficulties inherent in NCLPs, traditional research strategies have to be reconsidered. Successful styles of scientific cooperation, such as those found in open-source software development or in the development of the Wikipedia, provide alternative views of how NCLPs might be designed. We elaborate the concepts of free software and software pools and argue that NCLPs, in their own interests, should embrace an open-source approach for the resources they develop and pool these resources together with other similar open-source resources. The expected advantages of this approach are so important All trademarks are hereby acknowledged. 123 268 O. Streiter et al.that we suggest that funding organizations put it as sine qua non condition into project contracts.

show abstract

Section: Licensingmentioning

confidence: 99%

Implementing NLP projects for noncentral languages: instructions for funding bodies, strategies for developers

Streiter

Scannell

Stuflesser

2006

Machine Translation

View full text Add to dashboard Cite

show abstract

“…Finite-state language models have been used in spell-checking and correction for a while, one of the most recent approaches that is the basis of our system as well is Pirinen et al (2014). Within the Sámi language context, the work has been done from Gaup et al (2005) onwards.…”

Section: Earlier Workmentioning

confidence: 99%

You can’t suggest that?!

Kaalep¹,

Pirinen

Moshagen

2022

NLY

View full text Add to dashboard Cite

In this article, we study correction of spelling errors, specifically on how the spelling errors are made and how can we model them computationally in order to fix them.The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi.The first approach of modelling spelling errors is rule-based, where experts write rules that describe the kind of errors are made, and these are compiled into finite-state automaton that models the errors.The second is data-based, where we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors.Both approaches require collection of error corpora and understanding its contents; therefore we also describe the actual errors we have seen in detail.We find that while both approaches create error correction systems, with current resources the expert-build systems are still more reliable.

show abstract

“…en nyere versjon av stavekontrollen fra 2007 14 , cf. also (Gaup et al 2006), og seks føringsgrammatikkmoduler, se figur 1.…”

Section: Regelbasert Metode (Gramdivvun)unclassified

Mii eai leat gal vuollánan – Vi ha neimen ikke gitt opp

Wiechetek¹,

Pirinen²,

Gaup³

et al. 2022

NLY

View full text Add to dashboard Cite

Machine learning is the dominating paradigm in natural language processing nowadays. It requires vast amounts of manually annotated or synthetically generated text data. In the GiellaLT infrastructure, on the other hand, we have worked with rule-based methods, where the linguistis have full control over the development the tools. In this article we uncover the myth of machine learning being cheaper than a rule- based approach by showing how much work there is behind data generation, either via corpus annotation or creating tools that automatically mark-up the corpus. Earlier we have shown that the correction of grammatical errors, in particular compound errors, benefit from hybrid methods. Agreement errors, on the other other hand, are to a higher degree dependent on the larger grammatical context. Our experiments show that machine learning methods for this error type, even when supplemented by rule-based methods generating massive data, can not compete with the state-of-the-art rule-based approach.

show abstract

From Xerox to Aspell: A First Prototype of a North Sámi Speller Based on TWOL Technology

Cited by 4 publications

References 1 publication

Implementing NLP projects for noncentral languages: instructions for funding bodies, strategies for developers

Implementing NLP projects for noncentral languages: instructions for funding bodies, strategies for developers

You can’t suggest that?!

Mii eai leat gal vuollánan – Vi ha neimen ikke gitt opp

Contact Info

Product

Resources

About

From Xerox to Aspell: A First Prototype of a North Sámi Speller Based on TWOL Technology

Cited by 4 publications

References 1 publication

Implementing NLP projects for noncentral languages: instructions for funding bodies, strategies for developers

Implementing NLP projects for noncentral languages: instructions for funding bodies, strategies for developers

You can’t suggest that?!

Mii *eai leat gal vuollánan – Vi *ha neimen ikke gitt opp

Contact Info

Product

Resources

About

Mii eai leat gal vuollánan – Vi ha neimen ikke gitt opp