This article focuses on the variability of one subtype of multi-word expressions, namely those consisting of a verb and a particle or a verb and its complement(s). We build on evidence from Estonian, an agglutinative language with free word order, analysing the behaviour of verbal multi-word expressions (opaque and transparent idioms, support verb constructions, and particle verbs). Using these data, we analyse phenomena such as the ordering of the components of a multi-word expression, lexical substitution, and morphosyntactic flexibility.
Overview. The article discusses the automatic processing of multi-word expressions in computational linguistics. A multi-word expression is understood here as a combination of two or more words (or word forms) that are conventionally used together to express a certain meaning; this definition covers both idiomatic and collocational combinations. In computational linguistics, multi-word expressions pose a problem because they complicate the bottom-up model of text analysis, in which the single word is the building block of sentence structure and meaning. The article gives an overview of the three stages of automatic processing of multi-word expressions: identifying them, compiling a lexicon of them, and annotating them in text. Typical methods have been developed in computational linguistics for these tasks, but for Estonian, a morphologically complex language with free word order, these methods are applicable only with certain reservations and modifications. The article analyses the "special needs" of Estonian in this area.* Keywords: computational linguistics, multi-word expressions, identification of multi-word expressions, lexicon of multi-word expressions, annotation of multi-word expressions, Estonian
In this article, we study the correction of spelling errors, specifically how spelling errors are made and how we can model them computationally in order to fix them. The article describes two different approaches to generating spelling correction suggestions for three Uralic languages: Estonian, North Sámi and South Sámi. The first approach to modelling spelling errors is rule-based: experts write rules that describe the kinds of errors that are made, and these are compiled into a finite-state automaton that models the errors. The second is data-based: we show a machine learning algorithm a corpus of errors that humans have made, and it creates a neural network that can model the errors. Both approaches require collecting error corpora and understanding their contents; we therefore also describe in detail the actual errors we have seen. We find that while both approaches produce working error correction systems, with current resources the expert-built systems are still more reliable.
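To make the idea of generating correction suggestions concrete, here is a minimal, self-contained sketch in Python. It is not the finite-state or neural system the abstract describes; it only illustrates the common core step of candidate generation, proposing in-lexicon words within edit distance 1 of a misspelling. The lexicon and the alphabet below are toy assumptions; real systems for Estonian or Sámi would operate over full morphological transducers.

```python
# Toy candidate-generation sketch for spelling correction.
# Assumptions (not from the article): a tiny hand-picked lexicon and a
# simplified alphabet. Real systems would use weighted finite-state
# transducers covering the full morphology of the language.

LEXICON = {"keel", "keele", "sõna", "viga"}  # toy word list for illustration

ALPHABET = "abcdefghijklmnopqrstuvwxyzõäöüš"  # simplified, illustrative

def edits1(word: str) -> set:
    """All strings at Levenshtein distance 1 from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {L + R[1:] for L, R in splits if R}
    transposes = {L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1}
    replaces = {L + c + R[1:] for L, R in splits if R for c in ALPHABET}
    inserts = {L + c + R for L, R in splits for c in ALPHABET}
    return deletes | transposes | replaces | inserts

def suggest(word: str) -> set:
    """Correction candidates: in-lexicon words within edit distance 1."""
    if word in LEXICON:
        return {word}
    return edits1(word) & LEXICON
```

For example, `suggest("keeel")` returns both `"keel"` (one deletion) and `"keele"` (one transposition); ranking between such candidates is exactly where the rule-based weights or the learned neural model of the article would come in.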