Building and Using Comparable Corpora 2013
DOI: 10.1007/978-3-642-20128-8_8

Statistical Corpus and Language Comparison on Comparable Corpora

Cited by 15 publications (14 citation statements)
References 3 publications
“…An item (here, a word) $x_i$ has the probability $P(x_i \mid x_{i-(n-1)}, \ldots, x_{i-1})$. The text database was extracted from the Leipzig Corpora Collection, a ready to use corpora [33]. It contains a word frequency list as well as a word bi-grams list (co-occurrences as next neighbors) containing observed frequency counts, which were generated from approximately 1 million sentences publicly accessible.…”
Section: Methods (mentioning)
confidence: 99%
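The conditional probability quoted above can be illustrated for the bi-gram case (n = 2) with a minimal Python sketch. It is not the citing authors' implementation: the toy counts, the function names `bigram_probability` and `suggest`, and the choice to estimate the probability as the bi-gram count divided by the frequency-list count of the preceding word are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy counts standing in for a word frequency list
# and a bi-gram (next-neighbor co-occurrence) list.
unigram_counts = Counter({"the": 4, "cat": 2, "dog": 2})
bigram_counts = Counter({
    ("the", "cat"): 2,
    ("the", "dog"): 1,
    ("cat", "sat"): 1,
})


def bigram_probability(prev, word):
    """Estimate P(word | prev) as count(prev, word) / count(prev).

    Assumption: the frequency-list count of `prev` is used as the
    denominator; unseen contexts yield 0.0.
    """
    total = unigram_counts[prev]
    if total == 0:
        return 0.0
    return bigram_counts[(prev, word)] / total


def suggest(prev, k=3):
    """Return up to k next-word candidates, most frequent first."""
    candidates = [(w2, c) for (w1, w2), c in bigram_counts.items() if w1 == prev]
    # Sort by descending count, breaking ties alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [w for w, _ in candidates[:k]]
```

With these toy counts, `bigram_probability("the", "cat")` evaluates to 2 / 4, and `suggest("the")` ranks "cat" ahead of "dog". A real system would load the counts from the corpus files rather than hard-coding them.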
“…The study is based on three sub-corpora generated in accordance with a set of common criteria: language, genre and size. As the corpora of different languages but similar genre allow language comparison (Eckart and Quasthoff, 2010), the compiled sub-corpora are of a similar size and composition and with matching structural features. Each sub-corpus contains the original texts in a specialised narrow domain of mechanical engineering, in particular authentic and highly regarded educational textbooks in three different languages and the Croatian National Termbank STRUNA (Table 1).…”
Section: Methods (mentioning)
confidence: 99%
“…The prediction model was implemented using a frequency list and a bi-gram list from the Leipzig Corpora Collection [48], which were based on approximately 1 million English sentences. After each selection, the suggestions were retrieved via structured query language (SQL).…”
Section: Spellers (mentioning)
confidence: 99%
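Retrieving suggestions from a bi-gram list via SQL, as this statement describes, might look like the following Python sketch using the standard-library `sqlite3` module. The table name `bigrams`, its columns `w1`, `w2` and `freq`, the sample rows, and the helper names are hypothetical; the citing paper does not specify its schema or database engine.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory database with a hypothetical bi-gram table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bigrams (w1 TEXT, w2 TEXT, freq INTEGER)")
    conn.executemany("INSERT INTO bigrams VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def suggest_next(conn, prev_word, limit=3):
    """Return the most frequent followers of prev_word, best first."""
    cur = conn.execute(
        "SELECT w2 FROM bigrams WHERE w1 = ? "
        "ORDER BY freq DESC, w2 ASC LIMIT ?",
        (prev_word, limit),
    )
    return [row[0] for row in cur.fetchall()]


# Made-up counts for illustration only.
conn = build_db([
    ("good", "morning", 50),
    ("good", "night", 30),
    ("good", "luck", 20),
    ("good", "idea", 10),
])
print(suggest_next(conn, "good"))
```

The parameterized query (`?` placeholders) keeps the selected word out of the SQL string itself, and the `LIMIT` clause caps the number of suggestions shown after each selection.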