2018
DOI: 10.1017/s1351324917000481

Measuring bilingual corpus comparability

Abstract: Comparable corpora serve as an important substitute for parallel resources for under-resourced language pairs. Previous work mostly aims to find better strategies for exploiting existing comparable corpora, while ignoring variation in corpus quality. The quality of a comparable corpus greatly affects its usability in practice, as several studies have shown. However, researchers have not yet established a widely accepted and fully validated framework for measuring corpus quality. We w…

Cited by 2 publications (2 citation statements)
References 32 publications
“…This research compared the results of clustering and topic modelling as ways to represent corpus content using keyword statistics. In Li et al (2018), the authors compared the performance of several bilingual vocabulary overlap measures on a specifically designed corpus with known comparability levels and found that word frequencies with a simple Presence/Absence weighting scheme outperformed other approaches.…”
Section: Related Research; citation type: mentioning (confidence: 99%)
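To make the idea of a bilingual vocabulary overlap measure concrete, here is a minimal sketch. It assumes a word-to-word bilingual dictionary (the `dictionary` argument) and uses cosine similarity as the overlap score. All function names are hypothetical, and this is not the specific measure evaluated by Li et al (2018). It only illustrates how Presence/Absence (binary) weighting differs from raw-frequency weighting once the source vocabulary has been mapped into the target language.

```python
from collections import Counter
import math


def translate_counts(counts, dictionary):
    """Map source-language word counts into the target language using a
    hypothetical word-to-word dictionary; words without an entry are dropped."""
    out = Counter()
    for word, c in counts.items():
        for translation in dictionary.get(word, ()):
            out[translation] += c
    return out


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def overlap_score(src_tokens, tgt_tokens, dictionary, binary=True):
    """Vocabulary overlap between a source and a target corpus (assumed measure)."""
    src = translate_counts(Counter(src_tokens), dictionary)
    tgt = Counter(tgt_tokens)
    if binary:
        # Presence/Absence weighting: every word that occurs gets weight 1,
        # so frequent words no longer dominate the score.
        src = Counter({w: 1 for w in src})
        tgt = Counter({w: 1 for w in tgt})
    return cosine(src, tgt)


if __name__ == "__main__":
    toy_dict = {"chat": ["cat"], "chien": ["dog"], "maison": ["house", "home"]}
    fr = "le chat et le chien".split()
    en = "the cat and the dog".split()
    print(overlap_score(fr, en, toy_dict, binary=True))
    print(overlap_score(fr, en, toy_dict, binary=False))
```

On this toy pair, binary weighting gives a higher score (1/√2) than raw frequencies (2/√14). The repeated English word "the" inflates the target vector's norm only when counts are used. A real evaluation would also need tokenization, lemmatization, and handling of dictionary coverage, all of which this sketch leaves out.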
“…One of the common approaches to corpus comparability is to define it as domain similarity and to use vocabulary overlap as the measure of comparability. A brief summary of possible interpretations of the concept and of comparability measures can be found in Li et al (2018). The authors give a domain-based definition of cross-linguistically comparable corpora: "document sets in different languages that cover similar topics".…”
Section: Introduction; citation type: mentioning (confidence: 99%)