Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2022
DOI: 10.18653/v1/2022.acl-short.20

Estimating the Entropy of Linguistic Distributions

Abstract: Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language. However, entropy must typically be estimated from observed data because researchers do not have access to the underlying probability distribution that gives rise to these data. While entropy estimation is a well-studied problem in other fields, there is not yet a comprehensive exploration of the efficacy of entropy estimators for use with linguistic data. In this work, we fill this void, studying…
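To make the estimation problem in the abstract concrete, here is a minimal sketch of two textbook estimators computed from observed samples: the plug-in (maximum-likelihood) estimate and the Miller-Madow bias-corrected variant. The function names `plugin_entropy` and `miller_madow_entropy` are illustrative assumptions, and nothing here is taken from the paper itself; the truncated abstract does not say which estimators it evaluates.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy estimate, in nats.

    Replaces the unknown true probabilities with relative frequencies.
    This estimate is known to be biased downward on small samples.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one sample")
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the Miller-Madow correction (K - 1) / (2N).

    K is the number of distinct observed types and N the sample size.
    """
    counts = Counter(samples)
    n = sum(counts.values())
    k = len(counts)
    return plugin_entropy(samples) + (k - 1) / (2 * n)


# Hypothetical toy sample of word tokens, for illustration only.
tokens = ["the", "cat", "the", "dog", "the", "cat"]
print(plugin_entropy(tokens))
print(miller_madow_entropy(tokens))
```

The correction term grows with the number of observed types relative to the sample size, which is the regime typical of linguistic data with long-tailed vocabularies.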

Cited by 3 publications (2 citation statements)
References 5 publications
“…We note that some recent work (Meister et al., 2021; Arora et al., 2022) has also described estimating statistics of random variables induced by language models. This involved sampling without replacement and using importance weighting, but they seem to surpass Monte-Carlo estimates only with many samples and peaky distributions.…”
Section: Likelihood (citation type: mentioning)
confidence: 97%
“…Precisely, the previous literature has largely dealt with entropy estimators proposed for sequences of i.i.d. random variables [16, 18, 19, 20, 21]. However, it is not clear that real data arising from experimental observation can be described with i.i.d.…”
Section: Introduction (citation type: mentioning)
confidence: 99%