Proceedings of the 16th Meeting on the Mathematics of Language 2019
DOI: 10.18653/v1/w19-5710

Sentence Length

Abstract: The distribution of sentence length in ordinary language is not well captured by the existing models. Here we survey previous models of sentence length and present our random walk model that offers both a better fit with the data and a better understanding of the distribution. We develop a generalization of KL divergence, discuss measuring the noise inherent in a corpus, and present a hyperparameter-free Bayesian model comparison method that has strong conceptual ties to Minimal Description Length modeling. Th…

Cited by 2 publications
(2 citation statements)
References 17 publications
“…Average sentence length can be used as a measure of grammatical complexity based on the assumption that longer sentence has a more complex syntactic and semantic structure than shorter sentences. It also shows richness and descriptiveness of sentences in the corpus [19]-[21].…”
Section: Average Sentence Length
Citation type: mentioning
confidence: 99%
“…According to the sentence length distribution of various language corpora, a well-written sentence contains 15-20 words on average [7]. The average sequence length of a SMILES string is typically 3 times longer than a natural language, whereas the token space is at least 1000 times smaller than any developed language [8].…”
Section: Introduction
Citation type: mentioning
confidence: 99%