2020
DOI: 10.48550/arxiv.2012.00742
Preprint

Spectral Analysis of Word Statistics

Abstract: Given a random text over a finite alphabet, we study the frequencies at which fixed-length words occur as subsequences. As the data size grows, the joint distribution of word counts exhibits a rich asymptotic structure. We investigate all linear combinations of subword statistics, and fully characterize their different orders of magnitude using diverse algebraic tools. Moreover, we establish the spectral decomposition of the space of word statistics of each order. We provide explicit formulas for the eigenvecto…

Cited by 1 publication (2 citation statements)
References 14 publications
“…In Appendix A we apply these general results to string matching and give a rather detailed treatment of the degenerate cases of linear combinations of unconstrained subsequence counts. See also [12] for further algebraic aspects of both non-degenerate and degenerate cases. Problem 9.2.…”
Section: Constrained Pattern Matching In Words
mentioning
confidence: 99%
“…(This case is somewhat simpler than the general case since we only have to consider finite-dimensional vector spaces below, but otherwise the general case is similar.) See also [12], which contains a much deeper algebraic study of the asymptotic variance σ²(f) and the vector spaces below, and in particular a spectral decomposition that refines (A.9).…”
mentioning
confidence: 99%