2018
DOI: 10.1103/physreve.98.012315

Zipf and Heaps laws from dependency structures in component systems

Abstract: Complex natural and technological systems can be considered, on a coarse-grained level, as assemblies of elementary components: for example, genomes as sets of genes or texts as sets of words. On one hand, the joint occurrence of components emerges from architectural and specific constraints in such systems. On the other hand, general regularities may unify different systems, such as the broadly studied Zipf and Heaps laws, respectively concerning the distribution of component frequencies and their number as a…

Cited by 26 publications (22 citation statements)
References 49 publications
“…Similarly, the theoretical relation that is often used in linguistics to connect Zipf's law and Heaps' law is based on an equivalent random sampling framework [33][34][35][36]. Interestingly, also when these statistical patterns are generated with more complex models explicitly based on networks of component dependencies [23,24], thus with a strong intrinsic correlation structure, they do not significantly deviate from the random sampling prediction [24]. This surprising phenomenology suggests that average statistical laws, such as Zipf's and Heaps' laws, do not contain enough information about the microscopic dynamics to clearly distinguish between alternative generative mechanisms.…”
Section: Discussion
mentioning
confidence: 99%
“…This surprising phenomenology suggests that average statistical laws, such as Zipf's and Heaps' laws, do not contain enough information about the microscopic dynamics to clearly distinguish between alternative generative mechanisms. High-order statistical observables, such as two-point correlations between components [24] or fluctuation scalings [52] could thus be necessary to actually select the more appropriate model for a given empirical component system.…”
Section: Discussion
mentioning
confidence: 99%
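The random sampling framework mentioned in the statements above can be illustrated with a minimal sketch. It is not the model of the cited paper: the function names, the number of components, the Zipf exponent, and the seed are all assumptions chosen for illustration. Components are drawn independently with probability proportional to 1/rank^exponent, and the number of distinct components seen so far is recorded after each draw, giving a Heaps-like sublinear growth curve.

```python
import random
from itertools import accumulate


def zipf_weights(n_components, exponent=1.0):
    """Unnormalized Zipf weights 1/rank**exponent for ranks 1..n_components."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_components + 1)]


def heaps_curve(n_components, sample_size, exponent=1.0, seed=0):
    """Draw components independently from a Zipf rank-frequency law and
    return the number of distinct components seen after each draw."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    cum_weights = list(accumulate(zipf_weights(n_components, exponent)))
    draws = rng.choices(range(n_components), cum_weights=cum_weights, k=sample_size)
    seen = set()
    curve = []
    for component in draws:
        seen.add(component)
        curve.append(len(seen))
    return curve


# Hypothetical parameters: 5000 possible components, 20000 independent draws.
curve = heaps_curve(n_components=5000, sample_size=20000)
for size in (100, 1000, 10000, 20000):
    print(size, curve[size - 1])
```

Because each draw is independent, this sketch contains no dependency structure between components; it only shows the baseline against which correlated models would be compared.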
“…Introduction comparison method and its parameters. For example in k-mer based methods, the choice of k determines sensitivity and precision of the classification, such that sensitivity increases and precision decreases with increasing values for k, and vice versa. As we will show, false positive predictions often need to be corrected heuristically by removing all species/taxa with abundance below a given arbitrary threshold (see Materials and Methods section for an overview on different algorithms of taxonomy classification).…”
mentioning
confidence: 99%
“…Our primary goal was to find the so-called core families [47], i.e. the protein domains which are present in the overwhelming majority of the bacterium proteomes but occurring just few times in each of them [40,48]. In order to analyze the occurrences of PFAM in proteomes, we converted…”
mentioning
confidence: 99%