In this paper we consider the stochastic analysis of information ranking algorithms of large interconnected data sets, e.g., Google's PageRank algorithm for ranking pages on the World Wide Web. The stochastic formulation of the problem results in an equation of the form $R \stackrel{d}{=} Q + \sum_{i=1}^{N} C_i R_i$, where $N$, $Q$, $\{R_i\}_{i\geq 1}$, and $\{C, C_i\}_{i\geq 1}$ are independent nonnegative random variables, the $\{C, C_i\}_{i\geq 1}$ are identically distributed, and the $\{R_i\}_{i\geq 1}$ are independent copies of $R$; $\stackrel{d}{=}$ stands for equality in distribution. We study the asymptotic properties of the distribution of $R$ that, in the context of PageRank, represents the frequencies of highly ranked pages. The preceding equation is interesting in its own right since it belongs to a more general class of weighted branching processes that have been found to be useful in the analysis of many other algorithms. Our first main result shows that if $E[N]\,E[C^\alpha] = 1$, $\alpha > 0$, and $Q$, $N$ satisfy additional moment conditions, then $R$ has a power law distribution of index $\alpha$. This result is obtained using a new approach based on an extension of Goldie's (1991) implicit renewal theorem. Furthermore, when $N$ is regularly varying of index $\alpha > 1$, $E[N]\,E[C^\alpha] < 1$, and $Q$, $C$ have higher moments than $\alpha$, then the distributions of $R$ and $N$ are tail equivalent. The latter result is derived via a novel sample path large deviation method for recursive random sums. Similarly, we characterize the situation when the distribution of $R$ is determined by the tail of $Q$. The preceding approaches may be of independent interest, as they can be used for analyzing other functionals on trees. We also briefly discuss the engineering implications of our results.
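As a small numerical illustration of the first result, the candidate tail index $\alpha$ is the root of $\phi(\alpha) = E[N]\,E[C^\alpha] = 1$, which can be located by bisection. This is a sketch of our own, not the paper's method: the helper `solve_alpha` and the example law of $C$ are hypothetical. The root alone is not sufficient; implicit renewal theorems of Goldie's type typically also require $\phi$ to cross 1 from below, i.e., $E[N]\,E[C^\alpha \log C] > 0$, together with the moment conditions mentioned above.

```python
def solve_alpha(mean_n, c_moment, lo=1e-6, hi=50.0, tol=1e-12):
    """Bisection for alpha in (lo, hi) with mean_n * c_moment(alpha) == 1.

    mean_n   -- E[N]
    c_moment -- hypothetical callable returning E[C**alpha] for alpha > 0
    Assumes phi(alpha) = mean_n * c_moment(alpha) - 1 changes sign on [lo, hi].
    """
    def phi(a):
        return mean_n * c_moment(a) - 1.0

    f_lo, f_hi = phi(lo), phi(hi)
    if f_lo * f_hi > 0:
        raise ValueError("phi has no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = phi(mid)
        if f_mid == 0.0:
            return mid
        # keep the half-interval on which phi still changes sign
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical example: E[N] = 2 and C = 4 w.p. 1/32, C = 0 otherwise,
# so E[C**a] = 4**a / 32 and E[N] E[C**a] = 4**a / 16, which equals 1 at a = 2.
alpha = solve_alpha(2.0, lambda a: 4.0 ** a / 32.0)
print(round(alpha, 6))  # → 2.0
```

In this example $\phi$ is increasing at the root, so the derivative condition holds, and $E[N]\,E[C] = 1/4 < 1$.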
This paper studies the distribution of a family of rankings, which includes Google's PageRank, on a directed configuration model. In particular, it is shown that the rank of a randomly chosen node in the graph converges in distribution to a finite random variable $\mathcal{R}^*$ that can be written as a linear combination of i.i.d. copies of the attracting endogenous solution to a stochastic fixed-point equation of the form $\mathcal{R} \stackrel{\mathcal{D}}{=} \sum_{i=1}^{N} \mathcal{C}_i \mathcal{R}_i + Q$, where $(Q, N, \{\mathcal{C}_i\})$ is a real-valued vector with $N \in \{0, 1, 2, \dots\}$, $P(|Q| > 0) > 0$, and the $\{\mathcal{R}_i\}$ are i.i.d. copies of $\mathcal{R}$, independent of $(Q, N, \{\mathcal{C}_i\})$. Moreover, we provide precise asymptotics for the limit $\mathcal{R}^*$, which, when the in-degree distribution in the directed configuration model has a power law, imply a power law distribution for $\mathcal{R}^*$ with the same exponent. © 2016 Wiley Periodicals, Inc. Random Struct. Alg., 51, 237–274, 2017
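To make the graph-side quantity concrete, the rank of every node of a finite directed graph can be computed by fixed-point iteration of $R_i = Q_i + \sum_{j \to i} C_j R_j$. This is a minimal sketch under our own assumptions: the weights $C_j = c/d_j$ (with $d_j$ the out-degree of node $j$), $Q_i = 1 - c$, the damping factor `c = 0.85`, and the function name `rank_vector` are illustrative choices that reproduce PageRank scaled by the number of nodes on graphs without dangling nodes; the family of rankings studied in the paper is more general.

```python
def rank_vector(n, edges, c=0.85, tol=1e-12, max_iter=10_000):
    """Iterate R_i = Q_i + sum_{j -> i} C_j R_j on a directed graph with nodes 0..n-1.

    Illustrative choices (assumptions): C_j = c / outdeg(j) and Q_i = 1 - c.
    Every node must have at least one outgoing edge (no dangling nodes).
    """
    outdeg = [0] * n
    for u, _ in edges:
        outdeg[u] += 1
    if any(d == 0 for d in outdeg):
        raise ValueError("dangling node: this sketch assumes outdeg >= 1")
    weight = [c / d for d in outdeg]  # C_j
    r = [0.0] * n                      # start the iteration from R = 0
    for _ in range(max_iter):
        new = [1.0 - c] * n            # Q_i
        for u, v in edges:             # edge u -> v passes C_u * R_u to node v
            new[v] += weight[u] * r[u]
        if max(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# On the directed 3-cycle 0 -> 1 -> 2 -> 0 every rank converges to 1.
print(rank_vector(3, [(0, 1), (1, 2), (2, 0)]))
```

Because each node passes on a total weight of $c$, the ranks at the fixed point sum to $n$, which is the usual normalization of PageRank multiplied by $n$.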
Given two distributions F and G on the nonnegative integers, we propose an algorithm to construct in- and out-degree sequences from samples of i.i.d. observations from F and G, respectively, that with high probability will be graphical, that is, from which a simple directed graph can be drawn. We then analyze a directed version of the configuration model and show that, provided that F and G have finite variance, the probability of obtaining a simple graph is bounded away from zero as the number of nodes grows. We show that, conditional on the resulting graph being simple, the in- and out-degree distributions are (approximately) F and G for large graphs. Moreover, when the degree distributions have only finite mean, we show that the elimination of self-loops and multiple edges does not significantly change the degree distributions in the resulting simple graph.
1. Introduction. In order to study complex systems such as the World Wide Web (WWW) or the Twitter network, we propose a model for generating a simple directed random graph with prescribed degree distributions. The ability to match degree distributions to real graphs is perhaps the first characteristic one would desire from a model, and although several models that accomplish this for undirected graphs have been proposed in the recent literature [8,10,11,20], not much has been done for the directed case. In the WWW example that motivates this work, vertices represent webpages and edges represent the links between them; in the Twitter graph, vertices represent people and an edge from one vertex to another means that the first person is "following" the second. Empirical studies (e.g., [9,15]) suggest that both the in-degree and the out-degree (the number of links pointing to a page and the number of outbound links of a page, respectively) follow a power-law distribution, a characteristic often referred to as the scale-free property.
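The construction described in the abstract can be sketched as follows. This is a simplified illustration under our own assumptions, not the algorithm from the paper: the helper names are hypothetical, the imbalance between the two i.i.d. samples is removed by adding the missing stubs to uniformly chosen nodes of the sequence with the smaller total, and the example degree law (uniform on {0, ..., 4} for both F and G) is arbitrary. Stubs are then paired uniformly at random, as in the directed configuration model, after which one can either check for simplicity (and condition on it) or erase self-loops and multiple edges.

```python
import random
from collections import Counter


def balanced_degree_sequences(n, sample_in, sample_out, rng):
    """Draw n i.i.d. in- and out-degrees, then add stubs until the totals agree.

    Simplified stand-in (assumption): the deficit is spread one stub at a time
    over uniformly chosen nodes of the sequence with the smaller sum.
    """
    din = [sample_in(rng) for _ in range(n)]
    dout = [sample_out(rng) for _ in range(n)]
    delta = sum(din) - sum(dout)
    short = dout if delta > 0 else din
    for _ in range(abs(delta)):
        short[rng.randrange(n)] += 1
    return din, dout


def directed_configuration_model(din, dout, rng):
    """Match out-stubs to in-stubs uniformly at random; return directed edges (u, v)."""
    out_stubs = [u for u, d in enumerate(dout) for _ in range(d)]
    in_stubs = [v for v, d in enumerate(din) for _ in range(d)]
    rng.shuffle(in_stubs)  # a uniform permutation gives a uniform matching
    return list(zip(out_stubs, in_stubs))


def is_simple(edges):
    """True if there are no self-loops and no repeated directed edges."""
    return all(u != v for u, v in edges) and max(Counter(edges).values(), default=0) <= 1


def erase(edges):
    """Erased model: drop self-loops and merge repeated edges."""
    return sorted({(u, v) for u, v in edges if u != v})


rng = random.Random(0)
uniform_0_to_4 = lambda r: r.randint(0, 4)  # hypothetical choice F = G
din, dout = balanced_degree_sequences(1000, uniform_0_to_4, uniform_0_to_4, rng)
multi = directed_configuration_model(din, dout, rng)
simple = erase(multi)
print(is_simple(multi), is_simple(simple), len(multi) - len(simple))
```

The last printed number counts the edges removed by erasure; for degree laws with finite variance one expects it to stay small relative to the total number of edges.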
Consider distributional fixed point equations of the form $R \stackrel{D}{=} f(C_i R_i, 1 \le i \le N)$, where $f(\cdot)$ is a possibly random real-valued function, $N \in \{0, 1, 2, 3, \dots\} \cup \{\infty\}$, $\{C_i\}_{i=1}^{N}$ are real-valued random weights and $\{R_i\}_{i\geq 1}$ are i.i.d. copies of $R$, independent of $(N, C_1, \dots, C_N)$; $\stackrel{D}{=}$ represents equality in distribution. Fixed point equations of this type are of utmost importance for solving many applied probability problems, ranging from the average case analysis of algorithms to statistical physics. We develop an Implicit Renewal Theorem that enables the characterization of the power tail behavior of the solutions $R$ to many equations of multiplicative nature that fall into this category. This result extends the prior work in [16], which assumed nonnegative weights $\{C_i\}$, to general real-valued weights. We illustrate the developed theorem by deriving the power tail asymptotics of the solution $R$ to the
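A common way to explore fixed point equations of this kind numerically is population dynamics: keep a large pool of approximate samples of $R$ and repeatedly replace it by draws of $f(C_i R_i, 1 \le i \le N)$ built from randomly chosen pool members. The sketch below is an illustration under our own assumptions, not a method from the paper: the helper `population_dynamics`, the law of $N$ (uniform on {0, 1, 2, 3}), the real-valued weights $C_i \sim \mathrm{Uniform}(-1/2, 1/2)$, and $Q \equiv 1$ (used to model the randomness of $f$) are hypothetical, and convergence of the pool to the solution of interest is assumed rather than proved. Two illustrative choices of $f$ are shown: the linear map $Q + \sum_i C_i R_i$ and the maximum $\max(Q, \max_i C_i R_i)$.

```python
import random


def population_dynamics(f, sample_n, sample_c, sample_q, pool_size, generations, rng):
    """Approximate samples of R solving R =D f(Q, [C_1 R_1, ..., C_N R_N]).

    f(q, terms) receives one draw of Q (the randomness of f) and the list of C_i R_i.
    The pool starts at R = 0; each new value uses N independent picks from the old pool.
    """
    pool = [0.0] * pool_size
    for _ in range(generations):
        new_pool = []
        for _ in range(pool_size):
            n = sample_n(rng)
            terms = [sample_c(rng) * rng.choice(pool) for _ in range(n)]
            new_pool.append(f(sample_q(rng), terms))
        pool = new_pool
    return pool


def linear(q, terms):
    return q + sum(terms)


def maximum(q, terms):
    return max([q] + terms)  # when N = 0 only Q remains


rng = random.Random(1)
sample_n = lambda r: r.randint(0, 3)       # hypothetical law of N, E[N] = 1.5
sample_c = lambda r: r.uniform(-0.5, 0.5)  # hypothetical real-valued weights, E[C] = 0
sample_q = lambda r: 1.0                   # hypothetical Q = 1
lin = population_dynamics(linear, sample_n, sample_c, sample_q, 10_000, 20, rng)
mx = population_dynamics(maximum, sample_n, sample_c, sample_q, 2_000, 5, rng)
print(sum(lin) / len(lin))  # should be close to E[Q] / (1 - E[N] E[C]) = 1 here
```

With these weights, $E[\sum_i |C_i|] = 3/8 < 1$, so the linear map is a contraction in mean; in the maximum case every $C_i R_i$ stays below $Q = 1$ once the pool equals 1, so the pool stays at exactly 1.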