Inside PageRank

Bianchini, Monica; Gori, Marco; Scarselli, Franco

doi:10.1145/1052934.1052938

Cited by 414 publications

(259 citation statements)

References 18 publications


“…A set of hyperlinked webpages can be formally represented as a graph where pages are nodes and hyperlinks are the directed arcs of the graph. PageRank is an algorithm that attaches a score denoting a degree of page authority to a website based only on the topology of web connectivity (Bianchini 2005).…”

Section: Construction Of the Affect Lexicon (mentioning)

confidence: 99%

Inferring Group Processes from Computer-Mediated Affective Text Analysis

Schryver, Begoli, Jose et al. 2011


“…2 A second reason why inverse PageRank is a heuristic is that maximizing coverage may not always be the best strategy. To illustrate, let us propagate trust via splitting, without any dampening.…”

Section: Inverse PageRank (mentioning)

confidence: 99%

“…Yet, it may be 2 The general problem of identifying the minimal set of pages that yields maximum coverage is equivalent to the independent set problem [7] on directed graphs as shown next. The web graph can be transformed in a directed graph G = (V, E ), where an edge (p, q) ∈ E signals that page q can be reached from page p. We argue that such transformation does not change the complexity class of the algorithm, since it involves breadth-first search that has polynomial execution time.…”

Section: High PageRank (mentioning)

confidence: 99%

Combating Web Spam with TrustRank

Gyöngyi, Garcia-Molina, Pedersen 2004

Proceedings 2004 VLDB Conference

243

247


Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
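The citation statement above mentions propagating trust "via splitting, without any dampening". As a rough illustration only, the sketch below assumes that each page divides its current trust equally among its outlinks and that seed pages keep an initial score of 1. The function name `propagate_trust`, the adjacency-list input, and the fixed step count are hypothetical choices for this example, not the authors' implementation.

```python
def propagate_trust(out_links, seeds, steps):
    """Propagate trust from seed pages by equal splitting, with no dampening.

    out_links[i] lists the pages that page i links to (assumed input format).
    seeds is the set of pages judged good by the expert (initial trust 1.0).
    steps is a fixed number of propagation rounds (assumed stopping rule;
    without dampening the scores need not converge on graphs with cycles).
    """
    p = len(out_links)
    initial = [1.0 if i in seeds else 0.0 for i in range(p)]
    trust = list(initial)
    for _ in range(steps):
        # Every round starts from the seed scores, then adds received shares.
        received = list(initial)
        for i, targets in enumerate(out_links):
            for j in targets:
                # Page i splits its trust equally among its outlinks.
                received[j] += trust[i] / len(targets)
        trust = received
    return trust


# Toy graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, with page 0 as the only seed.
toy_links = [[1, 2], [3], [3], []]
toy_trust = propagate_trust(toy_links, seeds={0}, steps=3)
```

On this acyclic toy graph the two halves of the seed's trust recombine at page 3, which shows why splitting alone does not reduce trust with distance from the seed.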


“…where the dimensions of the problem could be much smaller than in the case of web matrices; see, for example, [1,18,17,19,22] and [14] where quite small matrices (around 10 or 20) are involved. In these applications, the matrix could be reducible, thus leading to the same problem as with Google's matrix, and a kind of regularization by a parameter c followed by an extrapolation procedure could be of interest.…”

Section: Numerical Experiments (mentioning)

confidence: 99%

Rational extrapolation for the PageRank vector

Brezinski, Redivo-Zaglia 2008

Math. Comp.


Abstract. An important problem in web search is to determine the importance of each page. From the mathematical point of view, this problem consists in finding the nonnegative left eigenvector of a matrix corresponding to its dominant eigenvalue 1. Since this matrix is neither stochastic nor irreducible, the power method has convergence problems. So, the matrix is replaced by a convex combination, depending on a parameter c, with a rank one matrix. Its left principal eigenvector now depends on c, and it is the PageRank vector we are looking for. However, when c is close to 1, the problem is ill-conditioned, and the power method converges slowly. So, the idea developed in this paper consists in computing the PageRank vector for several values of c, and then extrapolating them, by a conveniently chosen rational function, at a point near 1. The choice of this extrapolating function is based on the mathematical expression of the PageRank vector as a function of c. Numerical experiments end the paper.

The problem

The mathematical problem behind web search is the computation of the nonnegative left eigenvector of a $p \times p$ matrix $P$ corresponding to its dominant eigenvalue 1, where $p$ is the number of pages in Google (8.06 billion at the end of March 2005). Since $P$ is not stochastic (some rows of $P$ may contain only zeros due to the so-called dangling nodes), it is replaced by the matrix $\tilde{P} = P + d w^T$ with $w \in \mathbb{R}^p$ a probability vector, that is, such that $w \ge 0$ and $(w, e) = 1$ with $e = (1, \ldots, 1)^T$, and $d = (d_i) \in \mathbb{R}^p$ the vector with $d_i = 1$ if $\deg(i) = 0$, and 0 otherwise, where $\deg(i)$ is the outdegree of the page $i$, that is, the number of pages it points to.

Since the matrix $\tilde{P}$ is not irreducible, it is replaced by the matrix

$$P_c = c\,\tilde{P} + (1 - c)\,E,$$

where $c$ is a parameter between 0 and 1, and $E = e v^T$ with $e = (1, \ldots, 1)^T \in \mathbb{R}^p$ and $v$ a probability vector. Such a modification of the matrix corresponds to adding to all pages a new set of outgoing transitions with small probabilities.
The probability distribution given by the vector v can differ from a uniformly distributed vector, and the resultant PageRank can be biased to give preference to certain kinds
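The construction described above can be sketched with the power method. The example assumes the modified matrix is the convex combination c (P + d w^T) + (1 - c) e v^T, that row i of P gives probability 1/deg(i) to each page linked from page i, and that v and w default to the uniform vector. The function name `pagerank`, the adjacency-list input, and the tolerance are illustrative choices; this is not the rational extrapolation procedure of the paper.

```python
def pagerank(out_links, c=0.85, v=None, w=None, tol=1e-12, max_iter=1000):
    """Approximate the left eigenvector r of c (P + d w^T) + (1 - c) e v^T.

    out_links[i] lists the distinct pages that page i points to (assumed
    input format). v and w are probability vectors, uniform by default.
    """
    p = len(out_links)
    v = v or [1.0 / p] * p
    w = w or [1.0 / p] * p
    r = list(v)  # start from a probability vector, so sum(r) stays 1
    for _ in range(max_iter):
        # r^T d: total rank currently held by dangling pages (deg(i) = 0).
        dangling = sum(r[i] for i in range(p) if not out_links[i])
        # Contributions of the rank-one terms d w^T and e v^T.
        new = [c * dangling * w[j] + (1.0 - c) * v[j] for j in range(p)]
        # Contribution of P: page i spreads r[i] equally over its outlinks.
        for i, targets in enumerate(out_links):
            for j in targets:
                new[j] += c * r[i] / len(targets)
        # Stop once the 1-norm change between iterates is below tol.
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


# Toy web of four pages; page 3 is dangling and has no inlinks.
links = [[1, 2], [2], [0], []]
ranks = pagerank(links)
```

The convergence rate of this iteration is governed by c, which is consistent with the abstract's remark that the power method slows down as c approaches 1.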
