2015
DOI: 10.1561/106.00000003
The Graph Structure in the Web – Analyzed on Different Aggregation Levels

Abstract: Knowledge about the general graph structure of the World Wide Web is important for understanding the social mechanisms that govern its growth, for designing ranking methods, for devising better crawling algorithms, and for creating accurate models of its structure. In this paper, we analyze a large web graph. The graph was extracted from a large publicly accessible web crawl that was gathered by the Common Crawl Foundation in 2012. The graph covers over 3.5 billion web pages and 128.7 billion hyperlinks. We an…

Cited by 95 publications (86 citation statements). References 26 publications.
“…If the graph is already encoded, then loading is faster. We loaded the Hyperlink Graph [58], a graph with 128B edges, in about 13 hours (with the larger machine) and the database required 1.4TB of space.…”
Section: Scalability, Updates and Bulk Loading (mentioning)
confidence: 99%
“…Suppose the number of nodes in a Web graph is n; then the height of the corresponding K²-tree will be h = ⌈log_k n⌉. Operations such as forward or backward navigation on the Web graph involve many top-down traversals accompanied by backtracking. For example, on the CNR-2000 Web graph data set [14], which has 325,557 nodes and 3,216,152 edges, we ran an experiment to visit all the neighbors of a given node. We found that the K²-tree was recursively visited 450 times on average before all the neighbors of a given node were found.…”
Section: Limitations of K²-Tree in Representing Web Graphs (mentioning)
confidence: 99%
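The height formula quoted in this excerpt can be checked numerically. A minimal sketch, assuming k = 2 and the ⌈log_k n⌉ formula above; the function name is hypothetical:

```python
import math

def k2tree_height(n: int, k: int = 2) -> int:
    """Height h = ceil(log_k n) of a K^2-tree over an n x n adjacency matrix."""
    return math.ceil(math.log(n, k))

# CNR-2000 Web graph from the excerpt: 325,557 nodes.
print(k2tree_height(325_557))  # 19 levels for k = 2
```

A larger k gives a shallower tree (e.g. 10 levels for k = 4 on the same graph), at the cost of larger internal nodes.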
“…With the rapid growth of emerging applications such as social network analysis, semantic Web analysis, bioinformatics network analysis, and the Internet of Things, there is an urgent need for the capability to process large-scale graphs with billions of vertices [8][9][10][11][12][13][14]. To analyze the structure, behavior and evolution of the World Wide Web, researchers often model the Web as a directed graph, treating Web pages as nodes and the links between pages as directed edges.…”
Section: Introduction (mentioning)
confidence: 99%
“…Thus the reduced matrix G_R = G_rr + G_pr + G_qr allows one to obtain precise information about the group of proteins while taking into account their environment given by the global network. The concept of the reduced Google matrix G_R was introduced in [19] on the basis of the following observation. At present, directed networks of real systems can be very large (about 4.2 million nodes for the English Wikipedia edition in 2013 [18], or 3.5 billion web pages for a publicly accessible web crawl gathered by the Common Crawl Foundation in 2012 [38]). In certain cases one may be interested in the particular interactions among a small reduced subset of N_r nodes, with N_r ≪ N, instead of the interactions in the entire network.…”
mentioning
confidence: 99%
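The reduced Google matrix in this excerpt keeps a small subset of nodes while folding in all indirect paths through the rest of the network. A minimal numerical sketch, assuming a column-stochastic G and the block form G_R = G_rr + G_rs (I − G_ss)⁻¹ G_sr from [19] (of which the G_pr + G_qr terms quoted above are a further decomposition); the matrix and node split here are illustrative, not from the cited data:

```python
import numpy as np

# Illustrative 5-node column-stochastic Google matrix (random for the sketch).
rng = np.random.default_rng(0)
A = rng.random((5, 5))
G = A / A.sum(axis=0)            # each column sums to 1

r = [0, 1]                       # reduced subset (the N_r nodes of interest)
s = [2, 3, 4]                    # remaining nodes of the network

G_rr = G[np.ix_(r, r)]           # direct links inside the subset
G_rs = G[np.ix_(r, s)]
G_sr = G[np.ix_(s, r)]
G_ss = G[np.ix_(s, s)]

# Indirect contribution: paths that leave the subset, wander through s, and
# return; (I - G_ss)^(-1) sums the geometric series of all such path lengths.
indirect = G_rs @ np.linalg.inv(np.eye(len(s)) - G_ss) @ G_sr
G_R = G_rr + indirect

print(G_R.sum(axis=0))           # columns of G_R still sum to 1
```

The column sums remain 1 because G_R is itself a stochastic matrix on the reduced node set, which is what lets it be analyzed as a small self-contained network.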