Inferring Web communities from link topology

Gibson, David; Kleinberg, Jon; Raghavan, Prabhakar

doi:10.1145/276627.276652

Cited by 526 publications

(308 citation statements)

References 16 publications

Supporting

Mentioning

303

Contrasting

Unclassified


“…The first was HITS, which is described in [11], and offers some enlightening practical remarks. The ARC system, described in [7], augments Kleinberg's link-structure analysis by considering also the anchor text, the text which surrounds the hyperlink in the pointing page.…”

Section: Introduction (mentioning)

confidence: 99%

The stochastic approach for link-structure analysis (SALSA) and the TKC effect

2000


R. Lempel, S. Moran. Department of Computer Science, The Technion, Haifa 32000, Israel. Computer Networks (2000).

Abstract. Today, when searching for information on the World Wide Web, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web sites whose contents match the query. For broad topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link-structure of the World Wide Web. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web sites: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he devised an algorithm aimed at finding authoritative sites. We present SALSA, a new stochastic approach for link structure analysis, which examines random walks on graphs derived from the link structure. We show that both SALSA and Kleinberg's mutual reinforcement approach employ the same meta-algorithm. We then prove that SALSA is equivalent to a weighted in-degree analysis of the link-structure of World Wide Web subgraphs, making it computationally more efficient than the mutual reinforcement approach. We compare the results of applying SALSA to the results derived through Kleinberg's approach. These comparisons reveal a topological phenomenon called the TKC effect (Tightly Knit Community) which, in certain cases, prevents the mutual reinforcement approach from identifying meaningful authorities.


“…These vectors are the principal eigenvectors of the matrices AA^T and A^T A. Work in [54,77] has shown that the concepts of hubs and authorities is a fundamental structural feature of the web. The CLEVER system [29] builds on the algorithmic framework of hub and authorities.…”

Section: Mining the Web (mentioning)

confidence: 99%

On Approximation Algorithms for Data Mining Applications

Afrati

2006

Lecture Notes in Computer Science
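
The citation statement above describes hub and authority vectors as principal eigenvectors of AA^T and A^T A, where A is the adjacency matrix of the link graph. The following is a minimal sketch of the mutual reinforcement iteration that converges to those vectors, not code from any of the cited papers. The function name `hits`, the fixed iteration count, and the toy edge list are assumptions made for illustration.

```python
def hits(edges, iterations=50):
    """Power iteration for hub and authority scores (illustrative sketch).

    edges: list of (source, target) pairs, meaning source links to target.
    Returns two dicts, hub and auth, each normalized to unit Euclidean length.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages that link in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        norm = sum(v * v for v in new_auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub update: sum the authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


# Hypothetical toy graph: three hub pages pointing at two candidate authorities.
example_edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h3", "a1")]
hub, auth = hits(example_edges)
```

In this toy graph `a1` is linked from all three hubs and `a2` from only one, so `a1` ends up with the larger authority score, and `h1`, which points at both, ends up with the largest hub score.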


“…An interesting problem associated with the Web is the definition and delineation of so called Web communities [9], [10], [11], [12], [13], [14]. A web community is loosely defined to be a collection of content creators that share a common interest or topic and manifests itself as a highly interconnected aggregate or subgraph [9].…”

Section: Web Communities (mentioning)

confidence: 99%

“…In order to coalesce concepts A, C, D and the concept labelled I, we need to introduce edges (4, 2) and (5, 9) to relation I in order to make the bipartite graph defined by {1, 3, 4, 5, 7}, {2, 6, 9} complete. To coalesce concepts B, E, F, and H, we need to add (2, 12), (6,11) and (8,11) to the original relation to make the bipartite graph defined by {2, 6, 8, 9}, {10, 11, 12} complete. …”

Section: A Simple Example (mentioning)

confidence: 99%

Towards a Formal Concept Analysis Approach to Exploring Communities on the World Wide Web

Rome

Haralick

2005

Formal Concept Analysis


Abstract. An interesting problem associated with the World Wide Web (Web) is the definition and delineation of so called Web communities. The Web can be characterized as a directed graph whose nodes represent Web pages and whose edges represent hyperlinks. An authority is a page that is linked to by high quality hubs, while a hub is a page that links to high quality authorities. A Web community is a highly interconnected aggregate of hubs and authorities. We define a community core to be a maximally connected bipartite subgraph of the Web graph. We observe that the web subgraph can be viewed as a formal context and that web communities can be modeled by formal concepts. Additionally, the notions of hub and authority are captured by the extent and intent, respectively, of a concept. Though Formal Concept Analysis (FCA) has previously been applied to the Web, none of the FCA based approaches that we are aware of consider the link structure of the Web pages. We utilize notions from FCA to explore the community structure of the Web graph. We discuss the problem of utilizing this structure to locate and organize communities in the form of a knowledge base built from the resulting concept lattice and discuss methods to reduce the complexity of the knowledge base by coalescing similar Web communities. We present preliminary experimental results obtained from real Web data that demonstrate the usefulness of FCA for improving Web search.
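
The "A Simple Example" statement quoted above coalesces communities by adding the edges needed to make a bipartite graph complete. A rough sketch of that check follows. The helper name `missing_edges` is an assumption, and the relation is hypothetical: the paper's original relation I is not reproduced on this page, so the relation is built here so that the two missing pairs match the quoted ones, (4, 2) and (5, 9).

```python
def missing_edges(relation, left, right):
    """Return the (l, r) pairs absent from relation, in sorted order.

    Adding these pairs would make the bipartite graph on left x right complete.
    """
    return sorted((l, r) for l in left for r in right if (l, r) not in relation)


# Node sets taken from the quoted example.
left = {1, 3, 4, 5, 7}
right = {2, 6, 9}

# Hypothetical relation: every left-right pair except the two quoted as missing.
relation = {(l, r) for l in left for r in right} - {(4, 2), (5, 9)}

to_add = missing_edges(relation, left, right)
```

With this made-up relation, `to_add` holds exactly the two pairs named in the quote, and adding them yields a complete bipartite graph on the two node sets.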

