Proceedings of the 29th ACM International Conference on Information & Knowledge Management 2020
DOI: 10.1145/3340531.3412758

Little Ball of Fur

Abstract: Sampling graphs is an important task in data mining. In this paper, we describe Little Ball of Fur, a Python library that includes more than twenty graph sampling algorithms. Our goal is to make node, edge, and exploration-based network sampling techniques accessible to a large number of professionals, researchers, and students in a single streamlined framework. We created this framework with a focus on a coherent application public interface which has a convenient design, generic input data requirements, and r…

citations
Cited by 27 publications
(10 citation statements)
references
References 40 publications
“…For a budget of n nodes in a sample, RN selects n random nodes from 𝑉, while RE selects enough edges from 𝐸 until the number of unique endpoints of edges (nodes) equals n (Ribeiro & Towsley, 2010). However, large networks seldom have all nodes and edges initially accessible or at least feasibly reachable (Rozemberczki et al., 2020b). A common reason is saving extremely large networks in relatively slow-access storage mediums. Furthermore, organizations use systems with limited memory and therefore can only load and view a small proportion of a stored network.…”
Section: Introduction (mentioning)
confidence: 99%
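
The RN and RE procedures described in the statement above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the Little Ball of Fur API and not the citing authors' code: the function names `random_node_sample` and `random_edge_sample` are hypothetical, the graph is given as plain collections of nodes and `(u, v)` edge tuples, and RE skips any edge that would push the sample past the budget so that the node count never exceeds n.

```python
import random


def random_node_sample(nodes, n, seed=None):
    """RN sketch: pick n distinct nodes uniformly at random."""
    rng = random.Random(seed)
    return set(rng.sample(list(nodes), n))


def random_edge_sample(edges, n, seed=None):
    """RE sketch: add random edges until n unique endpoints are collected.

    Assumption: an edge that would overshoot the budget is skipped, so the
    result holds at most n nodes (fewer if the budget cannot be met).
    """
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)

    sampled_nodes = set()
    sampled_edges = []
    for u, v in shuffled:
        if len(sampled_nodes) >= n:
            break
        new_endpoints = {u, v} - sampled_nodes
        if len(sampled_nodes) + len(new_endpoints) > n:
            continue  # would exceed the node budget
        sampled_edges.append((u, v))
        sampled_nodes |= new_endpoints
    return sampled_nodes, sampled_edges


if __name__ == "__main__":
    # Hypothetical toy graph: a cycle on six nodes.
    cycle_nodes = range(6)
    cycle_edges = [(i, (i + 1) % 6) for i in range(6)]
    print(random_node_sample(cycle_nodes, 3, seed=0))
    print(random_edge_sample(cycle_edges, 4, seed=0))
```

Both functions assume the whole node or edge list is available up front, which is exactly the limitation the statement raises for very large networks.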
“…The quality of sampled nodes can be evaluated by a metric that summarizes connectivity, clustering, degrees, or other characteristics of the sample. Many algorithms follow this general heuristic of starting at a single node 𝑠 and iteratively adding unvisited nodes 𝑣 in 𝑉 adjacent to nodes already in the sample (Rozemberczki et al., 2020b). To compare different exploration-based sampling algorithms, establishing some time or resource bound 𝐵 per sample creates an even field for comparisons.…”
Section: Introduction (mentioning)
confidence: 99%
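
The general exploration heuristic quoted above (start at one node, then repeatedly add an unvisited node adjacent to the sample) might look like the following sketch. It is a generic illustration, not a specific algorithm from the library or the citing paper: the name `exploration_sample` is hypothetical, the graph is assumed to be a dict mapping each node to a list of neighbours, the next node is chosen uniformly at random from the frontier, and the bound B is interpreted as a maximum number of sampled nodes.

```python
import random


def exploration_sample(adjacency, start, budget, seed=None):
    """Grow a sample from `start` by repeatedly adding a frontier node.

    The frontier holds unvisited nodes adjacent to at least one sampled node.
    Assumption: `budget` caps the number of sampled nodes.
    """
    rng = random.Random(seed)
    sampled = [start]
    seen = {start}   # nodes already sampled or already on the frontier
    frontier = []

    def extend_frontier(u):
        for v in adjacency.get(u, ()):
            if v not in seen:
                seen.add(v)
                frontier.append(v)

    extend_frontier(start)
    while frontier and len(sampled) < budget:
        # Swap a random frontier entry to the end so it can be popped in O(1).
        i = rng.randrange(len(frontier))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        v = frontier.pop()
        sampled.append(v)
        extend_frontier(v)
    return sampled


if __name__ == "__main__":
    # Hypothetical toy graph: a path 0 - 1 - 2 - 3 - 4.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(exploration_sample(path, start=0, budget=3, seed=1))
```

Changing how the next frontier node is picked (first-in-first-out, last-in-first-out, or weighted by degree) gives different exploration strategies while the budget keeps their cost comparable.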