It is common in the study of networks to investigate intermediate-sized (or “meso-scale”) features to gain an understanding of network structure and function. For example, numerous algorithms have been developed to identify “communities,” which are typically construed as sets of nodes with denser connections internally than with the remainder of a network. In this paper, we adopt a complementary perspective that “communities” are associated with bottlenecks of locally-biased dynamical processes that begin at seed sets of nodes, and we employ several different community-identification procedures (using diffusion-based and geodesic-based dynamics) to investigate community quality as a function of community size. Using several empirical and synthetic networks, we identify three distinct scenarios for “size-resolved community structure” that can arise in real (and realistic) networks: (i) the best small groups of nodes can be better than the best large groups (for a given formulation of the idea of a good community); (ii) the best small groups can have a quality that is comparable to the best medium-sized and large groups; and (iii) the best small groups of nodes can be worse than the best large groups. As we discuss in detail, which of these three cases holds for a given network can make an enormous difference when investigating and making claims about network community structure, and it is important to take this into account to obtain reliable downstream conclusions. Depending on which scenario holds, one may or may not be able to successfully identify “good” communities in a given network (and good communities might not even exist for a given community quality measure), the manner in which different small communities fit together to form meso-scale network structures can be very different, and processes such as viral propagation and information diffusion can exhibit very different dynamics.
In addition, our results suggest that, for many large realistic networks, the output of locally-biased methods that focus on communities that are centered around a given seed node might have better conceptual grounding and greater practical utility than the output of global community-detection methods. They also illustrate subtler structural properties that are important to consider in the development of better benchmark networks to test methods for community detection.
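The locally-biased, size-resolved idea above can be made concrete with a toy sketch. This is not the paper’s actual diffusion- or geodesic-based procedure; it is a deliberately simple greedy stand-in that grows a community outward from a seed node and records the conductance (cut edges divided by the smaller side’s volume; lower is better) at each size, yielding a crude quality-versus-size profile. The function names and the greedy rule are illustrative assumptions.

```python
def conductance(adj, S):
    """phi(S) = cut(S, V\\S) / min(vol(S), vol(V\\S)); lower is better.
    `adj` maps each node to the set of its neighbors."""
    vol_S = sum(len(adj[v]) for v in S)
    vol_rest = sum(len(adj[v]) for v in adj) - vol_S
    cut = sum(1 for v in S for u in adj[v] if u not in S)
    return cut / min(vol_S, vol_rest)

def grow_community(adj, seed, max_size):
    """Greedy locally-biased growth from a seed (toy stand-in for the
    dynamics-based procedures in the text): repeatedly absorb the boundary
    node that most lowers conductance, recording the best set at each size."""
    S = {seed}
    best = {1: (conductance(adj, S), set(S))}
    while len(S) < max_size:
        boundary = {u for v in S for u in adj[v]} - S
        if not boundary:
            break
        u = min(boundary, key=lambda w: conductance(adj, S | {w}))
        S.add(u)
        best[len(S)] = (conductance(adj, S), set(S))
    return best
```

On a graph made of two triangles joined by a single edge, growing from a node in one triangle reaches its best (lowest) conductance exactly when the community is that triangle, which is the kind of size-dependent signal the abstract describes.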
set of signals (usually time series) at each of a collection of pixels (in two dimensions) or voxels (in three dimensions). Building from such data, various forms of higher-level data representations are employed in neuroimaging. Traditionally, two- and three-dimensional images have, naturally, been the norm, but in recent years there has emerged substantial interest in network-based representations.

1.1. Motivation. Let G = (V, E) denote a graph with d = |V| vertices. In this setting, the vertices v ∈ V correspond to regions of interest (ROIs) in the brain, often pre-defined through considerations of the underlying neurobiology (e.g., the putamen or the cuneus). Edges {u, v} ∈ E between vertices u and v denote a measure of association between the corresponding ROIs. Depending on the imaging modality used, the notion of 'association' may vary. For example, in diffusion tensor imaging (DTI), associations are taken to be representative of structural connectivity between brain regions. In functional magnetic resonance imaging (fMRI), on the other hand, associations are instead thought to represent functional connectivity, in the sense that the two regions of the brain participate together in the achievement of some higher-order function, often in the context of performing some task (e.g., counting from 1 to 10).

With neuroimaging now a standard tool in clinical neuroscience, and with the advent of several major neuroscience research initiatives, perhaps the most prominent being the recently announced Brain Research Accelerated by Innovative Neurotechnologies (BRAIN) initiative, we are quickly moving towards a time in which we will have available databases composed of large collections of secondary data in the form of network-based data objects.
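As a concrete (and deliberately simplified) illustration of the functional-connectivity construction described above, the sketch below builds G = (V, E) by thresholding pairwise correlations between ROI time series. The ROI labels, the synthetic data, and the correlation threshold are illustrative assumptions, not part of any real imaging pipeline.

```python
import math

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def connectivity_graph(series, threshold):
    """Build G = (V, E): vertices are ROIs, and an edge {u, v} appears when
    the two ROI time series correlate above `threshold` in absolute value
    (a toy stand-in for a functional-connectivity 'association')."""
    V = list(series)
    E = {frozenset((u, v))
         for i, u in enumerate(V) for v in V[i + 1:]
         if abs(pearson(series[u], series[v])) > threshold}
    return V, E
```

Feeding in a dictionary mapping ROI names to time series, two strongly co-varying ROIs end up joined by an edge while an unrelated noise channel stays disconnected.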
Faced with databases in which networks are a fundamental unit of data, it will be necessary to have in place the statistical tools to answer such questions as, "What is the 'average' of a collection of networks?" and "Do these networks differ, on average, from a given nominal network?," as well as "Do two collections of networks differ on average?" and "What factors (e.g., age, gender, etc.) appear to contribute to differences in networks?", or finally, say, "Has there been a change in the networks for a given subpopulation from yesterday to today?" In order to answer these and similar questions, we require network-based analogues of classical tools for statistical estimation and hypothesis testing.

While these classical tools are among the most fundamental and ubiquitous in practice, their extension to network-based datasets is not immediate and, in fact, can be expected to be highly non-trivial. The main challenge in such an extension is due to the simple fact that networks are not Euclidean objects (for which classical methods were developed); rather, they are combinatorial objects, defined simply through their sets of vertices and edges. Nevertheless, our work here demonstrates that networks can be associated with ce...
The modified Bessel function of the first kind, I_ν(x), arises in numerous areas of study, such as physics, signal processing, probability, and statistics. As such, there has been much interest in recent years in deducing properties of functionals involving I_ν(x), in particular of the ratio I_{ν+1}(x)/I_ν(x) when ν, x ≥ 0. In this paper we establish sharp upper and lower bounds on H(ν, x) = ∑_{k=1}^∞ I_{ν+k}(x)/I_ν(x) for ν, x ≥ 0, which appears as the complementary cumulative hazard function for a Skellam(λ, λ) probability distribution in the statistical analysis of networks. Our technique relies on bounding existing estimates of I_{ν+1}(x)/I_ν(x) from above and below by quantities with nicer algebraic properties, namely exponentials, to better evaluate the sum, while optimizing their rates in the regime ν + 1 ≤ x in order to maintain their precision. We demonstrate the relevance of our results through applications: providing an improvement on the well-known asymptotic exp(−x) I_ν(x) ∼ 1/√(2πx) as x → ∞, bounding P[W = ν] from above and below for W ∼ Skellam(λ₁, λ₂), and deriving a novel two-sided concentration inequality for the Skellam(λ, λ) probability distribution.
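To make the objects in this abstract concrete, here is a minimal numerical sketch (pure standard library, using the power series for I_ν(x) rather than the paper’s bounds): it evaluates the ratio I_{ν+1}(x)/I_ν(x) and a truncation of H(ν, x). The truncation limits are illustrative choices adequate for small and moderate x.

```python
import math

def bessel_i(nu, x, terms=60):
    """Modified Bessel function of the first kind via its power series:
    I_nu(x) = sum_{m>=0} (x/2)^(2m+nu) / (m! * Gamma(m+nu+1)).
    The fixed term count is an illustrative choice, fine for moderate x."""
    return sum((x / 2.0) ** (2 * m + nu)
               / (math.factorial(m) * math.gamma(m + nu + 1))
               for m in range(terms))

def hazard_tail(nu, x, kmax=80):
    """Truncation of H(nu, x) = sum_{k=1}^inf I_{nu+k}(x) / I_nu(x).
    The summands decay rapidly in k, so a modest kmax suffices here."""
    denom = bessel_i(nu, x)
    return sum(bessel_i(nu + k, x) for k in range(1, kmax + 1)) / denom
```

Two sanity checks follow directly: the ratio I_{ν+1}(x)/I_ν(x) lies strictly between 0 and 1 for x > 0 (which is what makes the sum H(ν, x) converge), and for large x the product exp(−x) I_ν(x) √(2πx) approaches 1, matching the asymptotic quoted above.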
Our work in this paper is inspired by a statistical observation that is both elementary and broadly relevant to network analysis in practice: the uncertainty in approximating some true network graph G = (V, E) by some estimated graph Ĝ = (V, Ê) manifests as errors in the status of (non)edges that must necessarily propagate to any estimates of network summaries η(G) we seek. Motivated by the common practice of using plug-in estimates η(Ĝ) as proxies for η(G), our focus is on the problem of characterizing the distribution of the discrepancy D = η(Ĝ) − η(G) in the case where η(·) is a subgraph count. Specifically, we study the fundamental case where the statistic of interest is |E|, the number of edges in G. Our primary contribution in this paper is to show that, in the empirically relevant setting of large graphs with low-rate measurement errors, the distribution of D_E = |Ê| − |E| is well-characterized by a Skellam distribution when the errors are independent or weakly dependent. Under an assumption of independent errors, we further show conditions under which this characterization is strictly better than that of an appropriate normal distribution. These results derive from our formulation of a general result quantifying the accuracy with which the difference of two sums of dependent Bernoulli random variables may be approximated by the difference of two independent Poisson random variables, i.e., by a Skellam distribution. This general result is developed through the use of Stein's method and may be of some general interest. We finish with a discussion of possible extensions of our work to subgraph counts η(G) of higher order.
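A quick Monte Carlo sketch shows why a Skellam limit is natural here. This is an illustration with independent errors and invented rates, not the paper’s Stein’s-method argument: spurious edge additions and edge deletions at low rates are each approximately Poisson, so their difference D_E = |Ê| − |E| is approximately Skellam(λ₁, λ₂), with mean λ₁ − λ₂ and variance λ₁ + λ₂.

```python
import random

def simulate_discrepancy(n_edges, n_nonedges, p_del, p_add, trials, seed=0):
    """Monte Carlo sketch of the edge-count discrepancy: each true edge is
    independently dropped with probability p_del, and each non-edge is
    spuriously added with probability p_add, so that
    D_E = |E_hat| - |E| = (#additions) - (#deletions)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        dels = sum(rng.random() < p_del for _ in range(n_edges))
        adds = sum(rng.random() < p_add for _ in range(n_nonedges))
        samples.append(adds - dels)
    return samples
```

With low error rates, the empirical mean and variance of the samples track λ₁ − λ₂ and λ₁ + λ₂ for λ₁ = n_nonedges·p_add and λ₂ = n_edges·p_del, which is exactly the first- and second-moment signature of a Skellam(λ₁, λ₂) distribution.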