To comprehend the multipartite organization of large-scale biological and social systems, we introduce an information-theoretic approach that reveals community structure in weighted and directed networks. We use the probability flow of random walks on a network as a proxy for information flows in the real system and decompose the network into modules by compressing a description of the probability flow. The result is a map that both simplifies and highlights the regularities in the structure and their relationships. We illustrate the method by making a map of scientific communication as captured in the citation patterns of >6,000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. Along the backbone of the network (including physics, chemistry, molecular biology, and medicine), information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences.
clustering | compression | information theory | map of science | bibliometrics
Biological and social systems are differentiated, multipartite, integrated, and dynamic. Data about these systems, now available on unprecedented scales, often are schematized as networks. Such abstractions are powerful (1, 2), but even as abstractions they remain highly complex. It therefore is helpful to decompose the myriad nodes and links into modules that represent the network (3-5). A cogent representation will retain the important information about the network and reflect the fact that interactions between the elements in complex systems are weighted, directional, interdependent, and conductive.
Good representations both simplify and highlight the underlying structures and the relationships that they depict; they are maps (6, 7).
To create a good map, the cartographer must attain a fine balance between omitting important structures by oversimplification and obscuring significant relationships in a barrage of superfluous detail. The best maps convey a great deal of information but require minimal bandwidth: the best maps are also good compressions. By adopting an information-theoretic approach, we can measure how efficiently a map represents the underlying geography, and we can measure how much detail is lost in the process of simplification, which allows us to quantify and resolve the cartographer's tradeoff.
Network Maps and Coding Theory
In this article, we use maps to describe the dynamics across the links and nodes in directed, weighted networks that represent the local interactions among the subunits of a system. These local interactions induce a system-wide flow of information that characterizes the behavior of the full system (8-12). Consequently, if we want to understand how network structure relates to system behavior, we need to understand the flow of information on the network. We therefore identify the modules that compose the network by finding an efficiently coarse-grained description of how information flows on the ne...
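The probability flow of a random walk on a directed, weighted network, as described in the excerpt above, can be illustrated with a small power-iteration sketch. This is not the authors' code: the function name `stationary_flow`, the dict-of-dicts input format, and the teleportation probability of 0.15 (used here only to guarantee a unique stationary distribution on directed networks) are assumptions made for illustration.

```python
def stationary_flow(links, teleport=0.15, tol=1e-12, max_iter=10_000):
    """Approximate the visit frequencies of a random walker.

    links: dict mapping source node -> {target node: link weight}.
    teleport: assumed probability of jumping to a uniformly random node.
    """
    nodes = sorted(set(links) | {v for out in links.values() for v in out})
    n = len(nodes)
    flow = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        nxt = dict.fromkeys(nodes, 0.0)
        jump = 0.0  # probability mass that teleports this step
        for u in nodes:
            out = links.get(u, {})
            strength = sum(out.values())
            if strength > 0:
                # Follow out-links in proportion to their weights.
                for v, w in out.items():
                    nxt[v] += (1.0 - teleport) * flow[u] * w / strength
                jump += teleport * flow[u]
            else:
                # Dangling node: all of its flow teleports.
                jump += flow[u]
        for v in nodes:
            nxt[v] += jump / n
        change = sum(abs(nxt[u] - flow[u]) for u in nodes)
        flow = nxt
        if change < tol:
            break
    return flow


# Hypothetical toy network: "b" receives links from both "a" and "c".
toy = {"a": {"b": 2.0}, "c": {"b": 1.0}, "b": {"a": 1.0}}
print(stationary_flow(toy))
```

In this toy network the walker spends the most time on `b`, which collects flow from both other nodes, and the least on `c`, which is reached only by teleportation.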
Many real-world networks are so large that we must simplify their structure before we can extract useful information about the systems they represent. As the tools for doing these simplifications proliferate within the network literature, researchers would benefit from some guidelines about which of the so-called community detection algorithms are most appropriate for the structures they are studying and the questions they are asking. Here we show that different methods highlight different aspects of a network's structure and that the sort of information that we seek to extract about the system must guide us in our decision. For example, many community detection algorithms, including the popular modularity maximization approach, infer module assignments from an underlying model of the network formation process. However, we are not always as interested in how a system's network structure was formed as we are in how a network's extant structure influences the system's behavior. To see how structure influences current behavior, we will recognize that links in a network induce movement across the network and result in systemwide interdependence. In doing so, we explicitly acknowledge that most networks carry flow. To highlight and simplify the network structure with respect to this flow, we use the map equation. We present an intuitive derivation of this flow-based and information-theoretic method and provide an interactive online application that anyone can use to explore the mechanics of the map equation. The differences between the map equation and the modularity maximization approach are not merely conceptual. Because the map equation attends to patterns of flow on the network and the modularity maximization approach does not, the two methods can yield dramatically different results for some network structures. To illustrate this and build our understanding of each method, we partition several sample networks.
We also describe an algorithm and provide source code to efficiently decompose large weighted and directed networks based on the map equation.
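To make the idea of a flow-based description length concrete, here is a minimal sketch of the two-level map equation for a fixed partition. It is not the source code the abstract refers to. As a simplifying assumption it handles only undirected weighted networks, where a node's visit frequency is proportional to its total link weight and no teleportation is needed; the function name `map_equation` and the input format are hypothetical.

```python
from collections import defaultdict
from math import log2


def plogp(p):
    """Return p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def map_equation(edges, module_of):
    """Description length in bits per step of a random walk.

    edges: list of (u, v, weight) for an undirected network (assumption).
    module_of: dict mapping each node to its module label.
    """
    total = 2.0 * sum(w for _, _, w in edges)
    node_flow = defaultdict(float)  # visit frequency of each node
    exit_flow = defaultdict(float)  # probability of leaving each module
    for u, v, w in edges:
        node_flow[u] += w / total
        node_flow[v] += w / total
        if module_of[u] != module_of[v]:
            exit_flow[module_of[u]] += w / total
            exit_flow[module_of[v]] += w / total
    module_flow = defaultdict(float)  # within-module flow plus exit flow
    for node, p in node_flow.items():
        module_flow[module_of[node]] += p
    for m in list(module_flow):
        module_flow[m] += exit_flow[m]
    q = sum(exit_flow.values())
    # Expanded form of q*H(Q) + sum_i p_i*H(P_i).
    return (plogp(q)
            - 2.0 * sum(plogp(x) for x in exit_flow.values())
            - sum(plogp(p) for p in node_flow.values())
            + sum(plogp(p) for p in module_flow.values()))


# Hypothetical example: two triangles joined by a single link.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
one_module = {n: 0 for n in range(6)}
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(map_equation(edges, one_module), map_equation(edges, two_modules))
```

For this example the two-module partition gives a shorter description than placing all nodes in one module, which is the sense in which the partition captures the structure of the flow. Searching over partitions, which the algorithm mentioned above does, is outside the scope of this sketch.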
Carbon dioxide (CO2) evasion from streams and rivers to the atmosphere represents a substantial flux in the global carbon cycle (refs 1-3). The proportions of CO2 emitted from streams and rivers that come from terrestrially derived CO2 or from CO2 produced within freshwater ecosystems through aquatic metabolism are not well quantified. Here we estimated CO2 emissions from running waters in the contiguous United States, based on freshwater chemical and physical characteristics and modelled gas transfer velocities at 1463 United States Geological Survey monitoring sites. We then assessed CO2 production from aquatic metabolism, compiled from previously published measurements of net ecosystem production from 187 streams and rivers across the contiguous United States. We find that CO2 produced by aquatic metabolism contributes about 28% of CO2 evasion from streams and rivers with flows between 0.0001 and 19,000 m³ s⁻¹. We mathematically modelled CO2 flux from groundwater into running waters along a stream-river continuum to evaluate the relationship between stream size and CO2 source. Terrestrially derived CO2 dominates emissions from small streams, and the percentage of CO2 emissions from aquatic metabolism increases with stream size. We suggest that the relative role of rivers as conduits for terrestrial CO2 efflux and as reactors mineralizing terrestrial organic carbon is a function of their size and connectivity with landscapes.
Inland waters play a central role in the global carbon (C) cycle by transforming, outgassing and storing more than half of the C they receive from terrestrial ecosystems before delivery to oceans (refs 1-3). Terrestrial C inputs to freshwaters are often of similar magnitude to terrestrial net ecosystem production (NEP; refs 1,2,4). Consequently, ignoring inland waters in landscape C budgets may overestimate terrestrial CO2 uptake and storage (refs 1,5).
In fact, not accounting for terrestrial C exports to and emissions from freshwaters could bias terrestrial NEP and net ecosystem exchange measurements by 4-60% (refs 6-8). Despite small areal coverage, running waters are hotspots for CO2 emissions (refs 3,9), with high rates of outgassing relative to lake and terrestrial ecosystems on an areal basis (refs 3,10,11). Given their significant role in landscape C transformations, transport and emissions, there is a fundamental need to understand rates and drivers of C cycling in running waters.
A mechanistic understanding of the processes regulating CO2 emissions from streams and rivers is necessary for sound predictions of the present and future role of freshwaters in global C cycling and the climate system. Inland waters are often supersaturated with CO2 due to inputs of terrestrially derived CO2 and in situ aquatic mineralization of terrestrial OC (refs 12-15) (hereafter, 'internal production') as well as abiotic CO2 production (Supplementary Section 1). CO2 concentrations and emissions from running waters will thus vary with changes in land cover, climate, terrestrial ecosystem processes, land-water c...
To understand the structure of a large-scale biological, social, or technological network, it can be helpful to decompose the network into smaller subunits or modules. In this article, we develop an information-theoretic foundation for the concept of modularity in networks. We identify the modules of which the network is composed by finding an optimal compression of its topology, capitalizing on regularities in its structure. We explain the advantages of this approach and illustrate them by partitioning a number of real-world and model networks.
clustering | compression | information theory
Many objects in nature, from proteins to humans, interact in groups that compose social (1), technological (2), or biological systems (3). The groups form a distinct intermediate level between the microscopic and macroscopic descriptions of the system, and group structure may often be coupled to aspects of system function including robustness (3) and stability (4). When we map the interactions among components of a complex system to a network with nodes connected by links, these groups of interacting objects form highly connected modules that are only weakly connected to one another. We can therefore comprehend the structure of a dauntingly complex network by identifying the modules or communities of which it is composed (5-10). When we describe a network as a set of interconnected modules, we are highlighting certain regularities of the network's structure while filtering out the relatively unimportant details. Thus, a modular description of a network can be viewed as a lossy compression of that network's topology, and the problem of community identification as a problem of finding an efficient compression of the structure.
This view suggests that we can approach the challenge of identifying the community structure of a complex network as a fundamental problem in information theory (11-13).
We provide the groundwork for an information-theoretic approach to community detection and explore the advantages of this approach relative to other methods for community detection. Fig. 1 illustrates our basic framework for identifying communities. We envision the process of describing a complex network by a simplified summary of its module structure as a communication process. The link structure of a complex network is a random variable X; a signaler knows the full form of the network X and aims to convey much of this information in a reduced fashion to a signal receiver. To do so, the signaler encodes information about X as some simplified description Y. She sends the encoded message through a noiseless communication channel. The signal receiver observes the message Y and then "decodes" this message, using it to make guesses Z about the structure of the original network X.
There are many different ways to describe a network X by a simpler description Y. Which of these is best? The answer to this question of course depends on what you want to do with the description. Nonetheless, information theory offers an appealing general answer to th...
Random walks on networks are the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking and spreading analysis, although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and although we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.
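The difference between first-order and second-order Markov dynamics can be seen in a small sketch that estimates transition probabilities from observed pathways. The function name `transition_probs` and the toy itineraries are hypothetical and are not taken from the paper's data or code; the estimate is a plain relative-frequency count.

```python
from collections import Counter, defaultdict


def transition_probs(paths, order):
    """Estimate P(next node | previous `order` nodes) from pathways.

    paths: iterable of node sequences, for example travel itineraries.
    order: memory length (1 = first-order, 2 = second-order).
    """
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            state = tuple(path[i - order:i])  # the remembered history
            counts[state][path[i]] += 1
    return {
        state: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for state, ctr in counts.items()
    }


# Hypothetical round trips through a hub "B".
paths = [["A", "B", "A"], ["C", "B", "C"]]
print(transition_probs(paths, 1)[("B",)])      # memoryless view from B
print(transition_probs(paths, 2)[("A", "B")])  # view from B given arrival from A
```

In this toy example the first-order model sends flow from the hub to `A` and `C` with equal probability, while the second-order model recovers that travellers who arrived from `A` return to `A`. This is the kind of dependence on where flow comes from that the abstract describes.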