The changing world of scholarly communication and the emerging new wave of ‘Open Science’ or ‘Open Research’ have brought to light a number of controversial and hotly debated topics. Evidence-based rational debate is regularly drowned out by misinformed or exaggerated rhetoric, which does not benefit the evolving system of scholarly communication. This article aims to provide a baseline evidence framework for ten of the most contested topics, in order to help frame and move forward discussions, practices, and policies. We address issues around preprints and scooping, the practice of copyright transfer, the function of peer review, predatory publishers, and the legitimacy of ‘global’ databases. These arguments and data will be a powerful tool against misinformation across wider academic research, policy and practice, and will inform changes within the rapidly evolving scholarly publishing system.
Many modern network datasets arise from processes of interactions in a population, such as phone calls, email exchanges, co-authorships, and professional collaborations. In such interaction networks, the edges comprise the fundamental statistical units, making a framework for edge-labeled networks more appropriate for statistical analysis. In this context we initiate the study of edge exchangeable network models and explore their basic statistical properties. Several theoretical and practical features make edge exchangeable models better suited to many applications in network analysis than more common vertex-centric approaches. In particular, edge exchangeable models allow for sparse structure and power law degree distributions, both of which are widely observed empirical properties that cannot be handled naturally by more conventional approaches. Our discussion culminates in the Hollywood model, which we identify here as the canonical family of edge exchangeable distributions. The Hollywood model is computationally tractable, admits a clear interpretation, exhibits good theoretical properties, and performs reasonably well in estimation and prediction, as we demonstrate on real network datasets. As a generalization of the Hollywood model, we further identify the vertex components model as a nonparametric subclass of models with a convenient stick-breaking construction.
We study a family of Markov processes on $\mathcal{P}^{(k)}$, the space of partitions of the natural numbers with at most $k$ blocks. The process can be constructed from a Poisson point process whose intensity involves $\varrho_\nu^{(k)}$, where $\varrho_\nu$ is the distribution of the paintbox based on the probability measure $\nu$ on $\mathcal{P}_m$, the set of ranked-mass partitions of 1, and $\varrho_\nu^{(k)}$ is the associated product measure. We show that these processes possess a unique stationary measure, and we discuss a particular set of reversible processes for which transition probabilities can be written down explicitly.
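To make the paintbox construction mentioned above concrete, here is a minimal sketch of the standard Kingman paintbox for a single fixed ranked-mass partition, not the Markov process studied in the paper. Each element of {1, ..., n} independently picks block j with probability equal to the j-th mass, and any leftover mass ("dust") yields singleton blocks. The function name `paintbox_partition` and its parameters are assumptions made for illustration.

```python
import random
from collections import defaultdict


def paintbox_partition(masses, n, rng=None):
    """Sample a partition of {1, ..., n} from a paintbox (illustrative sketch).

    masses: nonnegative block masses with sum <= 1 (a ranked-mass partition of 1
            when they sum to 1 and are listed in decreasing order).
    Elements that fall in the leftover mass 1 - sum(masses) become singletons.
    """
    rng = rng or random.Random()
    if any(m < 0 for m in masses) or sum(masses) > 1 + 1e-9:
        raise ValueError("masses must be nonnegative and sum to at most 1")

    blocks = defaultdict(list)
    singletons = []
    for i in range(1, n + 1):
        u = rng.random()  # uniform draw on [0, 1)
        cumulative = 0.0
        for j, m in enumerate(masses):
            cumulative += m
            if u < cumulative:
                blocks[j].append(i)  # element i is painted with color j
                break
        else:
            singletons.append([i])  # element i landed in the dust
    # Order blocks by their smallest element, the usual convention.
    return sorted(list(blocks.values()) + singletons, key=lambda b: b[0])
```

With exactly k masses that sum to 1, every sampled partition has at most k blocks, which matches the state space $\mathcal{P}^{(k)}$ described above.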
Exchangeable models for countable vertex-labeled graphs cannot replicate the large sample behaviors of sparsity and power law degree distribution observed in many network datasets. Out of this mathematical impossibility emerges the question of how network data can be modeled in a way that reflects known empirical behaviors and respects basic statistical principles. We address this question by observing that edges, not vertices, act as the statistical units in networks constructed from interaction data, making a theory of edge-labeled networks more natural for many applications. In this context we introduce the concept of edge exchangeability, which unlike its vertex exchangeable counterpart admits models for networks with sparse and/or power law structure. Our characterization of edge exchangeable networks gives rise to a class of nonparametric models, akin to graphon models in the vertex exchangeable setting. Within this class, we identify a tractable family of distributions with a clear interpretation and suitable theoretical properties, whose significance in estimation, prediction, and testing we demonstrate.

Is there a notion of probabilistic symmetry whose ergodic measures [...] describe useful statistical models for sparse graphs with network properties? (Orbanz and Roy, 2015, p. 459)

We address this question as part of our broader development of edge exchangeable network models, which are most appropriate when the edges are the statistical units, as they are for the interaction processes we study. We define this setup more precisely in Sections 2-3 and go on to establish basic properties of edge exchangeable models throughout Sections 4-8.

Interaction data

Definition 2.1 (Interaction data). For a set $\mathcal{P}$, we write $\operatorname{fin}(\mathcal{P})$ to denote the set of all finite (ordered) multisets of $\mathcal{P}$. An interaction process for a population $\mathcal{P}$ is a correspondence $\mathcal{I} : I \to \operatorname{fin}(\mathcal{P})$ between a set $I$ indexing interactions and finite multisets of $\mathcal{P}$.
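$$\mathcal{E}_{\mathcal{I}} = \bigcup_{\mathcal{P}' : \#\mathcal{P}' = \#\mathcal{P}} \{ \mathcal{I}' : S \to \operatorname{fin}(\mathcal{P}') : \rho\mathcal{I}' = \mathcal{I} \text{ for some bijection } \rho : \mathcal{P}' \to \mathcal{P} \}.$$

The equivalence class $\mathcal{E}_{\mathcal{I}}$ corresponds to an edge-labeled network structure, as in Figure 1(c). Note that $\mathcal{E}_{\mathcal{I}}$ does not depend on the specific set $\mathcal{P}$, and so we may disregard $\mathcal{P}$, or treat it implicitly as $\mathcal{P} = \mathbb{N}$, in our discussion. We write $\mathcal{E}_S$ for the set of networks with edges labeled in $S \subseteq \mathbb{N}$.

For any $S' \subseteq S$, we define the restriction of $E \in \mathcal{E}_S$ to $\mathcal{E}_{S'}$ by $E|_{S'}$, the edge-labeled network obtained by removing any edges labeled in $S \setminus S'$. If $E = \mathcal{E}_{\mathcal{I}}$ for some interaction process $\mathcal{I} : S \to \operatorname{fin}(\mathcal{P})$, then $E|_{S'}$ is the edge-labeled network induced by the restricted process $\mathcal{I}|_{S'} : S' \to \operatorname{fin}(\mathcal{P})$, $s \mapsto \mathcal{I}(s)$.

Remark 2.2. For clarity we reserve the term graph to specifically refer to a vertex-labeled structure, such as the objects given in (1), (2), and Figure 1(b). We use the term network for the generic unlabeled structure in Figure 1(a) and edge-labeled network for the object defined in (5) and shown in Figure 1(c).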
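The definitions above can be illustrated with a small sketch. It assumes an interaction process is stored as a dictionary from integer edge labels to tuples of population members (ordered multisets). The helper names `edge_labeled_network` and `restrict`, and the sample data, are hypothetical and do not come from the paper. The equivalence class is represented by one canonical member, obtained by renaming population members 1, 2, 3, ... in order of first appearance, so two processes related by a bijection of their populations map to the same representative.

```python
from typing import Dict, Hashable, Iterable, Tuple

Interaction = Tuple[Hashable, ...]  # a finite ordered multiset of population members
Process = Dict[int, Interaction]    # edge label s -> interaction I(s)


def edge_labeled_network(process: Process) -> Dict[int, Tuple[int, ...]]:
    """Canonical representative of the class of `process` (illustrative sketch).

    Population members are renamed by order of first appearance while scanning
    edge labels in increasing order, so the result ignores vertex identities.
    """
    rename: Dict[Hashable, int] = {}
    canonical: Dict[int, Tuple[int, ...]] = {}
    for s in sorted(process):
        edge = []
        for member in process[s]:
            if member not in rename:
                rename[member] = len(rename) + 1
            edge.append(rename[member])
        canonical[s] = tuple(edge)
    return canonical


def restrict(process: Process, labels: Iterable[int]) -> Process:
    """Restriction to the edge labels in `labels`: drop all other edges."""
    keep = set(labels)
    return {s: edge for s, edge in process.items() if s in keep}


# Hypothetical example data: three phone calls, the last a self-interaction.
calls = {1: ("alice", "bob"), 2: ("bob", "carol"), 3: ("alice", "alice")}
renamed = {1: ("x", "y"), 2: ("y", "z"), 3: ("x", "x")}
```

Here `edge_labeled_network(calls)` and `edge_labeled_network(renamed)` agree, since the two processes differ only by a bijection of the population. Restricting first and then canonicalizing gives the same result as canonicalizing, restricting, and canonicalizing again, which mirrors the statement that the restricted network is induced by the restricted process.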