Aim: To produce a spatial clustering of Europe on the basis of species occurrence data for the land mammal fauna.
Location: Europe defined by the following boundaries: 11°W, 32°E, 71°N, 35°N.
Methods: Presence/absence records of mammal species collected by the Societas Europaea Mammalogica with a resolution of 50 × 50 km were used in the analysis. After pre‐processing, the data provide information on 124 species in 2183 grid cells. The data were clustered using the k‐means and probabilistic expectation maximization (EM) clustering algorithms. The resulting geographical pattern of clusters was compared against climate variables and against an environmental stratification of Europe based on climate, geomorphology and soil characteristics (EnS).
Results: The mammalian presence/absence data divide naturally into clusters, which are highly connected spatially and most strongly determined by the small mammals with the highest grid cell incidence. The clusters reflect major physiographic and environmental features and differ significantly in the values of basic climate variables. The geographical pattern is a fair match for the EnS stratification and is robust between non‐overlapping subsets of the data, such as trophic groups.
Main conclusions: The pattern of clusters is regarded as reflecting the spatial expression of biologically distinct, metacommunity‐like entities influenced by deterministic forces ultimately related to the physical environment. Small mammals give the most spatially coherent clusters of any subgroup, while large mammals show stronger relationships to climate variables. The spatial pattern is mainly due to small mammals with high grid cell incidence and is robust to noise from other subsets. The results support the use of spatially resolved environmental reconstructions based on fossil mammal data, especially when based on species with the highest incidence.
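The k‐means step described in the Methods can be illustrated with a minimal sketch on a toy presence/absence matrix (rows are grid cells, columns are species). The toy data, the function name `kmeans`, the Euclidean distance and the deterministic farthest‐first initialization are all assumptions made for illustration; the abstract does not state how the clustering was initialized or configured, and the EM variant is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=50):
    """Plain Lloyd's k-means on 0/1 rows (illustrative, not the paper's code)."""
    # Farthest-first initialization (an assumption): start from the first row,
    # then repeatedly add the row farthest from the centroids chosen so far.
    centroids = [list(map(float, rows[0]))]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(map(float, far)))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each grid cell joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members,
        # i.e. the per-species incidence within the cluster.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: 6 grid cells x 4 species.
cells = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
labels, centroids = kmeans(cells, k=2)
print(labels)
```

On binary data each centroid coordinate is the fraction of cells in the cluster where that species occurs, which is one reason high‐incidence species can dominate the resulting partition.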
The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occurrence of the value 1 in the data. While this choice makes sense in the context of sparse binary data, it disregards potentially interesting subsets of attributes that have some other type of dependency structure. We consider the problem of finding all subsets of attributes that have low complexity. The complexity is measured by either the entropy of the projection of the data on the subset, or the entropy of the data for the subset when modeled using a Bayesian tree, with downward or upward pointing edges. We show that the entropy measure on sets has a monotonicity property, and thus a levelwise approach can find all low-entropy itemsets. We also show that the tree-based measures are bounded above by the entropy of the corresponding itemset, allowing similar algorithms to be used for finding low-entropy trees. We describe algorithms for finding all subsets satisfying an entropy condition. We give an extensive empirical evaluation of the performance of the methods both on synthetic and on real data. We also discuss the search for high-entropy subsets and the computation of the Vapnik-Chervonenkis dimension of the data.
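The monotonicity argument can be made concrete with a small sketch: adding an attribute to a set never lowers the entropy of the projection, so a set can stay under a threshold only if all of its subsets do, and candidates can be built level by level. The function names, the threshold and the toy data below are assumptions for illustration; this is not the authors' implementation, and the tree-based measures are not covered.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(rows, attrs):
    """Empirical entropy (in bits) of the data projected onto `attrs`."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def low_entropy_sets(rows, n_attrs, threshold):
    """Levelwise search for all attribute sets with entropy <= threshold."""
    level = [(a,) for a in range(n_attrs) if entropy(rows, (a,)) <= threshold]
    found = list(level)
    while level:
        prev = set(level)
        candidates = set()
        for s in level:
            # Extend each set with a larger attribute index to keep tuples sorted.
            for a in range(s[-1] + 1, n_attrs):
                cand = s + (a,)
                # Prune: every subset one element smaller must itself qualify.
                if all(sub in prev for sub in combinations(cand, len(s))):
                    candidates.add(cand)
        level = sorted(c for c in candidates if entropy(rows, c) <= threshold)
        found.extend(level)
    return found

# Hypothetical toy data: attributes 0 and 1 always agree, attribute 2 is independent.
data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
found = low_entropy_sets(data, n_attrs=3, threshold=1.5)
print(found)
```

Here the pair (0, 1) has the same entropy as either attribute alone because the two columns are perfectly dependent, even though they are not a frequent co-occurrence of 1s in the usual itemset sense.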
Please cite as: Heikinheimo, H., Eronen, J.T., Sennikov, A., Preston, C.D., Uotila, P., Mannila, H., Fortelius, M. 2012. Convergence in distribution patterns of Europe's plants and mammals is due to environmental forcing. Journal of Biogeography 39, 1633-1644.
We consider the problem of relating itemsets mined on binary attributes of a data set to numerical attributes of the same data. An example is biogeographical data, where the numerical attributes correspond to environmental variables and the binary attributes encode the presence or absence of species in different environments. From the viewpoint of itemset mining, the task is to select a small collection of interesting itemsets using the numerical attributes; from the viewpoint of the numerical attributes, the task is to constrain the search for local patterns (e.g. clusters) using the binary attributes. We give a formal definition of the problem, discuss it theoretically, give a simple constant-factor approximation algorithm, and show by experiments on biogeographical data that the algorithm can capture interesting patterns that would not have been found using either itemset mining or clustering alone.
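One simple way to picture "selecting itemsets using the numerical attributes" is to score each itemset by how tightly a numerical variable is concentrated over the rows that support it. The sketch below does this with the population variance of a single variable; the scoring rule, the names `support_rows` and `rank_itemsets`, the parameters and the toy data are all assumptions for illustration, and this is not the constant-factor approximation algorithm the abstract refers to.

```python
from itertools import combinations
from statistics import pvariance

def support_rows(binary_rows, itemset):
    """Indices of rows in which every item of the itemset is present."""
    return [i for i, row in enumerate(binary_rows) if all(row[j] for j in itemset)]

def rank_itemsets(binary_rows, values, max_size=2, min_support=2):
    """Rank itemsets by the variance of `values` over their supporting rows."""
    n_items = len(binary_rows[0])
    scored = []
    for size in range(1, max_size + 1):
        for itemset in combinations(range(n_items), size):
            rows = support_rows(binary_rows, itemset)
            if len(rows) >= min_support:
                # Low variance: the itemset picks out numerically similar rows.
                scored.append((pvariance([values[i] for i in rows]), itemset))
    return sorted(scored)

# Hypothetical toy data: 5 sites x 3 species, plus one temperature per site.
presence = [(1, 1, 0), (1, 1, 0), (1, 0, 1), (0, 0, 1), (0, 1, 1)]
temperature = [2.0, 3.0, 10.0, 15.0, 14.0]
ranked = rank_itemsets(presence, temperature)
print([itemset for _, itemset in ranked])
```

In this toy case the pair of species 0 and 1 ranks first because the sites where both occur have nearly the same temperature, while each species alone spans a wider range.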