Modern conservation operates at the nexus of biological and social influences. While the importance of social and cultural factors is often acknowledged, defining, measuring and comparing them remains a significant challenge. Here, we explore a novel method to quantify cultural interest in all extant reptile species using Wikipedia, a large, open-access online encyclopaedia. We analysed page views for all reptile species during 2014 across all of Wikipedia's language editions. We compared species' page view counts across languages and in relation to their spatial distribution, phylogeny, threat status and various other biological attributes. We found that while the top three species by page views are shared across major language editions, beyond these, species' page view ranks tend to be specific to particular language editions. Interest within a language is mostly focused on reptiles found in the regions where that language is spoken. Overall, interest is greater for reptiles that are venomous, endangered, widely distributed, larger-bodied and described earlier. However, within individual families not all of these factors predict page views. Most families contain at least one species in the top 5% of page views, but 29 families (comprising 1450 species) contain no 'high-interest' species. Taken together, our analyses elucidate novel patterns of human interest in nature across large geographical, cultural and taxonomic spectra using big-data techniques. Such approaches hold much promise for incorporating social perceptions into future conservation practice.
A major challenge in designing neural network (NN) systems is to determine the best structure and parameters for the network given the data for the machine learning problem at hand. Examples of such parameters are the number of layers and nodes, the learning rates, and the dropout rates. Typically, these parameters are chosen based on heuristic rules and fine-tuned manually, which can be very time-consuming, because evaluating a single parametrization of the NN may require several hours. This paper addresses the problem of choosing appropriate parameters for the NN by formulating it as a box-constrained mathematical optimization problem and applying a derivative-free optimization tool that automatically and effectively searches the parameter space. The optimization tool employs a radial basis function model of the objective function (the prediction accuracy of the NN) to accelerate the discovery of configurations yielding high accuracy. Candidate configurations explored by the algorithm are trained for a small number of epochs, and only the most promising candidates receive full training. The performance of the proposed methodology is assessed on benchmark sets and in the context of predicting drug-drug interactions, showing promising results. The optimization tool used in this paper is open-source.
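As a rough illustration of the surrogate idea described above, here is a minimal sketch (not the paper's open-source tool) of RBF-model-guided hyperparameter search. The `train_nn` routine, the two tuned parameters (learning rate and dropout), and all numeric settings are hypothetical placeholders; SciPy's `RBFInterpolator` stands in for the radial basis function model.

```python
# Minimal sketch of surrogate-guided hyperparameter search (illustration
# only; not the paper's tool). train_nn is a hypothetical placeholder that
# would train the NN for `epochs` epochs and return validation accuracy.
import numpy as np
from scipy.interpolate import RBFInterpolator

def train_nn(params, epochs):
    # Placeholder objective (ignores `epochs`): peaks near lr=3e-3, dropout=0.3.
    lr, dropout = params
    return 1.0 - 0.05 * (np.log10(lr) + 2.5) ** 2 - (dropout - 0.3) ** 2

rng = np.random.default_rng(0)
lo, hi = np.array([1e-4, 0.0]), np.array([1e-1, 0.8])  # box constraints

# Cheap evaluations: a few random configurations trained for few epochs.
X = rng.uniform(lo, hi, size=(10, 2))
y = np.array([train_nn(x, epochs=2) for x in X])

for _ in range(20):
    model = RBFInterpolator(X, y, smoothing=1e-9)  # RBF model of accuracy
    cand = rng.uniform(lo, hi, size=(200, 2))      # random candidate pool
    best = cand[np.argmax(model(cand))]            # most promising candidate
    X = np.vstack([X, best])
    y = np.append(y, train_nn(best, epochs=2))     # another cheap evaluation

winner = X[np.argmax(y)]                           # only this point gets full training
print("fully train at:", winner, "->", train_nn(winner, epochs=50))
```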
Semantic Web systems provide open interfaces for end-users to access data via a powerful high-level query language, SPARQL. But users unfamiliar with either the details of SPARQL or the properties of the target dataset may find it easier to query by example: give examples of the information they want (or examples of both what they want and what they do not want) and let the system reverse engineer the desired query from those examples. This approach has been used heavily in the setting of relational databases. Here, we investigate the reverse engineering problem in the context of SPARQL. We first provide a theoretical study, formalising variants of the reverse engineering problem and giving tight bounds on their complexity. We then describe an implementation of a reverse engineering tool for positive examples. An experimental analysis shows that the tool scales well in the data size, the number of examples, and the size of the smallest query that fits the data. We also give evidence that reverse engineering tools can provide benefits on real-life datasets.
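To make the query-by-example setting concrete, here is a deliberately naive sketch (not the algorithm from the paper): given positive example subjects, it intersects their (predicate, object) pairs and emits the shared pattern as a SPARQL basic graph pattern. The `ex:` data and URIs are invented, and rdflib is assumed.

```python
# Naive query-by-example sketch: build the most specific SPARQL pattern
# shared by all positive examples (invented toy data; not the paper's method).
from rdflib import Graph, URIRef

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:boa    ex:class ex:Reptilia ; ex:venomous false .
ex:python ex:class ex:Reptilia ; ex:venomous false .
ex:cobra  ex:class ex:Reptilia ; ex:venomous true .
""", format="turtle")

positives = ["http://example.org/boa", "http://example.org/python"]

# Intersect the (predicate, object) pairs of every positive example.
common = None
for uri in positives:
    pairs = set(g.predicate_objects(URIRef(uri)))
    common = pairs if common is None else common & pairs

body = " .\n  ".join(f"?x <{p}> {o.n3()}"
                     for p, o in sorted(common, key=lambda t: str(t)))
query = f"SELECT ?x WHERE {{\n  {body}\n}}"
print(query)
print([str(row.x) for row in g.query(query)])  # boa and python, not cobra
```

A real reverse engineering system must of course generalize beyond the most specific pattern and handle negative examples, which is where the complexity results above come in.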
Although RDF graph data often come with an associated schema, recent studies have shown that real RDF data rarely conform to their perceived schemas. Since a number of data management decisions, including storage layouts, indexing, and efficient query processing, use schemas to guide decision making, it is imperative to have an accurate description of the structuredness of the data at hand, that is, how well the data conform to the schema. In this paper, we approach the study of the structuredness of an RDF graph in a principled way: we propose a framework for specifying structuredness functions, which gauge the degree to which an RDF graph conforms to a schema. In particular, we first define a formal language for specifying structuredness functions with expressions we call rules. This language allows a user to state a rule to which an RDF graph may fully or partially conform. We then consider the problem of discovering a refinement of a sort (type) by partitioning the dataset into subsets whose structuredness is above a specified threshold. We prove that the natural decision problem associated with this refinement problem is NP-complete, and we provide a natural translation of the problem into Integer Linear Programming (ILP). Finally, we test this ILP solution with three real-world datasets and three different, intuitive rules that gauge structuredness in different ways. The rules yield meaningful refinements of the datasets, showing that our language can be a powerful tool for understanding the structure of RDF data, and the ILP solution proves practical for a large fraction of existing data.
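As a toy illustration of what a structuredness function measures, the following sketch computes one simple, intuitive score: the fraction of filled cells when the entities of a sort are laid out against the union of properties the sort uses (akin to coverage-style metrics from prior work; the data and the specific rule are invented for illustration):

```python
# Toy structuredness score: fraction of (entity, property) cells that are
# filled for one sort. Values near 1 indicate well-structured data.
entities = {                       # invented instances of a sort "Species"
    "s1": {"name", "mass", "habitat"},
    "s2": {"name", "mass"},
    "s3": {"name"},
}
properties = set().union(*entities.values())
filled = sum(len(props) for props in entities.values())
score = filled / (len(entities) * len(properties))
print(f"structuredness = {score:.2f}")   # 6 / 9 = 0.67
```

The framework in the paper goes further: its rule language can express many such functions, and the ILP formulation then searches for a partition of the sort whose parts each score above a threshold.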
We study the definability problem for first-order logic, denoted FO-Def. The input to FO-Def is a relational database instance I and a relation R; the question is whether there exists a first-order query Q (or, equivalently, a relational algebra expression Q) such that Q evaluated on I gives R as its answer. Although the study of FO-Def dates back to 1978, when the decidability of the problem was shown, its exact complexity remained a fundamental open problem. In this article, we provide a polynomial-time algorithm for solving FO-Def that uses calls to a graph-isomorphism subroutine (or oracle). As a consequence, the first-order definability problem is shown to be complete for the class GI of all problems that are polynomial-time Turing reducible to the graph isomorphism problem, thus closing the open question about its exact complexity. The technique is also applied to a generalized version of the problem that accepts a finite set of relation pairs and whose exact complexity was also open; this version, too, is GI-complete.
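The connection to graph isomorphism can be seen through the classical criterion underlying FO-Def: over a finite instance, a relation is first-order definable exactly when it is preserved by every automorphism of the instance. The brute-force check below illustrates that criterion on a tiny invented instance (it enumerates all permutations, so it is feasible only for very small domains; the paper's algorithm is polynomial time given a GI oracle):

```python
# Brute-force illustration of automorphism invariance: R is FO-definable on
# the instance iff every automorphism of the instance maps R onto itself.
from itertools import permutations

domain = [0, 1, 2, 3]
E = {(0, 1), (1, 0), (2, 3), (3, 2)}   # instance: one binary relation
R = {(0,), (1,)}                        # candidate answer relation

def definable(domain, E, R):
    for perm in permutations(domain):
        f = dict(zip(domain, perm))
        if {(f[a], f[b]) for a, b in E} == E:       # f is an automorphism
            if {(f[x],) for (x,) in R} != R:        # ...that does not fix R
                return False
    return True

# False: the automorphism swapping {0,1} with {2,3} maps R to {(2,),(3,)}.
print(definable(domain, E, R))
```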