Collective classification can significantly improve accuracy by exploiting relationships among instances. Although several collective inference procedures have been reported, they have not been thoroughly evaluated for their commonalities and differences. We introduce novel generalizations of three existing algorithms that allow such algorithmic and empirical comparisons. Our generalizations permit us to examine how cautiously or aggressively each algorithm exploits intermediate relational data, which can be noisy. We conjecture that cautious approaches that identify and preferentially exploit the more reliable intermediate data should outperform aggressive approaches. We explain why caution is useful and introduce three parameters to control the degree of caution. An empirical evaluation of collective classification algorithms, using two base classifiers on three data sets, supports our conjecture.
Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation-for the nodes, links, and features-can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.The majority of research in machine learning assumes independently and identically distributed data. This independence assumption is often violated in relational data, which encode dependencies among data instances. For instance, people are often linked by business associations, and information about one person can be highly informative for a prediction task involving an associate of that person. More generally, relational data can be described as a set of nodes, which can be connected by one or more types of relations (or "links"). Relational information is seemingly ubiquitous; it is present in domains such as the Internet and the worldwide web (Faloutsos et al.
No abstract
Abstract. The Semantic Web's need for machine understandable content has led researchers to attempt to automatically acquire such content from a number of sources, including the web. To date, such research has focused on "document-driven" systems that individually process a small set of documents, annotating each with respect to a given ontology. This paper introduces OntoSyphon, an alternative that strives to more fully leverage existing ontological content while scaling to extract comparatively shallow content from millions of documents. OntoSyphon operates in an "ontology-driven" manner: taking any ontology as input, OntoSyphon uses the ontology to specify web searches that identify possible semantic instances, relations, and taxonomic information. Redundancy in the web, together with information from the ontology, is then used to automatically verify these candidate instances and relations, enabling OntoSyphon to operate in a fully automated, unsupervised manner. A prototype of OntoSyphon is fully implemented and we present experimental results that demonstrate substantial instance learning in a variety of domains based on independently constructed ontologies. We also introduce new methods for improving instance verification, and demonstrate that they improve upon previously known techniques.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.